

## Dependencies
* python3
* [pip](https://pip.pypa.io/en/stable/installing/) Python package installer
    * Install:
        * `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
        * `python3 get-pip.py`

* [mwclient](https://mwclient.readthedocs.io/en/latest/index.html) Python library
    * Install:
        * `pip3 install mwclient`
* [jinja2](https://jinja.palletsprojects.com/en/2.11.x/) Python library
    * Install:
        * `pip3 install jinja2`
* [Pillow](https://pillow.readthedocs.io/en/stable/) Python library for image processing
    * Install:
        * `pip3 install Pillow`
* [pandoc](https://pandoc.org/)
    * Install:
        * Debian/Ubuntu: `sudo apt install pandoc`
        * Mac: `brew install pandoc`
* [html5lib](https://github.com/html5lib/html5lib-python)
    * Install:
        * `pip3 install html5lib`

## login.txt
`login.txt` is a local, per-user file, ignored by git, where you place your itch wiki username and password on separate lines.

It lets mwclient access the wiki, which is closed for reading and writing without a login.
```
myusername
mypassword
```
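As an illustration of how a script could use this file, here is a minimal sketch. The function names `read_login` and `connect` are hypothetical, and the `host` and `path` values are assumptions derived from the wiki URL used in this README; the actual scripts in this repository may do this differently.

```python
from pathlib import Path


def read_login(path="login.txt"):
    """Return (username, password) from a file with one value per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Ignore empty lines and surrounding whitespace.
    lines = [line.strip() for line in lines if line.strip()]
    if len(lines) < 2:
        raise ValueError(f"{path} must contain a username line and a password line")
    return lines[0], lines[1]


def connect(login_path="login.txt", host="hub.xpub.nl", path="/sandbox/itchwiki/"):
    """Log in to the wiki with mwclient (host and path are assumptions)."""
    import mwclient  # imported here so read_login() works without mwclient

    site = mwclient.Site(host, path=path)
    username, password = read_login(login_path)
    site.login(username, password)
    return site
```

Calling `connect()` from the repository folder would read `login.txt` and return a logged-in `mwclient.Site` object.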

## Create archive from wiki:

### on sandbox server
`python3 dumpwiki.py`

### locally on your own machine:
Create the archive folder: `mkdir archive`

Run the script, outputting to the archive folder and displaying the images from the wiki:

`python3 dumpwiki.py --output archive --local`


### Categories and Templates:
For each wiki category in [Category Publish](https://hub.xpub.nl/sandbox/itchwiki/index.php/Category:Publish)
there should be an HTML [jinja2 template](https://jinja.palletsprojects.com/en/2.11.x/)
with the same name as the category in this repository's `templates/` folder.

If there is none, `templates/default.html` will be used to render the pages under that category.

**CSS/JS files** are stored in `static/`. See `templates/default.html` for how it links to `static/archive.css`
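The fallback rule can be sketched as follows. This is only an illustration: the function name `pick_template` is hypothetical, and it assumes a category's template is named `<Category>.html`.

```python
from pathlib import Path


def pick_template(category, templates_dir="templates"):
    """Return the template file name to use for a wiki category.

    Assumption: a category's template is called "<category>.html".
    """
    candidate = Path(templates_dir) / f"{category}.html"
    if candidate.is_file():
        return candidate.name
    # No template with the category's name: fall back to the default one.
    return "default.html"
```

When working with a jinja2 `Environment` directly, `select_template([f"{category}.html", "default.html"])` gives a similar first-match behaviour.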


### run on server
* script (repository) location: `/var/www/html/archive/0`
* go there `cd /var/www/html/archive/0`
* run script `python3 dumpwiki.py`

### git pull most recent changes to archive:

**Allow your sandbox pi user to `git pull` by:**
* in the sandbox pi, creating an ssh key pair: `ssh-keygen`
* copying the content of the public ssh key: `cat ~/.ssh/id_rsa.pub`
* adding it to the public ssh keys in your gitea user profile: https://git.xpub.nl/user/settings/keys
* Your gitea user is now associated with the public ssh key you just generated in the sandbox pi

* **Now you are able to `git pull` from `/var/www/html/archive/0` whenever needed.**


---

# query2html.py
## Run

`cd special-issue-11-wiki2html/`

Run all scripts together with `./run.sh`


Or run one script at a time:

`python3 download_imgs.py`
* Downloads all images from wiki to `images/` directory
* and stores each image's metadata to `images.json`
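What this step does can be sketched as below. This is not the code of `download_imgs.py`: the function name `download_images` and the layout of `images.json` (file name mapped to image info) are assumptions. The objects in `images` are expected to behave like mwclient image objects, with `page_title`, `imageinfo` and `download()`.

```python
import json
import os


def download_images(images, img_dir="images", metadata_path="images.json"):
    """Save every image to img_dir and write its metadata to metadata_path."""
    os.makedirs(img_dir, exist_ok=True)
    metadata = {}
    for image in images:
        filename = image.page_title  # title without the "File:" namespace
        with open(os.path.join(img_dir, filename), "wb") as fd:
            image.download(fd)  # writes the file's bytes into fd
        metadata[filename] = dict(image.imageinfo)
    with open(metadata_path, "w", encoding="utf-8") as fd:
        json.dump(metadata, fd, indent=2)
    return metadata
```

With a logged-in `mwclient.Site` object `site`, `download_images(site.allimages())` would walk through all images on the wiki.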

`python3 query2html.py`
* performs a query through the ask API:
    * help: `python3 query2html.py --help`
    * dry run: `python3 query2html.py --dry` only prints the request, without executing it
    * build a custom query with the arguments `--conditions`, `--printouts`, `--sort`, `--order`
    * default query is: `[[File:+]][[Title::+]][[Part::+]][[Date::+]]|?Title|?Date|?Part|?Partof|sort=Date,Title,Part|order=asc,asc,asc`
    * custom queries:
        * `python3 query2html.py --conditions '[[Date::>=1970/01/01]][[Date::<=1979/12/31]]'`
        * `python3 query2html.py --conditions '[[Creator::~*task force*]]'`

Note: to avoid confusion or problems, it is better to leave the `--printouts`, `--sort` and `--order` arguments at their defaults.
Otherwise document parts will no longer be grouped according to their Title, creating documents made from parts of different originals.
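To show how the four arguments fit together in one ask query, here is a small sketch. The function `build_ask_query` is hypothetical and only reproduces the default query string given above; it assumes that a custom `--conditions` value replaces the default conditions, which may not match the script.

```python
DEFAULT_CONDITIONS = "[[File:+]][[Title::+]][[Part::+]][[Date::+]]"
DEFAULT_PRINTOUTS = ("Title", "Date", "Part", "Partof")
DEFAULT_SORT = ("Date", "Title", "Part")
DEFAULT_ORDER = ("asc", "asc", "asc")


def build_ask_query(conditions=DEFAULT_CONDITIONS, printouts=DEFAULT_PRINTOUTS,
                    sort=DEFAULT_SORT, order=DEFAULT_ORDER):
    """Join conditions, printouts, sort and order into one ask query string."""
    parts = [conditions]
    parts += [f"?{printout}" for printout in printouts]
    parts.append("sort=" + ",".join(sort))
    parts.append("order=" + ",".join(order))
    return "|".join(parts)


# Only the conditions change; printouts, sort and order keep their defaults.
seventies = build_ask_query("[[Date::>=1970/01/01]][[Date::<=1979/12/31]]")
```

Keeping `Title` and `Part` in the printouts and the sort keys is what lets the parts of one document stay together and in order.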


## How does query2html.py work?

Based on the query made, the MW API sends back the Page titles that match the query conditions,
together with their printouts (metadata property::value pairs).

For each Page:
* its locally stored image is found
* its text is retrieved from MW
* a fragment of html (`document_part_html`) is generated based on `templates/document_part.html`

All Pages that *share the same metadata Title value* will:
* gather all their html fragments in `all_document_parts`
* render `templates/document.html` with the content of `all_document_parts`
* save the rendered template to `'static_html/DocumentTitle.html'`

For all of the saved documents:
* the info on each document is saved into `documentslist`
* `templates/index.html` is rendered with `documentslist`, resulting in `static_html/index.html`
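The grouping described above can be sketched in plain Python. This is an illustration, not the code of `query2html.py`: the dictionary keys `image` and `text`, the inline HTML strings that stand in for the jinja2 templates, and the function name `build_documents` are all assumptions.

```python
from collections import defaultdict
from html import escape


def build_documents(pages):
    """Group pages by Title and assemble one HTML document per Title."""
    parts_by_title = defaultdict(list)
    for page in pages:
        parts_by_title[page["Title"]].append(page)

    documents = {}      # output file name -> HTML of the document
    documentslist = []  # info on each document, used for the index page
    for title, parts in parts_by_title.items():
        parts.sort(key=lambda part: part["Part"])
        # Stand-in for rendering templates/document_part.html for each part.
        all_document_parts = "".join(
            f'<section><img src="{escape(part["image"])}">'
            f'<p>{escape(part["text"])}</p></section>'
            for part in parts
        )
        filename = f"static_html/{title}.html"
        # Stand-in for rendering templates/document.html.
        documents[filename] = f"<h1>{escape(title)}</h1>{all_document_parts}"
        documentslist.append({"title": title, "file": filename, "parts": len(parts)})
    return documents, documentslist
```

Pages with the same `Title` end up in the same document, ordered by `Part`, which is why changing the default printouts or sort order can mix up documents.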


# Bulk image upload with upload_imgs_dir.py

Get Help: `python3 upload_imgs_dir.py --help`

**Edit and run via** `.helper-upload_imgs_dir.sh`


# Convert PDFs to folder of JPGs with pdf2jpg.sh
Run it by either:
* running it from this folder, using the absolute path to the PDF:
`./pdf2jpg.sh "/full/path/to/2020_bantayog/PDFname.pdf"`

* copying pdf2jpg.sh to 2020_bantayog/ and running it with the relative path to the PDF:
`./pdf2jpg.sh "PDFname.pdf"`

Command to convert PDFs to JPGs:

`convert -quality 100 -density 300 [name-of-pdf] %02d.jpg`
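For reference, the same `convert` call could be wrapped in Python as sketched below. This is not what `pdf2jpg.sh` contains. It assumes ImageMagick's `convert` is installed and on the `PATH`, and that the JPGs go into a folder named after the PDF; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_command(pdf_path, out_dir):
    """Build the argument list for ImageMagick's convert."""
    return ["convert", "-quality", "100", "-density", "300",
            str(pdf_path), str(Path(out_dir) / "%02d.jpg")]


def pdf_to_jpgs(pdf_path, out_dir=None):
    """Convert each page of a PDF into a numbered JPG inside out_dir."""
    pdf_path = Path(pdf_path)
    # Assumption: default output folder is the PDF path without ".pdf".
    out_dir = Path(out_dir) if out_dir else pdf_path.with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_command(pdf_path, out_dir), check=True)
    return out_dir
```

`pdf_to_jpgs("PDFname.pdf")` would then create the folder `PDFname/` containing `00.jpg`, `01.jpg`, and so on.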

# Wiki to HTML pages script
![](https://pzwiki.wdka.nl/mw-mediadesign/images/8/82/Workflow-wiki2html.svg)