# Wiki to HTML pages script
![](https://pzwiki.wdka.nl/mw-mediadesign/images/8/82/Workflow-wiki2html.svg)

## Dependencies
* python3
* [pip](https://pip.pypa.io/en/stable/installing/), the Python package installer
    * Install:
        * `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
        * `python3 get-pip.py`

* [mwclient](https://mwclient.readthedocs.io/en/latest/index.html) Python library
    * Install:
        * `pip3 install mwclient`
* [jinja2](https://jinja.palletsprojects.com/en/2.11.x/) Python library
    * Install:
        * `pip3 install jinja2`
* [pandoc](https://pandoc.org/)
    * Install:
        * Debian/Ubuntu: `sudo apt install pandoc`
        * Mac: `brew install pandoc`

## login.txt
`login.txt` is a local, per-user file, ignored by git, where you place your itch wiki username and password, on separate lines.

mwclient uses it to log in to the wiki, since the wiki is closed for reading and writing without an account.
```
myusername
mypassword
```

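As a rough illustration of how a script could use `login.txt`, the sketch below reads the two lines and passes them to mwclient. The function names `read_login` and `connect`, and the `host` and `wiki_path` parameters, are assumptions made for this example; they are not taken from the scripts in this repository.

```python
from pathlib import Path


def read_login(path="login.txt"):
    """Return (username, password) from a two-line login file."""
    lines = [line.strip() for line in Path(path).read_text().splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if len(lines) < 2:
        raise ValueError(f"{path} needs a username and a password on separate lines")
    return lines[0], lines[1]


def connect(host, wiki_path, login_file="login.txt"):
    """Log in with mwclient; host and wiki_path are placeholders for your wiki."""
    import mwclient  # imported here so read_login works without mwclient installed

    username, password = read_login(login_file)
    site = mwclient.Site(host, path=wiki_path)
    site.login(username, password)
    return site
```

Keeping the credentials in a git-ignored file means they never end up in the repository history.
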
## Run

`cd special-issue-11-wiki2html/`

Run all scripts together with `./run.sh`


Or run one script at a time:

`python3 download_imgs.py`
* Downloads all images from the wiki to the `images/` directory
* and stores each image's metadata in `images.json`

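The two steps above can be sketched as follows. This is only an assumption about how such a download loop might look, not the actual content of `download_imgs.py`: the function name `download_images` is hypothetical, and `images` can be any iterable of objects with `.name`, `.page_title`, `.imageinfo` and `.download(fileobj)`, which is the shape of mwclient's image objects (for instance those yielded by `site.allimages()`).

```python
import json
from pathlib import Path


def download_images(images, directory="images", metadata_file="images.json"):
    """Save each image into `directory` and write all metadata to one JSON file."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    metadata = {}
    for image in images:
        # page_title is the file name without the "File:" namespace prefix
        with open(out_dir / image.page_title, "wb") as fd:
            image.download(fd)
        # key the metadata by the full page name, e.g. "File:Scan01.jpg"
        metadata[image.name] = dict(image.imageinfo)
    Path(metadata_file).write_text(json.dumps(metadata, indent=2))
    return metadata
```
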
`python3 query2html.py`
* Performs a query with the ask API:
    * help: `python3 query2html.py --help`
    * dry run: `python3 query2html.py --dry` only prints the request, without executing it
    * build a custom query with the arguments `--conditions --printouts --sort --order`
        * default query is: `[[File:+]][[Title::+]][[Part::+]][[Date::+]]|?Title|?Date|?Part|?Partof|sort=Date,Title,Part|order=asc,asc,asc`
    * custom queries
        * `python3 query2html.py --conditions '[[Date::>=1970/01/01]][[Date::<=1979/12/31]]'`
        * `python3 query2html.py --conditions '[[Creator::~*task force*]]'`

Note: to avoid confusion or problems, it is better to leave the `--printouts`, `--sort` and `--order` arguments at their defaults.
Otherwise document parts will no longer be grouped according to their Title, producing documents assembled from parts of different originals.

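To make the relation between the four arguments and the default query string concrete, here is a minimal sketch of how such a string could be assembled. The function `build_query` and the constant names are assumptions for illustration; the real argument handling in `query2html.py` may differ.

```python
DEFAULT_CONDITIONS = "[[File:+]][[Title::+]][[Part::+]][[Date::+]]"
DEFAULT_PRINTOUTS = ("Title", "Date", "Part", "Partof")
DEFAULT_SORT = ("Date", "Title", "Part")
DEFAULT_ORDER = ("asc", "asc", "asc")


def build_query(conditions=DEFAULT_CONDITIONS, printouts=DEFAULT_PRINTOUTS,
                sort=DEFAULT_SORT, order=DEFAULT_ORDER):
    """Join conditions, printouts, sort and order into one ask query string."""
    parts = [conditions]
    parts += [f"?{name}" for name in printouts]  # each printout is prefixed with "?"
    parts.append("sort=" + ",".join(sort))
    parts.append("order=" + ",".join(order))
    return "|".join(parts)


if __name__ == "__main__":
    # Only the conditions change; printouts, sort and order keep their defaults
    print(build_query(conditions="[[Creator::~*task force*]]"))
```

With mwclient, a string like this can be passed to `site.ask(query)`, which iterates over the matching results.
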
## How does query2html.py work?

Based on the query made,
the MW API sends back the Page titles that match the query conditions,
together with their printouts (metadata property::value pairs).

For each Page:
* its locally stored image is found
* its text is retrieved from MW
* a fragment of HTML (`document_part_html`) is generated from `templates/document_part.html`

All Pages that *share the same metadata Title value* will:
* gather all their HTML fragments in `all_document_parts`
* render `templates/document.html` with the content of `all_document_parts`
* save the rendered template to `static_html/DocumentTitle.html`

Finally, for the saved documents:
* `templates/index.html` is rendered with the info on each document, which has been saved in `documentslist`
* resulting in `static_html/index.html`

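The grouping logic described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the script itself: the page dictionaries with `Title` and `html` keys, the function `build_documents`, and the shape of the `documentslist` entries are hypothetical, and a plain string join stands in for the jinja2 rendering of `templates/document.html`.

```python
def build_documents(pages, output_dir="static_html"):
    """Group HTML fragments by Title and list the resulting documents.

    `pages` is an iterable of dicts with "Title" and "html" keys, already
    in the order returned by the query (Date, Title, Part by default).
    """
    grouped = {}
    for page in pages:
        # Pages sharing a Title end up in the same document
        grouped.setdefault(page["Title"], []).append(page["html"])

    rendered = {}        # output path -> document body
    documentslist = []   # info later used to build the index page
    for title, fragments in grouped.items():
        all_document_parts = "\n".join(fragments)
        filename = f"{title}.html"
        rendered[f"{output_dir}/{filename}"] = all_document_parts
        documentslist.append({"title": title, "file": filename})
    return rendered, documentslist
```

Because grouping relies only on the Title value, any change to the printouts or sort order that drops or reorders Title affects which parts land in which document.
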
# Bulk image upload with upload_imgs_dir.py

Get help: `python3 upload_imgs_dir.py --help`

**Edit and run via** `.helper-upload_imgs_dir.sh`

# Convert PDFs to folder of JPGs with pdf2jpg.sh
By either:
* running it from this folder and using the absolute path to the PDF
`./pdf2jpg.sh "/full/path/to/2020_bantayog/PDFname.pdf"`

* copying pdf2jpg.sh to 2020_bantayog/ and running it with a relative path to the PDF
`./pdf2jpg.sh "PDFname.pdf"`

The command to convert PDFs to JPGs is:
`convert -quality 100 -density 300 [name-of-pdf] %02d.jpg`
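
For anyone who prefers to drive the same conversion from Python, a minimal sketch follows. It assumes ImageMagick's `convert` is installed and on the PATH; the function names and the choice to name the output folder after the PDF are assumptions for this example and may not match what `pdf2jpg.sh` does.

```python
import subprocess
from pathlib import Path


def convert_command(pdf_path, out_dir):
    """Build the convert call; quality and density are set before the input file."""
    return [
        "convert", "-quality", "100", "-density", "300",
        str(pdf_path), str(Path(out_dir) / "%02d.jpg"),
    ]


def pdf_to_jpgs(pdf_path):
    """Write one JPG per page into a folder named after the PDF (an assumption)."""
    pdf = Path(pdf_path)
    out_dir = pdf.with_suffix("")  # e.g. PDFname.pdf -> PDFname/
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_command(pdf, out_dir), check=True)
    return out_dir
```
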