## Dependencies
* python3
* [pip](https://pip.pypa.io/en/stable/installing/), the Python package installer
    * Install:
        * `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
        * `python3 get-pip.py`
* [mwclient](https://mwclient.readthedocs.io/en/latest/index.html) Python library
    * Install:
        * `pip3 install mwclient`
* [jinja2](https://jinja.palletsprojects.com/en/2.11.x/) Python library
    * Install:
        * `pip3 install jinja2`
* [Pillow](https://pillow.readthedocs.io/en/stable/) Python library for image processing
    * Install:
        * `pip3 install Pillow`
* [pandoc](https://pandoc.org/)
    * Install:
        * Debian/Ubuntu: `sudo apt install pandoc`
        * Mac: `brew install pandoc`
* [html5lib](https://github.com/html5lib/html5lib-python)
    * Install:
        * `pip3 install html5lib`

## login.txt
`login.txt` is a local, per-user file, ignored by git, where you place your itch wiki username and password on separate lines.

It is used to let mwclient access the wiki, since the wiki is closed for both reading and writing.
```
myusername
mypassword
```

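As a minimal sketch of how a script might consume this file, assuming the two-line layout shown above. The helper names `read_login` and `connect` are hypothetical, and the host and path passed to `mwclient.Site` are assumptions inferred from the wiki URL used elsewhere in this README; none of this is taken from `dumpwiki.py`.

```python
from pathlib import Path


def read_login(path="login.txt"):
    """Return (username, password) from a two-line login file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Ignore blank lines and surrounding whitespace.
    values = [line.strip() for line in lines if line.strip()]
    if len(values) < 2:
        raise ValueError(f"{path} must contain a username line and a password line")
    return values[0], values[1]


def connect(path="login.txt"):
    """Log in to the wiki with mwclient (host and path are assumptions)."""
    import mwclient  # imported here so read_login works without mwclient installed

    site = mwclient.Site("hub.xpub.nl", path="/sandbox/itchwiki/")
    username, password = read_login(path)
    site.login(username, password)
    return site
```

Calling `connect()` from the repository root would then return a logged-in `Site` object, provided `login.txt` exists there.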
## Create archive from wiki:

### on sandbox server
`python dumpwiki.py`

### locally on your own machine:
Create the archive folder: `mkdir archive`

Run the script, writing its output to the archive folder and displaying the images from the wiki:

`python dumpwiki.py --output archive --local`

### Categories and Templates:
For each wiki Category in [Category Publish](https://hub.xpub.nl/sandbox/itchwiki/index.php/Category:Publish) there should be an HTML [jinja2 template](https://jinja.palletsprojects.com/en/2.11.x/) with the same name as the category in this repository's `templates/` folder.

If there is none, `templates/default.html` is used to render the pages under that Category.

**CSS/JS files** are stored in `static/`. See `templates/default.html` for how it links to `static/archive.css`.

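The fallback from a category template to `templates/default.html` can be sketched as follows. This is an illustration only: the function name `pick_template` is hypothetical, and it assumes that the template for a category is a file called `<Category>.html`, which the README does not state explicitly.

```python
from pathlib import Path


def pick_template(category, templates_dir="templates"):
    """Return the template file name to use for a wiki category."""
    # Assumption: the category template is named "<category>.html".
    candidate = Path(templates_dir) / f"{category}.html"
    # Fall back to default.html when no category-specific template exists.
    return candidate.name if candidate.is_file() else "default.html"
```

The returned name could then be handed to a jinja2 environment whose loader points at `templates/`.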
### run on server
* script (repository) location: `/var/www/html/archive/0`
* go there: `cd /var/www/html/archive/0`
* run the script: `python3 dumpwiki.py`

### git pull most recent changes to archive:

**Allow your sandbox pi user to `git pull` by:**
* creating an ssh key pair on the sandbox pi: `ssh-keygen`
* copying the content of the public ssh key: `cat ~/.ssh/id_rsa.pub`
* adding it to the public ssh keys in your gitea user profile: https://git.xpub.nl/user/settings/keys
* your gitea user is now associated with the public ssh key you just generated on the sandbox pi

* **Now you are able to `git pull` from `/var/www/html/archive/0` whenever needed.**

---

# query2html.py
## Run

`cd special-issue-11-wiki2html/`

Run all scripts together with `./run.sh`

Or run one script at a time:

`python3 download_imgs.py`
* downloads all images from the wiki to the `images/` directory
* and stores each image's metadata in `images.json`

`python3 query2html.py`
* performs a query with the ask API:
    * help: `python3 query2html.py --help`
    * dry run: `python3 query2html.py --dry` only prints the request, without executing it
    * build a custom query with the arguments `--conditions --printouts --sort --order`
        * default query is: `[[File:+]][[Title::+]][[Part::+]][[Date::+]]|?Title|?Date|?Part|?Partof|sort=Date,Title,Part|order=asc,asc,asc`
        * custom queries
            * `python3 query2html.py --conditions '[[Date::>=1970/01/01]][[Date::<=1979/12/31]]'`
            * `python3 query2html.py --conditions '[[Creator::~*task force*]]'`

Note: to avoid confusion or problems, it is better to leave the `--printouts`, `--sort` and `--order` arguments at their defaults.
Otherwise document parts will no longer be grouped by their Title, creating documents made from parts of different originals.

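To make the structure of the default query easier to see, here is a small sketch of how the four arguments could be assembled into one ask query string. The function `build_query` and the constant names are hypothetical and not taken from `query2html.py`; only the resulting default string comes from this README.

```python
DEFAULT_CONDITIONS = "[[File:+]][[Title::+]][[Part::+]][[Date::+]]"
DEFAULT_PRINTOUTS = ("Title", "Date", "Part", "Partof")
DEFAULT_SORT = ("Date", "Title", "Part")
DEFAULT_ORDER = ("asc", "asc", "asc")


def build_query(conditions=DEFAULT_CONDITIONS, printouts=DEFAULT_PRINTOUTS,
                sort=DEFAULT_SORT, order=DEFAULT_ORDER):
    """Join conditions, printouts, sort and order into one ask query."""
    parts = [conditions]
    # Each printout is requested with a leading "?".
    parts.extend(f"?{name}" for name in printouts)
    parts.append("sort=" + ",".join(sort))
    parts.append("order=" + ",".join(order))
    return "|".join(parts)


print(build_query())
print(build_query(conditions="[[Creator::~*task force*]]"))
```

With no arguments the function reproduces the default query quoted above; passing only `conditions` mirrors the recommended usage of leaving the other three arguments at their defaults.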
## How does query2html.py work?

Based on the query made, the MW API sends back the titles of the Pages that match the query conditions, together with their printouts (metadata property::value pairs).

For each Page:
* its locally stored image is found
* its text is retrieved from MW
* a fragment of HTML (`document_part_html`) is generated from `templates/document_part.html`

All Pages that *share the same metadata Title value* will:
* gather all their HTML fragments in `all_document_parts`
* render `templates/document.html` with the content of `all_document_parts`
* save the rendered template to `static_html/DocumentTitle.html`

Once the documents are saved:
* `templates/index.html` is rendered with the info on each saved document, collected in `documentslist`
* resulting in `static_html/index.html`

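The grouping step, in which Pages sharing a Title become one document, can be illustrated with a short sketch. The function name, the dictionary layout of each page and the sample data are all hypothetical; ordering the parts by their `Part` value is also an assumption based on the default sort.

```python
from collections import defaultdict


def group_parts_by_title(pages):
    """Group page records into documents keyed by their Title printout."""
    documents = defaultdict(list)
    for page in pages:
        documents[page["Title"]].append(page)
    # Assumption: parts of one document are ordered by their Part value.
    for parts in documents.values():
        parts.sort(key=lambda part: part["Part"])
    return dict(documents)


# Hypothetical sample data, not real wiki content.
sample_pages = [
    {"Title": "Report A", "Part": 2, "page": "File:A-02.jpg"},
    {"Title": "Report B", "Part": 1, "page": "File:B-01.jpg"},
    {"Title": "Report A", "Part": 1, "page": "File:A-01.jpg"},
]
for title, parts in group_parts_by_title(sample_pages).items():
    print(title, [part["page"] for part in parts])
```

This also shows why changing the printouts or sort order is risky: without a reliable `Title` value on every Page, parts from different originals would end up in the same group.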
# Bulk image upload upload_imgs_dir.py

Get help: `python3 upload_imgs_dir.py --help`

**Edit and run via** `.helper-upload_imgs_dir.sh`

# Convert PDFs to folder of JPGs with pdf2jpg.sh
By either:
* running it from this folder, using the absolute path to the PDF:
`./pdf2jpg.sh "/full/path/to/2020_bantayog/PDFname.pdf"`

* copying `pdf2jpg.sh` to `2020_bantayog/` and running it with the relative path to the PDF:
`./pdf2jpg.sh "PDFname.pdf"`

To convert PDFs to JPGs:
`convert -quality 100 -density 300 [name-of-pdf] %02d.jpg`

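For anyone who prefers to drive the conversion from Python, the `convert` call quoted above could be wrapped as in this sketch. It assumes ImageMagick's `convert` is on the `PATH`; the function name `build_convert_command` and the `out_dir` parameter are hypothetical and not part of `pdf2jpg.sh`.

```python
from pathlib import Path


def build_convert_command(pdf_path, out_dir="."):
    """Return the argument list for the ImageMagick call quoted above."""
    # ImageMagick expands %02d into a zero-padded, two-digit page index.
    output_pattern = str(Path(out_dir) / "%02d.jpg")
    return ["convert", "-quality", "100", "-density", "300",
            str(pdf_path), output_pattern]


print(build_convert_command("PDFname.pdf"))
```

The list can be passed to `subprocess.run(..., check=True)` to execute the conversion; building it as a list avoids shell quoting problems with spaces in file names.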
# Wiki to HTML pages script
![](https://pzwiki.wdka.nl/mw-mediadesign/images/8/82/Workflow-wiki2html.svg)