XPUB

special-issue-11-wiki2html


Wiki to HTML pages script

Dependencies

python3
pip (Python package installer)
- Install:
  - curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  - python3 get-pip.py
mwclient Python library
- Install:
  - pip3 install mwclient
jinja2 Python library
- Install:
  - pip3 install jinja2
pandoc
- Install:
  - Debian/Ubuntu: sudo apt install pandoc
  - Mac: brew install pandoc

login.txt

login.txt is a local, per-user file, ignored by git, where you place your itch wiki username and password on separate lines.

mwclient uses it to access the wiki, since the wiki is closed for both reading and writing.

myusername
mypassword
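
As a rough illustration of how such a file could be consumed, here is a minimal sketch. It is not the actual code of the scripts in this repository: the function names `read_credentials` and `connect`, and the `host` and `path` parameters, are assumptions made for this example. Only `mwclient.Site` and `Site.login` are real mwclient calls.

```python
from pathlib import Path


def read_credentials(path="login.txt"):
    """Return (username, password) from a two-line credentials file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 2:
        raise ValueError(f"{path} must contain a username line and a password line")
    # First line: username, second line: password.
    return lines[0].strip(), lines[1].strip()


def connect(host, path="/", credentials_file="login.txt"):
    """Log in to a MediaWiki site with mwclient (host and path are hypothetical)."""
    import mwclient  # third-party, imported lazily so the parser above works alone

    site = mwclient.Site(host, path=path)
    username, password = read_credentials(credentials_file)
    site.login(username, password)
    return site
```

Keeping the parsing separate from the login makes it easy to check the file format without contacting the wiki.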

Run

cd special-issue-11-wiki2html/

Run scripts together with ./run.sh

Or run one script at a time:

python3 download_imgs.py

Downloads all images from the wiki to the images/ directory
and stores each image's metadata in images.json
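
The download step could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of download_imgs.py: the function name `download_all`, the layout of images.json, and the file naming are made up for the example. It relies only on the `page_title`, `imageinfo` and `download(fd)` members that mwclient image objects provide.

```python
import json
from pathlib import Path


def download_all(images, img_dir="images", index_file="images.json"):
    """Save every image to img_dir and write its metadata to index_file.

    `images` is any iterable of objects exposing `page_title`, `imageinfo`
    and `download(fd)`, for example the result of mwclient's site.allimages().
    """
    img_dir = Path(img_dir)
    img_dir.mkdir(parents=True, exist_ok=True)
    index = {}
    for image in images:
        # Hypothetical naming scheme: spaces in the title become underscores.
        target = img_dir / image.page_title.replace(" ", "_")
        with open(target, "wb") as fd:
            image.download(fd)
        # Store the local filename next to the metadata reported by the wiki.
        index[image.page_title] = {"filename": target.name, **dict(image.imageinfo)}
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)
    return index
```

With a logged-in mwclient site object, a call such as `download_all(site.allimages())` would fill images/ and images.json in one pass.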

python3 query2html.py

Performs a query with the ask API:
- help: python3 query2html.py --help
- dry run: python3 query2html.py --dry (only prints the request, without executing it)
- build a custom query with the arguments --conditions, --printouts, --sort and --order
- the default query is: [[File:+]][[Title::+]][[Part::+]][[Date::+]]|?Title|?Date|?Part|?Partof|sort=Date,Title,Part|order=asc,asc,asc
- custom queries:
  - python3 query2html.py --conditions '[[Date::>=1970/01/01]][[Date::<=1979/12/31]]'
  - python3 query2html.py --conditions '[[Creator::~*task force*]]'

Note: to avoid confusion or problems, it is better to leave the --printouts, --sort and --order arguments at their defaults. Otherwise document parts will no longer be grouped according to their Title, producing documents assembled from parts of different originals.
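
To make the relation between the arguments and the default query concrete, here is a minimal sketch of how the four arguments could be joined into one ask query string. It is an assumption, not the script's real code: the names `parse_args` and `build_query` are hypothetical, and passing --printouts, --sort and --order as space-separated values is a guess about the command-line format.

```python
import argparse

# Defaults taken from the default query documented above.
DEFAULT_CONDITIONS = "[[File:+]][[Title::+]][[Part::+]][[Date::+]]"
DEFAULT_PRINTOUTS = ["Title", "Date", "Part", "Partof"]
DEFAULT_SORT = ["Date", "Title", "Part"]
DEFAULT_ORDER = ["asc", "asc", "asc"]


def parse_args(argv=None):
    """Parse the command-line flags named in this README."""
    parser = argparse.ArgumentParser(description="Build an ask query (sketch)")
    parser.add_argument("--conditions", default=DEFAULT_CONDITIONS)
    parser.add_argument("--printouts", nargs="+", default=DEFAULT_PRINTOUTS)
    parser.add_argument("--sort", nargs="+", default=DEFAULT_SORT)
    parser.add_argument("--order", nargs="+", default=DEFAULT_ORDER)
    parser.add_argument("--dry", action="store_true",
                        help="only print the request, do not execute it")
    return parser.parse_args(argv)


def build_query(conditions, printouts, sort, order):
    """Join conditions, printouts, sort keys and sort orders with '|'."""
    parts = [conditions]
    parts += ["?" + name for name in printouts]   # each printout is prefixed with '?'
    parts.append("sort=" + ",".join(sort))
    parts.append("order=" + ",".join(order))
    return "|".join(parts)
```

Called with no arguments, `build_query` applied to the parsed defaults reproduces the default query string. Changing only --conditions, as in the custom queries above, leaves the printouts and sorting intact.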

How does query2html.py work?

Based on the query made, the MW API sends back the titles of the Pages that match the query conditions, together with their printouts (metadata property::value pairs).

For each Page:

its locally stored image is found
its text is retrieved from MW
a fragment of HTML (document_part_html) is generated from templates/document_part.html

All Pages that share the same Title metadata value will:

gather all their HTML fragments in all_document_parts
render templates/document.html with the content of all_document_parts
save the rendered template to static_html/DocumentTitle.html
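
The grouping step can be sketched as follows. This is a simplified illustration under stated assumptions: each Page is represented as a dict with "Title", "Part" and "html" keys, which is a hypothetical shape chosen for the example, and the function name `group_parts_by_title` does not come from the script.

```python
from collections import defaultdict


def group_parts_by_title(pages):
    """Map each Title to the ordered list of its parts' HTML fragments."""
    documents = defaultdict(list)
    for page in pages:
        # Pages with the same Title belong to the same document.
        documents[page["Title"]].append(page)
    return {
        title: [part["html"] for part in sorted(parts, key=lambda p: p["Part"])]
        for title, parts in documents.items()
    }


pages = [
    {"Title": "Pamphlet A", "Part": 2, "html": "<p>A, part 2</p>"},
    {"Title": "Pamphlet B", "Part": 1, "html": "<p>B, part 1</p>"},
    {"Title": "Pamphlet A", "Part": 1, "html": "<p>A, part 1</p>"},
]
all_documents = group_parts_by_title(pages)
```

Each value in `all_documents` plays the role of all_document_parts for one document, ready to be passed to the document template.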

Once all documents have been saved:

the info on each saved document is collected in documentslist
templates/index.html is rendered with documentslist, resulting in static_html/index.html
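
A possible shape for this last step is sketched below. The structure of each documentslist entry (the "title" and "file" keys) and the function names are assumptions for illustration. The template is assumed to receive the list under the name documentslist. The jinja2 calls used (`Environment`, `FileSystemLoader`, `get_template`, `render`) are the library's standard API.

```python
from pathlib import Path


def build_documentslist(saved_files):
    """Describe each saved document by a display title and a relative link."""
    return [
        {"title": Path(name).stem, "file": Path(name).name}
        for name in sorted(saved_files)
    ]


def render_index(documentslist, template_dir="templates", out_dir="static_html"):
    """Render templates/index.html with documentslist and save the result."""
    from jinja2 import Environment, FileSystemLoader  # third-party, imported lazily

    env = Environment(loader=FileSystemLoader(template_dir))
    template = env.get_template("index.html")
    html = template.render(documentslist=documentslist)
    out_path = Path(out_dir) / "index.html"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(html, encoding="utf-8")
    return out_path
```

Inside index.html, a loop such as `{% for doc in documentslist %}` could then emit one link per document.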
