

Requirements:

  • python3
  • pip Python package manager

    • Install:
      • curl -o
      • python3
  • mwclient Python library

    • Install:
      • pip3 install mwclient
  • jinja2 Python library

    • Install:
      • pip3 install jinja2
  • Pillow Python library for image processing

    • Install:
      • pip3 install Pillow
  • pandoc

    • Install:
      • Debian/Ubuntu: sudo apt install pandoc
      • Mac: brew install pandoc
  • html5lib

    • Install:
      • pip3 install html5lib
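As a quick sanity check, the required Python libraries can be probed from Python itself (a convenience sketch, not part of the repository; module names are the import names, so Pillow imports as PIL):

```python
import importlib.util

def check_deps(modules=("mwclient", "jinja2", "PIL", "html5lib")):
    """Return, for each required library, whether it is importable."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

for name, ok in check_deps().items():
    print(name, "OK" if ok else "missing -- install it with pip3")
```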


login.txt is a local and individual file, ignored by git, where you place your itch wiki username and password, on separate lines.

It is used to let mwclient access the wiki, since the wiki is closed for reading and writing.
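The credentials file can be read and handed to mwclient along these lines (a sketch: the wiki host is a placeholder, and the actual script may differ):

```python
def read_credentials(path="login.txt"):
    """login.txt holds the username on the first line, the password on the second."""
    with open(path) as fh:
        lines = [line.strip() for line in fh if line.strip()]
    return lines[0], lines[1]

# Hypothetical host -- replace with the actual wiki's domain and path:
# import mwclient
# user, password = read_credentials()
# site = mwclient.Site("wiki.example.org", path="/")
# site.login(user, password)
```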


Create archive from wiki:

On the sandbox server:


Locally, on your own machine:

Create the archive folder: mkdir archive

Run the script, outputting to the archive folder and displaying the images from the wiki:

python3 --imgsrc remote

Run the script, outputting to the archive folder and displaying the images from the local ../archive/images directory:

  • requires running python3

Categories and Templates:

For each wiki Category in Category Publish, there should be an HTML Jinja2 template with the same name as the category, in this repository's templates/ directory.

If not, templates/default.html will be used to render the pages under that Category.

CSS/JS files are stored in static/. See templates/default.html for how it links to static/archive.css.
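The per-category template lookup with a default fallback can be sketched with Jinja2's select_template. Here DictLoader stands in for the templates/ directory, and the category and template contents are made up for illustration; the real setup would use FileSystemLoader("templates"):

```python
from jinja2 import Environment, DictLoader

# Stand-ins for files under templates/:
templates = {
    "Photographs.html": "<h1>Photo: {{ title }}</h1>",
    "default.html": "<h1>{{ title }}</h1>",
}
env = Environment(loader=DictLoader(templates))

def render_page(category, title):
    # select_template returns the first template in the list that exists
    template = env.select_template([category + ".html", "default.html"])
    return template.render(title=title)

print(render_page("Photographs", "Street scene"))  # uses the category template
print(render_page("Letters", "Dear friend"))       # falls back to default.html
```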

Run on server

  • script (repository) location: /var/www/html/archive/0
  • go there: cd /var/www/html/archive/0
  • run the script: python3

git pull most recent changes to archive:

Allow your sandbox pi user to git pull by:

  • on the sandbox pi, creating an ssh key pair: ssh-keygen
  • copying the content of the public ssh key: cat ~/.ssh/
  • and adding it to your public ssh keys in the Gitea user profile
  • Your Gitea user is now associated with the public ssh key you just generated on the sandbox pi

  • Now you are able to git pull from /var/www/html/archive/0 whenever needed.


cd special-issue-11-wiki2html/

Run scripts together with ./

One script at a time:


  • Downloads all images from wiki to ../archive/images/ directory
  • and stores each image’s metadata to images.json
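The download step can be sketched with mwclient's allimages listing. The network calls are commented out (they need a logged-in mwclient.Site), and the images.json record schema below is an assumption for illustration, not taken from the actual script:

```python
import json
import os

ARCHIVE_DIR = os.path.join("..", "archive", "images")

def image_record(name, info):
    """One entry for images.json; 'info' is mwclient's imageinfo dict."""
    return {
        "name": name,
        "url": info.get("url"),
        "timestamp": info.get("timestamp"),
        "local_path": os.path.join(ARCHIVE_DIR, name),
    }

# Network part, assuming 'site' is a logged-in mwclient.Site:
# records = []
# for image in site.allimages():
#     record = image_record(image.page_title, image.imageinfo)
#     with open(record["local_path"], "wb") as fh:
#         image.download(fh)          # writes the image bytes to the file
#     records.append(record)
# with open("images.json", "w") as fh:
#     json.dump(records, fh, indent=2)
```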


  • performs a query with the ask API:
    • help: python3 --help
    • dry run: python3 --dry, only printing the request, not executing it
    • build a custom query with the arguments --conditions --printouts --sort --order
    • default query is: [[File:+]][[Title::+]][[Part::+]][[Date::+]]|?Title|?Date|?Part|?Partof|sort=Date,Title,Part|order=asc,asc,asc
    • custom queries
      • python3 --conditions '[[Date::>=1970/01/01]][[Date::<=1979/12/31]]'
      • python3 --conditions '[[Creator::~*task force*]]'

Note: to avoid confusion or problems, it is better to leave the --printouts --sort --order arguments at their defaults. Otherwise document parts will start to be grouped not according to their Title, creating documents made from different original parts.
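The default query string above can be assembled from the four CLI-style arguments like this (a sketch; the function name is illustrative, not the script's own):

```python
DEFAULT_CONDITIONS = "[[File:+]][[Title::+]][[Part::+]][[Date::+]]"

def build_query(conditions=DEFAULT_CONDITIONS,
                printouts=("Title", "Date", "Part", "Partof"),
                sort="Date,Title,Part",
                order="asc,asc,asc"):
    """Assemble a Semantic MediaWiki ask query from the CLI-style arguments."""
    parts = [conditions]
    parts += ["?" + p for p in printouts]      # printout requests
    parts.append("sort=" + sort)
    parts.append("order=" + order)
    return "|".join(parts)

print(build_query())
# Custom date-range query, as in the example above; the result can be
# passed to a logged-in mwclient.Site via site.ask(...):
print(build_query(conditions="[[Date::>=1970/01/01]][[Date::<=1979/12/31]]"))
```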

How does it work?

Based on the query made, the MW API will send back a number of Page titles that match the query conditions, together with their printouts (metadata property::value pairs).

For each Page:

  • its locally stored image is found
  • its text is retrieved from MW
  • a fragment of html (document_part_html) is generated based on the templates/document_part.html

All Pages that share the same metadata Title value will:

  • gather all their html fragments in all_document_parts
  • render templates/document.html with the content of all_document_parts
  • save the rendered template to static_html/DocumentTitle.html
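The grouping-by-Title step above can be sketched as follows (the page data is made up, and simple string joining stands in for rendering templates/document.html):

```python
from collections import defaultdict

# Hypothetical (Title printout, document_part_html fragment) pairs:
pages = [
    ("Report", "<section>part 1</section>"),
    ("Report", "<section>part 2</section>"),
    ("Letter", "<section>only part</section>"),
]

grouped = defaultdict(list)
for title, fragment in pages:
    grouped[title].append(fragment)   # all_document_parts per Title

for title, all_document_parts in grouped.items():
    # Stand-in for rendering templates/document.html with the fragments:
    document_html = "\n".join(all_document_parts)
    # with open("static_html/" + title + ".html", "w") as fh:
    #     fh.write(document_html)     # one HTML file per document Title
```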

Each of the saved documents:

  • has its info saved into documentslist, which is used to render templates/index.html
  • resulting in static_html/index.html

Bulk image upload

Get Help: python3 --help

Edit and run via

Convert PDFs to a folder of JPGs with

By either:

  • running it from this folder and using absolute path to PDF ./ "/full/path/to/2020_bantayog/PDFname.pdf"

  • copying to 2020_bantayog/ and running with relative path to PDF ./ "PDFname.pdf"

It is also possible to convert PDFs to JPGs directly with ImageMagick: convert -quality 100 -density 300 [name-of-pdf] %02d.jpg

Wiki to HTML pages script