added notes in Readme for erase/replace scripts

master
nberting 7 years ago
parent a3c037aaaa
commit bccedb1d81

@ -20,18 +20,12 @@ Natasha Berting, Angeliki Diakrousi, Joca van der Horst, Alexander Roidl, Alice
* Python3
* GNU make
* Python3 NLTK `pip3 install nltk`
* NLTK English Corpus:
* run NLTK downloader `python -m nltk.downloader`
* select menu "Corpora"
* select "stopwords"
* "Dowload"
# Make commands
## N+7 (example) Author
Description: Replaces every word with the 7th next word in a dictionary. Description: Replaces every noun with the 7th next noun in a dictionary. Inspired by an Oulipo work of the same name.
run: `make N+7`
@ -55,7 +49,6 @@ Python Libaries:
* TermColor: `sudo pip3 install termcolor`
* PyAudio: `pip3 install pyaudio`
## Reading the Structure: Joca
Description: Uses OCR'ed text as an input, labels each word for Part-of-Speech, stopwords and sentiment. Then it generates a reading interface
where words with a specific label are hidden. Output can be saved as poster, or exported as json featuring the full data set.
@ -72,3 +65,24 @@ Specific Dependencies:
* jinja2 (http://jinja.pocoo.org/docs/2.10/intro/#installation)
* font: PT Sans (os font https://www.fontsquirrel.com/fonts/pt-serif)
* font: Ubuntu Mono (os font https://www.fontsquirrel.com/fonts/ubuntu-mono)
## Erase / Replace: Natasha
Description: Receives your scanned pages in order, then analyzes each image and its vocabulary. Finds and crops the least common words, and either erases them, or replaces them with the most common words. Outputs a PDF of increasingly distorted scan images.
for erase script run: `make erase`
for replace script run: `make replace`
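
The word-selection idea described above can be sketched on plain text. This is only an illustration, not the project's script: the real scripts work on cropped regions of scanned images, and the names `STOPWORDS`, `word_counts` and `replace_rare_words` are hypothetical. The small stop list is a stand-in for the NLTK "stopwords" corpus.

```python
import re
from collections import Counter

# Hypothetical stand-in for the NLTK English stopwords corpus.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "it"}


def word_counts(text, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def replace_rare_words(text, stopwords=STOPWORDS):
    """Replace the least common words with the single most common word."""
    counts = word_counts(text, stopwords)
    if not counts:
        return text
    most_common_word, highest = counts.most_common(1)[0]
    lowest = min(counts.values())
    if lowest == highest:
        # Every word is equally frequent, so nothing counts as "rare".
        return text

    def swap(match):
        word = match.group(0)
        # Stopwords are absent from counts, so they are never replaced.
        if counts.get(word.lower()) == lowest:
            return most_common_word
        return word

    return re.sub(r"[A-Za-z']+", swap, text)
```

For example, `replace_rare_words("the cat sat on the mat and the cat saw a dog and the cat ran")` swaps every word that occurs once for `cat`, the most frequent non-stopword. An erase variant would return an empty string from `swap` instead.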
Specific Dependencies:
* NLTK English Corpus:
  * run NLTK downloader `python -m nltk.downloader`
  * select menu "Corpora"
  * select "stopwords"
  * "Download"
* Python Image Library (PIL): `pip3 install Pillow`
* PDF generation for Python (FPDF): `pip3 install fpdf`
* HTML5lib: `pip3 install html5lib`
Notes & Bugs:
This script is very picky about the input images it can work with. For best results, use high-resolution images in the RGB color space. Errors can occur when the image modes of the input pages do not match, or when Tesseract fails to produce hOCR files.
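
One way to catch a failed OCR pass early is to check whether an hOCR file contains any word boxes before cropping. The sketch below is an assumption about how such a check could look, not the project's code: it uses Python's standard `html.parser` rather than html5lib, and `HocrWordParser` and `hocr_words` are hypothetical names. It relies on the hOCR convention that each recognized word is a `span` with class `ocrx_word` whose `title` attribute carries `bbox x0 y0 x1 y1`.

```python
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs from hOCR `ocrx_word` spans."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)",
                              attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(n) for n in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None


def hocr_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) found in hOCR markup."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words
```

If `hocr_words` returns an empty list for a page, the OCR step most likely failed for that image and the page can be skipped or rescanned before any cropping is attempted.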
