# OuNuPo Make
Software experiments for the OuNuPo bookscanner, part of Special Issue 5

https://issue.xpub.nl/05/

https://xpub.nl/

## License

## Authors
Natasha Berting, Angeliki Diakrousi, Joca van der Horst, Alexander Roidl, Alice Strete and Zalán Szakács.

## Clone Repository
`git clone https://git.xpub.nl/repos/OuNuPo-make.git`

## General dependencies
* Python3
* GNU make
* Python3 NLTK `pip3 install nltk`
* NLTK English Corpus:
    * run NLTK downloader `python3 -m nltk.downloader`
    * select menu "Corpora"
    * select "stopwords"
    * click "Download"
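
To confirm that the general dependencies listed above are in place before running any make command, something like the following could be used. It is only a sketch and not part of the repository: the function name `missing_general_dependencies` is hypothetical, and the `make` check only confirms that some `make` is on the PATH, not that it is GNU make.

```python
import importlib.util
import shutil


def missing_general_dependencies():
    """Return the names of general dependencies that could not be found."""
    missing = []

    # shutil.which only tells us that *a* make binary is on PATH;
    # it does not verify that it is GNU make.
    if shutil.which("make") is None:
        missing.append("make")

    if importlib.util.find_spec("nltk") is None:
        missing.append("nltk")
    else:
        import nltk

        try:
            # nltk.data.find raises LookupError if the corpus is not installed.
            nltk.data.find("corpora/stopwords")
        except LookupError:
            missing.append("nltk stopwords corpus")

    return missing


if __name__ == "__main__":
    problems = missing_general_dependencies()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All general dependencies found.")
```

If the stopwords corpus is reported missing, it can also be fetched without the graphical downloader by running `python3 -m nltk.downloader stopwords`.
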
# Make commands

## N+7 (example) Author
Description: Replaces every word with the seventh word after it in a dictionary.

run: `make N+7`

Specific Dependencies:
* a
* b
* c
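
The N+7 description above can be illustrated with a short sketch. This is not the code behind `make N+7`: the function name `n_plus_7` and the ten-word list are made up for the example, and wrapping around at the end of the list is an assumption about how to handle words near the end of the dictionary.

```python
import bisect
import re


def n_plus_7(text, dictionary, n=7):
    """Replace each word with the n-th entry after it in a sorted word list."""
    words = sorted({w.lower() for w in dictionary})
    if not words:
        return text

    def shift(match):
        key = match.group(0).lower()
        # Position where the word is, or would be inserted, in the sorted list.
        index = bisect.bisect_left(words, key)
        found = index < len(words) and words[index] == key
        # If the word is absent, words[index] already is the 1st word after it.
        offset = n if found else n - 1
        # Wrap around the end of the list so every word gets a replacement.
        return words[(index + offset) % len(words)]

    return re.sub(r"[A-Za-z]+", shift, text)


# Hypothetical ten-word dictionary, only for illustration.
tiny_dictionary = ["apple", "book", "cat", "dog", "egg",
                   "fish", "goat", "hat", "ink", "jar"]

print(n_plus_7("book dog.", tiny_dictionary))  # ink apple.
```

Replacements come out in lower case because the word list is lower-cased before lookup; punctuation and spacing are left untouched.
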

## Sitting inside a pocket(sphinx): Angeliki
Description: Speech recognition feedback loops using the first sentence of a scanned text as input.

run: `make ttssr-human-only`

Specific Dependencies:

* PocketSphinx package `sudo aptitude install pocketsphinx pocketsphinx-en-us`

Python Libraries:

* PocketSphinx: `sudo pip3 install PocketSphinx`
* Speech Recognition: `sudo pip3 install SpeechRecognition`
* TermColor: `sudo pip3 install termcolor`
* PyAudio: `pip3 install pyaudio`
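
The feedback loop in the description above might look roughly like the sketch below. It is a guess at the general shape, not the project's script: `first_sentence`, `feedback_loop` and the `transcribe` parameter are hypothetical names, and `transcribe` stands in for a person reading the sentence aloud and a speech recogniser turning the recording back into text.

```python
import re


def first_sentence(text):
    """Return the text up to and including the first '.', '!' or '?'."""
    match = re.search(r"[^.!?]*[.!?]", text)
    return (match.group(0) if match else text).strip()


def feedback_loop(text, transcribe, rounds=3):
    """Feed each transcription back in as the next sentence to read aloud.

    `transcribe` is a hypothetical callable (sentence -> recognised text).
    The returned list starts with the first sentence of `text` and contains
    one extra entry per round.
    """
    history = [first_sentence(text)]
    for _ in range(rounds):
        history.append(transcribe(history[-1]))
    return history


# Stand-in recogniser for demonstration: it simply upper-cases its input.
print(feedback_loop("Hello world. Second sentence.", str.upper, rounds=2))
# ['Hello world.', 'HELLO WORLD.', 'HELLO WORLD.']
```

With the SpeechRecognition package installed, `transcribe` could plausibly wrap `Recognizer.listen` and `Recognizer.recognize_sphinx`, so that each round's recognition errors carry over into the next sentence.
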

## Reading the Structure: Joca
Description: Uses OCR'ed text as input and labels each word for part of speech, stopwords and sentiment. It then generates a reading interface in which words with a specific label are hidden. The output can be saved as a poster or exported as JSON containing the full data set.

run: `make output/reading_structure/index.html`

Specific Dependencies:
* nltk (http://www.nltk.org/install.html)
* nltk.tokenize.punkt, ne_chunk, pos_tag, word_tokenize, sentiment.vader (https://www.nltk.org/data.html)
* weasyprint (http://weasyprint.readthedocs.io/en/latest/install.html)
* jinja2 (http://jinja.pocoo.org/docs/2.10/intro/#installation)
* font: PT Sans (os font https://www.fontsquirrel.com/fonts/pt-serif)
* font: Ubuntu Mono (os font https://www.fontsquirrel.com/fonts/ubuntu-mono)
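
The label-then-hide idea from the description above can be sketched in a few lines. This is a simplified illustration rather than the project's code: it covers only the stopword label, uses a tiny hard-coded set in place of NLTK's stopword corpus, and the names `STOPWORDS`, `label_words` and `hide` are hypothetical.

```python
import json
import re

# Tiny stand-in for NLTK's English stopword list (assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def label_words(text):
    """Split text into words and attach a stopword label to each one."""
    return [
        {"word": word, "stopword": word.lower() in STOPWORDS}
        for word in re.findall(r"[A-Za-z']+", text)
    ]


def hide(labelled, label):
    """Join the words, blanking out those for which `label` is true."""
    return " ".join(
        "_" * len(item["word"]) if item[label] else item["word"]
        for item in labelled
    )


labelled = label_words("The scanner reads the page")
print(hide(labelled, "stopword"))      # ___ scanner reads ___ page
print(json.dumps(labelled, indent=2))  # full data set as JSON
```

Keeping the labels in a list of dictionaries means the same data can feed both the rendered view and the JSON export.
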