# OuNuPo Make
Software experiments for the OuNuPo bookscanner, part of Special Issue 5

https://issue.xpub.nl/05/

https://xpub.nl/


## License

## Authors
Natasha Berting, Angeliki Diakrousi, Joca van der Horst, Alexander Roidl, Alice Strete and Zalán Szakács.


## Clone Repository
`git clone https://git.xpub.nl/repos/OuNuPo-make.git`


## General dependencies
* Python3
* GNU make
* Python3 NLTK: `pip3 install nltk`
* NLTK English corpus (stopwords):
    * run the NLTK downloader: `python3 -m nltk.downloader`
    * select the "Corpora" tab
    * select "stopwords"
    * click "Download"
    * or non-interactively: `python3 -m nltk.downloader stopwords`



# Make commands

## N+7 (example) Author
Description: Replaces every word with the 7th next word in a dictionary.

run: `make N+7`

Specific Dependencies:
* a
* b
* c
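The N+7 procedure described above can be sketched in plain Python. This is an illustrative sketch only, not the repository's implementation: the tiny inline dictionary and the function name `n_plus_7` are assumptions, and the real make target presumably works from a full dictionary file.

```python
# Hypothetical sketch of N+7: replace every word that appears in a
# dictionary with the word 7 entries later in the sorted word list.

OFFSET = 7

def n_plus_7(text, dictionary):
    """Replace each dictionary word with the entry OFFSET places later."""
    words = sorted(set(dictionary))
    index = {w: i for i, w in enumerate(words)}
    out = []
    for word in text.split():
        i = index.get(word.lower())
        if i is None:
            out.append(word)  # words not in the dictionary pass through
        else:
            # wrap around at the end of the dictionary
            out.append(words[(i + OFFSET) % len(words)])
    return " ".join(out)
```

With a ten-word dictionary, `n_plus_7("cat dog", ...)` maps "cat" to the word seven entries later and wraps "dog" back to the start of the list.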


## Sitting inside a pocket(sphinx): Angeliki
Description: Speech recognition feedback loops using the first sentence of a scanned text as input

run: `make ttssr-human-only`

Specific Dependencies:

* PocketSphinx package: `sudo aptitude install pocketsphinx pocketsphinx-en-us`

Python libraries:
* PocketSphinx: `sudo pip3 install PocketSphinx`; build dependencies: `sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev`
* SpeechRecognition: `sudo pip3 install SpeechRecognition`
* termcolor: `sudo pip3 install termcolor`
* PyAudio: `pip3 install pyaudio`
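The feedback-loop idea behind `ttssr-human-only` can be sketched without audio hardware. This is a conceptual sketch: the `speak`/`listen` pair below are stand-ins (the real target uses PocketSphinx and SpeechRecognition), and the deliberate "mishearing" is an assumption made to keep the degradation visible.

```python
# Conceptual sketch of a speech-recognition feedback loop: a sentence is
# spoken aloud, re-recognized, and the (possibly degraded) result becomes
# the next round's input.

def speak(sentence):
    # stand-in for a text-to-speech call
    return sentence

def listen(audio):
    # stand-in for PocketSphinx recognition; here it "mishears" by
    # dropping the last word, so each round visibly degrades the text
    words = audio.split()
    return " ".join(words[:-1]) if len(words) > 1 else audio

def feedback_loop(sentence, rounds=3):
    """Run the speak/listen cycle, keeping every intermediate sentence."""
    history = [sentence]
    for _ in range(rounds):
        sentence = listen(speak(sentence))
        history.append(sentence)
    return history
```

Each entry in the returned history is one pass through the loop, mirroring how recognition errors accumulate when real audio is used.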


## Reading the Structure: Joca
Description: Uses OCR'ed text as input and labels each word for part of speech, stopwords and sentiment. It then generates a reading interface
where words with a specific label are hidden. The output can be saved as a poster or exported as JSON featuring the full data set.

run: `make reading_structure`

Specific Dependencies:
* nltk (http://www.nltk.org/install.html)
* nltk.tokenize.punkt, ne_chunk, pos_tag, word_tokenize, sentiment.vader
* NLTK VADER lexicon: `nltk.download('vader_lexicon')` (https://www.nltk.org/data.html)
* weasyprint (http://weasyprint.readthedocs.io/en/latest/install.html)
* jinja2 (http://jinja.pocoo.org/docs/2.10/intro/#installation)
* font: PT Sans (open-source font, https://www.fontsquirrel.com/fonts/pt-serif)
* font: Ubuntu Mono (open-source font, https://www.fontsquirrel.com/fonts/ubuntu-mono)
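The hiding step described above can be sketched independently of NLTK. This is an illustrative sketch under assumptions: the `hide` function and the hand-written tagged input are hypothetical, while the real pipeline tags OCR'ed text with nltk's `pos_tag` and related tools.

```python
# Sketch of the word-hiding step: given (word, label) pairs such as those
# produced by a part-of-speech tagger, replace every word whose label is
# in a chosen set with a redaction mark of the same length.

def hide(tagged_words, labels_to_hide, mark="█"):
    """Redact words whose label is in labels_to_hide, keeping the rest."""
    return " ".join(
        mark * len(word) if label in labels_to_hide else word
        for word, label in tagged_words
    )
```

For example, hiding all nouns (`NN`) in a tagged sentence leaves the other words readable while the hidden words keep their visual length, as in the poster output.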