# OuNuPo Make
Software experiments for the OuNuPo bookscanner, part of Special Issue 5

https://issue.xpub.nl/05/

https://xpub.nl/


## License

## Authors
Natasha Berting, Angeliki Diakrousi, Joca van der Horst, Alexander Roidl, Alice Strete and Zalán Szakács.


## Clone Repository
`git clone https://git.xpub.nl/repos/OuNuPo-make.git`


## General dependencies
* Python3
* GNU make
* Python3 NLTK  `pip3 install nltk`


# Make commands

## N+7 (example) Author
Description: Replaces every noun with the 7th next noun in a dictionary. Inspired by an Oulipo work of the same name.

run: `make N+7`

Specific Dependencies:
* a
* b
* c


## Sitting inside a pocket(sphinx): Angeliki
Description: Speech recognition feedback loops using the first sentence of a scanned text as input

run: `make ttssr-human-only`

Specific Dependencies:

* PocketSphinx package: `sudo aptitude install pocketsphinx pocketsphinx-en-us`

Python Libraries:
* PocketSphinx: `sudo pip3 install PocketSphinx`, install dependencies: `sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev`
* Speech Recognition: `sudo pip3 install SpeechRecognition`
* TermColor: `sudo pip3 install termcolor`
* PyAudio: `pip3 install pyaudio`

## Reading the Structure: Joca
Description: Uses OCR'ed text as input and labels each word for part of speech, stopwords and sentiment. It then generates a reading interface in which words with a specific label are hidden. Output can be saved as a poster, or exported as JSON containing the full data set.

run: `make reading_structure`

Specific Dependencies:
* nltk (http://www.nltk.org/install.html)
* nltk modules: nltk.tokenize.punkt, ne_chunk, pos_tag, word_tokenize, sentiment.vader
* nltk data: `nltk.download('vader_lexicon')` (https://www.nltk.org/data.html)
* weasyprint (http://weasyprint.readthedocs.io/en/latest/install.html)
* jinja2 (http://jinja.pocoo.org/docs/2.10/intro/#installation)
* font: PT Sans (os font https://www.fontsquirrel.com/fonts/pt-serif)
* font: Ubuntu Mono (os font https://www.fontsquirrel.com/fonts/ubuntu-mono)
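
The label-then-hide idea described above can be sketched roughly as follows. This is an illustration only, not the project's code: it labels stopwords alone, uses a tiny hypothetical `STOPWORDS` set instead of NLTK data, and the function names and CSS class names are made up for the example.

```python
import html
import json

# Hypothetical stand-in for a real stopword corpus.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def label_words(text):
    """Return one record per whitespace-separated word with a stopword flag."""
    records = []
    for token in text.split():
        bare = token.strip(".,;:!?\"'()").lower()
        records.append({"word": token, "stopword": bare in STOPWORDS})
    return records


def render(records, hide_label):
    """Render words as HTML spans; words carrying `hide_label` get class 'hidden'."""
    spans = []
    for record in records:
        css = "hidden" if record[hide_label] else "shown"
        spans.append('<span class="{}">{}</span>'.format(css, html.escape(record["word"])))
    return " ".join(spans)


records = label_words("The structure of a text is hidden.")
print(render(records, "stopword"))     # HTML for the reading interface
print(json.dumps(records, indent=2))   # full data set as JSON
```

A stylesheet could then hide every `.hidden` span, and further labels (part of speech, sentiment) could be added as extra keys on each record.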

## Erase / Replace: Natasha
Description: Receives your scanned pages in order, then analyzes each image and its vocabulary. Finds and crops the least common words, and either erases them, or replaces them with the most common words. Outputs a PDF of increasingly distorted scan images. 

* for erase script run: `make erase`
* for replace script run: `make replace`

Specific Dependencies:
* NLTK English Corpus:
    * run NLTK downloader `python -m nltk.downloader`
    * select menu "Corpora"
    * select "stopwords"
    * "Download"
* Python Image Library (PIL):  `pip3 install Pillow` 
* PDF generation for Python (FPDF): `pip3 install fpdf`
* HTML5lib: `pip3 install html5lib`

Notes & Bugs:
This script is very picky about the input images it can work with. For best results, use high-resolution images in the RGB colorspace. Errors can occur when image modes do not match or when Tesseract cannot generate the hOCR files.
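
The word-selection step of Erase / Replace can be illustrated on plain text. This is a simplified sketch under stated assumptions, not the actual scripts, which work on scanned images: the function `distort` and its parameters are hypothetical, and no stopword filtering is applied.

```python
import re
from collections import Counter


def distort(text, mode="erase", n=2):
    """Blank out (erase) or overwrite (replace) the `n` least common words."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    # Most frequent first; words with equal counts keep first-seen order.
    ranked = [word for word, _ in counts.most_common()]
    rare = set(ranked[-n:])
    top = ranked[0]

    def swap(match):
        word = match.group(0)
        if word.lower() not in rare:
            return word
        # Erase keeps the layout by padding with spaces; replace inserts the top word.
        return " " * len(word) if mode == "erase" else top

    return re.sub(r"[A-Za-z']+", swap, text)


sample = "the sea the sky the sea a gull"
print(distort(sample, mode="replace"))
print(repr(distort(sample, mode="erase")))
```

Applying the function repeatedly, each time to the previous result, gives a text-only analogue of the increasingly distorted pages.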