# OuNuPo Make
Software experiments for the OuNuPo bookscanner, part of Special Issue 5

https://issue.xpub.nl/05/

https://xpub.nl/


## Licenses
© 2018 WTFPL (Do What the Fuck You Want to Public License)
© 2018 BSD 3-Clause (Berkeley Software Distribution)

## Authors
Natasha Berting, Angeliki Diakrousi, Joca van der Horst, Alexander Roidl, Alice Strete and Zalán Szakács.


## Clone Repository
`git clone https://git.xpub.nl/repos/OuNuPo-make.git`


## General dependencies
* Python3
* GNU make
* Python3 NLTK  `pip3 install nltk`


# Make commands

## N+7 (example): Author
Description: Replaces every noun with the noun seven entries after it in a dictionary. Inspired by the Oulipo technique of the same name.

run: `make N+7`

Specific Dependencies:
* a
* b
* c
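
Below is a minimal sketch of the constraint in Python, assuming NLTK's tagger and its `words` corpus stand in for the dictionary; the `n_plus_7` helper and the sorted-wordlist lookup are illustrative, not the repository's implementation.

```python
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import words

# Fetch the tokenizer, tagger and word list this sketch relies on.
for pkg in ('punkt', 'averaged_perceptron_tagger', 'words'):
    nltk.download(pkg, quiet=True)

def n_plus_7(text, offset=7):
    # The "dictionary": an alphabetically sorted word list with an index.
    dictionary = sorted({w.lower() for w in words.words()})
    index = {w: i for i, w in enumerate(dictionary)}
    out = []
    for word, tag in pos_tag(word_tokenize(text)):
        if tag.startswith('NN') and word.lower() in index:
            # Jump seven entries ahead, wrapping around at the end.
            out.append(dictionary[(index[word.lower()] + offset) % len(dictionary)])
        else:
            out.append(word)
    return ' '.join(out)

print(n_plus_7("The cat sat on the mat."))
```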


## Sitting inside a pocket(sphinx): Angeliki
Description: Speech-recognition feedback loops that use the first sentence of a scanned text as input.

run: `make ttssr-human-only`

Specific Dependencies:

* PocketSphinx package `sudo aptitude install pocketsphinx pocketsphinx-en-us`
* PocketSphinx: `sudo pip3 install PocketSphinx`
* Python libraries: `sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev`
* Speech Recognition: `sudo pip3 install SpeechRecognition`
* TermColor: `sudo pip3 install termcolor`
* PyAudio: `pip3 install pyaudio`
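
A minimal sketch of one loop iteration with the SpeechRecognition library's offline PocketSphinx backend; the `listen_once` helper and the three-pass loop are illustrative assumptions, and the actual ttssr pipeline wiring lives in the Makefile.

```python
import speech_recognition as sr

def listen_once(seconds=5):
    # Record a short phrase from the microphone and transcribe it offline.
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, phrase_time_limit=seconds)
    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return ""  # Sphinx could not make sense of the audio

# Feedback loop: read each printed sentence aloud and the recognizer's
# (mis)hearing becomes the next input, degrading the text step by step.
sentence = "the first sentence of the scanned text"
for step in range(3):
    print(step, sentence)
    sentence = listen_once() or sentence
```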

## Reading the Structure: Joca
Description: Uses OCR'd text as input and labels each word for part of speech, stopword status, and sentiment. It then generates a reading interface
where words with a specific label are hidden. The output can be saved as a poster or exported as JSON containing the full data set.

run: `make reading_structure`

Specific Dependencies:
* nltk (http://www.nltk.org/install.html)
* NLTK modules: nltk.tokenize.punkt, ne_chunk, pos_tag, word_tokenize, sentiment.vader
* NLTK data: `nltk.download('vader_lexicon')` (https://www.nltk.org/data.html)
* weasyprint (http://weasyprint.readthedocs.io/en/latest/install.html)
* jinja2 (http://jinja.pocoo.org/docs/2.10/intro/#installation)
* font: PT Serif (open-source font, https://www.fontsquirrel.com/fonts/pt-serif)
* font: Ubuntu Mono (open-source font, https://www.fontsquirrel.com/fonts/ubuntu-mono)
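
A condensed sketch of the labeling step using the NLTK calls listed above; the `label_words` and `hide` helpers and the underscore rendering are illustrative, while the actual script builds its interface with jinja2 and renders posters with weasyprint.

```python
import nltk
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer

for pkg in ('punkt', 'averaged_perceptron_tagger', 'stopwords', 'vader_lexicon'):
    nltk.download(pkg, quiet=True)

def label_words(text):
    # Label every token with its POS tag, stopword status and sentiment score.
    sia = SentimentIntensityAnalyzer()
    stops = set(stopwords.words('english'))
    return [{'word': w,
             'pos': tag,
             'stopword': w.lower() in stops,
             'sentiment': sia.polarity_scores(w)['compound']}
            for w, tag in pos_tag(word_tokenize(text))]

def hide(labeled, pos_prefix='NN'):
    # Blank out words whose POS tag matches the prefix (nouns by default).
    return ' '.join('_' * len(t['word']) if t['pos'].startswith(pos_prefix)
                    else t['word'] for t in labeled)

print(hide(label_words("The scanner reads a wonderful page.")))
```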

## Erase / Replace: Natasha
Description: Receives your scanned pages in order, then analyzes each image and its vocabulary. It finds and crops the least common words and either erases them or replaces them with the most common words. Outputs a PDF of increasingly distorted scan images.

run the erase script: `make erase`
run the replace script: `make replace`

Specific Dependencies:
* NLTK English stopwords corpus:
    * run the NLTK downloader: `python -m nltk.downloader`
    * select the "Corpora" menu
    * select "stopwords"
    * click "Download"
* Python Image Library (PIL):  `pip3 install Pillow` 
* PDF generation for Python (FPDF): `pip3 install fpdf`
* HTML5lib: `pip3 install html5lib`
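
A minimal sketch of the erase step, assuming word bounding boxes have already been parsed from tesseract's HOCR output; the `erase_rare_words` helper and its white-rectangle fill are illustrative stand-ins for the actual script.

```python
from collections import Counter
from PIL import Image, ImageDraw

def erase_rare_words(image_path, word_boxes, keep=10):
    # word_boxes: [(word, (x0, y0, x1, y1)), ...], e.g. parsed from HOCR.
    image = Image.open(image_path).convert('RGB')  # normalize the image mode
    draw = ImageDraw.Draw(image)
    counts = Counter(word.lower() for word, _ in word_boxes)
    common = {word for word, _ in counts.most_common(keep)}
    for word, box in word_boxes:
        if word.lower() not in common:
            # Paint over every word that is not among the most common ones.
            draw.rectangle(box, fill='white')
    return image

# erase_rare_words("scan.png", boxes).save("erased.png")
```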

Notes & Bugs:
This script is picky about its input images. For best results, use high-resolution images in the RGB colorspace. Errors can occur when image modes do not match or when tesseract fails to produce HOCR files.