From 7b7def3b7888c74e7738bd7b8b9f9b79ac6e54a6 Mon Sep 17 00:00:00 2001
From: jvdhorst
Date: Fri, 23 Mar 2018 16:21:23 +0100
Subject: [PATCH] Added first version of README for Reading the Structure

---
 README | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/README b/README
index bea5d99..60b15d1 100644
--- a/README
+++ b/README
@@ -48,11 +48,18 @@ run: `make ttssr-human-only`
 Specific Dependencies:
 * [pocketsphinx](https://github.com/bambocher/pocketsphinx-python) `sudo pip3 install pocketsphinx` ---> FOLLOW THIS EXAMPLE
-* SpeechRecognition 3.8.1
-* PyAudio
-
-
-
+* SpeechRecognition 3.8.1
+* PyAudio
+## Reading the Structure: Joca
+Description: Uses OCR'ed text as an input, labels each word for Part-of-Speech, stopwords and sentiment. Then it generates a reading interface
+where words with a specific label are hidden. Output can be saved as poster, or exported as json featuring the full data set.
+run: `make output/reading_structure/index.html`
+Specific Dependencies:
+* nltk: nltk.tokenize.punkt, ne_chunk, pos_tag, word_tokenize, sentiment.vader
+* weasyprint
+* jinja2
+* font: PT Sans (os font https://www.fontsquirrel.com/fonts/pt-serif)
+* font: Ubuntu Mono (os font https://www.fontsquirrel.com/fonts/ubuntu-mono)