DIY Book Scanner Workflow

## Getting started

This set of scripts was written for the Text Laundrette workshop. It is a workflow that turns the pictures from the DIY Book Scanner into a final OCRed PDF. To skip any of the scripts, comment out its line in the shell script `workshop_stream.sh`.

## Dependencies

### Brew (Mac) or apt-get (Linux)

You’ll need the command-line tools for Xcode installed.

```bash
xcode-select --install
```

Then install Homebrew.

```bash
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

Run the following command once you’re done to ensure Homebrew is installed and working properly:

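```bash
brew doctor
```

Then install the system packages. On Linux:

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils
```

On Mac:

```bash
brew install python3 imagemagick poppler
```

`pdfunite` is part of poppler (`poppler-utils` on apt), and Homebrew's `python3` already includes `pip3`, so neither needs a separate package.

### PIP3

```bash
sudo pip3 install pdf2image Pillow opencv-python pytesseract
```

`time` and `logging` are part of the Python standard library and do not need to be installed. pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed (`tesseract-ocr` on apt, `tesseract` on Homebrew).

## How to use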

Add your pictures from the book scanner to the folder "/scans"

Make all the files executable.

```bash
sudo chmod 777 merge_scans.sh workshop_stream.sh merge_files.sh
```

Run `./workshop_stream.sh`

Wait :)
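To make the idea of skipping a step concrete, here is a minimal Python sketch of the same pipeline. The step order is taken from the sections of this README; the actual contents of `workshop_stream.sh` may differ, and `build_pipeline` and `run_pipeline` are hypothetical names used only for illustration.

```python
import os
import subprocess

# Step order as described in this README (assumption: workshop_stream.sh
# runs the same scripts in the same order).
STEPS = [
    ["./merge_scans.sh"],
    ["python3", "burstpdf.py"],
    ["python3", "rotation.py"],
    ["python3", "bounding_box.py"],
    ["python3", "mirror_crop.py"],
    ["python3", "tesseract_ocr.py"],
    ["./merge_files.sh"],
]


def build_pipeline(skip=()):
    """Return the commands to run, leaving out any script named in skip."""
    return [cmd for cmd in STEPS if os.path.basename(cmd[-1]) not in skip]


def run_pipeline(skip=()):
    """Run each remaining step in order, stopping at the first failure."""
    for cmd in build_pipeline(skip):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(skip={"mirror_crop.py"})` would have the same effect as commenting that line out of the shell script.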

## Additional information

### Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```

### Merge the files in the directory scans

All the scans will be appended to one pdf called out.pdf

```bash
./merge_scans.sh
```

### Burst the pdf in scans

Burst this PDF into single pages, renaming the files so they can be iterated over later.
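A rough sketch of how such a burst step could look with pdf2image, not necessarily what `burstpdf.py` does. The `page_NNNN.jpg` naming scheme, the JPEG output format, and the use of the `split` directory as the target are assumptions; the zero padding is what keeps alphabetical order equal to page order.

```python
import os


def page_filename(index, prefix="page", digits=4, ext="jpg"):
    """Zero-padded name so that alphabetical order equals page order."""
    return f"{prefix}_{index:0{digits}d}.{ext}"


def burst_pdf(pdf_path="out.pdf", out_dir="split"):
    """Save every page of pdf_path as its own image in out_dir."""
    # Imported here so page_filename works without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(out_dir, exist_ok=True)
    pages = convert_from_path(pdf_path)  # list of PIL images, one per page
    for index, page in enumerate(pages, start=1):
        page.save(os.path.join(out_dir, page_filename(index)), "JPEG")
    return len(pages)
```

In the workflow itself, the step is run with: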

```bash
python3 burstpdf.py
```

### Rotate the pdfs

The book scanner photographs the pages sideways. This script iterates through the odd and even pages, rotating them back to their original position.
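The odd/even logic might be sketched as follows with Pillow. The angles (90 for odd pages, 270 for even pages) are placeholders, since the real values depend on how the cameras are mounted, and `rotation_for_page` and `rotate_page` are hypothetical helpers rather than functions from `rotation.py`.

```python
def rotation_for_page(page_number, odd_angle=90, even_angle=270):
    """Return the counter-clockwise angle for a 1-based page number."""
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_page(src, dst, page_number):
    """Rotate one page image and write the result to dst."""
    # Imported here so rotation_for_page works without Pillow installed.
    from PIL import Image

    with Image.open(src) as img:
        # expand=True grows the canvas so a 90-degree turn is not clipped.
        img.rotate(rotation_for_page(page_number), expand=True).save(dst)
```

The script itself is started with: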

```bash
python3 rotation.py
```

### Cropping the bounding boxes

The pages are now in their original position, but they have a bounding box. This script iterates through them and crops each one to the highest-contrast area found.
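One possible way to find and crop such an area with OpenCV is shown below. It swaps in Otsu thresholding plus the largest external contour as a stand-in for "highest-contrast area", and it assumes a bright page on a darker background; `bounding_box.py` may use a different method. `largest_box` and `crop_to_page` are hypothetical names.

```python
def largest_box(boxes):
    """Pick the (x, y, w, h) box with the largest area."""
    return max(boxes, key=lambda b: b[2] * b[3])


def crop_to_page(src, dst):
    """Crop src to its largest bright region and write the result to dst."""
    # Imported here so largest_box works without OpenCV installed.
    import cv2

    image = cv2.imread(src)
    if image is None:
        raise FileNotFoundError(src)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu picks the threshold separating the bright page from the background.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(
        mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )[-2]
    if not contours:
        raise ValueError(f"no bright region found in {src}")
    x, y, w, h = largest_box([cv2.boundingRect(c) for c in contours])
    cv2.imwrite(dst, image[y:y + h, x:x + w])
```

The actual cropping step is run with: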

```bash
python3 bounding_box.py
```

### Cropping the mirror

The pages are now cropped, but the mirror is still visible in the middle.

```bash
python3 mirror_crop.py
```

### OCR

In this part we run OCR on the JPG pages, turning each one into a PDF.
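As an illustration, pytesseract can produce a searchable PDF directly from an image through `image_to_pdf_or_hocr`. The sketch below assumes the output goes to the `ocred` directory and keeps the input file's base name; `pdf_path_for` and `ocr_to_pdf` are hypothetical helpers, not necessarily what `tesseract_ocr.py` contains.

```python
import os


def pdf_path_for(jpg_path, out_dir="ocred"):
    """Map e.g. cropped/page_0001.jpg to ocred/page_0001.pdf."""
    stem = os.path.splitext(os.path.basename(jpg_path))[0]
    return os.path.join(out_dir, stem + ".pdf")


def ocr_to_pdf(jpg_path, out_dir="ocred"):
    """OCR one page image and write a searchable PDF next to the others."""
    # Imported here so pdf_path_for works without pytesseract installed.
    import pytesseract
    from PIL import Image

    os.makedirs(out_dir, exist_ok=True)
    with Image.open(jpg_path) as img:
        # Returns the finished PDF as bytes (page image plus text layer).
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(img, extension="pdf")
    target = pdf_path_for(jpg_path, out_dir)
    with open(target, "wb") as fh:
        fh.write(pdf_bytes)
    return target
```

The workflow runs this step as: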

```bash
python3 tesseract_ocr.py
```

### Merge all the files and create the pdf

The OCRed pages are now joined into their final PDF. Your book is ready :)
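The join can be pictured as a single `pdfunite` call that lists every page PDF in order, followed by the output file. The sketch below builds that command in Python; the `ocred` input directory and the `book.pdf` output name are assumptions, and `unite_command` and `merge_pages` are hypothetical helpers.

```python
import glob
import os
import subprocess


def unite_command(pdf_dir="ocred", output="book.pdf"):
    """Build the pdfunite command: all inputs in sorted order, then the output."""
    pages = sorted(glob.glob(os.path.join(pdf_dir, "*.pdf")))
    if not pages:
        raise FileNotFoundError(f"no PDFs found in {pdf_dir}")
    return ["pdfunite", *pages, output]


def merge_pages(pdf_dir="ocred", output="book.pdf"):
    """Run pdfunite and raise if it exits with an error."""
    subprocess.run(unite_command(pdf_dir, output), check=True)
```

Sorting by name only gives the right page order when the file names are zero-padded. The script that performs this step in the workflow is: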

```bash
./merge_files.sh
```

## License

The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).