
<h1 align="center">DIY Book Scanner Workflow</h1>

## Getting started

This set of scripts was written for the Text Laundrette workshop.<br>It is a workflow that turns the pictures from the DIY Book Scanner into a final OCRed PDF.

To skip any of the scripts, comment out the corresponding line in the shell script <em>workshop_stream.sh</em>.

## Dependencies
### Brew (macOS) or apt-get (Linux)
<p>On macOS, you’ll need the command-line tools for Xcode installed.</p>

```bash
xcode-select --install
```

<p>Then install Homebrew.</p>

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

<p>Run the following command once you’re done to ensure Homebrew is installed and working properly:</p>

```bash
brew doctor
```

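<p>Then install the system packages. On Linux:</p>

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

<p>On macOS:</p>

```bash
brew install python imagemagick poppler tesseract
```

<p><em>pdfunite</em> is part of Poppler (packaged as <em>poppler-utils</em> on Debian and Ubuntu), pip3 comes with Homebrew's Python, and pytesseract needs the Tesseract binary to be installed.</p>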

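### PIP3
<p>Install the Python packages. <em>time</em> and <em>logging</em> are part of the Python standard library, so they are not installed with pip.</p>

`sudo pip3 install pdf2image Pillow opencv-python pytesseract`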


## How to use
<p>Add your pictures from the book scanner to the <em>scans</em> folder.</p>

<p>Make the shell scripts executable.</p>

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

<p>Run ./workshop_stream.sh</p>

<p>Wait :)</p>
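
<p>The idea of skipping steps can also be sketched in Python. This is only an illustration, not the contents of <em>workshop_stream.sh</em>: the step order is taken from this README, while the step names and the <em>plan</em> and <em>run</em> helpers are hypothetical. Directory creation is left out.</p>

```python
import subprocess

# Step order as listed in this README; the short names on the left are
# hypothetical labels, and the commands are assumed to run from the
# folder that contains the scripts.
STEPS = [
    ("merge_scans", ["./merge_scans.sh"]),
    ("burst", ["python3", "burstpdf.py"]),
    ("rotate", ["python3", "rotation.py"]),
    ("bounding_box", ["python3", "bounding_box.py"]),
    ("mirror_crop", ["python3", "mirror_crop.py"]),
    ("ocr", ["python3", "tesseract_ocr.py"]),
    ("merge_files", ["./merge_files.sh"]),
]


def plan(skip=()):
    """Return the commands to run, in order, leaving out skipped step names."""
    unknown = set(skip) - {name for name, _ in STEPS}
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return [cmd for name, cmd in STEPS if name not in skip]


def run(skip=()):
    """Run each remaining step and stop at the first one that fails."""
    for cmd in plan(skip):
        subprocess.run(cmd, check=True)
```

<p>For example, <code>run(skip=["mirror_crop"])</code> would run everything except the mirror crop, which has the same effect as commenting out that line in the shell script.</p>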


## Additional information
### Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```
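
<p>Plain <em>mkdir</em> fails if a directory is already there, for example on a second run. A minimal Python sketch that creates the same five directories and tolerates existing ones could look like this; the <em>make_workdirs</em> helper is hypothetical and not part of the repository.</p>

```python
from pathlib import Path

# The five working directories named in this README.
WORKDIRS = ["split", "rotated", "ocred", "bounding_box", "cropped"]


def make_workdirs(base="."):
    """Create the working directories under base, keeping any that exist."""
    created = []
    for name in WORKDIRS:
        path = Path(base) / name
        # exist_ok=True makes the call safe to repeat on later runs
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```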
### Merge the files in the directory <em>scans</em>

<p>All the scans will be appended to one PDF called <em>out.pdf</em>.</p>

```bash
./merge_scans.sh
```

### Burst the pdf in <em>scans</em>

<p>Burst this PDF into one file per page, renaming the files so they can be iterated over later.</p>

```bash
python3 burstpdf.py
```
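
<p>As a rough idea of what bursting can look like, here is a sketch that uses <em>pdf2image</em> to render each page to its own file. It is not the code of <em>burstpdf.py</em>: the input path <em>scans/out.pdf</em>, the output folder <em>split</em>, the JPEG format, the dpi value and the <em>page_NNN</em> naming are all assumptions. Zero-padded names are used so that the files sort in page order.</p>

```python
from pathlib import Path


def page_name(index, total, ext="jpg"):
    """Zero-padded file name so that pages sort correctly as plain strings."""
    width = max(3, len(str(total)))
    return f"page_{index:0{width}d}.{ext}"


def burst(pdf_path="scans/out.pdf", out_dir="split", dpi=300):
    """Render every PDF page to its own JPEG (paths and dpi are assumptions)."""
    # Imported here so page_name() can be used without pdf2image installed.
    from pdf2image import convert_from_path  # requires Poppler

    pages = convert_from_path(pdf_path, dpi=dpi)
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for i, page in enumerate(pages, start=1):
        target = out / page_name(i, len(pages))
        page.save(target, "JPEG")
        written.append(target)
    return written
```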

### Rotate the pdfs

<p>The book scanner photographs the pages rotated. This script iterates through the odd and even pages and rotates them back to their original orientation.</p>

```bash
python3 rotation.py
```
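
<p>The odd and even handling can be sketched with Pillow as follows. This is an illustration under assumptions, not the code of <em>rotation.py</em>: which parity turns which way depends on how the cameras are mounted, so the default angles below are placeholders to adjust.</p>

```python
def rotation_for(page_number, odd_angle=90, even_angle=-90):
    """Counter-clockwise angle in degrees for a 1-based page number.

    The default angles are assumptions; swap them if your pages come
    out upside down.
    """
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_page(src, dst, page_number):
    """Rotate one page image and save it under a new path."""
    # Imported here so rotation_for() can be used without Pillow installed.
    from PIL import Image

    with Image.open(src) as img:
        # expand=True grows the canvas so a 90 degree turn does not crop
        img.rotate(rotation_for(page_number), expand=True).save(dst)
```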

### Cropping the bounding boxes

<p>The pages are now in their original position, but they still need to be cropped to their bounding box. This script iterates through them and crops each page to the highest-contrast area found.</p>

```bash
python3 bounding_box.py
```
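
<p>One common way to find such an area with OpenCV is to threshold the image and crop to the largest contour. The sketch below uses that technique as a stand-in. It assumes the page is brighter than its background and is not necessarily how <em>bounding_box.py</em> works; the function names are hypothetical.</p>

```python
def largest_box(boxes):
    """Pick the (x, y, w, h) box that covers the largest area."""
    return max(boxes, key=lambda b: b[2] * b[3])


def crop_to_page(src, dst):
    """Crop an image to its largest bright region and save the result."""
    # Imported here so largest_box() can be used without OpenCV installed.
    import cv2

    img = cv2.imread(src)
    if img is None:
        raise FileNotFoundError(src)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu chooses the threshold that separates bright page from background
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list on both OpenCV 3 and OpenCV 4
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        raise ValueError(f"no bright region found in {src}")
    x, y, w, h = largest_box([cv2.boundingRect(c) for c in contours])
    cv2.imwrite(dst, img[y:y + h, x:x + w])
    return x, y, w, h
```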

### Cropping the mirror

<p>The pages are now cropped, but the mirror is still visible in the middle. This script crops it out.</p>

```bash
python3 mirror_crop.py
```

### OCR

<p>This step runs OCR on the JPG images and turns each one into a PDF.</p>

```bash
python3 tesseract_ocr.py
```
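
<p>With <em>pytesseract</em>, a JPG can be turned into a PDF with a text layer through <em>image_to_pdf_or_hocr</em>. The sketch below shows the general shape of such a loop. The input folder <em>cropped</em>, the output folder <em>ocred</em>, the <em>eng</em> language and the helper names are assumptions, not details confirmed by <em>tesseract_ocr.py</em>.</p>

```python
from pathlib import Path


def pdf_target(image_path, out_dir="ocred"):
    """Map an image path to the PDF path that will hold its OCR result."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_folder(in_dir="cropped", out_dir="ocred", lang="eng"):
    """OCR every JPG in in_dir and write one PDF per page to out_dir."""
    # Imported here so pdf_target() can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    Path(out_dir).mkdir(exist_ok=True)
    written = []
    for image_path in sorted(Path(in_dir).glob("*.jpg")):
        with Image.open(image_path) as img:
            # Returns the PDF as bytes, with the recognised text embedded
            pdf_bytes = pytesseract.image_to_pdf_or_hocr(
                img, extension="pdf", lang=lang
            )
        target = pdf_target(image_path, out_dir)
        target.write_bytes(pdf_bytes)
        written.append(target)
    return written
```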

### Merge all the files and create the pdf

<p>The OCRed pages are now joined into the final PDF. Your book is ready :)</p>

```bash
./merge_files.sh
```
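
<p>When joining per-page PDFs with <em>pdfunite</em>, the order of the input files decides the page order, and a plain alphabetical sort puts <em>page_10</em> before <em>page_2</em>. The sketch below builds a <em>pdfunite</em> command with a natural sort. The helper names and the output name <em>book.pdf</em> are hypothetical, and this is not necessarily what <em>merge_files.sh</em> does.</p>

```python
import re


def natural_key(name):
    """Split digits from text so that 'page_10' sorts after 'page_2'."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", str(name))
    ]


def pdfunite_command(pdf_names, output="book.pdf"):
    """Build the pdfunite argument list: inputs in page order, then output."""
    if not pdf_names:
        raise ValueError("no PDF files to merge")
    return ["pdfunite", *sorted(pdf_names, key=natural_key), output]
```

<p>The resulting list can be passed to <code>subprocess.run(..., check=True)</code> to perform the merge.</p>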

## License
The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).