
<h1 align="center">DIY Book Scanner Workflow</h1>

## Getting started

These scripts were written for the Text Laundrette workshop, held at the Publication Station, WDkA building.<br> Rotterdam, 03-02-2020<br>Together they form a workflow that turns the pictures from the DIY Book Scanner into a final OCRed PDF.<br>
<br>
## About the Workshop

<em>DESCRIPTION</em>
<p>We will use a home-made, DIY book scanner, and open-source software to scan, process, and add digital features to printed texts brought by the participants to the workshop. Ultimately, we will include them in the “bootleg library”, a shadow library accessible over a local network.</p>

<p>Shadow libraries operate outside of legal copyright frameworks, in response to decreased open access to knowledge. This workshop aims to extend our research on libraries, their sociability, and methods by which we can add provenance to texts included in public or private, legal or extra-legal collections.</p>

<p>Participants should bring a printed text that they’d like to digitize and share.</p>

<br><br>
## Dependencies
### Brew (macOS) or apt-get (Linux)
<p>On macOS, you’ll need the command-line tools for Xcode installed.</p>

```bash
xcode-select --install
```

<p>Then install Homebrew.</p>

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

<p>Run the following command once you’re done to ensure Homebrew is installed and working properly:</p>

```bash
brew doctor
```

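<p>On Linux, install the system packages with apt-get. <em>pdfunite</em> ships with <em>poppler-utils</em>, and <em>pytesseract</em> needs the Tesseract engine:</p>

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

<p>On macOS, install them with Homebrew. <em>pip3</em> comes with <em>python3</em>, and <em>pdfunite</em> ships with <em>poppler</em>:</p>

```bash
brew install python3 imagemagick poppler tesseract
```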
<br>
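
### PIP3
<p>Install the Python packages. <em>time</em> and <em>logging</em> are part of the Python standard library and do not need to be installed.</p>

`sudo pip3 install pdf2image Pillow opencv-python pytesseract`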
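
Before running the workflow, it can help to confirm that the dependencies are in place. The following is a minimal sketch, not part of the original scripts. The tool names `convert` (ImageMagick), `pdfunite` (poppler) and `tesseract` are assumptions about what the scripts call, and the module list uses import names rather than pip names.

```python
import importlib.util
import shutil

# Assumed command-line tools; adjust the list to match your setup.
TOOLS = ["convert", "pdfunite", "tesseract"]
# Import names of the Python packages (Pillow imports as PIL, opencv-python as cv2).
MODULES = ["pdf2image", "PIL", "cv2", "pytesseract"]


def missing_tools(tools=TOOLS):
    """Return the command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=MODULES):
    """Return the Python modules that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Calling `missing_tools()` and `missing_modules()` should both return empty lists once everything is installed.
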

<br>

## How to use
<p>Add your pictures from the book scanner to the <em>scans</em> folder.</p>

<p>Make all the files executable.</p>

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

<p>To skip any of the scripts, comment out its line in the shell script <em>workshop_stream.sh</em>.</p>

<p>Run ./workshop_stream.sh</p>


<p>Wait :)</p>
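
As an alternative to commenting lines out, the same run order can be sketched as a small Python driver. This is an illustration only, not the contents of `workshop_stream.sh`: the step list is taken from the order given in this README, the `skip` parameter is hypothetical, and the working directories are assumed to exist already.

```python
import os
import subprocess

# The scripts in the order this README lists them.
STEPS = [
    ["./merge_scans.sh"],
    ["python3", "burstpdf.py"],
    ["python3", "rotation.py"],
    ["python3", "bounding_box.py"],
    ["python3", "mirror_crop.py"],
    ["python3", "tesseract_ocr.py"],
    ["./merge_files.sh"],
]


def plan(skip=()):
    """Return the commands to run, leaving out any script named in `skip`."""
    return [cmd for cmd in STEPS if os.path.basename(cmd[-1]) not in skip]


def run(skip=()):
    """Run each planned command in order, stopping at the first failure."""
    for cmd in plan(skip):
        subprocess.run(cmd, check=True)
```

For example, `run(skip={"mirror_crop.py"})` would run every step except the mirror crop.
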

<br><br>
## Additional information
The workflow runs the following steps, in this order:

### Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```
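
Plain `mkdir` fails if a directory already exists, which matters when the workflow is run a second time. A minimal Python sketch of the same step that is safe to repeat (the function name and the `root` parameter are hypothetical):

```python
from pathlib import Path

# The five working directories named in this README.
WORK_DIRS = ["split", "rotated", "ocred", "bounding_box", "cropped"]


def make_work_dirs(root="."):
    """Create the working directories under `root`, skipping any that exist."""
    created = []
    for name in WORK_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```
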
### Merge the files in the directory <em>scans</em>
<p>All the scans will be appended to one PDF called out.pdf.</p>

```bash
./merge_scans.sh
```
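
The contents of `merge_scans.sh` are not shown here. One plausible way to do this step is with ImageMagick's `convert`, which accepts a list of images followed by an output PDF. The sketch below only builds that command; the `.jpg` extension and the output location inside `scans` are assumptions.

```python
from pathlib import Path


def build_merge_command(scan_dir="scans", output=None):
    """Build an ImageMagick command that appends every JPG in `scan_dir` to one PDF."""
    images = sorted(str(p) for p in Path(scan_dir).glob("*.jpg"))
    if not images:
        raise FileNotFoundError(f"no .jpg files found in {scan_dir}")
    if output is None:
        # Assumed location: out.pdf next to the scans.
        output = str(Path(scan_dir) / "out.pdf")
    return ["convert", *images, output]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Sorting keeps the pages in filename order, so the scanner's filenames need to sort in page order.
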

### Burst the pdf in <em>scans</em>
<p>Burst this PDF into single pages, renaming all the files so they can be iterated over later.</p>

```bash
python3 burstpdf.py
```
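
A rough sketch of how such a burst step could look with `pdf2image`, which is in the dependency list. It is not the actual `burstpdf.py`: the file naming scheme, the JPG output format, the `split` output directory and the `dpi` value are all assumptions.

```python
from pathlib import Path


def page_filename(index, total):
    """Zero-padded name so that alphabetical order equals page order."""
    width = max(3, len(str(total)))
    return f"page_{index:0{width}d}.jpg"


def burst_pdf(pdf_path="scans/out.pdf", out_dir="split", dpi=300):
    """Render each PDF page to a JPG named by its 1-based page number."""
    from pdf2image import convert_from_path  # requires poppler to be installed

    pages = convert_from_path(pdf_path, dpi=dpi)
    written = []
    for index, page in enumerate(pages, start=1):
        target = Path(out_dir) / page_filename(index, len(pages))
        page.save(target, "JPEG")
        written.append(target)
    return written
```

Zero-padding the page number is what makes the later scripts able to iterate over the files in page order with a plain sort.
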

### Rotate the pdfs
<p>The book scanner photographs the pages rotated. This script iterates through the odd and even pages, rotating them back to their original position.</p>

```bash
python3 rotation.py
```
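
The odd/even logic can be sketched as follows. The angles are assumptions (which direction each camera is turned depends on the scanner), and the function names are hypothetical. Pillow's `Image.rotate` takes degrees counter-clockwise, and `expand=True` enlarges the canvas so a 90 degree turn does not cut off the page.

```python
def rotation_for(page_number, odd_angle=90, even_angle=-90):
    """Angle in degrees (counter-clockwise) for a 1-based page number."""
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_page(src, dst, page_number):
    """Rotate one image file by the angle for its page number and save it."""
    from PIL import Image  # Pillow

    with Image.open(src) as image:
        rotated = image.rotate(rotation_for(page_number), expand=True)
        rotated.save(dst)
```
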

### Cropping the bounding boxes
<p>The pages are now in their original position, but they still have a bounding box around them. This script iterates through them and crops each one to the highest-contrast area found.</p>

```bash
python3 bounding_box.py
```
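
One common way to find such a region with OpenCV is to threshold the image, find the outer contours, and crop to the largest bounding rectangle. The sketch below takes that approach under the assumption that the page is brighter than its surroundings. It may differ from what `bounding_box.py` actually does.

```python
def largest_box(boxes):
    """Pick the (x, y, w, h) rectangle with the largest area."""
    return max(boxes, key=lambda box: box[2] * box[3])


def crop_to_page(src, dst):
    """Crop an image file to its largest bright region and save the result."""
    import cv2  # opencv-python

    image = cv2.imread(src)
    if image is None:
        raise FileNotFoundError(src)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold between bright page and dark background.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list on both OpenCV 3 and OpenCV 4.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        raise ValueError(f"no bright region found in {src}")
    x, y, w, h = largest_box([cv2.boundingRect(c) for c in contours])
    cv2.imwrite(dst, image[y:y + h, x:x + w])
```
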

### Cropping the mirror
<p>The pages are now cropped, but the mirror is still visible in the middle. This script crops it out.</p>

```bash
python3 mirror_crop.py
```
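
How the mirror is located is not described here. As a simple illustration, the sketch below assumes the mirror shows up as a strip along the inner (spine-side) edge of each page and trims a fixed fraction of the width from that edge. The fraction, the odd/even page convention and the function name are all hypothetical.

```python
def mirror_crop_box(width, height, page_number, fraction=0.1):
    """Pillow-style (left, top, right, bottom) box that trims the inner edge."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    trim = int(width * fraction)
    if page_number % 2 == 1:
        # Assumed: odd pages are right-hand pages, so the spine is on the left.
        return (trim, 0, width, height)
    # Assumed: even pages are left-hand pages, so the spine is on the right.
    return (0, 0, width - trim, height)
```

The resulting box can be passed to Pillow's `Image.crop`.
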

### OCR
<p>This step runs OCR on the JPG images and turns each one into a PDF.</p>

```bash
python3 tesseract_ocr.py
```
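
With `pytesseract`, a single image can be turned into a PDF with a text layer through `image_to_pdf_or_hocr`. The sketch below shows the idea for one file. The input and output directory names and the `lang` default are assumptions, and `tesseract_ocr.py` may be organised differently.

```python
from pathlib import Path


def ocr_output_path(image_path, out_dir="ocred"):
    """Map e.g. cropped/page_001.jpg to ocred/page_001.pdf."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_to_pdf(image_path, out_dir="ocred", lang="eng"):
    """OCR one image and write the result as a PDF with a text layer."""
    import pytesseract  # requires the tesseract binary on PATH

    pdf_bytes = pytesseract.image_to_pdf_or_hocr(
        str(image_path), extension="pdf", lang=lang
    )
    target = ocr_output_path(image_path, out_dir)
    target.write_bytes(pdf_bytes)
    return target
```

For texts in other languages, pass the matching Tesseract language code as `lang`, provided that language data is installed.
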

### Merge all the files and create the pdf
<p>The OCRed pages are now joined into the final PDF. Your book is ready :)</p>

```bash
./merge_files.sh
```
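
`pdfunite` takes the input PDFs in order, followed by the output file. Page order is the main thing to get right: a plain alphabetical sort puts `page_10` before `page_2` unless the numbers are zero-padded. The sketch below builds the command with a natural sort. The output name `book.pdf` and the `ocred` input directory are assumptions, and this is not the content of `merge_files.sh`.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders 'page_2' before 'page_10'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def build_unite_command(pdf_dir="ocred", output="book.pdf"):
    """Build a pdfunite command joining every PDF in `pdf_dir` in page order."""
    names = sorted((p.name for p in Path(pdf_dir).glob("*.pdf")), key=natural_key)
    if not names:
        raise FileNotFoundError(f"no .pdf files found in {pdf_dir}")
    return ["pdfunite", *(str(Path(pdf_dir) / name) for name in names), output]
```
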
<br><br>
## License
The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).