# Markdown → HTML → print

In [11]:
import pypandoc
from weasyprint import HTML, CSS
# Note: recent WeasyPrint releases expose this class as weasyprint.text.fonts.FontConfiguration
from weasyprint.fonts import FontConfiguration

## Markdown → HTML

Pandoc: "If you need to convert files from one markup format into another, **pandoc is your swiss-army knife**."

https://pandoc.org/

pypandoc, a Python wrapper around Pandoc:

https://github.com/bebraw/pypandoc


### Convert a Markdown file to HTML ...


In [39]:
# ... directly from a file
html = pypandoc.convert_file('language.md', 'html')
print(html)

<h1 id="language">Language</h1>
<h2 id="florian-cramer">Florian Cramer</h2>
<p>Software and language are <strong>intrinsically</strong> related, since software may process language, and is constructed in language. Yet <em>language</em> means different things in the context of <sub>computing</sub>: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages. There are at least two layers of formal language in software: programming language in which the software is written, and the language implemented within the software as its symbolic controls. In the case of compilers, shells, and macro languages, for example, these layers can overlap. “Natural” language is what can be processed as data by software; since this processing is formal, however, it is restricted to syntactical operations. While differentiation of computer programming languages as “artificial languages” from languages like English as “natural languages” is con

In [37]:
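# ... or from a pad

from urllib.request import urlopen

# Plain-text export of the pad
url = 'https://pad.xpub.nl/p/language/export/txt'
with urlopen(url) as response:
    md = response.read().decode('UTF-8')

# Save the Markdown locally, using the same encoding it was decoded with
with open('language.md', 'w', encoding='utf-8') as f:
    f.write(md)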
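
The export URL above follows the pattern `<server>/p/<pad name>/export/txt`. A minimal sketch of wrapping that pattern in reusable helpers follows. `pad_export_url` and `fetch_pad` are hypothetical names introduced here, and the sketch assumes the pad server exposes the same export route as the URL used above.

```python
from urllib.parse import quote
from urllib.request import urlopen


def pad_export_url(server, pad_name, fmt='txt'):
    """Build an export URL following the pattern used above (hypothetical helper)."""
    return f"{server.rstrip('/')}/p/{quote(pad_name)}/export/{fmt}"


def fetch_pad(server, pad_name):
    """Download a pad's plain-text export and return it as a string."""
    with urlopen(pad_export_url(server, pad_name)) as response:
        return response.read().decode('utf-8')


# Only builds the URL; no network request is made here.
print(pad_export_url('https://pad.xpub.nl', 'language'))
```

Calling `fetch_pad('https://pad.xpub.nl', 'language')` would then return the Markdown text, which can be written to `language.md` as in the cell above.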

In [65]:
html = pypandoc.convert_file('language.md', 'html')
print(html)

<h1 id="language">Language</h1>
<h2 id="florian-cramer">Florian Cramer</h2>
<p>Software and language are <strong>intrinsically</strong> related, since software may process language, and is constructed in language. Yet <em>language</em> means different things in the context of <sub>computing</sub>: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages. There are at least two layers of formal language in software: programming language in which the software is written, and the language implemented within the software as its symbolic controls. In the case of compilers, shells, and macro languages, for example, these layers can overlap. “Natural” language is what can be processed as data by software; since this processing is formal, however, it is restricted to syntactical operations. While differentiation of computer programming languages as “artificial languages” from languages like English as “natural languages” is con

## HTML → PDF

For this we can use WeasyPrint again.
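
The HTML printed above is a fragment: it starts at `<h1>` and has no `<html>`, `<head>` or `<title>`. WeasyPrint accepts it as it is, but wrapping it in a complete document can be handy, for instance to declare the encoding or set a title. A minimal sketch, in which `wrap_fragment` is a hypothetical helper and not part of pypandoc or WeasyPrint:

```python
from html import escape


def wrap_fragment(fragment, title='Untitled', lang='en'):
    """Wrap an HTML fragment in a minimal complete document (hypothetical helper)."""
    return (
        '<!DOCTYPE html>\n'
        f'<html lang="{escape(lang)}">\n'
        '<head>\n'
        '<meta charset="utf-8">\n'
        f'<title>{escape(title)}</title>\n'
        '</head>\n'
        f'<body>\n{fragment}\n</body>\n'
        '</html>\n'
    )


# A short stand-in for the fragment produced by pypandoc
page = wrap_fragment('<h1 id="language">Language</h1>', title='Language')
print(page)
```

The resulting string can be passed to `HTML(string=...)` in place of the raw fragment.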

In [66]:
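# Wrap the HTML string in a WeasyPrint document object.
# Note: this rebinds `html`; from here on it is a WeasyPrint object, no longer a string.
html = HTML(string=html)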
print(html)

<weasyprint.HTML object at 0xaeefaeb0>


In [67]:
css = CSS(string='''
    @page {
        size: A4;
        margin: 15mm;

        counter-increment: page;

        @top-left {
            content: "hello?";
        }
        @top-center {
            content: counter(page);
            font-size: 7pt;
            font-family: monospace;
            color: blue;
        }
        @bottom-center {
            content: "this is the bottom center!";
        }
    }

    body {
        color: magenta;
    }
''')

It is worth taking a close look at the paged media properties in CSS, for example the `size` descriptor of `@page`:

https://developer.mozilla.org/en-US/docs/Web/CSS/%40page/size
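
To experiment with different page sizes without editing the stylesheet by hand, the `@page` rule can be generated from a few parameters. This is only a sketch: `page_css` is a hypothetical helper, and it covers just the `size` and `margin` declarations used above.

```python
def page_css(size='A4', margin='15mm', orientation=None):
    """Return an @page rule as a string (hypothetical helper)."""
    # `size` accepts a page name optionally followed by an orientation,
    # e.g. "A5 landscape"
    page_size = f'{size} {orientation}' if orientation else size
    return (
        '@page {\n'
        f'    size: {page_size};\n'
        f'    margin: {margin};\n'
        '}\n'
    )


print(page_css())
print(page_css(size='A5', orientation='landscape', margin='10mm'))
```

The returned string can be handed to `CSS(string=...)` in the same way as the stylesheet above.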

In [68]:
html.write_pdf('language.pdf', stylesheets=[css])
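
The steps in this notebook can be collected into one function. This is a sketch under assumptions, not a definitive implementation: `pdf_path_for` and `markdown_to_pdf` are hypothetical names, and the conversion requires pypandoc, the pandoc binary and WeasyPrint to be installed.

```python
from pathlib import Path


def pdf_path_for(markdown_path):
    """Derive the output name, e.g. language.md -> language.pdf."""
    return str(Path(markdown_path).with_suffix('.pdf'))


def markdown_to_pdf(markdown_path, css_string=None):
    """Convert a Markdown file to a PDF and return the PDF's path."""
    # Imported here so that pdf_path_for works without these packages installed
    import pypandoc
    from weasyprint import HTML, CSS

    fragment = pypandoc.convert_file(markdown_path, 'html')
    target = pdf_path_for(markdown_path)
    # Only pass a stylesheet when one was given
    stylesheets = [CSS(string=css_string)] if css_string else None
    HTML(string=fragment).write_pdf(target, stylesheets=stylesheets)
    return target


# Only the path helper runs here; the conversion needs the libraries above.
print(pdf_path_for('language.md'))
```

With the libraries available, `markdown_to_pdf('language.md', '@page { size: A4; margin: 15mm; }')` should write `language.pdf` next to the Markdown file.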