XPUB

S13-Words-for-the-Future-notebooks


WeasyPrint

WeasyPrint is a Python library that lays out HTML (and CSS) as print pages and saves the result as a PDF. In this way, it can be part of a "web to print" workflow.

https://weasyprint.readthedocs.io/en/latest/tutorial.html

In [5]:

from weasyprint import HTML, CSS
# Note: in newer WeasyPrint releases this class may live in weasyprint.text.fonts instead
from weasyprint.fonts import FontConfiguration

In [6]:

# If you use @font-face in your stylesheet, you would need Weasyprint's FontConfiguration()
font_config = FontConfiguration()

HTML

The main class in WeasyPrint is HTML. It represents an HTML document and provides methods to save it as a PDF (or, in older releases, as a PNG). When creating an HTML object, you can supply the document as a source string (the string option), a file (the filename option), or an online page (the url option).

In [7]:

html = HTML(string='<h1>hello</h1>')

In [4]:

html = HTML(filename="path/to/some.html")

In [2]:

html = HTML(url="https://pzwiki.wdka.nl/mediadesign/Category:WordsfortheFuture")

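The CSS class lets you include an (additional) stylesheet. Just as with the HTML class, you can give a string, filename, or URL. A CSS object takes effect only when it is passed to write_pdf through the stylesheets option; it is then applied in addition to any stylesheets the HTML already links or embeds.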

In [12]:

css = CSS(string='''
@page{
        size: A4;
        margin: 15mm;
        background-color: lightgrey;
        font-family: monospace;
        font-size: 8pt;
        color: red;
        border:1px dotted red;
        
        @top-left{
            content: "natural";
        }
        @top-center{
            content: "language";
        }
        @top-right{
            content: "artificial";
        }
        @left-top{
            content: "computer control";
        }
        @right-top{
            content: "markup";
        }
        @bottom-left{
            content: "formal";
        }
        @bottom-center{
            content: "programming";
        }
        @bottom-right{
            content: "machine";
        }
    }
    body{
        font-family: serif;
        font-size: 12pt;
        line-height: 1.4;
        color: magenta;
    }
    h1{
        width: 100%;
        text-align: center;
        font-size: 250%;
        line-height: 1.25;
        color: orange;
    }
    strong{
        color: blue;
    }
    em{
        color: green;
    }


''', font_config=font_config)

In [8]:

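# Pass the CSS object explicitly, otherwise the stylesheet defined above is not applied
html.write_pdf('mydocument.pdf', stylesheets=[css], font_config=font_config)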

Using NLTK to automatically mark up a (plain) text with POS tags

In [9]:

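import nltk

# word_tokenize and pos_tag need NLTK's tokenizer and tagger data,
# which can be fetched once with nltk.download()
with open('txt/language.txt') as f:
    txt = f.read()
words = nltk.word_tokenize(txt)        # split the text into word tokens
tagged_words = nltk.pos_tag(words)     # list of (word, tag) pairs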

In [10]:

from html import escape

content = ''
content += '<h1>Language and Software Studies, by Florian Cramer</h1>'

# Wrap every word in a <span> whose class is its POS tag;
# escape the word so characters like < or & do not break the markup
for word, tag in tagged_words:
    content += f'<span class="{tag}">{escape(word)}</span> '

with open("txt/language.html", "w", encoding="utf-8") as f:
    f.write(f"""<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <link rel="stylesheet" type="text/css" href="language.css">
    <title></title>
</head>
<body>
{content}
</body>
</html>
""")
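Some POS tags, such as "." or "PRP$", are awkward to select as CSS classes. One possible variation on the cell above, sketched here with a hand-written sample instead of real NLTK output, derives a safer class name from each tag. The function name tagged_words_to_html and the "pos-" prefix are assumptions for illustration and are not part of the notebook.

```python
import html
import re

def tagged_words_to_html(tagged_words):
    """Wrap each (word, tag) pair in a <span> whose class is derived from the tag."""
    spans = []
    for word, tag in tagged_words:
        # Anything in the tag that is not a letter or digit becomes an underscore,
        # and a prefix keeps the class name from starting with punctuation.
        css_class = "pos-" + re.sub(r"[^A-Za-z0-9]", "_", tag)
        spans.append(f'<span class="{css_class}">{html.escape(word)}</span>')
    return " ".join(spans)

# Hand-written sample in the (word, tag) format that nltk.pos_tag returns.
sample = [("Software", "NN"), ("&", "CC"), ("language", "NN"), (".", ".")]
markup = tagged_words_to_html(sample)
print(markup)
```

With this naming, a stylesheet would select nouns with .pos-NN rather than .NN.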

In [11]:

html = HTML(filename="txt/language.html")
html.write_pdf('txt/language.pdf', font_config=font_config)
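The generated page links to language.css, which is not shown in this notebook. As a rough guess at what such a stylesheet could contain, the sketch below builds one color rule per POS tag, using the tag itself as the class name, as the spans above do. The mapping tag_colors and the helper build_pos_css are hypothetical.

```python
# Hypothetical mapping from Penn Treebank tags to colors.
tag_colors = {"NN": "red", "VB": "blue", "JJ": "green"}

def build_pos_css(colors):
    """Return one CSS rule per tag, selecting the class names used in the spans."""
    rules = [f".{tag} {{ color: {color}; }}" for tag, color in colors.items()]
    return "\n".join(rules)

pos_css = build_pos_css(tag_colors)
print(pos_css)
```

The resulting string could be saved as txt/language.css, or wrapped in CSS(string=pos_css) and passed to write_pdf through the stylesheets option. Tags made of punctuation would need escaping or renaming before they can serve as selectors.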
