{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# WeasyPrint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from weasyprint import HTML, CSS\n",
    "from weasyprint.fonts import FontConfiguration  # WeasyPrint 53+: from weasyprint.text.fonts import FontConfiguration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WeasyPrint tutorial: https://weasyprint.readthedocs.io/en/latest/tutorial.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A FontConfiguration is needed if your stylesheet uses @font-face rules;\n",
    "# pass the same object to CSS() and to write_pdf()\n",
    "font_config = FontConfiguration()"
   ]
  },
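  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of where `font_config` comes into play, following the `@font-face` pattern shown in the WeasyPrint tutorial: the same `FontConfiguration` is passed to `CSS()` and to `write_pdf()`. The font file `fonts/MyFont.otf`, the family name `MyFont` and the `font_demo*` variable names are hypothetical placeholders. `base_url='.'` is there so the relative `url()` resolves against this notebook's folder, and with no target `write_pdf()` returns the PDF as bytes, so no file is written."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical font file and family name: point these at a real font\n",
    "font_demo_css = CSS(string='''\n",
    "    @font-face {\n",
    "        font-family: MyFont;\n",
    "        src: url(fonts/MyFont.otf);\n",
    "    }\n",
    "    h1 { font-family: MyFont }\n",
    "''', base_url='.', font_config=font_config)  # base_url resolves the relative url()\n",
    "\n",
    "font_demo = HTML(string='<h1>hello</h1>')\n",
    "# With no target, write_pdf() returns the PDF as bytes instead of writing a file\n",
    "font_demo_pdf = font_demo.write_pdf(stylesheets=[font_demo_css], font_config=font_config)"
   ]
  },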
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# small example HTML object\n",
    "html = HTML(string='<h1>hello</h1>')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, as in this case, use Python and NLTK to generate a custom HTML page in which each word's part-of-speech tag becomes its CSS class."
   ]
  },
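  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nltk\n",
    "\n",
    "# word_tokenize and pos_tag need NLTK data packages; if one is missing,\n",
    "# run nltk.download() for the resource named in the LookupError message\n",
    "with open('txt/language.txt', encoding='utf-8') as f:\n",
    "    txt = f.read()\n",
    "\n",
    "words = nltk.word_tokenize(txt)\n",
    "tagged_words = nltk.pos_tag(words)"
   ]
  },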
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import html as html_lib  # stdlib escaping; the name html is used below for the WeasyPrint document\n",
    "\n",
    "content = ''\n",
    "content += '<h1>Language and Software Studies, by Florian Cramer</h1>'\n",
    "\n",
    "# Wrap each token in a span whose class is its POS tag. Both are escaped so that\n",
    "# characters such as <, & or quotes in the text cannot break the markup.\n",
    "for word, tag in tagged_words:\n",
    "    content += f'<span class=\"{html_lib.escape(tag)}\">{html_lib.escape(word)}</span> '\n",
    "\n",
    "with open(\"txt/language.html\", \"w\", encoding=\"utf-8\") as f:\n",
    "    f.write(f\"\"\"<!DOCTYPE html>\n",
    "<html>\n",
    "<head>\n",
    " <meta charset=\"utf-8\">\n",
    " <link rel=\"stylesheet\" type=\"text/css\" href=\"language.css\">\n",
    " <title>Language and Software Studies</title>\n",
    "</head>\n",
    "<body>\n",
    "{content}\n",
    "</body>\n",
    "</html>\n",
    "\"\"\")\n",
    "\n",
    "html = HTML(\"txt/language.html\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Saved to [language.html](txt/language.html). Fun fact: Jupyter filters the HTML it displays inside the notebook. To see the page unfiltered, use an iframe (as below), or right-click the file in the file list and select Open in New Tab.\n",
    "\n",
    "Possibly useful for keeping the original spacing when tokenizing: https://stackoverflow.com/questions/23358444/how-can-i-use-word-tokenize-in-nltk-and-keep-the-spaces"
   ]
  },
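  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to keep the original spacing, sketched under the assumption that a simple regular-expression tokenizer is acceptable in place of `nltk.word_tokenize` (a swapped-in, cruder tokenizer, not the method from the linked answer). Because each token keeps its position in `txt`, the text between tokens can be copied into the HTML unchanged. The names `token_pattern`, `spaced_tags` and `spaced_content` are hypothetical, and the cell does not overwrite `content` or `html`. Note that HTML collapses runs of whitespace and line breaks unless the stylesheet sets something like `white-space: pre-wrap`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from html import escape  # binds only 'escape', so the WeasyPrint 'html' variable is untouched\n",
    "\n",
    "# Hypothetical simple tokenizer: runs of word characters, or single other non-space characters\n",
    "token_pattern = re.compile(r'\\w+|[^\\w\\s]')\n",
    "\n",
    "matches = list(token_pattern.finditer(txt))\n",
    "spaced_tags = nltk.pos_tag([m.group() for m in matches])\n",
    "\n",
    "spaced_content = ''\n",
    "position = 0\n",
    "for m, (word, tag) in zip(matches, spaced_tags):\n",
    "    spaced_content += escape(txt[position:m.start()])  # original whitespace before the token\n",
    "    spaced_content += f'<span class=\"{escape(tag)}\">{escape(word)}</span>'\n",
    "    position = m.end()\n",
    "spaced_content += escape(txt[position:])  # trailing whitespace"
   ]
  },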
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NB: The HTML above links to the stylesheet [language.css](txt/language.css). Notice that the `href` is relative to the HTML page, which is itself in `txt/`, so the link does not need to include `txt/`."
   ]
  },
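  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same rule applies when WeasyPrint gets the HTML as a string instead of a file: a string has no location, so relative links have nothing to resolve against. A minimal sketch, assuming the page is read back from `txt/language.html`, that passes `base_url` so `language.css` is still found in `txt/`. The variable names are hypothetical; the PDF is returned as bytes and `html` is left unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the generated page back in as a plain string\n",
    "with open('txt/language.html', encoding='utf-8') as f:\n",
    "    page_source = f.read()\n",
    "\n",
    "# Without base_url, the relative href=\"language.css\" would have nothing to resolve against\n",
    "page_from_string = HTML(string=page_source, base_url='txt/')\n",
    "page_pdf = page_from_string.write_pdf(font_config=font_config)\n",
    "page_pdf[:5]  # every PDF starts with b'%PDF-'"
   ]
  },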
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       " <iframe\n",
       " width=\"1024\"\n",
       " height=\"600\"\n",
       " src=\"txt/language.html\"\n",
       " frameborder=\"0\"\n",
       " allowfullscreen\n",
       " ></iframe>\n",
       " "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x7f0bc93b9668>"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from IPython.display import IFrame\n",
    "IFrame(\"txt/language.html\", width=1024, height=600)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating the PDF!\n",
    "\n",
    "Now let WeasyPrint do its stuff! `write_pdf()` is where the layout is actually calculated: WeasyPrint behaves like a web browser rendering the HTML, but follows the CSS rules for paged media (notice the special rules in the stylesheet that WeasyPrint recognizes and uses, but a browser showing the page on screen does not). The stylesheet linked from the HTML is picked up automatically. If it were not linked, it could be passed explicitly, as in the commented-out cell below; in that case its path must be given relative to this notebook's folder, not to the HTML page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If the CSS were not linked from the HTML, you could pass it explicitly (path relative to this notebook):\n",
    "# css = CSS(\"txt/language.css\", font_config=font_config)\n",
    "# html.write_pdf('txt/language.pdf', stylesheets=[css], font_config=font_config)"
   ]
  },
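  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of paged-media rules supplied from Python, assuming A5 pages with 2 cm margins and a page number at the bottom are wanted; these values and the output file `txt/language_a5.pdf` are hypothetical. WeasyPrint treats stylesheets passed via `stylesheets=` as user stylesheets, which rank below author stylesheets such as the linked `language.css`, so these rules only take effect where `language.css` does not set the same properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical page setup: A5 pages with 2 cm margins and a centered page number\n",
    "page_css = CSS(string='''\n",
    "    @page {\n",
    "        size: A5;\n",
    "        margin: 2cm;\n",
    "        @bottom-center { content: counter(page) }\n",
    "    }\n",
    "''', font_config=font_config)\n",
    "\n",
    "html.write_pdf('txt/language_a5.pdf', stylesheets=[page_css], font_config=font_config)"
   ]
  },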
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "html.write_pdf('txt/language.pdf', font_config=font_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       " <iframe\n",
       " width=\"1024\"\n",
       " height=\"600\"\n",
       " src=\"txt/language.pdf\"\n",
       " frameborder=\"0\"\n",
       " allowfullscreen\n",
       " ></iframe>\n",
       " "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x7f0bcbe67630>"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from IPython.display import IFrame\n",
    "IFrame(\"txt/language.pdf\", width=1024, height=600)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}