{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Markdown - HTML - print"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"import pypandoc\n",
"from weasyprint import HTML, CSS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Markdown → HTML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandoc: \"If you need to convert files from one markup format into another, **pandoc is your swiss-army knife**.\"\n",
"\n",
"https://pandoc.org/\n",
"\n",
"pypandoc, a Python wrapper for Pandoc:\n",
"\n",
"https://github.com/bebraw/pypandoc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Convert a Markdown file to HTML ...\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<h1 id=\"language\">Language</h1>\n",
"<h2 id=\"florian-cramer\">Florian Cramer</h2>\n",
"<p>Software and language are <strong>intrinsically</strong> related, since software may process language, and is constructed in language. Yet <em>language</em> means different things in the context of <sub>computing</sub>: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages. There are at least two layers of formal language in software: programming language in which the software is written, and the language implemented within the software as its symbolic controls. In the case of compilers, shells, and macro languages, for example, these layers can overlap. “Natural” language is what can be processed as data by software; since this processing is formal, however, it is restricted to syntactical operations. While differentiation of computer programming languages as “artificial languages” from languages like English as “natural languages” is conceptually important and undisputed, it remains problematic in its pure terminology: There is nothing “natural” about spoken language; it is a cultural construct and thus just as “artificial” as any formal machine control language. To call programming languages “machine languages” doesn’t solve the problem either, as it obscures that “machine languages” are human creations. High-level machine-independent programming languages such as Fortran, C, Java, and Basic are not even direct mappings of machine logic. If programming languages are human languages for machine control, they could be called cybernetic languages. But these languages can also be used outside machines—in programming handbooks, for example, in programmers’ dinner table jokes, or as abstract formal languages for expressing logical constructs, such as in Hugh Kenner’s use of the Pascal programming language to explain aspects of the structure of Samuel Beckett’s writing.1 In this sense, computer control languages could be more broadly defined as syntactical languages as opposed to semantic languages.\n",
"But this terminology is not without its problems either. Common languages like English are both formal and semantic; although their scope extends beyond the formal, anything that can be expressed in a computer control language can also be expressed in common language. It follows that computer control languages are a formal (and as such rather primitive) subset of common human languages. To complicate things even further, computer science has its own understanding of “operational semantics” in programming languages, for example in the construction of a programming language interpreter or compiler. Just as this interpreter doesn’t perform “interpretations” in a hermeneutic sense of semantic text explication, the computer science notion of “semantics” defies linguistic and common sense understanding of the word, since compiler construction is purely syntactical, and programming languages denote nothing but syntactical manipulations of symbols. What might more suitably be called the semantics of computer control languages resides in the symbols with which those operations are denoted in most programming languages: English words like “if,” “then,” “else,” “for,” “while,” “goto,” and “print,” in conjunction with arithmetical and punctuation symbols; in alphabetic software controls, words like “list,” “move,” “copy,” and “paste”; in graphical software controls, such as symbols like the trash can. Ferdinand de Saussure states that the signs of common human language are arbitrary2 because it’s purely a cultural-social convention that assigns phonemes to concepts. Likewise, it’s purely a cultural convention to assign symbols to machine operations. But just as the cultural choice of phonemes in spoken language is restrained by what the human voice can pronounce, the assignment of symbols to machine operations is limited to what can be efficiently processed by the machine and of good use to humans.3 This compromise between operability and usability is obvious\n",
"<p>Notes</p>\n",
"<ol type=\"1\">\n",
"<li>Hugh Kenner, “Beckett Thinking,” in Hugh Kenner, The Mechanic Muse, 83–107.</li>\n",
"<li>Ferdinand de Saussure, Course in General Linguistics, “Chapter I: Nature of the Linguistic Sign.”</li>\n",
"<li>See the section, “Saussurean Signs and Material Matters,” in N. Katherine Hayles, My Mother Was a Computer, 42–45.</li>\n",
"<li>For example, Steve Wozniak’s design of the Apple I mainboard was considered “a beautiful work of art” in its time according to Steven Levy, Insanely Great: The Life and Times of Macintosh, 81.</li>\n",
"<li>Joseph Weizenbaum, “ELIZA—A Computer Program for the Study of Natural Language Communication between Man and Machine.”</li>\n",
"<li>Marsha Pascual, “Black Monday, Causes and Effects.”</li>\n",
"<li>Among them concrete poetry writers, French Oulipo poets, the German poet Hans Magnus Enzensberger, and the Austrian poets Ferdinand Schmatz and Franz Josef Czernin.</li>\n",
"<li>Jef Raskin, The Humane Interface: New Directions for Designing Interactive Systems.</li>\n",
"<li>According to Nelson Goodman’s definition of writing in The Languages of Art, 143.</li>\n",
"<li>Alan Kay, an inventor of the graphical user interface, conceded in 1990 that “it would not be surprising if the visual system were less able in this area than the mechanism that solve noun phrases for natural language. Although it is not fair to say that iconic languages can’t work just because no one has been able to design a good one, it is likely that the above explanation is close to truth.” This status quo hasn’t changed since. Alan Kay, “User Interface: A Personal View,” in Brenda Laurel, ed., The Art of Human-Computer Interface Design, Reading: Addison Wesley, 1989, 203.</li>\n",
"<li>Swift, Jonathan, Gulliver’s Travels, Project Gutenberg Ebook, available at http://www.gutenberg.org/dirs/extext197/gltrv10.txt/.</li>\n",
"<li>See Wolfgang Hagen, “The Style of Source Codes.”</li>\n",
"</ol>\n",
"\n"
]
}
],
"source": [
"# ... directly from a file\n",
"html = pypandoc.convert_file('language.md', 'html')\n",
"print(html)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"# ... or from a pad\n",
"\n",
"from urllib.request import urlopen\n",
"\n",
"# plain-text export of the pad\n",
"url = 'https://pad.xpub.nl/p/language/export/txt'\n",
"with urlopen(url) as response:\n",
"    md = response.read().decode('UTF-8')\n",
"\n",
"# save the pad text as a local Markdown file\n",
"with open('language.md', 'w', encoding='utf-8') as f:\n",
"    f.write(md)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<h1 id=\"language\">Language</h1>\n",
"<h2 id=\"florian-cramer\">Florian Cramer</h2>\n",
"<p>Software and language are <strong>intrinsically</strong> related, since software may process language, and is constructed in language. Yet <em>language</em> means different things in the context of <sub>computing</sub>: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages. There are at least two layers of formal language in software: programming language in which the software is written, and the language implemented within the software as its symbolic controls. In the case of compilers, shells, and macro languages, for example, these layers can overlap. “Natural” language is what can be processed as data by software; since this processing is formal, however, it is restricted to syntactical operations. While differentiation of computer programming languages as “artificial languages” from languages like English as “natural languages” is conceptually important and undisputed, it remains problematic in its pure terminology: There is nothing “natural” about spoken language; it is a cultural construct and thus just as “artificial” as any formal machine control language. To call programming languages “machine languages” doesn’t solve the problem either, as it obscures that “machine languages” are human creations. High-level machine-independent programming languages such as Fortran, C, Java, and Basic are not even direct mappings of machine logic. If programming languages are human languages for machine control, they could be called cybernetic languages. But these languages can also be used outside machines—in programming handbooks, for example, in programmers’ dinner table jokes, or as abstract formal languages for expressing logical constructs, such as in Hugh Kenner’s use of the Pascal programming language to explain aspects of the structure of Samuel Beckett’s writing.1 In this sense, computer control languages could be more broadly defined as syntactical languages as opposed to semantic languages.\n",
"But this terminology is not without its problems either. Common languages like English are both formal and semantic; although their scope extends beyond the formal, anything that can be expressed in a computer control language can also be expressed in common language.</p>\n",
"<p>It follows that computer control languages are a formal (and as such rather primitive) subset of common human languages. To complicate things even further, computer science has its own understanding of “operational semantics” in programming languages, for example in the construction of a programming language interpreter or compiler. Just as this interpreter doesn’t perform “interpretations” in a hermeneutic sense of semantic text explication, the computer science notion of “semantics” defies linguistic and common sense understanding of the word, since compiler construction is purely syntactical, and programming languages denote nothing but syntactical manipulations of symbols. What might more suitably be called the semantics of computer control languages resides in the symbols with which those operations are denoted in most programming languages: English words like “if,” “then,” “else,” “for,” “while,” “goto,” and “print,” in conjunction with arithmetical and punctuation symbols; in alphabetic software controls, words like “list,” “move,” “copy,” and “paste”; in graphical software controls, such as symbols like the trash can. Ferdinand de Saussure states that the signs of common human language are arbitrary2 because it’s purely a cultural-social convention that assigns phonemes to concepts. Likewise, it’s purely a cultural convention to assign symbols to machine operations. But just as the cultural choice of phonemes in spoken language is restrained by what the human voice can pronounce, the assignment of symbols to machine operations is limited to what can be efficiently processed by the machine and of good use to humans.3 This compromise between operability and usability is obvious in, for example, Unix commands. Originally used on teletype terminals, the operation “copy” was abbreviated to the command “cp,” “move” to “mv,” “list” to “ls,” etc., in order to cut down machine memory use, teletype paper consumption, and human typing effort at the same time.\n",
"Any computer control language is thus a cultural compromise between the constraints of machine design—which is far from objective, but based on human choices, culture, and thinking style itself 4—and the equally subjective user preferences, involving fuzzy factors like readability, elegance, and usage efficiency. The symbols of computer control languages inevitably do have semantic connotations simply because there exist no symbols with which humans would not associate some meaning. But symbols can’t denote any semantic statements, that is, they do not express meaning in their own terms; humans metaphorically read meaning into them through associations they make. Languages without semantic denotation are not historically new phenomena; mathematical formulas are their oldest example. In comparison to common human languages, the multitude of programming languages is of lesser significance. The criterion of Turing completeness of a programming language, that is, that any computation can be expressed in it, means that every programming language is, formally speaking, just a riff on every other programming language. Nothing can be expressed in a Turing-complete language such as C that couldn’t also be expressed in another Turing-complete language such as Lisp (or Fortran, Smalltalk, Java …) and vice versa. This ultimately proves the importance of human and cultural factors in programming languages: while they are interchangeable in regard to their control of machine functions, their different structures—semantic descriptors, grammar and style in which algorithms can be expressed—lend themselves not only to different problem sets, but also to different styles of thinking. Just as programming languages are a subset of common languages, Turing-incomplete computer control languages are a constrained subset of Turing-complete languages. This prominently includes markup languages (such as HTML), file formats, network protocols, and most user controls (see the entry “Interf\n",
"<p>Notes</p>\n",
"<ol type=\"1\">\n",
"<li>Hugh Kenner, “Beckett Thinking,” in Hugh Kenner, The Mechanic Muse, 83–107.</li>\n",
"<li>Ferdinand de Saussure, Course in General Linguistics, “Chapter I: Nature of the Linguistic Sign.”</li>\n",
"<li>See the section, “Saussurean Signs and Material Matters,” in N. Katherine Hayles, My Mother Was a Computer, 42–45.</li>\n",
"<li>For example, Steve Wozniak’s design of the Apple I mainboard was considered “a beautiful work of art” in its time according to Steven Levy, Insanely Great: The Life and Times of Macintosh, 81.</li>\n",
"<li>Joseph Weizenbaum, “ELIZA—A Computer Program for the Study of Natural Language Communication between Man and Machine.”</li>\n",
"<li>Marsha Pascual, “Black Monday, Causes and Effects.”</li>\n",
"<li>Among them concrete poetry writers, French Oulipo poets, the German poet Hans Magnus Enzensberger, and the Austrian poets Ferdinand Schmatz and Franz Josef Czernin.</li>\n",
"<li>Jef Raskin, The Humane Interface: New Directions for Designing Interactive Systems.</li>\n",
"<li>According to Nelson Goodman’s definition of writing in The Languages of Art, 143.</li>\n",
"<li>Alan Kay, an inventor of the graphical user interface, conceded in 1990 that “it would not be surprising if the visual system were less able in this area than the mechanism that solve noun phrases for natural language. Although it is not fair to say that iconic languages can’t work just because no one has been able to design a good one, it is likely that the above explanation is close to truth.” This status quo hasn’t changed since. Alan Kay, “User Interface: A Personal View,” in Brenda Laurel, ed., The Art of Human-Computer Interface Design, Reading: Addison Wesley, 1989, 203.</li>\n",
"<li>Swift, Jonathan, Gulliver’s Travels, Project Gutenberg Ebook, available at http://www.gutenberg.org/dirs/extext197/gltrv10.txt/.</li>\n",
"<li>See Wolfgang Hagen, “The Style of Source Codes.”</li>\n",
"</ol>\n",
"\n"
]
}
],
"source": [
"# convert the Markdown file saved from the pad\n",
"html = pypandoc.convert_file('language.md', 'html')\n",
"print(html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## HTML → PDF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this we can use WeasyPrint again."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<weasyprint.HTML object at 0xaeefaeb0>\n"
]
}
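],
"source": [
"# wrap the HTML string in a WeasyPrint HTML object\n",
"# (this rebinds `html`, so re-run the conversion cell before re-running this one)\n",
"html = HTML(string=html)\n",
"print(html)"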
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"css = CSS(string='''\n",
"@page {\n",
"    size: A4;\n",
"    margin: 15mm;\n",
"\n",
"    counter-increment: page;\n",
"\n",
"    @top-left {\n",
"        content: \"hello?\";\n",
"    }\n",
"    @top-center {\n",
"        content: counter(page);\n",
"        font-size: 7pt;\n",
"        font-family: monospace;\n",
"        color: blue;\n",
"    }\n",
"    @bottom-center {\n",
"        content: \"this is the bottom center!\";\n",
"    }\n",
"}\n",
"\n",
"body {\n",
"    color: magenta;\n",
"}\n",
"''')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The paged media properties in CSS are worth a close look, for example the `size` descriptor of `@page`:\n",
"\n",
"https://developer.mozilla.org/en-US/docs/Web/CSS/%40page/size"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"html.write_pdf('language.pdf', stylesheets=[css])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}