saving the nltk pos-tagging example as separate notebook + pushing latest versions of weasyprint & md-html-pdf notebook

master
manetta 4 years ago
parent c7ee402849
commit fd53543fe2

@@ -1,9 +1,9 @@
Language
# Language
Florian Cramer
## Florian Cramer
Software and language are intrinsically related, since software may process language, and is constructed in language.
Yet language means different things in the context of computing: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages.
Software and language are **intrinsically** related, since software may process language, and is constructed in language.
Yet *language* means different things in the context of ~computing~: formal languages in which algorithms are expressed and software is implemented, and in so-called “natural” spoken languages.
There are at least two layers of formal language in software: programming language in which the software is written, and the language implemented within the software as its symbolic controls.
In the case of compilers, shells, and macro languages, for example, these layers can overlap.
“Natural” language is what can be processed as data by software; since this processing is formal, however, it is restricted to syntactical operations.
@@ -14,6 +14,7 @@ If programming languages are human languages for machine control, they could be
But these languages can also be used outside machines—in programming handbooks, for example, in programmers’ dinner table jokes, or as abstract formal languages for expressing logical constructs, such as in Hugh Kenner’s use of the Pascal programming language to explain aspects of the structure of Samuel Beckett’s writing.1 In this sense, computer control languages could be more broadly defined as syntactical languages as opposed to semantic languages.
But this terminology is not without its problems either.
Common languages like English are both formal and semantic; although their scope extends beyond the formal, anything that can be expressed in a computer control language can also be expressed in common language.
It follows that computer control languages are a formal (and as such rather primitive) subset of common human languages.
To complicate things even further, computer science has its own understanding of “operational semantics” in programming languages, for example in the construction of a programming language interpreter or compiler.
Just as this interpreter doesn’t perform “interpretations” in a hermeneutic sense of semantic text explication, the computer science notion of “semantics” defies linguistic and common sense understanding of the word, since compiler construction is purely syntactical, and programming languages denote nothing but syntactical manipulations of symbols.
@@ -86,3 +87,4 @@ Notes
10. Alan Kay, an inventor of the graphical user interface, conceded in 1990 that “it would not be surprising if the visual system were less able in this area than the mechanism that solve noun phrases for natural language. Although it is not fair to say that iconic languages can’t work just because no one has been able to design a good one, it is likely that the above explanation is close to truth.” This status quo hasn’t changed since. Alan Kay, “User Interface: A Personal View,” in Brenda Laurel, ed., The Art of Human-Computer Interface Design, Reading: Addison Wesley, 1989, 203.
11. Jonathan Swift, Gulliver’s Travels, Project Gutenberg Ebook, available at http://www.gutenberg.org/dirs/extext197/gltrv10.txt.
12. See Wolfgang Hagen, “The Style of Source Codes.”

File diff suppressed because one or more lines are too long

@@ -0,0 +1,163 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NLTK pos-tagged HTML → PDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import nltk\n",
"from weasyprint import HTML, CSS"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
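"# open the input file\n",
"# (word_tokenize and pos_tag need NLTK's tokenizer and tagger models;\n",
"# if they are missing, install them with nltk.download())\n",
"with open('../txt/language.txt', encoding='utf-8') as f:\n",
"    txt = f.read()\n",
"words = nltk.word_tokenize(txt)\n",
"tagged_words = nltk.pos_tag(words)"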
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
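"# collect all the pieces of HTML\n",
"from html import escape\n",
"\n",
"content = ''\n",
"content += '<h1>Language and Software Studies, by Florian Cramer</h1>'\n",
"\n",
"for word, tag in tagged_words:\n",
"    # escape each token so characters such as < and & cannot break the markup\n",
"    content += f'<span class=\"{ tag }\">{ escape(word) }</span> '"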
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write the HTML file\n",
"with open(\"language.html\", \"w\", encoding=\"utf-8\") as f:\n",
" f.write(f\"\"\"<!DOCTYPE html>\n",
"<html>\n",
"<head>\n",
" <meta charset=\"utf-8\">\n",
" <link rel=\"stylesheet\" type=\"text/css\" href=\"language.css\">\n",
" <title></title>\n",
"</head>\n",
"<body>\n",
"{ content }\n",
"</body>\n",
"</html>\n",
"\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write a CSS file\n",
"with open(\"language.css\", \"w\") as f:\n",
" f.write(\"\"\"\n",
"\n",
"@page{\n",
" size:A4;\n",
" background-color:lightgrey;\n",
" margin:10mm;\n",
"}\n",
".JJ{\n",
" color:red;\n",
"}\n",
".VB,\n",
".VBG{\n",
" color:magenta;\n",
"}\n",
".NN,\n",
".NNP{\n",
" color:green;\n",
"}\n",
".EX{\n",
" color: blue;\n",
"}\n",
" \"\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
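"# If you use @font-face in your stylesheet, you need WeasyPrint's FontConfiguration()\n",
"# (the class moved to weasyprint.text.fonts in WeasyPrint 53)\n",
"try:\n",
"    from weasyprint.text.fonts import FontConfiguration\n",
"except ImportError:\n",
"    from weasyprint.fonts import FontConfiguration  # older WeasyPrint releases\n",
"\n",
"font_config = FontConfiguration()"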
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# collect all the files and write the PDF\n",
"html = HTML(\"language.html\")\n",
"css = CSS(\"language.css\", font_config=font_config)\n",
"html.write_pdf('language.pdf', stylesheets=[css], font_config=font_config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview your PDF in the notebook!\n",
"from IPython.display import IFrame\n",
"IFrame(\"language.pdf\", width=900, height=600)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
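The notebook above builds its HTML by string concatenation and colors only a few Penn Treebank tags (JJ, VB/VBG, NN/NNP, EX), the tag set `nltk.pos_tag` uses by default, so forms such as VBD or NNS stay uncolored. The following stdlib-only sketch factors the HTML and CSS generation into functions. The names `TAG_COLORS`, `tag_classes`, `build_html` and `build_css` are hypothetical, not part of the notebook, and adding each tag's two-letter prefix as a second "coarse" class (so `VBD` also carries `VB`) is an assumption layered on top of the original approach. It takes an already tagged list of `(word, tag)` pairs, the shape `nltk.pos_tag` returns, so it runs without NLTK or WeasyPrint installed.

```python
from html import escape

# Hypothetical color scheme keyed by two-letter Penn Treebank tag prefixes,
# extending the notebook's rules (JJ red, VB* magenta, NN* green, EX blue).
TAG_COLORS = {
    "JJ": "red",      # adjectives: JJ, JJR, JJS
    "VB": "magenta",  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    "NN": "green",    # nouns: NN, NNS, NNP, NNPS
    "EX": "blue",     # existential "there"
}


def tag_classes(tag):
    """Return the tag plus its two-letter prefix as a second class, e.g. 'VBD VB'."""
    if len(tag) > 2 and tag[:2].isalpha():
        return f"{tag} {tag[:2]}"
    return tag


def build_html(tagged_words, title):
    """Wrap each (word, tag) pair in a span; escape everything inserted into markup."""
    spans = " ".join(
        f'<span class="{escape(tag_classes(tag))}">{escape(word)}</span>'
        for word, tag in tagged_words
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"{spans}\n"
        "</body>\n</html>\n"
    )


def build_css(colors):
    """Emit an A4 @page rule plus one color rule per tag prefix."""
    rules = ["@page { size: A4; margin: 10mm; }"]
    for prefix, color in colors.items():
        rules.append(f".{prefix} {{ color: {color}; }}")
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    # Hand-written tags stand in for nltk.pos_tag() output in this sketch.
    tagged = [("Software", "NN"), ("processes", "VBZ"), ("language", "NN")]
    page = build_html(tagged, "Language")
    css = build_css(TAG_COLORS)
    print(page)
    print(css)
    # Rendering needs WeasyPrint, which this sketch does not import:
    # HTML(string=page).write_pdf("language.pdf", stylesheets=[CSS(string=css)])
```

Because every span carries both the full tag and its prefix, one `.VB` rule colors all verb forms, while a rule such as `.VBG` placed later in the stylesheet can still override a single form. Passing the strings to WeasyPrint via `HTML(string=...)` and `CSS(string=...)` also avoids writing the intermediate `language.html` and `language.css` files.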

File diff suppressed because one or more lines are too long