'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from urllib.request import urlopen\n",
"import json\n",
"\n",
"url = \"https://en.wikipedia.org/w/api.php?action=parse&page=Fats_Waller&format=json&formatversion=2\"\n",
"data = json.load(urlopen(url))\n",
"\n",
"# print (data['parse']['text'][:1000])\n",
"data['parse']['text'][:1000]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For the purposes of an IRC bot, we would like unformatted \"plain\" text. [html5lib](https://html5lib.readthedocs.io/en/latest/) is a the modern python library to parse or read HTML, translating the textual source to a structure called an [ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html). This can then be [rendered as plain text](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.tostring), effectively stripping away the HTML markup."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"American jazz pianist and composer\n",
"\n",
"\n",
"Fats WallerWaller in 1938Background informationBirth nameThomas Wright WallerBorn(1904-05-21)May 21, 1904New York City, New York, U.S.DiedDecember 15, 1943(1943-12-15) (aged 39)Kansas City, Missouri, U.S.GenresDixieland, jazz, swing, stride, ragtimeOccupation(s)Musician, composerInstrumentsPiano, vocals, organYears active1918–1943\n",
"Thomas Wright \"Fats\" Waller (May 21, 1904 – December 15, 1943) was an American jazz pianist, organist, composer, violinist, singer, and comedic entertainer.[1] His innovations in the Harlem stride style laid the groundwork for modern jazz piano. His best-known compositions, \"Ain't Misbehavin'\" and \"Honeysuckle Rose\", were inducted into the Grammy Hall of Fame in 1984 and 1999.[2] Waller copyrighted over 400 songs, many of them co-written with his closest collaborator, Andy Razaf. Razaf described his partner as \"the soul of melody... a man who made the piano sing... both big in body and in mind... known for his generosity..\n"
]
}
],
"source": [
"import html5lib\n",
"t = html5lib.parse(data['parse']['text'])\n",
"\n",
"from xml.etree import ElementTree as ET\n",
"text = ET.tostring(t, method=\"text\", encoding=\"unicode\")\n",
"\n",
"print (text[:1000])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Eventually, this process could make better use of the HTML to avoid certain kinds of non-sentence content (such as figures and tables). In this case, however, I decided simply to do the cleaning (later after the next step) by hand. First, however, I will use the [nltk](http://nltk.org/) library's sentence tokenizer to do some of the work."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /home/murtaugh/nltk_data...\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"270 sentences\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Package punkt is already up-to-date!\n"
]
}
],
"source": [
"import nltk\n",
"\n",
"nltk.download(\"punkt\")\n",
"sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')\n",
"sentences = sent_tokenizer.tokenize(text)\n",
"\n",
"print (f\"{len(sentences)} sentences\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"American jazz pianist and composer\n",
"\n",
"\n",
"Fats WallerWaller in 1938Background informationBirth nameThomas Wright WallerBorn(1904-05-21)May 21, 1904New York City, New York, U.S.DiedDecember 15, 1943(1943-12-15) (aged 39)Kansas City, Missouri, U.S.GenresDixieland, jazz, swing, stride, ragtimeOccupation(s)Musician, composerInstrumentsPiano, vocals, organYears active1918–1943\n",
"Thomas Wright \"Fats\" Waller (May 21, 1904 – December 15, 1943) was an American jazz pianist, organist, composer, violinist, singer, and comedic entertainer.\n",
"---\n",
"[1] His innovations in the Harlem stride style laid the groundwork for modern jazz piano.\n",
"---\n",
"His best-known compositions, \"Ain't Misbehavin'\" and \"Honeysuckle Rose\", were inducted into the Grammy Hall of Fame in 1984 and 1999.\n",
"---\n",
"[2] Waller copyrighted over 400 songs, many of them co-written with his closest collaborator, Andy Razaf.\n",
"---\n",
"Razaf described his partner as \"the soul of melody... a man who made the piano sing... both big in body and in mind... known for his generosity... a bubbling bundle of joy\".\n",
"---\n",
"It's possible he composed many more popular songs and sold them to other performers when times were tough.\n",
"---\n",
"Waller started playing the piano at the age of six, and became a professional organist aged 15.\n",
"---\n",
"By the age of 18 he was a recording artist.\n",
"---\n",
"Waller's first recordings, \"Muscle Shoals Blues\" and \"Birmingham Blues\", were made in October 1922 for Okeh Records.\n",
"---\n",
"[3] That year, he also made his first player piano roll, \"Got to Cool My Doggies Now\".\n",
"---\n"
]
}
],
"source": [
"for s in sentences[:10]:\n",
" print (s)\n",
" print (\"---\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"I then save the sentences, one sentence per line, to a file."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"with open (\"waller_sentences.txt\", \"w\") as f:\n",
" for s in sentences:\n",
" print (s, file=f)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"I then edited this file by hand, removing non-sentences (text from tables of information and things like headers and images). I then saved the resulting file \"[waller_sentences_edited.txt](waller_sentences_edited.txt)\" so as not to lose my edits."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Now, to do the rewriting. When performing the bot myself, I replaced certain words that spoke of Waller in the *third person* with the equivilent in the *first person*. The file contains sentences of the form:\n",
"> Waller was an American jazz pianist, organist, composer, violinist, singer, and comedic entertainer.\n",
">\n",
"> His innovations in the Harlem stride style laid the groundwork for modern jazz piano.\n",
">\n",
"> His best-known compositions, \"Ain't Misbehavin'\" and \"Honeysuckle Rose\", were inducted into the Grammy Hall of Fame in 1984 and 1999.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"One option is to just use the string.replace function:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Waller was an American jazz pianist, organist, composer, violinist, singer, and comedic entertainer.\n",
"His innovations in tI Harlem stride style laid tI groundwork for modern jazz piano.\n",
"His best-known compositions, \"Ain't Misbehavin'\" and \"Honeysuckle Rose\", were inducted into tI Grammy Hall of Fame in 1984 and 1999.\n",
"Waller copyrighted over 400 songs, many of tIm co-written with his closest collaborator, Andy Razaf.\n",
"It's possible I composed many more popular songs and sold tIm to otIr performers wIn times were tough.\n",
"Waller started playing tI piano at tI age of six, and became a professional organist aged 15.\n",
"By tI age of 18 I was a recording artist.\n",
"Waller's first recordings, \"Muscle Shoals Blues\" and \"Birmingham Blues\", were made in October 1922 for Okeh Records.\n",
"That year, I also made his first player piano roll, \"Got to Cool My Doggies Now\".\n",
"Waller's first publisId composition, \"Squeeze Me\", was publisId in 1924.\n",
"He became one of tI most popular performers of his era, touring intern\n"
]
}
],
"source": [
"text = open(\"waller_sentences_edited.txt\").read()\n",
"# text = text.replace(\"Waller\", \"I\")\n",
"# text = text.replace(\"His\", \"My\")\n",
"# text = text.replace(\"his\", \"my\")\n",
"# text = text.replace(\" he \", \" I \")\n",
"print (text[:1000])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"With this simple approach, you see glitches appearing with searches for \"he\" matching *inside* words like \"them\", and becoming \"tIm\". You could strategically change the order of substitution and/or think of including spaces in the search and replace. But regular expressions offer another solution, with the use of \"word boundary\" anchors (\\b)."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"I was an American jazz pianist, organist, composer, violinist, singer, and comedic entertainer.\n",
"My innovations in the Harlem stride style laid the groundwork for modern jazz piano.\n",
"My best-known compositions, \"Ain't Misbehavin'\" and \"Honeysuckle Rose\", were inducted into the Grammy Hall of Fame in 1984 and 1999.\n",
"I copyrighted over 400 songs, many of them co-written with my closest collaborator, Andy Razaf.\n",
"It's possible I composed many more popular songs and sold them to other performers when times were tough.\n",
"I started playing the piano at the age of six, and became a professional organist aged 15.\n",
"By the age of 18 I was a recording artist.\n",
"My first recordings, \"Muscle Shoals Blues\" and \"Birmingham Blues\", were made in October 1922 for Okeh Records.\n",
"That year, I also made my first player piano roll, \"Got to Cool My Doggies Now\".\n",
"My first published composition, \"Squeeze Me\", was published in 1924.\n",
"I became one of the most popular performers of my era, touring internationally and achieving critical and commercial success in the United States and Europe.\n",
"I died from pneumonia, aged 39.\n",
"I was the seventh child of 11 (five of whom survived childhood) born to Adeline Locket I, a musician, and Reverend Edward Martin I, a trucker and pastor in New York City.\n",
"I started playing the piano when I was six and graduated to playing the organ at my father's church four years later.\n",
"My mother instructed me in my youth, and I attended other music lessons, paying for them by working in a grocery store.\n",
"I attended DeWitt Clinton High School for one semester, but left school at 15 to work as an organist at the Lincoln Theater in Harlem, where I earned $32 a week.\n",
"Within 12 months I had composed my first rag.\n",
"I was the prize pupil and later the friend and colleague of the stride pianist James P. Johnson.\n",
"My mother died on November 10, 1920 from a stroke due to diabetes.\n",
"My first recordings, \"Muscle Shoals Blues\" and \"Birmingham Blues\", were made in October 1922 for Okeh Records.\n",
"That ye\n"
]
}
],
"source": [
"import re\n",
"\n",
"text = open(\"waller_sentences_edited.txt\").read()\n",
"text = re.sub(r\"\\bWaller's\\b\", \"My\", text)\n",
"text = re.sub(r\"\\bWaller\\b\", \"I\", text)\n",
"text = re.sub(r\"\\bHis\\b\", \"My\", text)\n",
"text = re.sub(r\"\\bhis\\b\", \"my\", text)\n",
"text = re.sub(r\"\\bhe\\b\", \"I\", text)\n",
"text = re.sub(r\"\\bHe\\b\", \"I\", text)\n",
"text = re.sub(r\"\\bhim\\b\", \"me\", text)\n",
"\n",
"print (text[:2000])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"There are still some glitches related to grammar that go beyond the limits of simple word replacement (such as when to use me or I). But it's good enough to begin, so I save the output in [another file](waller_sentences_first_person.txt)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"with open(\"waller_sentences_first_person.txt\", \"w\") as f:\n",
" print (text, file=f)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Responding like a search engine -- Whoosh to the rescue, Step 1: Create an index\n",
"\n",
"Now, to make a chat bot based on these sentences! I could roll my own matching algorithm, attempting to find useful overlapping terms from a chat message and the sentences. In many ways and \"infobot\" style bot is precursor of a kind of search engine like Altavista, Ask Jeeves, or finally Google.\n",
"\n",
"Rather than roll my own, I choose to make use of [whoosh](https://pypi.org/project/Whoosh/) a pure python library that is designed specifically to support search engine style applications. It also provides some more refined abstractions that reflect some best practices from the information retrieval and indexing. For instance, I make use of the \"StemmingAnalyzer\" to compare the \"roots\" of words (rather than there exact forms) and a \"stop word\" filter to help avoid matches based on common question words that might occur in chat messages, but which we don't want to use in searching for a matching response."
]
},
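{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To see what stemming and stop-word filtering do to a chat message, here is a minimal sketch that runs a made-up message through two analyzers. The message and the extra stop words are assumptions chosen for illustration, and the exact tokens printed depend on the installed whoosh version and its built-in stop list.\n",
"\n",
"```python\n",
"from whoosh.analysis import StemmingAnalyzer, StopFilter\n",
"\n",
"# Hypothetical chat message, made up for this illustration\n",
"message = \"Tell me about the recordings you were making\"\n",
"\n",
"# Default analyzer: lowercases, drops whoosh's built-in stop words, stems what is left\n",
"default_ana = StemmingAnalyzer()\n",
"print([token.text for token in default_ana(message)])\n",
"\n",
"# Same analyzer with a few extra (illustrative) stop words added to the built-in list\n",
"extra_stops = set(StopFilter().stops) | {\"tell\", \"me\", \"about\"}\n",
"strict_ana = StemmingAnalyzer(stoplist=extra_stops)\n",
"print([token.text for token in strict_ana(message)])\n",
"```\n",
"\n",
"Comparing the two printed lists shows which words would still take part in matching. A word removed by the stop list can never trigger a response on its own."
]
},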
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"from whoosh.index import create_in, open_dir\n",
"from whoosh.fields import *\n",
"from whoosh.analysis import StemmingAnalyzer, LowercaseFilter, StopFilter\n",
"from whoosh import qparser\n",
"from whoosh.highlight import WholeFragmenter, UppercaseFormatter\n",
"import os\n",
"\n",
"\n",
"indexdir = \"index\"\n",
"s = StopFilter()\n",
"stop_words = set(s.stops) | set([\"more\", \"which\", \"get\", \"did\", \"each\", \"that\", \"were\", \"about\", \"tell\", \"my\", \"his\", \"her\", \"after\", \"been\", \"me\", \"i\", \"wa\", \"you\", \"have\", \"there\", \"where\", \"what\", \"why\", \"how\"])\n",
"custom_ana = StemmingAnalyzer(stoplist = stop_words ) # | StopFilter(stoplist = stop_words)\n",
"schema = Schema(\n",
" text=TEXT(stored=True, analyzer=custom_ana),\n",
" years=KEYWORD(stored=True),\n",
" source=ID(stored=True)\n",
")\n",
"\n",
"os.makedirs(indexdir, exist_ok=True)\n",
"ix = create_in(indexdir, schema)\n",
"writer = ix.writer()\n",
"\n",
"with open(\"waller_sentences_first_person.txt\") as f:\n",
" for line in f:\n",
" line = line.strip()\n",
" if line:\n",
" # extract years\n",
" years = \" \".join(re.findall(r\"\\b\\d{4}\\b\", line))\n",
" writer.add_document(text=line, years=years, source=line)\n",
"writer.commit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Step 2: Query the index\n",
"\n",
"Another useful readymade feature of whoosh, is the ability to parse a free text query to then se to search our index. To make the logic of the search more visible, we use a \"highlighter\" to show which words were matched in (IRC-friendly) uppercase."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" father\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"I started playing the piano when I was six and graduated to playing the organ at my FATHER's church four years later.\n"
]
}
],
"source": [
"from random import choice\n",
"ix = open_dir(indexdir)\n",
"parser = qparser.QueryParser(\"text\", schema=ix.schema, group=qparser.OrGroup)\n",
"with ix.searcher() as searcher:\n",
" line = input()\n",
" line = line.rstrip().rstrip(\"?\")\n",
" query = parser.parse(line)\n",
" results = searcher.search(query, terms=True)\n",
" results.fragmenter = WholeFragmenter()\n",
" uf = UppercaseFormatter()\n",
" results.formatter = UppercaseFormatter()\n",
" # could eventually use results[x].score\n",
" if len(results) > 0:\n",
" results = list(results)\n",
" r = choice(results)\n",
" print (r.highlights(\"text\"))\n",
"\n",
" # print (r.get(\"text\").encode(\"utf-8\"))\n",
" # print (r.matched_terms())\n",
" # print (u\", \".join(r.matched_terms()).encode(\"utf-8\"))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"> Q: Tell me about your father.\n",
">\n",
"> A: I started playing the piano when I was six and graduated to playing the organ at my FATHER's church four years later.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Now in IRC Bot form\n",
"\n",
"Now we place the above code in the body of an IRC bot. This code uses the [irc](https://pypi.org/project/irc/) module, and specifically extends the class [SingleServerIRCBot](https://python-irc.readthedocs.io/en/latest/irc.html#irc.bot.SingleServerIRCBot). NB: This code should be saved in it's [own file](botswaller.py), the code is pasted here for convenience, but the use of argparse in a notebook produces an error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"import irc.bot\n",
"from random import choice\n",
"import whoosh\n",
"from whoosh import qparser\n",
"import whoosh.index\n",
"\n",
"\n",
"class BotsWaller (irc.bot.SingleServerIRCBot):\n",
" def __init__(self, indexdir, channel, nickname, server, port=6667):\n",
" irc.bot.SingleServerIRCBot.__init__(self, [(server, port)], nickname, nickname)\n",
" self.channel = channel\n",
" self.indexdir = indexdir\n",
" self.ix = whoosh.index.open_dir(self.indexdir)\n",
" self.parser = whoosh.qparser.QueryParser(\"text\", schema=self.ix.schema, group=qparser.OrGroup)\n",
"\n",
" def on_welcome(self, c, e):\n",
" c.join(self.channel)\n",
" print (\"join\")\n",
" \n",
" def on_privmsg(self, c, e):\n",
" pass\n",
"\n",
" def on_pubmsg(self, c, e):\n",
" # print e.arguments, e.target, e.source, e.arguments, e.type\n",
" msg = e.arguments[0]\n",
" with self.ix.searcher() as searcher:\n",
" query = self.parser.parse(msg)\n",
" results = searcher.search(query, terms=True)\n",
" results.fragmenter = whoosh.highlight.WholeFragmenter()\n",
" results.formatter = whoosh.highlight.UppercaseFormatter()\n",
" # could eventually use results[x].score as \"confidence\" to respond\n",
" if len(results) > 0:\n",
" results = list(results)\n",
" r = choice(results)\n",
" c.privmsg(self.channel, r.highlights(\"text\"))\n",
"\n",
"if __name__ == \"__main__\":\n",
" import sys, argparse\n",
"\n",
" parser = argparse.ArgumentParser(description='Fats Waller Wikipedia Bot')\n",
" parser.add_argument('--index', default='index', help='path to whoosh index')\n",
" parser.add_argument('--server', default='irc.freenode.net', help='server hostname')\n",
" parser.add_argument('--port', default=6667, type=int, help='server port')\n",
" parser.add_argument('--channel', default='#botopera', help='channel to join')\n",
" parser.add_argument('--nickname', default='BOTSwaller', help='bot nickname')\n",
"\n",
" args = parser.parse_args()\n",
" bot = BotsWaller(args.index, args.channel, args.nickname, args.server, args.port)\n",
" bot.start()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Steps to developing a custom bot\n",
"\n",
"* **Perform the bot(s) speculatively**: Open a new IRC channel, invite some participants and play the role of your *bots* yourselves. You can eventually open multiple windows to play both \"human\" roles and the roles of your bots.\n",
"* **Make use of an existing bot**: For instance Kevin Lenzo's classic [InfoBot](http://www.infobot.org/) implements a sort of mini-language for creating bots. It might be worth experimenting with what results you can get using an already coded bot such as this one.\n",
"* **Explore the histories of algorithms, tools, and techniques**\n",
"* **Translate your speculations and experiences from the previous steps into your own bot**: Make use of IRC libraries like [irc](https://pypi.org/project/irc/) for Python."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Perform the bot(s) speculatively\n",
"\n",
"Open a new IRC channel, invite some participants and play the role of your *bots* yourselves. You can eventually open multiple windows to play both \"human\" roles and the roles of your bots.\n",
"\n",
"Consider exploring artistic traditions of creating \"rule-based\" games, programs that in effect can be implemented by people following a fixed set of rules."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Example: [Oulipo: N+7](https://poets.org/text/brief-guide-oulipo)\n",
"> One of the most popular OULIPO formulas is \"N+7,\" in which the writer takes a poem already in existence and substitutes each of the poem’s substantive nouns with the noun appearing seven nouns away in the dictionary. Care is taken to ensure that the substitution is not just a compound derivative of the original, or shares a similar root, but a wholly different word. Results can vary widely depending on the version of the dictionary one uses."
]
},
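{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a rough sketch of how mechanical such a rule is, the N+7 procedure can be approximated in a few lines of Python. The tiny noun list below is a made-up stand-in for a real dictionary, the function name `n_plus_7` is hypothetical, and the sketch ignores the care Oulipo writers take to avoid substitutes that share a root with the original word.\n",
"\n",
"```python\n",
"import bisect\n",
"\n",
"\n",
"def n_plus_7(words, nouns, offset=7):\n",
"    # Replace every word found in the noun list with the noun `offset` entries later.\n",
"    nouns = sorted(nouns)\n",
"    result = []\n",
"    for word in words:\n",
"        i = bisect.bisect_left(nouns, word)\n",
"        if i < len(nouns) and nouns[i] == word:\n",
"            # wrap around at the end of the (tiny) dictionary\n",
"            result.append(nouns[(i + offset) % len(nouns)])\n",
"        else:\n",
"            # not a known noun: keep the word unchanged\n",
"            result.append(word)\n",
"    return result\n",
"\n",
"\n",
"# Hypothetical mini-dictionary of nouns, for illustration only\n",
"nouns = [\"age\", \"bundle\", \"church\", \"groundwork\", \"joy\", \"melody\", \"organ\",\n",
"         \"piano\", \"rag\", \"song\", \"soul\", \"store\", \"style\", \"year\"]\n",
"line = \"the soul of melody a man who made the piano sing\"\n",
"print(\" \".join(n_plus_7(line.split(), nouns)))\n",
"```\n",
"\n",
"With a real dictionary the lookup would also need a way to decide which words are nouns, which is one place where a part-of-speech tagger could come in."
]
},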
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Examples: Fluxus George Brecht: [Water yam](https://en.wikipedia.org/wiki/Water_Yam_(artist%27s_book))\n",
"![](fluxus_brecht_water_yam.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Think about sources\n",
"\n",
"Often rules, and algorithms, work by transforming existing data. In the case of the BOTSwaller bot, the source were the sentences of the biographical Wikipedia entry."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"Artist and de Kooning instructor [Robin van t' Haar](https://www.cbkrotterdam.nl/2019/03/29/in-memoriam-robin-van-t-haar-1974-2019/) used the city as input, exploring in a photographic practice, ways that the [city \"scripts\"](https://web.archive.org/web/20200805155927/https://cityscripts.com/) its users.\n",
"\n",
"![](vanthaar_zebraanimatie1.gif) ![](vanthaar_camera.jpg) ![](vanthaar_publicaties.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_aldi1.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_aldi2.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_publication_files/easycity1.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_publication_files/easycity2.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_publication_files/easycity3.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Robin van 't Haar: Cityscripts\n",
"![](vanthaar_publication_files/easycity4.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Think about algorithms, tools, techniques and their histories\n",
"\n",
"In addition to the artistic traditions, and their techniques, isolate and explore other algorithms, tools, and techniques that may be useful to you. These may come from any number of disciplines, scientific or other. Avoid thinking of these tools as *universal* and *timeless*, but rather explore their histories and the relationship between algorithms as ideas and as implementations.\n",
"\n",
"In the case of BOTSwaller, useful tools were [sentence tokenization](https://www.researchgate.net/publication/220355311_Unsupervised_Multilingual_Sentence_Boundary_Detection) and *parts of speech tagging* as implemented in [nltk](http://nltk.org/), and word stemming and search indexing and querying as implmented in [whoosh](https://www.youtube.com/watch?v=gRvZbYtwTeo)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Kiss & Strunk (punkt)\n",
"\n",
"![](kiss_strunk.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"![](kiss_strunk_corpora2.png)\n",
"\n",
"![](kiss_strunk_corpora.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Whoosh: Inverted Index for Help Systems\n",
"\n",
"![](whoosh_inverted_index.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"![](whoosh_searching.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## TODO\n",
"* Develop a \"researchbot\" to aid in your research\n",
"* Install / setup a local IRC server on the sandbox, with...\n",
"* A custom kiwi install; kiwi has [download packages](https://kiwiirc.com/downloads/) to install all the necessary files for a Kiwi client on your own server.\n",
"* Run jupyter notebook locally on your laptop -- and try *this notebook* interactively"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}