diff --git a/parody-bot.ipynb b/parody-bot.ipynb new file mode 100644 index 0000000..d344aec --- /dev/null +++ b/parody-bot.ipynb @@ -0,0 +1,1333 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# \"Beatrix Botter\"\n", + "\n", + "[An's original code](https://gitlab.constantvzw.org/death-of-the-authors/1943/-/blob/master/bots/beatrixbotter_parody.py) used the Pattern library, but it's possible to implement the same technique using just nltk. It relies on two key functions from nltk: word_tokenize and pos_tag.\n", + "\n", + "### The \"parody algorithm\"\n", + "\n", + "The essence of the \"parody algorithm\" is to translate an input text by replacing each of its words with a word chosen at random from a \"source\" text, restricted to words that have the *same part of speech* according to nltk's pos_tag function. For example, consider the first two sentences of Peter Rabbit as a source:\n", + "\n", + "    Once upon a time there were four little Rabbits, and their names were--\n", + "    Flopsy, Mopsy, Cotton-tail, and Peter.\n", + "\n", + "    They lived with their Mother in a sand-bank, underneath the root of a\n", + "    very big fir-tree.\n", + "\n", + "Then consider the input text to transform:\n", + "\n", + "    The blue pen is in the top drawer.\n", + "\n", + "Applying word tokenization and part-of-speech tagging to both texts gives:\n", + "\n", + "    Once upon a time there were four little Rabbits, and their names were--\n", + "    RB IN DT NN EX VBD CD JJ NNP , CC PRP$ NNS VBD :\n", + "    \n", + "    Flopsy, Mopsy, Cotton-tail, and Peter.\n", + "    NNP , NNP , NNP , CC NNP .\n", + "\n", + "    They lived with their Mother in a sand-bank, underneath the root of a\n", + "    PRP VBD IN PRP$ NN IN DT JJ , IN DT NN IN DT\n", + "\n", + "    very big fir-tree.\n", + "    RB JJ NN .\n", + "\n", + "and\n", + "\n", + "    The blue pen is in the top drawer.\n", + "    DT JJ NN VBZ IN DT JJ NN .\n", + "\n", + "To transform the input text, we consider each word in turn and replace it with a randomly chosen word from the source that has the same part of speech. For instance, \"The\" is tagged \"DT\" (determiner), and the source text contains the following words also tagged DT: a, a, the, a. So we pick one at random: \"a\". Next, for the word \"blue\", we search the source for all words tagged \"JJ\" (adjective): little, sand-bank, big. We pick \"little\". When we get to \"is\" (tagged VBZ), no word in the source has that tag, so we keep the original word. Following these rules, we might produce the new text:\n", + "\n", + "    a little time is upon the sand-bank Mother.\n", + "    DT JJ NN VBZ IN DT JJ NN .\n", + " \n", + "\n", + "\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Part-of-speech tagging of a text\n", + "See: [Chapter 5: Categorizing and Tagging Words](http://www.nltk.org/book_1ed/ch05.html) in the NLTK book" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import nltk" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "t = \"\"\"The blue pen is in the top drawer.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "tt = nltk.word_tokenize(t)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "tagged = nltk.pos_tag(tt)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.')]\n" + ] + } + ], + "source": [ + "print (tagged)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Counting words\n", + "\n", + "Recall the following code for counting words in a text. The code creates an empty dictionary called *counts* to store the count of each word. The text is stripped of surrounding whitespace and split into a list of words. The for loop then iterates over this list, assigning each item in turn to the variable *word*. The if statement checks whether the word is already in the dictionary and, if it is *not*, initializes its count to 0. Finally, counts[word] is incremented." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "text = \"\"\"\n", + "this is a simple sentence . and this is another sentence .\n", + "\"\"\"\n", + "counts = {}\n", + "for word in text.strip().split():\n", + "    if word not in counts:\n", + "        counts[word] = 0\n", + "    counts[word] += 1" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'this': 2, 'is': 2, 'a': 1, 'simple': 1, 'sentence': 2, '.': 2, 'and': 1, 'another': 1}\n" + ] + } + ], + "source": [ + "print (counts)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 1: Create the index\n", + "A variation on the word-counting code: rather than counting each word, use the part-of-speech tags as the keys of the dictionary, and append each word with that tag to the corresponding list. *NB: The code assumes there is a file named source.txt containing your source text.*" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[nltk_data] Downloading package punkt to /home/murtaugh/nltk_data...\n", + "[nltk_data] Package punkt is already up-to-date!\n", + "[nltk_data] Downloading package averaged_perceptron_tagger to\n", + "[nltk_data] /home/murtaugh/nltk_data...\n", + "[nltk_data] Package averaged_perceptron_tagger is already up-to-\n", + "[nltk_data] date!\n" + ] + }, + { + "ename": "FileNotFoundError", + "evalue": "[Errno 2] No such file or directory: 'source.txt'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdownload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'averaged_perceptron_tagger'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0msource\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"source.txt\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mword_tokenize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msource\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mpos\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpos_tag\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'source.txt'" + ] + } + ], + "source": [ + "import nltk\n", + "\n", + "nltk.download('punkt')\n", + "nltk.download('averaged_perceptron_tagger')\n", + "\n", + "source = open(\"source.txt\").read()\n", + "tokens = nltk.word_tokenize(source)\n", + "pos = nltk.pos_tag(tokens)\n", + "index = {}\n", + "for word, tag in pos:\n", + " # print (word, \"is\", tag)\n", + " if tag not in index:\n", + " index[tag] = []\n", + " index[tag].append(word)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'RB': ['ONCE',\n", + " 'very',\n", + " 'never',\n", + " 'never',\n", + " 'not',\n", + " 'extremely',\n", + " 'very',\n", + " 'Presently',\n", + " 'then',\n", + " 'again',\n", + " 'too',\n", + " 'cautiously',\n", + " 'not',\n", + " 'fast',\n", + " 'Then',\n", + " 'once',\n", + " 'up',\n", + " 'not',\n", + " 'enough',\n", + " 'as',\n", + " 'alone',\n", + " 'Then',\n", + " 'not',\n", + " 'not',\n", + " 'either',\n", + " 'upside',\n", + " 'down',\n", + " 'inside',\n", + " 'especially',\n", + " 'somehow',\n", + " 'back',\n", + " 'just',\n", + " 'suddenly',\n", + " 'back',\n", + " 'also',\n", + " 'not',\n", + " 'so',\n", + " 'very',\n", + " 'very',\n", + " 'naughty',\n", + " 'very'],\n", + " 'IN': ['upon',\n", + " 'with',\n", + " 'at',\n", + " 'because',\n", + " 'in',\n", + " 'of',\n", + " 'for',\n", + " 'in',\n", + " 'in',\n", + " 'in',\n", + " 'near',\n", + " 'under',\n", + " 'for',\n", + " 'in',\n", + " 'out',\n", + " 'that',\n", + " 'in',\n", + " 'on',\n", + " 'under',\n", + " 'at',\n", + " 'of',\n", + " 'across',\n", + " 'into',\n", + " 'with',\n", + " 'upon',\n", + " 
'at',\n", + " 'with',\n", + " 'in',\n", + " 'in',\n", + " 'at',\n", + " 'with',\n", + " 'as',\n", + " 'at',\n", + " 'with',\n", + " 'under',\n", + " 'in',\n", + " 'in',\n", + " 'of',\n", + " 'with',\n", + " 'with',\n", + " 'into',\n", + " 'for',\n", + " 'underneath',\n", + " 'of',\n", + " 'of',\n", + " 'As',\n", + " 'into',\n", + " 'in',\n", + " 'at',\n", + " 'WHILE',\n", + " 'upon',\n", + " 'except',\n", + " 'out',\n", + " 'of',\n", + " 'of',\n", + " 'in',\n", + " 'out',\n", + " 'of',\n", + " 'After',\n", + " 'out',\n", + " 'of',\n", + " 'that',\n", + " 'in',\n", + " 'of',\n", + " 'across',\n", + " 'into',\n", + " 'into',\n", + " 'behind',\n", + " 'with',\n", + " 'of',\n", + " 'upon',\n", + " 'into',\n", + " 'of',\n", + " 'upon',\n", + " 'against',\n", + " 'of',\n", + " 'from',\n", + " 'under',\n", + " 'of',\n", + " 'that',\n", + " 'like',\n", + " 'BUT',\n", + " 'SO',\n", + " 'of',\n", + " 'after',\n", + " 'because',\n", + " 'for',\n", + " 'under',\n", + " 'upon',\n", + " 'into',\n", + " 'of',\n", + " 'of',\n", + " 'before',\n", + " 'with'],\n", + " 'DT': ['a',\n", + " 'a',\n", + " 'a',\n", + " 'a',\n", + " 'the',\n", + " 'any',\n", + " 'the',\n", + " 'a',\n", + " 'a',\n", + " 'a',\n", + " 'a',\n", + " 'some',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'no',\n", + " 'the',\n", + " 'a',\n", + " 'a',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'a',\n", + " 'a',\n", + " 'A',\n", + " 'no',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'THE',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'all',\n", + " 'the',\n", + " 'a',\n", + " 'The',\n", + " 'a',\n", + " 'the',\n", + " 'another',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'some',\n", + " 'every',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'The',\n", + " 'all',\n", + " 'the',\n", + " 'no',\n", + " 'the',\n", + " 'the',\n", + 
" 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'no',\n", + " 'the',\n", + " 'another',\n", + " 'some',\n", + " 'the',\n", + " 'those',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'a',\n", + " 'a',\n", + " 'The',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'another',\n", + " 'a',\n", + " 'the',\n", + " 'The',\n", + " 'the',\n", + " 'the',\n", + " 'a',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'neither',\n", + " 'any',\n", + " 'THE',\n", + " 'the',\n", + " 'the',\n", + " 'the',\n", + " 'some',\n", + " 'some',\n", + " 'THE',\n", + " 'the',\n", + " 'a',\n", + " 'a',\n", + " 'the',\n", + " 'a',\n", + " 'that',\n", + " 'the',\n", + " 'the',\n", + " 'all',\n", + " 'a',\n", + " 'the',\n", + " 'the',\n", + " 'every',\n", + " 'the',\n", + " 'THE'],\n", + " 'NN': ['time',\n", + " \"doll's-house\",\n", + " 'brick',\n", + " 'muslin',\n", + " 'door',\n", + " 'chimney',\n", + " 'cooking',\n", + " 'dinner',\n", + " 'ready-made',\n", + " 'box',\n", + " 'ham',\n", + " 'fish',\n", + " 'pudding',\n", + " 'morning',\n", + " 'drive',\n", + " \"doll's\",\n", + " 'perambulator',\n", + " 'one',\n", + " 'nursery',\n", + " 'scuffling',\n", + " 'noise',\n", + " 'corner',\n", + " 'fire-place',\n", + " 'hole',\n", + " 'skirting-board',\n", + " 'head',\n", + " 'moment',\n", + " 'mouse',\n", + " 'wife',\n", + " 'head',\n", + " 'one',\n", + " 'nursery',\n", + " 'oilcloth',\n", + " 'coal-box',\n", + " 'stood',\n", + " 'side',\n", + " 'fire-place',\n", + " 'hearthrug',\n", + " 'door',\n", + " 'dining-room',\n", + " 'joy',\n", + " 'dinner',\n", + " 'table',\n", + " 'convenient',\n", + " 'ham',\n", + " 'yellow',\n", + " 'knife',\n", + " 'finger',\n", + " 'mouth',\n", + " 'try',\n", + " 'chair',\n", + " 'ham',\n", + " 'knife',\n", + " 'hams',\n", + " 
'cheesemonger',\n", + " 'ham',\n", + " 'plate',\n", + " 'jerk',\n", + " 'table',\n", + " 'fish',\n", + " 'tin',\n", + " 'spoon',\n", + " 'turn',\n", + " 'fish',\n", + " 'dish',\n", + " 'temper',\n", + " 'ham',\n", + " 'middle',\n", + " 'floor',\n", + " 'shovel',\n", + " 'bang',\n", + " 'bang',\n", + " 'smash',\n", + " 'smash',\n", + " 'ham',\n", + " 'paint',\n", + " 'nothing',\n", + " 'plaster',\n", + " 'end',\n", + " 'rage',\n", + " 'disappointment',\n", + " 'pudding',\n", + " 'fish',\n", + " 'plate',\n", + " 'paper',\n", + " 'fire',\n", + " 'kitchen',\n", + " 'kitchen',\n", + " 'chimney',\n", + " 'soot',\n", + " 'chimney',\n", + " 'disappointment',\n", + " 'dresser',\n", + " 'nothing',\n", + " 'mischief',\n", + " 'chest',\n", + " 'bedroom',\n", + " 'floor',\n", + " 'window',\n", + " 'mind',\n", + " 'bolster',\n", + " 'want',\n", + " 'feather',\n", + " 'bed',\n", + " 'assistance',\n", + " 'bolster',\n", + " 'downstairs',\n", + " 'hearth-rug',\n", + " 'bolster',\n", + " 'chair',\n", + " 'book-case',\n", + " 'bird-cage',\n", + " 'book-case',\n", + " 'bird-cage',\n", + " 'coal-box',\n", + " 'cradle',\n", + " 'chair',\n", + " 'noise',\n", + " 'landing',\n", + " 'mice',\n", + " 'hole',\n", + " 'nursery',\n", + " 'sight',\n", + " 'kitchen',\n", + " 'stove',\n", + " 'leant',\n", + " 'kitchen',\n", + " 'dresser',\n", + " 'remark',\n", + " 'book-case',\n", + " 'bird-cage',\n", + " 'coal-box',\n", + " 'cradle',\n", + " 'girl',\n", + " \"doll's-house\",\n", + " 'doll',\n", + " 'policeman',\n", + " 'nurse',\n", + " 'mouse-trap',\n", + " 'story',\n", + " 'everything',\n", + " 'sixpence',\n", + " 'hearthrug',\n", + " 'morning',\n", + " 'anybody',\n", + " 'dust-pan',\n", + " 'broom',\n", + " 'house',\n", + " 'END'],\n", + " 'EX': ['there',\n", + " 'THERE',\n", + " 'There',\n", + " 'there',\n", + " 'there',\n", + " 'there',\n", + " 'There',\n", + " 'there',\n", + " 'there',\n", + " 'there',\n", + " 'there'],\n", + " 'VBD': ['was',\n", + " 'was',\n", + " 'had',\n", + " 
'belonged',\n", + " 'called',\n", + " 'belonged',\n", + " 'ordered',\n", + " 'was',\n", + " 'did',\n", + " 'had',\n", + " 'were',\n", + " 'were',\n", + " 'had',\n", + " 'was',\n", + " 'was',\n", + " 'was',\n", + " 'was',\n", + " 'put',\n", + " 'popped',\n", + " 'was',\n", + " 'put',\n", + " 'saw',\n", + " 'was',\n", + " 'ventured',\n", + " 'went',\n", + " 'pushed',\n", + " 'was',\n", + " 'went',\n", + " 'peeped',\n", + " 'squeaked',\n", + " 'was',\n", + " 'were',\n", + " 'set',\n", + " 'was',\n", + " 'crumpled',\n", + " 'put',\n", + " 'stood',\n", + " 'chopped',\n", + " 'said',\n", + " 'broke',\n", + " 'rolled',\n", + " 'said',\n", + " 'tried',\n", + " 'was',\n", + " 'lost',\n", + " 'put',\n", + " 'hit',\n", + " 'flew',\n", + " 'was',\n", + " 'was',\n", + " 'broke',\n", + " 'put',\n", + " 'went',\n", + " 'looked',\n", + " 'was',\n", + " 'was',\n", + " 'had',\n", + " 'found',\n", + " 'turned',\n", + " 'was',\n", + " 'set',\n", + " 'took',\n", + " 'threw',\n", + " 'had',\n", + " 'remembered',\n", + " 'was',\n", + " 'carried',\n", + " 'was',\n", + " 'managed',\n", + " 'went',\n", + " 'fetched',\n", + " 'refused',\n", + " 'left',\n", + " 'went',\n", + " 'was',\n", + " 'was',\n", + " 'rushed',\n", + " 'came',\n", + " 'met',\n", + " 'sat',\n", + " 'stared',\n", + " 'made',\n", + " 'were',\n", + " 'got',\n", + " 'belonged',\n", + " 'said',\n", + " 'said',\n", + " 'were',\n", + " 'paid',\n", + " 'broke',\n", + " 'found',\n", + " 'stuffed'],\n", + " 'JJ': ['beautiful',\n", + " 'red',\n", + " 'white',\n", + " 'real',\n", + " 'front',\n", + " 'full',\n", + " 'red',\n", + " 'beautiful',\n", + " 'quiet',\n", + " 'little',\n", + " \"doll's-house\",\n", + " 'other',\n", + " 'front',\n", + " 'lovely',\n", + " 'tin',\n", + " 'lead',\n", + " '_so_',\n", + " 'beautiful',\n", + " 'shiny',\n", + " 'red',\n", + " 'hard',\n", + " 'lead',\n", + " 'hard',\n", + " 'shiny',\n", + " 'red-hot',\n", + " 'crinkly',\n", + " 'top',\n", + " 'tiny',\n", + " 'red',\n", + " 'blue',\n", + " 'top',\n", 
+ " 'frugal',\n", + " 'difficult',\n", + " 'mouse-hole',\n", + " 'several',\n", + " 'small',\n", + " 'mouse-hole',\n", + " 'outside',\n", + " 'upset',\n", + " 'useful',\n", + " 'several',\n", + " 'other',\n", + " 'little',\n", + " 'crooked',\n", + " 'early',\n", + " 'awake'],\n", + " ':': [';',\n", + " ';',\n", + " ';',\n", + " ';',\n", + " '--',\n", + " '--',\n", + " ';',\n", + " ';',\n", + " ';',\n", + " ';',\n", + " '--',\n", + " ';',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " ';',\n", + " ';',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " '--',\n", + " ';',\n", + " '--',\n", + " '--'],\n", + " 'PRP': ['it',\n", + " 'it',\n", + " 'it',\n", + " 'she',\n", + " 'she',\n", + " 'They',\n", + " 'they',\n", + " 'it',\n", + " 'it',\n", + " 'she',\n", + " 'she',\n", + " 'They',\n", + " 'it',\n", + " 'they',\n", + " 'It',\n", + " 'him',\n", + " 'he',\n", + " 'It',\n", + " 'it',\n", + " 'You',\n", + " 'It',\n", + " 'it',\n", + " 'me',\n", + " 'He',\n", + " 'it',\n", + " 'it',\n", + " 'They',\n", + " 'they',\n", + " 'it',\n", + " 'it',\n", + " 'She',\n", + " 'she',\n", + " 'them',\n", + " 'they',\n", + " 'He',\n", + " 'he',\n", + " 'them',\n", + " 'she',\n", + " 'she',\n", + " 'herself',\n", + " 'she',\n", + " 'It',\n", + " 'they',\n", + " 'it',\n", + " 'them',\n", + " 'them',\n", + " 'I',\n", + " 'I',\n", + " 'they',\n", + " 'he',\n", + " 'He',\n", + " 'he',\n", + " 'it'],\n", + " 'NNS': ['windows',\n", + " 'curtains',\n", + " 'meals',\n", + " 'shavings',\n", + " 'lobsters',\n", + " 'pears',\n", + " 'oranges',\n", + " 'plates',\n", + " 'afterwards',\n", + " 'upstairs',\n", + " 'spoons',\n", + " 'knives',\n", + " 'forks',\n", + " 'dolly-chairs',\n", + " 'tongs',\n", + " 'pieces',\n", + " 'lobsters',\n", + " 'pears',\n", + " 'oranges',\n", + " 'canisters',\n", + " 'beads',\n", + " 'mice',\n", + " 'clothes',\n", + " 'drawers',\n", + " 'feathers',\n", + " 'odds',\n", + " 'ends',\n", + " 'dolls',\n", + " 
'eyes',\n", + " 'clothes',\n", + " 'pots',\n", + " 'pans',\n", + " 'things',\n", + " 'stockings'],\n", + " ',': [',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ',',\n", + " ','],\n", + " 'CC': ['and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'But',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'but',\n", + " 'and',\n", + " 'and',\n", + " 'and',\n", + " 'AND',\n", + " 'and'],\n", + " '.': ['.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '!',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + 
" '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '!',\n", + " '.',\n", + " '.',\n", + " '!',\n", + " '.'],\n", + " 'NNP': ['IT',\n", + " 'Dolls',\n", + " 'Lucinda',\n", + " 'Jane',\n", + " 'Lucinda',\n", + " 'Jane',\n", + " 'Cook',\n", + " 'Lucinda',\n", + " 'Jane',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'MINUTE',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'TOM',\n", + " 'THUMB',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'TOM',\n", + " 'THUMB',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'HUNCA',\n", + " 'MUNCA',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'THE',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'HUNCA',\n", + " 'MUNCA',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'THEN',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'TOM',\n", + " 'THUMB',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'Rice',\n", + " 'Coffee',\n", + " 'Sago',\n", + " 'THEN',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Jane',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'Lucinda',\n", + " 'WITH',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'THEN',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'HUNCA',\n", + " 'MUNCA',\n", + " 'HUNCA',\n", + " 'MUNCA',\n", + " 'Jane',\n", + " 'Lucinda',\n", + " 'Lucinda',\n", + " 'Jane',\n", + " 'Hunca',\n", + " 'Munca',\n", + " \"Lucinda's\",\n", + " 'SHE',\n", + " 'Bad',\n", + " 'Mice',\n", + " 'Tom',\n", + " 'Thumb',\n", + " 'Christmas',\n", + " 'Eve',\n", + " 'Hunca',\n", + " 'Munca',\n", + " 'Lucinda',\n", + " 'Jane',\n", + " 'Hunca',\n", + " 
'Munca',\n", + " 'Dollies'],\n", + " 'TO': ['to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to',\n", + " 'to'],\n", + " 'CD': ['two', 'two', 'ONE', 'two', 'two', 'one'],\n", + " 'JJS': ['least'],\n", + " 'VBN': ['been',\n", + " 'bought',\n", + " 'gone',\n", + " 'laid',\n", + " 'streaked',\n", + " 'boiled',\n", + " 'glued',\n", + " 'made',\n", + " 'labelled',\n", + " 'smiled',\n", + " 'rescued',\n", + " 'dressed'],\n", + " 'MD': ['would', 'would', 'would', 'could', 'will', 'will'],\n", + " 'VB': ['come',\n", + " 'work',\n", + " 'carve',\n", + " 'hurt',\n", + " 'Let',\n", + " 'give',\n", + " 'come',\n", + " 'burn',\n", + " 'work',\n", + " 'do',\n", + " 'squeeze',\n", + " 'go',\n", + " 'fetch',\n", + " 'get',\n", + " 'set',\n", + " 'sweep'],\n", + " 'RP': ['off',\n", + " 'out',\n", + " 'out',\n", + " 'out',\n", + " 'out',\n", + " 'up',\n", + " 'off',\n", + " 'up',\n", + " 'off',\n", + " 'up',\n", + " 'out',\n", + " 'up'],\n", + " 'VBG': ['scratching', 'pulling', 'returning', 'talking'],\n", + " 'WRB': ['where', 'when', 'when', 'when'],\n", + " 'PRP$': ['his',\n", + " 'his',\n", + " 'her',\n", + " 'his',\n", + " 'his',\n", + " 'her',\n", + " 'his',\n", + " 'her',\n", + " 'their',\n", + " 'her',\n", + " 'her'],\n", + " 'PDT': ['Such', 'all', 'half'],\n", + " '``': ['``', '``', '``', '``'],\n", + " 'VBZ': ['is', 'is', \"'s\", 'has', 'has', 'is', 'is', 'comes'],\n", + " 'VBP': ['have'],\n", + " \"''\": [\"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\"],\n", + " 'POS': [\"'s\", \"'s\", \"'s\", \"'s\", \"'\"],\n", + " 'WP': ['WHAT']}" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "index" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2: Transform some input using the index\n", + "Use a *new* list to assemble a new sentence. 
Use `str.join` to produce the final text." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + " I have a question: What is the nature of love?\n" + ] + }, + { + "data": { + "text/plain": [ + "'It have the plate ; WHAT comes a bang under hams !'" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from random import choice\n", + "i = input()\n", + "tokens = nltk.word_tokenize(i)\n", + "pos = nltk.pos_tag(tokens)\n", + "new = []\n", + "for word, tag in pos:\n", + "    # replace word with a random choice from the \"hat\" of words for the tag\n", + "    if tag not in index:\n", + "        # no word in the source has this tag, so keep the original word\n", + "        new.append(word)\n", + "    else:\n", + "        newword = choice(index[tag])\n", + "        new.append(newword)\n", + "        # print (\"replace with\", newword)\n", + "print (' '.join(new))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}