python-irc-bots/parody-bot.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \"Beatrix Botter\"\n",
    "\n",
    "[An's original code](https://gitlab.constantvzw.org/death-of-the-authors/1943/-/blob/master/bots/beatrixbotter_parody.py) used the pattern library.. but it's possible to implement the same technique using just nltk. It relies on two key functions from nltk: word_tokenize and pos_tag.\n",
    "\n",
    "### The \"parody algorithm\"\n",
    "\n",
    "The essence of the \"parody algorithm\" is to translate an input text by replacing its words with randomly chosen words from a \"source\" text -- but which have the *same part of speech* according to nltk's pos_tag function. For example consider the first two lines of Peter Rabbit as a source:\n",
    "\n",
    "    Once upon a  time there were four little Rabbits, and their names were--\n",
    "    Flopsy, Mopsy, Cotton-tail, and Peter.\n",
    "\n",
    "    They lived with their Mother in a  sand-bank, underneath the root of a\n",
    "    very big fir-tree.\n",
    "\n",
    "And then consider the input text to transform:\n",
    "\n",
    "    The blue pen is in the top drawer.\n",
    "\n",
    "Applying word tokenization and part of speech tagging to both texts:\n",
    "\n",
    "    Once upon a  time there were four little Rabbits, and their names were--\n",
    "    RB   IN   DT NN   EX    VBD  CD   JJ     NNP    , CC  PRP$  NNS   VBD :\n",
    "    \n",
    "    Flopsy, Mopsy, Cotton-tail, and Peter.\n",
    "    NNP   , NNP  , NNP        , CC  NNP  .\n",
    "\n",
    "    They lived with their Mother in a  sand-bank, underneath the root of a\n",
    "    PRP  VBD   IN   PRP$  NN     IN DT JJ       , IN         DT  NN   IN DT\n",
    "\n",
    "    very big fir-tree.\n",
    "    RB   JJ  NN      .\n",
    " \n",
    " and\n",
    " \n",
    "    The blue pen is  in the top drawer.\n",
    "    DT  JJ   NN  VBZ IN DT  JJ  NN    .\n",
    "\n",
    "TO transform the input text, we consider each word, looking in the source for another word with the same part of speech and replace it. For instance starting with \"The\", the part of speech is \"DT\" (determiner) ... looking in the source text there are the following words also tagged DT: a, a, the, a, The, the. So we pick one at random: a. Next consider the word \"blue\", we search the input for all words tagged \"JJ\" (adjective): little, sand-bank, big. We pick \"little\". When we get to \"is\" (tagged: VBZ), there's no match in the source, so we just keep the original word. Following these rules, we can producing the new text:\n",
    "\n",
    "    a little time is  upon the sand-bank Mother.\n",
    "    DT JJ     NN   VBZ IN   DT  JJ        NN    .\n",
    "   \n",
    "\n",
    "\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Doing parts of speech tagging on a text\n",
    "See: [Chapter 5: Categorizing and Tagging Words](http://www.nltk.org/book_1ed/ch05.html) in the NLTK book"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nltk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = \"\"\"The blue pen is in the top drawer.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "tt = nltk.word_tokenize(t)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "tagged = nltk.pos_tag(tt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.')]\n"
     ]
    }
   ],
   "source": [
    "print (tagged)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Counting words\n",
    "\n",
    "Recall the following code for counting words in a text. The code creates an empty dictionary called *counts* to store the count of each word. The text is stripped and split to make a list. The for loop then loops over this list assigning each to the variable *word*. The if checks if the word is in the dictionary, and when it's *not* already there, initializes the count to 0. Finally count[word] is incremented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"\"\"\n",
    "this is a simple sentence . and this is another sentence .\n",
    "\"\"\"\n",
    "counts = {}\n",
    "for word in text.strip().split():\n",
    "    if word not in counts:\n",
    "        counts[word] = 0\n",
    "    counts[word] += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'this': 2, 'is': 2, 'a': 1, 'simple': 1, 'sentence': 2, '.': 2, 'and': 1, 'another': 1}\n"
     ]
    }
   ],
   "source": [
    "print (counts)"
   ]
  },
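  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an aside, the same tally can be sketched with `collections.Counter` from the standard library, which takes care of the \"initialize to 0\" step itself. This is only an alternative illustration, not part of the original bot; the names `example_text` and `example_counts` are made up here so that the variables above are left untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "# Hypothetical names, chosen so that `text` and `counts` above are not overwritten.\n",
    "example_text = 'this is a simple sentence . and this is another sentence .'\n",
    "\n",
    "# Counter tallies every item in the list; unseen keys start at 0 automatically.\n",
    "example_counts = Counter(example_text.strip().split())\n",
    "print(example_counts['this'], example_counts['sentence'], example_counts['simple'])"
   ]
  },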
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Create the index\n",
    "A variation on the word counting code, rather than counting each word, use the parts of speech as the key values of the dictionary, and append each word tagged with that tag on a list. *NB: The code assumes there is a file named source.txt with the your source text.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Downloading package punkt to /home/murtaugh/nltk_data...\n",
      "[nltk_data]   Package punkt is already up-to-date!\n",
      "[nltk_data] Downloading package averaged_perceptron_tagger to\n",
      "[nltk_data]     /home/murtaugh/nltk_data...\n",
      "[nltk_data]   Package averaged_perceptron_tagger is already up-to-\n",
      "[nltk_data]       date!\n"
     ]
    },
    {
     "ename": "FileNotFoundError",
     "evalue": "[Errno 2] No such file or directory: 'source.txt'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mFileNotFoundError\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-13-eccf574101ac>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdownload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'averaged_perceptron_tagger'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0msource\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"source.txt\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      8\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mword_tokenize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msource\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      9\u001b[0m \u001b[0mpos\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpos_tag\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'source.txt'"
     ]
    }
   ],
   "source": [
    "import nltk\n",
    "\n",
    "nltk.download('punkt')\n",
    "nltk.download('averaged_perceptron_tagger')\n",
    "\n",
    "source = open(\"source.txt\").read()\n",
    "tokens = nltk.word_tokenize(source)\n",
    "pos = nltk.pos_tag(tokens)\n",
    "index = {}\n",
    "for word, tag in pos:\n",
    "    # print (word, \"is\", tag)\n",
    "    if tag not in index:\n",
    "        index[tag] = []\n",
    "    index[tag].append(word)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'RB': ['ONCE',\n",
       "  'very',\n",
       "  'never',\n",
       "  'never',\n",
       "  'not',\n",
       "  'extremely',\n",
       "  'very',\n",
       "  'Presently',\n",
       "  'then',\n",
       "  'again',\n",
       "  'too',\n",
       "  'cautiously',\n",
       "  'not',\n",
       "  'fast',\n",
       "  'Then',\n",
       "  'once',\n",
       "  'up',\n",
       "  'not',\n",
       "  'enough',\n",
       "  'as',\n",
       "  'alone',\n",
       "  'Then',\n",
       "  'not',\n",
       "  'not',\n",
       "  'either',\n",
       "  'upside',\n",
       "  'down',\n",
       "  'inside',\n",
       "  'especially',\n",
       "  'somehow',\n",
       "  'back',\n",
       "  'just',\n",
       "  'suddenly',\n",
       "  'back',\n",
       "  'also',\n",
       "  'not',\n",
       "  'so',\n",
       "  'very',\n",
       "  'very',\n",
       "  'naughty',\n",
       "  'very'],\n",
       " 'IN': ['upon',\n",
       "  'with',\n",
       "  'at',\n",
       "  'because',\n",
       "  'in',\n",
       "  'of',\n",
       "  'for',\n",
       "  'in',\n",
       "  'in',\n",
       "  'in',\n",
       "  'near',\n",
       "  'under',\n",
       "  'for',\n",
       "  'in',\n",
       "  'out',\n",
       "  'that',\n",
       "  'in',\n",
       "  'on',\n",
       "  'under',\n",
       "  'at',\n",
       "  'of',\n",
       "  'across',\n",
       "  'into',\n",
       "  'with',\n",
       "  'upon',\n",
       "  'at',\n",
       "  'with',\n",
       "  'in',\n",
       "  'in',\n",
       "  'at',\n",
       "  'with',\n",
       "  'as',\n",
       "  'at',\n",
       "  'with',\n",
       "  'under',\n",
       "  'in',\n",
       "  'in',\n",
       "  'of',\n",
       "  'with',\n",
       "  'with',\n",
       "  'into',\n",
       "  'for',\n",
       "  'underneath',\n",
       "  'of',\n",
       "  'of',\n",
       "  'As',\n",
       "  'into',\n",
       "  'in',\n",
       "  'at',\n",
       "  'WHILE',\n",
       "  'upon',\n",
       "  'except',\n",
       "  'out',\n",
       "  'of',\n",
       "  'of',\n",
       "  'in',\n",
       "  'out',\n",
       "  'of',\n",
       "  'After',\n",
       "  'out',\n",
       "  'of',\n",
       "  'that',\n",
       "  'in',\n",
       "  'of',\n",
       "  'across',\n",
       "  'into',\n",
       "  'into',\n",
       "  'behind',\n",
       "  'with',\n",
       "  'of',\n",
       "  'upon',\n",
       "  'into',\n",
       "  'of',\n",
       "  'upon',\n",
       "  'against',\n",
       "  'of',\n",
       "  'from',\n",
       "  'under',\n",
       "  'of',\n",
       "  'that',\n",
       "  'like',\n",
       "  'BUT',\n",
       "  'SO',\n",
       "  'of',\n",
       "  'after',\n",
       "  'because',\n",
       "  'for',\n",
       "  'under',\n",
       "  'upon',\n",
       "  'into',\n",
       "  'of',\n",
       "  'of',\n",
       "  'before',\n",
       "  'with'],\n",
       " 'DT': ['a',\n",
       "  'a',\n",
       "  'a',\n",
       "  'a',\n",
       "  'the',\n",
       "  'any',\n",
       "  'the',\n",
       "  'a',\n",
       "  'a',\n",
       "  'a',\n",
       "  'a',\n",
       "  'some',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'no',\n",
       "  'the',\n",
       "  'a',\n",
       "  'a',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'a',\n",
       "  'a',\n",
       "  'A',\n",
       "  'no',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'THE',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'all',\n",
       "  'the',\n",
       "  'a',\n",
       "  'The',\n",
       "  'a',\n",
       "  'the',\n",
       "  'another',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'some',\n",
       "  'every',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'The',\n",
       "  'all',\n",
       "  'the',\n",
       "  'no',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'no',\n",
       "  'the',\n",
       "  'another',\n",
       "  'some',\n",
       "  'the',\n",
       "  'those',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'a',\n",
       "  'a',\n",
       "  'The',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'another',\n",
       "  'a',\n",
       "  'the',\n",
       "  'The',\n",
       "  'the',\n",
       "  'the',\n",
       "  'a',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'neither',\n",
       "  'any',\n",
       "  'THE',\n",
       "  'the',\n",
       "  'the',\n",
       "  'the',\n",
       "  'some',\n",
       "  'some',\n",
       "  'THE',\n",
       "  'the',\n",
       "  'a',\n",
       "  'a',\n",
       "  'the',\n",
       "  'a',\n",
       "  'that',\n",
       "  'the',\n",
       "  'the',\n",
       "  'all',\n",
       "  'a',\n",
       "  'the',\n",
       "  'the',\n",
       "  'every',\n",
       "  'the',\n",
       "  'THE'],\n",
       " 'NN': ['time',\n",
       "  \"doll's-house\",\n",
       "  'brick',\n",
       "  'muslin',\n",
       "  'door',\n",
       "  'chimney',\n",
       "  'cooking',\n",
       "  'dinner',\n",
       "  'ready-made',\n",
       "  'box',\n",
       "  'ham',\n",
       "  'fish',\n",
       "  'pudding',\n",
       "  'morning',\n",
       "  'drive',\n",
       "  \"doll's\",\n",
       "  'perambulator',\n",
       "  'one',\n",
       "  'nursery',\n",
       "  'scuffling',\n",
       "  'noise',\n",
       "  'corner',\n",
       "  'fire-place',\n",
       "  'hole',\n",
       "  'skirting-board',\n",
       "  'head',\n",
       "  'moment',\n",
       "  'mouse',\n",
       "  'wife',\n",
       "  'head',\n",
       "  'one',\n",
       "  'nursery',\n",
       "  'oilcloth',\n",
       "  'coal-box',\n",
       "  'stood',\n",
       "  'side',\n",
       "  'fire-place',\n",
       "  'hearthrug',\n",
       "  'door',\n",
       "  'dining-room',\n",
       "  'joy',\n",
       "  'dinner',\n",
       "  'table',\n",
       "  'convenient',\n",
       "  'ham',\n",
       "  'yellow',\n",
       "  'knife',\n",
       "  'finger',\n",
       "  'mouth',\n",
       "  'try',\n",
       "  'chair',\n",
       "  'ham',\n",
       "  'knife',\n",
       "  'hams',\n",
       "  'cheesemonger',\n",
       "  'ham',\n",
       "  'plate',\n",
       "  'jerk',\n",
       "  'table',\n",
       "  'fish',\n",
       "  'tin',\n",
       "  'spoon',\n",
       "  'turn',\n",
       "  'fish',\n",
       "  'dish',\n",
       "  'temper',\n",
       "  'ham',\n",
       "  'middle',\n",
       "  'floor',\n",
       "  'shovel',\n",
       "  'bang',\n",
       "  'bang',\n",
       "  'smash',\n",
       "  'smash',\n",
       "  'ham',\n",
       "  'paint',\n",
       "  'nothing',\n",
       "  'plaster',\n",
       "  'end',\n",
       "  'rage',\n",
       "  'disappointment',\n",
       "  'pudding',\n",
       "  'fish',\n",
       "  'plate',\n",
       "  'paper',\n",
       "  'fire',\n",
       "  'kitchen',\n",
       "  'kitchen',\n",
       "  'chimney',\n",
       "  'soot',\n",
       "  'chimney',\n",
       "  'disappointment',\n",
       "  'dresser',\n",
       "  'nothing',\n",
       "  'mischief',\n",
       "  'chest',\n",
       "  'bedroom',\n",
       "  'floor',\n",
       "  'window',\n",
       "  'mind',\n",
       "  'bolster',\n",
       "  'want',\n",
       "  'feather',\n",
       "  'bed',\n",
       "  'assistance',\n",
       "  'bolster',\n",
       "  'downstairs',\n",
       "  'hearth-rug',\n",
       "  'bolster',\n",
       "  'chair',\n",
       "  'book-case',\n",
       "  'bird-cage',\n",
       "  'book-case',\n",
       "  'bird-cage',\n",
       "  'coal-box',\n",
       "  'cradle',\n",
       "  'chair',\n",
       "  'noise',\n",
       "  'landing',\n",
       "  'mice',\n",
       "  'hole',\n",
       "  'nursery',\n",
       "  'sight',\n",
       "  'kitchen',\n",
       "  'stove',\n",
       "  'leant',\n",
       "  'kitchen',\n",
       "  'dresser',\n",
       "  'remark',\n",
       "  'book-case',\n",
       "  'bird-cage',\n",
       "  'coal-box',\n",
       "  'cradle',\n",
       "  'girl',\n",
       "  \"doll's-house\",\n",
       "  'doll',\n",
       "  'policeman',\n",
       "  'nurse',\n",
       "  'mouse-trap',\n",
       "  'story',\n",
       "  'everything',\n",
       "  'sixpence',\n",
       "  'hearthrug',\n",
       "  'morning',\n",
       "  'anybody',\n",
       "  'dust-pan',\n",
       "  'broom',\n",
       "  'house',\n",
       "  'END'],\n",
       " 'EX': ['there',\n",
       "  'THERE',\n",
       "  'There',\n",
       "  'there',\n",
       "  'there',\n",
       "  'there',\n",
       "  'There',\n",
       "  'there',\n",
       "  'there',\n",
       "  'there',\n",
       "  'there'],\n",
       " 'VBD': ['was',\n",
       "  'was',\n",
       "  'had',\n",
       "  'belonged',\n",
       "  'called',\n",
       "  'belonged',\n",
       "  'ordered',\n",
       "  'was',\n",
       "  'did',\n",
       "  'had',\n",
       "  'were',\n",
       "  'were',\n",
       "  'had',\n",
       "  'was',\n",
       "  'was',\n",
       "  'was',\n",
       "  'was',\n",
       "  'put',\n",
       "  'popped',\n",
       "  'was',\n",
       "  'put',\n",
       "  'saw',\n",
       "  'was',\n",
       "  'ventured',\n",
       "  'went',\n",
       "  'pushed',\n",
       "  'was',\n",
       "  'went',\n",
       "  'peeped',\n",
       "  'squeaked',\n",
       "  'was',\n",
       "  'were',\n",
       "  'set',\n",
       "  'was',\n",
       "  'crumpled',\n",
       "  'put',\n",
       "  'stood',\n",
       "  'chopped',\n",
       "  'said',\n",
       "  'broke',\n",
       "  'rolled',\n",
       "  'said',\n",
       "  'tried',\n",
       "  'was',\n",
       "  'lost',\n",
       "  'put',\n",
       "  'hit',\n",
       "  'flew',\n",
       "  'was',\n",
       "  'was',\n",
       "  'broke',\n",
       "  'put',\n",
       "  'went',\n",
       "  'looked',\n",
       "  'was',\n",
       "  'was',\n",
       "  'had',\n",
       "  'found',\n",
       "  'turned',\n",
       "  'was',\n",
       "  'set',\n",
       "  'took',\n",
       "  'threw',\n",
       "  'had',\n",
       "  'remembered',\n",
       "  'was',\n",
       "  'carried',\n",
       "  'was',\n",
       "  'managed',\n",
       "  'went',\n",
       "  'fetched',\n",
       "  'refused',\n",
       "  'left',\n",
       "  'went',\n",
       "  'was',\n",
       "  'was',\n",
       "  'rushed',\n",
       "  'came',\n",
       "  'met',\n",
       "  'sat',\n",
       "  'stared',\n",
       "  'made',\n",
       "  'were',\n",
       "  'got',\n",
       "  'belonged',\n",
       "  'said',\n",
       "  'said',\n",
       "  'were',\n",
       "  'paid',\n",
       "  'broke',\n",
       "  'found',\n",
       "  'stuffed'],\n",
       " 'JJ': ['beautiful',\n",
       "  'red',\n",
       "  'white',\n",
       "  'real',\n",
       "  'front',\n",
       "  'full',\n",
       "  'red',\n",
       "  'beautiful',\n",
       "  'quiet',\n",
       "  'little',\n",
       "  \"doll's-house\",\n",
       "  'other',\n",
       "  'front',\n",
       "  'lovely',\n",
       "  'tin',\n",
       "  'lead',\n",
       "  '_so_',\n",
       "  'beautiful',\n",
       "  'shiny',\n",
       "  'red',\n",
       "  'hard',\n",
       "  'lead',\n",
       "  'hard',\n",
       "  'shiny',\n",
       "  'red-hot',\n",
       "  'crinkly',\n",
       "  'top',\n",
       "  'tiny',\n",
       "  'red',\n",
       "  'blue',\n",
       "  'top',\n",
       "  'frugal',\n",
       "  'difficult',\n",
       "  'mouse-hole',\n",
       "  'several',\n",
       "  'small',\n",
       "  'mouse-hole',\n",
       "  'outside',\n",
       "  'upset',\n",
       "  'useful',\n",
       "  'several',\n",
       "  'other',\n",
       "  'little',\n",
       "  'crooked',\n",
       "  'early',\n",
       "  'awake'],\n",
       " ':': [';',\n",
       "  ';',\n",
       "  ';',\n",
       "  ';',\n",
       "  '--',\n",
       "  '--',\n",
       "  ';',\n",
       "  ';',\n",
       "  ';',\n",
       "  ';',\n",
       "  '--',\n",
       "  ';',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  ';',\n",
       "  ';',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  '--',\n",
       "  ';',\n",
       "  '--',\n",
       "  '--'],\n",
       " 'PRP': ['it',\n",
       "  'it',\n",
       "  'it',\n",
       "  'she',\n",
       "  'she',\n",
       "  'They',\n",
       "  'they',\n",
       "  'it',\n",
       "  'it',\n",
       "  'she',\n",
       "  'she',\n",
       "  'They',\n",
       "  'it',\n",
       "  'they',\n",
       "  'It',\n",
       "  'him',\n",
       "  'he',\n",
       "  'It',\n",
       "  'it',\n",
       "  'You',\n",
       "  'It',\n",
       "  'it',\n",
       "  'me',\n",
       "  'He',\n",
       "  'it',\n",
       "  'it',\n",
       "  'They',\n",
       "  'they',\n",
       "  'it',\n",
       "  'it',\n",
       "  'She',\n",
       "  'she',\n",
       "  'them',\n",
       "  'they',\n",
       "  'He',\n",
       "  'he',\n",
       "  'them',\n",
       "  'she',\n",
       "  'she',\n",
       "  'herself',\n",
       "  'she',\n",
       "  'It',\n",
       "  'they',\n",
       "  'it',\n",
       "  'them',\n",
       "  'them',\n",
       "  'I',\n",
       "  'I',\n",
       "  'they',\n",
       "  'he',\n",
       "  'He',\n",
       "  'he',\n",
       "  'it'],\n",
       " 'NNS': ['windows',\n",
       "  'curtains',\n",
       "  'meals',\n",
       "  'shavings',\n",
       "  'lobsters',\n",
       "  'pears',\n",
       "  'oranges',\n",
       "  'plates',\n",
       "  'afterwards',\n",
       "  'upstairs',\n",
       "  'spoons',\n",
       "  'knives',\n",
       "  'forks',\n",
       "  'dolly-chairs',\n",
       "  'tongs',\n",
       "  'pieces',\n",
       "  'lobsters',\n",
       "  'pears',\n",
       "  'oranges',\n",
       "  'canisters',\n",
       "  'beads',\n",
       "  'mice',\n",
       "  'clothes',\n",
       "  'drawers',\n",
       "  'feathers',\n",
       "  'odds',\n",
       "  'ends',\n",
       "  'dolls',\n",
       "  'eyes',\n",
       "  'clothes',\n",
       "  'pots',\n",
       "  'pans',\n",
       "  'things',\n",
       "  'stockings'],\n",
       " ',': [',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ',',\n",
       "  ','],\n",
       " 'CC': ['and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'But',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'but',\n",
       "  'and',\n",
       "  'and',\n",
       "  'and',\n",
       "  'AND',\n",
       "  'and'],\n",
       " '.': ['.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '!',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '!',\n",
       "  '.',\n",
       "  '.',\n",
       "  '!',\n",
       "  '.'],\n",
       " 'NNP': ['IT',\n",
       "  'Dolls',\n",
       "  'Lucinda',\n",
       "  'Jane',\n",
       "  'Lucinda',\n",
       "  'Jane',\n",
       "  'Cook',\n",
       "  'Lucinda',\n",
       "  'Jane',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'MINUTE',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'TOM',\n",
       "  'THUMB',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'TOM',\n",
       "  'THUMB',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'HUNCA',\n",
       "  'MUNCA',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'THE',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'HUNCA',\n",
       "  'MUNCA',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'THEN',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'TOM',\n",
       "  'THUMB',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'Rice',\n",
       "  'Coffee',\n",
       "  'Sago',\n",
       "  'THEN',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Jane',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'Lucinda',\n",
       "  'WITH',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'THEN',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'HUNCA',\n",
       "  'MUNCA',\n",
       "  'HUNCA',\n",
       "  'MUNCA',\n",
       "  'Jane',\n",
       "  'Lucinda',\n",
       "  'Lucinda',\n",
       "  'Jane',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  \"Lucinda's\",\n",
       "  'SHE',\n",
       "  'Bad',\n",
       "  'Mice',\n",
       "  'Tom',\n",
       "  'Thumb',\n",
       "  'Christmas',\n",
       "  'Eve',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'Lucinda',\n",
       "  'Jane',\n",
       "  'Hunca',\n",
       "  'Munca',\n",
       "  'Dollies'],\n",
       " 'TO': ['to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to',\n",
       "  'to'],\n",
       " 'CD': ['two', 'two', 'ONE', 'two', 'two', 'one'],\n",
       " 'JJS': ['least'],\n",
       " 'VBN': ['been',\n",
       "  'bought',\n",
       "  'gone',\n",
       "  'laid',\n",
       "  'streaked',\n",
       "  'boiled',\n",
       "  'glued',\n",
       "  'made',\n",
       "  'labelled',\n",
       "  'smiled',\n",
       "  'rescued',\n",
       "  'dressed'],\n",
       " 'MD': ['would', 'would', 'would', 'could', 'will', 'will'],\n",
       " 'VB': ['come',\n",
       "  'work',\n",
       "  'carve',\n",
       "  'hurt',\n",
       "  'Let',\n",
       "  'give',\n",
       "  'come',\n",
       "  'burn',\n",
       "  'work',\n",
       "  'do',\n",
       "  'squeeze',\n",
       "  'go',\n",
       "  'fetch',\n",
       "  'get',\n",
       "  'set',\n",
       "  'sweep'],\n",
       " 'RP': ['off',\n",
       "  'out',\n",
       "  'out',\n",
       "  'out',\n",
       "  'out',\n",
       "  'up',\n",
       "  'off',\n",
       "  'up',\n",
       "  'off',\n",
       "  'up',\n",
       "  'out',\n",
       "  'up'],\n",
       " 'VBG': ['scratching', 'pulling', 'returning', 'talking'],\n",
       " 'WRB': ['where', 'when', 'when', 'when'],\n",
       " 'PRP$': ['his',\n",
       "  'his',\n",
       "  'her',\n",
       "  'his',\n",
       "  'his',\n",
       "  'her',\n",
       "  'his',\n",
       "  'her',\n",
       "  'their',\n",
       "  'her',\n",
       "  'her'],\n",
       " 'PDT': ['Such', 'all', 'half'],\n",
       " '``': ['``', '``', '``', '``'],\n",
       " 'VBZ': ['is', 'is', \"'s\", 'has', 'has', 'is', 'is', 'comes'],\n",
       " 'VBP': ['have'],\n",
       " \"''\": [\"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\"],\n",
       " 'POS': [\"'s\", \"'s\", \"'s\", \"'s\", \"'\"],\n",
       " 'WP': ['WHAT']}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "index"
   ]
  },
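  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the index looks like on a small scale, here is a sketch that builds one from the hand-tagged Peter Rabbit lines in the introduction instead of from source.txt. The `(word, tag)` pairs are typed in by hand from that example rather than produced by `nltk.pos_tag`, and `example_pos` and `example_index` are names invented for this illustration so that `pos` and `index` are not overwritten. `dict.setdefault` is used as a compact alternative to the `if tag not in index` test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# (word, tag) pairs copied by hand from the worked example in the introduction.\n",
    "example_pos = [\n",
    "    ('Once', 'RB'), ('upon', 'IN'), ('a', 'DT'), ('time', 'NN'),\n",
    "    ('there', 'EX'), ('were', 'VBD'), ('four', 'CD'), ('little', 'JJ'),\n",
    "    ('Rabbits', 'NNP'), (',', ','), ('and', 'CC'), ('their', 'PRP$'),\n",
    "    ('names', 'NNS'), ('were', 'VBD'), ('--', ':'),\n",
    "    ('Flopsy', 'NNP'), (',', ','), ('Mopsy', 'NNP'), (',', ','),\n",
    "    ('Cotton-tail', 'NNP'), (',', ','), ('and', 'CC'), ('Peter', 'NNP'), ('.', '.'),\n",
    "    ('They', 'PRP'), ('lived', 'VBD'), ('with', 'IN'), ('their', 'PRP$'),\n",
    "    ('Mother', 'NN'), ('in', 'IN'), ('a', 'DT'), ('sand-bank', 'JJ'), (',', ','),\n",
    "    ('underneath', 'IN'), ('the', 'DT'), ('root', 'NN'), ('of', 'IN'), ('a', 'DT'),\n",
    "    ('very', 'RB'), ('big', 'JJ'), ('fir-tree', 'NN'), ('.', '.'),\n",
    "]\n",
    "\n",
    "example_index = {}\n",
    "for word, tag in example_pos:\n",
    "    # setdefault returns the list stored under tag, creating an empty one first if needed\n",
    "    example_index.setdefault(tag, []).append(word)\n",
    "\n",
    "print(example_index['DT'])\n",
    "print(example_index['JJ'])"
   ]
  },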
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Transform some input using the index\n",
    "Use a *new* list to assemble a new sentence. Use string.join to produce the final text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      " I have a question: What is the nature of love?\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'It have the plate ; WHAT comes a bang under hams !'"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "i = input()\n",
    "tokens = nltk.word_tokenize(i)\n",
    "pos = nltk.pos_tag(tokens)\n",
    "new = []\n",
    "for word, tag in pos:\n",
    "    # print (word,tag)\n",
    "    # replace word with a random choice from the \"hat\" of words for the tag\n",
    "    if tag not in index:\n",
    "        # print (\"no replacement\")\n",
    "        new.append(word)\n",
    "    else:\n",
    "        newword = choice(index[tag])\n",
    "        new.append(newword)\n",
    "        # print (\"replace with\", newword)\n",
    "print (' '.join(new))"
   ]
  },
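  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above can also be wrapped in a function so that it can be reused, for instance by a bot replying to messages. What follows is a minimal sketch under some assumptions: the function name `parody`, the small hand-written index and the already-tagged input are all made up for illustration, so the cell runs without nltk data or source.txt. The optional `rng` parameter is an addition that makes the random choices repeatable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def parody(tagged_words, pos_index, rng=random):\n",
    "    '''Replace each (word, tag) pair with a random word filed under the same tag.'''\n",
    "    new_words = []\n",
    "    for word, tag in tagged_words:\n",
    "        candidates = pos_index.get(tag)\n",
    "        if candidates:\n",
    "            new_words.append(rng.choice(candidates))\n",
    "        else:\n",
    "            # no word with this tag in the index: keep the original word\n",
    "            new_words.append(word)\n",
    "    return ' '.join(new_words)\n",
    "\n",
    "# Small hand-written index based on the introduction's example (not read from source.txt).\n",
    "small_index = {\n",
    "    'DT': ['a', 'a', 'the', 'a'],\n",
    "    'JJ': ['little', 'sand-bank', 'big'],\n",
    "    'NN': ['time', 'Mother', 'root', 'fir-tree'],\n",
    "    'IN': ['upon', 'with', 'in', 'underneath', 'of'],\n",
    "    '.': ['.', '.'],\n",
    "}\n",
    "tagged_input = [('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'),\n",
    "                ('in', 'IN'), ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.')]\n",
    "\n",
    "# A seeded random.Random instance gives the same choices on every run.\n",
    "print(parody(tagged_input, small_index, rng=random.Random(0)))"
   ]
  },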
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}