{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# \"Beatrix Botter\"\n",
"\n",
"[An's original code](https://gitlab.constantvzw.org/death-of-the-authors/1943/-/blob/master/bots/beatrixbotter_parody.py) used the pattern library, but it's possible to implement the same technique using just nltk. It relies on two key functions from nltk: `word_tokenize` and `pos_tag`.\n",
"\n",
"### The \"parody algorithm\"\n",
"\n",
"The essence of the \"parody algorithm\" is to translate an input text by replacing its words with randomly chosen words from a \"source\" text, choosing only words that have the *same part of speech* according to nltk's `pos_tag` function. For example, consider the first two sentences of Peter Rabbit as a source:\n",
"\n",
"    Once upon a time there were four little Rabbits, and their names were--\n",
"    Flopsy, Mopsy, Cotton-tail, and Peter.\n",
"\n",
"    They lived with their Mother in a sand-bank, underneath the root of a\n",
"    very big fir-tree.\n",
"\n",
"Then consider the input text to transform:\n",
"\n",
"    The blue pen is in the top drawer.\n",
"\n",
"Applying word tokenization and part-of-speech tagging to the source gives the following tokens, each with its tag underneath:\n",
"\n",
"    Once upon a  time there were four little Rabbits , and their names were --\n",
"    RB   IN   DT NN   EX    VBD  CD   JJ     NNP     , CC  PRP$  NNS   VBD  :\n",
"\n",
"    Flopsy , Mopsy , Cotton-tail , and Peter .\n",
"    NNP    , NNP   , NNP         , CC  NNP   .\n",
"\n",
"    They lived with their Mother in a  sand-bank , underneath the root of a\n",
"    PRP  VBD   IN   PRP$  NN     IN DT JJ        , IN         DT  NN   IN DT\n",
"\n",
"    very big fir-tree .\n",
"    RB   JJ  NN       .\n",
"\n",
"and to the input text:\n",
"\n",
"    The blue pen is  in the top drawer .\n",
"    DT  JJ   NN  VBZ IN DT  JJ  NN     .\n",
"\n",
"To transform the input text, we take each word in turn, look in the source for words with the same part of speech, and replace the input word with one of them. For instance, starting with \"The\", the part of speech is \"DT\" (determiner). Looking in the source text, the words tagged DT are: a, a, the, a. So we pick one at random: a. Next, for the word \"blue\" we search the source for all words tagged \"JJ\" (adjective): little, sand-bank (which the tagger labels an adjective), big. We pick \"little\". When we get to \"is\" (tagged VBZ), there's no match in the source, so we just keep the original word. Following these rules, we can produce the new text:\n",
"\n",
"    a  little time is  upon the sand-bank Mother .\n",
"    DT JJ     NN   VBZ IN   DT  JJ        NN     ."
]
},
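{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup step of the walkthrough above can be checked with a few lines of plain Python. This is a minimal sketch, not An's original code: the `(word, tag)` pairs are copied by hand from the tagged example above rather than produced by nltk (so the cell needs no nltk data), and the helper name `candidates` is made up for illustration. For each input word it lists the source words that share its tag, or notes that the word is kept when there is no match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Tagged tokens copied by hand from the example above; this is an assumption\n",
"# of the sketch, so no nltk data is needed to run the cell.\n",
"source_tagged = [\n",
"    ('Once', 'RB'), ('upon', 'IN'), ('a', 'DT'), ('time', 'NN'),\n",
"    ('there', 'EX'), ('were', 'VBD'), ('four', 'CD'), ('little', 'JJ'),\n",
"    ('Rabbits', 'NNP'), (',', ','), ('and', 'CC'), ('their', 'PRP$'),\n",
"    ('names', 'NNS'), ('were', 'VBD'), ('--', ':'),\n",
"    ('Flopsy', 'NNP'), (',', ','), ('Mopsy', 'NNP'), (',', ','),\n",
"    ('Cotton-tail', 'NNP'), (',', ','), ('and', 'CC'), ('Peter', 'NNP'), ('.', '.'),\n",
"    ('They', 'PRP'), ('lived', 'VBD'), ('with', 'IN'), ('their', 'PRP$'),\n",
"    ('Mother', 'NN'), ('in', 'IN'), ('a', 'DT'), ('sand-bank', 'JJ'), (',', ','),\n",
"    ('underneath', 'IN'), ('the', 'DT'), ('root', 'NN'), ('of', 'IN'), ('a', 'DT'),\n",
"    ('very', 'RB'), ('big', 'JJ'), ('fir-tree', 'NN'), ('.', '.'),\n",
"]\n",
"input_tagged = [\n",
"    ('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'), ('in', 'IN'),\n",
"    ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.'),\n",
"]\n",
"\n",
"\n",
"def candidates(tag, tagged_source):\n",
"    # Hypothetical helper: every source word carrying this tag, in source order.\n",
"    return [w for w, t in tagged_source if t == tag]\n",
"\n",
"\n",
"for word, tag in input_tagged:\n",
"    options = candidates(tag, source_tagged)\n",
"    if options:\n",
"        print(word, tag, '->', ', '.join(options))\n",
"    else:\n",
"        print(word, tag, '-> no match, keep', repr(word))"
]
},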
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Doing part-of-speech tagging on a text\n",
"See: [Chapter 5: Categorizing and Tagging Words](http://www.nltk.org/book_1ed/ch05.html) in the NLTK book."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import nltk"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"t = \"\"\"The blue pen is in the top drawer.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"tt = nltk.word_tokenize(t)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"tagged = nltk.pos_tag(tt)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.')]\n"
]
}
],
"source": [
"print(tagged)"
]
},
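{
"cell_type": "markdown",
"metadata": {},
"source": [
"To reproduce the two-row layout used in the introduction (each tag printed under its token), a small formatting helper is enough. This is a sketch: `align_tags` is a hypothetical name, not an nltk function, and it simply pads each column to the longer of the word and its tag. Passing the `tagged` list from the cells above should give the same two rows as the literal list used here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper (not part of nltk): lay out (word, tag) pairs as two\n",
"# aligned rows, padding each column to the longer of the word and its tag.\n",
"def align_tags(pairs):\n",
"    widths = [max(len(w), len(t)) for w, t in pairs]\n",
"    words = ' '.join(w.ljust(n) for (w, _), n in zip(pairs, widths))\n",
"    tags = ' '.join(t.ljust(n) for (_, t), n in zip(pairs, widths))\n",
"    return words.rstrip(), tags.rstrip()\n",
"\n",
"\n",
"example = [('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ'), ('in', 'IN'),\n",
"           ('the', 'DT'), ('top', 'JJ'), ('drawer', 'NN'), ('.', '.')]\n",
"for row in align_tags(example):  # align_tags(tagged) should work the same way\n",
"    print(row)"
]
},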
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Counting words\n",
"\n",
"Recall the following code for counting the words in a text. It creates an empty dictionary called *counts* to store the count of each word. The text is stripped of surrounding whitespace and split into a list of words. The `for` loop then iterates over this list, assigning each item in turn to the variable *word*. The `if` statement checks whether the word is already in the dictionary and, when it is *not*, initializes its count to 0. Finally, `counts[word]` is incremented."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"text = \"\"\"\n",
"this is a simple sentence . and this is another sentence .\n",
"\"\"\"\n",
"counts = {}\n",
"for word in text.strip().split():\n",
"    if word not in counts:\n",
"        counts[word] = 0\n",
"    counts[word] += 1"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'this': 2, 'is': 2, 'a': 1, 'simple': 1, 'sentence': 2, '.': 2, 'and': 1, 'another': 1}\n"
]
}
],
"source": [
"print(counts)"
]
},
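{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same counts can be produced with the standard library. This sketch assumes the same whitespace-split tokens as the loop above: `collections.Counter` builds the word-to-count mapping in one step, and `dict.get` with a default of 0 removes the need for the separate `if` check. The names `sample`, `word_counts` and `plain_counts` are chosen here only so the cell does not overwrite `text` or `counts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Illustrative names, chosen so this cell leaves text and counts untouched.\n",
"sample = 'this is a simple sentence . and this is another sentence .'\n",
"\n",
"# Counter does the 'initialize to 0, then increment' bookkeeping itself.\n",
"word_counts = Counter(sample.split())\n",
"print(word_counts.most_common(3))\n",
"\n",
"# dict.get(w, 0) returns 0 for a word that has not been seen yet.\n",
"plain_counts = {}\n",
"for w in sample.split():\n",
"    plain_counts[w] = plain_counts.get(w, 0) + 1\n",
"print(plain_counts == dict(word_counts))"
]
},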
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Create the index\n",
"This is a variation on the word-counting code: rather than counting each word, use the part-of-speech tags as the keys of the dictionary, and append each word with that tag to a list. *NB: The code assumes there is a file named source.txt containing your source text.*"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /home/murtaugh/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n",
"[nltk_data] Downloading package averaged_perceptron_tagger to\n",
"[nltk_data] /home/murtaugh/nltk_data...\n",
"[nltk_data] Package averaged_perceptron_tagger is already up-to-\n",
"[nltk_data] date!\n"
]
},
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: 'source.txt'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-13-eccf574101ac>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdownload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'averaged_perceptron_tagger'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0msource\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"source.txt\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mtokens\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mword_tokenize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msource\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mpos\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnltk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpos_tag\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtokens\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'source.txt'"
]
}
],
"source": [
"import nltk\n",
"\n",
"nltk.download('punkt')\n",
"nltk.download('averaged_perceptron_tagger')\n",
"\n",
"source = open(\"source.txt\").read()\n",
"tokens = nltk.word_tokenize(source)\n",
"pos = nltk.pos_tag(tokens)\n",
"index = {}\n",
"for word, tag in pos:\n",
"    # print (word, \"is\", tag)\n",
"    if tag not in index:\n",
"        index[tag] = []\n",
"    index[tag].append(word)"
]
},
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"metadata": {
|
||
|
"collapsed": true,
|
||
|
"jupyter": {
|
||
|
"outputs_hidden": true
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"{'RB': ['ONCE',\n",
|
||
|
" 'very',\n",
|
||
|
" 'never',\n",
|
||
|
" 'never',\n",
|
||
|
" 'not',\n",
|
||
|
" 'extremely',\n",
|
||
|
" 'very',\n",
|
||
|
" 'Presently',\n",
|
||
|
" 'then',\n",
|
||
|
" 'again',\n",
|
||
|
" 'too',\n",
|
||
|
" 'cautiously',\n",
|
||
|
" 'not',\n",
|
||
|
" 'fast',\n",
|
||
|
" 'Then',\n",
|
||
|
" 'once',\n",
|
||
|
" 'up',\n",
|
||
|
" 'not',\n",
|
||
|
" 'enough',\n",
|
||
|
" 'as',\n",
|
||
|
" 'alone',\n",
|
||
|
" 'Then',\n",
|
||
|
" 'not',\n",
|
||
|
" 'not',\n",
|
||
|
" 'either',\n",
|
||
|
" 'upside',\n",
|
||
|
" 'down',\n",
|
||
|
" 'inside',\n",
|
||
|
" 'especially',\n",
|
||
|
" 'somehow',\n",
|
||
|
" 'back',\n",
|
||
|
" 'just',\n",
|
||
|
" 'suddenly',\n",
|
||
|
" 'back',\n",
|
||
|
" 'also',\n",
|
||
|
" 'not',\n",
|
||
|
" 'so',\n",
|
||
|
" 'very',\n",
|
||
|
" 'very',\n",
|
||
|
" 'naughty',\n",
|
||
|
" 'very'],\n",
|
||
|
" 'IN': ['upon',\n",
|
||
|
" 'with',\n",
|
||
|
" 'at',\n",
|
||
|
" 'because',\n",
|
||
|
" 'in',\n",
|
||
|
" 'of',\n",
|
||
|
" 'for',\n",
|
||
|
" 'in',\n",
|
||
|
" 'in',\n",
|
||
|
" 'in',\n",
|
||
|
" 'near',\n",
|
||
|
" 'under',\n",
|
||
|
" 'for',\n",
|
||
|
" 'in',\n",
|
||
|
" 'out',\n",
|
||
|
" 'that',\n",
|
||
|
" 'in',\n",
|
||
|
" 'on',\n",
|
||
|
" 'under',\n",
|
||
|
" 'at',\n",
|
||
|
" 'of',\n",
|
||
|
" 'across',\n",
|
||
|
" 'into',\n",
|
||
|
" 'with',\n",
|
||
|
" 'upon',\n",
|
||
|
" 'at',\n",
|
||
|
" 'with',\n",
|
||
|
" 'in',\n",
|
||
|
" 'in',\n",
|
||
|
" 'at',\n",
|
||
|
" 'with',\n",
|
||
|
" 'as',\n",
|
||
|
" 'at',\n",
|
||
|
" 'with',\n",
|
||
|
" 'under',\n",
|
||
|
" 'in',\n",
|
||
|
" 'in',\n",
|
||
|
" 'of',\n",
|
||
|
" 'with',\n",
|
||
|
" 'with',\n",
|
||
|
" 'into',\n",
|
||
|
" 'for',\n",
|
||
|
" 'underneath',\n",
|
||
|
" 'of',\n",
|
||
|
" 'of',\n",
|
||
|
" 'As',\n",
|
||
|
" 'into',\n",
|
||
|
" 'in',\n",
|
||
|
" 'at',\n",
|
||
|
" 'WHILE',\n",
|
||
|
" 'upon',\n",
|
||
|
" 'except',\n",
|
||
|
" 'out',\n",
|
||
|
" 'of',\n",
|
||
|
" 'of',\n",
|
||
|
" 'in',\n",
|
||
|
" 'out',\n",
|
||
|
" 'of',\n",
|
||
|
" 'After',\n",
|
||
|
" 'out',\n",
|
||
|
" 'of',\n",
|
||
|
" 'that',\n",
|
||
|
" 'in',\n",
|
||
|
" 'of',\n",
|
||
|
" 'across',\n",
|
||
|
" 'into',\n",
|
||
|
" 'into',\n",
|
||
|
" 'behind',\n",
|
||
|
" 'with',\n",
|
||
|
" 'of',\n",
|
||
|
" 'upon',\n",
|
||
|
" 'into',\n",
|
||
|
" 'of',\n",
|
||
|
" 'upon',\n",
|
||
|
" 'against',\n",
|
||
|
" 'of',\n",
|
||
|
" 'from',\n",
|
||
|
" 'under',\n",
|
||
|
" 'of',\n",
|
||
|
" 'that',\n",
|
||
|
" 'like',\n",
|
||
|
" 'BUT',\n",
|
||
|
" 'SO',\n",
|
||
|
" 'of',\n",
|
||
|
" 'after',\n",
|
||
|
" 'because',\n",
|
||
|
" 'for',\n",
|
||
|
" 'under',\n",
|
||
|
" 'upon',\n",
|
||
|
" 'into',\n",
|
||
|
" 'of',\n",
|
||
|
" 'of',\n",
|
||
|
" 'before',\n",
|
||
|
" 'with'],\n",
|
||
|
" 'DT': ['a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'any',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'some',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'no',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'A',\n",
|
||
|
" 'no',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'THE',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'all',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'The',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'another',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'some',\n",
|
||
|
" 'every',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'The',\n",
|
||
|
" 'all',\n",
|
||
|
" 'the',\n",
|
||
|
" 'no',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'no',\n",
|
||
|
" 'the',\n",
|
||
|
" 'another',\n",
|
||
|
" 'some',\n",
|
||
|
" 'the',\n",
|
||
|
" 'those',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'The',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'another',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'The',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'neither',\n",
|
||
|
" 'any',\n",
|
||
|
" 'THE',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'some',\n",
|
||
|
" 'some',\n",
|
||
|
" 'THE',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'a',\n",
|
||
|
" 'that',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'all',\n",
|
||
|
" 'a',\n",
|
||
|
" 'the',\n",
|
||
|
" 'the',\n",
|
||
|
" 'every',\n",
|
||
|
" 'the',\n",
|
||
|
" 'THE'],\n",
|
||
|
" 'NN': ['time',\n",
|
||
|
" \"doll's-house\",\n",
|
||
|
" 'brick',\n",
|
||
|
" 'muslin',\n",
|
||
|
" 'door',\n",
|
||
|
" 'chimney',\n",
|
||
|
" 'cooking',\n",
|
||
|
" 'dinner',\n",
|
||
|
" 'ready-made',\n",
|
||
|
" 'box',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'fish',\n",
|
||
|
" 'pudding',\n",
|
||
|
" 'morning',\n",
|
||
|
" 'drive',\n",
|
||
|
" \"doll's\",\n",
|
||
|
" 'perambulator',\n",
|
||
|
" 'one',\n",
|
||
|
" 'nursery',\n",
|
||
|
" 'scuffling',\n",
|
||
|
" 'noise',\n",
|
||
|
" 'corner',\n",
|
||
|
" 'fire-place',\n",
|
||
|
" 'hole',\n",
|
||
|
" 'skirting-board',\n",
|
||
|
" 'head',\n",
|
||
|
" 'moment',\n",
|
||
|
" 'mouse',\n",
|
||
|
" 'wife',\n",
|
||
|
" 'head',\n",
|
||
|
" 'one',\n",
|
||
|
" 'nursery',\n",
|
||
|
" 'oilcloth',\n",
|
||
|
" 'coal-box',\n",
|
||
|
" 'stood',\n",
|
||
|
" 'side',\n",
|
||
|
" 'fire-place',\n",
|
||
|
" 'hearthrug',\n",
|
||
|
" 'door',\n",
|
||
|
" 'dining-room',\n",
|
||
|
" 'joy',\n",
|
||
|
" 'dinner',\n",
|
||
|
" 'table',\n",
|
||
|
" 'convenient',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'yellow',\n",
|
||
|
" 'knife',\n",
|
||
|
" 'finger',\n",
|
||
|
" 'mouth',\n",
|
||
|
" 'try',\n",
|
||
|
" 'chair',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'knife',\n",
|
||
|
" 'hams',\n",
|
||
|
" 'cheesemonger',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'plate',\n",
|
||
|
" 'jerk',\n",
|
||
|
" 'table',\n",
|
||
|
" 'fish',\n",
|
||
|
" 'tin',\n",
|
||
|
" 'spoon',\n",
|
||
|
" 'turn',\n",
|
||
|
" 'fish',\n",
|
||
|
" 'dish',\n",
|
||
|
" 'temper',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'middle',\n",
|
||
|
" 'floor',\n",
|
||
|
" 'shovel',\n",
|
||
|
" 'bang',\n",
|
||
|
" 'bang',\n",
|
||
|
" 'smash',\n",
|
||
|
" 'smash',\n",
|
||
|
" 'ham',\n",
|
||
|
" 'paint',\n",
|
||
|
" 'nothing',\n",
|
||
|
" 'plaster',\n",
|
||
|
" 'end',\n",
|
||
|
" 'rage',\n",
|
||
|
" 'disappointment',\n",
|
||
|
" 'pudding',\n",
|
||
|
" 'fish',\n",
|
||
|
" 'plate',\n",
|
||
|
" 'paper',\n",
|
||
|
" 'fire',\n",
|
||
|
" 'kitchen',\n",
|
||
|
" 'kitchen',\n",
|
||
|
" 'chimney',\n",
|
||
|
" 'soot',\n",
|
||
|
" 'chimney',\n",
|
||
|
" 'disappointment',\n",
|
||
|
" 'dresser',\n",
|
||
|
" 'nothing',\n",
|
||
|
" 'mischief',\n",
|
||
|
" 'chest',\n",
|
||
|
" 'bedroom',\n",
|
||
|
" 'floor',\n",
|
||
|
" 'window',\n",
|
||
|
" 'mind',\n",
|
||
|
" 'bolster',\n",
|
||
|
" 'want',\n",
|
||
|
" 'feather',\n",
|
||
|
" 'bed',\n",
|
||
|
" 'assistance',\n",
|
||
|
" 'bolster',\n",
|
||
|
" 'downstairs',\n",
|
||
|
" 'hearth-rug',\n",
|
||
|
" 'bolster',\n",
|
||
|
" 'chair',\n",
|
||
|
" 'book-case',\n",
|
||
|
" 'bird-cage',\n",
|
||
|
" 'book-case',\n",
|
||
|
" 'bird-cage',\n",
|
||
|
" 'coal-box',\n",
|
||
|
" 'cradle',\n",
|
||
|
" 'chair',\n",
|
||
|
" 'noise',\n",
|
||
|
" 'landing',\n",
|
||
|
" 'mice',\n",
|
||
|
" 'hole',\n",
|
||
|
" 'nursery',\n",
|
||
|
" 'sight',\n",
|
||
|
" 'kitchen',\n",
|
||
|
" 'stove',\n",
|
||
|
" 'leant',\n",
|
||
|
" 'kitchen',\n",
|
||
|
" 'dresser',\n",
|
||
|
" 'remark',\n",
|
||
|
" 'book-case',\n",
|
||
|
" 'bird-cage',\n",
|
||
|
" 'coal-box',\n",
|
||
|
" 'cradle',\n",
|
||
|
" 'girl',\n",
|
||
|
" \"doll's-house\",\n",
|
||
|
" 'doll',\n",
|
||
|
" 'policeman',\n",
|
||
|
" 'nurse',\n",
|
||
|
" 'mouse-trap',\n",
|
||
|
" 'story',\n",
|
||
|
" 'everything',\n",
|
||
|
" 'sixpence',\n",
|
||
|
" 'hearthrug',\n",
|
||
|
" 'morning',\n",
|
||
|
" 'anybody',\n",
|
||
|
" 'dust-pan',\n",
|
||
|
" 'broom',\n",
|
||
|
" 'house',\n",
|
||
|
" 'END'],\n",
|
||
|
" 'EX': ['there',\n",
|
||
|
" 'THERE',\n",
|
||
|
" 'There',\n",
|
||
|
" 'there',\n",
|
||
|
" 'there',\n",
|
||
|
" 'there',\n",
|
||
|
" 'There',\n",
|
||
|
" 'there',\n",
|
||
|
" 'there',\n",
|
||
|
" 'there',\n",
|
||
|
" 'there'],\n",
|
||
|
" 'VBD': ['was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'had',\n",
|
||
|
" 'belonged',\n",
|
||
|
" 'called',\n",
|
||
|
" 'belonged',\n",
|
||
|
" 'ordered',\n",
|
||
|
" 'was',\n",
|
||
|
" 'did',\n",
|
||
|
" 'had',\n",
|
||
|
" 'were',\n",
|
||
|
" 'were',\n",
|
||
|
" 'had',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'put',\n",
|
||
|
" 'popped',\n",
|
||
|
" 'was',\n",
|
||
|
" 'put',\n",
|
||
|
" 'saw',\n",
|
||
|
" 'was',\n",
|
||
|
" 'ventured',\n",
|
||
|
" 'went',\n",
|
||
|
" 'pushed',\n",
|
||
|
" 'was',\n",
|
||
|
" 'went',\n",
|
||
|
" 'peeped',\n",
|
||
|
" 'squeaked',\n",
|
||
|
" 'was',\n",
|
||
|
" 'were',\n",
|
||
|
" 'set',\n",
|
||
|
" 'was',\n",
|
||
|
" 'crumpled',\n",
|
||
|
" 'put',\n",
|
||
|
" 'stood',\n",
|
||
|
" 'chopped',\n",
|
||
|
" 'said',\n",
|
||
|
" 'broke',\n",
|
||
|
" 'rolled',\n",
|
||
|
" 'said',\n",
|
||
|
" 'tried',\n",
|
||
|
" 'was',\n",
|
||
|
" 'lost',\n",
|
||
|
" 'put',\n",
|
||
|
" 'hit',\n",
|
||
|
" 'flew',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'broke',\n",
|
||
|
" 'put',\n",
|
||
|
" 'went',\n",
|
||
|
" 'looked',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'had',\n",
|
||
|
" 'found',\n",
|
||
|
" 'turned',\n",
|
||
|
" 'was',\n",
|
||
|
" 'set',\n",
|
||
|
" 'took',\n",
|
||
|
" 'threw',\n",
|
||
|
" 'had',\n",
|
||
|
" 'remembered',\n",
|
||
|
" 'was',\n",
|
||
|
" 'carried',\n",
|
||
|
" 'was',\n",
|
||
|
" 'managed',\n",
|
||
|
" 'went',\n",
|
||
|
" 'fetched',\n",
|
||
|
" 'refused',\n",
|
||
|
" 'left',\n",
|
||
|
" 'went',\n",
|
||
|
" 'was',\n",
|
||
|
" 'was',\n",
|
||
|
" 'rushed',\n",
|
||
|
" 'came',\n",
|
||
|
" 'met',\n",
|
||
|
" 'sat',\n",
|
||
|
" 'stared',\n",
|
||
|
" 'made',\n",
|
||
|
" 'were',\n",
|
||
|
" 'got',\n",
|
||
|
" 'belonged',\n",
|
||
|
" 'said',\n",
|
||
|
" 'said',\n",
|
||
|
" 'were',\n",
|
||
|
" 'paid',\n",
|
||
|
" 'broke',\n",
|
||
|
" 'found',\n",
|
||
|
" 'stuffed'],\n",
|
||
|
" 'JJ': ['beautiful',\n",
|
||
|
" 'red',\n",
|
||
|
" 'white',\n",
|
||
|
" 'real',\n",
|
||
|
" 'front',\n",
|
||
|
" 'full',\n",
|
||
|
" 'red',\n",
|
||
|
" 'beautiful',\n",
|
||
|
" 'quiet',\n",
|
||
|
" 'little',\n",
|
||
|
" \"doll's-house\",\n",
|
||
|
" 'other',\n",
|
||
|
" 'front',\n",
|
||
|
" 'lovely',\n",
|
||
|
" 'tin',\n",
|
||
|
" 'lead',\n",
|
||
|
" '_so_',\n",
|
||
|
" 'beautiful',\n",
|
||
|
" 'shiny',\n",
|
||
|
" 'red',\n",
|
||
|
" 'hard',\n",
|
||
|
" 'lead',\n",
|
||
|
" 'hard',\n",
|
||
|
" 'shiny',\n",
|
||
|
" 'red-hot',\n",
|
||
|
" 'crinkly',\n",
|
||
|
" 'top',\n",
|
||
|
" 'tiny',\n",
|
||
|
" 'red',\n",
|
||
|
" 'blue',\n",
|
||
|
" 'top',\n",
|
||
|
" 'frugal',\n",
|
||
|
" 'difficult',\n",
|
||
|
" 'mouse-hole',\n",
|
||
|
" 'several',\n",
|
||
|
" 'small',\n",
|
||
|
" 'mouse-hole',\n",
|
||
|
" 'outside',\n",
|
||
|
" 'upset',\n",
|
||
|
" 'useful',\n",
|
||
|
" 'several',\n",
|
||
|
" 'other',\n",
|
||
|
" 'little',\n",
|
||
|
" 'crooked',\n",
|
||
|
" 'early',\n",
|
||
|
" 'awake'],\n",
|
||
|
" ':': [';',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" '--',\n",
|
||
|
" ';',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" ';',\n",
|
||
|
" ';',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" '--',\n",
|
||
|
" ';',\n",
|
||
|
" '--',\n",
|
||
|
" '--'],\n",
|
||
|
" 'PRP': ['it',\n",
|
||
|
" 'it',\n",
|
||
|
" 'it',\n",
|
||
|
" 'she',\n",
|
||
|
" 'she',\n",
|
||
|
" 'They',\n",
|
||
|
" 'they',\n",
|
||
|
" 'it',\n",
|
||
|
" 'it',\n",
|
||
|
" 'she',\n",
|
||
|
" 'she',\n",
|
||
|
" 'They',\n",
|
||
|
" 'it',\n",
|
||
|
" 'they',\n",
|
||
|
" 'It',\n",
|
||
|
" 'him',\n",
|
||
|
" 'he',\n",
|
||
|
" 'It',\n",
|
||
|
" 'it',\n",
|
||
|
" 'You',\n",
|
||
|
" 'It',\n",
|
||
|
" 'it',\n",
|
||
|
" 'me',\n",
|
||
|
" 'He',\n",
|
||
|
" 'it',\n",
|
||
|
" 'it',\n",
|
||
|
" 'They',\n",
|
||
|
" 'they',\n",
|
||
|
" 'it',\n",
|
||
|
" 'it',\n",
|
||
|
" 'She',\n",
|
||
|
" 'she',\n",
|
||
|
" 'them',\n",
|
||
|
" 'they',\n",
|
||
|
" 'He',\n",
|
||
|
" 'he',\n",
|
||
|
" 'them',\n",
|
||
|
" 'she',\n",
|
||
|
" 'she',\n",
|
||
|
" 'herself',\n",
|
||
|
" 'she',\n",
|
||
|
" 'It',\n",
|
||
|
" 'they',\n",
|
||
|
" 'it',\n",
|
||
|
" 'them',\n",
|
||
|
" 'them',\n",
|
||
|
" 'I',\n",
|
||
|
" 'I',\n",
|
||
|
" 'they',\n",
|
||
|
" 'he',\n",
|
||
|
" 'He',\n",
|
||
|
" 'he',\n",
|
||
|
" 'it'],\n",
|
||
|
" 'NNS': ['windows',\n",
|
||
|
" 'curtains',\n",
|
||
|
" 'meals',\n",
|
||
|
" 'shavings',\n",
|
||
|
" 'lobsters',\n",
|
||
|
" 'pears',\n",
|
||
|
" 'oranges',\n",
|
||
|
" 'plates',\n",
|
||
|
" 'afterwards',\n",
|
||
|
" 'upstairs',\n",
|
||
|
" 'spoons',\n",
|
||
|
" 'knives',\n",
|
||
|
" 'forks',\n",
|
||
|
" 'dolly-chairs',\n",
|
||
|
" 'tongs',\n",
|
||
|
" 'pieces',\n",
|
||
|
" 'lobsters',\n",
|
||
|
" 'pears',\n",
|
||
|
" 'oranges',\n",
|
||
|
" 'canisters',\n",
|
||
|
" 'beads',\n",
|
||
|
" 'mice',\n",
|
||
|
" 'clothes',\n",
|
||
|
" 'drawers',\n",
|
||
|
" 'feathers',\n",
|
||
|
" 'odds',\n",
|
||
|
" 'ends',\n",
|
||
|
" 'dolls',\n",
|
||
|
" 'eyes',\n",
|
||
|
" 'clothes',\n",
|
||
|
" 'pots',\n",
|
||
|
" 'pans',\n",
|
||
|
" 'things',\n",
|
||
|
" 'stockings'],\n",
|
||
|
" ',': [',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ',',\n",
|
||
|
" ','],\n",
|
||
|
" 'CC': ['and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'But',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'but',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'and',\n",
|
||
|
" 'AND',\n",
|
||
|
" 'and'],\n",
|
||
|
" '.': ['.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '!',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '!',\n",
|
||
|
" '.',\n",
|
||
|
" '.',\n",
|
||
|
" '!',\n",
|
||
|
" '.'],\n",
|
||
|
" 'NNP': ['IT',\n",
|
||
|
" 'Dolls',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Cook',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'MINUTE',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'TOM',\n",
|
||
|
" 'THUMB',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'TOM',\n",
|
||
|
" 'THUMB',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'HUNCA',\n",
|
||
|
" 'MUNCA',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'THE',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'HUNCA',\n",
|
||
|
" 'MUNCA',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'THEN',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'TOM',\n",
|
||
|
" 'THUMB',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'Rice',\n",
|
||
|
" 'Coffee',\n",
|
||
|
" 'Sago',\n",
|
||
|
" 'THEN',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'WITH',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'THEN',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'HUNCA',\n",
|
||
|
" 'MUNCA',\n",
|
||
|
" 'HUNCA',\n",
|
||
|
" 'MUNCA',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" \"Lucinda's\",\n",
|
||
|
" 'SHE',\n",
|
||
|
" 'Bad',\n",
|
||
|
" 'Mice',\n",
|
||
|
" 'Tom',\n",
|
||
|
" 'Thumb',\n",
|
||
|
" 'Christmas',\n",
|
||
|
" 'Eve',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'Lucinda',\n",
|
||
|
" 'Jane',\n",
|
||
|
" 'Hunca',\n",
|
||
|
" 'Munca',\n",
|
||
|
" 'Dollies'],\n",
|
||
|
" 'TO': ['to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to',\n",
|
||
|
" 'to'],\n",
|
||
|
" 'CD': ['two', 'two', 'ONE', 'two', 'two', 'one'],\n",
|
||
|
" 'JJS': ['least'],\n",
|
||
|
" 'VBN': ['been',\n",
|
||
|
" 'bought',\n",
|
||
|
" 'gone',\n",
|
||
|
" 'laid',\n",
|
||
|
" 'streaked',\n",
|
||
|
" 'boiled',\n",
|
||
|
" 'glued',\n",
|
||
|
" 'made',\n",
|
||
|
" 'labelled',\n",
|
||
|
" 'smiled',\n",
|
||
|
" 'rescued',\n",
|
||
|
" 'dressed'],\n",
|
||
|
" 'MD': ['would', 'would', 'would', 'could', 'will', 'will'],\n",
|
||
|
" 'VB': ['come',\n",
|
||
|
" 'work',\n",
|
||
|
" 'carve',\n",
|
||
|
" 'hurt',\n",
|
||
|
" 'Let',\n",
|
||
|
" 'give',\n",
|
||
|
" 'come',\n",
|
||
|
" 'burn',\n",
|
||
|
" 'work',\n",
|
||
|
" 'do',\n",
|
||
|
" 'squeeze',\n",
|
||
|
" 'go',\n",
|
||
|
" 'fetch',\n",
|
||
|
" 'get',\n",
|
||
|
" 'set',\n",
|
||
|
" 'sweep'],\n",
|
||
|
" 'RP': ['off',\n",
|
||
|
" 'out',\n",
|
||
|
" 'out',\n",
|
||
|
" 'out',\n",
|
||
|
" 'out',\n",
|
||
|
" 'up',\n",
|
||
|
" 'off',\n",
|
||
|
" 'up',\n",
|
||
|
" 'off',\n",
|
||
|
" 'up',\n",
|
||
|
" 'out',\n",
|
||
|
" 'up'],\n",
|
||
|
" 'VBG': ['scratching', 'pulling', 'returning', 'talking'],\n",
|
||
|
" 'WRB': ['where', 'when', 'when', 'when'],\n",
|
||
|
" 'PRP$': ['his',\n",
|
||
|
" 'his',\n",
|
||
|
" 'her',\n",
|
||
|
" 'his',\n",
|
||
|
" 'his',\n",
|
||
|
" 'her',\n",
|
||
|
" 'his',\n",
|
||
|
" 'her',\n",
|
||
|
" 'their',\n",
|
||
|
" 'her',\n",
|
||
|
" 'her'],\n",
|
||
|
" 'PDT': ['Such', 'all', 'half'],\n",
|
||
|
" '``': ['``', '``', '``', '``'],\n",
|
||
|
" 'VBZ': ['is', 'is', \"'s\", 'has', 'has', 'is', 'is', 'comes'],\n",
|
||
|
" 'VBP': ['have'],\n",
|
||
|
" \"''\": [\"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\", \"''\"],\n",
|
||
|
" 'POS': [\"'s\", \"'s\", \"'s\", \"'s\", \"'\"],\n",
|
||
|
" 'WP': ['WHAT']}"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"index"
|
||
|
]
|
||
|
},
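{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 1 can also be wrapped in a function so that any list of `(word, tag)` pairs, such as the output of `nltk.pos_tag`, can be indexed. This is a sketch rather than the notebook's own code: it swaps the explicit `if tag not in index` check for `collections.defaultdict(list)`, which creates the empty list on first use, and the names `build_index` and `toy_index` are hypothetical. The toy input is a handful of tagged words from the Peter Rabbit example, so the cell runs without source.txt."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"\n",
"\n",
"def build_index(tagged):\n",
"    # Hypothetical helper: map each part-of-speech tag to the words that carried it.\n",
"    # defaultdict(list) replaces the explicit 'if tag not in index' check.\n",
"    by_tag = defaultdict(list)\n",
"    for word, tag in tagged:\n",
"        by_tag[tag].append(word)\n",
"    # Return a plain dict so later lookups of unknown tags don't silently add empty lists.\n",
"    return dict(by_tag)\n",
"\n",
"\n",
"toy_index = build_index([('Once', 'RB'), ('upon', 'IN'), ('a', 'DT'), ('time', 'NN'),\n",
"                         ('in', 'IN'), ('a', 'DT'), ('the', 'DT'), ('root', 'NN')])\n",
"print(toy_index)"
]
},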
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Transform some input using the index\n",
"Use a *new* list to assemble the new sentence, then use `str.join` to produce the final text. Words whose tag does not appear in the index are kept unchanged."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" I have a question: What is the nature of love?\n"
]
},
{
"data": {
"text/plain": [
"'It have the plate ; WHAT comes a bang under hams !'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from random import choice\n",
"\n",
"i = input()\n",
"tokens = nltk.word_tokenize(i)\n",
"pos = nltk.pos_tag(tokens)\n",
"new = []\n",
"for word, tag in pos:\n",
"    # print (word, tag)\n",
"    # replace word with a random choice from the \"hat\" of words for the tag\n",
"    if tag not in index:\n",
"        # print (\"no replacement\")\n",
"        new.append(word)\n",
"    else:\n",
"        newword = choice(index[tag])\n",
"        new.append(newword)\n",
"        # print (\"replace with\", newword)\n",
"print(' '.join(new))"
]
},
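{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 2 can likewise be written as a function. In this sketch the `rng` parameter is an addition not found in the original: passing a seeded `random.Random` makes the output reproducible, while the default falls back to the module-level `random.choice`, as in the cell above. `parody`, `demo_index` and `demo_input` are illustrative names; with the real notebook you would pass `index` from Step 1 and the output of `nltk.pos_tag`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"\n",
"def parody(tagged_input, index, rng=random):\n",
"    # rng is an assumption of this sketch: any object with a choice() method,\n",
"    # e.g. the random module itself or a seeded random.Random(seed).\n",
"    new = []\n",
"    for word, tag in tagged_input:\n",
"        if tag in index:\n",
"            new.append(rng.choice(index[tag]))  # same tag, random source word\n",
"        else:\n",
"            new.append(word)  # tag never seen in the source: keep the original\n",
"    return ' '.join(new)\n",
"\n",
"\n",
"demo_index = {'DT': ['a', 'a', 'the', 'a'], 'JJ': ['little', 'sand-bank', 'big'],\n",
"              'NN': ['time', 'Mother', 'root', 'fir-tree']}\n",
"demo_input = [('The', 'DT'), ('blue', 'JJ'), ('pen', 'NN'), ('is', 'VBZ')]\n",
"print(parody(demo_input, demo_index, random.Random(42)))"
]
},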
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}