From 480c52d913fff811e23b8dc8b3fb42691bfa366b Mon Sep 17 00:00:00 2001
From: Michael Murtaugh
Date: Wed, 28 Oct 2020 10:19:09 +0100
Subject: [PATCH] added example to nltk-pos

---
 nltk-pos-tagger.ipynb | 47 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/nltk-pos-tagger.ipynb b/nltk-pos-tagger.ipynb
index a5bf536..1d03834 100644
--- a/nltk-pos-tagger.ipynb
+++ b/nltk-pos-tagger.ipynb
@@ -368,6 +368,53 @@
     ""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## A telling/tricky case\n",
+    "It's important to realize that a POS tag is not a fixed property of a word; it depends on the context in which the word appears. The NLTK book gives an example of [homonyms](http://www.nltk.org/book_1ed/ch05.html#using-a-tagger): words that are written the same but are pronounced differently and have different meanings depending on their use."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[('They', 'PRP'),\n",
+       " ('refuse', 'VBP'),\n",
+       " ('to', 'TO'),\n",
+       " ('permit', 'VB'),\n",
+       " ('us', 'PRP'),\n",
+       " ('to', 'TO'),\n",
+       " ('obtain', 'VB'),\n",
+       " ('the', 'DT'),\n",
+       " ('refuse', 'NN'),\n",
+       " ('permit', 'NN')]"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "text = nltk.word_tokenize(\"They refuse to permit us to obtain the refuse permit\")\n",
+    "nltk.pos_tag(text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From the book:\n",
+    "\n",
+    "> Notice that refuse and permit both appear as a present tense verb (VBP) and a noun (NN). E.g. refUSE is a verb meaning \"deny,\" while REFuse is a noun meaning \"trash\" (i.e. they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this reason, text-to-speech systems usually perform POS-tagging.)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},