NLTK - Part of Speech
In [ ]:
import nltk
import random
In [ ]:
# Read all lines, closing the file when done
with open('txt/language.txt') as f:
    lines = f.readlines()
# Pick one line at random to work with
sentence = random.choice(lines)
print(sentence)
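Lines returned by `readlines()` keep their trailing newline, and a text file may contain blank lines that `random.choice` could pick. Whether `txt/language.txt` has blank lines is not known, so the following is only a sketch of one way to guard against it. The `raw_lines` list is an invented stand-in for the file contents.

```python
import random

# Stand-in for what readlines() might return; these lines are invented
# for illustration and are not the contents of txt/language.txt.
raw_lines = ['Language is a virus.\n', '\n', '   \n', 'Words are tools.\n']

# Strip surrounding whitespace and drop lines that end up empty.
lines = [line.strip() for line in raw_lines if line.strip()]

# Every remaining candidate is now a non-empty sentence.
sentence = random.choice(lines)
print(sentence)
```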
Tokens
In [ ]:
tokens = nltk.word_tokenize(sentence)
print(tokens)
Part of Speech "tags"
In [ ]:
tagged = nltk.pos_tag(tokens)
print(tagged)
Now you can select, for example, all the verbs, whatever their type:
In [ ]:
selection = []
for word, tag in tagged:
    # Every verb tag (VB, VBD, VBG, VBN, VBP, VBZ) contains 'VB'
    if 'VB' in tag:
        selection.append(word)
print(selection)
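The same selection works for any family of tags. Below is a minimal sketch that runs without NLTK: `sample_tagged` is a hand-written list in the `(word, tag)` shape that `nltk.pos_tag` returns, with tags assigned by hand for illustration, and `select_by_prefix` is a hypothetical helper, not part of NLTK.

```python
# Hand-written sample in the (word, tag) shape returned by nltk.pos_tag;
# the tags were assigned by hand for illustration, not produced by NLTK.
sample_tagged = [
    ('She', 'PRP'), ('quickly', 'RB'), ('wrote', 'VBD'),
    ('two', 'CD'), ('long', 'JJ'), ('letters', 'NNS'),
    ('and', 'CC'), ('is', 'VBZ'), ('sleeping', 'VBG'),
]

def select_by_prefix(tagged, prefix):
    """Return the words whose tag starts with prefix (hypothetical helper)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

verbs = select_by_prefix(sample_tagged, 'VB')
nouns = select_by_prefix(sample_tagged, 'NN')
print(verbs)
print(nouns)
```

Matching on the start of the tag rather than anywhere inside it keeps the test strict. Note that a prefix such as `'PRP'` matches both `PRP` and `PRP$`.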
Where do these tags come from?
An off-the-shelf tagger is available for English. It uses the Penn Treebank tagset.
NLTK provides documentation for each tag, which can be queried using the tag, e.g. nltk.help.upenn_tagset('RB').
In [ ]:
nltk.help.upenn_tagset('PRP')
An alphabetical list of part-of-speech tags used in the Penn Treebank Project (link):
Number | Tag | Description |
1. | CC | Coordinating conjunction |
2. | CD | Cardinal number |
3. | DT | Determiner |
4. | EX | Existential there |
5. | FW | Foreign word |
6. | IN | Preposition or subordinating conjunction |
7. | JJ | Adjective |
8. | JJR | Adjective, comparative |
9. | JJS | Adjective, superlative |
10. | LS | List item marker |
11. | MD | Modal |
12. | NN | Noun, singular or mass |
13. | NNS | Noun, plural |
14. | NNP | Proper noun, singular |
15. | NNPS | Proper noun, plural |
16. | PDT | Predeterminer |
17. | POS | Possessive ending |
18. | PRP | Personal pronoun |
19. | PRP$ | Possessive pronoun |
20. | RB | Adverb |
21. | RBR | Adverb, comparative |
22. | RBS | Adverb, superlative |
23. | RP | Particle |
24. | SYM | Symbol |
25. | TO | to |
26. | UH | Interjection |
27. | VB | Verb, base form |
28. | VBD | Verb, past tense |
29. | VBG | Verb, gerund or present participle |
30. | VBN | Verb, past participle |
31. | VBP | Verb, non-3rd person singular present |
32. | VBZ | Verb, 3rd person singular present |
33. | WDT | Wh-determiner |
34. | WP | Wh-pronoun |
35. | WP$ | Possessive wh-pronoun |
36. | WRB | Wh-adverb |
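In the table above, related tags share a prefix: JJ/JJR/JJS for adjectives, NN/NNS/NNP/NNPS for nouns, RB/RBR/RBS for adverbs, and VB through VBZ for verbs. As a sketch of how that can be used, the following collapses tags into coarse families and counts them. The family labels, the `coarse_tag` helper, and the `sample_tagged` list are all assumptions made for this example, not part of NLTK.

```python
from collections import Counter

# Coarse families read off the tag table: tags in a family share a prefix.
# The family labels are names chosen for this sketch.
FAMILIES = {'JJ': 'adjective', 'NN': 'noun', 'RB': 'adverb', 'VB': 'verb'}

def coarse_tag(tag):
    """Map a Penn Treebank tag to its coarse family, or return it unchanged."""
    for prefix, family in FAMILIES.items():
        if tag.startswith(prefix):
            return family
    return tag

# Hand-tagged sample in the (word, tag) shape returned by nltk.pos_tag.
sample_tagged = [
    ('The', 'DT'), ('older', 'JJR'), ('dogs', 'NNS'), ('barked', 'VBD'),
    ('loudly', 'RB'), ('at', 'IN'), ('Paris', 'NNP'),
]

# Count how often each coarse family occurs in the sample.
counts = Counter(coarse_tag(tag) for _, tag in sample_tagged)
print(counts)
```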