NLTK - Part of Speech

In [96]:
import nltk
import random
In [101]:
texts = open('1.txt').readlines()
random.shuffle(texts)  # shuffles the list in place; the call itself returns None
print(texts)
[' In his fifth memo, he subsequently focuses on  multiplicity as a way for literature to comprehend the complex nature of the world that for the author is a whole of wholes, where the acts of watching and knowing also intervene in the observed reality and alter it. Calvino is particularly fascinated by literary works that are built upon a combinatory logic or that are readable as different narratives. The lecture revolves around some novels that contain multiple worlds and make space for the readers imaginations. Therefore, lets think visibility and multiplicity together, as: a multiplication of visibilities. They are traits specific to artistic production and define a context for the undecidable, or rather for undecidability, as the quality of being undecidable.']
In [108]:
lines = open('1.txt').readlines()
random.shuffle(lines)  # shuffles the list in place; the call itself returns None
print(lines)
[' In his fifth memo, he subsequently focuses on  multiplicity as a way for literature to comprehend the complex nature of the world that for the author is a whole of wholes, where the acts of watching and knowing also intervene in the observed reality and alter it. Calvino is particularly fascinated by literary works that are built upon a combinatory logic or that are readable as different narratives. The lecture revolves around some novels that contain multiple worlds and make space for the readers imaginations. Therefore, lets think visibility and multiplicity together, as: a multiplication of visibilities. They are traits specific to artistic production and define a context for the undecidable, or rather for undecidability, as the quality of being undecidable.']
In [109]:
# join the lines into one string (the elements are already strings)
text1 = ' '.join(lines)
  
print(text1)
 In his fifth memo, he subsequently focuses on  multiplicity as a way for literature to comprehend the complex nature of the world that for the author is a whole of wholes, where the acts of watching and knowing also intervene in the observed reality and alter it. Calvino is particularly fascinated by literary works that are built upon a combinatory logic or that are readable as different narratives. The lecture revolves around some novels that contain multiple worlds and make space for the readers imaginations. Therefore, lets think visibility and multiplicity together, as: a multiplication of visibilities. They are traits specific to artistic production and define a context for the undecidable, or rather for undecidability, as the quality of being undecidable.
In [111]:
tokens = nltk.word_tokenize(text1)
print(tokens)
['In', 'his', 'fifth', 'memo', ',', 'he', 'subsequently', 'focuses', 'on', 'multiplicity', 'as', 'a', 'way', 'for', 'literature', 'to', 'comprehend', 'the', 'complex', 'nature', 'of', 'the', 'world', 'that', 'for', 'the', 'author', 'is', 'a', 'whole', 'of', 'wholes', ',', 'where', 'the', 'acts', 'of', 'watching', 'and', 'knowing', 'also', 'intervene', 'in', 'the', 'observed', 'reality', 'and', 'alter', 'it', '.', 'Calvino', 'is', 'particularly', 'fascinated', 'by', 'literary', 'works', 'that', 'are', 'built', 'upon', 'a', 'combinatory', 'logic', 'or', 'that', 'are', 'readable', 'as', 'different', 'narratives', '.', 'The', 'lecture', 'revolves', 'around', 'some', 'novels', 'that', 'contain', 'multiple', 'worlds', 'and', 'make', 'space', 'for', 'the', 'readers', '', 'imaginations', '.', 'Therefore', ',', 'let', '', 's', 'think', 'visibility', 'and', 'multiplicity', 'together', ',', 'as', ':', 'a', 'multiplication', 'of', 'visibilities', '.', 'They', 'are', 'traits', 'specific', 'to', 'artistic', 'production', 'and', 'define', 'a', 'context', 'for', 'the', 'undecidable', ',', 'or', 'rather', 'for', 'undecidability', ',', 'as', 'the', 'quality', 'of', 'being', 'undecidable', '.']
In [112]:
tagged1 = nltk.pos_tag(tokens)
print(tagged1)
[('In', 'IN'), ('his', 'PRP$'), ('fifth', 'JJ'), ('memo', 'NN'), (',', ','), ('he', 'PRP'), ('subsequently', 'RB'), ('focuses', 'VBZ'), ('on', 'IN'), ('multiplicity', 'NN'), ('as', 'IN'), ('a', 'DT'), ('way', 'NN'), ('for', 'IN'), ('literature', 'NN'), ('to', 'TO'), ('comprehend', 'VB'), ('the', 'DT'), ('complex', 'JJ'), ('nature', 'NN'), ('of', 'IN'), ('the', 'DT'), ('world', 'NN'), ('that', 'WDT'), ('for', 'IN'), ('the', 'DT'), ('author', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('whole', 'NN'), ('of', 'IN'), ('wholes', 'NNS'), (',', ','), ('where', 'WRB'), ('the', 'DT'), ('acts', 'NNS'), ('of', 'IN'), ('watching', 'VBG'), ('and', 'CC'), ('knowing', 'VBG'), ('also', 'RB'), ('intervene', 'NN'), ('in', 'IN'), ('the', 'DT'), ('observed', 'JJ'), ('reality', 'NN'), ('and', 'CC'), ('alter', 'NN'), ('it', 'PRP'), ('.', '.'), ('Calvino', 'NNP'), ('is', 'VBZ'), ('particularly', 'RB'), ('fascinated', 'VBN'), ('by', 'IN'), ('literary', 'JJ'), ('works', 'NNS'), ('that', 'WDT'), ('are', 'VBP'), ('built', 'VBN'), ('upon', 'IN'), ('a', 'DT'), ('combinatory', 'NN'), ('logic', 'NN'), ('or', 'CC'), ('that', 'WDT'), ('are', 'VBP'), ('readable', 'JJ'), ('as', 'IN'), ('different', 'JJ'), ('narratives', 'NNS'), ('.', '.'), ('The', 'DT'), ('lecture', 'NN'), ('revolves', 'VBZ'), ('around', 'IN'), ('some', 'DT'), ('novels', 'NNS'), ('that', 'WDT'), ('contain', 'VBP'), ('multiple', 'JJ'), ('worlds', 'NNS'), ('and', 'CC'), ('make', 'VB'), ('space', 'NN'), ('for', 'IN'), ('the', 'DT'), ('readers', 'NNS'), ('', 'VBP'), ('imaginations', 'NNS'), ('.', '.'), ('Therefore', 'RB'), (',', ','), ('let', 'VB'), ('', 'NNP'), ('s', 'VB'), ('think', 'VBP'), ('visibility', 'NN'), ('and', 'CC'), ('multiplicity', 'NN'), ('together', 'RB'), (',', ','), ('as', 'IN'), (':', ':'), ('a', 'DT'), ('multiplication', 'NN'), ('of', 'IN'), ('visibilities', 'NNS'), ('.', '.'), ('They', 'PRP'), ('are', 'VBP'), ('traits', 'NNS'), ('specific', 'JJ'), ('to', 'TO'), ('artistic', 'JJ'), ('production', 'NN'), ('and', 'CC'), 
('define', 'VB'), ('a', 'DT'), ('context', 'NN'), ('for', 'IN'), ('the', 'DT'), ('undecidable', 'JJ'), (',', ','), ('or', 'CC'), ('rather', 'RB'), ('for', 'IN'), ('undecidability', 'NN'), (',', ','), ('as', 'IN'), ('the', 'DT'), ('quality', 'NN'), ('of', 'IN'), ('being', 'VBG'), ('undecidable', 'JJ'), ('.', '.')]
In [113]:
selection = []

for word, tag in tagged1:
    if 'NN' in tag:
        selection.append(word)

print(selection)
['memo', 'multiplicity', 'way', 'literature', 'nature', 'world', 'author', 'whole', 'wholes', 'acts', 'intervene', 'reality', 'alter', 'Calvino', 'works', 'combinatory', 'logic', 'narratives', 'lecture', 'novels', 'worlds', 'space', 'readers', 'imaginations', '', 'visibility', 'multiplicity', 'multiplication', 'visibilities', 'traits', 'production', 'context', 'undecidability', 'quality']
In [114]:
# remove duplicate words using set()
selection = []

for word, tag in tagged1:
    if 'NN' in tag:
        selection.append(word)

print(set(selection))
{'space', 'undecidability', 'world', 'worlds', 'narratives', 'way', 'literature', 'quality', 'works', 'novels', 'visibilities', 'acts', 'readers', 'nature', 'context', 'reality', 'whole', 'wholes', 'Calvino', 'multiplication', '', 'visibility', 'combinatory', 'lecture', 'alter', 'imaginations', 'memo', 'multiplicity', 'production', 'author', 'logic', 'traits', 'intervene'}
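A set drops the duplicates, but it also loses the original word order, and the empty-string token survives. A minimal sketch, assuming a short hypothetical stand-in for `selection`, that keeps first-occurrence order with `dict.fromkeys` and skips empty tokens:

```python
# Hypothetical stand-in for the noun list built above.
selection = ['memo', 'multiplicity', 'way', '', 'visibility', 'multiplicity', 'way']

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# so duplicates disappear without reordering the words the way set() can.
unique_in_order = [word for word in dict.fromkeys(selection) if word]

print(unique_in_order)
```

This is only one option; `set()` is fine when the order of the words does not matter.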
In [115]:
nntagged1 = random.choices(selection, k=6)
print(nntagged1)
['imaginations', 'alter', 'works', 'imaginations', 'lecture', 'imaginations']
In [116]:
nnand = " and ".join(nntagged1)

print(nnand)
imaginations and alter and works and imaginations and lecture and imaginations
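`random.choices` draws with replacement, which is why the same word can appear several times in one line. If repeats are not wanted, `random.sample` over the de-duplicated words is one alternative. A sketch on a hypothetical stand-in list:

```python
import random

# Hypothetical stand-in for the noun list built above.
selection = ['memo', 'multiplicity', 'way', 'literature', 'nature',
             'world', 'author', 'multiplicity', 'world', 'reality']

# Drop duplicates first (keeping order), then draw six words without replacement.
unique_words = list(dict.fromkeys(selection))
picked = random.sample(unique_words, k=6)

print(" and ".join(picked))
```

Note that `random.sample` raises a `ValueError` if `k` is larger than the number of unique words, so it only suits lists with at least six distinct entries.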
In [117]:
nntagged2 = random.choices(selection, k=6)
print(nntagged2)
['novels', 'narratives', 'traits', 'lecture', 'narratives', 'traits']
In [118]:
nnand2 = " and ".join(nntagged2)

print(nnand2)
novels and narratives and traits and lecture and narratives and traits
In [121]:
nntagged3 = random.choices(selection, k=6)
print(nntagged3)
['multiplicity', 'works', 'author', 'reality', 'multiplicity', 'wholes']
In [123]:
nnand3 = " and ".join(nntagged3)

print(nnand3)
multiplicity and works and author and reality and multiplicity and wholes
In [ ]:
lines = open('relevant.txt').readlines()
random.shuffle(lines)  # shuffles the list in place; the call itself returns None
print(lines)
In [30]:
type(lines)
Out[30]:
list

Tokens

In [32]:
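# note: `full` is not defined in the cells shown here; presumably it holds the joined text of `lines`
tokens = nltk.word_tokenize(full)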
print(tokens)
['In', 'fact', ',', 'if', 'something', 'is', 'possible', 'when', 'it', 'contains', 'and', 'under', 'certain', 'terms', 'performs', 'the', 'possibility', 'of', 'its', 'actualisation', ',', 'a', 'world', 'is', 'potential', 'when', 'it', 'can', 'maintain', 'its', 'potentiality', 'and', 'never', 'actualize', 'itself', 'into', 'one', 'actual', 'form', '.', 'The', 'kind', 'of', 'collective', 'body', 'that', 'undecidability', 'produces', 'could', 'of', 'course', 'be', 'seen', 'as', 'an', 'image', 'of', 'a', 'possible', 'or', 'future', 'societal', 'structure', ',', 'but', 'it', 'is', 'rather', 'an', 'enigmatic', 'subject', ':', 'it', 'is', 'not', 'there', 'to', 'actualize', 'itself', 'but', 'to', 'keep', 'being', 'a', 'sheer', ',', 'glimmering', 'potentiality', '.', 'If', 'the', 'coexistence', 'of', 'different', 'media', 'already', 'implies', 'different', 'angles', ',', 'durations', ',', 'discourses', ',', 'and', 'forms', 'of', 'spectatorship', ',', 'the', 'performance', 'itself', 'keeps', 'an', 'undecidable', 'bound', 'between', 'its', 'real', 'and', 'fictional', 'ontologies', '.', 'Which', 'is', 'to', 'say', ',', 'if', 'it', 'doesn', '', 't', 'give', 'up', 'on', 'involving', 'radically', 'different', 'realities', 'into', 'its', 'operation', 'modes', 'and', 'doesn', '', 't', 'fade', 'out', 'from', 'the', 'scene', 'of', 'the', '', 'real', '', 'world', '.', 'In', 'particular', ',', 'the', 'potentiality', 'generated', 'by', 'undecidable', 'artworks', 'is', 'grounded', 'in', 'a', 'logic', 'of', 'addition', 'and', 'contradiction', 'that', 'is', 'specific', 'of', 'art', '.', 'If', 'the', 'coexistence', 'of', 'different', 'media', 'already', 'implies', 'different', 'angles', ',', 'durations', ',', 'discourses', ',', 'and', 'forms', 'of', 'spectatorship', ',', 'the', 'performance', 'itself', 'keeps', 'an', 'undecidable', 'bound', 'between', 'its', 'real', 'and', 'fictional', 'ontologies', '.', 'In', 'fact', ',', 'undecidability', 'is', 'a', 'specific', 'force', 'at', 'work', 
'that', 'consciously', 'articulates', ',', 'redefines', ',', 'or', 'alters', 'the', 'complex', 'system', 'of', 'links', ',', 'bounds', ',', 'and', 'resonances', 'between', 'different', 'potential', 'and', 'actual', 'worlds', '.', 'What', 'is', 'peculiar', 'to', 'this', 'kind', 'of', 'artworks', 'then', ',', 'and', 'what', 'within', 'them', 'can', 'produce', 'an', 'understanding', 'of', 'the', 'place', 'of', 'art', 'and', 'of', 'its', 'politics', 'today', ',', 'is', 'that', 'they', 'generate', 'a', 'multiplicity', 'of', 'gazes', 'and', 'of', 'forms', 'of', 'spectatorship', 'that', 'also', 'coexist', 'one', 'next', 'to', 'the', 'other', 'without', 'mediating', 'between', 'their', 'own', 'positions', 'and', 'points', 'of', 'view', '.', 'We', 'might', 'stretch', 'this', 'line', 'of', 'thought', 'a', 'bit', 'further', 'and', 'propose', 'that', 'art', '', 's', 'potentiality', 'is', 'that', 'of', 'multiplying', 'the', 'visible', 'as', 'an', 'actual', 'counterstrategy', 'to', 'the', 'proliferation', 'of', 'images', 'that', 'surrounds', 'us', '.', 'Undecidability', 'could', 'then', 'be', 'detached', 'from', 'art', 'and', 'applied', 'to', 'curation', ',', 'instituting', 'processes', 'or', 'even', 'to', 'politics', 'at', 'large', ':', 'the', 'unfolding', 'of', 'its', 'resonances', 'and', 'consequences', 'already', 'opens', 'this', 'possibility', 'and', 'even', 'beckons', 'it', '.', '(', 'Nevertheless', ',', 'acknowledging', 'it', 'as', 'specific', 'to', 'art', ',', 'and', 'thus', 'as', 'a', 'means', 'without', 'ends', ',', 'seems', 'to', 'better', 'protect', 'the', 'inner', 'nature', 'and', 'the', 'intact', 'potentiality', 'of', 'a', 'quality', 'that', 'does', 'not', 'make', 'itself', 'available', 'for', 'any', 'use', 'and', 'does', 'not', 'serve', 'any', 'agenda', ',', 'but', 'stays', 'autonomous', 'and', 'operates', 'by', 'creating', 'its', 'own', 'conditions', 'all', 'over', 'again', '.', 'Here', ',', 'spectators', 'are', 'invited', 'to', 'enter', 'the', 'work', '', 's', 
'fictional', 'world', 'carrying', 'with', 'themselves', 'the', 'so-called', 'real', 'world', 'and', 'all', 'their', 'other', 'fictional', 'worlds', ';', 'a', 'space', 'is', 'created', 'where', 'all', 'these', 'worlds', 'are', 'equally', 'welcomed', '.', 'Undecidability', 'could', 'then', 'be', 'detached', 'from', 'art', 'and', 'applied', 'to', 'curation', ',', 'instituting', 'processes', 'or', 'even', 'to', 'politics', 'at', 'large', ':', 'the', 'unfolding', 'of', 'its', 'resonances', 'and', 'consequences', 'already', 'opens', 'this', 'possibility', 'and', 'even', 'beckons', 'it', '.', '-Nevertheless', ',', 'acknowledging', 'it', 'as', 'specific', 'to', 'art', ',', 'and', 'thus', 'as', 'a', 'means', 'without', 'ends', ',', 'seems', 'to', 'better', 'protect', 'the', 'inner', 'nature', 'and', 'the', 'intact', 'potentiality', 'of', 'a', 'quality', 'that', 'does', 'not', 'make', 'itself', 'available', 'for', 'any', 'use', 'and', 'does', 'not', 'serve', 'any', 'agenda', ',', 'but', 'stays', 'autonomous', 'and', 'operates', 'by', 'creating', 'its', 'own', 'conditions', 'all', 'over', 'again', '.']

Part of Speech "tags"

In [54]:
tagged = nltk.pos_tag(tokens)
print(tagged)
[('In', 'IN'), ('fact', 'NN'), (',', ','), ('if', 'IN'), ('something', 'NN'), ('is', 'VBZ'), ('possible', 'JJ'), ('when', 'WRB'), ('it', 'PRP'), ('contains', 'VBZ'), ('and', 'CC'), ('under', 'IN'), ('certain', 'JJ'), ('terms', 'NNS'), ('performs', 'VBP'), ('the', 'DT'), ('possibility', 'NN'), ('of', 'IN'), ('its', 'PRP$'), ('actualisation', 'NN'), (',', ','), ('a', 'DT'), ('world', 'NN'), ('is', 'VBZ'), ('potential', 'JJ'), ('when', 'WRB'), ('it', 'PRP'), ('can', 'MD'), ('maintain', 'VB'), ('its', 'PRP$'), ('potentiality', 'NN'), ('and', 'CC'), ('never', 'RB'), ('actualize', 'VB'), ('itself', 'PRP'), ('into', 'IN'), ('one', 'CD'), ('actual', 'JJ'), ('form', 'NN'), ('.', '.'), ('The', 'DT'), ('kind', 'NN'), ('of', 'IN'), ('collective', 'JJ'), ('body', 'NN'), ('that', 'WDT'), ('undecidability', 'JJ'), ('produces', 'NNS'), ('could', 'MD'), ('of', 'IN'), ('course', 'NN'), ('be', 'VB'), ('seen', 'VBN'), ('as', 'IN'), ('an', 'DT'), ('image', 'NN'), ('of', 'IN'), ('a', 'DT'), ('possible', 'JJ'), ('or', 'CC'), ('future', 'JJ'), ('societal', 'JJ'), ('structure', 'NN'), (',', ','), ('but', 'CC'), ('it', 'PRP'), ('is', 'VBZ'), ('rather', 'RB'), ('an', 'DT'), ('enigmatic', 'JJ'), ('subject', 'NN'), (':', ':'), ('it', 'PRP'), ('is', 'VBZ'), ('not', 'RB'), ('there', 'RB'), ('to', 'TO'), ('actualize', 'VB'), ('itself', 'PRP'), ('but', 'CC'), ('to', 'TO'), ('keep', 'VB'), ('being', 'VBG'), ('a', 'DT'), ('sheer', 'NN'), (',', ','), ('glimmering', 'VBG'), ('potentiality', 'NN'), ('.', '.'), ('If', 'IN'), ('the', 'DT'), ('coexistence', 'NN'), ('of', 'IN'), ('different', 'JJ'), ('media', 'NNS'), ('already', 'RB'), ('implies', 'VBZ'), ('different', 'JJ'), ('angles', 'NNS'), (',', ','), ('durations', 'NNS'), (',', ','), ('discourses', 'NNS'), (',', ','), ('and', 'CC'), ('forms', 'NNS'), ('of', 'IN'), ('spectatorship', 'NN'), (',', ','), ('the', 'DT'), ('performance', 'NN'), ('itself', 'PRP'), ('keeps', 'VBZ'), ('an', 'DT'), ('undecidable', 'JJ'), ('bound', 'NN'), ('between', 'IN'), 
('its', 'PRP$'), ('real', 'JJ'), ('and', 'CC'), ('fictional', 'JJ'), ('ontologies', 'NNS'), ('.', '.'), ('Which', 'NNP'), ('is', 'VBZ'), ('to', 'TO'), ('say', 'VB'), (',', ','), ('if', 'IN'), ('it', 'PRP'), ('doesn', 'VBZ'), ('', 'JJ'), ('t', 'NNS'), ('give', 'VBP'), ('up', 'RP'), ('on', 'IN'), ('involving', 'VBG'), ('radically', 'RB'), ('different', 'JJ'), ('realities', 'NNS'), ('into', 'IN'), ('its', 'PRP$'), ('operation', 'NN'), ('modes', 'NNS'), ('and', 'CC'), ('doesn', 'NN'), ('', 'NNP'), ('t', 'NN'), ('fade', 'VBD'), ('out', 'RP'), ('from', 'IN'), ('the', 'DT'), ('scene', 'NN'), ('of', 'IN'), ('the', 'DT'), ('', 'NNP'), ('real', 'JJ'), ('', 'JJ'), ('world', 'NN'), ('.', '.'), ('In', 'IN'), ('particular', 'JJ'), (',', ','), ('the', 'DT'), ('potentiality', 'NN'), ('generated', 'VBN'), ('by', 'IN'), ('undecidable', 'JJ'), ('artworks', 'NNS'), ('is', 'VBZ'), ('grounded', 'VBN'), ('in', 'IN'), ('a', 'DT'), ('logic', 'NN'), ('of', 'IN'), ('addition', 'NN'), ('and', 'CC'), ('contradiction', 'NN'), ('that', 'WDT'), ('is', 'VBZ'), ('specific', 'JJ'), ('of', 'IN'), ('art', 'NN'), ('.', '.'), ('If', 'IN'), ('the', 'DT'), ('coexistence', 'NN'), ('of', 'IN'), ('different', 'JJ'), ('media', 'NNS'), ('already', 'RB'), ('implies', 'VBZ'), ('different', 'JJ'), ('angles', 'NNS'), (',', ','), ('durations', 'NNS'), (',', ','), ('discourses', 'NNS'), (',', ','), ('and', 'CC'), ('forms', 'NNS'), ('of', 'IN'), ('spectatorship', 'NN'), (',', ','), ('the', 'DT'), ('performance', 'NN'), ('itself', 'PRP'), ('keeps', 'VBZ'), ('an', 'DT'), ('undecidable', 'JJ'), ('bound', 'NN'), ('between', 'IN'), ('its', 'PRP$'), ('real', 'JJ'), ('and', 'CC'), ('fictional', 'JJ'), ('ontologies', 'NNS'), ('.', '.'), ('In', 'IN'), ('fact', 'NN'), (',', ','), ('undecidability', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('specific', 'JJ'), ('force', 'NN'), ('at', 'IN'), ('work', 'NN'), ('that', 'WDT'), ('consciously', 'RB'), ('articulates', 'VBZ'), (',', ','), ('redefines', 'NNS'), (',', ','), ('or', 'CC'), 
('alters', 'VBZ'), ('the', 'DT'), ('complex', 'JJ'), ('system', 'NN'), ('of', 'IN'), ('links', 'NNS'), (',', ','), ('bounds', 'NNS'), (',', ','), ('and', 'CC'), ('resonances', 'NNS'), ('between', 'IN'), ('different', 'JJ'), ('potential', 'NN'), ('and', 'CC'), ('actual', 'JJ'), ('worlds', 'NNS'), ('.', '.'), ('What', 'WP'), ('is', 'VBZ'), ('peculiar', 'JJ'), ('to', 'TO'), ('this', 'DT'), ('kind', 'NN'), ('of', 'IN'), ('artworks', 'NNS'), ('then', 'RB'), (',', ','), ('and', 'CC'), ('what', 'WP'), ('within', 'IN'), ('them', 'PRP'), ('can', 'MD'), ('produce', 'VB'), ('an', 'DT'), ('understanding', 'NN'), ('of', 'IN'), ('the', 'DT'), ('place', 'NN'), ('of', 'IN'), ('art', 'NN'), ('and', 'CC'), ('of', 'IN'), ('its', 'PRP$'), ('politics', 'NNS'), ('today', 'NN'), (',', ','), ('is', 'VBZ'), ('that', 'IN'), ('they', 'PRP'), ('generate', 'VBP'), ('a', 'DT'), ('multiplicity', 'NN'), ('of', 'IN'), ('gazes', 'NNS'), ('and', 'CC'), ('of', 'IN'), ('forms', 'NNS'), ('of', 'IN'), ('spectatorship', 'NN'), ('that', 'WDT'), ('also', 'RB'), ('coexist', 'VBP'), ('one', 'CD'), ('next', 'JJ'), ('to', 'TO'), ('the', 'DT'), ('other', 'JJ'), ('without', 'IN'), ('mediating', 'VBG'), ('between', 'IN'), ('their', 'PRP$'), ('own', 'JJ'), ('positions', 'NNS'), ('and', 'CC'), ('points', 'NNS'), ('of', 'IN'), ('view', 'NN'), ('.', '.'), ('We', 'PRP'), ('might', 'MD'), ('stretch', 'VB'), ('this', 'DT'), ('line', 'NN'), ('of', 'IN'), ('thought', 'NN'), ('a', 'DT'), ('bit', 'NN'), ('further', 'JJ'), ('and', 'CC'), ('propose', 'VB'), ('that', 'IN'), ('art', 'NN'), ('', 'NNP'), ('s', 'NN'), ('potentiality', 'NN'), ('is', 'VBZ'), ('that', 'IN'), ('of', 'IN'), ('multiplying', 'VBG'), ('the', 'DT'), ('visible', 'JJ'), ('as', 'IN'), ('an', 'DT'), ('actual', 'JJ'), ('counterstrategy', 'NN'), ('to', 'TO'), ('the', 'DT'), ('proliferation', 'NN'), ('of', 'IN'), ('images', 'NNS'), ('that', 'WDT'), ('surrounds', 'VBZ'), ('us', 'PRP'), ('.', '.'), ('Undecidability', 'NN'), ('could', 'MD'), ('then', 'RB'), ('be', 
'VB'), ('detached', 'VBN'), ('from', 'IN'), ('art', 'NN'), ('and', 'CC'), ('applied', 'VBN'), ('to', 'TO'), ('curation', 'NN'), (',', ','), ('instituting', 'VBG'), ('processes', 'NNS'), ('or', 'CC'), ('even', 'RB'), ('to', 'TO'), ('politics', 'NNS'), ('at', 'IN'), ('large', 'JJ'), (':', ':'), ('the', 'DT'), ('unfolding', 'NN'), ('of', 'IN'), ('its', 'PRP$'), ('resonances', 'NNS'), ('and', 'CC'), ('consequences', 'NNS'), ('already', 'RB'), ('opens', 'VBZ'), ('this', 'DT'), ('possibility', 'NN'), ('and', 'CC'), ('even', 'RB'), ('beckons', 'NNS'), ('it', 'PRP'), ('.', '.'), ('(', '('), ('Nevertheless', 'NNP'), (',', ','), ('acknowledging', 'VBG'), ('it', 'PRP'), ('as', 'IN'), ('specific', 'JJ'), ('to', 'TO'), ('art', 'VB'), (',', ','), ('and', 'CC'), ('thus', 'RB'), ('as', 'IN'), ('a', 'DT'), ('means', 'NN'), ('without', 'IN'), ('ends', 'NNS'), (',', ','), ('seems', 'VBZ'), ('to', 'TO'), ('better', 'RBR'), ('protect', 'VB'), ('the', 'DT'), ('inner', 'JJ'), ('nature', 'NN'), ('and', 'CC'), ('the', 'DT'), ('intact', 'JJ'), ('potentiality', 'NN'), ('of', 'IN'), ('a', 'DT'), ('quality', 'NN'), ('that', 'WDT'), ('does', 'VBZ'), ('not', 'RB'), ('make', 'VB'), ('itself', 'PRP'), ('available', 'JJ'), ('for', 'IN'), ('any', 'DT'), ('use', 'NN'), ('and', 'CC'), ('does', 'VBZ'), ('not', 'RB'), ('serve', 'VB'), ('any', 'DT'), ('agenda', 'NN'), (',', ','), ('but', 'CC'), ('stays', 'VBZ'), ('autonomous', 'JJ'), ('and', 'CC'), ('operates', 'VBZ'), ('by', 'IN'), ('creating', 'VBG'), ('its', 'PRP$'), ('own', 'JJ'), ('conditions', 'NNS'), ('all', 'DT'), ('over', 'RB'), ('again', 'RB'), ('.', '.'), ('Here', 'RB'), (',', ','), ('spectators', 'NNS'), ('are', 'VBP'), ('invited', 'VBN'), ('to', 'TO'), ('enter', 'VB'), ('the', 'DT'), ('work', 'NN'), ('', 'NNP'), ('s', 'VBD'), ('fictional', 'JJ'), ('world', 'NN'), ('carrying', 'VBG'), ('with', 'IN'), ('themselves', 'PRP'), ('the', 'DT'), ('so-called', 'JJ'), ('real', 'JJ'), ('world', 'NN'), ('and', 'CC'), ('all', 'DT'), ('their', 'PRP$'), 
('other', 'JJ'), ('fictional', 'JJ'), ('worlds', 'NNS'), (';', ':'), ('a', 'DT'), ('space', 'NN'), ('is', 'VBZ'), ('created', 'VBN'), ('where', 'WRB'), ('all', 'PDT'), ('these', 'DT'), ('worlds', 'NNS'), ('are', 'VBP'), ('equally', 'RB'), ('welcomed', 'VBN'), ('.', '.'), ('Undecidability', 'NN'), ('could', 'MD'), ('then', 'RB'), ('be', 'VB'), ('detached', 'VBN'), ('from', 'IN'), ('art', 'NN'), ('and', 'CC'), ('applied', 'VBN'), ('to', 'TO'), ('curation', 'NN'), (',', ','), ('instituting', 'VBG'), ('processes', 'NNS'), ('or', 'CC'), ('even', 'RB'), ('to', 'TO'), ('politics', 'NNS'), ('at', 'IN'), ('large', 'JJ'), (':', ':'), ('the', 'DT'), ('unfolding', 'NN'), ('of', 'IN'), ('its', 'PRP$'), ('resonances', 'NNS'), ('and', 'CC'), ('consequences', 'NNS'), ('already', 'RB'), ('opens', 'VBZ'), ('this', 'DT'), ('possibility', 'NN'), ('and', 'CC'), ('even', 'RB'), ('beckons', 'NNS'), ('it', 'PRP'), ('.', '.'), ('-Nevertheless', 'NN'), (',', ','), ('acknowledging', 'VBG'), ('it', 'PRP'), ('as', 'IN'), ('specific', 'JJ'), ('to', 'TO'), ('art', 'VB'), (',', ','), ('and', 'CC'), ('thus', 'RB'), ('as', 'IN'), ('a', 'DT'), ('means', 'NN'), ('without', 'IN'), ('ends', 'NNS'), (',', ','), ('seems', 'VBZ'), ('to', 'TO'), ('better', 'RBR'), ('protect', 'VB'), ('the', 'DT'), ('inner', 'JJ'), ('nature', 'NN'), ('and', 'CC'), ('the', 'DT'), ('intact', 'JJ'), ('potentiality', 'NN'), ('of', 'IN'), ('a', 'DT'), ('quality', 'NN'), ('that', 'WDT'), ('does', 'VBZ'), ('not', 'RB'), ('make', 'VB'), ('itself', 'PRP'), ('available', 'JJ'), ('for', 'IN'), ('any', 'DT'), ('use', 'NN'), ('and', 'CC'), ('does', 'VBZ'), ('not', 'RB'), ('serve', 'VB'), ('any', 'DT'), ('agenda', 'NN'), (',', ','), ('but', 'CC'), ('stays', 'VBZ'), ('autonomous', 'JJ'), ('and', 'CC'), ('operates', 'VBZ'), ('by', 'IN'), ('creating', 'VBG'), ('its', 'PRP$'), ('own', 'JJ'), ('conditions', 'NNS'), ('all', 'DT'), ('over', 'RB'), ('again', 'RB'), ('.', '.')]

Now you could select, for example, all the nouns (every word whose tag contains 'NN'):

In [56]:
selection = []

for word, tag in tagged:
    if 'NN' in tag:
        selection.append(word)

print(selection)
['fact', 'something', 'terms', 'possibility', 'actualisation', 'world', 'potentiality', 'form', 'kind', 'body', 'produces', 'course', 'image', 'structure', 'subject', 'sheer', 'potentiality', 'coexistence', 'media', 'angles', 'durations', 'discourses', 'forms', 'spectatorship', 'performance', 'bound', 'ontologies', 'Which', 't', 'realities', 'operation', 'modes', 'doesn', '', 't', 'scene', '', 'world', 'potentiality', 'artworks', 'logic', 'addition', 'contradiction', 'art', 'coexistence', 'media', 'angles', 'durations', 'discourses', 'forms', 'spectatorship', 'performance', 'bound', 'ontologies', 'fact', 'undecidability', 'force', 'work', 'redefines', 'system', 'links', 'bounds', 'resonances', 'potential', 'worlds', 'kind', 'artworks', 'understanding', 'place', 'art', 'politics', 'today', 'multiplicity', 'gazes', 'forms', 'spectatorship', 'positions', 'points', 'view', 'line', 'thought', 'bit', 'art', '', 's', 'potentiality', 'counterstrategy', 'proliferation', 'images', 'Undecidability', 'art', 'curation', 'processes', 'politics', 'unfolding', 'resonances', 'consequences', 'possibility', 'beckons', 'Nevertheless', 'means', 'ends', 'nature', 'potentiality', 'quality', 'use', 'agenda', 'conditions', 'spectators', 'work', '', 'world', 'world', 'worlds', 'space', 'worlds', 'Undecidability', 'art', 'curation', 'processes', 'politics', 'unfolding', 'resonances', 'consequences', 'possibility', 'beckons', '-Nevertheless', 'means', 'ends', 'nature', 'potentiality', 'quality', 'use', 'agenda', 'conditions']
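The same loop is repeated in this notebook for each tag, so it could be wrapped in a small function. A sketch, where `words_with_tag` is a hypothetical helper name and the sample list only imitates the `(word, tag)` format that `nltk.pos_tag` returns:

```python
def words_with_tag(tagged, fragment):
    """Return the words whose tag contains `fragment`, e.g. 'NN' or 'VB'."""
    return [word for word, tag in tagged if fragment in tag]

# Hypothetical sample in the (word, tag) format that nltk.pos_tag returns.
sample = [('a', 'DT'), ('world', 'NN'), ('is', 'VBZ'),
          ('potential', 'JJ'), ('worlds', 'NNS')]

print(words_with_tag(sample, 'NN'))
print(words_with_tag(sample, 'VB'))
```

With the real data this would be called as `words_with_tag(tagged, 'NN')`, giving the same list as the loop above.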
In [62]:
# remove duplicate words using set()
selection = []

for word, tag in tagged:
    if 'NN' in tag:
        selection.append(word)

print(set(selection))
{'today', 'proliferation', 'angles', 'art', 'worlds', 'counterstrategy', 'image', 'performance', 'consequences', 'sheer', 'addition', 'kind', 'means', 'work', 's', '', '-Nevertheless', 'course', 'terms', 'line', 'spectators', 'spectatorship', 'world', 'thought', 'Undecidability', 'Which', 'images', 'bound', 'actualisation', 'something', 'forms', 'place', 'conditions', 'produces', 't', 'use', 'multiplicity', 'durations', 'points', 'undecidability', 'resonances', 'politics', 'realities', 'quality', 'discourses', 'system', 'operation', 'view', 'scene', 'agenda', 'body', 'modes', '', 'structure', 'force', 'potentiality', 'processes', 'redefines', 'ontologies', 'links', 'curation', 'beckons', 'possibility', 'ends', 'bounds', 'space', 'coexistence', 'media', 'potential', 'Nevertheless', 'form', 'nature', 'fact', 'understanding', 'positions', 'bit', 'artworks', 'doesn', 'logic', 'unfolding', 'contradiction', 'gazes', 'subject'}
In [ ]:
# TODO: filter the selection (e.g. with a stopword list); I don't need stray tokens such as , . ' etc.
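A stopword list removes common words such as "the", but the stray tokens in the output above (empty strings, single letters like 't' and 's', a leading dash) need a separate check. One possible clean-up, sketched on a hypothetical sample and assuming that one-character tokens are never wanted:

```python
import string

# Hypothetical sample with the kinds of stray tokens seen in the output above.
selection = ['world', '', 't', 's', '-Nevertheless', 'doesn', 'art', ',']

cleaned = []
for word in selection:
    # Strip punctuation from both ends: '-Nevertheless' becomes 'Nevertheless'.
    word = word.strip(string.punctuation)
    # Drop empty strings, single letters and tokens that were only punctuation.
    if len(word) > 1:
        cleaned.append(word)

print(cleaned)
```

Fragments such as 'doesn' still pass this filter; they come from the apostrophe being split off during tokenizing and would need to be handled before tagging.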
In [68]:
nnrando2 = random.choices(selection, k=6)
print(nnrando2)
['potentiality', 'consequences', 'images', 'world', 'art', 'gazes']
In [69]:
nnand2 = " and ".join(nnrando2)

print(nnand2)
potentiality and consequences and images and world and art and gazes
In [70]:
nnrando3 = random.choices(selection, k=6)
print(nnrando3)
['system', 'conditions', 'bounds', 'bound', 'redefines', 'force']
In [71]:
nnand3 = " and ".join(nnrando3)

print(nnand3)
system and conditions and bounds and bound and redefines and force
In [72]:
nnrando4 = random.choices(selection, k=6)
print(nnrando4)
['potentiality', 'fact', 'place', 'worlds', 'Nevertheless', 'conditions']
In [73]:
nnand4 = " and ".join(nnrando4)

print(nnand4)
potentiality and fact and place and worlds and Nevertheless and conditions
In [75]:
nnrando5 = random.choices(selection, k=6)
print(nnrando5)
['subject', 'forms', 'nature', 'modes', 'forms', 'gazes']
In [76]:
nnand5 = " and ".join(nnrando5)

print(nnand5)
subject and forms and nature and modes and forms and gazes
In [35]:
selection = []

for word, tag in tagged:
    if 'VB' in tag:
        selection.append(word)

print(selection)
['is', 'contains', 'performs', 'is', 'maintain', 'actualize', 'be', 'seen', 'is', 'is', 'actualize', 'keep', 'being', 'glimmering', 'implies', 'keeps', 'is', 'say', 'doesn', 'give', 'involving', 'fade', 'generated', 'is', 'grounded', 'is', 'implies', 'keeps', 'is', 'articulates', 'alters', 'is', 'produce', 'is', 'generate', 'coexist', 'mediating', 'stretch', 'propose', 'is', 'multiplying', 'surrounds', 'be', 'detached', 'applied', 'instituting', 'opens', 'acknowledging', 'art', 'seems', 'protect', 'does', 'make', 'does', 'serve', 'stays', 'operates', 'creating', 'are', 'invited', 'enter', 's', 'carrying', 'is', 'created', 'are', 'welcomed', 'be', 'detached', 'applied', 'instituting', 'opens', 'acknowledging', 'art', 'seems', 'protect', 'does', 'make', 'does', 'serve', 'stays', 'operates', 'creating']
In [77]:
selection = []

for word, tag in tagged:
    if 'VB' in tag:
        selection.append(word)

print(set(selection))
{'actualize', 'keep', 'instituting', 'protect', 'operates', 'make', 'being', 'art', 'serve', 'detached', 'are', 'articulates', 'welcomed', 'maintain', 'contains', 'created', 'produce', 'performs', 'applied', 'does', 'surrounds', 'say', 'carrying', 'generate', 'coexist', 'generated', 'seen', 'be', 'stretch', 'multiplying', 'alters', 'enter', 'give', 'creating', 'is', 'glimmering', 's', 'seems', 'acknowledging', 'implies', 'invited', 'stays', 'mediating', 'propose', 'doesn', 'fade', 'grounded', 'involving', 'keeps', 'opens'}
In [78]:
vbrando = random.choices(selection, k=6)
print(vbrando)
['alters', 'is', 'keep', 'is', 'detached', 'glimmering']
In [79]:
vband = " and ".join(vbrando)

print(vband)
alters and is and keep and is and detached and glimmering
In [81]:
vbrando2 = random.choices(selection, k=6)
print(vbrando2)
['detached', 'invited', 'seen', 'is', 'is', 'does']
In [82]:
vband2 = " and ".join(vbrando2)

print(vband2)
detached and invited and seen and is and is and does
In [83]:
vbrando3 = random.choices(selection, k=6)
print(vbrando3)
['glimmering', 'seems', 'seen', 'make', 'invited', 'operates']
In [84]:
vband3 = " and ".join(vbrando3)

print(vband3)
glimmering and seems and seen and make and invited and operates
In [36]:
selection = []

for word, tag in tagged:
    if 'JJ' in tag:
        selection.append(word)

print(selection)
['possible', 'certain', 'potential', 'actual', 'collective', 'undecidability', 'possible', 'future', 'societal', 'enigmatic', 'different', 'different', 'undecidable', 'real', 'fictional', '', 'different', 'real', '', 'particular', 'undecidable', 'specific', 'different', 'different', 'undecidable', 'real', 'fictional', 'specific', 'complex', 'different', 'actual', 'peculiar', 'next', 'other', 'own', 'further', 'visible', 'actual', 'large', 'specific', 'inner', 'intact', 'available', 'autonomous', 'own', 'fictional', 'so-called', 'real', 'other', 'fictional', 'large', 'specific', 'inner', 'intact', 'available', 'autonomous', 'own']
In [85]:
selection = []

for word, tag in tagged:
    if 'JJ' in tag:
        selection.append(word)

print(set(selection))
{'next', 'undecidable', 'different', 'large', 'undecidability', 'societal', 'complex', 'enigmatic', 'autonomous', 'fictional', 'potential', 'available', 'further', 'actual', 'inner', 'future', 'intact', 'own', '', 'so-called', 'real', 'certain', 'particular', 'specific', 'other', 'visible', 'collective', 'peculiar', 'possible'}
In [91]:
jjrando = random.choices(selection, k=6)
print(jjrando)
['possible', 'real', 'undecidability', 'collective', 'actual', 'other']
In [92]:
jjand = " and ".join(jjrando)

print(jjand)
possible and real and undecidability and collective and actual and other
In [88]:
jjrando2 = random.choices(selection, k=6)
print(jjrando2)
['particular', 'real', 'inner', 'different', 'autonomous', 'specific']
In [90]:
jjand2 = " and ".join(jjrando2)

print(jjand2)
particular and real and inner and different and autonomous and specific
In [93]:
jjrando3 = random.choices(selection, k=6)
print(jjrando3)
['specific', 'future', 'actual', 'next', 'own', 'own']
In [94]:
jjand3 = " and ".join(jjrando3)

print(jjand3)
specific and future and actual and next and own and own
In [37]:
selection = []

for word, tag in tagged:
    if 'DT' in tag:
        selection.append(word)

print(selection)
['the', 'a', 'The', 'that', 'an', 'a', 'an', 'a', 'the', 'the', 'an', 'the', 'the', 'the', 'a', 'that', 'the', 'the', 'an', 'a', 'that', 'the', 'this', 'an', 'the', 'a', 'that', 'the', 'this', 'a', 'the', 'an', 'the', 'that', 'the', 'this', 'a', 'the', 'the', 'a', 'that', 'any', 'any', 'all', 'the', 'the', 'all', 'a', 'all', 'these', 'the', 'this', 'a', 'the', 'the', 'a', 'that', 'any', 'any', 'all']
In [40]:
selection = []

for word, tag in tagged:
    if 'CC' in tag:
        selection.append(word)

print(selection)
['and', 'and', 'or', 'but', 'but', 'and', 'and', 'and', 'and', 'and', 'and', 'or', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'or', 'and', 'and', 'and', 'and', 'and', 'but', 'and', 'and', 'and', 'or', 'and', 'and', 'and', 'and', 'and', 'but', 'and']
In [41]:
selection = []

for word, tag in tagged:
    if 'CD' in tag:
        selection.append(word)

print(selection)
['one', 'one']
In [42]:
selection = []

for word, tag in tagged:
    if 'IN' in tag:
        selection.append(word)

print(selection)
['In', 'if', 'under', 'of', 'into', 'of', 'of', 'as', 'of', 'If', 'of', 'of', 'between', 'if', 'on', 'into', 'from', 'of', 'In', 'by', 'in', 'of', 'of', 'If', 'of', 'of', 'between', 'In', 'at', 'of', 'between', 'of', 'within', 'of', 'of', 'of', 'that', 'of', 'of', 'of', 'without', 'between', 'of', 'of', 'that', 'that', 'of', 'as', 'of', 'from', 'at', 'of', 'as', 'as', 'without', 'of', 'for', 'by', 'with', 'from', 'at', 'of', 'as', 'as', 'without', 'of', 'for', 'by']
In [48]:
selection = []

# 'RB' in tag is a substring test, so it also matches RBR, RBS and WRB
for word, tag in tagged:
    if 'RB' in tag:
        selection.append(word)

print(selection)
['when', 'when', 'never', 'rather', 'not', 'there', 'already', 'radically', 'already', 'consciously', 'then', 'also', 'then', 'even', 'already', 'even', 'thus', 'better', 'not', 'not', 'over', 'again', 'Here', 'where', 'equally', 'then', 'even', 'already', 'even', 'thus', 'better', 'not', 'not', 'over', 'again']
In [49]:
selection = []

for word, tag in tagged:
    if 'RBR' in tag:
        selection.append(word)

print(selection)
['better', 'better']
In [50]:
selection = []

for word, tag in tagged:
    if 'RBS' in tag:
        selection.append(word)

print(selection)
[]
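
The cells above all test tags with `in`, which is a substring match: 'RB' in tag is also true for 'WRB', 'RBR' and 'RBS'. A small sketch of the difference, using a made-up list of (word, tag) pairs in the same shape that nltk.pos_tag returns (tagged_sample is an assumption, not data from the text):

```python
# Hypothetical sample of (word, tag) pairs in the shape nltk.pos_tag returns
tagged_sample = [('when', 'WRB'), ('never', 'RB'), ('better', 'RBR'),
                 ('already', 'RB'), ('where', 'WRB'), ('the', 'DT')]

# Substring test: matches any tag that contains 'RB'
loose = [word for word, tag in tagged_sample if 'RB' in tag]

# Exact test: matches the plain adverb tag only
exact = [word for word, tag in tagged_sample if tag == 'RB']

# Prefix test: matches RB, RBR and RBS, but not WRB
family = [word for word, tag in tagged_sample if tag.startswith('RB')]

print(loose)   # ['when', 'never', 'better', 'already', 'where']
print(exact)   # ['never', 'already']
print(family)  # ['never', 'better', 'already']
```

Which test fits depends on the goal: the substring test is handy for grabbing a whole family of tags, the exact test when only one tag is wanted.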

Where do these tags come from?

An off-the-shelf tagger is available for English. It uses the Penn Treebank tagset.

From: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag

NLTK provides documentation for each tag, which can be queried using the tag, e.g. nltk.help.upenn_tagset('RB').

From: http://www.nltk.org/book_1ed/ch05.html

In [47]:
nltk.help.upenn_tagset('RB')
---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-47-3ec8764fce3e> in <module>
----> 1 nltk.help.upenn_tagset('RB')

/usr/local/lib/python3.7/dist-packages/nltk/help.py in upenn_tagset(tagpattern)
     25 
     26 def upenn_tagset(tagpattern=None):
---> 27     _format_tagset("upenn_tagset", tagpattern)
     28 
     29 

/usr/local/lib/python3.7/dist-packages/nltk/help.py in _format_tagset(tagset, tagpattern)
     44 
     45 def _format_tagset(tagset, tagpattern=None):
---> 46     tagdict = load("help/tagsets/" + tagset + ".pickle")
     47     if not tagpattern:
     48         _print_entries(sorted(tagdict), tagdict)

/usr/local/lib/python3.7/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    750 
    751     # Load the resource.
--> 752     opened_resource = _open(resource_url)
    753 
    754     if format == "raw":

/usr/local/lib/python3.7/dist-packages/nltk/data.py in _open(resource_url)
    875 
    876     if protocol is None or protocol.lower() == "nltk":
--> 877         return find(path_, path + [""]).open()
    878     elif protocol.lower() == "file":
    879         # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    583     sep = "*" * 70
    584     resource_not_found = "\n%s\n%s\n%s\n" % (sep, msg, sep)
--> 585     raise LookupError(resource_not_found)
    586 
    587 

LookupError: 
**********************************************************************
  Resource tagsets not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('tagsets')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load help/tagsets/PY3/upenn_tagset.pickle

  Searched in:
    - '/home/namikim/nltk_data'
    - '/usr/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

The traceback above means that the tagsets resource is not installed on this machine. Running nltk.download('tagsets') once, as the message suggests, should make nltk.help.upenn_tagset('RB') work.

An alphabetical list of part-of-speech tags used in the Penn Treebank Project:

Number. Tag Description
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb

A telling/tricky case

It's important to realize that a POS tag is not a fixed property of a word; it depends on the context in which the word appears. The NLTK book gives an example of homographs: words that are written the same, but are pronounced differently and have different meanings depending on their use.

In [ ]:
text = nltk.word_tokenize("They refuse to permit us to obtain the refuse permit")
nltk.pos_tag(text)

From the book:

Notice that refuse and permit both appear as a present tense verb (VBP) and a noun (NN). E.g. refUSE is a verb meaning "deny," while REFuse is a noun meaning "trash" (i.e. they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this reason, text-to-speech systems usually perform POS-tagging.)
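
To see the ambiguity at a glance, the tags can be collected per word. A minimal sketch, assuming the tagger output is the one printed in the NLTK book for this sentence (a current tagger may give slightly different tags); tagged_sentence, tags_per_word and ambiguous are names made up for this example:

```python
from collections import defaultdict

# Tagger output for the sentence above, as printed in the NLTK book
tagged_sentence = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
                   ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'),
                   ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'),
                   ('permit', 'NN')]

# Collect the set of tags seen for each word form
tags_per_word = defaultdict(set)
for word, tag in tagged_sentence:
    tags_per_word[word.lower()].add(tag)

# Keep only the words that received more than one tag
ambiguous = {word: sorted(tags) for word, tags in tags_per_word.items()
             if len(tags) > 1}
print(ambiguous)  # {'refuse': ['NN', 'VBP'], 'permit': ['NN', 'VB']}
```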

Applying to an entire text

In [ ]:
# Read the whole text; the with block closes the file afterwards
with open('../txt/language.txt') as f:
    language = f.read()
tokens = nltk.word_tokenize(language)
tagged = nltk.pos_tag(tokens)
In [ ]:
tagged
In [ ]:
words = "in the beginning was heaven and earth and the time of the whatever".split()
In [ ]:
words
In [ ]:
words.index("the")
In [ ]:
# enumerate yields (position, word) pairs
for i, word in enumerate(words):
    if word == "the":
        print(i, word)
    else:
        print(word.upper())
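
words.index("the") only reports the first position, while the enumerate loop passes every position. A short sketch that collects all of them in one list (the names first and positions are made up for this example):

```python
words = "in the beginning was heaven and earth and the time of the whatever".split()

# list.index stops at the first match
first = words.index("the")

# enumerate visits every position, so a comprehension can collect them all
positions = [i for i, word in enumerate(words) if word == "the"]

print(first)      # 1
print(positions)  # [1, 8, 11]
```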
In [ ]:
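import random

# Dictionary that maps a POS tag to the list of words filed under it
words = {}
words["VB"] = []

# Note: nothing is tagged here; every token is filed under "VB" as a placeholder
for word in nltk.word_tokenize("in the beginning was heaven and earth and the time of the whatever"):
    words["VB"].append(word)

# Pick one word at random from the "VB" list
random.choice(words["VB"])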
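
The same dictionary idea can be filled from real tags, one list per tag. A minimal sketch, assuming a made-up list of (word, tag) pairs in place of nltk.pos_tag output (tagged_sample and words_by_tag are hypothetical names):

```python
import random
from collections import defaultdict

# Hypothetical (word, tag) pairs standing in for nltk.pos_tag output
tagged_sample = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'),
                 ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'),
                 ('obtain', 'VB'), ('the', 'DT')]

# defaultdict(list) creates the empty list the first time a tag is seen
words_by_tag = defaultdict(list)
for word, tag in tagged_sample:
    words_by_tag[tag].append(word)

print(dict(words_by_tag))

# Pick a random base-form verb
print(random.choice(words_by_tag["VB"]))
```

With a plain dict instead of defaultdict, each list would have to be created by hand before the first append, as in the cell above.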
In [ ]: