ABOUT:
jspos is a JavaScript port of Mark Watson's FastTag Part of Speech Tagger, which
was itself based on Eric Brill's trained rule set and English lexicon.
jspos also includes a basic lexer that can be used to extract words and other
tokens from text strings.
LICENSE:
jspos is licensed under the GNU LGPLv3
FILES:
lexicon.js_ - JavaScript version of Eric Brill's English lexicon
lexer.js - Lexer to break a sentence into taggable tokens (e.g. words)
POSTagger.js - the Part of Speech tagger
You'll typically need to include all three files.
USAGE:
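// Split the text into taggable tokens (words, punctuation, ...)
var words = new Lexer().lex("This is some sample text. This text can contain multiple sentences.");
// Tag the tokens; each entry of the result is a [word, tag] pair
var taggedWords = new POSTagger().tag(words);
for (var i = 0; i < taggedWords.length; i++) {
    var taggedWord = taggedWords[i];
    var word = taggedWord[0];
    var tag = taggedWord[1];
}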
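As a follow-up to the loop above, here is a minimal sketch of picking out words by tag.
It assumes only what the usage example shows, namely that the tagger returns an array
of [word, tag] pairs. filterByTag is a hypothetical helper and is not part of jspos;
the sample array is written by hand for illustration and is not real tagger output.

```javascript
// Sketch only: filterByTag is a hypothetical helper, not part of jspos.
// Assumes taggedWords is an array of [word, tag] pairs, as in USAGE.
function filterByTag(taggedWords, tagPrefix) {
    var matches = [];
    for (var i = 0; i < taggedWords.length; i++) {
        var word = taggedWords[i][0];
        var tag = taggedWords[i][1];
        // indexOf(...) === 0 means the tag starts with tagPrefix,
        // so "NN" also matches NNS, NNP and NNPS.
        if (tag.indexOf(tagPrefix) === 0) {
            matches.push(word);
        }
    }
    return matches;
}

// Hand-written sample in the same shape as the tagger output (illustrative only).
var sample = [["The", "DT"], ["dogs", "NNS"], ["eat", "VBP"], ["quickly", "RB"]];
var nouns = filterByTag(sample, "NN");
var verbs = filterByTag(sample, "VB");
```

In real use you would pass the taggedWords array from the usage example instead of
the hand-written sample.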
ACKNOWLEDGEMENTS:
Thanks to Mark Watson for writing FastTag, which served as the basis for jspos.
Thanks to Toby Rahilly for compressing the lexicon.
TAGS:
CC    Coord. conjunction            and,but,or
CD    Cardinal number               one,two
DT    Determiner                    the,some
EX    Existential there             there
FW    Foreign word                  mon dieu
IN    Preposition                   of,in,by
JJ    Adjective                     big
JJR   Adj., comparative             bigger
JJS   Adj., superlative             biggest
LS    List item marker              1,One
MD    Modal                         can,should
NN    Noun, sing. or mass           dog
NNP   Proper noun, sing.            Edinburgh
NNPS  Proper noun, plural           Smiths
NNS   Noun, plural                  dogs
POS   Possessive ending             's
PDT   Predeterminer                 all,both
PP$   Possessive pronoun            my,one's
PRP   Personal pronoun              I,you,she
RB    Adverb                        quickly
RBR   Adverb, comparative           faster
RBS   Adverb, superlative           fastest
RP    Particle                      up,off
SYM   Symbol                        +,%,&
TO    "to"                          to
UH    Interjection                  oh,oops
VB    Verb, base form               eat
VBD   Verb, past tense              ate
VBG   Verb, gerund                  eating
VBN   Verb, past part.              eaten
VBP   Verb, non-3rd-person present  eat
VBZ   Verb, 3rd-person present      eats
WDT   Wh-determiner                 which,that
WP    Wh-pronoun                    who,what
WP$   Possessive wh-pronoun         whose
WRB   Wh-adverb                     how,where
,     Comma                         ,
.     Sent-final punct.             . ! ?
:     Mid-sent punct.               : ; --
$     Dollar sign                   $
#     Pound sign                    #
"     Quote                         "
(     Left paren                    (
)     Right paren                   )
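
Many tags in the list above share a prefix (NN* for nouns, VB* for verbs, JJ* for
adjectives, RB* for adverbs). The sketch below shows one possible way to collapse a
tag into such a coarse category. coarseCategory and the category names are
hypothetical, chosen for illustration; they are not part of jspos.

```javascript
// Sketch only: coarseCategory is a hypothetical helper, not part of jspos.
// It maps a tag from the TAGS list to a coarse category by its prefix.
function coarseCategory(tag) {
    var prefixes = [
        ["NN", "noun"],       // NN, NNP, NNPS, NNS
        ["VB", "verb"],       // VB, VBD, VBG, VBN, VBP, VBZ
        ["JJ", "adjective"],  // JJ, JJR, JJS
        ["RB", "adverb"]      // RB, RBR, RBS
    ];
    for (var i = 0; i < prefixes.length; i++) {
        if (tag.indexOf(prefixes[i][0]) === 0) {
            return prefixes[i][1];
        }
    }
    // Everything else (DT, IN, RP, WRB, punctuation, ...) is left ungrouped.
    return "other";
}

var categoryOfNNPS = coarseCategory("NNPS");
var categoryOfRP = coarseCategory("RP");
```

Matching on the prefix keeps the helper short, but note that it deliberately leaves
tags such as WRB (Wh-adverb) in "other" because they do not start with "RB".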