| Description |
Simple part-of-speech (POS) tagger for the RiTa library, using the Penn tagset.
Use RiPosTagger.setDefaultTagger(type) to select either the (faster/lighter)
transformation-based tagger or the (usually more accurate) maximum-entropy tagger.
RiPosTagger tagger = new RiPosTagger(this);
String s = "The teenage boy, stricken with fear, cried sadly like a little baby";
String[] words = RiTa.tokenize(s);
String[] tags = tagger.tag(words);
for (int i = 0; i < tags.length; i++)
{
  System.out.println(tags[i]);
}
// OR
System.out.println(tagger.tagInline(s));
The full Penn part-of-speech tag set:
cc coordinating conjunction
cd cardinal number
dt determiner
ex existential there
fw foreign word
in preposition/subord. conjunction
jj adjective
jjr adjective, comparative
jjs adjective, superlative
ls list item marker
md modal
nn noun, singular or mass
nns noun, plural
nnp proper noun, singular
nnps proper noun, plural
pdt predeterminer
pos possessive ending
prp personal pronoun
prp$ possessive pronoun
rb adverb
rbr adverb, comparative
rbs adverb, superlative
rp particle
sym symbol (mathematical or scientific)
to to
uh interjection
vb verb, base form
vbd verb, past tense
vbg verb, gerund/present participle
vbn verb, past participle
vbp verb, non-3rd ps. sing. present
vbz verb, 3rd ps. sing. present
wdt wh-determiner
wp wh-pronoun
wp$ possessive wh-pronoun
wrb wh-adverb
# pound sign
$ dollar sign
. sentence-final punctuation
, comma
: colon, semi-colon
( left bracket character
) right bracket character
" straight double quote
` left open single quote
`` left open double quote
' right close single quote
'' right close double quote
- dash
Note: to use the maximum-entropy tagger, you must first download
the RiTa statistical models package (rita.me.models.zip)
and unpack the zip into the "rita" directory in your
Processing sketchbook, or into the data directory for your sketch.
You can also specify an alternative directory (an absolute path)
for the models via RiTa.setModelDir();
Then call RiPosTagger.setDefaultTagger(RiPosTagger.MAXENT_POS_TAGGER); |
| Methods |
| tag() |
|
Returns a String array of the most probable tags for the given words
|
| tagFile() |
|
Loads a file, splits the input into sentences and returns
a single String[] with all the pos-tags from the text.
|
| tagForWordNet() |
|
Tags the array of words (as usual) with a part-of-speech from the Penn tagset,
then returns the corresponding part-of-speech for WordNet from the set
{ "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.
|
| tagInline() |
|
Returns a String with pos-tags notated inline in the format:
"The/dt doctor/nn treated/vbd dogs/nns"
|
| tagWordForWordNet() |
|
Tags a single word with a part-of-speech from the Penn tagset,
then returns the corresponding part-of-speech for WordNet from the set
{ "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.
|
| RiPosTagger.isAdjective() |
|
Returns true if pos is an adjective
|
| RiPosTagger.isAdverb() |
|
Returns true if pos is an adverb
|
| RiPosTagger.isNoun() |
|
Returns true if partOfSpeech is a noun
|
| RiPosTagger.isVerb() |
|
Returns true if pos is a verb
|
| RiPosTagger.parseTagString() |
|
Takes a String of words and tags in the format:
The/dt doctor/nn treated/vbd dogs/nns
and returns an array of the part-of-speech tags.
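The parsing described here can be illustrated with a minimal, standalone sketch. It is not RiTa's implementation: the class TagStringParseSketch and the method parseTags are hypothetical names, and the choice to split on whitespace and take the text after the last "/" in each token is an assumption about the format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RiTa: extracts tags from "word/tag" pairs.
final class TagStringParseSketch {

    // Assumption: tokens are whitespace-separated and the tag follows the LAST slash,
    // so a word that itself contains '/' still parses.
    static String[] parseTags(String tagged) {
        String trimmed = tagged.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        List<String> tags = new ArrayList<>();
        for (String token : trimmed.split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0 || slash == token.length() - 1) {
                throw new IllegalArgumentException("Expected word/tag but found: " + token);
            }
            tags.add(token.substring(slash + 1));
        }
        return tags.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tags = parseTags("The/dt doctor/nn treated/vbd dogs/nns");
        System.out.println(String.join(" ", tags)); // dt nn vbd nns
    }
}
```

In a sketch that uses the library itself, RiPosTagger.parseTagString() would take the place of this helper.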
|
| RiPosTagger.toWordNet() |
|
Converts a part-of-speech String from the Penn tagset to the corresponding part-of-speech
for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.
If the pos is not found in the Penn tagset, it is returned unchanged.
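As a rough illustration of this conversion, the standalone sketch below maps Penn tags to WordNet categories by tag prefix. It is not RiTa's code: the class PennToWordNetSketch is a hypothetical name, the prefix-based grouping (nn, vb, jj, rb) is an assumption, and the library's actual grouping may differ (for example in how it treats "rp" or "md").

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of RiTa: maps a Penn tag to a WordNet category.
final class PennToWordNetSketch {

    // Word-level Penn tags from the list above; punctuation tags are omitted
    // for brevity, so this sketch returns them unchanged.
    private static final Set<String> PENN_TAGS = new HashSet<>(Arrays.asList(
        "cc", "cd", "dt", "ex", "fw", "in", "jj", "jjr", "jjs", "ls", "md",
        "nn", "nns", "nnp", "nnps", "pdt", "pos", "prp", "prp$", "rb", "rbr",
        "rbs", "rp", "sym", "to", "uh", "vb", "vbd", "vbg", "vbn", "vbp", "vbz",
        "wdt", "wp", "wp$", "wrb"));

    static String toWordNet(String pos) {
        if (!PENN_TAGS.contains(pos)) {
            return pos; // not a known Penn tag: returned unchanged
        }
        // Assumption: categories are decided by tag prefix.
        if (pos.startsWith("nn")) return "n";
        if (pos.startsWith("vb")) return "v";
        if (pos.startsWith("jj")) return "a";
        if (pos.startsWith("rb")) return "r";
        return "-"; // a Penn tag with no WordNet counterpart
    }

    public static void main(String[] args) {
        for (String tag : new String[] { "nns", "vbd", "jjr", "rb", "dt", "xyz" }) {
            System.out.println(tag + " -> " + toWordNet(tag));
        }
    }
}
```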
|
|