RiTa
Name RiPosTagger
Description Simple part-of-speech (pos) tagger for the RiTa library using the Penn tagset. Use RiPosTagger.setDefaultTagger(type); to select either a (faster/lighter) transformation-based tagger or the (usually more accurate) maximum-entropy tagger.
    RiPosTagger tagger = new RiPosTagger(this);
    String s = "The teenage boy, stricken with fear, cried sadly like a little baby";

    String[] words = RiTa.tokenize(s);
    String[] tags = tagger.tag(words);

    // Print each tag in order
    for (int i = 0; i < tags.length; i++)
    {
        System.out.println(tags[i]);
    }
    // OR get the words and tags in a single inline String
    System.out.println(tagger.tagInline(s));
 
The full Penn part-of-speech tag set:
  • cc coordinating conjunction
  • cd cardinal number
  • dt determiner
  • ex existential there
  • fw foreign word
  • in preposition/subord. conjunction
  • jj adjective
  • jjr adjective, comparative
  • jjs adjective, superlative
  • ls list item marker
  • md modal
  • nn noun, singular or mass
  • nns noun, plural
  • nnp proper noun, singular
  • nnps proper noun, plural
  • pdt predeterminer
  • pos possessive ending
  • prp personal pronoun
  • prp$ possessive pronoun
  • rb adverb
  • rbr adverb, comparative
  • rbs adverb, superlative
  • rp particle
  • sym symbol (mathematical or scientific)
  • to to
  • uh interjection
  • vb verb, base form
  • vbd verb, past tense
  • vbg verb, gerund/present participle
  • vbn verb, past participle
  • vbp verb, non-3rd ps. sing. present
  • vbz verb, 3rd ps. sing. present
  • wdt wh-determiner
  • wp wh-pronoun
  • wp$ possessive wh-pronoun
  • wrb wh-adverb
  • # pound sign
  • $ dollar sign
  • . sentence-final punctuation
  • , comma
  • : colon, semi-colon
  • ( left bracket character
  • ) right bracket character
  • " straight double quote
  • ` left open single quote
  • `` left open double quote
  • ' right close single quote
  • '' right close double quote
  • - dash
Note: to use the maximum-entropy tagger, you must first download the RiTa statistical models package (rita.me.models.zip) and unpack the zip into the "rita" directory in your Processing sketchbook, or into the data directory for your sketch.

You can also specify an alternative directory (an absolute path) for the models via RiTa.setModelDir();

Then call RiPosTagger.setDefaultTagger(RiPosTagger.MAXENT_POS_TAGGER);
Constructors
RiPosTagger(pApplet);
RiPosTagger(p, taggerType);
Methods
tag()   Returns a String array of the most probable tags for the input words

tagFile()   Loads a file, splits the input into sentences and returns a single String[] with all the pos-tags from the text.

tagForWordNet()   Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.

tagInline()   Returns a String with pos-tags notated inline in the format:
    "The/dt doctor/nn treated/vbd dogs/nns"

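To make the inline format concrete, here is a minimal standalone sketch that joins a word array and a tag array into the "word/tag" notation shown above. It does not call RiTa; the class name InlineTagSketch and the helper toInline are hypothetical names used only for illustration, and the tags are supplied by hand rather than produced by a tagger.

```java
public class InlineTagSketch {

    // Joins parallel word/tag arrays into "word/tag word/tag ..." form.
    // Hypothetical helper: this only illustrates the output format and is
    // not how RiPosTagger.tagInline() is implemented.
    static String toInline(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = { "The", "doctor", "treated", "dogs" };
        String[] tags = { "dt", "nn", "vbd", "nns" };
        System.out.println(toInline(words, tags));
    }
}
```
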

tagWordForWordNet()   Tags a single word with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.

RiPosTagger.isAdjective()   Returns true if pos is an adjective

RiPosTagger.isAdverb()   Returns true if pos is an adverb

RiPosTagger.isNoun()   Returns true if partOfSpeech is a noun

RiPosTagger.isVerb()   Returns true if pos is a verb

RiPosTagger.parseTagString()   Takes a String of words and tags in the format:
     The/dt doctor/nn treated/vbd dogs/nns
returns an array of the part-of-speech tags.
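
The parsing step described above can be sketched without RiTa as follows. This is an illustrative re-implementation under stated assumptions, not the library's code: the class name ParseTagStringSketch and the helper parseTags are hypothetical, tokens are assumed to be separated by whitespace, and the tag is assumed to be everything after the last "/" in each token.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseTagStringSketch {

    // Returns the tag (the text after the last '/') of each
    // whitespace-separated "word/tag" token, in order.
    // Hypothetical helper; RiPosTagger.parseTagString() may behave
    // differently on malformed input.
    static String[] parseTags(String tagged) {
        List<String> tags = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0 || slash == token.length() - 1) {
                throw new IllegalArgumentException("Expected word/tag but got: " + token);
            }
            tags.add(token.substring(slash + 1));
        }
        return tags.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tags = parseTags("The/dt doctor/nn treated/vbd dogs/nns");
        System.out.println(String.join(" ", tags));
    }
}
```

Splitting on the last "/" rather than the first keeps words that themselves contain a slash intact, at the cost of assuming that no tag contains one.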

RiPosTagger.toWordNet()   Converts a part-of-speech String from the Penn tagset to the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String. If the pos is not found in the Penn tagset, it is returned unchanged.
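
As a rough illustration of the Penn-to-WordNet conversion, the standalone sketch below maps lower-case Penn tags to the five WordNet codes and returns unknown input unchanged. The grouping is an assumption: it treats nn*, vb*, jj* and rb/rbr/rbs as noun, verb, adjective and adverb, and sends every other Penn tag (including md, rp and wrb) to "-"; the actual grouping used by RiPosTagger is not stated here. The class name PennToWordNetSketch and its members are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PennToWordNetSketch {

    // Assumed groupings of the lower-case Penn tags listed in this page.
    static final Set<String> NOUNS = new HashSet<>(Arrays.asList("nn", "nns", "nnp", "nnps"));
    static final Set<String> VERBS = new HashSet<>(Arrays.asList("vb", "vbd", "vbg", "vbn", "vbp", "vbz"));
    static final Set<String> ADJECTIVES = new HashSet<>(Arrays.asList("jj", "jjr", "jjs"));
    static final Set<String> ADVERBS = new HashSet<>(Arrays.asList("rb", "rbr", "rbs"));

    // Every remaining tag in the Penn list, mapped to "-" (other).
    static final Set<String> OTHER_PENN = new HashSet<>(Arrays.asList(
        "cc", "cd", "dt", "ex", "fw", "in", "ls", "md", "pdt", "pos",
        "prp", "prp$", "rp", "sym", "to", "uh", "wdt", "wp", "wp$", "wrb",
        "#", "$", ".", ",", ":", "(", ")", "\"", "`", "``", "'", "''", "-"));

    // Hypothetical stand-in for the conversion described above.
    static String toWordNet(String pos) {
        if (NOUNS.contains(pos)) return "n";
        if (VERBS.contains(pos)) return "v";
        if (ADJECTIVES.contains(pos)) return "a";
        if (ADVERBS.contains(pos)) return "r";
        if (OTHER_PENN.contains(pos)) return "-";
        return pos; // not a Penn tag: returned unchanged
    }

    public static void main(String[] args) {
        for (String pos : new String[] { "nns", "vbd", "jjr", "rb", "dt", "xyz" }) {
            System.out.println(pos + " -> " + toWordNet(pos));
        }
    }
}
```

Explicit sets are used instead of a first-letter test so that tags such as rp (particle) are not mistaken for adverbs.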

Usage Web & Application