rita
Class RiPosTagger

java.lang.Object
  extended by rita.RiObject
      extended by rita.RiPosTagger
All Implemented Interfaces:
processing.core.PConstants, RiConstants

public class RiPosTagger
extends RiObject

Simple part-of-speech (POS) tagger for the RiTa library using the Penn tagset. Use RiPosTagger.setDefaultTagger(type); to specify a (faster/lighter) transformation-based tagger, or the (usually more accurate) maximum-entropy tagger.

    RiPosTagger tagger = new RiPosTagger(this);
    String s = "The teenage boy, stricken with fear, cried sadly like a little baby";

    String[] words = RiTa.tokenize(s);
    String[] tags = tagger.tag(words);

    // print the tag assigned to each word
    for (int i = 0; i < tags.length; i++)
    {
        System.out.println(tags[i]);
    }
    //    OR     
    System.out.println(tagger.tagInline(s));
 
The full Penn part-of-speech tag set:

Note: To use the maximum-entropy tagger, you must first download the RiTa statistical models package (rita.me.models.zip) and unpack the zip into the "rita" directory in your Processing sketchbook, or into the data directory for your sketch.

You can also specify an alternative directory (an absolute path) for the models via RiTa.setModelDir();

 Then call RiPosTagger.setDefaultTagger(RiPosTagger.MAXENT_POS_TAGGER);


Field Summary
 
Fields inherited from interface rita.support.RiConstants
BEHAVIOR_COMPLETED, BOUNDING_BOX_ALPHA, BRILL_POS_TAGGER, EASE_IN, EASE_IN_CUBIC, EASE_IN_EXPO, EASE_IN_OUT, EASE_IN_OUT_CUBIC, EASE_IN_OUT_EXPO, EASE_IN_OUT_QUARTIC, EASE_IN_OUT_SINE, EASE_IN_QUARTIC, EASE_IN_SINE, EASE_OUT, EASE_OUT_CUBIC, EASE_OUT_EXPO, EASE_OUT_QUARTIC, EASE_OUT_SINE, ESS, FADE_COLOR, FADE_IN, FADE_OUT, FADE_TO_TEXT, FIRST_PERSON, FUTURE_TENSE, ID, LERP, LINEAR, MAXENT_POS_TAGGER, MINIM, MOVE, MUTABLE, PAST_TENSE, PHONEME_BOUNDARY, PHONEMES, PLING_STEMMER, PLURAL, PORTER_STEMMER, POS, PRESENT_TENSE, SCALE_TO, SECOND_PERSON, SENTENCE_BOUNDARY, SINGULAR, SONIA, SPEECH_COMPLETED, STRESSES, SYLLABLE_BOUNDARY, SYLLABLES, TEXT, TEXT_ENTERED, THIRD_PERSON, TIMER, TIMER_COMPLETED, TIMER_TICK, TOKENS, UNKNOWN, WORD_BOUNDARY
 
Fields inherited from interface processing.core.PConstants
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z
 
Constructor Summary
RiPosTagger()
          Deprecated.  
RiPosTagger(processing.core.PApplet pApplet)
           
RiPosTagger(processing.core.PApplet p, int taggerType)
           
 
Method Summary
static RiPosTagger getInstance()
           
static RiPosTagger getInstance(processing.core.PApplet p)
           
static java.lang.String inlineTags(java.lang.String[] tokenArray, java.lang.String[] tagArray)
          Takes an array of words and of tags and returns a combined String of the form: "The/dt doctor/nn treated/vbd dogs/nns"
static java.lang.String inlineTags(java.lang.String[] tokenArray, java.lang.String[] tagArray, java.lang.String delimiter)
          Takes an array of words and of tags and returns a combined String of the form: "The/dt doctor/nn treated/vbd dogs/nns" (assuming a "/" as delimiter)
static boolean isAdjective(java.lang.String pos)
          Returns true if pos is an adjective
static boolean isAdverb(java.lang.String pos)
          Returns true if pos is an adverb
static boolean isNoun(java.lang.String partOfSpeech)
          Returns true if partOfSpeech is a noun
static boolean isVerb(java.lang.String pos)
          Returns true if pos is a verb
static void main(java.lang.String[] args)
           
static java.lang.String[] parseTagString(java.lang.String wordsAndTags)
          Takes a String of words and tags in the format "The/dt doctor/nn treated/vbd dogs/nns" and returns an array of the part-of-speech tags.
static void setDefaultTagger(int taggerType)
          Sets the default tagger type for the application
 java.lang.String[] tag(FeaturedIF[] tokenArray)
          Tags each token with the appropriate POS (as a feature), then returns a String array of the assigned tags.
 java.lang.String[] tag(java.lang.String[] tokenArray)
          Returns a String array of the most probable tags
 java.lang.String[] tagFile(java.lang.String fileName)
          Loads a file, splits the input into sentences and returns a single String[] with all the pos-tags from the text.
 java.lang.String[] tagForWordNet(FeaturedIF[] tokenArray)
          Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns, for each word, the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String array.
 java.lang.String[] tagForWordNet(java.lang.String[] tokenArray)
          Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns, for each word, the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String array.
static boolean taggerExists()
          Returns true if the tagger has already been created
 java.lang.String tagInline(java.lang.String sentence)
          Tokenizes the input sentence using the defaultTokenizer and returns a String with pos-tags notated inline
 java.lang.String tagInline(java.lang.String[] tokens)
          Returns a String with pos-tags notated inline in the format: "The/dt doctor/nn treated/vbd dogs/nns"
 java.lang.String tagWordForWordNet(java.lang.String word)
          Tags a single word with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.
static java.lang.String toWordNet(java.lang.String pos)
          Converts a part-of-speech String from the Penn tagset to the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.
 
Methods inherited from class rita.RiObject
dispose, getId, getPApplet, nextId
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RiPosTagger

public RiPosTagger()
Deprecated. 


RiPosTagger

public RiPosTagger(processing.core.PApplet pApplet)

RiPosTagger

public RiPosTagger(processing.core.PApplet p,
                   int taggerType)
Method Detail

getInstance

public static RiPosTagger getInstance()

getInstance

public static RiPosTagger getInstance(processing.core.PApplet p)

inlineTags

public static java.lang.String inlineTags(java.lang.String[] tokenArray,
                                          java.lang.String[] tagArray,
                                          java.lang.String delimiter)
Takes an array of words and of tags and returns a combined String of the form:
"The/dt doctor/nn treated/vbd dogs/nns"
assuming a "/" as delimiter.
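
The combined format described above can be illustrated with a small standalone sketch. It does not call RiTa and is not the library's source: the class InlineTagsSketch and the method joinTags are hypothetical names, and the length check is an assumption about reasonable behaviour rather than something the documentation states.

```java
public class InlineTagsSketch {

    // Hypothetical stand-in that mimics the documented output format;
    // it is not RiPosTagger.inlineTags itself.
    public static String joinTags(String[] tokens, String[] tags, String delimiter) {
        // Assumption: the two arrays are expected to line up one-to-one.
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' '); // single space between word/tag pairs
            }
            sb.append(tokens[i]).append(delimiter).append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = { "The", "doctor", "treated", "dogs" };
        String[] tags = { "dt", "nn", "vbd", "nns" };
        // prints: The/dt doctor/nn treated/vbd dogs/nns
        System.out.println(joinTags(tokens, tags, "/"));
    }
}
```

Passing a different delimiter, for example "_", would give "The_dt doctor_nn treated_vbd dogs_nns" under the same assumptions.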


tagForWordNet

public java.lang.String[] tagForWordNet(java.lang.String[] tokenArray)
Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns, for each word, the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String array.

See Also:
tag(rita.support.FeaturedIF[])

tagForWordNet

public java.lang.String[] tagForWordNet(FeaturedIF[] tokenArray)
Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns, for each word, the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String array.

See Also:
tagForWordNet(String[])

tagWordForWordNet

public java.lang.String tagWordForWordNet(java.lang.String word)
Tags a single word with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String.

See Also:
tagForWordNet(String[]), tag(String[])

toWordNet

public static java.lang.String toWordNet(java.lang.String pos)
Converts a part-of-speech String from the Penn tagset to the corresponding part-of-speech for WordNet from the set { "n" (noun), "v"(verb), "a"(adj), "r"(adverb), "-"(other) } as a String. If the pos is not found in the penn set, it is returned unchanged.
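
One plausible way to realise the conversion described above is sketched here as standalone Java. This is an assumption, not RiTa's implementation: the class PennToWordNetSketch and the method pennToWordNet are hypothetical, the tag list is the standard 36 word-level Penn Treebank tags in lowercase, and the prefix rules (nn, vb, jj, rb) are a guess at how the Penn tags group into WordNet categories.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PennToWordNetSketch {

    // The 36 word-level Penn Treebank tags, lowercased to match the examples on this page.
    private static final Set<String> PENN_TAGS = new HashSet<>(Arrays.asList(
        "cc", "cd", "dt", "ex", "fw", "in", "jj", "jjr", "jjs", "ls", "md",
        "nn", "nns", "nnp", "nnps", "pdt", "pos", "prp", "prp$", "rb", "rbr",
        "rbs", "rp", "sym", "to", "uh", "vb", "vbd", "vbg", "vbn", "vbp",
        "vbz", "wdt", "wp", "wp$", "wrb"));

    // Hypothetical mapping; the grouping by prefix is an assumption.
    public static String pennToWordNet(String pos) {
        String tag = pos.toLowerCase();
        if (!PENN_TAGS.contains(tag)) {
            return pos; // not a Penn tag: returned unchanged, as documented
        }
        if (tag.startsWith("nn")) return "n"; // nouns
        if (tag.startsWith("vb")) return "v"; // verbs
        if (tag.startsWith("jj")) return "a"; // adjectives
        if (tag.startsWith("rb")) return "r"; // adverbs
        return "-";                           // any other Penn tag
    }

    public static void main(String[] args) {
        for (String tag : new String[] { "nns", "vbd", "jj", "rbr", "dt", "xyz" }) {
            System.out.println(tag + " -> " + pennToWordNet(tag));
        }
    }
}
```

Under these assumptions "nns" maps to "n", "dt" maps to "-", and an unrecognised string such as "xyz" comes back unchanged.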

See Also:
tag(String[])

inlineTags

public static java.lang.String inlineTags(java.lang.String[] tokenArray,
                                          java.lang.String[] tagArray)
Takes an array of words and of tags and returns a combined String of the form:
    "The/dt doctor/nn treated/vbd dogs/nns"


parseTagString

public static java.lang.String[] parseTagString(java.lang.String wordsAndTags)
Takes a String of words and tags in the format:
     The/dt doctor/nn treated/vbd dogs/nns
and returns an array of the part-of-speech tags.

Parameters:
wordsAndTags - a String of words and tags in the format shown above
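
The parsing described for this method can be sketched without RiTa as follows. The class ParseTagStringSketch and the method extractTags are hypothetical names, and the choices to split on whitespace, to use the last "/" in each pair, and to return an empty tag when no "/" is present are assumptions, not documented behaviour.

```java
import java.util.Arrays;

public class ParseTagStringSketch {

    // Hypothetical stand-in: pulls the tag out of each word/tag pair.
    public static String[] extractTags(String wordsAndTags) {
        String trimmed = wordsAndTags.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] pairs = trimmed.split("\\s+"); // one word/tag pair per token
        String[] tags = new String[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            int slash = pairs[i].lastIndexOf('/');
            // Text after the last "/" is taken as the tag; a pair with no "/" yields "".
            tags[i] = slash < 0 ? "" : pairs[i].substring(slash + 1);
        }
        return tags;
    }

    public static void main(String[] args) {
        String tagged = "The/dt doctor/nn treated/vbd dogs/nns";
        // prints: [dt, nn, vbd, nns]
        System.out.println(Arrays.toString(extractTags(tagged)));
    }
}
```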

isVerb

public static boolean isVerb(java.lang.String pos)
Returns true if pos is a verb


isNoun

public static boolean isNoun(java.lang.String partOfSpeech)
Returns true if partOfSpeech is a noun


isAdverb

public static boolean isAdverb(java.lang.String pos)
Returns true if pos is an adverb


isAdjective

public static boolean isAdjective(java.lang.String pos)
Returns true if pos is an adjective


tag

public java.lang.String[] tag(FeaturedIF[] tokenArray)
Tags each token with the appropriate POS (as a feature), then returns a String array of the assigned tags.


tag

public java.lang.String[] tag(java.lang.String[] tokenArray)
Returns a String array of the most probable tags


tagInline

public java.lang.String tagInline(java.lang.String[] tokens)
Returns a String with pos-tags notated inline in the format:
    "The/dt doctor/nn treated/vbd dogs/nns"


setDefaultTagger

public static void setDefaultTagger(int taggerType)
Sets the default tagger type for the application


taggerExists

public static boolean taggerExists()
Returns true if the tagger has already been created


tagInline

public java.lang.String tagInline(java.lang.String sentence)
Tokenizes the input sentence using the defaultTokenizer and returns a String with pos-tags notated inline


tagFile

public java.lang.String[] tagFile(java.lang.String fileName)
Loads a file, splits the input into sentences and returns a single String[] with all the pos-tags from the text.


main

public static void main(java.lang.String[] args)