rita.support.me
Class MaxEntTagger

java.lang.Object
  extended by rita.RiObject
      extended by rita.support.remote.RiRemotable
          extended by rita.support.me.RiObjectME
              extended by rita.support.me.MaxEntTagger
All Implemented Interfaces:
processing.core.PConstants, RiTaggerIF, RemoteConstants, RiConstants

public class MaxEntTagger
extends RiObjectME
implements RiTaggerIF

A simple part-of-speech (POS) tagger for the RiTa library, using the Penn Treebank tagset.

Based closely on the OpenNLP maximum entropy tagger.

For more information, see Berger, Della Pietra & Della Pietra, 'A Maximum Entropy Approach to Natural Language Processing' (Computational Linguistics, 1996), which gives a good introduction to the maxent framework.
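In the conditional maximum entropy framework that paper describes, the probability of a tag t given a context h is p(t|h) = exp(Σ λ_i f_i(h,t)) / Z(h), where each f_i is a binary feature, λ_i is its learned weight, and Z(h) normalizes over all candidate tags. The toy sketch below computes that distribution. The feature names, the "feature|tag" key format and the weights are invented for illustration; they are not RiTa's or OpenNLP's trained model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy conditional maxent scorer: p(tag | context) = exp(sum of active weights) / Z. */
final class MaxEntSketch {

    /**
     * @param activeFeatures context predicates that fire for the current token (hypothetical names)
     * @param weights        maps "predicate|tag" to a weight lambda (hypothetical values)
     * @param tags           candidate output tags
     * @return normalized probability for every tag, in the order given
     */
    static Map<String, Double> distribution(List<String> activeFeatures,
                                            Map<String, Double> weights,
                                            List<String> tags) {
        Map<String, Double> scores = new LinkedHashMap<>();
        double z = 0.0;
        for (String tag : tags) {
            double sum = 0.0;
            // Feature f_i(h, t) contributes lambda_i only when its predicate fires for this tag.
            for (String feature : activeFeatures) {
                sum += weights.getOrDefault(feature + "|" + tag, 0.0);
            }
            double unnormalized = Math.exp(sum);
            scores.put(tag, unnormalized);
            z += unnormalized;
        }
        // Divide by the partition function Z(h) so the probabilities sum to 1.
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            e.setValue(e.getValue() / z);
        }
        return scores;
    }
}
```

With a single active feature `suffix=ed` weighted 2.0 toward VBD, VBD receives about 0.88 of the probability mass against NN.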

The full Penn tag set follows:

  1. CC Coordinating conjunction
  2. CD Cardinal number
  3. DT Determiner
  4. EX Existential there
  5. FW Foreign word
  6. IN Preposition/subord. conjunction
  7. JJ Adjective
  8. JJR Adjective, comparative
  9. JJS Adjective, superlative
  10. LS List item marker
  11. MD Modal
  12. NN Noun, singular or mass
  13. NNS Noun, plural
  14. NNP Proper noun, singular
  15. NNPS Proper noun, plural
  16. PDT Predeterminer
  17. POS Possessive ending
  18. PRP Personal pronoun
  19. PRP$ Possessive pronoun
  20. RB Adverb
  21. RBR Adverb, comparative
  22. RBS Adverb, superlative
  23. RP Particle
  24. SYM Symbol (mathematical or scientific)
  25. TO to
  26. UH Interjection
  27. VB Verb, base form
  28. VBD Verb, past tense
  29. VBG Verb, gerund/present participle
  30. VBN Verb, past participle
  31. VBP Verb, non-3rd ps. sing. present
  32. VBZ Verb, 3rd ps. sing. present
  33. WDT wh-determiner
  34. WP wh-pronoun
  35. WP$ Possessive wh-pronoun
  36. WRB wh-adverb
  37. # Pound sign
  38. $ Dollar sign
  39. . Sentence-final punctuation
  40. , Comma
  41. : Colon, semi-colon
  42. ( Left bracket character
  43. ) Right bracket character
  44. " Straight double quote
  45. ` Left open single quote
  46. `` Left open double quote
  47. ' Right close single quote
  48. '' Right close double quote
  49. - Dash or hyphen


Field Summary
 
Fields inherited from class rita.support.me.RiObjectME
ERROR_MSG, LOAD_FROM_MODEL_DIR
 
Fields inherited from interface rita.support.remote.RemoteConstants
ARG_DELIM, ARR_DELIM, CHUNKER, DELIM, FS, LB, LP, MARKOV, PARSER, QQ, RB, RP, SPC, TAGGER, TYPE_DELIM
 
Fields inherited from interface rita.support.RiConstants
BEHAVIOR_COMPLETED, BOUNDING_BOX_ALPHA, BRILL_POS_TAGGER, EASE_IN, EASE_IN_CUBIC, EASE_IN_EXPO, EASE_IN_OUT, EASE_IN_OUT_CUBIC, EASE_IN_OUT_EXPO, EASE_IN_OUT_QUARTIC, EASE_IN_OUT_SINE, EASE_IN_QUARTIC, EASE_IN_SINE, EASE_OUT, EASE_OUT_CUBIC, EASE_OUT_EXPO, EASE_OUT_QUARTIC, EASE_OUT_SINE, ESS, FADE_COLOR, FADE_IN, FADE_OUT, FADE_TO_TEXT, FIRST_PERSON, FUTURE_TENSE, ID, LERP, LINEAR, MAXENT_POS_TAGGER, MINIM, MOVE, MUTABLE, PAST_TENSE, PHONEME_BOUNDARY, PHONEMES, PLING_STEMMER, PLURAL, PORTER_STEMMER, POS, PRESENT_TENSE, SCALE_TO, SECOND_PERSON, SENTENCE_BOUNDARY, SINGULAR, SONIA, SPEECH_COMPLETED, STRESSES, SYLLABLE_BOUNDARY, SYLLABLES, TEXT, TEXT_ENTERED, THIRD_PERSON, TIMER, TIMER_COMPLETED, TIMER_TICK, TOKENS, UNKNOWN, WORD_BOUNDARY
 
Fields inherited from interface processing.core.PConstants
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z
 
Constructor Summary
MaxEntTagger(processing.core.PApplet p)
           
 
Method Summary
static RiRemotable createRemote(java.util.Map params)
           
 void destroy()
           
static MaxEntTagger getInstance()
           
static MaxEntTagger getInstance(processing.core.PApplet p)
           
 boolean isAdjective(java.lang.String pos)
          Returns true if the given POS tag marks an adjective.
 boolean isAdverb(java.lang.String pos)
          Returns true if the given POS tag marks an adverb.
 boolean isNoun(java.lang.String pos)
          Returns true if the given POS tag marks a noun.
 boolean isVerb(java.lang.String pos)
          Returns true if the given POS tag marks a verb.
static void main(java.lang.String[] args)
           
 java.util.List tag(java.util.List tokens)
           
 java.lang.String tag(java.lang.String sentence)
           
 java.lang.String[] tag(java.lang.String[] tokens)
          Returns a String array of the most probable tags
 java.lang.String[] tagFile(java.lang.String fileName)
          Loads a file, splits the input into sentences and returns a String[] of the most probable tags.
 java.lang.String tagInline(java.lang.String toTag)
          Returns a String with pos-tags notated inline
 java.lang.String tagInline(java.lang.String[] tokens)
          Returns a String with pos-tags notated inline
 
Methods inherited from class rita.support.me.RiObjectME
getModelDir, setModelDir
 
Methods inherited from class rita.RiObject
dispose, getId, getPApplet, nextId
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MaxEntTagger

public MaxEntTagger(processing.core.PApplet p)
Method Detail

getInstance

public static MaxEntTagger getInstance()

getInstance

public static MaxEntTagger getInstance(processing.core.PApplet p)
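This page does not say what the getInstance() factories return. Assuming the no-argument version hands back a shared, lazily created tagger (a common choice when loading a model is expensive), the sketch below shows a thread-safe way to implement that pattern in Java. `ExpensiveTagger` and its `constructions` counter are hypothetical stand-ins, not RiTa classes or fields.

```java
/** Hypothetical stand-in for a tagger whose model is costly to load. */
final class ExpensiveTagger {
    static int constructions = 0; // counts model loads, for illustration only

    private ExpensiveTagger() {
        constructions++; // a real tagger would read its model files here
    }

    /** Initialization-on-demand holder: the JVM creates INSTANCE once, on first access. */
    private static final class Holder {
        static final ExpensiveTagger INSTANCE = new ExpensiveTagger();
    }

    static ExpensiveTagger getInstance() {
        return Holder.INSTANCE;
    }
}
```

The holder idiom relies on the JVM's class-initialization guarantees, so no explicit locking is needed and the model is loaded only if getInstance() is actually called.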

createRemote

public static RiRemotable createRemote(java.util.Map params)

tag

public java.util.List tag(java.util.List tokens)

tag

public java.lang.String tag(java.lang.String sentence)

tag

public java.lang.String[] tag(java.lang.String[] tokens)
Description copied from interface: RiTaggerIF
Returns a String array of the most probable tags

Specified by:
tag in interface RiTaggerIF
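tag(String[]) returns one tag per input token, so the result can be read in parallel with the input array. OpenNLP's maxent tagger searches over whole tag sequences with a beam search; the simplified sketch below decodes greedily instead (effectively a beam of width 1), choosing the best tag for each token given the previous tag. The `Scorer` hook and the toy rules in the usage are hypothetical placeholders for a trained model's scores.

```java
import java.util.List;

/** Greedy left-to-right decoder: a simplification of the beam search used by OpenNLP. */
final class GreedyTaggerSketch {

    /** Hypothetical scoring hook; a real tagger would return log p(tag | context) from its model. */
    interface Scorer {
        double score(String[] tokens, int i, String prevTag, String tag);
    }

    static String[] tag(String[] tokens, List<String> tagset, Scorer scorer) {
        String[] tags = new String[tokens.length]; // one tag per token, same order
        String prev = "BOS"; // sentence-start marker used as the first "previous tag"
        for (int i = 0; i < tokens.length; i++) {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (String candidate : tagset) {
                double s = scorer.score(tokens, i, prev, candidate);
                if (s > bestScore) {
                    bestScore = s;
                    best = candidate;
                }
            }
            tags[i] = best;
            prev = best; // greedy: this choice is never revisited
        }
        return tags;
    }
}
```

With a toy scorer that prefers DT for "the", NN after a determiner and VBZ otherwise, the tokens {"The", "dog", "barks"} come back as {"DT", "NN", "VBZ"}.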

tagInline

public java.lang.String tagInline(java.lang.String[] tokens)
Description copied from interface: RiTaggerIF
Returns a String with pos-tags notated inline

Specified by:
tagInline in interface RiTaggerIF

tagInline

public java.lang.String tagInline(java.lang.String toTag)
Description copied from interface: RiTaggerIF
Returns a String with pos-tags notated inline

Specified by:
tagInline in interface RiTaggerIF
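Both tagInline variants return the tags interleaved with the words instead of as a separate array. The exact notation is not documented on this page; the sketch below assumes a `word/TAG` style joined by spaces, one common convention, and shows how such a string can be built from parallel token and tag arrays. `InlineFormatSketch` is a hypothetical helper, and its output format is an assumption rather than RiTa's verified output.

```java
/** Builds an inline-tagged string from parallel arrays, assuming a word/TAG notation. */
final class InlineFormatSketch {

    static String inline(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must be the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens[i]).append('/').append(tags[i]); // hypothetical "word/TAG" format
        }
        return sb.toString();
    }
}
```

For example, tokens {"The", "dog", "barks"} with tags {"DT", "NN", "VBZ"} yield "The/DT dog/NN barks/VBZ".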

destroy

public void destroy()
Specified by:
destroy in class RiRemotable

isVerb

public boolean isVerb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the given POS tag marks a verb.

Specified by:
isVerb in interface RiTaggerIF

isNoun

public boolean isNoun(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the given POS tag marks a noun.

Specified by:
isNoun in interface RiTaggerIF

isAdverb

public boolean isAdverb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the given POS tag marks an adverb.

Specified by:
isAdverb in interface RiTaggerIF

isAdjective

public boolean isAdjective(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the given POS tag marks an adjective.

Specified by:
isAdjective in interface RiTaggerIF
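The four predicates above take a tag (the parameter is named `pos`), not a word. In the Penn tagset listed at the top of this page, each of these categories shares a two-letter prefix: VB for verbs, NN for nouns, RB for adverbs and JJ for adjectives. The sketch below tests those prefixes; it is an assumption about how such checks could be written, not RiTa's source, and it deliberately leaves out related tags such as WRB (wh-adverb) and MD (modal), which a real implementation may or may not count.

```java
import java.util.Locale;

/** Category checks over Penn Treebank tags, keyed on the shared tag prefix. */
final class PennCategories {

    static boolean isVerb(String pos)      { return hasPrefix(pos, "VB"); } // VB, VBD, VBG, VBN, VBP, VBZ
    static boolean isNoun(String pos)      { return hasPrefix(pos, "NN"); } // NN, NNS, NNP, NNPS
    static boolean isAdverb(String pos)    { return hasPrefix(pos, "RB"); } // RB, RBR, RBS
    static boolean isAdjective(String pos) { return hasPrefix(pos, "JJ"); } // JJ, JJR, JJS

    private static boolean hasPrefix(String pos, String prefix) {
        // Penn tags are upper case; normalize so "nns" is treated like "NNS".
        return pos != null && pos.toUpperCase(Locale.ROOT).startsWith(prefix);
    }
}
```

Because the check is a prefix match, WRB is not an adverb here ("WRB" does not start with "RB"), while RBR and RBS are.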

tagFile

public java.lang.String[] tagFile(java.lang.String fileName)
Description copied from interface: RiTaggerIF
Loads a file, splits the input into sentences and returns a String[] of the most probable tags.

Specified by:
tagFile in interface RiTaggerIF
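tagFile(String) combines three steps: read the file, split its text into sentences, and tag each sentence. This page does not say which sentence splitter or character encoding RiTa uses, nor whether the returned array holds one entry per sentence or per token. The sketch below assumes UTF-8, splits with the JDK's `java.text.BreakIterator`, returns one entry per sentence, and takes a hypothetical `tagSentence` function standing in for the tagger.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Reads a file, splits it into sentences, and tags each one with a caller-supplied tagger. */
final class TagFileSketch {

    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) sentences.add(s); // skip whitespace-only fragments
        }
        return sentences;
    }

    /** tagSentence is a hypothetical hook, e.g. a call into a tagger's tagInline(String). */
    static String[] tagFile(Path file, Function<String, String> tagSentence) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8); // assumes UTF-8
        List<String> sentences = splitSentences(text);
        String[] results = new String[sentences.size()];
        for (int i = 0; i < results.length; i++) {
            results[i] = tagSentence.apply(sentences.get(i));
        }
        return results;
    }
}
```

Keeping the splitter separate from file I/O makes it easy to swap in a different sentence detector, such as a trained one, without touching the reading or tagging steps.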

main

public static void main(java.lang.String[] args)