rita.support.me
Class MaxEntChunker

java.lang.Object
  extended by rita.RiObject
      extended by rita.support.remote.RiRemotable
          extended by rita.support.me.RiObjectME
              extended by rita.support.me.MaxEntChunker
All Implemented Interfaces:
processing.core.PConstants, RiChunkerIF, RemoteConstants, RiConstants

public class MaxEntChunker
extends RiObjectME
implements RiChunkerIF

A simple chunker that finds non-recursive syntactic 'chunks', such as noun phrases, using the Penn Treebank conventions (shown below).


Primarily a wrapper for the OpenNLP chunker (http://opennlp.sourceforge.net), with some minor modifications and simplifications.

For more information, see Berger, Della Pietra, and Della Pietra's paper 'A Maximum Entropy Approach to Natural Language Processing', which provides a good introduction to the maxent framework.
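
To make the idea of non-recursive chunks concrete, the sketch below groups per-token chunk tags into bracketed inline chunks. It assumes BIO-style tags ("B-NP", "I-NP", "O"), which is how maxent chunkers such as OpenNLP's commonly label tokens. The class name BioChunkSketch, the method toInline, and the "[NP ...]" bracket format are hypothetical and chosen for illustration; they are not part of RiTa, and the inline format this class actually returns may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of RiTa): groups BIO chunk tags into flat, bracketed chunks. */
final class BioChunkSketch {

    /**
     * @param words     tokens of one sentence
     * @param chunkTags one tag per token, e.g. "B-NP", "I-NP", "O" (assumed BIO scheme)
     * @return the tokens with chunks marked inline, in an assumed "[TYPE w1 w2]" format
     */
    static String toInline(String[] words, String[] chunkTags) {
        if (words.length != chunkTags.length) {
            throw new IllegalArgumentException("words and chunkTags must be parallel arrays");
        }
        List<String> parts = new ArrayList<>();
        StringBuilder open = null; // chunk currently being built, or null if none
        String openType = null;    // type of the open chunk, e.g. "NP"

        for (int i = 0; i < words.length; i++) {
            String tag = chunkTags[i];

            // An I- tag of the same type extends the open chunk.
            if (tag.startsWith("I-") && open != null && tag.substring(2).equals(openType)) {
                open.append(' ').append(words[i]);
                continue;
            }

            // Any other tag closes the open chunk; chunks never nest.
            if (open != null) {
                parts.add(open.append(']').toString());
                open = null;
                openType = null;
            }

            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // A stray I- tag is treated as the start of a new chunk.
                openType = tag.substring(2);
                open = new StringBuilder("[").append(openType).append(' ').append(words[i]);
            } else {
                // "O" (outside any chunk): emit the bare token.
                parts.add(words[i]);
            }
        }
        if (open != null) {
            parts.add(open.append(']').toString());
        }
        return String.join(" ", parts);
    }
}
```

Because a chunk is closed before the next one opens, the output is always a flat sequence of phrases, which is what distinguishes chunking from full recursive parsing.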


Field Summary
static java.lang.String ADJ_PHRASE
           
static java.lang.String ADV_PHRASE
           
static java.lang.String IND_PHRASE
           
static java.lang.String NOUN_PHRASE
           
static java.lang.String PREP_PHRASE
           
static java.lang.String PRT_PHRASE
           
static java.lang.String SBAR_PHRASE
           
static java.lang.String VERB_PHRASE
           
 
Fields inherited from class rita.support.me.RiObjectME
ERROR_MSG, LOAD_FROM_MODEL_DIR
 
Fields inherited from interface rita.support.remote.RemoteConstants
ARG_DELIM, ARR_DELIM, CHUNKER, DELIM, FS, LB, LP, MARKOV, PARSER, QQ, RB, RP, SPC, TAGGER, TYPE_DELIM
 
Fields inherited from interface rita.support.RiConstants
BEHAVIOR_COMPLETED, BOUNDING_BOX_ALPHA, BRILL_POS_TAGGER, EASE_IN, EASE_IN_CUBIC, EASE_IN_EXPO, EASE_IN_OUT, EASE_IN_OUT_CUBIC, EASE_IN_OUT_EXPO, EASE_IN_OUT_QUARTIC, EASE_IN_OUT_SINE, EASE_IN_QUARTIC, EASE_IN_SINE, EASE_OUT, EASE_OUT_CUBIC, EASE_OUT_EXPO, EASE_OUT_QUARTIC, EASE_OUT_SINE, ESS, FADE_COLOR, FADE_IN, FADE_OUT, FADE_TO_TEXT, FIRST_PERSON, FUTURE_TENSE, ID, LERP, LINEAR, MAXENT_POS_TAGGER, MINIM, MOVE, MUTABLE, PAST_TENSE, PHONEME_BOUNDARY, PHONEMES, PLING_STEMMER, PLURAL, PORTER_STEMMER, POS, PRESENT_TENSE, SCALE_TO, SECOND_PERSON, SENTENCE_BOUNDARY, SINGULAR, SONIA, SPEECH_COMPLETED, STRESSES, SYLLABLE_BOUNDARY, SYLLABLES, TEXT, TEXT_ENTERED, THIRD_PERSON, TIMER, TIMER_COMPLETED, TIMER_TICK, TOKENS, UNKNOWN, WORD_BOUNDARY
 
Fields inherited from interface processing.core.PConstants
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z
 
Constructor Summary
MaxEntChunker()
           
MaxEntChunker(processing.core.PApplet p)
           
 
Method Summary
 java.lang.String chunk(java.util.List words, java.util.List postags)
          Returns a String with the chunks marked inline
 java.lang.String chunk(java.lang.String[] words, java.lang.String[] tags)
          Returns a String with the chunks marked inline
static MaxEntChunker createRemote(java.util.Map params)
           
 void destroy()
           
 java.lang.String[] getAdjPhrases()
           
 java.lang.String[] getAdvPhrases()
           
 java.lang.String[] getChunkData()
           
static MaxEntChunker getInstance()
           
static MaxEntChunker getInstance(processing.core.PApplet p)
           
 java.lang.String[] getNounPhrases()
           
 java.lang.String[] getPrepPhrases()
           
 java.lang.String[] getVerbPhrases()
           
static void main(java.lang.String[] args)
           
 java.lang.String tagAndChunk(java.lang.String sentence)
          Utility method that uses the default word tokenizer and POS tagger to prepare a sentence for chunking, then returns the sentence String with chunk data inline
 
Methods inherited from class rita.support.me.RiObjectME
getModelDir, setModelDir
 
Methods inherited from class rita.RiObject
dispose, getId, getPApplet, nextId
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NOUN_PHRASE

public static final java.lang.String NOUN_PHRASE
See Also:
Constant Field Values

VERB_PHRASE

public static final java.lang.String VERB_PHRASE
See Also:
Constant Field Values

PREP_PHRASE

public static final java.lang.String PREP_PHRASE
See Also:
Constant Field Values

SBAR_PHRASE

public static final java.lang.String SBAR_PHRASE
See Also:
Constant Field Values

ADJ_PHRASE

public static final java.lang.String ADJ_PHRASE
See Also:
Constant Field Values

ADV_PHRASE

public static final java.lang.String ADV_PHRASE
See Also:
Constant Field Values

PRT_PHRASE

public static final java.lang.String PRT_PHRASE
See Also:
Constant Field Values

IND_PHRASE

public static final java.lang.String IND_PHRASE
See Also:
Constant Field Values
Constructor Detail

MaxEntChunker

public MaxEntChunker()

MaxEntChunker

public MaxEntChunker(processing.core.PApplet p)
Method Detail

getInstance

public static MaxEntChunker getInstance()

getInstance

public static MaxEntChunker getInstance(processing.core.PApplet p)

createRemote

public static MaxEntChunker createRemote(java.util.Map params)

chunk

public java.lang.String chunk(java.util.List words,
                              java.util.List postags)
Description copied from interface: RiChunkerIF
Returns a String with the chunks marked inline

Specified by:
chunk in interface RiChunkerIF

getNounPhrases

public java.lang.String[] getNounPhrases()
Specified by:
getNounPhrases in interface RiChunkerIF

getVerbPhrases

public java.lang.String[] getVerbPhrases()
Specified by:
getVerbPhrases in interface RiChunkerIF

getPrepPhrases

public java.lang.String[] getPrepPhrases()
Specified by:
getPrepPhrases in interface RiChunkerIF

getAdjPhrases

public java.lang.String[] getAdjPhrases()
Specified by:
getAdjPhrases in interface RiChunkerIF

getAdvPhrases

public java.lang.String[] getAdvPhrases()
Specified by:
getAdvPhrases in interface RiChunkerIF
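
The phrase getters above each return the phrases of one type as a String array. This page does not document how they are derived, so the following is only a rough sketch of the general idea: filtering an inline chunk string by phrase type. It assumes a bracketed "[NP ...]" format; the class PhraseFilterSketch, the method phrasesOfType, and the format itself are hypothetical and not part of RiTa.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of RiTa): pulls phrases of one type out of an inline chunk string. */
final class PhraseFilterSketch {

    /**
     * @param inline text with chunks marked as "[TYPE w1 w2]" (assumed format)
     * @param type   chunk label to extract, e.g. "NP"
     * @return the text of each matching chunk, in order of appearance
     */
    static String[] phrasesOfType(String inline, String type) {
        // Match "[TYPE " followed by everything up to the closing bracket.
        Pattern chunk = Pattern.compile("\\[" + Pattern.quote(type) + " ([^\\]]*)\\]");
        Matcher m = chunk.matcher(inline);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found.toArray(new String[0]);
    }
}
```

The space required after the label keeps a request for "NP" from matching a longer label that merely starts with the same letters.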

chunk

public java.lang.String chunk(java.lang.String[] words,
                              java.lang.String[] tags)
Description copied from interface: RiChunkerIF
Returns a String with the chunks marked inline

Specified by:
chunk in interface RiChunkerIF

tagAndChunk

public java.lang.String tagAndChunk(java.lang.String sentence)
Utility method that uses the default word tokenizer and POS tagger to prepare a sentence for chunking, then returns the sentence String with chunk data inline

Specified by:
tagAndChunk in interface RiChunkerIF
Parameters:
sentence - the sentence to tokenize, tag, and chunk

getChunkData

public java.lang.String[] getChunkData()
Specified by:
getChunkData in interface RiChunkerIF

destroy

public void destroy()
Specified by:
destroy in class RiRemotable

main

public static void main(java.lang.String[] args)