rita
Class RiChunker

java.lang.Object
  extended by rita.RiObject
      extended by rita.RiChunker
All Implemented Interfaces:
processing.core.PConstants, RiChunkerIF, RiConstants

public class RiChunker
extends RiObject
implements RiChunkerIF

A simple and lightweight implementation of a phrase chunker for non-recursive syntactic elements (e.g., noun-phrases, verb-phrases, etc.) using the Penn conventions (shown below).

    String sent = "The boy ran over the dog";
    RiChunker chunker = new RiChunker(this);
    String chunks = chunker.tagAndChunk(sent);
The full tag set follows:
Note: to use this object, you must first download the rita statistical models (rita.me.models.zip) and unpack them into the 'rita' directory in your Processing sketchbook, or into the data directory for your sketch. You can also specify an alternative directory (an absolute path) for the models via RiTa.setModelDir().

Based closely on the OpenNLP maximum entropy chunker.

For more info see: Berger & Della Pietra's paper 'A Maximum
Entropy Approach to Natural Language Processing', which
provides a good introduction to the maxent framework.

See Also:
RiTaServer

Field Summary
 
Fields inherited from interface rita.support.RiConstants
BEHAVIOR_COMPLETED, BOUNDING_BOX_ALPHA, BRILL_POS_TAGGER, EASE_IN, EASE_IN_CUBIC, EASE_IN_EXPO, EASE_IN_OUT, EASE_IN_OUT_CUBIC, EASE_IN_OUT_EXPO, EASE_IN_OUT_QUARTIC, EASE_IN_OUT_SINE, EASE_IN_QUARTIC, EASE_IN_SINE, EASE_OUT, EASE_OUT_CUBIC, EASE_OUT_EXPO, EASE_OUT_QUARTIC, EASE_OUT_SINE, ESS, FADE_COLOR, FADE_IN, FADE_OUT, FADE_TO_TEXT, FIRST_PERSON, FUTURE_TENSE, ID, LERP, LINEAR, MAXENT_POS_TAGGER, MINIM, MOVE, MUTABLE, PAST_TENSE, PHONEME_BOUNDARY, PHONEMES, PLING_STEMMER, PLURAL, PORTER_STEMMER, POS, PRESENT_TENSE, SCALE_TO, SECOND_PERSON, SENTENCE_BOUNDARY, SINGULAR, SONIA, SPEECH_COMPLETED, STRESSES, SYLLABLE_BOUNDARY, SYLLABLES, TEXT, TEXT_ENTERED, THIRD_PERSON, TIMER, TIMER_COMPLETED, TIMER_TICK, TOKENS, UNKNOWN, WORD_BOUNDARY
 
Fields inherited from interface processing.core.PConstants
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z
 
Constructor Summary
RiChunker()
           
RiChunker(processing.core.PApplet pApplet)
           
 
Method Summary
 java.lang.String chunk(java.util.List listOfTokens, java.util.List listOfTags)
          Uses the supplied part-of-speech tags to chunk the tokens, then returns the chunk data inline.
 java.lang.String chunk(java.lang.String[] arrayOfTokens, java.lang.String[] arrayOfTags)
          Uses the supplied part-of-speech tags to chunk the tokens, then returns the chunk data inline.
 java.lang.String[] getAdjPhrases()
          Returns an array of adjective phrases found in the last chunking operation.
 java.lang.String[] getAdvPhrases()
          Returns an array of adverb phrases found in the last chunking operation.
 java.lang.String[] getChunkData()
           
 java.lang.String[] getNounPhrases()
          Returns an array of noun phrases found in the last chunking operation.
 java.lang.String[] getPrepPhrases()
          Returns an array of prepositional phrases found in the last chunking operation.
 java.lang.String[] getVerbPhrases()
          Returns an array of verb phrases found in the last chunking operation.
static void main(java.lang.String[] args)
           
 java.lang.String tagAndChunk(java.lang.String sentence)
          Performs pos-tagging (and word tokenizing) to prepare a sentence for chunking, then returns a String of chunks inline.
 
Methods inherited from class rita.RiObject
dispose, getId, getPApplet, nextId
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RiChunker

public RiChunker()

RiChunker

public RiChunker(processing.core.PApplet pApplet)
Method Detail

tagAndChunk

public java.lang.String tagAndChunk(java.lang.String sentence)
Performs pos-tagging (and word tokenizing) to prepare a sentence for chunking, then returns a String of chunks inline, in the following format (for input 'The boy ran over the dog'):

(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)

Specified by:
tagAndChunk in interface RiChunkerIF
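
The inline format shown above can be split back into phrases with plain Java. The following is a minimal sketch, not part of RiTa: the class name InlineChunkParser and its parse method are hypothetical, and it assumes that every chunk has the form "(label word/tag ...)" and that no word contains a closing parenthesis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of RiTa): groups the phrases of an inline
// chunk string by chunk label (np, vp, pp, ...), dropping the pos tags.
class InlineChunkParser {

    // Matches one "(label word/tag word/tag ...)" group.
    private static final Pattern CHUNK =
        Pattern.compile("\\((\\w+)\\s+([^)]*)\\)");

    static Map<String, List<String>> parse(String inline) {
        // LinkedHashMap keeps labels in order of first appearance.
        Map<String, List<String>> phrases = new LinkedHashMap<>();
        Matcher m = CHUNK.matcher(inline);
        while (m.find()) {
            String label = m.group(1);
            StringBuilder words = new StringBuilder();
            for (String pair : m.group(2).trim().split("\\s+")) {
                // Everything before the last '/' is the word; the rest is its tag.
                int slash = pair.lastIndexOf('/');
                String word = slash < 0 ? pair : pair.substring(0, slash);
                if (words.length() > 0) {
                    words.append(' ');
                }
                words.append(word);
            }
            phrases.computeIfAbsent(label, k -> new ArrayList<>())
                   .add(words.toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String inline =
            "(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)";
        // prints {np=[The boy, the dog], vp=[ran], pp=[over]}
        System.out.println(parse(inline));
    }
}
```

When a RiChunker instance is available, the phrase getters documented on this page (getNounPhrases and the others) are probably the simpler route. A parser like this is mainly useful when only the returned String has been kept.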

chunk

public java.lang.String chunk(java.lang.String[] arrayOfTokens,
                              java.lang.String[] arrayOfTags)
Uses the supplied part-of-speech tags to chunk the tokens, then returns the chunk data inline, in the following format (for input 'The boy ran over the dog'):

(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)

Specified by:
chunk in interface RiChunkerIF
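
This method takes tokens and tags as two separate arrays, presumably with one tag per token at the same index. The following is a minimal sketch of building such arrays from text that has already been tagged in "word/tag" form. The class TokenTagSplitter is hypothetical and not part of RiTa, and the "word/tag" input format is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper (not part of RiTa): splits "word/tag" pairs into
// two parallel arrays, one of tokens and one of tags.
class TokenTagSplitter {

    // Returns a two-element array: [0] holds the tokens, [1] holds the tags.
    static String[][] split(String tagged) {
        String[] pairs = tagged.trim().split("\\s+");
        String[] tokens = new String[pairs.length];
        String[] tags = new String[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            // Split on the last '/', so a word may itself contain a slash.
            int slash = pairs[i].lastIndexOf('/');
            if (slash <= 0 || slash == pairs[i].length() - 1) {
                throw new IllegalArgumentException(
                    "Expected word/tag but found: '" + pairs[i] + "'");
            }
            tokens[i] = pairs[i].substring(0, slash);
            tags[i] = pairs[i].substring(slash + 1);
        }
        return new String[][] { tokens, tags };
    }

    public static void main(String[] args) {
        String[][] parts = split("The/dt boy/nn ran/vbd over/in the/dt dog/nn");
        System.out.println(Arrays.toString(parts[0]));
        System.out.println(Arrays.toString(parts[1]));
        // With RiTa and its models installed, the arrays could then be
        // passed on as: chunker.chunk(parts[0], parts[1]);
    }
}
```

Rejecting malformed pairs up front keeps the two arrays the same length, so a missing tag is caught before the arrays reach the chunker.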

chunk

public java.lang.String chunk(java.util.List listOfTokens,
                              java.util.List listOfTags)
Uses the supplied part-of-speech tags to chunk the tokens, then returns the chunk data inline, in the following format (for input 'The boy ran over the dog'):

(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)

Specified by:
chunk in interface RiChunkerIF

getAdjPhrases

public java.lang.String[] getAdjPhrases()
Returns an array of adjective phrases found in the last chunking operation.

Specified by:
getAdjPhrases in interface RiChunkerIF

getAdvPhrases

public java.lang.String[] getAdvPhrases()
Returns an array of adverb phrases found in the last chunking operation.

Specified by:
getAdvPhrases in interface RiChunkerIF

getChunkData

public java.lang.String[] getChunkData()
Specified by:
getChunkData in interface RiChunkerIF

getNounPhrases

public java.lang.String[] getNounPhrases()
Returns an array of noun phrases found in the last chunking operation.

Specified by:
getNounPhrases in interface RiChunkerIF

getPrepPhrases

public java.lang.String[] getPrepPhrases()
Returns an array of prepositional phrases found in the last chunking operation.

Specified by:
getPrepPhrases in interface RiChunkerIF

getVerbPhrases

public java.lang.String[] getVerbPhrases()
Returns an array of verb phrases found in the last chunking operation.

Specified by:
getVerbPhrases in interface RiChunkerIF

main

public static void main(java.lang.String[] args)