java.lang.Object
  rita.RiObject
      rita.RiChunker
public class RiChunker
A simple and lightweight implementation of a phrase chunker for non-recursive syntactic elements (e.g., noun-phrases, verb-phrases, etc.) using the Penn conventions (shown below).
String sent = "The boy ran over the dog";
RiChunker chunker = new RiChunker(this);
String chunks = chunker.tagAndChunk(sent);
The full tag set follows:
Based closely on the OpenNLP maximum entropy chunker.
For more info see Berger & Della Pietra's paper 'A Maximum Entropy Approach to Natural Language Processing', which provides a good introduction to the maxent framework.
See Also: RiTaServer

| Field Summary |
|---|
| Fields inherited from interface processing.core.PConstants |
|---|
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z |
| Constructor Summary |
|---|
| RiChunker() |
| RiChunker(processing.core.PApplet pApplet) |
| Method Summary | |
|---|---|
| java.lang.String | chunk(java.util.List listOfTokens, java.util.List listOfTags): Chunks the tokens using the supplied part-of-speech tags, then returns the chunk data inline, e.g., (np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn) for the input 'The boy ran over the dog'. |
| java.lang.String | chunk(java.lang.String[] arrayOfTokens, java.lang.String[] arrayOfTags): Chunks the tokens using the supplied part-of-speech tags, then returns the chunk data inline, in the same format as above. |
| java.lang.String[] | getAdjPhrases(): Returns an array of adjective phrases found in the last chunking operation. |
| java.lang.String[] | getAdvPhrases(): Returns an array of adverb phrases found in the last chunking operation. |
| java.lang.String[] | getChunkData() |
| java.lang.String[] | getNounPhrases(): Returns an array of noun phrases found in the last chunking operation. |
| java.lang.String[] | getPrepPhrases(): Returns an array of prepositional phrases found in the last chunking operation. |
| java.lang.String[] | getVerbPhrases(): Returns an array of verb phrases found in the last chunking operation. |
| static void | main(java.lang.String[] args) |
| java.lang.String | tagAndChunk(java.lang.String sentence): Tokenizes and pos-tags a sentence to prepare it for chunking, then returns a String of chunks inline, in the same format as chunk(). |
| Methods inherited from class rita.RiObject |
|---|
dispose, getId, getPApplet, nextId |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public RiChunker()
public RiChunker(processing.core.PApplet pApplet)
| Method Detail |
|---|
public java.lang.String tagAndChunk(java.lang.String sentence)
(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)
tagAndChunk in interface RiChunkerIF
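The inline format shown above can be post-processed with ordinary string handling. The following is a minimal sketch, not part of RiTa: the class InlineChunkParser and its method phrasesOfType are hypothetical names. It assumes that chunks are never nested and that tokens contain no parentheses. It extracts the plain-text phrases for one chunk label from a string in that format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of RiTa) for reading the inline chunk format.
class InlineChunkParser {

    // Matches one chunk such as "(np The/dt boy/nn)":
    // group 1 is the chunk label, group 2 the space-separated word/tag pairs.
    private static final Pattern CHUNK =
            Pattern.compile("\\((\\w+)\\s+([^()]+)\\)");

    // Returns the plain-text phrases whose chunk label equals `label`, e.g. "np".
    static List<String> phrasesOfType(String inline, String label) {
        List<String> phrases = new ArrayList<String>();
        Matcher m = CHUNK.matcher(inline);
        while (m.find()) {
            if (!m.group(1).equals(label)) {
                continue;
            }
            StringBuilder phrase = new StringBuilder();
            for (String pair : m.group(2).trim().split("\\s+")) {
                // Split on the last '/' so a word that itself contains '/' stays intact.
                int slash = pair.lastIndexOf('/');
                String word = slash < 0 ? pair : pair.substring(0, slash);
                if (phrase.length() > 0) {
                    phrase.append(' ');
                }
                phrase.append(word);
            }
            phrases.add(phrase.toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String inline = "(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)";
        System.out.println(phrasesOfType(inline, "np")); // prints [The boy, the dog]
        System.out.println(phrasesOfType(inline, "vp")); // prints [ran]
    }
}
```

When a RiChunker instance is at hand, getNounPhrases(), getVerbPhrases() and the other getters are the documented way to obtain these arrays; the sketch is only useful when all that remains is the returned String.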
public java.lang.String chunk(java.lang.String[] arrayOfTokens,
java.lang.String[] arrayOfTags)
(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)
chunk in interface RiChunkerIF
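This page does not describe how the parallel token and tag arrays become the inline string. OpenNLP-style chunkers conventionally assign each token a begin/inside/outside label (B-NP, I-NP, O), so one plausible reading is that the inline format is a rendering of such labels. The sketch below illustrates that idea only. BioToInline and toInline are hypothetical names, the label array is an assumed intermediate result, and printing tokens outside any chunk as bare word/tag pairs is likewise an assumption.

```java
import java.util.Locale;

// Hypothetical helper (not part of RiTa): renders per-token B-/I-/O labels
// as the inline chunk format used on this page.
class BioToInline {

    // tokens, tags and labels are parallel arrays of equal length.
    // Labels look like "B-NP" (begins a chunk), "I-NP" (continues it) or "O" (outside).
    static String toInline(String[] tokens, String[] tags, String[] labels) {
        if (tokens.length != tags.length || tokens.length != labels.length) {
            throw new IllegalArgumentException("tokens, tags and labels must have equal length");
        }
        StringBuilder out = new StringBuilder();
        boolean open = false; // true while a chunk's parenthesis is still unclosed
        for (int i = 0; i < tokens.length; i++) {
            String label = labels[i];
            // Simplification: any "I-" label continues the open chunk, whatever its type.
            boolean continues = open && label.startsWith("I-");
            if (continues) {
                out.append(' ');
            } else {
                if (open) {
                    out.append(')');
                    open = false;
                }
                if (out.length() > 0) {
                    out.append(' ');
                }
                if (!label.equals("O")) {
                    // "B-NP" -> "(np "
                    out.append('(').append(label.substring(2).toLowerCase(Locale.ROOT)).append(' ');
                    open = true;
                }
            }
            out.append(tokens[i]).append('/').append(tags[i]);
        }
        if (open) {
            out.append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "boy", "ran", "over", "the", "dog"};
        String[] tags = {"dt", "nn", "vbd", "in", "dt", "nn"};
        String[] labels = {"B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"};
        // prints (np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)
        System.out.println(toInline(tokens, tags, labels));
    }
}
```

The length check reflects the one requirement that follows from the signature itself: arrayOfTokens and arrayOfTags are parallel, so each tag must describe the token at the same index.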
public java.lang.String chunk(java.util.List listOfTokens,
java.util.List listOfTags)
(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)
chunk in interface RiChunkerIF
public java.lang.String[] getAdjPhrases()
getAdjPhrases in interface RiChunkerIF
public java.lang.String[] getAdvPhrases()
getAdvPhrases in interface RiChunkerIF
public java.lang.String[] getChunkData()
getChunkData in interface RiChunkerIF
public java.lang.String[] getNounPhrases()
getNounPhrases in interface RiChunkerIF
public java.lang.String[] getPrepPhrases()
getPrepPhrases in interface RiChunkerIF
public java.lang.String[] getVerbPhrases()
getVerbPhrases in interface RiChunkerIF
public static void main(java.lang.String[] args)