java.lang.Object
  rita.RiObject
      rita.RiPosTagger
public class RiPosTagger
Simple part-of-speech (pos) tagger for the RiTa library, using the Penn tagset. Use RiPosTagger.setDefaultTagger(type);
to choose either the (faster, lighter) transformation-based tagger or the (usually more accurate)
maximum-entropy tagger. For example:
// In a Processing sketch, 'this' refers to the PApplet
RiPosTagger tagger = new RiPosTagger(this);
String s = "The teenage boy, stricken with fear, cried sadly like a little baby";
String[] words = RiTa.tokenize(s);
String[] tags = tagger.tag(words);
for (int i = 0; i < tags.length; i++)
{
  System.out.println(words[i] + "/" + tags[i]);
}
// OR: tag and print inline in a single call
System.out.println(tagger.tagInline(s));
The full Penn part-of-speech tag set:
cc coordinating conjunction
cd cardinal number
dt determiner
ex existential there
fw foreign word
in preposition/subord. conjunction
jj adjective
jjr adjective, comparative
jjs adjective, superlative
ls list item marker
md modal
nn noun, singular or mass
nns noun, plural
nnp proper noun, singular
nnps proper noun, plural
pdt predeterminer
pos possessive ending
prp personal pronoun
prp$ possessive pronoun
rb adverb
rbr adverb, comparative
rbs adverb, superlative
rp particle
sym symbol (mathematical or scientific)
to to
uh interjection
vb verb, base form
vbd verb, past tense
vbg verb, gerund/present participle
vbn verb, past participle
vbp verb, non-3rd ps. sing. present
vbz verb, 3rd ps. sing. present
wdt wh-determiner
wp wh-pronoun
wp$ possessive wh-pronoun
wrb wh-adverb
# pound sign
$ dollar sign
. sentence-final punctuation
, comma
: colon, semi-colon
( left bracket character
) right bracket character
" straight double quote
` left open single quote
“ left open double quote
’ right close single quote
” right close double quote
- dash
You can also specify an alternative directory (an absolute path) for the models via RiTa.setModelDir(),
then call RiPosTagger.setDefaultTagger(RiPosTagger.MAXENT_POS_TAGGER);
| Field Summary |
|---|

| Fields inherited from interface processing.core.PConstants |
|---|
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z |

| Constructor Summary | |
|---|---|
| RiPosTagger() | Deprecated. |
| RiPosTagger(processing.core.PApplet pApplet) | |
| RiPosTagger(processing.core.PApplet p, int taggerType) | |


| Method Summary | | |
|---|---|---|
| static RiPosTagger | getInstance() | |
| static RiPosTagger | getInstance(processing.core.PApplet p) | |
| static java.lang.String | inlineTags(java.lang.String[] tokenArray, java.lang.String[] tagArray) | Takes an array of words and an array of tags and returns a combined String of the form: "The/dt doctor/nn treated/vbd dogs/nns" |
| static java.lang.String | inlineTags(java.lang.String[] tokenArray, java.lang.String[] tagArray, java.lang.String delimiter) | Takes an array of words and an array of tags and returns a combined String of the form: "The/dt doctor/nn treated/vbd dogs/nns" (with "/" as the delimiter) |
| static boolean | isAdjective(java.lang.String pos) | Returns true if pos is an adjective |
| static boolean | isAdverb(java.lang.String pos) | Returns true if pos is an adverb |
| static boolean | isNoun(java.lang.String partOfSpeech) | Returns true if partOfSpeech is a noun |
| static boolean | isVerb(java.lang.String pos) | Returns true if pos is a verb |
| static void | main(java.lang.String[] args) | |
| static java.lang.String[] | parseTagString(java.lang.String wordsAndTags) | Takes a String of words and tags in the format: "The/dt doctor/nn treated/vbd dogs/nns" and returns an array of the part-of-speech tags |
| static void | setDefaultTagger(int taggerType) | Sets the default tagger type for the application |
| java.lang.String[] | tag(FeaturedIF[] tokenArray) | Tags each token with the appropriate POS (as a feature), then returns a String array of the assigned tags. |
| java.lang.String[] | tag(java.lang.String[] tokenArray) | Returns a String array of the most probable tags |
| java.lang.String[] | tagFile(java.lang.String fileName) | Loads a file, splits the input into sentences and returns a single String[] with all the pos-tags from the text. |
| java.lang.String[] | tagForWordNet(FeaturedIF[] tokenArray) | Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns the corresponding parts-of-speech for WordNet from the set { "n" (noun), "v" (verb), "a" (adj), "r" (adverb), "-" (other) } as a String[]. |
| java.lang.String[] | tagForWordNet(java.lang.String[] tokenArray) | Tags the array of words (as usual) with a part-of-speech from the Penn tagset, then returns the corresponding parts-of-speech for WordNet from the set { "n" (noun), "v" (verb), "a" (adj), "r" (adverb), "-" (other) } as a String[]. |
| static boolean | taggerExists() | Returns true if the tagger has already been created |
| java.lang.String | tagInline(java.lang.String sentence) | Tokenizes the input sentence using the defaultTokenizer and returns a String with pos-tags notated inline |
| java.lang.String | tagInline(java.lang.String[] tokens) | Returns a String with pos-tags notated inline in the format: "The/dt doctor/nn treated/vbd dogs/nns" |
| java.lang.String | tagWordForWordNet(java.lang.String word) | Tags a single word with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { "n" (noun), "v" (verb), "a" (adj), "r" (adverb), "-" (other) } as a String. |
| static java.lang.String | toWordNet(java.lang.String pos) | Converts a part-of-speech String from the Penn tagset to the corresponding part-of-speech for WordNet from the set { "n" (noun), "v" (verb), "a" (adj), "r" (adverb), "-" (other) } as a String. |

| Methods inherited from class rita.RiObject |
|---|
| dispose, getId, getPApplet, nextId |

| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|
public RiPosTagger()
public RiPosTagger(processing.core.PApplet pApplet)
public RiPosTagger(processing.core.PApplet p,
int taggerType)
| Method Detail |
|---|
public static RiPosTagger getInstance()
public static RiPosTagger getInstance(processing.core.PApplet p)
public static java.lang.String inlineTags(java.lang.String[] tokenArray,
java.lang.String[] tagArray,
java.lang.String delimiter)
Takes an array of words and an array of tags and returns a combined String of the form
"The/dt doctor/nn treated/vbd dogs/nns", assuming "/" as the delimiter.
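As a rough illustration of the inline format, the plain-Java sketch below pairs each word with its tag. `InlineTagsSketch` and its methods are hypothetical stand-ins, not RiTa's implementation; the single space between pairs and the use of "/" as the two-argument default are assumptions read off the example strings on this page.

```java
// Hypothetical stand-in for RiPosTagger.inlineTags(...); not RiTa's actual code.
public class InlineTagsSketch {

    public static String inlineTags(String[] tokens, String[] tags, String delimiter) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        // Join "word<delimiter>tag" pairs with single spaces, e.g. "doctor/nn"
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append(delimiter).append(tags[i]);
        }
        return sb.toString();
    }

    // Two-argument form; defaulting to "/" is an assumption
    public static String inlineTags(String[] tokens, String[] tags) {
        return inlineTags(tokens, tags, "/");
    }

    public static void main(String[] args) {
        String[] words = {"The", "doctor", "treated", "dogs"};
        String[] tags = {"dt", "nn", "vbd", "nns"};
        System.out.println(inlineTags(words, tags));      // The/dt doctor/nn treated/vbd dogs/nns
        System.out.println(inlineTags(words, tags, "_")); // The_dt doctor_nn treated_vbd dogs_nns
    }
}
```

Checking the array lengths up front makes a mismatch between tokens and tags fail loudly instead of silently truncating the output.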
public java.lang.String[] tagForWordNet(java.lang.String[] tokenArray)
See Also: tag(rita.support.FeaturedIF[])
public java.lang.String[] tagForWordNet(FeaturedIF[] tokenArray)
See Also: tagForWordNet(String[])
public java.lang.String tagWordForWordNet(java.lang.String word)
See Also: tagForWordNet(String[]), tag(String[])
public static java.lang.String toWordNet(java.lang.String pos)
See Also: tag(String[])
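A minimal sketch of a Penn-to-WordNet conversion in the spirit of toWordNet(). It assumes the mapping is decided purely by tag prefix (nn*, vb*, jj*, rb*) and that every other tag maps to "-"; `ToWordNetSketch` is hypothetical, and RiTa's own rules may differ for edge cases such as wrb.

```java
import java.util.Locale;

// Hypothetical Penn -> WordNet part-of-speech conversion; prefix rules are an assumption.
public class ToWordNetSketch {

    public static String toWordNet(String pennTag) {
        String t = pennTag.toLowerCase(Locale.ROOT);
        if (t.startsWith("nn")) return "n"; // nn, nns, nnp, nnps
        if (t.startsWith("vb")) return "v"; // vb, vbd, vbg, vbn, vbp, vbz
        if (t.startsWith("jj")) return "a"; // jj, jjr, jjs
        if (t.startsWith("rb")) return "r"; // rb, rbr, rbs (wrb does not match)
        return "-";                         // everything else: dt, in, wrb, ...
    }

    public static void main(String[] args) {
        String[] tags = {"dt", "nn", "vbd", "nns", "rb", "jjs", "wrb"};
        StringBuilder out = new StringBuilder();
        for (String tag : tags) {
            out.append(tag).append("->").append(toWordNet(tag)).append(' ');
        }
        System.out.println(out.toString().trim());
        // dt->- nn->n vbd->v nns->n rb->r jjs->a wrb->-
    }
}
```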
public static java.lang.String inlineTags(java.lang.String[] tokenArray,
java.lang.String[] tagArray)
Returns a combined String of the form: "The/dt doctor/nn treated/vbd dogs/nns"
public static java.lang.String[] parseTagString(java.lang.String wordsAndTags)
Takes a String of words and tags in the format "The/dt doctor/nn treated/vbd dogs/nns" and returns an array of the part-of-speech tags.
Parameters: wordsAndTags - the words and tags, in the format above
public static boolean isVerb(java.lang.String pos)
Returns true if pos is a verb
public static boolean isNoun(java.lang.String partOfSpeech)
Returns true if partOfSpeech is a noun
public static boolean isAdverb(java.lang.String pos)
Returns true if pos is an adverb
public static boolean isAdjective(java.lang.String pos)
Returns true if pos is an adjective
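The four predicates can be approximated with set lookups over the Penn tags listed at the top of this page. This is a hypothetical sketch (`PosPredicatesSketch` is not part of RiTa); whether RiTa also counts tags such as wrb or rp as adverbs is not stated here, so they are left out. `Set.of` requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of isNoun/isVerb/isAdverb/isAdjective; not RiTa's implementation.
public class PosPredicatesSketch {

    // Assumed groupings, drawn from the Penn tag set listed for this class
    private static final Set<String> NOUNS = Set.of("nn", "nns", "nnp", "nnps");
    private static final Set<String> VERBS = Set.of("vb", "vbd", "vbg", "vbn", "vbp", "vbz");
    private static final Set<String> ADVERBS = Set.of("rb", "rbr", "rbs");
    private static final Set<String> ADJECTIVES = Set.of("jj", "jjr", "jjs");

    // Accept upper-case tags such as "NN" as well as the lower-case ones used here
    private static String normalize(String pos) {
        return pos.toLowerCase(Locale.ROOT);
    }

    public static boolean isNoun(String pos)      { return NOUNS.contains(normalize(pos)); }
    public static boolean isVerb(String pos)      { return VERBS.contains(normalize(pos)); }
    public static boolean isAdverb(String pos)    { return ADVERBS.contains(normalize(pos)); }
    public static boolean isAdjective(String pos) { return ADJECTIVES.contains(normalize(pos)); }

    public static void main(String[] args) {
        String[] tags = {"dt", "jj", "nn", "vbd", "rb"};
        for (String tag : tags) {
            System.out.println(tag + ": noun=" + isNoun(tag) + " verb=" + isVerb(tag)
                    + " adverb=" + isAdverb(tag) + " adjective=" + isAdjective(tag));
        }
    }
}
```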
public java.lang.String[] tag(FeaturedIF[] tokenArray)
public java.lang.String[] tag(java.lang.String[] tokenArray)
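Inline word/tag strings like the examples on this page can also be split back into their tags, which is what parseTagString is described as doing. The sketch below is a hypothetical reimplementation (`ParseTagStringSketch` is not RiTa code), assuming whitespace-separated word/tag pairs and splitting each pair at its last "/" so that a word that itself contains a slash stays intact.

```java
import java.util.Arrays;

// Hypothetical sketch of parseTagString(); the input format is assumed to be "word/tag word/tag ...".
public class ParseTagStringSketch {

    public static String[] parseTagString(String wordsAndTags) {
        String trimmed = wordsAndTags.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] pairs = trimmed.split("\\s+");
        String[] tags = new String[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            // The tag is whatever follows the last '/' in the pair
            int slash = pairs[i].lastIndexOf('/');
            if (slash < 0 || slash == pairs[i].length() - 1) {
                throw new IllegalArgumentException("No tag found in: " + pairs[i]);
            }
            tags[i] = pairs[i].substring(slash + 1);
        }
        return tags;
    }

    public static void main(String[] args) {
        String inline = "The/dt doctor/nn treated/vbd dogs/nns";
        System.out.println(Arrays.toString(parseTagString(inline))); // [dt, nn, vbd, nns]
    }
}
```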
public java.lang.String tagInline(java.lang.String[] tokens)
Returns a String with pos-tags notated inline in the format: "The/dt doctor/nn treated/vbd dogs/nns"
public static void setDefaultTagger(int taggerType)
public static boolean taggerExists()
public java.lang.String tagInline(java.lang.String sentence)
public java.lang.String[] tagFile(java.lang.String fileName)
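The flow described for tagFile (load a file, split it into sentences, tag each sentence, and return all the tags in one array) can be sketched in plain Java. Everything here is an assumption for illustration: the hypothetical `Tagger` interface stands in for a real tagger such as tag(String[]), and the regex sentence splitter and whitespace tokenizer are naive substitutes for RiTa's own. `Files.readString` and `Files.writeString` require Java 11.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the tagFile() flow; not RiTa's implementation.
public class TagFileSketch {

    // Hypothetical stand-in for a real tagger such as RiPosTagger.tag(String[])
    public interface Tagger {
        String[] tag(String[] tokens);
    }

    public static String[] tagFile(Path file, Tagger tagger) throws IOException {
        String text = Files.readString(file);
        List<String> allTags = new ArrayList<>();
        // Naive sentence split: break after '.', '!' or '?' followed by whitespace
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Naive whitespace tokenization (punctuation stays attached to words)
            String[] tokens = trimmed.split("\\s+");
            allTags.addAll(Arrays.asList(tagger.tag(tokens)));
        }
        return allTags.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tagfile-sketch", ".txt");
        try {
            Files.writeString(tmp, "Dogs bark. Cats sleep quietly.");
            // Toy tagger that labels every token "nn", just to show the flow
            Tagger toy = tokens -> {
                String[] tags = new String[tokens.length];
                Arrays.fill(tags, "nn");
                return tags;
            };
            String[] tags = tagFile(tmp, toy);
            System.out.println(tags.length + " tags: " + Arrays.toString(tags));
            // 5 tags: [nn, nn, nn, nn, nn]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Passing the tagger in as a parameter keeps the file-handling logic separate from the tagging model, so the same flow works for either tagger type.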
public static void main(java.lang.String[] args)