java.lang.Object
  extended by rita.support.BrillPosTagger
public class BrillPosTagger
Simple transformation-based part-of-speech (POS) tagger for the RiTa library, using the Penn tagset.
Uses the Brill data set with a minimal subset of the original context-sensitive transformations (plus some custom additions).
For more information, see Brill (1995), 'Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging'.
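Transformation-based tagging works in two passes: every word first receives its most likely tag from a lexicon, and then an ordered list of contextual rules rewrites tags whose neighbours make the first guess unlikely. The sketch below illustrates only that general technique. The class name, the toy lexicon, the two rules, and the default of "nn" for unknown words are all hypothetical; they are not RiTa's data or rule format.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Minimal sketch of Brill-style transformation-based tagging (toy data, not RiTa's). */
public class BrillSketch {

    /** Hypothetical contextual rule: change tag `from` to `to` when the previous tag is `prev`. */
    record Rule(String from, String to, String prev) {}

    // Hypothetical lexicon: each word's most frequent tag, used as the initial guess.
    static final Map<String, String> LEXICON = Map.of(
        "the", "dt", "a", "dt", "dog", "nn", "can", "md", "run", "vb");

    // Hypothetical transformations, applied in order.
    static final List<Rule> RULES = List.of(
        new Rule("md", "nn", "dt"),   // "the can": modal after a determiner becomes a noun
        new Rule("vb", "nn", "dt"));  // "a run": base verb after a determiner becomes a noun

    static String[] tag(String[] words) {
        String[] tags = new String[words.length];
        // Pass 1: initial tags from the lexicon; unknown words default to "nn" (an assumption).
        for (int i = 0; i < words.length; i++) {
            tags[i] = LEXICON.getOrDefault(words[i].toLowerCase(Locale.ROOT), "nn");
        }
        // Pass 2: apply each rule left to right over the whole sentence.
        for (Rule r : RULES) {
            for (int i = 1; i < tags.length; i++) {
                if (tags[i].equals(r.from()) && tags[i - 1].equals(r.prev())) {
                    tags[i] = r.to();
                }
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints "dt nn md vb": no rule fires because "can" follows a noun, not a determiner
        System.out.println(String.join(" ", tag(new String[] {"The", "dog", "can", "run"})));
    }
}
```

Rules here are applied in place, so a rule can see the output of an earlier rule; that ordering sensitivity is a property of the transformation-based approach in general, and the real rule set's ordering is not described on this page.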
The full Penn tagset follows:

| Tag | Description |
|---|---|
| cc | Coordinating conjunction |
| cd | Cardinal number |
| dt | Determiner |
| ex | Existential "there" |
| fw | Foreign word |
| in | Preposition or subordinating conjunction |
| jj | Adjective |
| jjr | Adjective, comparative |
| jjs | Adjective, superlative |
| ls | List item marker |
| md | Modal |
| nn | Noun, singular or mass |
| nns | Noun, plural |
| nnp | Proper noun, singular |
| nnps | Proper noun, plural |
| pdt | Predeterminer |
| pos | Possessive ending |
| prp | Personal pronoun |
| prp$ | Possessive pronoun |
| rb | Adverb |
| rbr | Adverb, comparative |
| rbs | Adverb, superlative |
| rp | Particle |
| sym | Symbol (mathematical or scientific) |
| to | The word "to" |
| uh | Interjection |
| vb | Verb, base form |
| vbd | Verb, past tense |
| vbg | Verb, gerund or present participle |
| vbn | Verb, past participle |
| vbp | Verb, non-3rd person singular present |
| vbz | Verb, 3rd person singular present |
| wdt | Wh-determiner |
| wp | Wh-pronoun |
| wp$ | Possessive wh-pronoun |
| wrb | Wh-adverb |
| # | Pound sign |
| $ | Dollar sign |
| . | Sentence-final punctuation |
| , | Comma |
| : | Colon, semicolon |
| ( | Left bracket character |
| ) | Right bracket character |
| " | Straight double quote |
| ` | Left open single quote |
| “ | Left open double quote |
| ' | Right close single quote |
| ” | Right close double quote |
| - | Right close double quote |
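The isNoun, isVerb, isAdjective and isAdverb methods take a tag rather than a word. A minimal sketch of such checks, assuming each category is exactly the tag family sharing the prefix in the table above (nn*, vb*, jj*, rb*), could look like this. The class name is hypothetical, and whether RiTa also counts tags such as md or wrb is not stated on this page; this sketch excludes them.

```java
import java.util.Locale;
import java.util.Set;

/** Sketch of category checks over lowercase Penn tags (groupings are an assumption). */
public class PennCategories {
    static final Set<String> NOUNS = Set.of("nn", "nns", "nnp", "nnps");
    static final Set<String> VERBS = Set.of("vb", "vbd", "vbg", "vbn", "vbp", "vbz");
    static final Set<String> ADJECTIVES = Set.of("jj", "jjr", "jjs");
    static final Set<String> ADVERBS = Set.of("rb", "rbr", "rbs");

    // Tags are normalized to lowercase so "NNS" and "nns" are treated alike.
    static boolean isNoun(String pos) { return NOUNS.contains(pos.toLowerCase(Locale.ROOT)); }
    static boolean isVerb(String pos) { return VERBS.contains(pos.toLowerCase(Locale.ROOT)); }
    static boolean isAdjective(String pos) { return ADJECTIVES.contains(pos.toLowerCase(Locale.ROOT)); }
    static boolean isAdverb(String pos) { return ADVERBS.contains(pos.toLowerCase(Locale.ROOT)); }

    public static void main(String[] args) {
        // prints "true false false"
        System.out.println(isNoun("NNS") + " " + isVerb("md") + " " + isAdverb("wrb"));
    }
}
```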
Field Summary

| Modifier and Type | Field |
|---|---|
| static boolean | PRINT_CUSTOM_TAGS |
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static BrillPosTagger | getInstance(processing.core.PApplet p) | |
| boolean | isAdjective(java.lang.String pos) | Returns true if the part-of-speech tag pos denotes an adjective. |
| boolean | isAdverb(java.lang.String pos) | Returns true if the part-of-speech tag pos denotes an adverb. |
| boolean | isNoun(java.lang.String pos) | Returns true if the part-of-speech tag pos denotes a noun. |
| boolean | isVerb(java.lang.String pos) | Returns true if the part-of-speech tag pos denotes a verb. |
| java.lang.String[] | lookup(java.lang.String word) | |
| static void | main(java.lang.String[] args) | |
| java.lang.String | tag(java.lang.String word) | Returns the part(s) of speech from the Penn tagset for a single word. |
| java.lang.String[] | tag(java.lang.String[] words) | Returns an array of parts of speech from the Penn tagset, each corresponding to one word of input. |
| void | tag(java.lang.String[] words, java.util.List result) | Tags each word of input with a part of speech from the Penn tagset, adding the tags to result instead of returning an array. |
| java.lang.String[] | tagFile(java.lang.String fileName) | Loads a file, splits the input into sentences and returns a String[] of the most probable tags. |
| java.lang.String | tagForWordNet(java.lang.String word) | Tags the word (as usual) with a part of speech from the Penn tagset, then returns the corresponding WordNet part of speech from the set { 'n', 'v', 'a', 'r' } as a String. |
| java.lang.String | tagInline(java.lang.String sentence) | Returns a String with part-of-speech tags notated inline. |
| java.lang.String | tagInline(java.lang.String[] tokens) | Returns a String with part-of-speech tags notated inline. |
| static void | tests(java.lang.String[] args) | |
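tagForWordNet collapses the fine-grained Penn tags into WordNet's four open-class letters. A minimal sketch of that mapping, assuming the four tag families are nn* to 'n', vb* to 'v', jj* to 'a' and rb* to 'r', is shown below. The class and method names are hypothetical, and returning null for tags with no WordNet counterpart (dt, wrb, punctuation) is this sketch's own choice; the page does not say what RiTa returns in that case.

```java
import java.util.Locale;

/** Sketch of mapping a Penn tag to a WordNet part-of-speech letter (mapping assumed). */
public class WordNetPos {

    /** Returns "n", "v", "a" or "r", or null when the tag has no WordNet counterpart. */
    static String toWordNet(String pennTag) {
        String t = pennTag.toLowerCase(Locale.ROOT);
        if (t.startsWith("nn")) return "n"; // nn, nns, nnp, nnps
        if (t.startsWith("vb")) return "v"; // vb, vbd, vbg, vbn, vbp, vbz
        if (t.startsWith("jj")) return "a"; // jj, jjr, jjs
        if (t.startsWith("rb")) return "r"; // rb, rbr, rbs (wrb starts with "w", so it is excluded)
        return null;
    }
}
```

Prefix matching keeps the mapping short because every tag in each family shares its two-letter stem in the Penn table.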
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final boolean PRINT_CUSTOM_TAGS
| Method Detail |
|---|
public static BrillPosTagger getInstance(processing.core.PApplet p)

public java.lang.String[] tagFile(java.lang.String fileName)
Specified by: tagFile in interface RiTaggerIF

public boolean isVerb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the part-of-speech tag pos denotes a verb.
Specified by: isVerb in interface RiTaggerIF

public boolean isNoun(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the part-of-speech tag pos denotes a noun.
Specified by: isNoun in interface RiTaggerIF

public boolean isAdverb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the part-of-speech tag pos denotes an adverb.
Specified by: isAdverb in interface RiTaggerIF

public boolean isAdjective(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if the part-of-speech tag pos denotes an adjective.
Specified by: isAdjective in interface RiTaggerIF

public java.lang.String tag(java.lang.String word)
Parameters: word - String
See Also: tag(String[])

public java.lang.String tagForWordNet(java.lang.String word)
Parameters: word
See Also: tag(java.lang.String)

public void tag(java.lang.String[] words, java.util.List result)
Parameters: words - String[]

public java.lang.String[] tag(java.lang.String[] words)
Specified by: tag in interface RiTaggerIF
Parameters: words - String[]

public java.lang.String[] lookup(java.lang.String word)

public java.lang.String tagInline(java.lang.String[] tokens)
Specified by: tagInline in interface RiTaggerIF

public java.lang.String tagInline(java.lang.String sentence)
Description copied from interface: RiTaggerIF
Returns a String with part-of-speech tags notated inline.
Specified by: tagInline in interface RiTaggerIF

public static void tests(java.lang.String[] args)

public static void main(java.lang.String[] args)
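tagFile is described as loading a file, splitting it into sentences and tagging each one; tagInline returns the tags notated inline. A minimal sketch of that pipeline is below, under several stated assumptions: sentence splitting uses the JDK's java.text.BreakIterator (not necessarily what RiTa uses), tokenization is a naive whitespace-and-punctuation split, the inline notation is assumed to be "word/tag", each output element is assumed to be one tagged sentence, and the tagger itself is a caller-supplied function standing in for the real one. The class name TagFileSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Sketch of a load / split / tag pipeline with a pluggable (hypothetical) tagger. */
public class TagFileSketch {

    /** Splits text into sentences with the JDK's sentence BreakIterator. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    /** Joins tokens and tags as "word/tag" pairs (assumed inline format). */
    static String inline(String[] tokens, String[] tags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }

    /** Tags each sentence of text; returns one inline-tagged String per sentence. */
    static String[] tagText(String text, Function<String[], String[]> tagger) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences(text)) {
            // Naive tokenization: detach punctuation, then split on whitespace.
            String[] tokens = sentence.replaceAll("([.,!?;:])", " $1").trim().split("\\s+");
            result.add(inline(tokens, tagger.apply(tokens)));
        }
        return result.toArray(new String[0]);
    }

    /** Reads the whole file (Java 11+) and delegates to tagText. */
    static String[] tagFile(String fileName, Function<String[], String[]> tagger) throws IOException {
        return tagText(Files.readString(Path.of(fileName)), tagger);
    }
}
```

Passing the tagger as a Function keeps the sketch independent of RiTa; in practice it would be replaced by a call to the real tagger.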