rita.support
Class BrillPosTagger

java.lang.Object
  extended by rita.support.BrillPosTagger
All Implemented Interfaces:
RiTaggerIF

public class BrillPosTagger
extends java.lang.Object
implements RiTaggerIF

Simple transformation-based POS tagger for the RiTa library, using the Penn tagset.

Uses the Brill data set with a minimal subset of the original context-sensitive transformations (plus some custom additions).

For more info see: Brill (1995) 'Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging'
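
As a rough illustration of the transformation-based approach described above, the following minimal sketch first assigns each word its most likely tag from a lexicon and then applies one contextual rule. The class name, the six-word lexicon, the "nn" default for unknown words, and the single rule are all hypothetical; they are not taken from this class or its data files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy transformation-based tagger: lexicon lookup, then one contextual rule. */
final class TinyBrillSketch {

    // Hypothetical mini-lexicon: word -> most likely Penn tag (lowercase, as listed on this page).
    private static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("the", "dt");
        LEXICON.put("to", "to");
        LEXICON.put("i", "prp");
        LEXICON.put("want", "vbp");
        LEXICON.put("run", "nn");
        LEXICON.put("dog", "nn");
    }

    static String[] tag(String[] words) {
        String[] tags = new String[words.length];

        // Pass 1: initial annotation. Unknown words default to "nn" (an assumption).
        for (int i = 0; i < words.length; i++) {
            tags[i] = LEXICON.getOrDefault(words[i].toLowerCase(Locale.ROOT), "nn");
        }

        // Pass 2: one contextual transformation, "change nn to vb if the previous tag is to".
        for (int i = 1; i < words.length; i++) {
            if (tags[i].equals("nn") && tags[i - 1].equals("to")) {
                tags[i] = "vb";
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tag(new String[] {"I", "want", "to", "run"})));
    }
}
```

A real rule set contains many such transformations applied in a fixed order; the sketch only shows the shape of the two passes.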

The full Penn tag set follows:

  1. cc Coordinating conjunction
  2. cd Cardinal number
  3. dt Determiner
  4. ex Existential there
  5. fw Foreign word
  6. in Preposition/subord. conjunction
  7. jj Adjective
  8. jjr Adjective, comparative
  9. jjs Adjective, superlative
  10. ls List item marker
  11. md Modal
  12. nn Noun, singular or mass
  13. nns Noun, plural
  14. nnp Proper noun, singular
  15. nnps Proper noun, plural
  16. pdt Predeterminer
  17. pos Possessive ending
  18. prp Personal pronoun
  19. prp$ Possessive pronoun
  20. rb Adverb
  21. rbr Adverb, comparative
  22. rbs Adverb, superlative
  23. rp Particle
  24. sym Symbol (mathematical or scientific)
  25. to to
  26. uh Interjection
  27. vb Verb, base form
  28. vbd Verb, past tense
  29. vbg Verb, gerund/present participle
  30. vbn Verb, past participle
  31. vbp Verb, non-3rd ps. sing. present
  32. vbz Verb, 3rd ps. sing. present
  33. wdt wh-determiner
  34. wp wh-pronoun
  35. wp$ Possessive wh-pronoun
  36. wrb wh-adverb
  37. # Pound sign
  38. $ Dollar sign
  39. . Sentence-final punctuation
  40. , Comma
  41. : Colon, semi-colon
  42. ( Left bracket character
  43. ) Right bracket character
  44. " Straight double quote
  45. ` Left open single quote
  46. " Left open double quote
  47. ' Right close single quote
  48. " Right close double quote
  49. - Dash


Field Summary
static boolean PRINT_CUSTOM_TAGS
           
 
Method Summary
static BrillPosTagger getInstance(processing.core.PApplet p)
          Returns a BrillPosTagger instance for the given PApplet.
 boolean isAdjective(java.lang.String pos)
          Returns true if pos is an adjective tag.
 boolean isAdverb(java.lang.String pos)
          Returns true if pos is an adverb tag.
 boolean isNoun(java.lang.String pos)
          Returns true if pos is a noun tag.
 boolean isVerb(java.lang.String pos)
          Returns true if pos is a verb tag.
 java.lang.String[] lookup(java.lang.String word)
           
static void main(java.lang.String[] args)
           
 java.lang.String tag(java.lang.String word)
          Returns the part(s)-of-speech from the Penn tagset for a single word.
 java.lang.String[] tag(java.lang.String[] words)
          Returns an array of parts-of-speech from the Penn tagset each corresponding to one word of input.
 void tag(java.lang.String[] words, java.util.List result)
          Adds to result the parts-of-speech from the Penn tagset, each corresponding to one word of input.
 java.lang.String[] tagFile(java.lang.String fileName)
          Loads a file, splits the input into sentences, and returns a String[] of the most probable tags.
 java.lang.String tagForWordNet(java.lang.String word)
          Tags the word (as usual) with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { 'n', 'v', 'a', 'r' } as a String.
 java.lang.String tagInline(java.lang.String sentence)
          Returns a String with POS tags notated inline.
 java.lang.String tagInline(java.lang.String[] tokens)
          Returns a String with POS tags notated inline.
static void tests(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PRINT_CUSTOM_TAGS

public static final boolean PRINT_CUSTOM_TAGS
See Also:
Constant Field Values
Method Detail

getInstance

public static BrillPosTagger getInstance(processing.core.PApplet p)
Returns a BrillPosTagger instance for the given PApplet.


tagFile

public java.lang.String[] tagFile(java.lang.String fileName)
Loads a file, splits the input into sentences, and returns a String[] of the most probable tags.

Specified by:
tagFile in interface RiTaggerIF
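
How tagFile splits its input into sentences is not documented on this page. As one possible approach, the sketch below uses the JDK's java.text.BreakIterator to split already-loaded text; the class and method names are hypothetical and are not part of RiTa.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Splits text into sentences with the JDK's sentence BreakIterator. */
final class SentenceSplitSketch {

    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Each resulting sentence could then be tokenized and passed to tag(String[]).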

isVerb

public boolean isVerb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if pos is a verb tag.

Specified by:
isVerb in interface RiTaggerIF

isNoun

public boolean isNoun(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if pos is a noun tag.

Specified by:
isNoun in interface RiTaggerIF

isAdverb

public boolean isAdverb(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if pos is an adverb tag.

Specified by:
isAdverb in interface RiTaggerIF

isAdjective

public boolean isAdjective(java.lang.String pos)
Description copied from interface: RiTaggerIF
Returns true if pos is an adjective tag.

Specified by:
isAdjective in interface RiTaggerIF

tag

public java.lang.String tag(java.lang.String word)
Returns the part(s)-of-speech from the Penn tagset for a single word.

Parameters:
word - String
Returns:
String
See Also:
tag(String[])

tagForWordNet

public java.lang.String tagForWordNet(java.lang.String word)
Tags the word (as usual) with a part-of-speech from the Penn tagset, then returns the corresponding part-of-speech for WordNet from the set { 'n', 'v', 'a', 'r' } as a String.

Parameters:
word - String
See Also:
tag(java.lang.String)
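
The Penn-to-WordNet step described above can be sketched as a prefix check on the Penn tag. This is only an illustration: the class name is hypothetical, the grouping by tag prefix is an assumption, and the null fallback for tags outside the four groups is a guess, since the method's behavior for such tags is not documented here.

```java
/** Maps lowercase Penn tags to the WordNet categories "n", "v", "a" and "r". */
final class WordNetPosSketch {

    /** Returns the WordNet category for a Penn tag, or null when no group matches. */
    static String toWordNetPos(String pennTag) {
        if (pennTag == null) return null;
        if (pennTag.startsWith("nn")) return "n"; // nn, nns, nnp, nnps
        if (pennTag.startsWith("vb")) return "v"; // vb, vbd, vbg, vbn, vbp, vbz
        if (pennTag.startsWith("jj")) return "a"; // jj, jjr, jjs
        if (pennTag.startsWith("rb")) return "r"; // rb, rbr, rbs
        return null; // assumption: the real fallback is not documented
    }
}
```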

tag

public void tag(java.lang.String[] words,
                java.util.List result)
Adds to result the parts-of-speech from the Penn tagset, each corresponding to one word of input.

Parameters:
words - String[]

tag

public java.lang.String[] tag(java.lang.String[] words)
Returns an array of parts-of-speech from the Penn tagset each corresponding to one word of input.

Specified by:
tag in interface RiTaggerIF
Parameters:
words - String[]
Returns:
String[]

lookup

public java.lang.String[] lookup(java.lang.String word)

tagInline

public java.lang.String tagInline(java.lang.String[] tokens)
Returns a String with POS tags notated inline.

Specified by:
tagInline in interface RiTaggerIF

tagInline

public java.lang.String tagInline(java.lang.String sentence)
Description copied from interface: RiTaggerIF
Returns a String with POS tags notated inline.

Specified by:
tagInline in interface RiTaggerIF
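
The exact inline notation is not specified on this page. The sketch below assumes the common word/tag convention, joining each token to its tag with a slash and separating pairs with spaces; the class and method names are hypothetical.

```java
/** Joins tokens and their tags as "word/tag" pairs separated by single spaces. */
final class InlineTagSketch {

    static String inline(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('/').append(tags[i]);
        }
        return sb.toString();
    }
}
```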

tests

public static void tests(java.lang.String[] args)

main

public static void main(java.lang.String[] args)