| Description |
Tree-based parser for recursive syntactic annotations, e.g.,
noun-phrases, using the Penn conventions.
An example:
String s = "The black cat crossed my path.";
RiParser parser = new RiParser();
String result = parser.parse(s);
System.out.println(result);
Note: to use this object, first download the RiTa statistical
models (rita.me.models.zip) and unpack them into the 'rita'
directory in the libraries directory of your Processing sketchbook,
e.g., $SKETCH_PAD/libraries/rita/models.
You may also specify an alternative directory (as an absolute path) for the
models via RiTa.setModelDir().
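The directory layout described above can be checked before the parser is constructed. The following is a minimal sketch in plain Java; the class ModelDirCheck and its methods are hypothetical helpers (not part of RiTa), and the commented call at the end assumes that RiTa.setModelDir() accepts the path as a string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of RiTa.
class ModelDirCheck {

    // Builds <sketchbook>/libraries/rita/models as an absolute path.
    static Path defaultModelDir(String sketchbook) {
        return Paths.get(sketchbook, "libraries", "rita", "models").toAbsolutePath();
    }

    // True if the directory exists and contains at least one entry.
    // This does not verify that the entries are valid model files.
    static boolean looksInstalled(Path modelDir) {
        if (!Files.isDirectory(modelDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(modelDir)) {
            return entries.findAny().isPresent();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the sketchbook location is passed as the first argument.
        Path dir = defaultModelDir(args.length > 0 ? args[0] : ".");
        if (looksInstalled(dir)) {
            System.out.println("Models found in " + dir);
            // Assumed usage: RiTa.setModelDir(dir.toString());
        } else {
            System.out.println("No models in " + dir + "; unpack rita.me.models.zip there.");
        }
    }
}
```

Failing early with a clear message is usually easier to debug than a load error raised from inside the parser.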
This object is most useful when used with the RiTaServer,
as it can take significant time to load the necessary
statistical models.
This is primarily a wrapper for the OpenNLP (http://opennlp.sourceforge.net) parser
with some minor modifications/simplifications.
For more information, see Berger, Della Pietra, and Della Pietra's paper
'A Maximum Entropy Approach to Natural Language Processing',
which provides a good introduction to the maxent framework.
The full tag set follows:
- Clause Level
- S - Simple declarative clause, i.e. one that is not introduced by a (possibly
empty) subordinating conjunction or a wh-word and that does not exhibit
subject-verb inversion.
- SBAR - Clause introduced by a (possibly empty) subordinating conjunction.
- SBARQ - Direct question introduced by a wh-word or a wh-phrase. Indirect
questions and relative clauses should be bracketed as SBAR, not SBARQ.
- SINV - Inverted declarative sentence, i.e. one in which the subject follows
the tensed verb or modal.
- SQ - Inverted yes/no question, or main clause of a wh-question, following the
wh-phrase in SBARQ.
- Phrase Level
- ADJP - Adjective Phrase.
- ADVP - Adverb Phrase.
- CONJP - Conjunction Phrase.
- FRAG - Fragment.
- INTJ - Interjection. Corresponds approximately to the part-of-speech tag UH.
- LST - List marker. Includes surrounding punctuation.
- NAC - Not a Constituent; used to show the scope of certain prenominal
modifiers within an NP.
- NP - Noun Phrase.
- NX - Used within certain complex NPs to mark the head of the NP. Corresponds
very roughly to N-bar level but used quite differently.
- PP - Prepositional Phrase.
- PRN - Parenthetical.
- PRT - Particle. Category for words that should be tagged RP.
- QP - Quantifier Phrase (i.e. complex measure/amount phrase); used within NP.
- RRC - Reduced Relative Clause.
- UCP - Unlike Coordinated Phrase.
- VP - Verb Phrase.
- WHADJP - Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in
how hot.
- WHADVP - Wh-adverb Phrase. Introduces a clause with an NP gap. May be null
(containing the 0 complementizer) or lexical, containing a wh-adverb such as how
or why.
- WHNP - Wh-noun Phrase. Introduces a clause with an NP gap. May be null
(containing the 0 complementizer) or lexical, containing some wh-word, e.g. who,
which book, whose daughter, none of which, or how many leopards.
- WHPP - Wh-prepositional Phrase. Prepositional phrase containing a wh-noun
phrase (such as of which or by whose authority) that either introduces a PP gap
or is contained by a WHNP.
- X - Unknown, uncertain, or unbracketable.
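To illustrate how these tags can be used, the sketch below reads a Penn-style bracketed string and collects the words under every constituent with a given label (e.g., NP). It is plain Java and does not call RiTa; the class PennBrackets is a hypothetical helper, and the sample string is hand-written under the assumption that the parse result uses standard Penn bracketing, so the actual output of RiParser may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of RiTa.
class PennBrackets {

    // One open constituent: the words seen so far, and the index in the
    // result list reserved for it (-1 if its label is not the one wanted).
    private static final class Frame {
        final int slot;
        final List<String> words = new ArrayList<>();
        Frame(int slot) { this.slot = slot; }
    }

    // Returns the text of each constituent labeled `label`,
    // ordered by the position of its opening bracket.
    static List<String> phrases(String bracketed, String label) {
        String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ")
                                   .trim().split("\\s+");
        List<String> found = new ArrayList<>();
        Deque<Frame> open = new ArrayDeque<>();
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("(")) {
                // The token after "(" is the label, unless it is another bracket.
                String name = "";
                if (i + 1 < tokens.length
                        && !tokens[i + 1].equals("(") && !tokens[i + 1].equals(")")) {
                    name = tokens[++i];
                }
                int slot = -1;
                if (name.equals(label)) {
                    slot = found.size();
                    found.add(""); // placeholder, filled when the bracket closes
                }
                open.push(new Frame(slot));
            } else if (t.equals(")")) {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("unbalanced ')'");
                }
                Frame f = open.pop();
                if (f.slot >= 0) {
                    found.set(f.slot, String.join(" ", f.words));
                }
            } else {
                // A word belongs to every constituent that is currently open.
                for (Frame f : open) {
                    f.words.add(t);
                }
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("unbalanced '('");
        }
        return found;
    }

    public static void main(String[] args) {
        // Hand-written sample, not actual parser output.
        String sample = "(TOP (S (NP (DT The) (JJ black) (NN cat)) "
                      + "(VP (VBD crossed) (NP (PRP$ my) (NN path))) (. .)))";
        System.out.println(phrases(sample, "NP")); // [The black cat, my path]
    }
}
```

Because nested constituents share words, asking for VP on the same sample returns the whole verb phrase, including the inner noun phrase.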
|