| Description |
Tree-based parser for recursive syntactic annotations, e.g.,
noun phrases, using the Penn Treebank conventions.
An example:
String s = "The black cat crossed my path.";
RiParser parser = new RiParser();
String result = parser.parse(s);
System.out.println(result);
Note: to use this object, first download the RiTa statistical
models (rita.me.models.zip) and unpack them into the 'rita'
directory in the libraries directory of your Processing sketchbook,
e.g., $SKETCH_PAD/libraries/rita/models.
You may also specify an alternative directory (as an absolute path) for the
models via RiTa.setModelDir().
This object is most useful when used with the RiTaServer,
as loading the necessary statistical models can take
significant time.
Based closely on the OpenNLP maximum entropy parser.
For more information, see Berger, Della Pietra, and Della Pietra's paper
'A Maximum Entropy Approach to Natural Language Processing',
which provides a good introduction to the maxent framework.
The full tag set follows:
- Clause Level
- S - Simple declarative clause, i.e. one that is not introduced by a (possibly
empty) subordinating conjunction or a wh-word and that does not exhibit
subject-verb inversion.
- SBAR - Clause introduced by a (possibly empty) subordinating conjunction.
- SBARQ - Direct question introduced by a wh-word or a wh-phrase. Indirect
questions and relative clauses should be bracketed as SBAR, not SBARQ.
- SINV - Inverted declarative sentence, i.e. one in which the subject follows
the tensed verb or modal.
- SQ - Inverted yes/no question, or main clause of a wh-question, following the
wh-phrase in SBARQ.
- Phrase Level
- ADJP - Adjective Phrase.
- ADVP - Adverb Phrase.
- CONJP - Conjunction Phrase.
- FRAG - Fragment.
- INTJ - Interjection. Corresponds approximately to the part-of-speech tag UH.
- LST - List marker. Includes surrounding punctuation.
- NAC - Not a Constituent; used to show the scope of certain prenominal
modifiers within an NP.
- NP - Noun Phrase.
- NX - Used within certain complex NPs to mark the head of the NP. Corresponds
very roughly to N-bar level but used quite differently.
- PP - Prepositional Phrase.
- PRN - Parenthetical.
- PRT - Particle. Category for words that should be tagged RP.
- QP - Quantifier Phrase (i.e. complex measure/amount phrase); used within NP.
- RRC - Reduced Relative Clause.
- UCP - Unlike Coordinated Phrase.
- VP - Verb Phrase.
- WHADJP - Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in
how hot.
- WHADVP - Wh-adverb Phrase. Introduces a clause with an NP gap. May be null
(containing the 0 complementizer) or lexical, containing a wh-adverb such as how
or why.
- WHNP - Wh-noun Phrase. Introduces a clause with an NP gap. May be null
(containing the 0 complementizer) or lexical, containing some wh-word, e.g. who,
which book, whose daughter, none of which, or how many leopards.
- WHPP - Wh-prepositional Phrase. Prepositional phrase containing a wh-noun
phrase (such as of which or by whose authority) that either introduces a PP gap
or is contained by a WHNP.
- X - Unknown, uncertain, or unbracketable.
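In a Penn-style bracketed parse, each label above is the first token after an opening parenthesis. The following is a minimal sketch of how such a string could be read back into a tree so that constituents (here, noun phrases) can be pulled out by label. Note the assumptions: PennTree is a hypothetical helper that is not part of RiTa, and the bracketed string is a hand-written illustration of the Penn format, not captured RiParser output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of RiTa): a tiny reader for Penn-style brackets.
class PennTree {
    final String label;   // constituent label (e.g. "NP") or POS tag (e.g. "NN")
    final String word;    // non-null only for nodes of the form (TAG word)
    final List<PennTree> children = new ArrayList<>();

    PennTree(String label, String word) {
        this.label = label;
        this.word = word;
    }

    // Parses a single bracketed tree such as "(NP (DT the) (NN cat))".
    static PennTree parse(String s) {
        return parseNode(s, new int[] {0});
    }

    private static PennTree parseNode(String s, int[] pos) {
        skipSpaces(s, pos);
        expect(s, pos, '(');
        String label = readToken(s, pos);
        skipSpaces(s, pos);
        PennTree node;
        if (pos[0] < s.length() && s.charAt(pos[0]) == '(') {
            // Phrase or clause node: one or more bracketed children follow.
            node = new PennTree(label, null);
            while (pos[0] < s.length() && s.charAt(pos[0]) == '(') {
                node.children.add(parseNode(s, pos));
                skipSpaces(s, pos);
            }
        } else {
            // Preterminal node: the label is a POS tag followed by one word.
            node = new PennTree(label, readToken(s, pos));
            skipSpaces(s, pos);
        }
        expect(s, pos, ')');
        return node;
    }

    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at index " + pos[0]);
        }
        pos[0]++;
    }

    private static void skipSpaces(String s, int[] pos) {
        while (pos[0] < s.length() && Character.isWhitespace(s.charAt(pos[0]))) pos[0]++;
    }

    // Reads characters up to the next whitespace or parenthesis.
    private static String readToken(String s, int[] pos) {
        int start = pos[0];
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (Character.isWhitespace(c) || c == '(' || c == ')') break;
            pos[0]++;
        }
        return s.substring(start, pos[0]);
    }

    // Collects the words under this node, left to right.
    private void collectWords(List<String> out) {
        if (word != null) out.add(word);
        for (PennTree child : children) child.collectWords(out);
    }

    // Returns the text of every constituent with the given label, in pre-order.
    List<String> phrases(String wanted) {
        List<String> out = new ArrayList<>();
        collectPhrases(wanted, out);
        return out;
    }

    private void collectPhrases(String wanted, List<String> out) {
        if (label.equals(wanted)) {
            List<String> words = new ArrayList<>();
            collectWords(words);
            out.add(String.join(" ", words));
        }
        for (PennTree child : children) child.collectPhrases(wanted, out);
    }

    public static void main(String[] args) {
        // Hand-written illustration of the Penn bracket format.
        String parse = "(S (NP (DT The) (JJ black) (NN cat)) "
                     + "(VP (VBD crossed) (NP (PRP$ my) (NN path))) (. .))";
        System.out.println(PennTree.parse(parse).phrases("NP"));
        // prints [The black cat, my path]
    }
}
```

The sketch tells phrase-level nodes from part-of-speech nodes by checking whether the label is followed by another opening parenthesis or by a bare word. It does not handle multiple top-level trees or empty elements.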
|