RiTa
Name RiChunker
Description A simple, lightweight phrase chunker for non-recursive syntactic elements (e.g., noun phrases, verb phrases, etc.) using the Penn Treebank conventions (shown below).
    String sent = "The boy ran over the dog";
    RiChunker chunker = new RiChunker(this);
    String chunks = chunker.chunk(sent);
The full tag set follows:
  • adjp = adjective phrase
  • advp = adverb phrase
  • conjp = conjunction phrase
  • intj = interjection
  • lst = list marker
  • np = noun phrase
  • pp = prepositional phrase
  • prt = particle
  • sbar = clause introduced by a subordinating conjunction
  • ucp = unlike coordinated phrase
  • vp = verb phrase
  • o = independent phrase
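
When working with chunker output, it can help to turn a chunk label back into its description. The following is a minimal sketch in plain Java and is not part of RiTa: the ChunkLabels class and its describe() method are hypothetical names introduced here, and the descriptions are copied from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of RiTa: maps chunk labels to the
// descriptions given in the tag set listed in this page.
class ChunkLabels {

    // LinkedHashMap keeps the labels in the order they are documented.
    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("adjp", "adjective phrase");
        DESCRIPTIONS.put("advp", "adverb phrase");
        DESCRIPTIONS.put("conjp", "conjunction phrase");
        DESCRIPTIONS.put("intj", "interjection");
        DESCRIPTIONS.put("lst", "list marker");
        DESCRIPTIONS.put("np", "noun phrase");
        DESCRIPTIONS.put("pp", "prepositional phrase");
        DESCRIPTIONS.put("prt", "particle");
        DESCRIPTIONS.put("sbar", "clause introduced by a subordinating conjunction");
        DESCRIPTIONS.put("ucp", "unlike coordinated phrase");
        DESCRIPTIONS.put("vp", "verb phrase");
        DESCRIPTIONS.put("o", "independent phrase");
    }

    /** Returns the description for a label, ignoring case. */
    static String describe(String label) {
        String d = DESCRIPTIONS.get(label.toLowerCase(Locale.ROOT));
        return d != null ? d : "unknown label: " + label;
    }

    public static void main(String[] args) {
        System.out.println(describe("np")); // noun phrase
        System.out.println(describe("VP")); // verb phrase
    }
}
```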

Note: to use this object, you must first download the RiTa statistical models (rita.me.models.zip) and unpack them into the 'rita' directory in your Processing sketchbook, or into the data directory for your sketch. You can also specify an alternative directory (an absolute path) for the models via RiTa.setModelDir().

Based closely on the OpenNLP maximum entropy chunker.

For more info see: Berger, Della Pietra & Della Pietra's paper 'A Maximum
Entropy Approach to Natural Language Processing', which
provides a good introduction to the maxent framework.

Constructors
RiChunker(pApplet);
Methods
chunk()   Uses the supplied part-of-speech tags to chunk the input, then returns the chunk data inline in the following format (for input 'The boy ran over the dog'):

(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)
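
The inline format shown above can be taken apart with a regular expression. The following is a minimal sketch in plain Java and is not part of RiTa: the ChunkParser class, its nested Chunk type, and the parse() and phrases() methods are hypothetical names introduced for illustration. It assumes that chunk groups are never nested and that every token has the form word/tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RiTa: parses an inline chunk string
// such as "(np The/dt boy/nn) (vp ran/vbd)" into simple Java objects.
class ChunkParser {

    /** One parenthesized group, e.g. "(np The/dt boy/nn)". */
    static class Chunk {
        final String label;                           // chunk label, e.g. "np"
        final List<String> words = new ArrayList<>(); // e.g. ["The", "boy"]
        final List<String> tags = new ArrayList<>();  // e.g. ["dt", "nn"]

        Chunk(String label) { this.label = label; }

        /** The chunk's words joined by single spaces. */
        String text() { return String.join(" ", words); }
    }

    // Matches "(label token token ...)"; assumes groups are not nested.
    private static final Pattern GROUP =
        Pattern.compile("\\((\\S+)\\s+([^()]*)\\)");

    static List<Chunk> parse(String inline) {
        List<Chunk> chunks = new ArrayList<>();
        Matcher m = GROUP.matcher(inline);
        while (m.find()) {
            Chunk c = new Chunk(m.group(1));
            for (String token : m.group(2).trim().split("\\s+")) {
                // Split on the last '/' so a word containing '/' stays intact.
                int slash = token.lastIndexOf('/');
                if (slash < 0) continue; // skip anything that is not word/tag
                c.words.add(token.substring(0, slash));
                c.tags.add(token.substring(slash + 1));
            }
            chunks.add(c);
        }
        return chunks;
    }

    /** Returns the text of every chunk with the given label, e.g. "np". */
    static List<String> phrases(List<Chunk> chunks, String label) {
        List<String> out = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.label.equals(label)) out.add(c.text());
        }
        return out;
    }

    public static void main(String[] args) {
        String inline =
            "(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)";
        List<Chunk> chunks = parse(inline);
        System.out.println(phrases(chunks, "np")); // [The boy, the dog]
        System.out.println(phrases(chunks, "vp")); // [ran]
    }
}
```

Filtering by label in this way is one option for pulling a single phrase type out of the inline string. Whether the result matches the library's own phrase getters exactly is not confirmed here.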

getAdjPhrases()   Returns an array of adjective phrases found in the last chunking operation.

getAdvPhrases()   Returns an array of adverb phrases found in the last chunking operation.

getNounPhrases()   Returns an array of noun phrases found in the last chunking operation.

getPrepPhrases()   Returns an array of prepositional phrases found in the last chunking operation.

getVerbPhrases()   Returns an array of verb phrases found in the last chunking operation.

tagAndChunk()   Performs word tokenizing and part-of-speech tagging to prepare a sentence for chunking, then returns a String of chunks inline in the following format (for input 'The boy ran over the dog'):

(np The/dt boy/nn) (vp ran/vbd) (pp over/in) (np the/dt dog/nn)

Usage Web & Application