rita.support
Class RiSplitter

java.lang.Object
  extended by rita.support.RiSplitter
All Implemented Interfaces:
RiSplitterIF

public class RiSplitter
extends java.lang.Object
implements RiSplitterIF

A simple sentence splitter; implements RiSplitterIF, the common interface for different sentence splitters.

Note: Adapted from the JET and OAK sentence splitters.


Field Summary
static boolean DBUG
static int MAX_CHARS_PERS_SENTENCE
static int MIN_CHARS_PERS_SENTENCE

Method Summary
static RiSplitter getInstance()
 boolean isRemovingQuotations()
          Returns whether the parser is trimming single and double quotes from input text.
static boolean isSentenceEnd(java.lang.String currentToken, java.lang.String nextToken)
          Returns true if currentToken is the final token of a sentence.
static boolean isSentenceEnd(java.lang.String currentToken, java.lang.String nextToken, boolean startOfSentence)
          Returns true if currentToken is the final token of a sentence.
static void main(java.lang.String[] args)
 void setTrimQuotations(boolean removeQuotations)
          Tells the parser whether to trim single and double quotes from input text.
 java.util.List splitSentences(java.util.List sentences, java.lang.String text)
          Splits the data in text into sentences.
 java.lang.String[] splitSentences(java.lang.String text)
          Splits text into a String[] of sentences.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DBUG

public static final boolean DBUG
See Also:
Constant Field Values

MAX_CHARS_PERS_SENTENCE

public static final int MAX_CHARS_PERS_SENTENCE
See Also:
Constant Field Values

MIN_CHARS_PERS_SENTENCE

public static final int MIN_CHARS_PERS_SENTENCE
See Also:
Constant Field Values

Method Detail

getInstance

public static RiSplitter getInstance()

splitSentences

public java.lang.String[] splitSentences(java.lang.String text)
Description copied from interface: RiSplitterIF
Splits text into a String[] of sentences.

Specified by:
splitSentences in interface RiSplitterIF

splitSentences

public java.util.List splitSentences(java.util.List sentences,
                                     java.lang.String text)
Splits the data in text into sentences.

We split after a period if the following token is capitalized and the preceding token is neither a known abbreviation that does not end sentences (such as a title) nor a single capital letter (such as an initial).
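The period rule above can be sketched as a loop over whitespace-separated tokens. This is a minimal illustration under stated assumptions, not RiTa's implementation: the class name PeriodSplitSketch, the whitespace tokenization, and the small title list in ABBREVIATIONS are all hypothetical, and the sketch assumes (as the List-taking signature suggests) that sentences are appended to the supplied list, which is then returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the period-splitting rule described for
 * splitSentences(List, String). Whitespace tokenization and the
 * abbreviation list are assumptions, not RiTa's actual code.
 */
final class PeriodSplitSketch {

    // Hypothetical, deliberately tiny list of non-sentence-ending abbreviations (titles).
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Mr.", "Mrs.", "Ms.", "Dr.", "Prof."));

    /** True if the token is a single capital letter followed by a period, e.g. "J." */
    static boolean isInitial(String token) {
        return token.length() == 2
            && Character.isUpperCase(token.charAt(0))
            && token.charAt(1) == '.';
    }

    /** Appends the sentences found in text to sentences and returns the same list. */
    static List<String> splitSentences(List<String> sentences, String text) {
        String[] tokens = text.trim().split("\\s+");
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (current.length() > 0) current.append(' ');
            current.append(token);
            String next = (i + 1 < tokens.length) ? tokens[i + 1] : null;
            // Split after a period only if the next token is capitalized and the
            // current token is neither a known title nor a single-letter initial.
            boolean split = token.endsWith(".")
                && next != null
                && Character.isUpperCase(next.charAt(0))
                && !ABBREVIATIONS.contains(token)
                && !isInitial(token);
            if (split) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        // Whatever remains is the last sentence (empty input adds nothing).
        if (current.length() > 0) sentences.add(current.toString());
        return sentences;
    }

    public static void main(String[] args) {
        List<String> out = splitSentences(new ArrayList<>(),
            "Mr. Smith met J. Doe. They talked. it was late.");
        out.forEach(System.out::println);
    }
}
```

Running main prints two sentences, "Mr. Smith met J. Doe." and "They talked. it was late.": the title and the initial do not trigger a split, and neither does the period before the lowercase "it".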


isSentenceEnd

public static boolean isSentenceEnd(java.lang.String currentToken,
                                    java.lang.String nextToken)
Returns true if currentToken is the final token of a sentence.

This is a simplified version of the OAK/JET sentence splitter.
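A two-token predicate such as isSentenceEnd(currentToken, nextToken) needs only one token of lookahead. The sketch below is hypothetical: SentenceEndSketch and its NON_ENDING list are illustrative names, a null nextToken is assumed to mean end of input, and treating '?' and '!' as unconditional sentence ends is an assumption that goes beyond the period rule documented for splitSentences.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical two-token boundary test in the spirit of isSentenceEnd(String, String).
 * The abbreviation list and the handling of '?' and '!' are assumptions.
 */
final class SentenceEndSketch {

    // Hypothetical abbreviations that never end a sentence.
    private static final Set<String> NON_ENDING =
        new HashSet<>(Arrays.asList("Mr.", "Mrs.", "Dr.", "St.", "e.g.", "i.e."));

    /**
     * Returns true if currentToken ends a sentence, given the token after it.
     * A null nextToken is treated as end of input.
     */
    static boolean isSentenceEnd(String currentToken, String nextToken) {
        if (currentToken == null || currentToken.isEmpty()) return false;
        char last = currentToken.charAt(currentToken.length() - 1);
        if (last == '?' || last == '!') return true;   // assumed unconditional enders
        if (last != '.') return false;                 // no terminal punctuation
        if (nextToken == null) return true;            // final period of the text
        if (NON_ENDING.contains(currentToken)) return false;
        if (currentToken.length() == 2 && Character.isUpperCase(currentToken.charAt(0))) {
            return false;                              // single capital letter, e.g. "J."
        }
        // A period ends the sentence only before a capitalized token.
        return !nextToken.isEmpty() && Character.isUpperCase(nextToken.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isSentenceEnd("Dr.", "Who"));    // false: title
        System.out.println(isSentenceEnd("home.", "The"));  // true
        System.out.println(isSentenceEnd("home.", "the"));  // false: lowercase follows
        System.out.println(isSentenceEnd("why?", "no"));    // true
    }
}
```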


isSentenceEnd

public static boolean isSentenceEnd(java.lang.String currentToken,
                                    java.lang.String nextToken,
                                    boolean startOfSentence)
Returns true if currentToken is the final token of a sentence.

This is a simplified version of the OAK/JET sentence splitter.


setTrimQuotations

public void setTrimQuotations(boolean removeQuotations)
Tells the parser whether to trim single and double quotes from input text.

Parameters:
removeQuotations - true to trim single and double quotes from input text; false to leave them in place.

isRemovingQuotations

public boolean isRemovingQuotations()
Returns whether the parser is trimming single and double quotes from input text.
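The quote-trimming option pairs a setter with a getter. In this hypothetical sketch, QuoteTrimSketch and its prepare helper are illustrative names, the default of not trimming is an assumption, and "trim" is read as stripping quote characters only from the start and end of the text, so apostrophes inside words survive.

```java
/**
 * Hypothetical sketch of a quote-trimming option mirroring
 * setTrimQuotations / isRemovingQuotations. The trimming rule
 * (strip quotes only at the ends of the text) is an assumption.
 */
final class QuoteTrimSketch {

    private boolean removeQuotations = false;   // assumed default: keep quotes

    void setTrimQuotations(boolean removeQuotations) {
        this.removeQuotations = removeQuotations;
    }

    boolean isRemovingQuotations() {
        return removeQuotations;
    }

    /** Strips leading and trailing ' and " characters when trimming is enabled. */
    String prepare(String text) {
        if (!removeQuotations) return text;
        int start = 0;
        int end = text.length();
        while (start < end && isQuote(text.charAt(start))) start++;
        while (end > start && isQuote(text.charAt(end - 1))) end--;
        return text.substring(start, end);
    }

    private static boolean isQuote(char c) {
        return c == '\'' || c == '"';
    }

    public static void main(String[] args) {
        QuoteTrimSketch s = new QuoteTrimSketch();
        String input = "\"It's late.\"";
        System.out.println(s.prepare(input));          // quotes kept: "It's late."
        s.setTrimQuotations(true);
        System.out.println(s.isRemovingQuotations());  // true
        System.out.println(s.prepare(input));          // It's late.
    }
}
```

Trimming only at the ends keeps contractions such as "It's" intact, which removing every quote character would not.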


main

public static void main(java.lang.String[] args)