java.lang.Object
  rita.support.RiSplitter
public class RiSplitter
A simple interface for different sentence splitters.
Note: Adapted from JET & OAK
| Field Summary | |
|---|---|
| static boolean | DBUG |
| static int | MAX_CHARS_PERS_SENTENCE |
| static int | MIN_CHARS_PERS_SENTENCE |
| Method Summary | |
|---|---|
| static RiSplitter | getInstance() |
| boolean | isRemovingQuotations(): Returns whether the parser is trimming single and double quotes from input text. |
| static boolean | isSentenceEnd(java.lang.String currentToken, java.lang.String nextToken): Returns true if currentToken is the final token of a sentence. |
| static boolean | isSentenceEnd(java.lang.String currentToken, java.lang.String nextToken, boolean startOfSentence): Returns true if currentToken is the final token of a sentence. |
| static void | main(java.lang.String[] args) |
| void | setTrimQuotations(boolean removeQuotations): Tells the parser whether to trim single and double quotes from input text. |
| java.util.List | splitSentences(java.util.List sentences, java.lang.String text): Splits the data in text into sentences. |
| java.lang.String[] | splitSentences(java.lang.String text): Splits text into a String[] of sentences. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final boolean DBUG
public static final int MAX_CHARS_PERS_SENTENCE
public static final int MIN_CHARS_PERS_SENTENCE
| Method Detail |
|---|
public static RiSplitter getInstance()
public java.lang.String[] splitSentences(java.lang.String text)
Description copied from interface: RiSplitterIF
Splits text into a String[] of sentences.
Specified by: splitSentences in interface RiSplitterIF
public java.util.List splitSentences(java.util.List sentences,
java.lang.String text)
Splits the data in text into sentences. We split after a period if the following token is capitalized and the preceding token is neither a known non-sentence-ending abbreviation (such as a title) nor a single capital letter.
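The splitting rule described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not RiSplitter's actual source: the class name `SentenceSplitSketch`, the `split` helper, the contents of `ABBREVIATIONS`, and the choice to treat a period at the end of the input as a sentence end are all hypothetical. Only the rule itself (period, capitalized next token, no abbreviation or single capital letter before it) comes from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; this is NOT the RiSplitter implementation.
class SentenceSplitSketch {

    // Hypothetical abbreviation list; the list RiSplitter uses is not shown on this page.
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Mr.", "Mrs.", "Ms.", "Dr.", "Prof."));

    // Returns true if currentToken looks like the final token of a sentence.
    // nextToken may be null, meaning the input ends here (an assumption of this sketch).
    static boolean isSentenceEnd(String currentToken, String nextToken) {
        if (!currentToken.endsWith(".")) return false;            // only periods end sentences here
        if (ABBREVIATIONS.contains(currentToken)) return false;   // e.g. a title such as "Dr."
        if (currentToken.length() == 2
                && Character.isUpperCase(currentToken.charAt(0))) {
            return false;                                         // single capital letter, e.g. "J."
        }
        return nextToken == null
            || (!nextToken.isEmpty() && Character.isUpperCase(nextToken.charAt(0)));
    }

    // Splits whitespace-separated text into sentences using isSentenceEnd.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return sentences;
        String[] tokens = trimmed.split("\\s+");
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (current.length() > 0) current.append(' ');
            current.append(tokens[i]);
            String next = (i + 1 < tokens.length) ? tokens[i + 1] : null;
            if (isSentenceEnd(tokens[i], next)) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        // Keep any trailing text that never reached a sentence end.
        if (current.length() > 0) sentences.add(current.toString());
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("Dr. Smith met J. Doe at noon. They talked. Then they left.")) {
            System.out.println(s);
        }
    }
}
```

In this sketch the title "Dr." and the initial "J." do not trigger a split, so the sample input yields three sentences. A heuristic of this kind will still miss sentences that start with a lowercase word or end in "?" or "!".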
public static boolean isSentenceEnd(java.lang.String currentToken,
java.lang.String nextToken)
Returns true if currentToken is the final token of a sentence. This is a simplified version of the OAK/JET sentence splitter.
public static boolean isSentenceEnd(java.lang.String currentToken,
java.lang.String nextToken,
boolean startOfSentence)
Returns true if currentToken is the final token of a sentence. This is a simplified version of the OAK/JET sentence splitter.
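public void setTrimQuotations(boolean removeQuotations)
Tells the parser whether to trim single and double quotes from input text.
Parameters: removeQuotations - whether to trim single and double quotes from input text
public boolean isRemovingQuotations()
Returns whether the parser is trimming single and double quotes from input text.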
public static void main(java.lang.String[] args)