rita.support
Interface NGramIF

All Known Implementing Classes:
NGramModel

public interface NGramIF


Method Summary
 java.lang.String generateTokens(int targetNumber)
          Generates a string of 'targetNumber' tokens from the model.
 java.lang.String generateUntil(java.lang.String regex, int minLength, int maxLength)
          Continues generating tokens until a token matches 'regex', provided that the length of the output is between minLength and maxLength (inclusive).
 java.lang.String[] getCompletions(java.lang.String[] seed)
          Returns all possible next words (or tokens), ordered by probability, for the given seed array, or null if none are found.
 java.lang.String[] getCompletions(java.lang.String[] pre, java.lang.String[] post)
          Returns an unordered list of possible words w that complete an n-gram consisting of: pre[0]...pre[k], w, post[k+1]...post[n].
 int getNFactor()
          Returns the current n-value for the model.
 java.util.Map getProbabilities(java.lang.String[] path)
          Returns the full set of possible next tokens (as a HashMap: String -> Float (probability)), given an array of tokens representing the path down the tree (with length less than n).
 float getProbability(java.lang.String singleToken)
          Returns the raw (unigram) probability for a token in the model, or 0 if the token does not exist.
 float getProbability(java.lang.String[] tokens)
          Returns the probability of obtaining a sequence of k character tokens, where k <= nFactor; e.g., if nFactor = 3, then valid lengths for the String[] tokens are 1, 2, and 3.
 

Method Detail

generateUntil

java.lang.String generateUntil(java.lang.String regex,
                               int minLength,
                               int maxLength)
Continues generating tokens until a token matches 'regex', provided that the length of the output is between minLength and maxLength (inclusive).


generateTokens

java.lang.String generateTokens(int targetNumber)
Generates a string of 'targetNumber' tokens from the model.


getNFactor

int getNFactor()
Returns the current n-value for the model.


getCompletions

java.lang.String[] getCompletions(java.lang.String[] seed)
Returns all possible next words (or tokens), ordered by probability, for the given seed array, or null if none are found.

Note: seed arrays of any size (> 0) may be passed in, but only the last n-1 elements will be considered.
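
The note above can be illustrated with a small, self-contained sketch. The helper lastNMinusOne is hypothetical and is not part of NGramIF; it only shows which trailing part of a seed would be considered for a given n, assuming n >= 1.

```java
import java.util.Arrays;

class SeedWindowSketch {

    // Hypothetical helper (not part of NGramIF): returns the trailing
    // n-1 elements of the seed, or the whole seed if it is shorter.
    static String[] lastNMinusOne(String[] seed, int n) {
        int keep = Math.min(seed.length, n - 1);
        return Arrays.copyOfRange(seed, seed.length - keep, seed.length);
    }

    public static void main(String[] args) {
        String[] seed = { "I", "saw", "the", "red" };
        // With n = 3, only the last two tokens would be considered.
        System.out.println(Arrays.toString(lastNMinusOne(seed, 3))); // prints [the, red]
    }
}
```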


getProbability

float getProbability(java.lang.String singleToken)
Returns the raw (unigram) probability for a token in the model, or 0 if the token does not exist.
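
As a rough illustration of what a raw unigram probability means, the sketch below computes it as the token's count divided by the total number of tokens. The class and method names are hypothetical, and the flat String[] corpus is an assumption made for the example; the actual model may store and compute this differently.

```java
class UnigramProbabilitySketch {

    // Hypothetical helper (not part of NGramIF): relative frequency of
    // 'token' in 'corpus', or 0 if the token never occurs.
    static float unigramProbability(String[] corpus, String token) {
        if (corpus.length == 0) {
            return 0f;
        }
        int hits = 0;
        for (String t : corpus) {
            if (t.equals(token)) {
                hits++;
            }
        }
        return hits / (float) corpus.length;
    }

    public static void main(String[] args) {
        String[] corpus = { "the", "red", "ball", "the" };
        System.out.println(unigramProbability(corpus, "the"));  // prints 0.5
        System.out.println(unigramProbability(corpus, "blue")); // prints 0.0
    }
}
```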


getProbability

float getProbability(java.lang.String[] tokens)
Returns the probability of obtaining a sequence of k character tokens, where k <= nFactor; e.g., if nFactor = 3, then valid lengths for the String[] tokens are 1, 2, and 3.


getCompletions

java.lang.String[] getCompletions(java.lang.String[] pre,
                                  java.lang.String[] post)
Returns an unordered list of possible words w that complete an n-gram consisting of: pre[0]...pre[k], w, post[k+1]...post[n]. As an example, the following call:
 getCompletions(new String[]{ "the" }, new String[]{ "ball" })
 
will return all the single words that occur between 'the' and 'ball' in the current model (assuming n > 2), e.g., ['red', 'big', 'bouncy'].

Note: For this operation to be valid, (pre.length + post.length) must be strictly less than the model's nFactor; otherwise, an exception will be thrown.
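
The behavior described above can be sketched against a toy model. Everything in this example is an assumption made for illustration: the fixed trigram list stands in for a model with n = 3, the class and method names are hypothetical, and IllegalArgumentException is only a guess at the exception type, which the documentation does not specify.

```java
import java.util.ArrayList;
import java.util.List;

class MiddleCompletionSketch {

    // Toy stand-in for a model with n = 3: a fixed list of trigrams.
    static final int N = 3;
    static final String[][] TRIGRAMS = {
        { "the", "red", "ball" },
        { "the", "big", "ball" },
        { "the", "red", "car" },
        { "a", "bouncy", "ball" }
    };

    // Collects every word w such that pre + w + post occurs in a trigram.
    static List<String> completions(String[] pre, String[] post) {
        if (pre.length + post.length >= N) {
            // Exception type is an assumption; the docs only say "an exception".
            throw new IllegalArgumentException("pre.length + post.length must be < " + N);
        }
        int len = pre.length + 1 + post.length;
        List<String> result = new ArrayList<>();
        for (String[] gram : TRIGRAMS) {
            for (int start = 0; start + len <= gram.length; start++) {
                String w = gram[start + pre.length];
                if (matches(gram, start, pre)
                        && matches(gram, start + pre.length + 1, post)
                        && !result.contains(w)) {
                    result.add(w);
                }
            }
        }
        return result;
    }

    // True if 'part' appears in 'gram' starting at 'offset'.
    static boolean matches(String[] gram, int offset, String[] part) {
        for (int i = 0; i < part.length; i++) {
            if (!gram[offset + i].equals(part[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Words found between "the" and "ball" in the toy trigrams: red and big.
        System.out.println(completions(new String[]{ "the" }, new String[]{ "ball" }));
    }
}
```

With this toy data, 'bouncy' is not returned because it follows 'a' rather than 'the'.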


getProbabilities

java.util.Map getProbabilities(java.lang.String[] path)
Returns the full set of possible next tokens (as a HashMap: String -> Float (probability)), given an array of tokens representing the path down the tree (with length less than n). If the input array length is not less than n, the path cannot be found, or the end node has no children, null is returned.

Note: As the returned Map represents the full set of possible next tokens, the sum of its probabilities will always equal 1.
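
To show why the probabilities sum to 1, the sketch below normalizes a set of raw follow-up counts into a String -> Float map. The class, the normalize helper, and the sample counts are all hypothetical; the sketch only assumes that each probability is a child's count divided by the total count over all children. With float arithmetic the sum may in general differ from 1 by a small rounding error.

```java
import java.util.HashMap;
import java.util.Map;

class NextTokenProbabilitySketch {

    // Hypothetical helper (not part of NGramIF): converts raw follow-up
    // counts into probabilities; returns null when there are no children,
    // mirroring the null result described above.
    static Map<String, Float> normalize(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            return null;
        }
        Map<String, Float> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (float) total);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Made-up counts of tokens seen after some path, e.g. { "the" }.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("red", 2);
        counts.put("big", 1);
        counts.put("bouncy", 1);

        float sum = 0f;
        for (float p : normalize(counts).values()) {
            sum += p;
        }
        System.out.println(sum); // prints 1.0
    }
}
```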

See Also:
getProbability(String)