| disableSentenceProcessing() |
|
Tells the model to ignore (English-like) sentences in its input
and treat all text tokens the same.
|
| generateSentence() |
|
Generates a sentence from the model.
Note: multiple sentences generated by repeated calls to this method WILL NOT
follow the model across sentence boundaries; thus the following two snippets
are NOT equivalent:
String[] results = markov.generateSentences(10);
and
String[] results = new String[10];
for (int i = 0; i < 10; i++) {
    results[i] = markov.generateSentence();
}
The latter creates 10 sentences with no explicit relationship
between one and the next, while the former follows probabilities
from one sentence (across a boundary) to the next.
|
| generateSentences() |
|
Generates one or more sentences from the model.
Note: multiple sentences generated by this method WILL follow
the model across sentence boundaries; thus the following two snippets
are NOT equivalent:
String[] results = markov.generateSentences(10);
and
String[] results = new String[10];
for (int i = 0; i < 10; i++) {
    results[i] = markov.generateSentence();
}
The latter creates 10 sentences with no explicit relationship
between one and the next, while the former follows probabilities
from one sentence (across a boundary) to the next.
|
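The difference can be seen in a toy setting. The sketch below is not the model's implementation: it assumes a hypothetical trigram chain, ToyChain, in which the token "." marks a sentence boundary. Its generateSentence() restarts every sentence from a randomly chosen sentence opening, whereas its generateSentences(n) lets the two-token context slide across each "." so that the end of one sentence conditions the start of the next.

```java
import java.util.*;

/** Hypothetical trigram chain (illustration only); "." marks a sentence boundary. */
final class ToyChain {
    private final Map<String, List<String>> next = new HashMap<>(); // "w1 w2" -> observed next tokens
    private final List<String> starters = new ArrayList<>();        // first two tokens of each input sentence
    private final Random rng;

    // Assumes the input contains at least one sentence of two or more tokens.
    ToyChain(String[] tokens, long seed) {
        rng = new Random(seed);
        for (int i = 0; i + 2 < tokens.length; i++) {
            next.computeIfAbsent(tokens[i] + " " + tokens[i + 1], k -> new ArrayList<>())
                .add(tokens[i + 2]);
        }
        for (int i = 0; i + 1 < tokens.length; i++) {
            boolean sentenceStart = i == 0 || tokens[i - 1].equals(".");
            if (sentenceStart && !tokens[i].equals(".")) starters.add(tokens[i] + " " + tokens[i + 1]);
        }
    }

    private String pick(List<String> options) {
        return options.get(rng.nextInt(options.size()));
    }

    /** Independent sentence: always restarts from a randomly chosen sentence opening. */
    String generateSentence() {
        String[] start = pick(starters).split(" ");
        String w1 = start[0], w2 = start[1];
        StringBuilder cur = new StringBuilder(w1 + " " + w2);
        for (int steps = 0; !w2.equals(".") && steps < 1000; steps++) {
            List<String> options = next.get(w1 + " " + w2);
            if (options == null) break;              // dead end: no observed continuation
            String tok = pick(options);
            cur.append(' ').append(tok);
            w1 = w2;
            w2 = tok;
        }
        return cur.toString();
    }

    /** Linked sentences: the context slides across each "." so sentence k+1 depends on sentence k. */
    String[] generateSentences(int count) {
        List<String> out = new ArrayList<>();
        String[] start = pick(starters).split(" ");
        String w1 = start[0], w2 = start[1];
        StringBuilder cur = new StringBuilder(w1 + " " + w2);
        if (count > 0 && w2.equals(".")) { out.add(cur.toString()); cur.setLength(0); }
        for (int steps = 0; out.size() < count && steps < 10_000; steps++) {
            List<String> options = next.get(w1 + " " + w2);
            if (options == null) break;              // dead end: return what we have
            String tok = pick(options);
            if (cur.length() > 0) cur.append(' ');
            cur.append(tok);
            if (tok.equals(".")) { out.add(cur.toString()); cur.setLength(0); }
            w1 = w2;                                 // note: no reset at the boundary
            w2 = tok;
        }
        return out.toArray(new String[0]);
    }
}
```

On the input "a b . c d . a b . c d ." the linked version always alternates between "a b ." and "c d ." (in that input "b ." is always followed by "c" and "d ." by "a"), while independently generated sentences may repeat either one.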
| generateTokens() |
|
Generates a string containing the requested number (length) of tokens from the model.
|
| getCompletions() |
|
Returns an unordered list of possible words w that complete
an n-gram consisting of: pre[0]...pre[pre.length-1], w, post[0]...post[post.length-1].
As an example, the following call:
getCompletions(new String[]{ "the" }, new String[]{ "ball" })
will return all the single words that occur between 'the' and 'ball'
in the current model (assuming n > 2), e.g., ['red', 'big', 'bouncy'].
Note: For this operation to be valid, (pre.length + post.length)
must be strictly less than the model's nFactor; otherwise an
exception will be thrown.
|
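As a rough illustration of what this call computes, here is a self-contained sketch that scans a flat token list rather than the model's tree (an assumption made for brevity; Completions and its parameters are hypothetical). It enforces the same validity rule, pre.length + post.length < n, and collects every distinct word found between a matching prefix and suffix.

```java
import java.util.*;

final class Completions {
    /** Distinct words w such that pre..., w, post... occurs in tokens; order is not guaranteed. */
    static List<String> getCompletions(List<String> tokens, int n, String[] pre, String[] post) {
        if (pre.length + post.length >= n) {
            throw new IllegalArgumentException("pre.length + post.length must be < n");
        }
        int span = pre.length + 1 + post.length;      // total n-gram length including w
        Set<String> found = new HashSet<>();
        for (int i = 0; i + span <= tokens.size(); i++) {
            if (matches(tokens, i, pre) && matches(tokens, i + pre.length + 1, post)) {
                found.add(tokens.get(i + pre.length)); // the word sitting between pre and post
            }
        }
        return new ArrayList<>(found);
    }

    /** True if tokens[from .. from + expected.length) equals expected element by element. */
    private static boolean matches(List<String> tokens, int from, String[] expected) {
        for (int j = 0; j < expected.length; j++) {
            if (!tokens.get(from + j).equals(expected[j])) return false;
        }
        return true;
    }
}
```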
| getMaxSentenceLength() |
|
Returns the maximum # of words allowed in a generated sentence
|
| getMinSentenceLength() |
|
Returns the minimum # of words allowed in a generated sentence
|
| getNFactor() |
|
Returns the current n-value for the model
|
| getProbabilities() |
|
Returns the full set of possible next tokens (as a HashMap:
String -> Float (probability)) given an array of tokens
representing the path down the tree (with length less than n).
If the input array length is not less than n, or the path cannot be
found, or the end node has no children, null is returned.
Note: As the returned Map represents the full set of possible next
tokens, its probabilities always sum to 1 (up to floating-point rounding).
|
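A minimal sketch of the same contract, assuming the model is represented by a flat token list instead of a tree (NextTokenProbs is a hypothetical name): it counts what follows each occurrence of the path, normalizes the counts into a String -> Float map, and returns null in the cases described above.

```java
import java.util.*;

final class NextTokenProbs {
    static Map<String, Float> getProbabilities(List<String> tokens, int n, String[] path) {
        if (path.length >= n) return null;                 // path must be shorter than n
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        // Every position where the path matches and a following token exists.
        for (int i = 0; i + path.length < tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < path.length && match; j++) {
                match = tokens.get(i + j).equals(path[j]);
            }
            if (match) {
                counts.merge(tokens.get(i + path.length), 1, Integer::sum);
                total++;
            }
        }
        if (total == 0) return null;                       // path not found, or nothing follows it
        Map<String, Float> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            probs.put(e.getKey(), e.getValue() / (float) total); // normalized, so values sum to ~1
        }
        return probs;
    }
}
```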
| getProbability() |
|
Returns the probability of obtaining
a sequence of k (String) tokens, where k <= nFactor;
e.g., if nFactor = 3, then valid lengths
for the array of String tokens are 1, 2 & 3.
|
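The description does not say exactly how this probability is computed. The sketch below assumes one simple interpretation, the relative frequency of the k-token sequence among all k-token windows of the input, and checks the 1 <= k <= nFactor rule; SequenceProb is hypothetical, and the model's actual computation (for example, a conditional probability read from its tree) may differ.

```java
import java.util.Arrays;

final class SequenceProb {
    /** Assumed estimate: count(seq) / number of k-token windows in the input. */
    static double getProbability(String[] tokens, int nFactor, String[] seq) {
        int k = seq.length;
        if (k < 1 || k > nFactor) {
            throw new IllegalArgumentException("need 1 <= k <= nFactor, got k = " + k);
        }
        int windows = tokens.length - k + 1;          // number of k-grams in the input
        if (windows <= 0) return 0.0;
        int hits = 0;
        for (int i = 0; i < windows; i++) {
            if (Arrays.equals(Arrays.copyOfRange(tokens, i, i + k), seq)) hits++;
        }
        return hits / (double) windows;
    }
}
```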
| getRoot() |
|
|
| getWordCount() |
|
Returns the # of words loaded into the model
|
| isPrintingIgnoredText() |
|
|
| isRemovingQuotations() |
|
Returns whether the model is ignoring quotations found in the input
|
| isSmoothing() |
|
Returns whether (add-1) smoothing is enabled for the model
|
| loadFile() |
|
Loads a text file into the model; if using Processing,
the file should be in the sketch's data folder.
|
| loadSentences() |
|
Loads an array of sentences into the model; each
element in the array must be a single sentence for
proper parsing.
|
| loadText() |
|
Loads a String into the model, splitting the text first into sentences,
then into words, according to the current regular expression.
|
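The two-stage split can be sketched as follows. Both regular expressions here are assumptions chosen for illustration (a sentence break after '.', '!' or '?' followed by whitespace; words as runs of word characters and apostrophes, with each punctuation mark as its own token), not the model's defaults; TextSplitter is hypothetical.

```java
import java.util.*;
import java.util.regex.*;

final class TextSplitter {
    // Hypothetical patterns; the model's actual expressions may differ.
    private static final Pattern SENTENCE_END = Pattern.compile("(?<=[.!?])\\s+");
    private static final Pattern WORD = Pattern.compile("[\\w']+|[^\\w\\s]");

    /** Splits text into sentences, then each sentence into word and punctuation tokens. */
    static List<List<String>> split(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String s : SENTENCE_END.split(text.trim())) {
            List<String> words = new ArrayList<>();
            Matcher m = WORD.matcher(s);
            while (m.find()) words.add(m.group());
            if (!words.isEmpty()) sentences.add(words); // skip empty fragments
        }
        return sentences;
    }
}
```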
| loadTokens() |
|
Loads an array of tokens (or words) into the model; each
element in the array must be a single token for proper
construction of the model.
|
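Why each element must be a single token is easiest to see from how n-grams are counted. This sketch (NGramCounter is hypothetical, and a flat map stands in for the model's tree) slides a window of up to n elements over the array; an element that held several words would be counted as one unit and distort every n-gram that contains it.

```java
import java.util.*;

final class NGramCounter {
    /** Counts every k-gram (1 <= k <= n) in the array, keyed by its space-joined tokens. */
    static Map<String, Integer> count(String[] tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            StringBuilder key = new StringBuilder();
            // Grow the window from length 1 up to n (or until the array ends).
            for (int k = 0; k < n && i + k < tokens.length; k++) {
                if (k > 0) key.append(' ');
                key.append(tokens[i + k]);
                counts.merge(key.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```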
| printTree() |
|
Outputs a String representing the model's probability tree using
the supplied print stream (or System.out).
NOTE: this method will block for potentially long periods of time
on large models.
|
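For a sense of what printing a probability tree involves, here is a small sketch whose output format is an assumption, not the model's: ProbTree is hypothetical, inserts every window of up to n tokens into a trie, and prints each node indented by depth with its count and its probability relative to the other children of the same parent. Visiting every node is also why printing a large model can take a long time.

```java
import java.io.*;
import java.util.*;

/** Hypothetical trie: each node counts how often its token followed its parent's path. */
final class ProbTree {
    private final Map<String, ProbTree> children = new TreeMap<>(); // sorted for stable output
    private int count;

    /** Inserts every window of up to n tokens, starting at each position, below the root. */
    static ProbTree build(String[] tokens, int n) {
        ProbTree root = new ProbTree();
        for (int i = 0; i < tokens.length; i++) {
            ProbTree node = root;
            for (int k = 0; k < n && i + k < tokens.length; k++) {
                node = node.children.computeIfAbsent(tokens[i + k], t -> new ProbTree());
                node.count++;
            }
        }
        return root;
    }

    /** Prints one line per node: probability = node count / total count of all its siblings. */
    void print(PrintStream out) {
        print(out, 0);
    }

    private void print(PrintStream out, int depth) {
        int total = 0;
        for (ProbTree child : children.values()) total += child.count;
        String indent = String.join("", Collections.nCopies(depth, "  "));
        for (Map.Entry<String, ProbTree> e : children.entrySet()) {
            ProbTree child = e.getValue();
            out.printf(Locale.ROOT, "%s%s [%d,p=%.3f]%n", indent, e.getKey(), child.count,
                    child.count / (double) total);
            child.print(out, depth + 1);  // recurse: this is the full-tree walk
        }
    }
}
```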
| setAllowDuplicates() |
|
Determines whether calls to generateSentence(s) will return
sentences that exist (character-for-character) in the input text(s).
Note: The trade-off here is between ensuring novel outputs
and a potential slow-down due to rejected outputs (because they
exist in the input text). Use with care, as setting this to false for
large models may result in excessive memory use.
|
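When duplicates are not allowed, one straightforward strategy is rejection sampling: keep drawing candidates and discard any that appear verbatim in the input. The sketch below assumes that strategy (NoveltyFilter and its maxTries parameter are hypothetical); the retry loop is where the slow-down mentioned above comes from.

```java
import java.util.*;
import java.util.function.Supplier;

final class NoveltyFilter {
    /** Draws candidates until one is absent from the input set, giving up after maxTries. */
    static Optional<String> generateNovel(Supplier<String> generator, Set<String> inputSentences, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            String candidate = generator.get();
            if (!inputSentences.contains(candidate)) return Optional.of(candidate);
        }
        return Optional.empty(); // every attempt reproduced an input sentence
    }
}
```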
| setMaxSentenceLength() |
|
Sets the maximum # of words allowed in a generated sentence (default=35)
|
| setMinSentenceLength() |
|
Sets the minimum # of words allowed in a generated sentence (default=6)
|
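A sketch of the bounds check these two settings imply, assuming words are counted as whitespace-separated tokens (LengthFilter is hypothetical); a generator would discard or regenerate any sentence for which it returns false. With the defaults, the bounds would be 6 and 35.

```java
final class LengthFilter {
    /** True if the whitespace-separated word count lies within [minWords, maxWords]. */
    static boolean withinBounds(String sentence, int minWords, int maxWords) {
        String trimmed = sentence.trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return words >= minWords && words <= maxWords;
    }
}
```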
| setPrintIgnoredText() |
|
|
| setRecognizeSentences() |
|
Sets whether the model will try to recognize
(English-like) sentences in its input (default=true).
|
| setRemoveQuotations() |
|
Tells the model whether to ignore various quotation types in the input (default=true).
|
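A minimal sketch, assuming that "ignoring quotations" means stripping quotation marks rather than dropping the quoted text. The character set below (straight and curly double quotes plus guillemets) is an assumption, and QuoteStripper is hypothetical; single quotes are left alone so that apostrophes in contractions survive.

```java
final class QuoteStripper {
    // Assumed set of quotation marks: " \u201C \u201D \u00AB \u00BB (apostrophes deliberately excluded).
    private static final String QUOTES = "[\"\u201C\u201D\u00AB\u00BB]";

    static String removeQuotations(String text) {
        return text.replaceAll(QUOTES, "");
    }
}
```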
| setTokenizerRegex() |
|
Creates a new RegexTokenizer from the supplied regular expression
and uses it when adding subsequent data to the model.
|
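The RegexTokenizer itself is not documented here, so the sketch below substitutes a hypothetical DelimiterTokenizer that treats the supplied expression as a delimiter pattern (whether the real tokenizer matches delimiters or tokens is an assumption). Swapping the expression changes how any later input is split.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical tokenizer that splits text on a user-supplied delimiter regex. */
final class DelimiterTokenizer {
    private final Pattern delimiter;

    DelimiterTokenizer(String regex) {
        this.delimiter = Pattern.compile(regex);
    }

    List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : delimiter.split(text)) {
            if (!t.isEmpty()) out.add(t); // drop the empty piece a leading delimiter produces
        }
        return out;
    }
}
```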
| setUseSmoothing() |
|
Toggles whether (add-1) smoothing is enabled for the model.
Should be called before any data loading is done.
|
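Add-1 (Laplace) smoothing adds one to every count so that unseen continuations get a small nonzero probability. For a bigram model the estimate is P(next | prev) = (count(prev, next) + 1) / (count(prev) + V), where V is the vocabulary size and count(prev) counts occurrences of prev that have a following token. The sketch below applies that formula to a raw token array rather than the model's tree; AddOneSmoothing is hypothetical.

```java
import java.util.*;

final class AddOneSmoothing {
    /** Laplace estimate of P(next | prev): (count(prev, next) + 1) / (count(prev) + V). */
    static double probability(String[] tokens, String prev, String next) {
        Set<String> vocab = new HashSet<>(Arrays.asList(tokens)); // V = number of distinct tokens
        int pairCount = 0, contextCount = 0;
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals(prev)) {
                contextCount++;                                   // prev seen with a successor
                if (tokens[i + 1].equals(next)) pairCount++;
            }
        }
        return (pairCount + 1) / (double) (contextCount + vocab.size());
    }
}
```

With tokens a b a c, P(b | a) = (1 + 1) / (2 + 3) = 0.4, and the unseen P(a | a) = 1 / 5 = 0.2; the estimates for all three possible successors of a still sum to 1.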