RiTa
Name RiLexicon
Description RiLexicon represents the core 'dictionary' (or lexicon) for the RiTa tools. It contains ~35,000 words augmented with phonemic and syllabic data, as well as a list of valid parts-of-speech for each. The lexicon can be extended and/or customized for additional words, usages, or pronunciations.

Additionally, the lexicon provides implementations of a variety of matching algorithms (min-edit-distance, soundex, anagrams, alliteration, rhymes, looks-like, etc.) based on combinations of letters, syllables and phonemes. An example use:

    RiLexicon lex = new RiLexicon(this);
    String[] similars = lex.similarBySound("cat");
    String[] rhymes  = lex.getSimpleRhymes("cat");
    // etc.

Note: If you wish to modify or customize the lexicon (e.g., add words or change pronunciations), you can do so by editing the 'rita_addenda.txt' file, found in the $SKETCH_DIR/libraries/rita folder, and placing the modified version in the 'data' folder of your sketch.

Constructors
RiLexicon(pApplet);
RiLexicon(pApplet, lexiconFile);
Methods
containingStringsByLetter()   Returns valid words (in lexicon) using both substring and superstring matching.

This method, CONTAINS(K), is equivalent to UNION( SUB(K), SUPER(K) ).
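
The union described above can be sketched in plain Java, independent of RiTa. This is a minimal illustration, not the library's implementation: the class name `ContainsSketch`, its method names, and the choice to exclude the key itself from the results are all assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of RiTa.
class ContainsSketch {

    // SUB(K): words in the list that occur inside the key.
    static Set<String> sub(String key, List<String> lexicon) {
        Set<String> out = new LinkedHashSet<>();
        for (String w : lexicon) {
            // Assumption: the key itself is not reported as its own substring.
            if (!w.equals(key) && key.contains(w)) out.add(w);
        }
        return out;
    }

    // SUPER(K): words in the list that have the key inside them.
    static Set<String> sup(String key, List<String> lexicon) {
        Set<String> out = new LinkedHashSet<>();
        for (String w : lexicon) {
            if (!w.equals(key) && w.contains(key)) out.add(w);
        }
        return out;
    }

    // CONTAINS(K) = UNION( SUB(K), SUPER(K) )
    static Set<String> containing(String key, List<String> lexicon) {
        Set<String> out = sub(key, lexicon);
        out.addAll(sup(key, lexicon));
        return out;
    }
}
```

With the word list "cat", "at", "catalog", "dog", "scatter" and the key "cat", the sketch yields "at" from the substring side and "catalog" and "scatter" from the superstring side.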

contains()   Returns true if the word exists in the lexicon

getAlliterations()   Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon

getFeatures()  

getLexicalData()   Returns the raw data (as a Map) used in the lexicon, allowing for deletion or modification of existing lexical entries. Modifications to this Map will be immediately reflected in all operations on the lexicon.

getPosEntries()   Returns the list of possible parts-of-speech for the word, or null if not found.

getPosStr()   Returns

getRandomWord()   Returns a random word from the lexicon with the specified part-of-speech and target-length, or null if no such word exists.

getRandomWordWithSyllableCount()   Returns a random word from the lexicon with the specified part-of-speech and syllable-count, or null if no such word exists.

getRhymes()   Returns the rhymes for a given word or null if none found

Two words rhyme if their final stressed vowel and all following phonemes are identical.
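
The rhyme definition above can be illustrated with a small stand-alone sketch. It does not use RiTa; it assumes pronunciations are given as strings in the style of the rita_addenda.txt entries (phonemes joined by '-', syllables separated by spaces, and a trailing '1' marking a stressed vowel). The class and method names are hypothetical, and treating an unmarked pronunciation as non-rhyming is a choice made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of RiTa.
class RhymeSketch {

    // Returns the phonemes from the last stressed vowel to the end,
    // or null if no phoneme carries the assumed stress marker '1'.
    static List<String> rhymeTail(String phones) {
        String[] p = phones.trim().split("[-\\s]+");
        for (int i = p.length - 1; i >= 0; i--) {
            if (p[i].endsWith("1")) {
                return Arrays.asList(p).subList(i, p.length);
            }
        }
        return null;
    }

    // True if both pronunciations share the same final stressed vowel
    // and all phonemes that follow it.
    static boolean isRhyme(String phonesA, String phonesB) {
        if (phonesA == null || phonesB == null) return false;
        List<String> a = rhymeTail(phonesA);
        List<String> b = rhymeTail(phonesB);
        return a != null && a.equals(b);
    }
}
```

For instance, the illustrative pronunciations "k-ae1-t" and "h-ae1-t" share the tail ae1, t and so count as a rhyme here, while "k-ae1-t" and "d-ao1-g" do not.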

getWords()   Returns the set of words in the lexicon (including those from user-addenda) that match the supplied regular expression. For example, getWords("ee"); returns 661 words with 2 or more consecutive e's, while getWords("ee.*ee"); returns exactly 2: 'freewheeling' and 'squeegee'.
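
The two example patterns suggest that a word matches when the expression is found anywhere inside it, rather than having to match the whole word. Under that assumption, the filtering can be sketched with java.util.regex over an ordinary word list. The class `WordRegexSketch` and its method are hypothetical and stand in for the lexicon only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RiTa.
class WordRegexSketch {

    // Keeps every word in which the regular expression is found somewhere.
    // Assumption: find-anywhere semantics (Matcher.find), not a full match.
    static List<String> matching(String regex, List<String> words) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).find()) out.add(w);
        }
        return out;
    }
}
```

On the small list "squeegee", "freewheeling", "tree", "cat", the pattern "ee" keeps the first three words, and "ee.*ee" keeps only "squeegee" and "freewheeling".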

isAlliteration()   Returns true if the first stressed consonants of the two words match, else false.

Note: returns true if wordA.equals(wordB), and false if either (or both) are null.

isContaining()   Returns true if orig is a sub or super-string of toCheck.

isRhyme()   Returns true if the two words rhyme (that is, if their final stressed vowel and all following phonemes are identical), else false. Note: returns false if wordA.equals(wordB) or if either (or both) are null.

Note: at present, if either word is not found in the lexicon, this method does not fall back to the letter-to-sound engine and simply returns false (TODO).

isStopWord()   Returns true if the word is a 'stop' (or 'closed-class') word else false. See http://en.wikipedia.org/wiki/Stop_words

isSubstring()   Returns true if orig is a substring of toCheck.

isSuperstring()   Returns true if orig is a superstring of toCheck.

iterator()   Returns an iterator over the words in lexicon matching the supplied regular expression.

posIterator()   Returns an iterator over the words in lexicon, for the supplied part-of-speech

preloadFeatures()   Use this method to preload the Lexicon with feature data (stress, syllables, pos, phones, etc). Increases the initialization time but speeds up all subsequent lookups by an order of magnitude. Useful when doing many lookups over the course of a program, especially with the RiTaServer. Example:
        RiLexicon lex = new RiLexicon();
        lex.preloadFeatures();
        // use the lexicon


RiLexicon.randomIterator()   Utility method that returns a random-iterator over the specified set.

randomPosIterator()   Returns an iterator over the words in lexicon, for the supplied part-of-speech beginning at a random offset.

setLexicalData()   Sets the raw data to be used in the lexicon, replacing all default words and features with those specified in the map. When using this method, be sure to exactly match the format as specified in rita_addenda.txt, e.g.,
##############################################################################
#### FORMAT##:   ...  |   ... 
##############################################################################

blog: b-l-ao-g  | nn vbg
cepstral: k-eh1-p s-t-r-ax-l  | nnp
freetts:  f-r-iy1 t-iy t-iy eh-s  | nnp
jsapi:  jh-ey s-ae1-p iy  | nnp
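
To make the layout of these entries concrete, the sketch below splits one line into its three visible parts: the word before the colon, the space-separated syllables (each a run of '-'-joined phonemes) before the bar, and the part-of-speech tags after it. It is only a reading of the sample lines above. The class `AddendaEntrySketch` is hypothetical, and how RiTa itself parses the file or structures the Map is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of RiTa.
class AddendaEntrySketch {
    final String word;
    final List<String> syllables; // e.g. ["k-eh1-p", "s-t-r-ax-l"]
    final List<String> posTags;   // e.g. ["nn", "vbg"]

    AddendaEntrySketch(String word, List<String> syllables, List<String> posTags) {
        this.word = word;
        this.syllables = syllables;
        this.posTags = posTags;
    }

    // Parses "word: syllables | tags". Returns null for blank lines and
    // for lines starting with '#', which are treated as comments here.
    static AddendaEntrySketch parse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) return null;
        int colon = s.indexOf(':');
        int bar = s.indexOf('|');
        if (colon < 0 || bar < colon) {
            throw new IllegalArgumentException("Malformed entry: " + line);
        }
        String word = s.substring(0, colon).trim();
        String phones = s.substring(colon + 1, bar).trim();
        String pos = s.substring(bar + 1).trim();
        return new AddendaEntrySketch(
            word,
            Arrays.asList(phones.split("\\s+")),
            Arrays.asList(pos.split("\\s+")));
    }
}
```

Parsing the 'cepstral' line this way gives two syllables and the single tag nnp, and the 'blog' line gives one syllable and the tags nn and vbg.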
 


similarByLetter()   Compares the characters of the input string (using a version of the min-edit-distance algorithm) to each word in the lexicon and adds the set of closest matches to the result, considering all matches where the edit distance is >= 'minMed'.

If 'preserveLength' is true, the method will favor words of the same length as the input.
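
As a rough illustration of the idea, the sketch below computes the standard Levenshtein distance and collects the words that sit at the smallest distance not below a given minimum. It is not RiTa's implementation: the class `EditDistanceSketch` is hypothetical, the exact distance variant RiTa uses is not stated here, and the 'preserveLength' preference is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RiTa.
class EditDistanceSketch {

    // Standard Levenshtein distance (insert, delete, substitute all cost 1),
    // computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the words at the smallest distance that is still >= minMed.
    static List<String> closest(String input, List<String> words, int minMed) {
        int best = Integer.MAX_VALUE;
        List<String> out = new ArrayList<>();
        for (String w : words) {
            int d = distance(input, w);
            if (d < minMed) continue;                 // too close, skip
            if (d < best) { best = d; out.clear(); }  // new best distance
            if (d == best) out.add(w);
        }
        return out;
    }
}
```

With the input "cat", the list "cat", "bat", "cot", "dog", "cart", and a minimum of 1, the sketch skips "cat" itself (distance 0) and returns "bat", "cot" and "cart", all at distance 1. Raising the minimum to 2 leaves only "dog".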

similarBySound()   Compares the phonemes of the input String to those of each word in the lexicon, returning the set of closest matches as a String[].

similarBySoundAndLetter()   First calls similarBySound(), then filters the result set using the algorithm from similarByLetter(). Useful when similarBySound() returns too large a result set.

singleLetterDeletes()  

singleLetterInsertions()  

singleLetterSubtitutions()  

substringsByLetter()   Returns all valid substrings of the input word that are in the lexicon and have a length of at least minLength

superstringsByLetter()   Returns all valid superstrings of the input word in the lexicon

RiLexicon.mainX()  

RiLexicon.testRhymes()  

Usage Web & Application