RiTa
Name RiLexicon
Description RiLexicon represents the core 'dictionary' (or lexicon) for the RiTa tools. It contains ~35,000 words augmented with phonemic and syllabic data, as well as a list of valid parts-of-speech for each. The lexicon can be extended and/or customized with additional words, usages, or pronunciations.

Additionally the lexicon is equipped with implementations of a variety of matching algorithms (min-edit-distance, soundex, anagrams, alliteration, rhymes, looks-like, etc.) based on combinations of letters, syllables and phonemes. An example use:

    RiLexicon lex = new RiLexicon(this);
    String[] similars = lex.similarBySound("cat");
    String[] rhymes  = lex.getSimpleRhymes("cat");
    // etc.

Note: If you wish to modify or customize the lexicon (e.g., add words or change pronunciations), you can do so by editing the 'rita_addenda.txt' file, found in the $SKETCH_DIR/libraries/rita folder, and placing the modified version in the 'data' folder of your sketch.

Constructors
RiLexicon(pApplet);
Methods
containingStringsByLetter()   Returns valid words (in lexicon) using both substring and superstring matching.

This method, CONTAINS(K), is equivalent to UNION( SUB(K), SUPER(K) ).
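
The union described above can be sketched in plain Java. This is only an illustration over a tiny hand-made word list, not RiTa's implementation: the class `ContainmentSketch` and its methods are hypothetical names, and whether the real method includes the input word itself in its results is an assumption (it is excluded here).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a stand-in word list, not RiTa's lexicon or its code.
class ContainmentSketch {

    // SUB(K): listed words that occur inside the input (the input itself is skipped).
    static Set<String> sub(List<String> words, String input) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : words) {
            if (!w.equals(input) && input.contains(w)) result.add(w);
        }
        return result;
    }

    // SUPER(K): listed words that have the input inside them (the input itself is skipped).
    static Set<String> sup(List<String> words, String input) {
        Set<String> result = new LinkedHashSet<>();
        for (String w : words) {
            if (!w.equals(input) && w.contains(input)) result.add(w);
        }
        return result;
    }

    // CONTAINS(K) = UNION( SUB(K), SUPER(K) )
    static Set<String> containing(List<String> words, String input) {
        Set<String> result = new LinkedHashSet<>(sub(words, input));
        result.addAll(sup(words, input));
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("art", "cart", "carton", "cartoon", "dog");
        System.out.println(containing(words, "cart")); // [art, carton, cartoon]
    }
}
```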

contains()   Returns true if the word exists in the lexicon

getAlliterations()   Finds alliterations by comparing the phonemes of the input string to those of each word in the lexicon

getFeatures()  

getPosEntries()   Returns the list of possible parts-of-speech for the word, or null if not found.

getRandomWord()   Returns a random word from the lexicon with the specified part-of-speech and target-length.

getRhymes()   Returns the rhymes for a given word or null if none found

Two words rhyme if their final stressed vowel and all following phonemes are identical.
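
A minimal sketch of this rhyme rule follows. It assumes each word is already available as an array of phonemes and that a stressed vowel is marked with a trailing '1'; that notation, the class `RhymeSketch`, and the hand-written phoneme arrays are all assumptions made for illustration and may not match RiTa's internal data or code.

```java
import java.util.Arrays;

// Illustrative only: phonemes are hand-written; RiTa's own phoneme data may differ.
class RhymeSketch {

    // Assumption: a phoneme ending in '1' is a stressed vowel (hypothetical notation).
    static int lastStressedVowel(String[] phones) {
        for (int i = phones.length - 1; i >= 0; i--) {
            if (phones[i].endsWith("1")) return i;
        }
        return -1; // no stressed vowel found
    }

    // True if both words share the same final stressed vowel and all phonemes after it.
    static boolean rhymes(String[] a, String[] b) {
        int ia = lastStressedVowel(a);
        int ib = lastStressedVowel(b);
        if (ia < 0 || ib < 0) return false;
        String[] tailA = Arrays.copyOfRange(a, ia, a.length);
        String[] tailB = Arrays.copyOfRange(b, ib, b.length);
        return Arrays.equals(tailA, tailB);
    }

    public static void main(String[] args) {
        String[] cat = {"k", "ae1", "t"};
        String[] hat = {"hh", "ae1", "t"};
        String[] cut = {"k", "ah1", "t"};
        System.out.println(rhymes(cat, hat)); // true
        System.out.println(rhymes(cat, cut)); // false
    }
}
```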

getWords()   Returns the set of words in the lexicon (including those from user-addenda) that match the supplied regular expression. For example, getWords("ee"); returns 661 words with 2 or more consecutive e's, while getWords("ee.*ee"); returns exactly 2: 'freewheeling' and 'squeegee'.
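
The examples above suggest that the pattern may match anywhere inside a word rather than having to match the whole word. The sketch below illustrates that behaviour with `java.util.regex` over a small hand-made list; the class `RegexWordsSketch` and its `matching` method are hypothetical, and the use of `find()` is an assumption inferred from the examples, not a statement about RiTa's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: a stand-in word list, not RiTa's lexicon.
class RegexWordsSketch {

    // Keeps every word in which the pattern matches somewhere (find(), not a whole-word match).
    static List<String> matching(List<String> words, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).find()) result.add(w);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("tree", "freewheeling", "squeegee", "cat");
        System.out.println(matching(words, "ee"));     // [tree, freewheeling, squeegee]
        System.out.println(matching(words, "ee.*ee")); // [freewheeling, squeegee]
    }
}
```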

isAlliteration()   Returns true if the first stressed consonants of the two words match, else false.

Note: returns true if wordA.equals(wordB) and false if either (or both) are null.

isContaining()   Returns true if orig is a sub or super-string of toCheck.

isRhyme()   Returns true if the two words rhyme (that is, if their final stressed phoneme and all following phonemes are identical), else false. Note: returns true if wordA.equals(wordB) and false if either (or both) are null.

Note: at present, if either word is not found in the lexicon, this method does not fall back to the letter-to-sound engine but simply returns false. TODO

isStopWord()   Returns true if the word is a 'stop' (or 'closed-class') word else false. See http://en.wikipedia.org/wiki/Stop_words

isSubstring()   Returns true if orig is a substring of toCheck.

isSuperstring()   Returns true if orig is a superstring of toCheck.

iterator()   Returns an iterator over the words in the lexicon matching the supplied regular expression.

posIterator()   Returns an iterator over the words in the lexicon for the supplied part-of-speech.

preloadFeatures()   Use this method to preload the Lexicon with feature data (stress, syllables, pos, phones, etc). Increases the initialization time but speeds up all subsequent lookups by an order of magnitude. Useful when doing many lookups over the course of a program, especially with the RiTaServer. Example:
    RiLexicon lex = new RiLexicon();
    lex.preloadFeatures();
    // use the lexicon


RiLexicon.randomIterator()   Utility method that returns a random-iterator over the specified set.

randomPosIterator()   Returns an iterator over the words in the lexicon for the supplied part-of-speech, beginning at a random offset.

similarByLetter()   Compares the characters of the input string (using a version of the min-edit-distance algorithm) to each word in the lexicon and returns the set of closest matches, considering only matches whose edit distance is >= 'minMed'.

If 'preserveLength' is true, the method will favor words of the same length as the input.
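
To make the idea concrete, here is a small sketch of a closest-match search using the classic Levenshtein form of min-edit distance. It is not RiTa's code: the class `EditDistanceSketch`, the `closest` helper, and the choice of unit costs are assumptions, and the 'preserveLength' preference is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a stand-in word list and a textbook edit distance.
class EditDistanceSketch {

    // Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Returns the words at the smallest distance from the input that is still >= minMed.
    static List<String> closest(List<String> words, String input, int minMed) {
        int best = Integer.MAX_VALUE;
        List<String> result = new ArrayList<>();
        for (String w : words) {
            int d = distance(input, w);
            if (d < minMed || d > best) continue;
            if (d < best) { best = d; result.clear(); }
            result.add(w);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "cart", "bat", "dog", "cats");
        System.out.println(closest(words, "cat", 1)); // [cart, bat, cats]
    }
}
```

With minMed set to 1, the input word itself (distance 0) is skipped, which is one plausible reason for having such a threshold.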

similarBySound()   Compares the phonemes of the input String to those of each word in the lexicon, returning the set of closest matches as a String[].

similarBySoundAndLetter()   First calls similarBySound(), then filters the result set using the algorithm from similarByLetter(). Useful when similarBySound() returns too large a result set.

singleLetterDeletes()  

singleLetterInsertions()  

singleLetterSubtitutions()  

substringsByLetter()   Returns all words in the lexicon, of length at least minLength, that are substrings of the input word.

superstringsByLetter()   Returns all words in the lexicon that are superstrings of the input word.

RiLexicon.testAllits()  

Usage Web & Application