rita
Class RiConcorder

java.lang.Object
  extended by rita.RiObject
      extended by rita.RiConcorder
All Implemented Interfaces:
processing.core.PConstants, RiConstants

public class RiConcorder
extends RiObject

Maintains a simple word-frequency table for a set of input data.

    import rita.*;

    // Case-sensitive table that also counts stop words and punctuation
    RiConcorder ric = new RiConcorder(this);
    ric.setIgnoreCase(false);
    ric.setIgnoreStopWords(false);
    ric.setIgnorePunctuation(false);
    ric.loadFile("myTestFile.txt");
    ric.dump();
    String[] mostCommon = ric.getMostCommonTokens(5);  // up to 5 most frequent words
    print(mostCommon);
    


Field Summary
 
Fields inherited from interface rita.support.RiConstants
BEHAVIOR_COMPLETED, BOUNDING_BOX_ALPHA, BRILL_POS_TAGGER, EASE_IN, EASE_IN_CUBIC, EASE_IN_EXPO, EASE_IN_OUT, EASE_IN_OUT_CUBIC, EASE_IN_OUT_EXPO, EASE_IN_OUT_QUARTIC, EASE_IN_OUT_SINE, EASE_IN_QUARTIC, EASE_IN_SINE, EASE_OUT, EASE_OUT_CUBIC, EASE_OUT_EXPO, EASE_OUT_QUARTIC, EASE_OUT_SINE, ESS, FADE_COLOR, FADE_IN, FADE_OUT, FADE_TO_TEXT, FIRST_PERSON, FUTURE_TENSE, ID, LERP, LINEAR, MAXENT_POS_TAGGER, MINIM, MOVE, MUTABLE, PAST_TENSE, PHONEME_BOUNDARY, PHONEMES, PLING_STEMMER, PLURAL, PORTER_STEMMER, POS, PRESENT_TENSE, SCALE_TO, SECOND_PERSON, SENTENCE_BOUNDARY, SINGULAR, SONIA, SPEECH_COMPLETED, STRESSES, SYLLABLE_BOUNDARY, SYLLABLES, TEXT, TEXT_ENTERED, THIRD_PERSON, TIMER, TIMER_COMPLETED, TIMER_TICK, TOKENS, UNKNOWN, WORD_BOUNDARY
 
Fields inherited from interface processing.core.PConstants
A, AB, ADD, AG, ALPHA, ALPHA_MASK, ALT, AMBIENT, AR, ARC, ARGB, ARROW, B, BACKSPACE, BASELINE, BEEN_LIT, BEVEL, BLEND, BLUE_MASK, BLUR, BOTTOM, BOX, BURN, CENTER, CENTER_DIAMETER, CENTER_RADIUS, CHATTER, CLOSE, CMYK, CODED, COMPLAINT, CONTROL, CORNER, CORNERS, CROSS, CUSTOM, DA, DARKEST, DB, DEG_TO_RAD, DELETE, DG, DIAMETER, DIFFERENCE, DILATE, DIRECTIONAL, DISABLE_ACCURATE_TEXTURES, DISABLE_DEPTH_SORT, DISABLE_DEPTH_TEST, DISABLE_OPENGL_2X_SMOOTH, DISABLE_OPENGL_ERROR_REPORT, DODGE, DOWN, DR, DXF, EB, EDGE, EG, ELLIPSE, ENABLE_ACCURATE_TEXTURES, ENABLE_DEPTH_SORT, ENABLE_DEPTH_TEST, ENABLE_NATIVE_FONTS, ENABLE_OPENGL_2X_SMOOTH, ENABLE_OPENGL_4X_SMOOTH, ENABLE_OPENGL_ERROR_REPORT, ENTER, EPSILON, ER, ERODE, ERROR_BACKGROUND_IMAGE_FORMAT, ERROR_BACKGROUND_IMAGE_SIZE, ERROR_PUSHMATRIX_OVERFLOW, ERROR_PUSHMATRIX_UNDERFLOW, ERROR_TEXTFONT_NULL_PFONT, ESC, EXCLUSION, G, GIF, GRAY, GREEN_MASK, HALF_PI, HAND, HARD_LIGHT, HINT_COUNT, HSB, IMAGE, INVERT, JAVA2D, JPEG, LEFT, LIGHTEST, LINE, LINES, LINUX, MACOSX, MAX_FLOAT, MAX_INT, MIN_FLOAT, MIN_INT, MITER, MODEL, MULTIPLY, NORMAL, NORMALIZED, NX, NY, NZ, OPAQUE, OPEN, OPENGL, ORTHOGRAPHIC, OTHER, OVERLAY, P2D, P3D, PATH, PDF, PERSPECTIVE, PI, platformNames, POINT, POINTS, POLYGON, POSTERIZE, PROBLEM, PROJECT, QUAD, QUAD_STRIP, QUADS, QUARTER_PI, R, RAD_TO_DEG, RADIUS, RECT, RED_MASK, REPLACE, RETURN, RGB, RIGHT, ROUND, SA, SB, SCREEN, SG, SHAPE, SHIFT, SHINE, SOFT_LIGHT, SPB, SPG, SPHERE, SPOT, SPR, SQUARE, SR, SUBTRACT, SW, TAB, TARGA, THIRD_PI, THRESHOLD, TIFF, TOP, TRIANGLE, TRIANGLE_FAN, TRIANGLE_STRIP, TRIANGLES, TWO_PI, TX, TY, TZ, U, UP, V, VERTEX_FIELD_COUNT, VW, VX, VY, VZ, WAIT, WHITESPACE, WINDOWS, X, Y, Z
 
Constructor Summary
RiConcorder()
          Constructs a new RiConcorder
RiConcorder(processing.core.PApplet pApplet)
          Constructs a new RiConcorder
RiConcorder(processing.core.PApplet pApplet, RiTokenizer tokenizer)
          Constructs a new RiConcorder using the specified tokenizer
RiConcorder(processing.core.PApplet pApplet, java.lang.String fileName)
          Constructs a new RiConcorder and loads it with the data in fileName.
RiConcorder(processing.core.PApplet pApplet, java.lang.String[] fileNames)
          Constructs a new RiConcorder and loads it with the data in fileNames.
RiConcorder(processing.core.PApplet pApplet, java.lang.String[] fileNames, RiTokenizer tokenizer)
          Constructs a new RiConcorder using the specified tokenizer and loads it with the data in fileNames.
RiConcorder(RiTokenizer tokenizer)
          Constructs a new RiConcorder using the specified tokenizer
 
Method Summary
 void addLine(java.lang.String line)
          Adds the data from a single line to the frequency table
 void addWord(java.lang.String word)
          Adds a single word to the model with a count of 1 if it does not yet exist, else increments its count by 1.
 void addWords(java.lang.String[] words)
          Adds the words to the model, incrementing their counts (and the total count) for each.
 void clear()
          Clears the model, resets variables, and prepares it for reloading with new data
 boolean contains(java.lang.String word)
          True if the concordance contains word, else false
 void dump()
           
 int getCount(java.lang.String word)
          Returns the # of occurrences of word, or 0 if the word does not exist in the table.
 java.lang.String[] getLeastCommonTokens(int numberToReturn)
          Returns the numberToReturn words with the lowest frequency.
 java.lang.String[] getMostCommonTokens(int numberToReturn)
          Returns the numberToReturn words with the highest frequency.
 float getProbability(java.lang.String word)
          Returns the normalized frequency (probability) of word, 1 if it is the only word in the model, 0 if it does not exist.
 boolean isIgnoringCase()
          Returns whether the model is ignoring case by treating all words as lower-case (default=false)
 boolean isIgnoringPunctuation()
          Returns whether the model is ignoring punctuation (default = true)
 boolean isIgnoringStopWords()
          Returns whether the model is ignoring stopWords (default = false)
 void loadFile(java.lang.String fileName)
          Loads the data from the file into a frequency table
 void loadFiles(java.lang.String[] fileNames)
          Loads the data from the files into a single frequency table
static void main(java.lang.String[] args)
           
 void setIgnoreCase(boolean ignoreCase)
          Sets whether the model should ignore case (default=false), treating all tokens as lower-case
 void setIgnorePunctuation(boolean ignore)
          Sets whether the model should ignore punctuation (default = true)
 void setIgnoreStopWords(boolean ignoreStopWords)
          Sets whether the model should ignore stopWords (default = false)
 void setWordsToIgnore(java.lang.String[] wordsToIgnore)
          Tells the model to ignore this set of words
 int totalCount()
          Returns the total # of entries in the model, counting every occurrence of a repeated word.
 int uniqueCount()
          Returns the # of unique words in the model.
 
Methods inherited from class rita.RiObject
dispose, getId, getPApplet, nextId
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RiConcorder

public RiConcorder(processing.core.PApplet pApplet,
                   java.lang.String[] fileNames,
                   RiTokenizer tokenizer)
Constructs a new RiConcorder using the specified tokenizer and loads it with the data in fileNames.


RiConcorder

public RiConcorder(processing.core.PApplet pApplet,
                   java.lang.String fileName)
Constructs a new RiConcorder and loads it with the data in fileName.


RiConcorder

public RiConcorder(processing.core.PApplet pApplet,
                   java.lang.String[] fileNames)
Constructs a new RiConcorder and loads it with the data in fileNames.


RiConcorder

public RiConcorder(processing.core.PApplet pApplet,
                   RiTokenizer tokenizer)
Constructs a new RiConcorder using the specified tokenizer


RiConcorder

public RiConcorder(processing.core.PApplet pApplet)
Constructs a new RiConcorder


RiConcorder

public RiConcorder()
Constructs a new RiConcorder


RiConcorder

public RiConcorder(RiTokenizer tokenizer)
Constructs a new RiConcorder using the specified tokenizer

Method Detail

loadFiles

public void loadFiles(java.lang.String[] fileNames)
Loads the data from the files into a single frequency table
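
A minimal plain-Java sketch of the "single frequency table" idea: every line of every file feeds one shared map, so a word appearing in two files accumulates one combined count. Several assumptions apply here. `LoadFilesSketch` is a hypothetical class. Files are read as UTF-8 from the local file system, and lines are split on whitespace, whereas RiConcorder resolves files through its PApplet and tokenizes with RiTokenizer. I/O errors are rethrown unchecked, mirroring the documented signature, which declares no checked exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of loadFiles(); not RiTa's implementation.
class LoadFilesSketch {
    // One table shared by all files.
    private final Map<String, Integer> counts = new HashMap<>();

    void loadFiles(String[] fileNames) {
        for (String name : fileNames) {
            try {
                for (String line : Files.readAllLines(Paths.get(name), StandardCharsets.UTF_8)) {
                    addLine(line);
                }
            } catch (IOException e) {
                // Assumption: failures surface as an unchecked exception.
                throw new UncheckedIOException(e);
            }
        }
    }

    // Whitespace splitting stands in for RiTokenizer.
    private void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return;
        for (String token : trimmed.split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
    }

    int getCount(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```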


setWordsToIgnore

public void setWordsToIgnore(java.lang.String[] wordsToIgnore)
Tells the model to ignore this set of words
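
The documentation does not say how the ignore list interacts with counting. The following plain-Java sketch shows one plausible behaviour under three stated assumptions: each call replaces the previous list, ignored words are filtered when they are added, and matching is exact (case-sensitive). `IgnoreListSketch` is a hypothetical class, not RiTa's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of setWordsToIgnore(); not RiTa's implementation.
class IgnoreListSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> ignored = new HashSet<>();

    // Assumption: a new list replaces the old one.
    void setWordsToIgnore(String[] wordsToIgnore) {
        ignored.clear();
        ignored.addAll(Arrays.asList(wordsToIgnore));
    }

    // Assumption: ignored words are skipped at insertion time (exact match).
    void addWords(String[] words) {
        for (String w : words) {
            if (!ignored.contains(w)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    int getCount(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```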


addLine

public void addLine(java.lang.String line)
Adds the data from a single line to the frequency table
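
Adding a line amounts to tokenizing it and counting each token. In this plain-Java sketch, a simple whitespace split stands in for RiConcorder's RiTokenizer, so punctuation attached to a word stays part of the token. `AddLineSketch` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of addLine(); whitespace splitting replaces RiTokenizer.
class AddLineSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    void addLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return;               // blank line: nothing to count
        for (String token : trimmed.split("\\s+")) { // runs of whitespace separate tokens
            counts.merge(token, 1, Integer::sum);
            total++;
        }
    }

    int getCount(String word) { return counts.getOrDefault(word, 0); }

    int totalCount() { return total; }
}
```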


getCount

public int getCount(java.lang.String word)
Returns the # of occurrences of word, or 0 if the word does not exist in the table.


getProbability

public float getProbability(java.lang.String word)
Returns the normalized frequency (probability) of word, 1 if it is the only word in the model, 0 if it does not exist.
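
Read as a normalized frequency, the probability is P(word) = count(word) / totalCount(). That formula yields 1 when the word is the only entry and 0 when it is absent, which matches the description. The plain-Java sketch below assumes that formula. `ProbabilitySketch` is a hypothetical class, not RiTa's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: P(word) = count(word) / total number of tokens added.
class ProbabilitySketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    void addWord(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
    }

    float getProbability(String word) {
        Integer c = counts.get(word);
        if (c == null || total == 0) return 0f;  // unknown word or empty model
        return c / (float) total;                // 1.0 if it is the only word
    }
}
```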


getMostCommonTokens

public java.lang.String[] getMostCommonTokens(int numberToReturn)
Returns the numberToReturn words with the highest frequency. If there are fewer than numberToReturn words, all words are returned.
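
The following plain-Java sketch sorts entries by descending count and truncates the result, returning every word when the table holds fewer than numberToReturn. The documentation does not specify how ties are ordered. Breaking ties alphabetically is this sketch's own choice, and `MostCommonSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of getMostCommonTokens(); not RiTa's implementation.
class MostCommonSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    void addWords(String... words) {
        for (String w : words) counts.merge(w, 1, Integer::sum);
    }

    String[] getMostCommonTokens(int numberToReturn) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // highest count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // assumed tie-break
        });
        // Fewer words than requested: return them all (never a negative size).
        int n = Math.max(0, Math.min(numberToReturn, entries.size()));
        String[] result = new String[n];
        for (int i = 0; i < n; i++) result[i] = entries.get(i).getKey();
        return result;
    }
}
```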


getLeastCommonTokens

public java.lang.String[] getLeastCommonTokens(int numberToReturn)
Returns the numberToReturn words with the lowest frequency. If there are fewer than numberToReturn words, all words are returned.
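
The least-common case is the same selection with the sort order reversed, so the lowest counts come first. This plain-Java sketch uses the Stream API, and `limit` naturally returns all words when fewer exist. The alphabetical tie-break is an assumption, and `LeastCommonSketch` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of getLeastCommonTokens(); not RiTa's implementation.
class LeastCommonSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    void addWords(String... words) {
        for (String w : words) counts.merge(w, 1, Integer::sum);
    }

    String[] getLeastCommonTokens(int numberToReturn) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue()        // lowest count first
                .thenComparing(Map.Entry.<String, Integer>comparingByKey())) // assumed tie-break
            .limit(Math.max(0, numberToReturn))                          // all if fewer
            .map(Map.Entry::getKey)
            .toArray(String[]::new);
    }
}
```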


totalCount

public int totalCount()
Returns the total # of entries in the model, counting every occurrence of a repeated word.


uniqueCount

public int uniqueCount()
Returns the # of unique words in the model.
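
The two counts differ whenever a word repeats. The total grows with every addition, while the unique count is the number of distinct keys in the table. A minimal plain-Java sketch of that distinction follows. `CountsSketch` is a hypothetical class, not RiTa's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting totalCount() and uniqueCount().
class CountsSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    void addWord(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;                                   // every occurrence counts toward the total
    }

    int totalCount()  { return total; }            // all tokens added, repeats included
    int uniqueCount() { return counts.size(); }    // distinct words only
}
```

For example, adding "to be or not to be" word by word gives a total of 6 but only 4 unique words.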


addWords

public void addWords(java.lang.String[] words)
Adds the words to the model, incrementing their counts (and the total count) for each.


addWord

public void addWord(java.lang.String word)
Adds a single word to the model with a count of 1 if it does not yet exist, else increments its count by 1.
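
The "start at 1, otherwise increment" rule maps directly onto `Map.merge`. The plain-Java sketch below also shows addWords delegating to addWord, together with the contains and getCount lookups. `AddWordSketch` is a hypothetical class illustrating the documented behaviour, not RiTa's source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the addWord()/addWords() bookkeeping.
class AddWordSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    // New words start at 1; existing words are incremented by 1.
    void addWord(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
    }

    // Each element is added individually, so repeats raise both counts.
    void addWords(String[] words) {
        for (String w : words) addWord(w);
    }

    boolean contains(String word) { return counts.containsKey(word); }

    int getCount(String word) { return counts.getOrDefault(word, 0); } // 0 if absent

    int totalCount() { return total; }
}
```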


contains

public boolean contains(java.lang.String word)
True if the concordance contains word, else false


loadFile

public void loadFile(java.lang.String fileName)
Loads the data from the file into a frequency table


clear

public void clear()
Clears the model, resets variables, and prepares it for reloading with new data


dump

public void dump()

isIgnoringCase

public boolean isIgnoringCase()
Returns whether the model is ignoring case by treating all words as lower-case (default=false)


setIgnoreCase

public void setIgnoreCase(boolean ignoreCase)
Sets whether the model should ignore case (default=false), treating all tokens as lower-case
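
Ignoring case can be modelled by normalizing each word to lower case before it reaches the table. This plain-Java sketch assumes that lookups are normalized the same way, so getCount("The") and getCount("the") agree once the flag is on. `IgnoreCaseSketch` is a hypothetical class, and `Locale.ROOT` is the sketch's own choice.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of setIgnoreCase(); not RiTa's implementation.
class IgnoreCaseSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private boolean ignoreCase = false;            // documented default

    void setIgnoreCase(boolean ignoreCase) { this.ignoreCase = ignoreCase; }

    // Locale.ROOT avoids locale-specific lower-casing (e.g. Turkish dotless i).
    private String key(String word) {
        return ignoreCase ? word.toLowerCase(Locale.ROOT) : word;
    }

    void addWord(String word) { counts.merge(key(word), 1, Integer::sum); }

    // Assumption: lookups are normalized exactly like insertions.
    int getCount(String word) { return counts.getOrDefault(key(word), 0); }
}
```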


isIgnoringStopWords

public boolean isIgnoringStopWords()
Returns whether the model is ignoring stopWords (default = false)

See Also:
RiTa.STOP_WORDS

setIgnoreStopWords

public void setIgnoreStopWords(boolean ignoreStopWords)
Sets whether the model should ignore stopWords (default = false)

See Also:
RiTa.STOP_WORDS
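
Stop-word filtering is a set-membership test applied before counting. In the plain-Java sketch below, the six-word `STOP_WORDS` set is purely illustrative and is NOT the contents of RiTa.STOP_WORDS. Matching stop words case-insensitively is an assumption, and `StopWordSketch` is a hypothetical class. `Set.of` requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of setIgnoreStopWords(); not RiTa's implementation.
class StopWordSketch {
    // Tiny illustrative list, NOT RiTa.STOP_WORDS.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    private final Map<String, Integer> counts = new HashMap<>();
    private boolean ignoreStopWords = false;       // documented default

    void setIgnoreStopWords(boolean ignore) { this.ignoreStopWords = ignore; }

    void addWord(String word) {
        // Assumption: stop words are matched case-insensitively.
        if (ignoreStopWords && STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) return;
        counts.merge(word, 1, Integer::sum);
    }

    int getCount(String word) { return counts.getOrDefault(word, 0); }
}
```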

isIgnoringPunctuation

public boolean isIgnoringPunctuation()
Returns whether the model is ignoring punctuation (default = true)

See Also:
RiTa.STOP_WORDS

setIgnorePunctuation

public void setIgnorePunctuation(boolean ignore)
Sets whether the model should ignore punctuation (default = true)
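
The documentation does not define what counts as punctuation. This plain-Java sketch assumes it means tokens made up entirely of ASCII punctuation characters, so "," and "..." are dropped while "don't" is kept. The test uses Java's POSIX class `\p{Punct}`. `PunctuationSketch` is a hypothetical class, not RiTa's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of setIgnorePunctuation(); not RiTa's implementation.
class PunctuationSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private boolean ignorePunctuation = true;      // documented default

    void setIgnorePunctuation(boolean ignore) { this.ignorePunctuation = ignore; }

    void addWord(String word) {
        // Assumption: only all-punctuation tokens are skipped.
        // \p{Punct} is the POSIX class of ASCII punctuation characters.
        if (ignorePunctuation && word.matches("\\p{Punct}+")) return;
        counts.merge(word, 1, Integer::sum);
    }

    int getCount(String word) { return counts.getOrDefault(word, 0); }
}
```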


main

public static void main(java.lang.String[] args)