A common problem faced by electronic information providers is determining the number of unique words in a document. The case of a word does not affect its uniqueness. For example, THE, tHE and The are all considered equivalent. Punctuation can appear in these documents and is handled as follows:
1) Periods '.' and exclamation marks '!' may appear at the end of a sentence and should not be considered a word, or part of a word.
2) Dashes '-' appear between the parts of hyphenated words. Each part of a hyphenated word should be considered a separate word.
3) Commas ',', colons ':' and semicolons ';' appear within a sentence and should not be considered a word, or part of a word.
4) Apostrophes ' appear within contractions and possessive forms. These symbols should be treated as if they never appeared (i.e., as if they were deleted from the word).
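
The four rules above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference solution: the function name count_unique_words is hypothetical, the document is assumed to arrive as a single string, words are assumed to be separated by whitespace, and only the plain ASCII apostrophe is handled.

```python
import re


def count_unique_words(text: str) -> int:
    """Count distinct words in text, ignoring case (hypothetical helper)."""
    # Rule 4: apostrophes are deleted, so "don't" becomes "dont".
    cleaned = text.replace("'", "")
    # Rules 1-3: periods, exclamation marks, dashes, commas, colons and
    # semicolons are never part of a word, so turn them into separators.
    # Replacing a dash with a space splits a hyphenated word into its parts.
    cleaned = re.sub(r"[.!\-,:;]", " ", cleaned)
    # Case does not affect uniqueness: fold to lower case, then split on
    # whitespace (assumed to be the word separator).
    words = cleaned.lower().split()
    return len(set(words))


if __name__ == "__main__":
    sample = "Don't stop-gap; the dog's bone, THE end!"
    print(count_unique_words(sample))
```

For the sample sentence the words are dont, stop, gap, the, dogs, bone, the and end, so the sketch reports 7 unique words. Deleting apostrophes before splitting matters: treating them as separators would wrongly turn "dog's" into the two words "dog" and "s".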