9. On improving results of semantic search, semantic indexing and clustering.
Stop words are words that occur very often and carry little specialized meaning, like 'the', 'and', 'of', ... Clustering benefits greatly from removing them from the dataset, because otherwise these very frequent words make unrelated documents look similar.
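A minimal sketch of stop-word removal, to make the idea concrete. The `STOP_WORDS` set and the `remove_stop_words` helper are made up for illustration; a real stop-word list is much longer and depends on the language of your corpus.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "and", "of", "a", "an", "in", "to", "is"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The history of the Roman Empire and the fall of Rome"))
# prints ['history', 'roman', 'empire', 'fall', 'rome']
```

The remaining tokens are what you would feed into indexing or clustering.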
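If your text corpus comes from scanned and #OCR'ed printed material, you may have a problem with typos, because the recognition step misreads characters. Dictionaries can help here: a word that is not in the dictionary is a candidate for correction. There are also specialized dictionaries for detecting spelling variants.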
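One way a dictionary can help, sketched with Python's standard `difflib` module: a token that is not in the dictionary is replaced by its closest dictionary entry, if there is one that is similar enough. The `DICTIONARY` list, the `correct_token` helper and the `cutoff` value of 0.8 are assumptions for illustration; in practice you would load a full word list for your language and domain and tune the cutoff.

```python
import difflib

# Hypothetical mini-dictionary; replace with a real word list.
DICTIONARY = ["clustering", "semantic", "search", "index", "document"]

def correct_token(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the token if it is a known word, otherwise the closest
    dictionary word above the similarity cutoff, otherwise the token as-is."""
    if token in dictionary:
        return token
    matches = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

print(correct_token("c1ustering"))  # prints clustering
print(correct_token("semantlc"))    # prints semantic
print(correct_token("xyzzy"))       # prints xyzzy (no close match, left unchanged)
```

Leaving unmatched tokens unchanged is a deliberate choice here: names and rare technical terms are often missing from a dictionary and should not be "corrected" into something else.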
For more on practice: Read on...