Refurio Anachro is a user on mastodon.cloud.

0. LATENT SEMANTIC ANALYSIS and WORD2VEC, what are they?

Anchor post for my upcoming introduction to latent semantic analysis and word2vec! Read on...

Refurio Anachro @RefurioAnachro

9. On improving results of semantic search, semantic indexing and clustering.

Stop words are those that occur very often and have little specialized meaning, like 'the', 'and', 'of', ... Clustering benefits greatly from removing them from the dataset.
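As a rough illustration of stop word removal, here is a minimal Python sketch. The stop word set, the tokenizer regex, and the function names are all made up for this example; a real project would likely use a larger, language-specific list.

```python
import re

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"the", "and", "of", "a", "an", "in", "to", "is"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]

tokens = tokenize("The history of the cat and the origin of the dog")
filtered = remove_stop_words(tokens)
print(filtered)
```

Only the filtered tokens would then be passed on to indexing or clustering.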

If your text corpus comes from scanned and OCR'ed printed material, you may have a problem with typos. Dictionaries can help here. There are also specialized ones for detecting spelling variants.
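One way a dictionary might help, sketched in Python with the standard library's difflib: replace each unknown token by its closest dictionary entry, if one is similar enough. The word list, the cutoff value, and the function name are assumptions for illustration, not a recommendation of a specific tool.

```python
import difflib

# Hypothetical mini dictionary; a real one would hold many thousands of words.
DICTIONARY = ["semantic", "analysis", "cluster", "index", "search"]

def correct(token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word, or the token itself if none is close."""
    if token in dictionary:
        return token
    matches = difflib.get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

# Typical scanning errors: 'i' misread as 'l'. Unknown words pass through unchanged.
corrected = [correct(t) for t in ["semantlc", "analysls", "word2vec"]]
print(corrected)
```

A high cutoff keeps rare but valid words (names, jargon) from being rewritten into something else, at the price of missing heavily garbled tokens.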

For more on practice: Read on...