  1. Your guess, or the target word, is polysemous, and the sense that is similar is rarely used. This is why "leather" is far from "patent." Sometimes one usage is simply more popular among newspaper reporters, whose writing makes up the corpus: "display" is more often a verb than a noun, and its vector reflects this.
  2. You capitalized your word. SmartKey and some other keyboards stupidly ignore the autocapitalize settings that I have explicitly set in the HTML, and there does not seem to be anything I can do about this. I added a checkbox to help you avoid this.
  3. Your word and the target word belong to different parts of speech. Sometimes this matters a lot. Sometimes it matters only a little.
  4. By "similarity", we really mean "used in similar contexts". The principle was articulated by John Rupert Firth, who wrote, "[Y]ou shall know a word by the company it keeps." So, "love" and "hate" may seem like opposites, but they will often score similarly. The actual opposite of "love" is probably something like "Arizona Diamondbacks", or "carburetor".
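
To make "used in similar contexts" concrete, here is a minimal sketch. It assumes similarity is measured as the cosine of the angle between two word vectors, which is a common choice for word embeddings. The three-dimensional vectors are made up for illustration; real word vectors have hundreds of dimensions and come from the corpus.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, invented for this example only.
# "love" and "hate" point in nearly the same direction because they
# show up in similar contexts; "carburetor" points somewhere else.
toy_vectors = {
    "love": [0.9, 0.8, 0.1],
    "hate": [0.8, 0.9, 0.2],
    "carburetor": [-0.1, 0.1, 0.9],
}

love_hate = cosine_similarity(toy_vectors["love"], toy_vectors["hate"])
love_carburetor = cosine_similarity(toy_vectors["love"], toy_vectors["carburetor"])

print(f"love vs. hate:       {love_hate:.2f}")
print(f"love vs. carburetor: {love_carburetor:.2f}")
```

With these toy numbers, "love" and "hate" score much closer to each other than "love" and "carburetor" do, even though a thesaurus would call the first pair antonyms.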

The data set is what it is -- it's not perfect, and I can't afford enough computing power (or a big enough corpus) to try to make a better one. The technique has limitations. Sometimes, they'll bite you and you'll lose.

Unusual word found! This word is not in the list of "normal" words that we use for the top-1000 list, but it is still similar! (Is it maybe capitalized?)

No clue why the data set doesn't include them. Perhaps they were just warping the scores of everything else, since they are used in so many different contexts?

The data seems to be normalized to US spelling. Semantle tries to automatically Americanize your spelling (in the cases where only the American version is in the data set). Of course, whoever built the data set probably couldn't normalize words where British and American usage differ in meaning rather than spelling, like 'biscuit', 'lift', or 'pants', so for those you're on your own.
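
The fallback described above might look roughly like the following sketch. The spelling map, the vocabulary, and the function name `normalize_guess` are all hypothetical stand-ins, not Semantle's actual code or data.

```python
# Hypothetical British-to-American spelling map (illustrative only).
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "centre": "center",
    "analyse": "analyze",
}

# Hypothetical stand-in for the set of words in the data set.
VOCABULARY = {"color", "center", "analyze", "biscuit", "lift", "pants"}

def normalize_guess(guess, vocabulary=VOCABULARY, spelling_map=BRITISH_TO_AMERICAN):
    """Return the form of the guess to look up, or None if it is unknown.

    The guess is used as typed when it is in the vocabulary. The American
    spelling is substituted only when the typed form is missing and the
    American form is present.
    """
    word = guess.strip()
    if word in vocabulary:
        return word
    american = spelling_map.get(word)
    if american in vocabulary:
        return american
    return None

print(normalize_guess("colour"))   # Americanized, since only "color" is in the vocabulary
print(normalize_guess("biscuit"))  # used as typed, whatever sense the data set gives it
print(normalize_guess("flavour"))  # not in the toy map or vocabulary
```

A spelling map like this can only help with words that are spelled differently. It does nothing for 'biscuit', 'lift', or 'pants', which are spelled the same on both sides of the Atlantic but mean different things.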

