Mathieu Tozer's Dev Blog

Cocoa, the development of Words, and other software projects (including those dang assessment tasks).




Language Detection





I wrote something on language detection too. Haven't proofread it. Too tired.

--

How might Words detect which language it is looking at?
The simplest strategy might be to search each language's dictionary for the entered word, and whichever one returns a hit is the language being used. But when similar words appear in different languages, more than one dictionary will return a hit, so there needs to be a better strategy to keep the computer from getting confused.
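To make the problem concrete, here is a minimal sketch of that naive lookup. The tiny word sets and the `languages_containing` name are made up for illustration; a real version would load proper dictionaries.

```python
# Hypothetical toy dictionaries; real ones would be loaded from word lists.
DICTIONARIES = {
    "english": {"chat", "pain", "gift", "word"},
    "french": {"chat", "pain", "mot"},
    "german": {"gift", "wort"},
}


def languages_containing(word, dictionaries=DICTIONARIES):
    """Return the sorted names of every language whose dictionary has the word."""
    needle = word.casefold()
    return sorted(lang for lang, words in dictionaries.items() if needle in words)


print(languages_containing("mot"))   # one hit: the language is clear
print(languages_containing("chat"))  # several hits: lookup alone can't decide
```

A word like "chat" comes back with more than one language, which is exactly the case the naive strategy can't settle on its own.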

Actually, this could be A FEATURE. The option could be that if the word is found in the dictionary of the 'active' language, then it is added, and no further searching is done. But if it is first found in other languages... actually no, the option should be either that all dictionaries are searched and the word is added to every language that has it, or that all dictionaries are searched and only the active language is added to. This should work for most people, since at any one time a user would be learning only a few languages, and so their words-to-remember list shouldn't grow too quickly.

So in conclusion: when a word is entered into Words, it is either searched for in all dictionaries and added to every language it returns results for, or it is searched for only in the 'active' language. Which means that if the user switches languages, they have to let Words know somehow.
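Those two modes could be sketched roughly like this. Everything here (the `record_word` name, the `search_all` flag, the toy dictionaries) is an assumption for illustration, not how Words actually does it.

```python
# Hypothetical toy dictionaries, for illustration only.
DICTIONARIES = {
    "english": {"chat", "pain", "word"},
    "french": {"chat", "pain", "mot"},
}


def record_word(word, word_lists, dictionaries, active, search_all=False):
    """Add the word to the user's lists and return the languages it was added to.

    search_all=False: only the active language's dictionary is consulted.
    search_all=True:  every dictionary is consulted, and the word is added
                      to each language that knows it.
    """
    needle = word.casefold()
    if search_all:
        candidates = dictionaries
    else:
        candidates = {active: dictionaries.get(active, set())}

    added = []
    for lang in sorted(candidates):
        if needle in candidates[lang]:
            word_lists.setdefault(lang, set()).add(needle)
            added.append(lang)
    return added


lists = {}
print(record_word("chat", lists, DICTIONARIES, active="french"))
print(record_word("chat", lists, DICTIONARIES, active="french", search_all=True))
```

The `active` argument is the bit the user has to keep up to date when they switch languages.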

If the word is not found in the active language, THEN Words searches the other dictionaries of languages the user is learning, and if it finds a hit, adds it. If it finds hits in multiple language dictionaries, it must then ask the user which language. OR it could make intelligent guesses, like checking whether there is any hiragana surrounding the word, or looking at the encoding. The encoding of the selection, if it makes it through the copy and paste / services insertion method, might be very useful in unambiguously finding which language is being inserted. A simple reference lookup table would show which language we're talking about!
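Here is a rough sketch of that fallback, with the hiragana check done through Unicode character names. The function names, the return shape, and the sample dictionaries are all assumptions; the encoding lookup table is left out, and the kana rule is only a heuristic.

```python
import unicodedata


def script_hint(text):
    """Return "japanese" if the text contains any hiragana or katakana, else None."""
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("HIRAGANA", "KATAKANA")):
            return "japanese"
    return None


def detect_language(word, context, dictionaries, active, learning):
    """Return (language, candidates).

    language is None when the word is unknown or still ambiguous; in the
    ambiguous case candidates lists the languages to ask the user about.
    """
    needle = word.casefold()
    # 1. The active language wins outright.
    if needle in dictionaries.get(active, set()):
        return active, []
    # 2. Otherwise try the other languages the user is learning.
    hits = [lang for lang in learning
            if lang != active and needle in dictionaries.get(lang, set())]
    if len(hits) == 1:
        return hits[0], []
    if len(hits) > 1:
        # 3. Several hits: try a guess from the surrounding script first.
        hint = script_hint(context)
        if hint in hits:
            return hint, []
        return None, hits  # still ambiguous, so ask the user
    return None, []


# Hypothetical sample data: the same kanji word exists in two dictionaries.
SAMPLE = {"english": {"word"}, "japanese": {"学生"}, "chinese": {"学生"}}
LEARNING = ["english", "japanese", "chinese"]
print(detect_language("学生", "私は学生です", SAMPLE, "english", LEARNING))
print(detect_language("学生", "学生", SAMPLE, "english", LEARNING))
```

With kana around the word the guess lands on Japanese; with the bare kanji there is nothing to go on, so the candidates go back to the user.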

After all, we're talking about rich text here, not just plain text, which means that there are all kinds of delicious metadata waiting to be used.


This language detection thing could be extended and incorporated into browsers for encoding detection, to get rid of that irritating View -> Encoding -> Japanese. 1, 2, 3, aah, there we go, I can actually read that kind of thing. If I can figure out a way to make software work out whether a language is displaying properly, then that would be good.
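One crude way to start on that is to try decoding the raw bytes with a short list of candidate encodings and keep the first one that doesn't raise an error. This is only a sketch: the `guess_encoding` name and the candidate list are my own assumptions, the order of candidates matters, and a clean decode doesn't prove the text is actually readable.

```python
def guess_encoding(raw, candidates=("utf-8", "shift_jis", "euc_jp")):
    """Return the first candidate encoding that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: invalid bytes raise an error
        except UnicodeDecodeError:
            continue
        return name
    return None


sample = "日本語"
print(guess_encoding(sample.encode("utf-8")))
print(guess_encoding(sample.encode("shift_jis")))
```

Plain ASCII decodes fine under every candidate here, so this trick only helps when the bytes actually contain non-ASCII text.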

--

I love it: I have to record on the sheet all the events that happen in the solarium in a day. Right down to the beds occasionally flicking on automatically.

