Mathieu Tozer's Dev Blog

Cocoa, the development of Words, and other software projects (including those dang assessment tasks).




Component Brainstorm



Parser
Basically, a parser in Words (for language X) just breaks a 'text'
into word tokens and feeds them into the dictionary.
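To make that concrete, here is a minimal sketch of the tokenise-and-feed step. It is in Python for brevity rather than Cocoa, and the names `tokenize` and `feed_dictionary` as well as the word pattern are my own assumptions. The pattern only suits space-delimited languages like English; Japanese or Chinese would need a different parser.

```python
import re
from typing import Callable, Iterator

# Assumption: a "word" is a run of letters, optionally joined by apostrophes.
# This only works for space-delimited languages such as English.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def tokenize(text: str) -> Iterator[str]:
    """Yield lower-cased word tokens from a plain-text string."""
    for match in WORD_PATTERN.finditer(text):
        yield match.group(0).lower()


def feed_dictionary(text: str, add_word: Callable[[str], None]) -> int:
    """Pass every token to the dictionary callback and return the token count."""
    count = 0
    for token in tokenize(text):
        add_word(token)
        count += 1
    return count
```

The dictionary is passed in as a plain callback here, so the parser does not need to know anything about how words are stored.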

The software should support importing the following file formats:
1. HTML
2. txt
3. rtf
4. pdf

Pre-processing of Texts
Aim to separate the pre-processing of texts from the parser: convert
each text from its original format into a 'parser friendly' format, to
reduce the complexity needed by the parser itself. I.e. HTML tags are
removed, and rtf and pdf are made into plain text.

Ideally, the object passed to the parser should be uniform, plain(est)
text. English and English-like languages should be in plain text.
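A rough sketch of such a pre-processing step, again in Python and using only its standard library. The function name `to_plain_text` and the format strings are assumptions for illustration. Only txt and HTML are handled; rtf and pdf would need a converter of their own, so the sketch just refuses them.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not run together,
        # then collapse all whitespace runs into single spaces.
        return " ".join(" ".join(self._chunks).split())


def to_plain_text(raw: str, fmt: str) -> str:
    """Convert an imported text into the uniform plain text the parser expects."""
    fmt = fmt.lower()
    if fmt == "txt":
        return " ".join(raw.split())
    if fmt == "html":
        extractor = _TextExtractor()
        extractor.feed(raw)
        extractor.close()
        return extractor.text()
    # rtf and pdf are deliberately left out of this sketch.
    raise ValueError(f"no converter for format: {fmt}")
```

Collapsing whitespace throws away paragraph breaks, which is fine for feeding a word parser but would need rethinking if the text is also shown to the user.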

User's Dictionary
An important Component. Each user has one for each language they
learn. Holds all the words a user knows at any given knowledge level.
Might be conceptually divided, and depending on performance and
implementation details, it might be a bundle of separate entities
acting together.
Input: a newWord object

It reflects the information held in word lists throughout the application.
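One way the user's dictionary could look, as a hedged Python sketch. The three knowledge levels and the class and method names are invented for illustration; the real levels have not been decided yet.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Knowledge(IntEnum):
    """Hypothetical knowledge levels; the real set is still undecided."""
    UNKNOWN = 0
    LEARNING = 1
    KNOWN = 2


@dataclass
class UserDictionary:
    """One instance per user per language being learned."""
    language: str
    _levels: dict = field(default_factory=dict)

    def add_word(self, word, level=Knowledge.UNKNOWN):
        """Add a new word; a word that is already present keeps its level."""
        self._levels.setdefault(word, level)

    def set_level(self, word, level):
        """Record a new knowledge level for a word."""
        self._levels[word] = level

    def level_of(self, word):
        """Return the stored level, or UNKNOWN for words never seen."""
        return self._levels.get(word, Knowledge.UNKNOWN)

    def words_at(self, level):
        """Return the sorted word list for one knowledge level."""
        return sorted(w for w, lvl in self._levels.items() if lvl == level)
```

Word lists elsewhere in the application could then be views produced by `words_at`, so they always reflect what the dictionary holds.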

Language Definition Dictionary
This is a regular dictionary. One that is freely available and freely
distributable needs to be obtained.
Input: Word

Output: Definitions / translations.

There ought to be one of these for each language available in Words.
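The word-in, definitions-out contract can be sketched as below. This is an assumed in-memory shape in Python, not a loader for any particular dictionary file; the class name and the language codes are made up for the example.

```python
from collections import defaultdict


class DefinitionDictionary:
    """Maps a word in one language to definitions or translations in another."""

    def __init__(self, source_lang, target_lang):
        self.source_lang = source_lang
        self.target_lang = target_lang
        self._entries = defaultdict(list)

    def add_entry(self, word, definition):
        """Store one definition; a word may have several."""
        self._entries[word.lower()].append(definition)

    def lookup(self, word):
        """Return all definitions for a word, or an empty list if unknown."""
        return list(self._entries.get(word.lower(), []))


# Usage: one instance per language pair.
en_ja = DefinitionDictionary("en", "ja")
en_ja.add_entry("cat", "猫")
en_ja.add_entry("Cat", "ネコ")
print(en_ja.lookup("cat"))
```

Keeping the pair of languages on the instance means English to Japanese and Japanese to English are simply two separate dictionaries.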

The distributed installation should come with provision for at
least English to English, English to Japanese, Japanese to English,
and, if available, English to Chinese and Chinese to English. Other
European languages, if available, should be provided.
Available Dictionaries:
http://www.csse.monash.edu.au/~jwb/j_edict.html
http://www.mandarintools.com/download/cedict_readme.txt
http://www.csse.monash.edu.au/~jwb/j_jmdict.html

These dictionaries are constantly being updated. If a dictionary
cannot be downloaded and installed separately, updated versions should
be included in updates of Words.

Text Suggestion Lists
This small and easily implemented feature might just be a list of
links to places where users can get good texts to start reading in
their chosen language.
Sources might include
1. Major Newspapers
2. Wikipedia(s)
3. ...

The list would be shown in an independent window pane for users to
jump from.

I would imagine that the major performance issues will arise with the
parser and with looking up words in large dictionaries. However,
judging by WordLookup, it need not be too demanding.

//BrainWave//

What if I were to leave the parsing 'socket' unplugged for a moment,
so that I can concentrate on getting the MAIN functioning of the
program working, which is the management of word lists and
dictionaries? This, I can now see, is what I will spend most time
on. I must specify the interface of the components so that plugging
any number of language parsers into it later is easy and well
planned, but I think the other functionality is more important.

The idea of an open socket for something to be plugged into is an
interesting one. What if I gave even third parties the option to plug
into the socket?
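As a thought experiment, the socket could be as small as a registry of parser functions keyed by language. The sketch below is in Python and every name in it (`register_parser`, `parse`, the language codes) is hypothetical; it only shows the shape of the interface, not a decided design.

```python
from typing import Callable, Dict, List

# A parser is anything that turns plain text into a list of word tokens.
Parser = Callable[[str], List[str]]

# The 'socket': parsers that have been plugged in, keyed by language code.
_parsers: Dict[str, Parser] = {}


def register_parser(language: str, parser: Parser) -> None:
    """Plug a parser into the socket for one language."""
    _parsers[language] = parser


def parse(language: str, text: str) -> List[str]:
    """Run the parser plugged in for the language, if there is one."""
    try:
        parser = _parsers[language]
    except KeyError:
        raise LookupError(f"no parser plugged in for {language!r}") from None
    return parser(text)


# Usage: a trivial whitespace splitter stands in for a real English parser.
register_parser("en", str.split)
print(parse("en", "plug me in"))
```

The rest of the program only ever calls `parse`, so word list and dictionary management can be built first and real parsers, mine or a third party's, plugged in later.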

What about including a technical dictionary as well, so people can use
it to learn programming languages and look up technical definitions,
or things like Cocoa terms and stuff?

Same goes for medical dictionaries.

