next up previous contents
Next: C. Support for Gcc Up: Aspell .33.7.1 alpha A Previous: A. Changelog
  Contents

Subsections

  * B.1 Things that will be done real soon
  * B.2 Things that need to be done
  * B.3 Things that I would like to get done
  * B.4 Things that will be done eventually
  * B.5 Good ideas that are worth implementing
  * B.6 Things that are not likely to get implemented
  * B.7 Notes and Status of various items
      + B.7.1 Affix Compression
      + B.7.2 Extremely Large Dictionaries
      + B.7.3 General region skipping
      + B.7.4 Word skipping by context
      + B.7.5 Hidden Markov Model
      + B.7.6 Email the Personal Dictionary
      + B.7.7 Words With Spaces in Them

--------------------------------------------------------------------------

B. To Do

Words in bold indicate how you should refer to the item when discussing it
with me or others.

B.1 Things that will be done real soon

These items should get done within a release or two.

  * Totally rewrite the aspell international support. See
    http://aspell.sourceforge.net/international/ for more information.
  * Rework the aspell check function to provide support for using any
    number of filters which will be needed for international support.
  * Add support for more intelligently coming up with suggestions for
    words that are run-togethers.
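
A minimal sketch of one way to approach the run-together item, assuming the
dictionary is available as an in-memory set of words. The function name
split_run_together is hypothetical and is not part of Aspell; it simply tries
every split point and keeps the splits where both halves are known words.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Aspell: returns every way to split
// `word` into two non-empty parts that are both found in `dict`.
std::vector<std::pair<std::string, std::string>>
split_run_together(const std::string& word,
                   const std::unordered_set<std::string>& dict)
{
    std::vector<std::pair<std::string, std::string>> splits;
    for (std::size_t i = 1; i < word.size(); ++i) {
        std::string left = word.substr(0, i);
        std::string right = word.substr(i);
        if (dict.count(left) && dict.count(right))
            splits.emplace_back(left, right);
    }
    return splits;
}
```

Each returned pair could be offered as a suggestion of the form "left right".
A real implementation would also have to rank these against ordinary
suggestions, which this sketch does not attempt.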

B.2 Things that need to be done

These items will eventually be implemented, as I know they are important;
however, I am not sure when they will get done.

  * Figure out a way for Aspell to work better with extremely large
    dictionaries.

B.3 Things that I would like to get done

These items will eventually be implemented. I hope to have them all done
before I move aspell to beta testing. They are in the approximate order of
when they will get done.

B.4 Things that will be done eventually

I plan on doing these things eventually. It is just a matter of getting
around to it.

B.5 Good ideas that are worth implementing

These items all sound like good ideas; however, I am not sure when I will
get to implementing them, if ever. If you are looking for a way to
contribute, picking up on one of these ideas would be a great way to start.
They are presented in no particular order.

  * Use Lawrence Philips' new Double Metaphone algorithm. See
    http://aspell.sourceforge.net/metaphone/.
  * Add support for affix compression.
  * Come up with a plug-in for gEdit, the GNOME text editor.
  * Change languages (and thus dictionaries) based on the information in
    the actual document.
  * Come up with an nroff mode for spell checking.
  * Come up with a mode that will skip words based on the symbols that
    (almost) always surround the word. (Word skipping by context)
  * Create two server modes for Aspell: one that uses the DICT protocol
    and one that uses the ispell -a method of communication on some
    arbitrary port.
  * Come up with thread safe personal dictionaries.
  * Use the Hidden Markov Model to base the suggestions on not only the
    word itself but on the context around the word.
  * Have a way to email the personal dictionary and/or replacement list
    to a particular address, either periodically or when it grows to a
    certain size.
  * Be able to accept words with spaces in them, as many languages have
    words, such as a word in a foreign phrase, which only make sense
    when followed by other words.

The following good ideas were found in the ispell WISHES file, so I
thought I would pass them on.

  * Ispell should be smart enough to ignore hyphenation signs, such as the
    TeX \- hyphenation indicator.
  * (Jeff Edmonds) The personal dictionary should be able to remove
    certain words from the master dictionary, so that obscure words like
    "wether" wouldn't mask favorite typos.
  * (Jeff Edmonds) It would be wonderful if ispell could correct inserted
    spaces such as "th e" for "the" or even "can not" for "cannot".
  * Since ispell has dictionaries available to it, it is conceivable that
    it could automatically determine the language of a particular file by
    choosing the dictionary that produced the fewest spelling errors on
    the first few lines.
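
The last wish above can be sketched roughly as follows. This is only an
illustration under simplifying assumptions: each dictionary is a plain
in-memory set of words, the sample is split on whitespace, and the names
count_misspellings and guess_language are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <unordered_set>

using Dictionary = std::unordered_set<std::string>;

// Counts whitespace-separated words in `text` that are not in `dict`.
std::size_t count_misspellings(const std::string& text, const Dictionary& dict)
{
    std::istringstream in(text);
    std::string word;
    std::size_t misses = 0;
    while (in >> word)
        if (!dict.count(word)) ++misses;
    return misses;
}

// Returns the name of the dictionary with the fewest unknown words in
// `sample`, or an empty string if no dictionaries were given. On a tie
// the name that sorts first wins.
std::string guess_language(const std::string& sample,
                           const std::map<std::string, Dictionary>& dicts)
{
    std::string best;
    std::size_t best_misses = 0;
    bool found = false;
    for (const auto& entry : dicts) {
        std::size_t misses = count_misspellings(sample, entry.second);
        if (!found || misses < best_misses) {
            best = entry.first;
            best_misses = misses;
            found = true;
        }
    }
    return best;
}
```

The sample passed in would be the first few lines of the file, as the wish
suggests. Punctuation handling and case folding are deliberately left out.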

B.6 Things that are not likely to get implemented

These ideas are not likely to get implemented any time soon.

  * (None Yet)

B.7 Notes and Status of various items


B.7.1 Affix Compression

Due to the way my spell checker currently works, implementing affix
compression would be next to impossible. Nevertheless, I do realize that
for some languages affix compression is very important.

So to solve this dilemma I plan on having two different modes of my spell
checker: One with affix compression that does not use soundslike pairs
(much like ispell) and one without affix compression that does use
soundslike.

I plan to extract the affix manipulation code from Ispell with the help of
an Ispell author. The tricky part would be getting this all to work
properly at run time based on the dictionary used.

B.7.2 Extremely Large Dictionaries

This problem stems from the way words are indexed in Aspell. It will get
resolved when I implement the affix compression mode, as only one index
would be used.

B.7.3 General region skipping

I want to implement this to give other people an idea of how it should be
done and because I am really sick of having to spell check through URLs
and email addresses.
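
As a rough illustration of region skipping, the sketch below blanks out
anything that looks like a URL or an email address before the text reaches
the checker. The function name blank_skipped_regions and both patterns are
assumptions made for this example; the patterns are intentionally loose and
are not a full URL or address grammar.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces URLs and email addresses with spaces so
// that a checker working on the result never sees them, while character
// offsets into the original text stay valid.
std::string blank_skipped_regions(const std::string& text)
{
    static const std::regex skip(
        R"((?:https?|ftp)://[^\s<>]+|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})");
    std::string out = text;
    for (std::sregex_iterator it(text.begin(), text.end(), skip), end;
         it != end; ++it) {
        const std::size_t pos = static_cast<std::size_t>(it->position());
        const std::size_t len = static_cast<std::size_t>(it->length());
        out.replace(pos, len, len, ' ');  // overwrite the match with spaces
    }
    assert(out.size() == text.size());    // offsets must be preserved
    return out;
}
```

Replacing with spaces rather than deleting keeps every remaining word at its
original offset, so misspellings can still be reported against the
unmodified text.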

B.7.4 Word skipping by context

This was posted on the Aspell mailing list on January 1, 1999:

I had an idea on a great general way to determine if a word should be
skipped. Determine the words to skip based on the symbols that (almost)
always surround the word.

For example when asked to check the following C++ code:

    cout << "My age is: " << num << endl;
    cout << "Next year I will be " << num + 1 << endl;

cout, num, and endl will all be skipped. "cout" will be skipped because it
is always followed by a <<. "num" will be skipped because it is always
preceded by a <<. And "endl" will be skipped because it is always between a
<< and a ;.

Given the following HTML code:

    <table width=50% cellspacing=0 cellpadding=1>  
    <tr><td>One<td>Two<td>Three  
    <tr><td>1<td>2<td>3  
    </table> 
     
    <table cellspacing=0 cellpadding=1>  
    </table>

table, width, cellspacing, cellpadding, tr, and td will all be skipped
because they are always enclosed in "<>". Now of course table and width
would be marked as correct anyway; however, there is no harm in skipping
them.

So I was wondering if anyone on this list has any experience in writing
this sort of context recognition code or could give me some pointers in
the right direction.

This sort of word skipping will be very powerful if done right. I imagine
that it could replace specific spell checker modes for TeX, Nroff, SGML,
etc. because it will automatically be able to figure out where it should
skip words. It could also probably do a very good job on programming
language code.

If you are interested in helping me out with this or just have general
comments about the idea please let me know.
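
A minimal sketch of the idea from the post above, under strong simplifying
assumptions: it looks only at the run of punctuation immediately before and
after each word (ignoring whitespace), and skips a word when it occurs at
least twice and one of those two neighbours is the same non-empty symbol
every time. The name words_to_skip and the min_count threshold are invented
for this example. This covers the C++ example above but not the HTML one,
where the enclosing "<" and ">" are not always adjacent to the word.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>

bool is_word_char(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_symbol_char(char c) { return std::ispunct(static_cast<unsigned char>(c)) != 0; }
bool is_space_char(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Run of punctuation just before `pos`, ignoring whitespace; may be empty.
std::string symbol_before(const std::string& s, std::size_t pos)
{
    while (pos > 0 && is_space_char(s[pos - 1])) --pos;
    const std::size_t end = pos;
    while (pos > 0 && is_symbol_char(s[pos - 1])) --pos;
    return s.substr(pos, end - pos);
}

// Run of punctuation starting at `pos`, ignoring whitespace; may be empty.
std::string symbol_after(const std::string& s, std::size_t pos)
{
    while (pos < s.size() && is_space_char(s[pos])) ++pos;
    const std::size_t begin = pos;
    while (pos < s.size() && is_symbol_char(s[pos])) ++pos;
    return s.substr(begin, pos - begin);
}

struct Context {
    std::size_t count = 0;
    std::set<std::string> before, after;  // distinct neighbouring symbols
};

std::set<std::string> words_to_skip(const std::string& text,
                                    std::size_t min_count = 2)
{
    std::map<std::string, Context> seen;
    std::size_t i = 0;
    while (i < text.size()) {
        if (!is_word_char(text[i])) { ++i; continue; }
        const std::size_t start = i;
        while (i < text.size() && is_word_char(text[i])) ++i;
        Context& c = seen[text.substr(start, i - start)];
        ++c.count;
        c.before.insert(symbol_before(text, start));
        c.after.insert(symbol_after(text, i));
    }
    std::set<std::string> skip;
    for (const auto& entry : seen) {
        const Context& c = entry.second;
        const bool same_before = c.before.size() == 1 && !c.before.begin()->empty();
        const bool same_after = c.after.size() == 1 && !c.after.begin()->empty();
        if (c.count >= min_count && (same_before || same_after))
            skip.insert(entry.first);
    }
    return skip;
}
```

On ordinary prose this rule would also skip a word that merely happens to be
followed by a comma every time, so a usable version would need the "almost
always" statistics and a much larger sample than a single file.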

B.7.5 Hidden Markov Model

Knud Haugaard Sørensen suggested this one. From his email on the Aspell
mailing list:

    consider this examples.

    a fone number. -> a phone number.
    a fone dress. -> a fine dress.

    the example illustrates that the right correction might depend on the
    context of the word. So I suggest that you take a look on HMM to solve
    this problem.

    This might also provide a good base to include grammar correction in
    aspell.

    see this link http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node7.html

I think it is a great idea. Unfortunately, it will probably be very
complicated to implement. Perhaps in the far future.
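
To make the "fone" example concrete, here is a sketch that is much simpler
than a real Hidden Markov Model: it only ranks candidate corrections by how
often each one has been seen directly before the following word. The names
BigramCounts and pick_by_context are hypothetical, and the counts are assumed
to come from some corpus that this sketch does not provide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (word, following word) -> how often the pair was seen in a corpus.
using BigramCounts = std::map<std::pair<std::string, std::string>, unsigned>;

// Returns the candidate seen most often before `next_word`. Ties and
// unseen pairs fall back to the earliest candidate in the list; an empty
// candidate list yields an empty string.
std::string pick_by_context(const std::vector<std::string>& candidates,
                            const std::string& next_word,
                            const BigramCounts& counts)
{
    std::string best;
    unsigned best_count = 0;
    bool have_best = false;
    for (const std::string& cand : candidates) {
        const auto it = counts.find(std::make_pair(cand, next_word));
        const unsigned n = (it == counts.end()) ? 0 : it->second;
        if (!have_best || n > best_count) {
            best = cand;
            best_count = n;
            have_best = true;
        }
    }
    return best;
}
```

Given the candidates "phone" and "fine" for "fone", a table in which "phone
number" and "fine dress" are common pairs would select "phone" before
"number" and "fine" before "dress". A full HMM would additionally model the
probability of each misspelling given the intended word and would consider
the whole sentence rather than one neighbour.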

B.7.6 Email the Personal Dictionary

Someone suggested in a personal email:

    Have you thought of adding a function to aspell, that - when the
    personal dictionary has grown significantly - sends the user's
    personal dictionary to the maintainer of the corresponding aspell
    dictionary? (if the user allows it)

    It would be a very useful service to the dictionary maintainers, and I
    think most users can see their benefit in it too.

And I replied:

    Yes, I have considered something like that, but not for the personal
    dictionaries but rather the replacement word list in order to get
    better test data for http://aspell.sourceforge.net/test/. The problem
    is I don't know of a good way to do this since Aspell can also be used
    as a library. It also is not a real high priority, especially since I
    would first need to learn how to send email within a C++ program.

B.7.7 Words With Spaces in Them

While this is something I would like to do, it is not a simple task. The
basic problem is that when tokenizing a string there is no good way to
keep phrases together. So the solution is to somehow add special
conditions to certain words which will dictate which words can come before
or after them. Then there is also the problem of how to come up with
intelligent suggestions. What further complicates things is that many
applications send text to Aspell a word at a time. So even if Aspell did
support such a thing, many applications that use Aspell would not. In
order for this to work, applications will need to send text to Aspell a
document, or at least a sentence, at a time. Unfortunately the framework
for doing this is not there yet. It will be once I finish the filter
interface. Another possibility is to provide callback functions through
which Aspell can request the previous or next word. Yet again the
framework for doing this is not there. Perhaps sometime in the near
future.
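
The callback approach could look something like the sketch below. Everything
here is hypothetical: WordContext, check_word, and the idea of storing
phrases as "first second" strings are assumptions made for illustration and
do not describe any existing Aspell interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical callbacks an application would hand to the checker. Each
// returns an empty string when there is no such neighbouring word.
struct WordContext {
    std::function<std::string()> previous_word;
    std::function<std::string()> next_word;
};

// Accepts `word` if it is a known word on its own, or if it forms a known
// two-word phrase together with the next or the previous word.
bool check_word(const std::string& word, const WordContext& ctx,
                const std::unordered_set<std::string>& words,
                const std::unordered_set<std::string>& phrases)
{
    if (words.count(word)) return true;
    const std::string next = ctx.next_word ? ctx.next_word() : std::string();
    if (!next.empty() && phrases.count(word + " " + next)) return true;
    const std::string prev =
        ctx.previous_word ? ctx.previous_word() : std::string();
    return !prev.empty() && phrases.count(prev + " " + word) > 0;
}
```

The callbacks are only invoked when the word is not already known, so an
application that still sends one word at a time pays nothing extra for
ordinary words. Suggestions for misspelled phrases are not addressed here.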

--------------------------------------------------------------------------
Kevin Atkinson 2001-08-19
