- Inspired by the spelling checkers found in search engines, office suites, and many other applications, this is an attempt to implement a spelling corrector in Erlang.
- In 2007, Peter Norvig (Director of Research at Google Inc.) released a toy spelling corrector in Python (only 21 lines, about half a page of code) that achieves 80 to 90% accuracy at a processing speed of at least 10 words per second.
- He wrote it after two friends, Dean and Bill, were amazed at Google's spelling correction but, despite being highly accomplished engineers and mathematicians, did not have good intuitions about how the process works.
- The corrector takes its reference words from big.txt, which contains about a million words (the same file Norvig used in his implementation).
- All the words in big.txt are split and saved as a list.
- A new list of candidate words is formed by applying four edit functions to the input word: deletion_edits, transposition_edits, alteration_edits, and insertion_edits.
- This candidate list is then filtered against the word list built from big.txt, and the words common to both are returned as suggestions.
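
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the module name edits_sketch, the helper splits/1, the two-argument known/2 (which takes the dictionary explicitly instead of reading big.txt), and the ASCII-only word pattern are all assumptions. Only the four edit-function names come from the description above.

```erlang
%% Save as edits_sketch.erl (hypothetical module name).
-module(edits_sketch).
-export([words/1, edits1/1, known/2]).

%% Lowercase the text and extract runs of the letters a-z as words.
%% Assumes plain ASCII/Latin-1 input.
words(Text) ->
    case re:run(string:lowercase(Text), "[a-z]+",
                [global, {capture, first, list}]) of
        {match, Matches} -> [W || [W] <- Matches];
        nomatch -> []
    end.

%% Every way of cutting Word into a {Left, Right} pair.
splits(Word) ->
    [lists:split(N, Word) || N <- lists:seq(0, length(Word))].

alphabet() -> lists:seq($a, $z).

%% Remove one character.
deletion_edits(Word) ->
    [L ++ R || {L, [_ | R]} <- splits(Word)].

%% Swap two adjacent characters.
transposition_edits(Word) ->
    [L ++ [B, A | R] || {L, [A, B | R]} <- splits(Word)].

%% Replace one character with another letter.
alteration_edits(Word) ->
    [L ++ [C | R] || {L, [_ | R]} <- splits(Word), C <- alphabet()].

%% Insert one letter at any position.
insertion_edits(Word) ->
    [L ++ [C | R] || {L, R} <- splits(Word), C <- alphabet()].

%% All distinct strings one edit away from Word, sorted.
edits1(Word) ->
    lists:usort(deletion_edits(Word) ++ transposition_edits(Word)
                ++ alteration_edits(Word) ++ insertion_edits(Word)).

%% Keep only the candidates that appear in the dictionary word list.
known(Word, Dictionary) ->
    Dict = sets:from_list(Dictionary),
    [W || W <- edits1(Word), sets:is_element(W, Dict)].
```

With this sketch, edits_sketch:known("helo", ["hello","help","world"]) returns ["hello","help"]. Converting the dictionary to a set keeps each lookup cheap, which matters when the dictionary is built from a file the size of big.txt.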
- Fork the repository, clone it, and open the Erlang shell.
- Change the directory to the cloned repository.
- Compile the check module.
- Pass a word in double quotes to check:known/1 and review the suggestions it returns.
- On my system, the session in the Erlang shell looks like this:
1> cd("C:/Users/Mishal Shah/Desktop/Erlang").
C:/Users/Mishal Shah/Desktop/Erlang
ok
2> c(check).
{ok,check}
3> check:known("helo").
Did you mean?
["felo","halo","held","hell","hello","helm","help","hero"]
4> check:known("seach").
Did you mean?
["beach","each","reach","search","teach"]
5> check:known("somthing").
Did you mean?
["something","soothing"]
- Each time listed below is the average of 6 measurements taken with the timer function.
| Word | 3rd release time (seconds) | 2nd release time (seconds) | 1st release time (seconds) |
|---|---|---|---|
| somthing | 3.09 | 4.525 | 13.3 |
| seach | 3.05 | 4.46 | 9.5 |
| helo | 2.8 | 4.5 | 8.2 |
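
An average like the ones above could be collected with a small helper along these lines. This is only a sketch: it assumes the "timer function" is timer:tc/1 from the standard timer module, and the module name timing_sketch and the function average_seconds/2 are hypothetical names introduced here.

```erlang
%% Save as timing_sketch.erl (hypothetical module name).
-module(timing_sketch).
-export([average_seconds/2]).

%% Run the zero-argument Fun the given number of times and return
%% the mean elapsed time in seconds. timer:tc/1 reports microseconds.
average_seconds(Fun, Runs)
  when is_function(Fun, 0), is_integer(Runs), Runs > 0 ->
    Micros = [element(1, timer:tc(Fun)) || _ <- lists:seq(1, Runs)],
    lists:sum(Micros) / Runs / 1000000.
```

For example, timing_sketch:average_seconds(fun() -> check:known("helo") end, 6) would time six calls and average them.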
- Work on run-time speed.
- Work on increasing accuracy.
- Extend the spell checker to handle input of more than one word.
- This repository is released under the MIT License.
Footnotes
- Norvig's original post here