BIGRAMS AND CHUNKING: ADVANTAGES FOR USING IN AUTOMATIC SPELLING CORRECTION IN RUSSIAN AND ENGLISH
Author(s): Vladimir Polyakov, Ivan Anisimov, Elena Makarova
J. Ponte - Oct 2017 - Volume 73 - Issue 10
doi: 10.21506/j.ponte.2017.10.10
Abstract:
The present research addresses the problem of automatic spelling correction for Russian and English. The program, which runs in batch mode, draws upon chunking, a model of incomplete syntactic analysis. Building on the previous version of the program, with its advantages and shortcomings, we decided to introduce a bigram analysis stage into the chunking pipeline, which considerably increased the efficiency of spelling correction. Unlike other programs that presuppose an interactive mode with human intervention, the spelling corrector described in this paper is fully automatic: the program itself chooses the best correction variant and makes the necessary replacement. The program was tested on two mini-collections (one for Russian and one for English) of a hundred clauses each, collected from Twitter. Though there is still room for improvement, the results indicate that the joint use of bigrams and chunks has great potential.
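To make the idea of automatic, bigram-driven candidate selection concrete, the following is a minimal sketch in Python. It is not the authors' program: the chunking stage is omitted entirely, and the toy corpus, the English-only alphabet, and the function names `train_bigrams`, `edits1`, and `correct` are all assumptions introduced for illustration. It only shows how bigram counts with the preceding word might pick one correction without asking a user.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count adjacent word pairs in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Strings one deletion, transposition, substitution or insertion away.
    The alphabet is an assumption; Russian would need a Cyrillic one."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    subs = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + swaps + subs + inserts)

def correct(tokens, vocab, bigrams):
    """Replace each out-of-vocabulary token with the in-vocabulary candidate
    that forms the most frequent bigram with the previous (corrected) word."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
            continue
        candidates = sorted(edits1(tok) & vocab)  # sorted for deterministic ties
        if not candidates:
            out.append(tok)  # no candidate found: leave the token unchanged
            continue
        prev = out[-1] if out else None
        out.append(max(candidates, key=lambda c: bigrams[(prev, c)]))
    return out

# Hypothetical toy corpus, used only to demonstrate the selection step.
corpus = [
    ["feed", "the", "cat"],
    ["the", "cat", "sleeps"],
    ["drive", "my", "car"],
    ["my", "car", "is", "fast"],
]
bigrams = train_bigrams(corpus)
vocab = {word for sentence in corpus for word in sentence}

print(correct(["the", "caz"], vocab, bigrams))
print(correct(["my", "caz"], vocab, bigrams))
```

In this sketch the misspelling "caz" has two dictionary candidates, "cat" and "car", and the preceding word decides between them. A real system would presumably also use the following word, smoothed probabilities, and, as in the paper, chunk-level information.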