Ponte Academic Journal, Nov 2015, Volume 71, Issue 11
Modelling Highly Inflective Language for Target Applications Using Natural Language

Author(s): Mirjam Sepesy Maucec, Janez Brest, Andrej Zgank

Abstract: Language models are widely used in applications including speech recognition, machine translation, handwriting recognition, stenographic code conversion, and information retrieval. In such applications, language models constrain the search space by delivering a priori probabilities of possible word sequences. Language is a robust and necessarily redundant communication mechanism. Its redundancies commonly manifest themselves as predictable patterns in word sequences, and it is largely these patterns that enable language modelling. Several methods for statistical language modelling were originally developed for English and declared language independent. Although they do not incorporate linguistic knowledge specific to English, they achieve only modest success for other languages. Our general goal is the treatment of inflective languages. The idea of this paper is to adjust language modelling methods to make them more powerful when modelling inflective languages. High inflection in a language is correlated with some degree of word-order flexibility. Morphological features either directly identify or help disambiguate the syntactic participants of a sentence. Modelling morphological features in a language not only provides an additional source of information but also alleviates data sparsity problems. In this research, Slovenian is taken as an example of a highly inflective language. The results of a comparative analysis of four language model types are presented: word-based, lemma-based, POS (Part-Of-Speech)-based, and MSD (Morpho-Syntactic-Description)-based language models. Combinations of these models via linear interpolation are also investigated. Experiments are performed using the largest Slovenian corpus, FidaPLUS, which is lemmatized and tagged with POS and MSD tags. The constructed language models are evaluated by their perplexity values. Our experiments show that the interpolated models outperform a classical language model. The use of these language models is demonstrated in two prototype systems: speech recognition and machine translation.
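The two evaluation ideas named in the abstract, linear interpolation of component language models and perplexity scoring, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy unigram distributions and the interpolation weight are hypothetical stand-ins for the paper's word-, lemma-, POS-, and MSD-based n-gram models.

```python
import math

# Two hypothetical unigram "models" over inflected forms of the Slovenian
# lemma "hiša" (house); each maps a word to its probability.
word_model = {"hiša": 0.4, "hiše": 0.3, "hiši": 0.3}
lemma_model = {"hiša": 0.6, "hiše": 0.2, "hiši": 0.2}

def interpolated_prob(word, lam=0.5):
    """Linear interpolation: P(w) = lam * P1(w) + (1 - lam) * P2(w)."""
    return lam * word_model.get(word, 0.0) + (1 - lam) * lemma_model.get(word, 0.0)

def perplexity(test_words, lam=0.5):
    """Perplexity = 2 ** (-(1/N) * sum_i log2 P(w_i)) over a test sequence."""
    log_prob = sum(math.log2(interpolated_prob(w, lam)) for w in test_words)
    return 2 ** (-log_prob / len(test_words))

print(round(perplexity(["hiša", "hiše", "hiši"]), 3))  # lower is better
```

Lower perplexity means the model assigns higher probability to the held-out text, which is how the paper compares the word-based model against the interpolated variants.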