Ponte Academic Journal
Apr 2017, Volume 73, Issue 4

MODELLING OF MISSING DATA IMPUTATION METHODS ON GENE EXPRESSION DATA

Author(s): V. Sujatha, Shaheda Akthar

J. Ponte - Apr 2017 - Volume 73 - Issue 4
doi: 10.21506/j.ponte.2017.4.33



Abstract:
With the rapid growth of parallel and big data computing technologies, data processing in finance, engineering, biometrics, neuroimaging, and bioinformatics has become common in industry and research. All of these disciplines rely on analyzing high-dimensional data. Many real-world bioinformatics datasets contain encoded information, stored as large-scale data with many repeating patterns. These patterns are treated as features, which are needed to analyze gene properties. Gene expression data is recorded at large scale on storage devices using various mechanisms, and the recording procedure may be affected by factors such as device failures, malfunctioning recording equipment, or improper recording procedures. This can leave wrong, misinterpretable, or noisy values scattered throughout the large volume of recorded data. The result is missing information in large datasets, which are then considered low-quality data because values are missing across large portions of the stored data. To produce quality information, refined Manifold Imputation methods are required to estimate the missing values in the datasets. In this paper, general Manifold Imputation (MI), Joint Probability Distribution (IJPD), and Data Escalation (DE) are employed and compared. Results show that Imputation by Additive Regression (IAR) gives better results than the Manifold Imputation method.