Ponte Academic Journal
Jan 2017, Volume 73, Issue 1

AN EFFICIENT DECISION TREE APPROACH FOR OPINION MINING FROM SKEWED TWITTER CORPUS USING CONFISCATE AND SUBSTITUTE TECHNIQUE

Author(s): Salina Adinarayana, E. Ilavarasan

doi: 10.21506/j.ponte.2017.1.13



Abstract:
Data mining and knowledge discovery is the process of discovering knowledge from real-world datasets. A key limitation of real-world datasets is contamination: noise, missing values, and class imbalance all degrade the performance of existing algorithms. In this paper, we propose a novel algorithm, Confiscate and Substitute Imbalance Data Learning (CSIDL), for better knowledge discovery from real-world datasets. The confiscate step operates on the majority subset, removing noisy, borderline, and missing instances. The substitute step operates on the minority subset, replacing missing instances to strengthen the dataset. Experimental comparisons are performed on six real-world datasets against benchmark traditional algorithms. As a case study, the approach is further validated on an imbalanced Twitter dataset against the benchmark C4.5 algorithm using eight evaluation metrics. The results suggest that the proposed CSIDL algorithm performs better than the compared algorithms in terms of Accuracy, AUC, Precision, and F-measure. The experimental results also clearly indicate the effectiveness of the proposed approach on the skewed Twitter corpus.
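The abstract describes the confiscate and substitute steps only at a high level. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The substitute step is assumed to be per-feature mean imputation within the minority class. The confiscate step is assumed to drop majority instances that contain missing values, plus any majority instance whose k nearest neighbours are mostly minority. That second rule is an Edited Nearest Neighbour (ENN) style filter, swapped in here to approximate "noisy and borderline" removal. The function names `impute_minority` and `confiscate_substitute`, and the parameter `k`, are hypothetical.

```python
import math


def impute_minority(rows, labels, minority):
    """Assumed 'substitute' step: fill None values in minority rows with the
    per-feature mean of the observed minority values. Majority rows are copied unchanged."""
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [r[j] for r in minority_rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    out = []
    for r, y in zip(rows, labels):
        if y == minority:
            out.append([means[j] if v is None else v for j, v in enumerate(r)])
        else:
            out.append(list(r))
    return out


def confiscate_substitute(rows, labels, minority, k=3):
    """Hypothetical CSIDL-like cleaning: impute minority gaps, then remove
    majority rows that have missing values or look noisy/borderline (ENN-style rule)."""
    rows = impute_minority(rows, labels, minority)
    # Assumed confiscate rule 1: drop majority rows that still contain missing values.
    kept = [(r, y) for r, y in zip(rows, labels) if y == minority or None not in r]
    result_rows, result_labels = [], []
    for i, (r, y) in enumerate(kept):
        if y != minority:
            # Euclidean distance to every other kept row; keep the k closest.
            neighbours = sorted(
                (math.dist(r, other), oy)
                for j, (other, oy) in enumerate(kept)
                if j != i
            )[:k]
            minority_votes = sum(1 for _, oy in neighbours if oy == minority)
            # Assumed confiscate rule 2: a majority row surrounded mostly by
            # minority rows is treated as noisy or borderline and removed.
            if minority_votes > k / 2:
                continue
        result_rows.append(r)
        result_labels.append(y)
    return result_rows, result_labels
```

On a toy set, a majority point placed inside the minority cluster and a majority row with a missing feature are both confiscated, and the minority row's gap is filled with the minority mean. The cleaned rows could then be passed to a decision-tree learner such as C4.5. ENN is used here only because it is a well-known, easily stated filter. The paper's own criteria for "noisy" and "borderline" instances may differ.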