USING WEKA FRAMEWORK IN DOCUMENT CLASSIFICATION
Abstract
Text document classification is a special case of supervised data mining. Solving it typically involves several steps: feature extraction, feature selection, classification, evaluation, and visualization. WEKA is a framework that supports all of these steps. It was initially developed as a library of Java classes for implementing data mining applications. In recent years its components have also been made available in visual form inside the “WEKA Knowledge Flow Environment”, so that they can be used without Java programming skills. In this paper we study and present some of the most important visual components that WEKA provides for the steps listed above: “Arff Loader”, “Attribute Selection”, “Normalize”, “Train Test Split Maker”, a wide range of classifier algorithms, “Performance Evaluator”, and “Text Viewer”. To demonstrate how the visual framework works for text document classification, we carried out several experiments and present them here. The most important advantage of the visual WEKA framework is that different approaches can be tested without programming skills.
References
Witten, I. H., Frank, E., Hall, M. A., Pal, C. J., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementation, Morgan Kaufmann Press, 2000;
Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001;
Manning, C., An Introduction to Information Retrieval, Cambridge University Press, 2009;
Mitchell, T. M., Machine Learning, The McGraw-Hill Companies, 1997;
Mitkov, R., The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005;
Wolf, M., Wicksteed, C., Reuters Corpus, http://trec.nist.gov/data/reuters/reuters.html, accessed 03.2016;
http://www.cs.waikato.ac.nz/ml/weka/, accessed 03.2016;
http://www.cs.waikato.ac.nz/ml/weka/documentation.html, accessed 03.2016;
https://en.wikipedia.org/wiki/Precision_and_recall, accessed 03.2016.