USING WEKA FRAMEWORK IN DOCUMENT CLASSIFICATION

Authors

  • Radu Crețulescu, Lucian Blaga University of Sibiu
  • Daniel Morariu, Lucian Blaga University of Sibiu
  • Macarie Breazu, Lucian Blaga University of Sibiu

Abstract

Text document classification is a special case of supervised data mining. Solving a text document classification problem typically involves several steps: feature extraction, feature selection, classification, evaluation and visualization. WEKA is a framework that supports all of these steps. It was initially developed as a library of Java classes for implementing data mining applications. In recent years, to remove the need for Java programming skills, the WEKA components have also been made available in visual form inside the “WEKA Knowledge Flow Environment”. In this paper we study and present some of the most important visual components that the WEKA framework provides for the steps listed above: “Arff Loader”, “Attribute Selection”, “Normalize”, “Train Test Split Maker”, a range of classifier algorithms, “Performance Evaluator” and “Text Viewer”. To demonstrate how the visual framework works for text document classification, we carried out and present a series of experiments. The most important advantage of the visual WEKA framework is that different approaches can be tested without programming.
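To make the sequence of steps named in the abstract concrete, the following is a minimal sketch in plain Python. It is not WEKA's API and not the authors' experimental setup: the toy corpus, the function names, the document-frequency threshold and the choice of a multinomial Naive Bayes classifier are all assumptions made for illustration. The normalization step is left out because this count-based classifier does not need it.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    ("stocks fall as markets react to rate hike", "finance"),
    ("central bank raises interest rate again", "finance"),
    ("investors sell shares after weak earnings", "finance"),
    ("bank shares rise as markets recover", "finance"),
    ("team wins final match after late goal", "sport"),
    ("coach praises team after cup match", "sport"),
    ("striker scores goal in league final", "sport"),
    ("injured player misses cup match", "sport"),
]

def extract_features(text):
    """Feature extraction: bag-of-words term counts."""
    return Counter(text.lower().split())

def select_features(docs, min_df=2):
    """Feature selection: keep terms found in at least min_df documents."""
    df = Counter(term for counts, _ in docs for term in counts)
    return {term for term, n in df.items() if n >= min_df}

def train_test_split(docs, train_ratio=0.75, seed=1):
    """Shuffle with a fixed seed, then cut into training and test parts."""
    shuffled = docs[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def train_naive_bayes(train, vocab):
    """Classification: multinomial Naive Bayes with add-one smoothing."""
    class_docs = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for counts, label in train:
        for term, n in counts.items():
            if term in vocab:
                term_counts[label][term] += n
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(train))
        log_like = {
            term: math.log((term_counts[label][term] + 1) / (total + len(vocab)))
            for term in vocab
        }
        model[label] = (log_prior, log_like)
    return model

def predict(model, counts):
    """Return the label with the highest log posterior score."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(
            n * log_like[term] for term, n in counts.items() if term in log_like
        )
    return max(model, key=score)

def precision_recall(pairs, positive):
    """Evaluation: precision and recall for one class from (gold, predicted) pairs."""
    tp = sum(1 for gold, pred in pairs if gold == positive and pred == positive)
    fp = sum(1 for gold, pred in pairs if gold != positive and pred == positive)
    fn = sum(1 for gold, pred in pairs if gold == positive and pred != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def run_pipeline(corpus, positive, seed=1):
    """Chain the steps: extract, split, select, train, predict, evaluate."""
    docs = [(extract_features(text), label) for text, label in corpus]
    train, test = train_test_split(docs, seed=seed)
    vocab = select_features(train)  # selected on training data only
    model = train_naive_bayes(train, vocab)
    pairs = [(label, predict(model, counts)) for counts, label in test]
    return precision_recall(pairs, positive)

if __name__ == "__main__":
    print(run_pipeline(CORPUS, positive="sport"))
```

Each function stands in for one stage of the workflow, so swapping the classifier or the selection criterion only touches a single function. This mirrors, in a very reduced form, the idea of rewiring components in a visual flow.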

References

Witten, I. H., Frank, E., Hall, M. A., Pal, C. J., Data Mining – Practical Machine Learning Tools and Techniques with Java Implementation, Morgan Kaufmann Press, 2000

Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001

Manning, C., An Introduction to Information Retrieval, Cambridge University Press, 2009

Mitchell, T. M., Machine Learning, The McGraw-Hill Companies, 1997

Mitkov, R., The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005

Wolf, M., Wicksteed, C., Reuters Corpus: http://trec.nist.gov/data/reuters/reuters.html, accessed 03.2016

http://www.cs.waikato.ac.nz/ml/weka/, accessed 03.2016

http://www.cs.waikato.ac.nz/ml/weka/documentation.html, accessed 03.2016

https://en.wikipedia.org/wiki/Precision_and_recall, accessed 03.2016

Published

2016-12-02

Section

Articles