PART OF SPEECH TAGGING IN ROMANIAN TEXTS
Abstract
Part-of-Speech (PoS) tagging is the process by which a grammar tag naming the corresponding PoS is automatically attached to every word in a sentence. Since a word is not tied to a single PoS, its syntactic value depending on the context in which it is used, identifying parts of speech is not a trivial matter. In this paper we consider two tagging methods, both based on Naïve Bayes classifier probabilities and on the context in which the word to be tagged occurs. We call these methods Backward Naïve Bayes and Forward Naïve Bayes. For Romanian, we consider seven PoS categories: noun, verb, adjective, adverb, article, preposition, and an "others" category. Our experiments show that identifying the PoS of a word based on the PoS of the previous word produces better results in all respects. We have also studied each PoS separately and concluded that some PoS are easier to identify in Romanian (article, preposition, noun, and verb), while the adjective and adverb are more problematic.
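As an illustration only, the idea of tagging a word from its own identity and the PoS of the previous word can be sketched in Python as below. The abstract does not give the exact formulas, so the scoring rule P(tag) * P(word | tag) * P(previous tag | tag), the add-one smoothing, the toy corpus, and the names `train` and `tag_sentence` are all assumptions made for this sketch and should not be read as the authors' implementation.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical marker for "no previous word"

# Toy hand-tagged corpus (an assumption for illustration, not the paper's data).
TOY_CORPUS = [
    [("un", "article"), ("copil", "noun"), ("merge", "verb"),
     ("la", "preposition"), ("munte", "noun")],
    [("o", "article"), ("femeie", "noun"), ("scrie", "verb"),
     ("o", "article"), ("carte", "noun")],
    [("copilul", "noun"), ("merge", "verb"), ("repede", "adverb")],
]


def train(tagged_sentences):
    """Count tags, words per tag, and previous tags per tag."""
    tag_counts = Counter()
    word_given_tag = defaultdict(Counter)  # word_given_tag[tag][word]
    prev_given_tag = defaultdict(Counter)  # prev_given_tag[tag][previous tag]
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            tag_counts[tag] += 1
            word_given_tag[tag][word.lower()] += 1
            prev_given_tag[tag][previous] += 1
            previous = tag
    return tag_counts, word_given_tag, prev_given_tag


def tag_sentence(words, model):
    """Tag left to right, using the tag just predicted as the context."""
    tag_counts, word_given_tag, prev_given_tag = model
    total = sum(tag_counts.values())
    vocab_size = len({w for c in word_given_tag.values() for w in c}) + 1
    context_size = len(tag_counts) + 1  # all tags plus the start marker
    tagged = []
    previous = START
    for word in words:
        best_tag, best_score = None, -1.0
        for tag, count in tag_counts.items():
            prior = count / total
            # Add-one smoothing so unseen words or contexts keep a nonzero score.
            p_word = (word_given_tag[tag][word.lower()] + 1) / (count + vocab_size)
            p_prev = (prev_given_tag[tag][previous] + 1) / (count + context_size)
            score = prior * p_word * p_prev
            if score > best_score:
                best_tag, best_score = tag, score
        tagged.append((word, best_tag))
        previous = best_tag
    return tagged


model = train(TOY_CORPUS)
print(tag_sentence(["un", "copil", "scrie"], model))
```

A variant that conditions on the PoS of the following word could be sketched the same way by counting next tags during training and processing the sentence from right to left.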