Natural language contents evaluation system for multi-class news categorization using machine learning and transformers

Marrugo, Duván A; Martinez-Santos, Juan Carlos; Puertas, Edwin


Marrugo-2023-Natural-language-contents-evaluatio.pdf (590.6Kb)

Date

2023-12-05

Abstract

The exponential growth of digital documents has been accompanied by rapid progress in text classification techniques in recent years. This paper presents text classification models that cover the main steps of news classification. It implements several machine learning approaches, namely Logistic Regression, Support Vector Machine, and Random Forest. It also evaluates Transformers as classification models for the same problem, comparing BERT and DistilBERT for the automatic classification of news articles belonging to four categories (World, Sports, Business, and Science/Technology). Among the machine learning models, the highest accuracy, 88%, was obtained with a Support Vector Machine using Word2Vec features. The Transformer DistilBERT, however, produced a model that is efficient in terms of performance and reaches 91.7% accuracy in classifying news.
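A common way to build the best classical pipeline reported above (a Support Vector Machine over Word2Vec features) is to average a document's word vectors into a single fixed-length vector and then train one linear SVM per category (one-vs-rest). The record does not state the paper's exact feature construction or solver, so the following is only a minimal sketch under stated assumptions. The `EMBEDDINGS` table is a hypothetical, hand-made stand-in for trained Word2Vec vectors. The training sentences are invented toy examples, not the paper's data. The SVM is trained with Pegasos-style stochastic subgradient descent, a solver chosen here for illustration that may differ from the authors' own.

```python
import random

# Hypothetical hand-made 4-d "word vectors" standing in for trained Word2Vec
# embeddings (real Word2Vec vectors are learned from a corpus and are much larger).
EMBEDDINGS = {
    "minister": [1.0, 0.0, 0.0, 0.0],
    "election": [0.9, 0.0, 0.1, 0.0],
    "match": [0.0, 1.0, 0.0, 0.0],
    "goal": [0.1, 0.9, 0.0, 0.0],
    "stock": [0.0, 0.0, 1.0, 0.0],
    "market": [0.1, 0.0, 0.9, 0.0],
    "software": [0.0, 0.0, 0.0, 1.0],
    "chip": [0.0, 0.0, 0.1, 0.9],
}
DIM = 4


def doc_vector(text):
    """Mean of the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM for labels +1/-1, trained with Pegasos-style SGD on the
    L2-regularized hinge loss. A constant 1.0 feature acts as the bias
    (so the bias is also regularized, a simplification)."""
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            x = xs[i] + [1.0]
            violates = ys[i] * dot(w, x) < 1.0  # margin checked on current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violates:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(docs, labels):
    """One binary SVM per category: that category vs. all the others."""
    xs = [doc_vector(d) for d in docs]
    return {
        c: train_binary_svm(xs, [1 if y == c else -1 for y in labels])
        for c in sorted(set(labels))
    }


def predict(models, text):
    """Return the category whose SVM gives the highest decision score."""
    x = doc_vector(text) + [1.0]
    return max(models, key=lambda c: dot(models[c], x))


# Invented toy training data; out-of-vocabulary words are simply ignored.
train_docs = [
    ("minister wins election", "World"),
    ("election result", "World"),
    ("team scores goal in match", "Sports"),
    ("late goal", "Sports"),
    ("stock market falls", "Business"),
    ("stock rally", "Business"),
    ("new chip and software", "Sci/Tech"),
    ("software update", "Sci/Tech"),
]
models = train_one_vs_rest([d for d, _ in train_docs], [y for _, y in train_docs])
print(predict(models, "goal match"))  # expected: Sports
```

Averaging word vectors discards word order, which keeps the classifier simple and fast. In practice, the toy table would be replaced by vectors from a trained Word2Vec model and the hand-rolled solver by a library SVM implementation.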

Cite as

Marrugo, D. A., Martinez-Santos, J. C., & Puertas, E. (2023, October). Natural Language Contents Evaluation System for Multi-class News Categorization Using Machine Learning and Transformers. In Workshop on Engineering Applications (pp. 115-126). Cham: Springer Nature Switzerland.

URI

https://hdl.handle.net/20.500.12585/12578

Collections

Research products [1453]

Universidad Tecnológica de Bolívar - 2017. Higher education institution subject to inspection and oversight by the Ministry of National Education (Ministerio de Educación Nacional). Resolution No. 961 of 26 October 1970, through which the Government of Bolívar (Gobernación de Bolívar) grants legal status to the Universidad Tecnológica de Bolívar.