Natural language contents evaluation system for multi-class news categorization using machine learning and transformers
dc.contributor.author | Marrugo, Duván A | |
dc.contributor.author | Martinez-Santos, Juan Carlos | |
dc.contributor.author | Puertas, Edwin | |
dc.date.accessioned | 2023-12-05T16:10:06Z | |
dc.date.available | 2023-12-05T16:10:06Z | |
dc.date.issued | 2023-12-05 | |
dc.date.submitted | 2023-12-05 | |
dc.identifier.citation | Marrugo, D. A., Martinez-Santos, J. C., & Puertas, E. (2023, October). Natural Language Contents Evaluation System for Multi-class News Categorization Using Machine Learning and Transformers. In Workshop on Engineering Applications (pp. 115-126). Cham: Springer Nature Switzerland. | spa |
dc.identifier.uri | https://hdl.handle.net/20.500.12585/12578 | |
dc.description.abstract | The exponential growth of digital documents has been accompanied by rapid progress in text classification techniques in recent years. This paper presents text classification models and analyzes the steps of news classification, implementing several machine learning algorithms: Logistic Regression, Support Vector Machine, and Random Forest. It also applies Transformers to the same problem, comparing BERT and DistilBERT for the automatic classification of news articles belonging to four categories (World, Sports, Business, and Science/Technology). Among the machine learning models, the highest accuracy was 88%, obtained with a Support Vector Machine using Word2Vec. With the DistilBERT Transformer, however, we obtained a model that is efficient in terms of performance and reaches 91.7% accuracy for classifying news. | spa |
dc.description.sponsorship | Universidad Tecnológica de Bolívar | spa |
dc.format.extent | 12 pages | |
dc.format.mimetype | application/pdf | spa |
dc.language.iso | eng | spa |
dc.source | Applied Computer Sciences in Engineering | spa |
dc.title | Natural language contents evaluation system for multi-class news categorization using machine learning and transformers | spa |
dcterms.bibliographicCitation | lab 912, M.: Deeplearning hw2 transformer (2022). https://kaggle.com/competitions/deeplearning-hw2-transformer | spa |
dcterms.bibliographicCitation | Ahmed, J., Ahmed, M.: Online news classification using machine learning techniques. IIUM Eng. J. 22, 210–225 (2021). https://doi.org/10.31436/iiumej.v22i2.1662, https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/1662 | spa |
dcterms.bibliographicCitation | Patro, A., Mahima Patel, R.S., Save, D.J.: Real time news classification using machine learning. Int. J. Adv. Sci. Technol. 29(9s), 620–630 (2020) | spa |
dcterms.bibliographicCitation | Barua, A., Sharif, O., Hoque, M.M.: Multi-class sports news categorization using machine learning techniques: resource creation and evaluation. Procedia Comput. Sci. 193, 112–121 (2021). https://doi.org/10.1016/j.procs.2021.11.002, https://www.sciencedirect.com/science/article/pii/S1877050921021268. 10th International Young Scientists Conference in Computational Science, YSC2021, 28 June–2 July 2021 | spa |
dcterms.bibliographicCitation | Blackledge, C., Atapour-Abarghouei, A.: Transforming fake news: robust generalisable news classification using transformers (2021). https://doi.org/10.48550/ARXIV.2109.09796, http://arxiv.org/2109.09796 | spa |
dcterms.bibliographicCitation | Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014). https://doi.org/10.48550/ARXIV.1406.1078, http://arxiv.org/1406.1078 | spa |
dcterms.bibliographicCitation | Deb, N., Jha, V., Panjiyar, A., Gupta, R.: A comparative analysis of news categorization using machine learning approaches. Int. J. Sci. Technol. Res. 9, 2469–2472 (2020) | spa |
dcterms.bibliographicCitation | Devi, J.S., Bai, D.M.R., Reddy, C.: Newspaper article classification using machine learning techniques. Int. J. Innov. Technol. Explor. Eng. 9(5), 872–877 (2020). https://doi.org/10.35940/ijitee.e2753.039520, https://dx.doi.org/10.35940/ijitee.E2753.039520 | spa |
dcterms.bibliographicCitation | Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2018). https://doi.org/10.48550/ARXIV.1810.04805, http://arxiv.org/1810.04805 | spa |
dcterms.bibliographicCitation | Elnagar, A., Al-Debsi, R., Einea, O.: Arabic text classification using deep learning models. Inf. Process. Manag. 57(1), 102121 (2020). https://doi.org/10.1016/j.ipm.2019.102121, https://www.sciencedirect.com/science/article/pii/S0306457319303413 | spa |
dcterms.bibliographicCitation | Gillioz, A., Casas, J., Mugellini, E., Khaled, O.A.: Overview of the transformer-based models for NLP tasks. In: 2020 15th Conference on Computer Science and Information Systems (FedCSIS), pp. 179–183 (2020). https://doi.org/10.15439/2020F20 | spa |
dcterms.bibliographicCitation | Greff, K., Srivastava, R.K., Koutnik, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017). https://doi.org/10.1109/tnnls.2016.2582924 | spa |
dcterms.bibliographicCitation | Kosheleva, O., Kreinovich, V., Shahbazova, S.: Type-2 fuzzy analysis explains ubiquity of triangular and trapezoid membership functions. In: Shahbazova, S.N., Kacprzyk, J., Balas, V.E., Kreinovich, V. (eds.) Recent Developments and the New Direction in Soft-Computing Foundations and Applications. SFSC, vol. 393, pp. 63–75. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-47124-8_6 | spa |
dcterms.bibliographicCitation | Lilleberg, J., Zhu, Y., Zhang, Y.: Support vector machines and word2vec for text classification with semantic features. In: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), pp. 136–140 (2015). https://doi.org/10.1109/ICCI-CC.2015.7259377 | spa |
dcterms.bibliographicCitation | Luo, X.: Efficient English text classification using selected machine learning techniques. Alex. Eng. J. 60(3), 3401–3409 (2021). https://doi.org/10.1016/j.aej.2021.02.009, https://www.sciencedirect.com/science/article/pii/S1110016821000806 | spa |
dcterms.bibliographicCitation | Munikar, M., Shakya, S., Shrestha, A.: Fine-grained sentiment classification using BERT. In: 2019 Artificial Intelligence for Transforming Business and Society (AITB), vol. 1, pp. 1–5 (2019). https://doi.org/10.1109/AITB48515.2019.8947435 | spa |
dcterms.bibliographicCitation | Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Association for Computational Linguistics, Doha (2014). https://doi.org/10.3115/v1/D14-1162, https://www.aclanthology.org/D14-1162 | spa |
dcterms.bibliographicCitation | Qadi, L.A., Rifai, H.E., Obaid, S., Elnagar, A.: Arabic text classification of news articles using classical supervised classifiers. In: 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS), pp. 1–6 (2019). https://doi.org/10.1109/ICTCS.2019.8923073 | spa |
dcterms.bibliographicCitation | Rustamov, S., Mustafayev, E., Clements, M.: Context analysis of customer requests using a hybrid adaptive neuro fuzzy inference system and hidden Markov models in the natural language call routing problem. Open Eng. 8, 61–68 (2018). https://doi.org/10.1515/eng-2018-0008 | spa |
dcterms.bibliographicCitation | Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (2019). https://doi.org/10.48550/ARXIV.1910.01108, http://arxiv.org/1910.01108 | spa |
dcterms.bibliographicCitation | Vaswani, A., et al.: Attention is all you need (2017). https://doi.org/10.48550/ARXIV.1706.03762, http://arxiv.org/1706.03762 | spa |
dcterms.bibliographicCitation | Yang, Y., Chen, X., Tan, R., Xiao, Y.: IoT Technologies and Applications, pp. 1–60. Wiley (2021). https://doi.org/10.1002/9781119593584.ch1 | spa |
dcterms.bibliographicCitation | Yıldırım, S., Jothimani, D., Kavaklıoğlu, C., Başar, A.: Classification of “hot news” for financial forecast using NLP techniques. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 4719–4722 (2018). https://doi.org/10.1109/BigData.2018.8621903 | spa |
datacite.rights | http://purl.org/coar/access_right/c_abf2 | spa |
oaire.version | http://purl.org/coar/version/c_b1a7d7d4d402bcce | spa |
dc.type.driver | info:eu-repo/semantics/bookPart | spa |
dc.type.hasversion | info:eu-repo/semantics/publishedVersion | spa |
dc.identifier.doi | 10.1007/978-3-031-46739-4_11 | |
dc.subject.keywords | Text Classification | spa |
dc.subject.keywords | Automatic Classification | spa |
dc.subject.keywords | News Classification | spa |
dc.subject.keywords | Transformer | spa |
dc.subject.keywords | Machine Learning | spa |
dc.subject.keywords | Deep Learning | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.identifier.instname | Universidad Tecnológica de Bolívar | spa |
dc.identifier.reponame | Repositorio Universidad Tecnológica de Bolívar | spa |
dc.publisher.place | Cartagena de Indias | spa |
dc.subject.armarc | LEMB | |
dc.type.spa | http://purl.org/coar/resource_type/c_6501 | spa |
dc.audience | General public | spa |
dc.publisher.sede | Campus Tecnológico | spa |
oaire.resourcetype | http://purl.org/coar/resource_type/c_6501 | spa |
dc.publisher.discipline | Master's in Engineering | spa |
This item appears in the following collection(s)
- Productos de investigación [1453]