Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers
dc.contributor.author | Marrugo-Tobón, Duván Andres | |
dc.contributor.author | Martinez-Santos, Juan Carlos | |
dc.contributor.author | Puertas, Edwin | |
dc.date.accessioned | 2023-12-05T18:16:47Z | |
dc.date.available | 2023-12-05T18:16:47Z | |
dc.date.issued | 2023-12-05 | |
dc.date.submitted | 2023-12-05 | |
dc.identifier.citation | Marrugo-Tobón, D., Martínez-Santos, J., & Puertas, E. (2023). Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023). | spa |
dc.identifier.uri | https://hdl.handle.net/20.500.12585/12581 | |
dc.description.abstract | In natural language processing, accurate categorization of tweets, including detecting hate speech, plays a pivotal role in efficient information organization and analysis. This paper presents a Natural Language Contents Evaluation System tailored for multi-class tweet categorization, with a focus on hate speech detection. The system improves classification accuracy and efficiency by using two Transformer models, BERT and DistilBERT. Feature extraction techniques capture pertinent information from tweets, enabling their analysis, categorization, and the identification of hate speech instances. During training, we also address imbalanced corpora by employing techniques that ensure fair representation of the different tweet categories, including hate speech. The system reaches 95% accuracy during training, showing that Transformers are effective at comprehending and categorizing tweets, including identifying hate speech. It maintains 83% accuracy during testing, which indicates that the trained models generalize reasonably well for hate speech detection. This system contributes to automated tweet categorization, specifically hate speech detection, by providing a reliable and efficient way to organize and analyze diverse tweet datasets. | spa |
dc.description.sponsorship | Universidad Tecnológica de Bolívar | spa |
dc.format.extent | 12 pages | |
dc.format.mimetype | application/pdf | spa |
dc.language.iso | eng | spa |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.source | Iberian Languages Evaluation Forum | spa |
dc.title | Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers | spa |
dcterms.bibliographicCitation | Kim, H., & Jeong, Y. S. (2019). Sentiment classification using convolutional neural networks. Applied Sciences, 9(11), 2347. | spa |
dcterms.bibliographicCitation | Galas, M. (2015). Experimental Computational Simulation Environments for Big Data Analytic in Social Sciences. In Handbook of Statistics (Vol. 33, pp. 259-277). Elsevier. | spa |
dcterms.bibliographicCitation | Abro, S., Shaikh, S., Khand, Z. H., Zafar, A., Khan, S., & Mujtaba, G. (2020). Automatic hate speech detection using machine learning: A comparative study. International Journal of Advanced Computer Science and Applications, 11(8). | spa |
dcterms.bibliographicCitation | Alkomah, F., & Ma, X. (2022). A literature review of textual hate speech detection methods and datasets. Information, 13(6), 273. | spa |
dcterms.bibliographicCitation | Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. | spa |
dcterms.bibliographicCitation | Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. | spa |
dcterms.bibliographicCitation | Madni, H. A., Umer, M., Abuzinadah, N., Hu, Y. C., Saidani, O., Alsubai, S., ... & Ashraf, I. (2023). Improving Sentiment Prediction of Textual Tweets Using Feature Fusion and Deep Machine Ensemble Model. Electronics, 12(6), 1302. | spa |
dcterms.bibliographicCitation | Tan, K. L., Lee, C. P., & Lim, K. M. (2023). A survey of sentiment analysis: Approaches, datasets, and future research. Applied Sciences, 13(7), 4550. | spa |
dcterms.bibliographicCitation | Severyn, A., & Moschitti, A. (2015, August). Twitter sentiment analysis with deep convolutional neural networks. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval (pp. 959-962). | spa |
dcterms.bibliographicCitation | Pilar, G. D., Isabel, S. B., Diego, P. M., & Luis, G. A. J. (2023). A novel flexible feature extraction algorithm for Spanish tweet sentiment analysis based on the context of words. Expert Systems with Applications, 212, 118817. | spa |
dcterms.bibliographicCitation | Behera, R. K., Jena, M., Rath, S. K., & Misra, S. (2021). Co-LSTM: Convolutional LSTM model for sentiment analysis in social big data. Information Processing & Management, 58(1), 102435. | spa |
dcterms.bibliographicCitation | Elreedy, D., & Atiya, A. F. (2019). A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Information Sciences, 505, 32-64. | spa |
dcterms.bibliographicCitation | Hussein, A. S., Li, T., Abd Ali, D. M., Bashir, K., & Yohannese, C. W. (2020). A modified adaptive synthetic sampling method for learning imbalanced datasets. In Developments of Artificial Intelligence Technologies in Computation and Robotics: Proceedings of the 14th International FLINS Conference (FLINS 2020) (pp. 76-83). | spa |
dcterms.bibliographicCitation | Aloraini, M., Khan, A., Aladhadh, S., Habib, S., Alsharekh, M. F., & Islam, M. (2023). Combining the Transformer and Convolution for Effective Brain Tumor Classification Using MRI Images. Applied Sciences, 13(6), 3680. | spa |
dcterms.bibliographicCitation | Jang, B., Kim, M., Harerimana, G., Kang, S. U., & Kim, J. W. (2020). Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Applied Sciences, 10(17), 5841. | spa |
dcterms.bibliographicCitation | Bel-Enguix, G., Gómez-Adorno, H., Sierra, G., Vásquez, J., Andersen, S. T., & Ojeda-Trueba, S. (2023). Overview of HOMO-MEX at Iberlef 2023: Hate speech detection in Online Messages directed Towards the MEXican Spanish speaking LGBTQ+ population. Procesamiento del lenguaje natural, 71, 361-370. | spa |
dcterms.bibliographicCitation | Eler, D. M., Grosa, D., Pola, I., Garcia, R., Correia, R., & Teixeira, J. (2018). Analysis of document pre-processing effects in text and opinion mining. Information, 9(4), 100. | spa |
dcterms.bibliographicCitation | Huerta-Velasco, D. A., & Calvo, H. (2022). Verbal Aggressions Detection in Mexican Tweets. Computación y Sistemas, 26(1), 261-269. | spa |
dcterms.bibliographicCitation | Silva Barbon, R., & Akabane, A. T. (2022). Towards Transfer Learning Techniques—BERT, DistilBERT, BERTimbau, and DistilBERTimbau for Automatic Text Classification from Different Languages: A Case Study. Sensors, 22(21), 8184. | spa |
datacite.rights | http://purl.org/coar/access_right/c_abf2 | spa |
oaire.version | http://purl.org/coar/version/c_b1a7d7d4d402bcce | spa |
dc.identifier.url | https://ceur-ws.org/Vol-3496/homomex-paper4.pdf | |
dc.type.driver | info:eu-repo/semantics/article | spa |
dc.type.hasversion | info:eu-repo/semantics/publishedVersion | spa |
dc.subject.keywords | BERT | spa |
dc.subject.keywords | DistilBERT | spa |
dc.subject.keywords | Feature extraction | spa |
dc.subject.keywords | Hate speech detection | spa |
dc.subject.keywords | Natural language processing | spa |
dc.subject.keywords | Transformers | spa |
dc.subject.keywords | Tweet categorization | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.cc | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.identifier.instname | Universidad Tecnológica de Bolívar | spa |
dc.identifier.reponame | Repositorio Universidad Tecnológica de Bolívar | spa |
dc.publisher.place | Cartagena de Indias | spa |
dc.subject.armarc | LEMB | |
dc.type.spa | http://purl.org/coar/resource_type/c_6501 | spa |
dc.audience | General public | spa |
dc.publisher.sede | Campus Tecnológico | spa |
oaire.resourcetype | http://purl.org/coar/resource_type/c_6501 | spa |
dc.publisher.discipline | Master's in Engineering | spa |
This item appears in the following collection(s)
- Productos de investigación [1453]
Universidad Tecnológica de Bolívar - 2017. Higher education institution subject to inspection and oversight by the Ministerio de Educación Nacional. Resolution No. 961 of October 26, 1970, through which the Gobernación de Bolívar granted legal status to the Universidad Tecnológica de Bolívar.