An automatic approach to generate corpus in Spanish
dc.contributor.editor | Serrano C. J.E. | |
dc.contributor.editor | Martínez-Santos, Juan Carlos | |
dc.creator | Puertas E. | |
dc.creator | Alvarado‑Valencia, Jorge Andres | |
dc.creator | Moreno-Sandoval L.G. | |
dc.creator | Pomares-Quimbaya A. | |
dc.date.accessioned | 2020-03-26T16:32:36Z | |
dc.date.available | 2020-03-26T16:32:36Z | |
dc.date.issued | 2018 | |
dc.identifier.citation | Communications in Computer and Information Science; Vol. 885, pp. 150-161 | |
dc.identifier.isbn | 9783319989976 | |
dc.identifier.issn | 18650929 | |
dc.identifier.uri | https://hdl.handle.net/20.500.12585/8916 | |
dc.description.abstract | A corpus is an indispensable linguistic resource for any natural language processing application. Some corpora have been created manually or semi-automatically for a specific domain. In this paper, we present an automatic approach for generating a corpus from digital information sources such as Wikipedia and web pages. Information is extracted from Wikipedia by delimiting the domain: a propagation algorithm determines the categories associated with a domain region, and a set of seeds delimits the search. Information is extracted efficiently from web pages by determining the patterns associated with the structure of each page, which are used to define the quality of the extraction. © Springer Nature Switzerland AG 2018. | eng |
dc.description.sponsorship | Pontificia Universidad Javeriana | |
dc.format.medium | Electronic resource | |
dc.format.mimetype | application/pdf | |
dc.language.iso | eng | |
dc.publisher | Springer Verlag | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.source | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85054377708&doi=10.1007%2f978-3-319-98998-3_12&partnerID=40&md5=d8689ca7ab863965c5539711ded485c1 | |
dc.title | An automatic approach to generate corpus in Spanish | |
dcterms.bibliographicCitation | Arnold, P., Rahm, E., Automatic extraction of semantic relations from wikipedia (2015) Int. J. Artif. Intell. Tools, 24 (2) | |
dcterms.bibliographicCitation | Berners-Lee, T., Connolly, D., (1995) Hypertext Markup Language-2.0, , Technical report, USA | |
dcterms.bibliographicCitation | Blei, D.M., Ng, A.Y., Jordan, M.I., Latent dirichlet allocation (2003) J. Mach. Learn. Res, 3, pp. 993-1022. , Jan | |
dcterms.bibliographicCitation | (2006) Extensible Markup Language (Xml) 1.1 | |
dcterms.bibliographicCitation | Crawford, W., Csomay, E., Doing Corpus Linguistics (2015) Routledge, , Abingdon | |
dcterms.bibliographicCitation | Crockford, D., (2006) The Application/Json Media Type for Javascript Object Notation, , JSON | |
dcterms.bibliographicCitation | Drechsler, A., Hevner, A., A four-cycle model of IS design science research: Capturing the dynamic nature of IS artifact design (2016) Breakthroughs and Emerging Insights from Ongoing Design Science Projects: Research-In-Progress Papers and Poster Presentations from the 11th International Conference on Design Science Research in Information Systems and Technology (DESRIST). DESRIST 2016, , St. John, Canada | |
dcterms.bibliographicCitation | Dutta, B., Chatterjee, U., Madalli, D.P., YAMO: Yet another methodology for large-scale faceted ontology construction (2015) J. Knowl. Manag., 19 (1), pp. 6-24 | |
dcterms.bibliographicCitation | Edeki, C., Agile unified process (2013) Int. J. Comput. Sci., 1 (3), pp. 13-17 | |
dcterms.bibliographicCitation | Fan, J., Kalyanpur, A., Gondek, D.C., Ferrucci, D.A., Automatic knowledge extraction from documents (2012) IBM J. Res. Dev., 56 (3), pp. 1-5 | |
dcterms.bibliographicCitation | Ferrara, E., de Meo, P., Fiumara, G., Baumgartner, R., Web data extraction, applications and techniques: A survey (2014) Knowl.-Based Syst., 70, pp. 301-323 | |
dcterms.bibliographicCitation | Gharib, T.F., Badr, N.L., Haridy, S., Abraham, A., Enriching ontology concepts based on texts from WWW and corpus (2012) J. UCS, 18 (16), pp. 2234-2251 | |
dcterms.bibliographicCitation | Jiang, J., Information extraction from text (2012) Mining Text Data, pp. 11-41. , https://doi.org/10.1007/978-1-4614-3223-4_2, Aggarwal, C., Zhai, C. (eds.), Springer, Boston | |
dcterms.bibliographicCitation | Jurafsky, D., Martin, J.H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2009) Prentice Hall Series in Artificial Intelligence, pp. 1-1024 | |
dcterms.bibliographicCitation | Kanakaraj, M., Kamath, S.S., NLP based intelligent news search engine using information extraction from e-newspapers (2014) 2014 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1-5. , IEEE | |
dcterms.bibliographicCitation | Kanavos, A., Makris, C., Plegas, Y., Theodoridis, E., Ranking web search results exploiting wikipedia (2016) Int. J. Artif. Intell. Tools, 25 (3) | |
dcterms.bibliographicCitation | Kozareva, Z., Hovy, E., Tailoring the automated construction of large-scale taxonomies using the web (2013) Lang. Resour. Eval., 47 (3), pp. 859-890 | |
dcterms.bibliographicCitation | Küçük, D., Arslan, Y., Semi-automatic construction of a domain ontology for wind energy using wikipedia articles (2014) Renew. Energy, 62, pp. 484-489 | |
dcterms.bibliographicCitation | Lahbib, W., Bounhas, I., Slimani, Y., Arabic terminology extraction and enrichment based on domain-specific text mining (2015) 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 340-347. , IEEE | |
dcterms.bibliographicCitation | Leskovec, J., Rajaraman, A., Ullman, J.D., (2014) Mining of Massive Datasets, , Cambridge University Press, Cambridge | |
dcterms.bibliographicCitation | Liu, S., Zhang, C., Termhood-based comparability metrics of comparable corpus in special domain (2013) CLSW 2012. LNCS (LNAI), 7717, pp. 134-144. , https://doi.org/10.1007/978-3-642-36337-5_15, Ji, D., Xiao, G. (eds.), Springer, Heidelberg | |
dcterms.bibliographicCitation | Loria, S., TextBlob: Simplified text processing (2014) Secondary Textblob: Simplified Text Processing | |
dcterms.bibliographicCitation | March, S.T., Smith, G.F., Design and natural science research on information technology (1995) Decis. Support Syst., 15 (4), pp. 251-266 | |
dcterms.bibliographicCitation | March, S.T., Storey, V.C., Design science in the information systems discipline: An introduction to the special issue on design science research (2008) MIS Q, 32, pp. 725-730 | |
dcterms.bibliographicCitation | Medelyan, O., Witten, I.H., Divoli, A., Broekstra, J., Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures (2013) Wiley Interdisc. Rev.: Data Min. Knowl. Discov., 3 (4), pp. 257-279 | |
dcterms.bibliographicCitation | Morell, M.F., The Wikimedia foundation and the governance of Wikipedia's infrastructure: Historical trajectories and its hybrid character (2011) Critical Point of View: A Wikipedia Reader, pp. 325-341 | |
dcterms.bibliographicCitation | Petrov, S., Das, D., McDonald, R., (2011) A Universal Part-Of-Speech Tagset | |
dcterms.bibliographicCitation | Powers, D.M.W., Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation (2011) J. Mach. Learn. Technol., 2 (1), pp. 37-63 | |
dcterms.bibliographicCitation | Richardson, L., Ruby, S., (2008) RESTful Web Services, , O’Reilly Media, Inc., Sebastopol | |
dcterms.bibliographicCitation | Schwaber, K., Beedle, M., (2002) Agile Software Development with Scrum, 1. , Prentice Hall, Upper Saddle River | |
dcterms.bibliographicCitation | Vállez, M., Pedraza-Jiménez, R., Codina, L., Blanco, S., Rovira, C., A semiautomatic indexing system based on embedded information in HTML documents (2015) Library Hi Tech, 33 (2), pp. 195-210 | |
dcterms.bibliographicCitation | van Rossum, G., Drake, F.L., Python Language Reference Manual (2003) Network Theory, , Bristol | |
dcterms.bibliographicCitation | Wood, L., Nicol, G., Robie, J., Champion, M., Byrne, S., (2004) Document Object Model (DOM) Level 3 Core Specification | |
dcterms.bibliographicCitation | Zhu, M., Recall, precision and average precision (2004) Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, 2, p. 30 | |
datacite.rights | http://purl.org/coar/access_right/c_16ec | |
oaire.resourceType | http://purl.org/coar/resource_type/c_c94f | |
oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
dc.source.event | 13th Colombian Conference on Computing, CCC 2018 | |
dc.type.driver | info:eu-repo/semantics/conferenceObject | |
dc.type.hasversion | info:eu-repo/semantics/publishedVersion | |
dc.identifier.doi | 10.1007/978-3-319-98998-3_12 | |
dc.subject.keywords | Corpus | |
dc.subject.keywords | Knowledge extraction | |
dc.subject.keywords | Linguistic computational | |
dc.subject.keywords | Natural language processing | |
dc.subject.keywords | Text mining | |
dc.subject.keywords | Data mining | |
dc.subject.keywords | Extraction | |
dc.subject.keywords | Natural language processing systems | |
dc.subject.keywords | Tellurium compounds | |
dc.subject.keywords | Websites | |
dc.subject.keywords | Automatic approaches | |
dc.subject.keywords | Digital information | |
dc.subject.keywords | Linguistic resources | |
dc.subject.keywords | Propagation algorithm | |
dc.subject.keywords | Wikipedia | |
dc.subject.keywords | Linguistics | |
dc.rights.accessrights | info:eu-repo/semantics/restrictedAccess | |
dc.rights.cc | Attribution-NonCommercial 4.0 International | |
dc.identifier.instname | Universidad Tecnológica de Bolívar | |
dc.identifier.reponame | Repositorio UTB | |
dc.description.notes | Acknowledgements. The tool presented was carried out within the construction of research capabilities of the Center for Excellence and Appropriation in Big Data and Data Analytics (CAOBA), led by the Pontificia Universidad Javeriana, funded by the Ministry of Information Technologies and Telecommunications of the Republic of Colombia (MinTIC). | |
dc.relation.conferencedate | 26 September 2018 through 28 September 2018 | |
dc.type.spa | Conference | |
dc.identifier.orcid | 57202285682 | |
dc.identifier.orcid | 8738428200 | |
dc.identifier.orcid | 57194828933 | |
dc.identifier.orcid | 57203852380 |
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |
This item appears in the following collection(s)
- Productos de investigación [1460]