IMMAN: free software for information theory-based chemometric analysis
datacite.rights | http://purl.org/coar/access_right/c_16ec | |
dc.creator | Urias R.W.P. | |
dc.creator | Barigye S.J. | |
dc.creator | Marrero-Ponce Y. | |
dc.creator | García-Jacas C.R. | |
dc.creator | Valdes-Martiní J.R. | |
dc.creator | Perez-Gimenez F. | |
dc.date.accessioned | 2020-03-26T16:32:46Z | |
dc.date.available | 2020-03-26T16:32:46Z | |
dc.date.issued | 2015 | |
dc.description.abstract | The features and theoretical background of a new, free computational program for chemometric analysis named IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This multi-platform software, developed in the Java programming language, provides a user-friendly graphical interface for computing a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection. A total of 20 feature selection parameters are presented, 10 for the unsupervised framework and 10 for the supervised framework. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. In addition, a generalization scheme for the previously defined differential Shannon’s entropy is discussed, and the Jeffreys information measure is introduced for supervised feature selection. Well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are also incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing-value processing, dataset partitioning, and browsing. Single-parameter and ensemble (multi-criteria) ranking options are also provided. Consequently, this software is suitable for tasks such as dimensionality reduction, feature ranking, and comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between the IMMAN and WEKA feature selection tools using the Arcene dataset demonstrated similar behavior. In addition, the use of IMMAN unsupervised feature selection methods is shown to improve the performance of both IMMAN and WEKA supervised algorithms. © 2015, Springer International Publishing Switzerland. | eng |
dc.description.sponsorship | Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq | |
dc.format.medium | Electronic resource | |
dc.format.mimetype | application/pdf | |
dc.identifier.citation | Molecular Diversity; Vol. 19, No. 2; pp. 305-319 | |
dc.identifier.doi | 10.1007/s11030-014-9565-z | |
dc.identifier.instname | Universidad Tecnológica de Bolívar | |
dc.identifier.issn | 13811991 | |
dc.identifier.orcid | 56497011800 | |
dc.identifier.orcid | 55363486500 | |
dc.identifier.orcid | 55665599200 | |
dc.identifier.orcid | 56189852800 | |
dc.identifier.orcid | 56191215400 | |
dc.identifier.orcid | 6701762262 | |
dc.identifier.reponame | Repositorio UTB | |
dc.identifier.uri | https://hdl.handle.net/20.500.12585/9015 | |
dc.language.iso | eng | |
dc.publisher | Kluwer Academic Publishers | |
dc.rights.accessrights | info:eu-repo/semantics/restrictedAccess | |
dc.rights.cc | Attribution-NonCommercial 4.0 International | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.source | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84937517073&doi=10.1007%2fs11030-014-9565-z&partnerID=40&md5=bebd134ed45279902c02db40eaa3b28c | |
dc.subject.keywords | Chemometric analysis | |
dc.subject.keywords | Classification | |
dc.subject.keywords | Computational program | |
dc.subject.keywords | Feature selection | |
dc.subject.keywords | IMMAN | |
dc.subject.keywords | Information-theoretic function | |
dc.subject.keywords | Algorithm | |
dc.subject.keywords | Software | |
dc.subject.keywords | Theoretical model | |
dc.subject.keywords | Algorithms | |
dc.subject.keywords | Models, Theoretical | |
dc.title | IMMAN: free software for information theory-based chemometric analysis | |
dc.type.driver | info:eu-repo/semantics/article | |
dc.type.hasversion | info:eu-repo/semantics/publishedVersion | |
dc.type.spa | Artículo | |
oaire.resourceType | http://purl.org/coar/resource_type/c_6501 | |
oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
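
The abstract names information gain, gain ratio, and symmetrical uncertainty computed after equal-interval discretization. The following is a minimal sketch of those three scores using their standard textbook definitions; it is not taken from IMMAN itself. The function names, the choice of two bins, and the toy data are assumptions made for illustration, since the record does not state how IMMAN sets the number of intervals.

```python
import math
from collections import Counter


def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_bins, classes):
    """H(class) minus the conditional entropy H(class | feature)."""
    n = len(classes)
    conditional = 0.0
    for b in set(feature_bins):
        subset = [c for f, c in zip(feature_bins, classes) if f == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(classes) - conditional


def gain_ratio(feature_bins, classes):
    """Information gain normalized by the entropy of the feature."""
    split = entropy(feature_bins)
    return information_gain(feature_bins, classes) / split if split > 0 else 0.0


def symmetrical_uncertainty(feature_bins, classes):
    """2 * IG / (H(feature) + H(class)), bounded between 0 and 1."""
    denom = entropy(feature_bins) + entropy(classes)
    return 2 * information_gain(feature_bins, classes) / denom if denom > 0 else 0.0


# Hypothetical toy data: one feature that separates the classes, one that does not.
classes = [0, 0, 0, 1, 1, 1]
separating = equal_interval_bins([0.1, 0.2, 0.3, 0.9, 1.0, 1.1], n_bins=2)
mixed = equal_interval_bins([0.1, 1.0, 0.2, 1.1, 0.3, 0.9], n_bins=2)

for name, bins in [("separating", separating), ("mixed", mixed)]:
    print(name,
          information_gain(bins, classes),
          gain_ratio(bins, classes),
          symmetrical_uncertainty(bins, classes))
```

Ranking features by any one of these scores gives a single-parameter ranking in the sense of the abstract; how IMMAN combines several scores into an ensemble ranking is not described in this record, so that step is left out.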