Browsing by Author "Barigye S.J."

Generalized molecular descriptors derived from event-based discrete derivative
(Bentham Science Publishers B.V., 2016) Martínez-Santiago O.; Cabrera R.M.; Marrero-Ponce Y.; Barigye S.J.; Le-Thi-Thu H.; Torres, Javier; Zambrano C.H.; Yaber Goenaga, Iván; Cruz-Monteagudo, M.; López Y.M.; Giménez F.P.; Torrens, F.
In the present study, a generalized approach for molecular structure characterization is introduced, based on the relation frequency matrix (F) representation of the molecular graph and the subsequent calculation of the corresponding discrete derivative (finite difference) over a pair of elements (atoms). In earlier publications (22-24), a unique event, named connected subgraphs (based on the Kier-Hall subgraphs), was systematically employed for the computation of the matrix F. The present report is a generalization of this notion, in which eleven additional events are introduced, classified in three categories, namely, topological (terminal paths, vertex path incidence, quantum subgraphs, walks of length k, Sachs subgraphs), fingerprints (MACCS, E-state and substructure fingerprints) and atomic contributions (Ghose and Crippen atom-types for hydrophobicity and refractivity) for F generation. The events are intended to capture diverse information by the generation or search of different kinds of substructures from the graph representation of a molecule. The discrete derivatives over duplex atom relations are calculated for each event, and from the resulting derivatives, local vertex invariants (LOVIs) are finally obtained. These LOVIs are subsequently employed as the basis for the calculation of global and local indices over groups of atoms (heteroatoms, halogens, methyl carbons, etc.), by using norms, means, statistics and classical algorithms as aggregator (fusion) operators. These indices were implemented in our in-house software DIVATI (Derivative Type Indices, a new module of the TOMOCOMD-CARDD system). DIVATI provides a friendly and cross-platform graphical user interface, developed in the Java programming language, and is freely available at: http://www.tomocomd.com. Factor analysis shows that the presented events are rather orthogonal and collect diverse information about the chemical structure.
Finally, QSPR models were built to describe the logP and logK of 34 furylethylene derivatives using the eleven events. Generally, the equations obtained according to these events showed high correlations, with the Sachs subgraphs and Multiplicity events showing the best behavior in the description of logK (Q2LOO value of 99.06%) and logP (Q2LOO value of 98.1%), respectively. These results show that these new event-based indices constitute a powerful approach for chemoinformatics studies. © 2016 Bentham Science Publishers.
IMMAN: free software for information theory-based chemometric analysis
(Kluwer Academic Publishers, 2015) Urias R.W.P.; Barigye S.J.; Marrero-Ponce Y.; García-Jacas C.R.; Valdes-Martiní J.R.; Perez-Gimenez F.
The features and theoretical background of a new and free computational program for chemometric analysis named IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon’s entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior.
In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. © 2015, Springer International Publishing Switzerland.
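As an illustration of one parameter named in this abstract, the following is a minimal sketch of information gain computed after equal-interval discretization. It is not IMMAN code: the function names, the default of two bins, and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature falls entirely into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(values, classes, n_bins=2):
    """H(class) - H(class | discretized feature); n_bins is a hypothetical default."""
    bins = equal_interval_bins(values, n_bins)
    n = len(classes)
    conditional = 0.0
    for b in set(bins):
        subset = [c for v, c in zip(bins, classes) if v == b]
        conditional += len(subset) / n * shannon_entropy(subset)
    return shannon_entropy(classes) - conditional


# Toy data (invented): the first feature separates the classes, the second does not.
classes = ["a", "a", "b", "b"]
informative = [0.1, 0.2, 0.9, 1.0]
uninformative = [0.1, 1.0, 0.1, 1.0]
ig_informative = information_gain(informative, classes)
ig_uninformative = information_gain(uninformative, classes)
```

Ranking features by this value, highest first, is the usual way such a parameter is used for supervised feature selection.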
Multi-server approach for high-throughput molecular descriptors calculation based on multi-linear algebraic maps
(Wiley-VCH Verlag, 2015) García-Jacas C.R.; Aguilera-Mendoza, L.; González-Pérez R.; Marrero-Ponce Y.; Acevedo-Martínez L.; Barigye S.J.; Avdeenko T.
The present report introduces a novel module of the QuBiLS-MIDAS software for the distributed computation of the 3D Multi-Linear algebraic molecular indices. The main motivation for developing this module is to deal with the computational complexity experienced during the calculation of the descriptors over large datasets. To accomplish this task, a multi-server computing platform named T-arenal was developed, which is suited for institutions with many workstations interconnected through a local network and without resources dedicated to computation tasks. This new system was deployed on 337 workstations and it was perfectly integrated with the QuBiLS-MIDAS software. To illustrate the usability of the T-arenal platform, performance tests over a dataset comprising 15000 compounds are carried out, yielding a 52- and 60-fold reduction in the sequential processing time for the 2-Linear and 3-Linear indices, respectively. Therefore, it can be stated that the T-arenal based distribution of computation tasks constitutes a suitable strategy for performing high-throughput calculations of 3D Multi-Linear descriptors over thousands of chemical structures for subsequent QSAR and/or ADME-Tox studies. © 2015 Wiley-VCH Verlag GmbH & Co. KGaA.
Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes
(Academic Press, 2015) Marrero-Ponce Y.; Contreras-Torres E.; García-Jacas C.R.; Barigye S.J.; Cubillán, Néstor; Alvarado Y.J.
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝⁿ space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to the ℝⁿ space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. On the other hand, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC2) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and the high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. © 2015 Elsevier Ltd.
Novel global and local 3D atom-based linear descriptors of the Minkowski distance matrix: theory, diversity–variability analysis and QSPR applications
(Kluwer Academic Publishers, 2015) Cubillán, Néstor; Marrero-Ponce Y.; Ariza-Rico H.; Barigye S.J.; García-Jacas C.R.; Valdes-Martini J.R.; Alvarado Y.J.
A new family of alignment-free 3D descriptors based on the TOMOCOMD-CARDD framework has been designed, namely 3D-linear indices. In this report, we have proposed the use of a generalized form of the geometric pairwise atom-atom distance matrix as the structural information matrix. This matrix, termed non-stochastic, is used as the matrix form of linear maps, along with its algebraic transformations: the stochastic, double-stochastic and mutual probability matrices. The methodology for 3D-QSAR studies is based on the combined use of global and local approaches. Principal component analysis reveals that the novel indices are capable of capturing structural information not codified by the indices implemented in the DRAGON software. Moreover, Shannon’s entropy based variability analysis comparing the 3D-linear indices with some relevant descriptors suggests that the former encode a similar-to-better amount of structural information than these descriptors. Finally, a search for the best regressions for congeneric databases in QSPR modeling was performed. The overall results demonstrate satisfactory behavior. © 2015, Springer International Publishing Switzerland.
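To make the matrix terminology in this abstract concrete, here is a minimal sketch of a pairwise Minkowski distance matrix and a row-normalized (stochastic) version of it. It is not taken from the TOMOCOMD-CARDD software: the function names and coordinates are invented, and treating the stochastic matrix as each row divided by its row sum is an assumption of this example.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two points (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def distance_matrix(coords, p=2.0):
    """Symmetric matrix of pairwise Minkowski distances between atoms."""
    return [[minkowski_distance(a, b, p) for b in coords] for a in coords]


def row_stochastic(matrix):
    """Divide each row by its sum so that every non-zero row sums to 1 (assumed scheme)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


# Invented 3D coordinates for three atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
euclidean = distance_matrix(coords, p=2.0)
manhattan = distance_matrix(coords, p=1.0)
stochastic = row_stochastic(euclidean)
```

Changing `p` switches the metric without touching the rest of the pipeline, which is the sense in which the Minkowski form generalizes the usual geometric distance matrix.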
Optimum search strategies or novel 3D molecular descriptors: Is there a stalemate?
(Bentham Science Publishers B.V., 2015) Marrero-Ponce Y.; García-Jacas C.R.; Barigye S.J.; Valdés-Martiní J.R.; Rivera-Borroto O.M.; Pino-Urias R.W.; Cubillán, Néstor; Alvarado Y.J.; Le-Thi-Thu H.
The present manuscript describes a novel 3D-QSAR alignment-free method (QuBiLS-MIDAS Duplex) based on algebraic bilinear, quadratic and linear forms on the kth two-tuple spatial-(dis)similarity matrix. Generalization schemes for the inter-atomic spatial distance using diverse (dis)-similarity measures are discussed. On the other hand, normalization approaches for the two-tuple spatial-(dis)similarity matrix by using simple- and double-stochastic and mutual probability schemes are introduced. With the aim of taking into consideration particular inter-atomic interactions in total or local-fragment indices, path and length cut-off constraints are used. Also, in order to generalize the use of the linear combination of atom-level indices to yield global (molecular) definitions, a set of aggregation operators (invariants) is applied. A Shannon’s entropy based variability study for the proposed 3D algebraic form-based indices and the DRAGON molecular descriptor families demonstrates superior performance for the former. A principal component analysis reveals that the novel indices codify structural information orthogonal to that captured by the DRAGON indices. Finally, a QSAR study for the binding affinity to the corticosteroid-binding globulin using Cramer’s steroid database is performed. From this study, it is revealed that the QuBiLS-MIDAS Duplex approach yields performance statistics similar or superior to those of all the 3D-QSAR methods reported in the literature so far, even with fewer degrees of freedom, using both the 31 steroids as the training set and the popular division of Cramer’s database in training [1-21] and test sets [22-31]. It is thus expected that this methodology provides useful tools for the diversity analysis of compound datasets and high-throughput screening structure–activity data. © 2015 Bentham Science Publishers.
QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents
(Taylor and Francis Ltd., 2015) Medina Marrero R.; Marrero-Ponce Y.; Barigye S.J.; Echeverría Díaz Y.; Acevedo Barrios, Rosa; Casañola-Martín G.M.; García Bernal M.; Torrens, F.; Pérez-Giménez F.
The QuBiLs-MAS approach is used for the in silico modelling of the antifungal activity of organic molecules. To this effect, non-stochastic (NS) and simple-stochastic (SS) atom-based quadratic indices are used to codify chemical information for a comprehensive dataset of 2478 compounds having a great structural variability, with 1087 of them being antifungal agents, covering the broadest antifungal mechanisms of action known so far. The NS and SS index-based antifungal activity classification models obtained using linear discriminant analysis (LDA) yield correct classification percentages of 90.73% and 92.47%, respectively, for the training set. Additionally, these models are able to correctly classify 92.16% and 87.56% of 706 compounds in an external test set. A comparison of the statistical parameters of the QuBiLs-MAS LDA-based models with those for models reported in the literature reveals comparable to superior performance, although the latter were built over much smaller and less diverse datasets, representing fewer mechanisms of action. It may therefore be inferred that the QuBiLs-MAS method constitutes a valuable tool useful in the design and/or selection of new and broad spectrum agents against life-threatening fungal infections. © 2015 Taylor & Francis.
Towards better BBB passage prediction using an extensive and curated data set
(Wiley-VCH Verlag, 2015) Brito-Sánchez Y.; Marrero-Ponce Y.; Barigye S.J.; Yaber Goenaga, Iván; Morell Pérez C.; Le-Thi-Thu H.; Cherkasov A.
In the present report, the challenging task of drug delivery across the blood-brain barrier (BBB) is addressed via a computational approach. The BBB passage was modeled using classification and regression schemes on a novel extensive and curated data set (the largest to the best of our knowledge) in terms of log BB. Prior to the model development, steps of data analysis that comprise chemical data curation, structural, cutoff and cluster analysis (CA) were conducted. Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR) were used to fit classification and correlation functions. The best LDA-based model showed overall accuracies over 85% and 83% for the training and test sets, respectively. Also, an MLR-based model with acceptable explanation of more than 69% of the variance in the experimental log BB was developed. A brief and general interpretation of the proposed models allowed an estimation of how 'near' our computational approach is to the factors that determine the passage of molecules through the BBB. In a final effort, some popular and powerful Machine Learning methods were considered. Comparable performance was observed with respect to the simpler linear techniques. Most of the compounds with anomalous behavior were set aside in a set denoted the controversial set, and a discussion of these compounds is provided. Finally, our results were compared with methodologies previously reported in the literature, showing comparable to better results. The results could represent useful tools, available to and reproducible by the entire scientific community, in the early stages of neuropharmaceutical drug discovery/development projects. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.