Browsing by Author "Marrero-Ponce Y."


A Hooke's law-based approach to protein folding rate
(Academic Press, 2015) Ruiz-Blanco Y.B.; Marrero-Ponce Y.; Prieto P.J.; Salgado J.; García Y.; Sotomayor-Torres C.M.
Kinetics is a key aspect of the renowned protein folding problem. Here, we propose a comprehensive approach to folding kinetics where a polypeptide chain is assumed to behave as an elastic material described by Hooke's law. A novel parameter called elastic-folding constant results from our model and is suggested to distinguish between proteins with two-state and multi-state folding pathways. A contact-free descriptor, named folding degree, is introduced as a suitable structural feature to study protein-folding kinetics. This approach generalizes the observed correlations between varieties of structural descriptors with the folding rate constant. Additionally, several comparisons among structural classes and folding mechanisms were carried out, showing the good performance of our model with proteins of different types. The present model constitutes a simple rationale for the structural and energetic factors involved in protein folding kinetics. © 2014 Elsevier Ltd.
Generalized molecular descriptors derived from event-based discrete derivative
(Bentham Science Publishers B.V., 2016) Martínez-Santiago O.; Cabrera R.M.; Marrero-Ponce Y.; Barigye S.J.; Le-Thi-Thu H.; Torres, Javier; Zambrano C.H.; Yaber Goenaga, Iván; Cruz-Monteagudo, M.; López Y.M.; Giménez F.P.; Torrens, F.
In the present study, a generalized approach for molecular structure characterization is introduced, based on the relation frequency matrix (F) representation of the molecular graph and the subsequent calculation of the corresponding discrete derivative (finite difference) over a pair of elements (atoms). In earlier publications (22-24), a unique event, named connected subgraphs (based on the Kier-Hall’s subgraphs), was systematically employed for the computation of the matrix F. The present report is a generalization of this notion, in which eleven additional events are introduced, classified in three categories, namely, topological (terminal paths, vertex path incidence, quantum subgraphs, walks of length k, Sach’s subgraphs), fingerprints (MACCs, E-state and substructure fingerprints) and atomic contributions (Ghose and Crippen atom-types for hydrophobicity and refractivity) for F generation. The events are intended to capture diverse information by the generation or search of different kinds of substructures from the graph representation of a molecule. The discrete derivatives over duplex atom relations are calculated for each event, and local vertex invariants (LOVIs) are finally obtained from the resulting derivatives. These LOVIs are subsequently employed as the basis for the calculation of global and local indices over groups of atoms (heteroatoms, halogens, methyl carbons, etc.), by using norms, means, statistics and classical algorithms as aggregator (fusion) operators. These indices were implemented in our in-house software DIVATI (Derivative Type Indices, a new module of the TOMOCOMD-CARDD system). DIVATI provides a friendly and cross-platform graphical user interface, developed in the Java programming language, and is freely available at: http://www.tomocomd.com. Factor analysis shows that the presented events are rather orthogonal and collect diverse information about the chemical structure.
Finally, QSPR models were built to describe the logP and logK of 34 furylethylene derivatives using the eleven events. Generally, the equations obtained according to these events showed high correlations, with the Sach’s sub-graphs and Multiplicity events showing the best behavior in the description of logK (Q2LOO value of 99.06%) and logP (Q2LOO value of 98.1%), respectively. These results show that these new event-based indices constitute a powerful approach for chemoinformatics studies. © 2016 Bentham Science Publishers.
IMMAN: free software for information theory-based chemometric analysis
(Kluwer Academic Publishers, 2015) Urias R.W.P.; Barigye S.J.; Marrero-Ponce Y.; García-Jacas C.R.; Valdes-Martiní J.R.; Perez-Gimenez F.
The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon’s entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software (http://mobiosd-hub.com/imman-soft/), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior.
In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. © 2015, Springer International Publishing Switzerland.
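As an illustration of the supervised ranking idea mentioned in the abstract above, the following is a minimal sketch of information gain computed after equal-interval discretization. It is not the IMMAN implementation; the function names, the default number of bins, and the handling of constant features are assumptions made for this example.

```python
import math
from collections import Counter


def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, n_bins=2):
    """H(class) - H(class | discretized feature)."""
    bins = equal_interval_bins(values, n_bins)
    total = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    print(information_gain([0.1, 0.2, 0.9, 1.0], labels))  # feature separates the classes
    print(information_gain([0.1, 0.9, 0.1, 0.9], labels))  # feature carries no class information
```

Ranking features by this score, highest first, gives a simple supervised filter; gain ratio and symmetrical uncertainty are normalizations of the same quantity.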
Multi-output model with box-jenkins operators of quadratic indices for prediction of malaria and cancer inhibitors targeting ubiquitin-proteasome pathway (UPP) proteins
(Bentham Science Publishers B.V., 2016) Casañola-Martín G.M.; Le-Thi-Thu H.; Pérez-Giménez F.; Marrero-Ponce Y.; Merino-Sanjuán M.; Abad C.; González-Díaz H.
The ubiquitin-proteasome pathway (UPP) is the primary degradation system of short-lived regulatory proteins. Cellular processes such as the cell cycle, signal transduction, gene expression, DNA repair and apoptosis are regulated by the UPP, and dysfunctions in this system have important implications in the development of cancer, neurodegenerative, cardiac and other human pathologies. The UPP also seems to be very important in the function of eukaryotic cells of human parasites such as Plasmodium falciparum, the causal agent of the neglected disease malaria. Hence, the UPP could be considered an attractive target for the development of compounds with anti-malarial or anti-cancer properties. Recent online databases like ChEMBL contain a large quantity of information in terms of pharmacological assay protocols and compounds tested as UPP inhibitors under many different conditions. This large amount of data gives new openings for the computer-aided identification of UPP inhibitors, but the intrinsic data diversity is an obstacle for the development of successful classifiers. To solve this problem, we used the Box-Jenkins moving average operators and the atom-based quadratic molecular indices calculated with the software TOMOCOMD-CARDD (TC) to develop a quantitative model for the prediction of the multiple outputs in this complex dataset. Our multi-target model can predict results for drugs against 22 molecular or cellular targets of different organisms with accuracies above 70% in both training and validation sets. © 2016 Bentham Science Publishers.
Multi-server approach for high-throughput molecular descriptors calculation based on multi-linear algebraic maps
(Wiley-VCH Verlag, 2015) García-Jacas C.R.; Aguilera-Mendoza, L.; González-Pérez R.; Marrero-Ponce Y.; Acevedo-Martínez L.; Barigye S.J.; Avdeenko T.
The present report introduces a novel module of the QuBiLS-MIDAS software for the distributed computation of the 3D Multi-Linear algebraic molecular indices. The main motivation for developing this module is to deal with the computational complexity experienced during the calculation of the descriptors over large datasets. To accomplish this task, a multi-server computing platform named T-arenal was developed, which is suited for institutions with many workstations interconnected through a local network and without resources specifically dedicated to computation tasks. This new system was deployed in 337 workstations and was fully integrated with the QuBiLS-MIDAS software. To illustrate the usability of the T-arenal platform, performance tests over a dataset comprised of 15000 compounds are carried out, yielding a 52- and 60-fold reduction in the sequential processing time for the 2-Linear and 3-Linear indices, respectively. Therefore, it can be stated that the T-arenal based distribution of computation tasks constitutes a suitable strategy for performing high-throughput calculations of 3D Multi-Linear descriptors over thousands of chemical structures for subsequent QSAR and/or ADME-Tox studies. © 2015 Wiley-VCH Verlag GmbH & Co. KGaA.
Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes
(Academic Press, 2015) Marrero-Ponce Y.; Contreras-Torres E.; García-Jacas C.R.; Barigye S.J.; Cubillán, Néstor; Alvarado Y.J.
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝⁿ space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝⁿ space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. On the other hand, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC2) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. © 2015 Elsevier Ltd.
Novel global and local 3D atom-based linear descriptors of the Minkowski distance matrix: theory, diversity–variability analysis and QSPR applications
(Kluwer Academic Publishers, 2015) Cubillán, Néstor; Marrero-Ponce Y.; Ariza-Rico H.; Barigye S.J.; García-Jacas C.R.; Valdes-Martini J.R.; Alvarado Y.J.
A new family of alignment-free 3D descriptors based on the TOMOCOMD-CARDD framework has been designed, namely 3D-linear indices. In this report, we have proposed the use of a generalized form of the geometric pairwise atom-atom distance matrix as the structural information matrix. This matrix, termed non-stochastic, is used as the matrix form of the linear maps, together with its algebraic transformations: the stochastic, double-stochastic and mutual probability matrices. The methodology for 3D-QSAR studies is based on the combined use of global and local approaches. Principal component analysis reveals that the novel indices are capable of capturing structural information not codified by the indices implemented in the DRAGON software. Moreover, Shannon’s entropy based variability analysis comparing the 3D-linear indices with some relevant descriptors suggests that the former encode a similar-to-better amount of structural information than these descriptors. Finally, a search for the best regressions for congeneric databases in QSPR modeling was performed. The overall results demonstrate satisfactory behavior. © 2015, Springer International Publishing Switzerland.
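To make the matrix construction in the abstract above concrete, here is a minimal sketch of a Minkowski pairwise distance matrix over 3D atomic coordinates and one possible normalization. It assumes that "stochastic" means dividing each row by its row sum; the function names and the toy coordinates are hypothetical and do not come from the TOMOCOMD-CARDD software.

```python
def minkowski_matrix(coords, p=2.0):
    """Pairwise Minkowski distances of order p between 3D points.

    p=1 gives the Manhattan distance and p=2 the Euclidean distance.
    """
    n = len(coords)
    return [
        [
            sum(abs(a - b) ** p for a, b in zip(coords[i], coords[j])) ** (1.0 / p)
            for j in range(n)
        ]
        for i in range(n)
    ]


def row_stochastic(matrix):
    """Divide every row by its sum so that each row adds up to 1 (assumed scheme)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


if __name__ == "__main__":
    # Three hypothetical atoms; coordinates are illustrative only.
    atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
    euclidean = minkowski_matrix(atoms, p=2.0)
    for row in row_stochastic(euclidean):
        print([round(x, 3) for x in row])
```

Varying p changes how strongly large coordinate differences dominate the distance, which is the sense in which the Minkowski form generalizes the usual Euclidean matrix.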
Optimum search strategies or novel 3D molecular descriptors: Is there a stalemate?
(Bentham Science Publishers B.V., 2015) Marrero-Ponce Y.; García-Jacas C.R.; Barigye S.J.; Valdés-Martiní J.R.; Rivera-Borroto O.M.; Pino-Urias R.W.; Cubillán, Néstor; Alvarado Y.J.; Le-Thi-Thu H.
The present manuscript describes a novel 3D-QSAR alignment-free method (QuBiLS-MIDAS Duplex) based on algebraic bilinear, quadratic and linear forms on the kth two-tuple spatial-(dis)similarity matrix. Generalization schemes for the inter-atomic spatial distance using diverse (dis)-similarity measures are discussed. On the other hand, normalization approaches for the two-tuple spatial-(dis)similarity matrix by using simple- and double-stochastic and mutual probability schemes are introduced. With the aim of taking into consideration particular inter-atomic interactions in total or local-fragment indices, path and length cut-off constraints are used. Also, in order to generalize the use of the linear combination of atom-level indices to yield global (molecular) definitions, a set of aggregation operators (invariants) is applied. A Shannon’s entropy based variability study for the proposed 3D algebraic form-based indices and the DRAGON molecular descriptor families demonstrates superior performance for the former. A principal component analysis reveals that the novel indices codify structural information orthogonal to that captured by the DRAGON indices. Finally, a QSAR study for the binding affinity to the corticosteroid-binding globulin using Cramer’s steroid database is performed. From this study, it is revealed that the QuBiLS-MIDAS Duplex approach yields similar-to-superior performance statistics compared with all the 3D-QSAR methods reported in the literature so far, even with fewer degrees of freedom, using both the 31 steroids as the training set and the popular division of Cramer’s database in training [1-21] and test sets [22-31]. It is thus expected that this methodology provides useful tools for the diversity analysis of compound datasets and high-throughput screening structure–activity data. © 2015 Bentham Science Publishers.
QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents
(Taylor and Francis Ltd., 2015) Medina Marrero R.; Marrero-Ponce Y.; Barigye S.J.; Echeverría Díaz Y.; Acevedo Barrios, Rosa; Casañola-Martín G.M.; García Bernal M.; Torrens, F.; Pérez-Giménez F.
The QuBiLs-MAS approach is used for the in silico modelling of the antifungal activity of organic molecules. To this effect, non-stochastic (NS) and simple-stochastic (SS) atom-based quadratic indices are used to codify chemical information for a comprehensive dataset of 2478 compounds having a great structural variability, with 1087 of them being antifungal agents, covering the broadest antifungal mechanisms of action known so far. The NS and SS index-based antifungal activity classification models obtained using linear discriminant analysis (LDA) yield correct classification percentages of 90.73% and 92.47%, respectively, for the training set. Additionally, these models are able to correctly classify 92.16% and 87.56% of 706 compounds in an external test set. A comparison of the statistical parameters of the QuBiLs-MAS LDA-based models with those for models reported in the literature reveals comparable to superior performance, although the latter were built over much smaller and less diverse datasets, representing fewer mechanisms of action. It may therefore be inferred that the QuBiLs-MAS method constitutes a valuable tool useful in the design and/or selection of new and broad spectrum agents against life-threatening fungal infections. © 2015 Taylor & Francis.
Relational Agreement Measures for Similarity Searching of Cheminformatic Data Sets
(Institute of Electrical and Electronics Engineers Inc., 2016) Rivera-Borroto O.M.; García-De La Vega J.M.; Marrero-Ponce Y.; Grau R.
Research on similarity searching of cheminformatic data sets has been focused on similarity measures using fingerprints. However, nominal scales are the least informative of all metric scales, increasing the tied similarity scores and decreasing the effectiveness of the retrieval engines. Tanimoto's coefficient has been claimed to be the most prominent measure for this task. Nevertheless, this field is far from being exhausted since the computer science no free lunch theorem predicts that "no similarity measure has overall superiority over the population of data sets". We introduce 12 relational agreement (RA) coefficients for seven metric scales, which are integrated within a group fusion-based similarity searching algorithm. These similarity measures are compared to a reference panel of 21 proximity quantifiers over 17 benchmark data sets (MUV), by using informative descriptors, a feature selection stage, a suitable performance metric, and powerful comparison tests. In this stage, RA coefficients perform favourably with respect to the state-of-the-art proximity measures. Afterward, the RA-based method outperforms four other nearest neighbor searching algorithms over the same data domains. In a third validation stage, RA measures are successfully applied to the virtual screening of the NCI data set. Finally, we discuss a possible molecular interpretation for these similarity variants. © 2016 IEEE.
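For readers unfamiliar with the fingerprint baseline the abstract above refers to, the following is a minimal sketch of Tanimoto's coefficient on binary fingerprints combined with a simple group fusion ranking. It illustrates the baseline only, not the relational agreement coefficients of the paper. The MAX fusion rule, the convention of returning 0.0 for two empty fingerprints, and all names and toy fingerprints are assumptions made for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Assumed convention: two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def group_fusion_rank(references, database, fuse=max):
    """Rank database entries by the fused similarity to several reference fingerprints.

    `fuse` combines the per-reference scores; max is one common choice (assumed here).
    """
    scores = {
        name: fuse(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical fingerprints encoded as sets of set-bit indices.
    references = [{1, 2, 3}]
    database = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9}}
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(group_fusion_rank(references, database))
```

Because binary fingerprints yield only a limited set of distinct score values, ties in such a ranking are frequent, which is the limitation of nominal scales that the abstract points out.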
Towards better BBB passage prediction using an extensive and curated data set
(Wiley-VCH Verlag, 2015) Brito-Sánchez Y.; Marrero-Ponce Y.; Barigye S.J.; Yaber Goenaga, Iván; Morell Pérez C.; Le-Thi-Thu H.; Cherkasov A.
In the present report, the challenging task of drug delivery across the blood-brain barrier (BBB) is addressed via a computational approach. The BBB passage was modeled using classification and regression schemes on a novel extensive and curated data set (the largest to the best of our knowledge) in terms of log BB. Prior to the model development, steps of data analysis that comprise chemical data curation, structural, cutoff and cluster analysis (CA) were conducted. Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR) were used to fit classification and correlation functions. The best LDA-based model showed overall accuracies over 85% and 83% for the training and test sets, respectively. Also, an MLR-based model that explains more than 69% of the variance in the experimental log BB was developed. A brief and general interpretation of the proposed models allowed the estimation of how 'near' our computational approach is to the factors that determine the passage of molecules through the BBB. In a final effort, some popular and powerful Machine Learning methods were considered. Comparable or similar performance was observed with respect to the simpler linear techniques. Most of the compounds with anomalous behavior were put aside into a set denoted as the controversial set, and a discussion of these compounds is provided. Finally, our results were compared with methodologies previously reported in the literature, showing comparable to better results. The results could represent useful tools available to and reproducible by the whole scientific community in the early stages of neuropharmaceutical drug discovery/development projects. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.