Predicting the Molecular Targets of Conopeptides by using Principal Component Analysis and Multiclass Logistic Regression
Xavier Eugenio Asuncion1,4, Abdul-Rashid Sampaco III2,4,
Henry Adorna2,4, Joselito Magadia3,4, Vena Pearl Boñgolan2,4, and Arturo Lluisma1,4*
Computational tools for inferring molecular targets from primary structure would be crucial in exploiting the wealth of available sequence data. In this work, we developed a computational method for predicting the molecular targets of conopeptides given only their primary structures. The method combines descriptors calculated from the primary structure with machine learning to build a model that identifies the most likely target among five target types. It is based on principal component analysis (PCA) and multiclass logistic regression. PCA was used to reduce the dimensionality of the data, which improved the model's performance. Under nested cross-validation, multiclass logistic regression with PCA achieved an accuracy of 89%, outperforming other classical machine learning algorithms. We also compared our method to a basic sequence similarity search and found that it produced better overall results. These results suggest that the method may be used as a complement to sequence similarity search in identifying candidate targets of newly sequenced and isolated conopeptides.
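The workflow summarized in the abstract (dimensionality reduction by PCA, a multiclass logistic regression classifier, and evaluation by nested cross-validation) can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic five-class dataset in place of real conopeptide descriptors. The function name `nested_cv_accuracy`, the hyperparameter grid, the fold counts, and the feature scaling step are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_accuracy(X, y, seed=0):
    """Return outer-fold accuracies for a PCA + logistic regression pipeline.

    Hypothetical helper: the grid and fold counts below are assumptions.
    """
    pipeline = Pipeline([
        ("scale", StandardScaler()),                 # assumed preprocessing step
        ("pca", PCA()),                              # dimensionality reduction
        ("clf", LogisticRegression(max_iter=1000)),  # handles multiclass targets
    ])
    # Hyperparameters tuned in the inner loop (illustrative values only).
    param_grid = {
        "pca__n_components": [5, 10, 20],
        "clf__C": [0.1, 1.0, 10.0],
    }
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Inner loop: model selection. Outer loop: performance estimate on
    # folds that were never used to choose the hyperparameters.
    search = GridSearchCV(pipeline, param_grid, cv=inner, scoring="accuracy")
    return cross_val_score(search, X, y, cv=outer, scoring="accuracy")


# Synthetic stand-in for a descriptor matrix with five target classes.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
scores = nested_cv_accuracy(X, y)
print("Outer-fold accuracies:", np.round(scores, 3))
print("Mean accuracy:", round(float(scores.mean()), 3))
```

Because the hyperparameters are chosen only on the inner folds, the outer-fold accuracy is not biased by model selection, which is the concern raised by Varma and Simon (2006). The numbers printed here describe the synthetic data only and say nothing about the 89% accuracy reported for conopeptides.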
With around 800 cone snail species, Conus has emerged as one of the most promising sources of marine drugs (Himaya and Lewis 2018, Gao et al. 2017, Prashanth et al. 2014). Like other venomous organisms, cone snails use their extremely potent venom to capture prey and to protect themselves from potential predators. Over the course of their evolution, each Conus species has developed a unique set of bioactive peptides, commonly referred to as conotoxins or conopeptides. Due to their small size (typically less than 5 kDa), diversity (around 100 per species with little overlap between species), and high specificity to an array of biological targets, conopeptides serve as excellent templates for the design of novel drugs (Prashanth et al. 2014, Lewis et al. 2012). In 2004, the first conopeptide-derived drug, Prialt, was approved by the US Food and Drug Administration (Pope and Deer 2013) and, since then, several other conopeptides have reached the advanced stages of clinical trials (Nielsen et al. 2005, Lubbers et al. 2005, Barton et al. 2004, Sandall et al. 2003). Thus, with more than 80,000 estimated conopeptides and only 6260 conopeptides known to date (Kaas et al. 2007, 2011), cone snail venoms still hold great promise for the discovery of new drug leads.
BARTON M, WHITE H, WILCOX K. 2004. The effect of cgx-1007 and ci-1041, novel NMDA receptor antagonists, on NMDA receptor-mediated EPSCS. Epilepsy Research 59(1): 13–24.
DEGUELDRE M, VERDENAUD M, SHEILA Z, GILLES N, DUCANCEL F, DE PAUW E, QUINTON L. 2017. Diversity in sequences, folds and pharmacological activities of toxins from four Conus species revealed by the combination of cutting-edge technologies of proteomics, transcriptomics, and bioinformatics. Toxicon 130: 116–125.
DING H, DENG E-Z, YUAN L-F, LIU L, LIN H, CHEN W, CHOU K-C. 2014. iCTX-Type: A sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Research International, Vol. 2014, Article ID 286419, 10 pages.
GAO B, PENG C, YANG J, YI Y, ZHANG J, SHI Q. 2017. Cone snails: A big store of conotoxins for novel drug discovery. Toxins 9(12): 397.
HIMAYA S, LEWIS RJ. 2018. Venomics-accelerated cone snail venom peptide discovery. International Journal of Molecular Sciences 19(3): 788.
JIN A, VETTER I, HIMAYA S, ALEWOOD P, LEWIS R, DUTERTRE S. 2015. Transcriptome and proteome of Conus planorbis identify the nicotinic receptors as primary target for the defensive venom. Proteomics 15(23–24): 4030–4040.
KAAS Q, WESTERMANN J, HALAI R, WANG C, CRAIK D. 2007. Conoserver, a database for conopeptide sequences and structures. Bioinformatics 24(3): 445–446.
KAAS Q, YU R, JIN A, DUTERTRE S, CRAIK D. 2011. Conoserver: Updated content, knowledge, and discovery tools in the conopeptide database. Nucleic Acids Research 40(D1): D325–D330.
LAVERGNE V, HARLIWONG I, JONES A, MILLER D, TAFT R, ALEWOOD P. 2015. Optimized deep-targeted proteotranscriptomic profiling reveals unexplored Conus toxin diversity and novel cysteine frameworks. Proceedings of the National Academy of Sciences 112(29): E3782–E3791.
LEWIS R, DUTERTRE S, VETTER I, CHRISTIE M. 2012. Conus venom peptide pharmacology. Pharmacological Reviews 64: 259–298.
LUBBERS N, CAMPBELL T, POLAKOWSKI J, BULAJ G, LAYER R, MOORE J, GROSS G, COX B. 2005. Postischemic administration of cgx-1051, a peptide from cone snail venom, reduces infarct size in both rat and dog models of myocardial ischemia and reperfusion. Journal of Cardiovascular Pharmacology 46(2): 141–146.
MANSBACH RA, TRAVERS T, McMAHON BH, FAIR JM, GNANAKARAN S. 2019. Snails in silico: A review of computational studies on the conopeptides. Marine Drugs 17(3): 145.
NIELSEN C, LEWIS R, ALEWOOD D, DRINKWATER R, PALANT E, PATTERSON M, YAKSH T, McCUMBER D, SMITH M. 2005. Anti-allodynic efficacy of the conopeptide, xen2174, in rats with neuropathic pain. Pain 118(1–2): 112–124.
OLDRATI V, ARRELL M, VIOLETTE A, PERRET F, SPRÜNGLI X, WOLFENDER JL, STÖCKLIN R. 2016. Advances in venomics. Molecular Biosystems 12(12): 3530–3543.
PEDREGOSA F, VAROQUAUX G, GRAMFORT A, MICHEL V, THIRION B, GRISEL O, BLONDEL M, PRETTENHOFER P, WEISS R, DUBOURG V, VANDERPLAS J, PASSOS A, COURNAPEAU D, BRUCHER M, PERROT M, DUCHESNAY E. 2011. Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12: 2825–2830.
POPE J, DEER T. 2013. Ziconotide: A clinical update and pharmacologic review. Expert Opinion on Pharmacotherapy 14(7): 957–966.
PRASHANTH J, BRUST A, JIN A, ALEWOOD P, DUTERTRE S, LEWIS R. 2014. Cone snail venomics: From novel biology to novel therapeutics. Future Medicinal Chemistry 6(15): 1659–1675.
SAFAVI-HEMAMI H, HU H, GORASIA D, BANDYOPADHYAY P, VEITH P, YOUNG N, REYNOLDS E, YANDELL M, OLIVERA B, PURCELL A. 2014. Combined proteomic and transcriptomic interrogation of the venom gland of Conus geographus uncovers novel components and functional compartmentalization. Mol Cell Proteomics 13(4): 938–953.
SANDALL D, SATKUNANATHAN N, KEAYS D, POLIDANO M, LIPING X, PHAM V, DOWN J, KHALIL Z, LIVETT B, GAYLER K. 2003. A novel conotoxin identified by gene sequencing is active in suppressing the vascular response to selective stimulation of sensory nerves in vivo. Biochemistry 42(22): 6904–6911.
SIEVERS F, HIGGINS D. 2018. Clustal omega for making accurate alignments of many protein sequences. Protein Science 27: 135–145.
SIEVERS F, WILM A, DINEEN D, GIBSON T, KARPLUS K, LI W, LOPEZ R, McWILLIAM H, REMMERT M, SÖDING J, THOMPSON J, HIGGINS D. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega. Molecular Systems Biology 7(1): 539.
THARWAT A. 2016. Principal component analysis—A tutorial. International Journal of Applied Pattern Recognition 3(3): 197–240.
VARMA S, SIMON R. 2006. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 7(1): 91.
VIOLETTE A, BIASS D, DUTERTRE S, KOUA D, PIQUEMAL D, PIERRAT F, STÖCKLIN R, FAVREAU P. 2012. Large-scale discovery of conopeptides and conoproteins in the injectable venom of a fish-hunting cone snail using a combined proteomic and transcriptomic approach. Journal of Proteomics 75(17): 5215–5225.
WALSH I, POLLASTRI G, TOSATTO S. 2017. Correct machine learning on protein sequences: A peer-reviewing perspective. Briefings in Bioinformatics 5: 831–840.
WANG XF, WANG JM, WANG XL, ZHANG Y. 2017. Predicting the types of ion channel-targeted conotoxins based on avc-svm model. BioMed Research International, Vol. 2017, Article ID 2929807, 8 pages.
WU Y, ZHENG Y, TANG H. 2016. Identifying the types of ion channel-targeted conotoxins by incorporating new properties of residues into pseudo amino acid composition. BioMed Research International, Vol. 2016, Article ID 3981478, 5 pages.
XIAO N, CAO DS, ZHU MF, XU QS. 2015. Protr/protrweb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31(11): 1857–1859.
YUAN LF, DING C, GUO SH, DING H, CHEN W, LIN H. 2013. Prediction of the types of ion channel-targeted conotoxins based on radial basis function network. Toxicology In Vitro 27(2): 852–856.
ZHANG L, ZHANG C, GAO R, YANG R, SONG Q. 2016. Using the smote technique and hybrid features to predict the types of ion channel-targeted conotoxins. Journal of Theoretical Biology 403: 75–84.