Transcriptome of the Traditional Coconut Variety Laguna Tall

Ma. Regina Punzalan1,2, Ma. Anita Bautista1,2, Ernesto Emmanuel3,
Ramon Rivera3, Susan Rivera3, and Cynthia Saloma1,2*

1Philippine Genome Center (PGC), University of the Philippines,
Diliman, Quezon City 1101 Philippines
2National Institute of Molecular Biology and Biotechnology,
University of the Philippines, Diliman, Quezon City 1101 Philippines
3Philippine Coconut Authority – Zamboanga Research Center (PCA-ZRC),
San Ramon, Zamboanga City 7000 Philippines

*Corresponding Author




Coconut, Cocos nucifera L., is widely cultivated for its edible and non-edible products. In the Philippines, the traditional coconut variety Laguna Tall (LAGT) exhibits good genetic potential as a pure population or as an open-pollinated variety (OPV). It was the male parent of the first hybrid recommended by the Philippine Coconut Authority (PCA). Because of its importance in agronomic breeding, efforts are geared toward expanding its genetic resources through genome and transcriptome sequencing. Here, pooled total RNA from leaves, nuts, and flowers of mature-stage LAGT was sequenced using Illumina HiSeq 2000, followed by de novo assembly using four different transcriptome assemblers: Trinity, SOAPdenovo-Trans, Trans-Abyss, and Velvet-Oases. Each assembly was evaluated for accuracy using RSEM-EVAL, a reference-free evaluation method for transcript abundance data. Trans-Abyss outperformed the other three assemblers, but to obtain a better representation of the LAGT transcriptome, the assemblies generated by the four programs were combined using the Evidential Gene tr2aacds pipeline. A total of 79,263 transcripts were generated from the combined transcriptomes. Fragments per kilobase of transcript per million mapped reads (FPKM) units were also used to quantify gene expression in silico. A total of 68,147 transcripts were generated by RSEM and compared against the CDD, TrEMBL, and UniProt databases. Gene ontology (GO) analysis and KEGG classification revealed that up to 33.8% of LAGT genes are involved in protein modification. The top 20 expressed genes were annotated using the nr database, which revealed that the most highly expressed transcript is a novel transcript. Microsatellite markers were also obtained for future use as breeding tools. Overall, this study provides a comprehensive assembly of the Cocos nucifera L. transcriptome that is useful as a molecular toolbox for identifying key factors involved in important biological and cellular processes in coconut.
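As an illustration of the FPKM unit mentioned above, the following minimal sketch applies the textbook definition: fragment count, divided by transcript length in kilobases, divided by millions of mapped fragments. It is not the study's pipeline. The function name, the use of raw rather than effective transcript length, and the example counts are all assumptions made for illustration.

```python
def fpkm(fragments: float, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumption: uses the raw transcript length in base pairs; quantifiers
    such as RSEM may use an effective length instead.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical numbers, not taken from the study:
# 500 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments.
example_value = fpkm(fragments=500, transcript_length_bp=2000, total_mapped_fragments=10_000_000)
print(example_value)  # 25.0
```

Because the value is normalized by both transcript length and library size, transcripts of different lengths can be compared within the same sample.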



Coconut is an economically significant crop in the tropics, particularly in the Philippines, Indonesia, and India. Commonly referred to as the “Tree of Life,” this palm is widely cultivated in 93 countries and is a global source of edible and non-edible products such as copra, desiccated coconut, coconut oil, coco lumber, and coco coir (Arancon 2010). The Philippines is the top exporter of coconut in the world, with coconut exports amounting to USD 1,586 million per year, equivalent to 30% of the country’s total agricultural earnings (Forbes 2013). In 2018, the country exported 350,000 metric tons of copra and 1 million metric tons of coconut oil (Index Mundi 2019). More than 3.5 million hectares of agricultural land are used as coconut plantations, with an average production of 14.9 billion nuts per year (PCA 2019). The yield of coconut palms depends on the interaction of many factors: the variety; the growing environment, particularly soil conditions and climate; cultural practices such as fertilizer application and pest and disease management; and farming systems. In terms of varieties, the Philippines has several that are used in commercial planting. Among these is LAGT, which is one of the major tall populations grown in the country (Santos and Rivera 1994). It was named as . . .


AMBAWAT S, SHARMA P, YADAV NR, YADAV RC. 2013. MYB transcription factor genes as regulators of plant responses; an overview. Physiology and Molecular Biology of Plants 19(3): 307–321.
ANDREWS S. 2010. FastQC: A quality control tool for high throughput sequence data. Retrieved from on 22 Jun 2016.
ARANCON RN. 2010. Global trends and new opportunities for the coconut industry. Proceedings National Coconut Conference 2009: Opportunities for a Sunrise Industry. Malaysian Agricultural Research and Development Institute.
BARKAN A, SMALL I. 2014. Pentatricopeptide repeat proteins in plants. Annual Review of Plant Biology 65: 415–442.
BARTEL B. 2012. Focus on ubiquitin in plant physiology. Plant Physiology 160(1): 1.
BAUD S, LEPINIEC L. 2010. Physiological and developmental regulation of seed oil production. Progress in Lipid Research 49(3): 235–249.
BOLGER AM, LOHSE M, USADEL B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics: btu170.
BUSSEMER J, CHIGRI F, VOTHKNECHT UC. 2009. Arabidopsis ATPase family gene 1-like protein 1 is a calmodulin-binding AAA+-ATPase with a dual localization in chloroplasts and mitochondria. FEBS Journal 276: 3870–3880.
CLARKE K, YANG Y, MARSH R, XIE L, ZHANG KK. 2013. Comparative analysis of de novo transcriptome assembly. Science China Life Sciences 56(2): 156–162.
CONESA A, GÖTZ S, GARCÍA-GÓMEZ JM, TEROL J, TALÓN M, ROBLES M. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18): 3674–3676.
DAVIDSON NM, OSHLACK A. 2014. Corset: Enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biology 15(7): 1.
DUAN J, XIA C, ZHAO G, JIA J, KONG X. 2012. Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data. BMC Genomics 13(1): 392.
FAN H, XIAO Y, YANG Y, XIA W, MASON AS, XIA Z, QIAO F, ZHAO S, TANG H. 2013. RNA-Seq analysis of Cocos nucifera: Transcriptome sequencing and de novo assembly for subsequent functional genomics approaches. PLoS One 8(3): e59997.
FORBES EG. 2013. Outlook for the coconut industry. Retrieved from on 01 Sep 2015.
GARG R, PATEL RK, TYAGI AK, JAIN M. 2011. De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Research 18(1): 53–63.
GILBERT D. 2015. Gene-omes built from mRNA seq not genome DNA. 7th annual arthropod genomics symposium. Notre Dame. Retrieved from on 25 Sep 2015.
GRABHERR MG, HAAS BJ, YASSOUR M, LEVIN JZ, THOMPSON DA, AMIT I, ADICONIS X, FAN L, RAYCHOWDHURY R, ZENG Q, CHEN Z. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7): 644–652.
HUANG YY, LEE CP, FU JL, CHANG BCH, MATZKE AJ, MATZKE M. 2014. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation. G3: Genes|Genomes|Genetics 4(11): 2147–2157.
INDEX MUNDI. 2019. Coconut oil production by country in 1000 MT. Retrieved from on 11 Mar 2019.
IWANAGA D, GRAY DA, FISK ID, DECKER EA, WEISS J, MCCLEMENTS DJ. 2007. Extraction and characterization of oil bodies from soy beans: A natural source of pre-emulsified soybean oil. Journal of Agricultural and Food Chemistry 55(21): 8711–8716.
KENT WJ. 2002. BLAT—the BLAST-like alignment tool. Genome Research 12(4): 656–664.
LANG T, YIN K, LIU J, CAO K, CANNON CH, DU FK. 2014. Protein domain analysis of genome sequence data reveals regulation of LRR-related domain in plant transpiration in Ficus. PLoS One 9(9): e108719.
LANGADEL DM, SHAHI JP, AGRAWAL VK, SHARMA A. 2013. Maize as emerging source of oil in India: an overview. Maydica 58(3–4): 224–230.
LANGMEAD B, SALZBERG SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature Methods 9(4): 357–359.
LEE H, SUH SS, PARK E, CHO E, AHN JH, KIM SG, LEE JS, KWON YM, LEE I. 2000. The AGAMOUS-LIKE 20 MADS domain protein integrates floral inductive pathways in Arabidopsis. Genes & Development 14(18): 2366–2376.
LI B, DEWEY CN. 2011. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12(1): 1.
LI B, FILLMORE N, BAI Y, COLLINS M, THOMSON JA, STEWART R, DEWEY CN. 2014. Evaluation of de novo transcriptome assemblies from RNA-seq data. Genome Biology 15(21): 1.
LIU S, LI W, WU Y, CHEN C, LEI J. 2013. De novo transcriptome assembly in chili pepper (Capsicum frutescens) to identify genes involved in the biosynthesis of capsaicinoids. PLoS One 8(1): e48156.
MUNDRY M, BORNBERG-BAUER E, SAMMETH M, FEULNER PGD. 2012. Evaluating characteristics of de novo assembly software on 454 transcriptome data: A simulation approach. PLoS One 7(2): e31410.
MUSACCHIA F, BASU S, PETROSINO G, SALVEMINI M, SANGES R. 2015. Annocript: A flexible pipeline for the annotation of transcriptomes also able to identify putative long noncoding RNAs. Bioinformatics 31(13): 2199–2201.
NAKASUGI K, CROWHURST RN, BALLY J, WOOD CC, HELLENS RP, WATERHOUSE PM. 2013. De novo transcriptome sequence assembly and analysis of RNA silencing genes of Nicotiana benthamiana. PLoS One 8(3): e59534.
NEJAT N, CAHILL DM, VADAMALAI G, ZIEMANN M, ROOKES J, NADERALI N. 2015. Transcriptomics-based analysis using RNA-Seq of the coconut (Cocos nucifera) leaf in response to yellow decline phytoplasma infection. Molecular Genetics and Genomics 290(5): 1899–1910.
NELSON DR, MING R, ALAM M, SCHULER MA. 2008. Comparison of cytochrome P450 genes from six plant genomes. Tropical Plant Biology 1(3): 216–235.
[PCA] Philippine Coconut Authority. 2019. Coconut Statistics 2019. Retrieved from on 12 Mar 2019.
PORTEREIKO MF, LLOYD A, STEFFEN JG, PUNWANI JA, OTSUGA D, DREWS GN. 2006. AGL80 is required for central cell and endosperm development in Arabidopsis. The Plant Cell 18(8): 1862–1872.
REYNOLDS KB, CULLERNE DP, EL TAHCHY A, ROLLAND V, BLANCHARD CL, WOOD CC, SINGH SP, PETRIE JR. 2019. Identification of genes involved in lipid biosynthesis through de novo transcriptome assembly from Cocos nucifera developing endosperm. Plant and Cell Physiology 60(5): 945–960.
ROBERTSON G, SCHEIN J, CHIU R, CORBETT R, FIELD M, JACKMAN SD, MUNGALL K, LEE S, OKADA HM, QIAN JQ, GRIFFITH M. 2010. De novo assembly and analysis of RNA-seq data. Nature Methods 7(11): 909–912.
SANTOS GA, RIVERA RL. 1994. Coconut Breeding Programme of the Philippines. Papers presented at a workshop on Standardization of Coconut Breeding Research Techniques, Port Bouet, Côte d’Ivoire. p. 42–58.
SANTOS GA, RIVERA SM, RIVERA RL, BAYLON GB, DELA CRUZ BV. 1993. Comparative investment analysis of recommended coconut hybrids/cultivars for the National Planting/Replanting Programme in the Philippines [Annual Report]. Quezon City, Philippines: Philippine Coconut Authority. 17p.
SANTOS GA, RIVERA SM, RIVERA RL, BAYLON GB, DELA CRUZ BV. 1995. Comparative investment analysis of recommended coconut hybrids/cultivars for the National Planting/Replanting Programme in the Philippines. CORD 9(2): 1–38.
SCHROEDER A, MUELLER O, STOCKER S, SALOWSKY R, LEIBER M, GASSMANN M, LIGHTFOOT S, MENZEL W, GRANZOW M, RAGG T. 2006. The RIN: An RNA integrity number for assigning integrity values to RNA measurements. BMC Molecular Biology 7: 3.
SCHULZ MH, ZERBINO DR, VINGRON M, BIRNEY E. 2012. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28(8): 1086–1092.
SHI X, BENTOLILA S, HANSON M. 2016. Organelle RNA recognition motif-containing (ORRM) proteins are plastid and mitochondrial editing factors in Arabidopsis. Plant Signaling & Behavior 11(5): e1167299.
SILOTO RMP, FINDLAY K, VILLALOBOS AL, YEUNG EC, NYKIFORUK CL, MOLONEY MM. 2006. The accumulation of oleosins determines the size of seed oilbodies in Arabidopsis. The Plant Cell 18(8): 1961–1974.
SIMAO FA, WATERHOUSE RM, IOANNIDIS P, KRIVENTSEVA EV, ZDOBNOV EM. 2015. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19): 3210–3212.
SIMPSON JT, WONG K, JACKMAN SD, SCHEIN JE, JONES SJ, BIROL I. 2009. ABySS: A parallel assembler for short read sequence data. Genome Research 19(6): 1117–1123.
SUBROTO AP, UTOMO C, DARMAWAN C, TANJUNG ZA, LIWANG T. 2015. Isolation and characterization of oil palm Wrinkled 1 (WRI1) Gene. Procedia Chemistry 14: 40–46.
THIEL T, MICHALEK W, VARSHNEY R, GRANER A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theoretical and Applied Genetics 106(3): 411–422.
TUTEJA N, GILL SS, TIBURCIO AF, TUTEJA R. 2012. Helicases in improving abiotic stress tolerance in crop plants. In: Improving Crop Resistance to Abiotic Stress, 1st ed. Tuteja N, Gill SS, Tiburcio AF, Tuteja R eds. Weinheim, Germany: Wiley-VCH Verlag GmbH & Co. p. 433–447.
XIE Y, WU G, TANG J, LUO R, PATTERSON J, LIU S, HUANG W, HE G, GU S, LI S, ZHU X. 2014. SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30(12): 1660–1666.