J Plant Biotechnol 2019; 46(4): 274-281
Published online December 31, 2019
https://doi.org/10.5010/JPB.2019.46.4.274
© The Korean Society of Plant Biotechnology
Correspondence to: e-mail: jongkook@kangwon.ac.kr
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Transcriptome, Medicinal plant, Pistacia
The genus
Several
Various studies have shown that most
RNA sequencing (RNA-seq) is a preferred approach for obtaining transcriptome information from a target organism because it is cost- and time-effective (Lee et al. 2015). Because RNA-seq generates transcriptome information quickly, it is frequently used to analyze the transcriptomes of non-model organisms, including medicinal plants (Bae et al. 2018; Eum et al. 2019). An increasing number of previously unexplored medicinal plants have been sequenced with this technology. These efforts provide genomic resources for unravelling the genes and biosynthetic pathways involved in metabolite biosynthesis (Bae et al. 2018; Eum et al. 2019; Kotwal et al. 2016; Loke et al. 2016; Rai et al. 2016). However, the vast majority of medicinal plants have yet to be studied.
In this study, RNA-seq transcriptome analysis was performed to characterize genomic features of
Fresh leaf tissues of a fully grown
Leaf samples of
Before assembly, the Trimmomatic tool (Bolger et al. 2014) was used to remove low-quality reads (< Q20) and reads shorter than 50 bp. The subsequent analyses followed the methods described by Bae et al. (2018). Briefly, three different assemblers were used for
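The read-filtering criteria above can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding and interprets "< Q20" as a mean read quality below 20. This is roughly analogous to Trimmomatic's AVGQUAL and MINLEN steps, but it is a simplified stand-in, not Trimmomatic itself, and the exact Trimmomatic settings used in the study are not reproduced here. The function and constant names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

MIN_MEAN_QUALITY = 20  # assumed interpretation of "< Q20" as mean read quality
MIN_LENGTH = 50        # reads shorter than 50 bp are discarded


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq: str, quality: str) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    return len(seq) >= MIN_LENGTH and mean_phred(quality) >= MIN_MEAN_QUALITY


def filter_reads(reads: Iterable[Tuple[str, str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) records that pass both criteria."""
    for read_id, seq, qual in reads:
        if passes_filter(seq, qual):
            yield read_id, seq, qual


if __name__ == "__main__":
    reads = [
        ("r1", "ACGT" * 15, "I" * 60),  # 60 bp, Q40 throughout: kept
        ("r2", "ACGT" * 10, "I" * 40),  # 40 bp: too short
        ("r3", "ACGT" * 15, "#" * 60),  # Q2 throughout: low quality
    ]
    print([r[0] for r in filter_reads(reads)])  # ['r1']
```

Real trimmers usually clip low-quality ends (for example with a sliding window) before applying a length cutoff, so a read may survive in shortened form rather than being dropped whole. The sketch only shows the two whole-read criteria stated in the text.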
GO annotation of the unigenes was performed with the Blast2GO analysis tool, and the resulting annotations were used for functional classification with the WEGO software. For KEGG pathway search (
To examine SSR accumulation in the unigenes from
RNA-seq whole-transcriptome sequencing generated 20.8 million raw reads (~2.6 Gb) from
Table 1. Summary of sequencing and assembly data
Data description | Data summary |
---|---|
Total number of raw reads | 10,621,059 |
Total length of raw reads (bp) | 2,676,506,868 |
Number of filtered reads used for assembly | 9,625,676 |
Total length of filtered reads (bp) | 2,358,007,393 |
Number of assembled contigs (unigenes) | 18,524 |
Total length of assembled contigs (bp) | 16,174,683 |
Average length (bp) | 873 |
Length of largest contig (bp) | 9,942 |
N50 (bp) | 1,104 |
GC content (%) | 40.7 |
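The summary statistics in Table 1 follow standard definitions, sketched below in Python. N50 is the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. The mean length is the total assembled length divided by the number of contigs. The toy contig lengths are made up for illustration; the other numbers are taken from Table 1, and the helper names are hypothetical.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Shortest contig length among the longest contigs covering >= 50% of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
    raise ValueError("empty assembly")


def mean_length(total_bp: int, n_contigs: int) -> float:
    """Average contig length in bp."""
    return total_bp / n_contigs


if __name__ == "__main__":
    # Toy data: 500 + 400 = 900 >= 1500 / 2, so N50 is 400.
    print(n50([100, 200, 300, 400, 500]))           # 400
    print(round(mean_length(16_174_683, 18_524)))   # 873, matching Table 1
    print(round(100 * 9_625_676 / 10_621_059, 1))   # 90.6 (% of raw reads kept after filtering)
```

Computing the reported N50 of 1,104 bp would require the full list of unigene lengths, which is not given in the text.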
Length distribution of unigenes from transcriptome of
To annotate the unigenes, their protein sequences were searched for similarity against the NCBI non-redundant (NR) protein database. Of the 18,524 unigenes, 17,814 (96.2%) aligned to protein sequences from other organisms, whereas 710 (3.8%) showed no similarity to known proteins (Fig. 2). The top five plant species with the most hits to annotated unigenes were
Top five plant species with the most homologous genes
To examine the content of repetitive sequences, RepeatMasker (
An SSR search with the MISA tool identified a total of 2,629 perfect SSRs in 2,041 unigenes (Supplementary Table S2). Of the SSR-containing unigenes, 393 had more than one SSR. The overall SSR frequency was 162.5 per million base pairs (Mbp). Tri-nucleotide SSRs were the most abundant, with 2,343 (89.1%) occurrences, followed by di-nucleotide SSRs with 142 (5.4%) occurrences. The frequency of tri-nucleotide SSRs was 144.9 per Mbp (Fig. 3). By motif type, the AAG/CTT motif had the highest frequency at 43.5 occurrences per Mbp, followed by the ACC/GGT motif at 22.8 per Mbp (Fig. 4; Supplementary Table S2). Among di-nucleotide repeats, the AG/CT motif had the highest frequency (7.6 occurrences per Mbp), while among tetra-nucleotide repeats the AAAG/CTTT motif was the most frequent (0.6 per Mbp) (Fig. 4; Supplementary Table S2).
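The per-Mbp frequencies and motif classes reported above can be reproduced with a small Python sketch. It assumes that the frequency denominator is the total assembled unigene length from Table 1 (16,174,683 bp), which reproduces the reported 162.5 and 144.9 per Mbp. It also assumes a MISA-style motif grouping in which a motif, its rotations, and its reverse complement are pooled, so that GAA, AGA, TTC and CTT all count as AAG/CTT. The helper names are hypothetical, and this is not the MISA implementation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA motif."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def min_rotation(motif: str) -> str:
    """Lexicographically smallest rotation, e.g. GAA -> AAG."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif: str) -> str:
    """Pool a motif with its rotations and reverse complement, e.g. TGG -> ACC/GGT."""
    forward = min_rotation(motif)
    reverse = min_rotation(revcomp(motif))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"


def per_mbp(count: int, total_bp: int) -> float:
    """Occurrences per million base pairs of assembled sequence."""
    return count / (total_bp / 1_000_000)


if __name__ == "__main__":
    TOTAL_BP = 16_174_683  # total length of assembled unigenes (Table 1)
    print(round(per_mbp(2_629, TOTAL_BP), 1))  # 162.5 (all SSRs)
    print(round(per_mbp(2_343, TOTAL_BP), 1))  # 144.9 (tri-nucleotide SSRs)
    print(round(100 * 2_343 / 2_629, 1))       # 89.1 (% tri-nucleotide)
    for motif in ("GAA", "TGG", "TC", "CTTT"):
        print(motif, "->", motif_class(motif))
```

With this grouping, GAA maps to AAG/CTT, TGG to ACC/GGT, TC to AG/CT and CTTT to AAAG/CTTT, which are the class labels used in the text.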
SSR distribution and frequencies by repeat unit size
SSR distribution by motif types
To examine the functional classification of annotated genes, unigenes were assigned GO terms using Blast2GO software (Conesa et al. 2005) and classified with the WEGO tool. At level one, 9,020 annotated unigenes were classified into molecular function, 6,133 into biological process, and 2,622 into cellular component. Within molecular function, most unigenes fell into two major categories: binding (5,999) and catalytic activity (4,493) (Fig. 5). Within biological process, most fell into three subcategories: metabolic process (4,728), cellular process (4,139), and single-organism process (2,706) (Fig. 5). Within cellular component, most fell into four subcategories: cell (1,659), membrane (1,225), organelle (1,109), and macromolecular complex (810) (Fig. 5; Supplementary Table S3).
Gene Ontology functional categorization of annotated unigenes
To identify unigenes involved in KEGG metabolic pathways, the KEGG BlastKOALA online tool was used to assign KEGG Orthology (KO) numbers, through which a total of 6,553 unigenes of
Table 2. Categories of KEGG metabolic pathways and their associated entries and unigenes
Category | No. of subcategories | No. of pathways | No. of entries | No. of associated genes |
---|---|---|---|---|
Metabolism | 12 | 138 | 3324 | 6715 |
Genetic Information Processing | 4 | 22 | 859 | 1520 |
Environmental Information Processing | 3 | 35 | 424 | 1242 |
Cellular Processes | 5 | 31 | 549 | 1280 |
Organismal Systems | 10 | 84 | 607 | 1608 |
Human Diseases | 12 | 81 | 935 | 2131 |
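The column totals of Table 2 can be checked with the short Python sketch below. The rows are transcribed from the table, and the helper name is hypothetical. The pathway column sums to 391, the number of pathways cited in the discussion. The gene column sums to 14,496, well above the 4,061 unigenes that the discussion reports as assigned to KEGG pathways. This presumably reflects that one unigene can be counted under several pathways and categories, but that is an interpretation, not something stated in the source.

```python
from typing import List, Tuple

# (category, subcategories, pathways, entries, associated genes), transcribed from Table 2
TABLE_2: List[Tuple[str, int, int, int, int]] = [
    ("Metabolism", 12, 138, 3324, 6715),
    ("Genetic Information Processing", 4, 22, 859, 1520),
    ("Environmental Information Processing", 3, 35, 424, 1242),
    ("Cellular Processes", 5, 31, 549, 1280),
    ("Organismal Systems", 10, 84, 607, 1608),
    ("Human Diseases", 12, 81, 935, 2131),
]


def column_total(rows: List[Tuple[str, int, int, int, int]], index: int) -> int:
    """Sum one numeric column (1 = subcategories, 2 = pathways, 3 = entries, 4 = genes)."""
    return sum(row[index] for row in rows)


if __name__ == "__main__":
    print("subcategories:", column_total(TABLE_2, 1))     # 46
    print("pathways:", column_total(TABLE_2, 2))          # 391
    print("entries:", column_total(TABLE_2, 3))           # 6698
    print("gene assignments:", column_total(TABLE_2, 4))  # 14496
```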
Among all pathways, the four with the most entry enzyme hits were metabolic pathways (821 entries), biosynthesis of secondary metabolites (381), biosynthesis of antibiotics (194), and microbial metabolism in diverse environments (144) (Fig. 6).
Top 10 KEGG pathways with the most entry enzymes identified from
The biosynthesis of many secondary metabolites is closely tied to metabolic pathways such as terpenoid metabolism and phenylpropanoid and flavonoid biosynthesis. Twenty-two unigenes were found for 16 entries of the carotenoid biosynthesis pathway (Table 3). In the phenylpropanoid biosynthesis pathway, 63 unigenes were found to encode enzymes for 17 entries. For the flavonoid biosynthesis pathway, 24 unigenes were found for 14 entries (Table 3; Supplementary Table S4).
Table 3. KEGG metabolic pathways related to the biosynthesis of various medicinal metabolites
Metabolic pathway | KEGG map ID | No. of entries | No. of unigenes* |
---|---|---|---|
Carotenoid biosynthesis | 00906 | 16 | 22 |
Sesquiterpenoid and triterpenoid biosynthesis | 00909 | 5 | 8 |
Diterpenoid biosynthesis | 00904 | 5 | 6 |
Monoterpenoid biosynthesis | 00902 | 2 | 5 |
Phenylpropanoid biosynthesis | 00940 | 17 | 63 |
Flavonoid biosynthesis | 00941 | 14 | 24 |
Isoflavonoid biosynthesis | 00943 | 1 | 2 |
Flavone and flavonol biosynthesis | 00944 | 1 | 1 |
*Number of unigenes that can encode enzymes for the corresponding entry.
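The Table 3 counts can be totalled with the Python sketch below, which is transcribed from the table. The unigene column sums to 131, matching the number of pathway-related genes cited in the discussion, and the entry column sums to 61. The split into terpenoid-related maps and phenylpropanoid/flavonoid maps is an assumed grouping for illustration only. Carotenoids are tetraterpenoids, so map 00906 is counted with the terpenoid maps. Because the table counts unigenes per map, a unigene assigned to two maps would be counted twice.

```python
from typing import Iterable

# KEGG map ID -> (pathway name, no. of entries, no. of unigenes), transcribed from Table 3
TABLE_3 = {
    "00906": ("Carotenoid biosynthesis", 16, 22),
    "00909": ("Sesquiterpenoid and triterpenoid biosynthesis", 5, 8),
    "00904": ("Diterpenoid biosynthesis", 5, 6),
    "00902": ("Monoterpenoid biosynthesis", 2, 5),
    "00940": ("Phenylpropanoid biosynthesis", 17, 63),
    "00941": ("Flavonoid biosynthesis", 14, 24),
    "00943": ("Isoflavonoid biosynthesis", 1, 2),
    "00944": ("Flavone and flavonol biosynthesis", 1, 1),
}

# Assumed grouping for illustration: terpenoid-related maps, including carotenoids.
TERPENOID_MAPS = {"00906", "00909", "00904", "00902"}


def unigene_total(map_ids: Iterable[str]) -> int:
    """Sum the unigene column over the given KEGG map IDs."""
    return sum(TABLE_3[m][2] for m in map_ids)


if __name__ == "__main__":
    all_maps = set(TABLE_3)
    print(unigene_total(all_maps))                      # 131
    print(sum(entry[1] for entry in TABLE_3.values()))  # 61 entries
    print(unigene_total(TERPENOID_MAPS))                # 41 (terpenoid-related)
    print(unigene_total(all_maps - TERPENOID_MAPS))     # 90 (phenylpropanoid/flavonoid)
```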
Medicinal plants are usually accompanied by rich traditional knowledge of their medicinal use, but very limited genetic information is available for most of them, apart from a few well-known species. Medicinal plants are attracting growing interest as sources of new compounds with important medicinal properties. Without genomic information, however, it is very difficult to identify new lead molecules for pharmaceutical drug development from medicinal plants. It is equally difficult to determine how those molecules are synthesized in the plants. Enriching genomic resources and genetic information is therefore crucial for studying medicinal properties and for identifying potential lead molecules in unexplored medicinal plant species. As interest in medicinal plants grows, an increasing number of these species are being sequenced with advanced sequencing technologies.
Advanced RNA-seq technology has spurred transcriptome analysis of an increasing number of medicinal plants that had not previously been studied. In this study, transcriptome
Many unigenes from the present study showed higher similarity to those of
Many SSR markers have been developed mostly from
Various secondary metabolites have important medicinal effects, and many are known to derive from terpenoid metabolism or other secondary metabolite biosynthesis pathways. KEGG pathway analysis assigned a total of 4,061 unigenes to 391 different metabolic pathways. Of these, 131 unigenes were found to be involved in metabolite biosynthesis pathways, including the terpenoid, phenylpropanoid, and flavonoid pathways; these genes are listed in Supplementary Table S4 (Table 3). Reconstruction of the flavonoid and phenylpropanoid biosynthesis pathways using the unigene information in Supplementary Table S4 revealed that most key genes involved in those pathways are well conserved in
The authors declare that there is no conflict of interest.
This study was supported by the KRIBB initiative program of the Republic of Korea and by a 2017 research grant to JKN from Kangwon National University.
The transcriptome shotgun sequencing data were deposited in NCBI under BioProject PRJNA566127. The SRA accession number is SRR10136265.
Ki-Young Choi · Duck Hwan Park · Eun-Soo Seong · Sang Woo Lee · Jin Hang · Li Wan Yi · Jong-Hwa Kim · Jong-Kuk Na
Department of Controlled Agriculture, Kangwon National University, Chuncheon, Kangwon 24341, Republic of Korea
Division of Bioresource Sciences, Kangwon National University, Chuncheon, Kangwon 24341, Republic of Korea
Department of Medicinal Plants, Suwon Women’s University, Suwon 18333, Republic of Korea
International Biological Material Research Center, KRIBB, Daejeon 34141, Republic of Korea
Yunnan Academy of Agricultural Sciences, Yunnan 650223, China
Department of Horticulture, Kangwon National University, Chuncheon, Kangwon 24341, Republic of Korea