J Plant Biotechnol 2022; 49(1): 46-60
Published online March 31, 2022
https://doi.org/10.5010/JPB.2022.49.1.046
© The Korean Society of Plant Biotechnology
Correspondence to: e-mail: pourrahim@yahoo.com
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The genetic variability and population structure of apple mosaic virus (ApMV) have been studied; however, synonymous codon usage patterns influencing the survival rates and fitness of ApMV have not been reported. Based on phylogenetic analyses of 52 ApMV coat protein (CP) sequences obtained from apple, pear, and hazelnut, ApMV isolates were clustered into two groups. High molecular diversity in GII may indicate their recent expansion. A constant and conserved genomic composition of the CP sequences was inferred from the low codon usage bias. Nucleotide composition and relative synonymous codon usage (RSCU) analysis indicated that the ApMV CP gene is AU-rich, but G- and U-ending codons are favored when coding amino acids. This unequal use of nucleotides, together with parity rule 2 and effective number of codons (ENC) plots, indicates that mutation pressure together with natural selection drives codon usage patterns in the CP gene. However, in this combination, selection pressure plays a more crucial role. Based on principal component analysis plots, ApMV seems to have originated from apple trees in Europe. However, according to the relative codon deoptimization index and codon adaptation index (CAI) analyses, ApMV exhibited the greatest fitness to hazelnut. As inferred from the results of the similarity index analysis, hazelnut has a major role in shaping ApMV RSCU patterns, which is consistent with the CAI analysis results. This study contributes to the understanding of plant virus evolution, reveals novel information about ApMV evolutionary fitness, and helps find better ApMV management strategies.
Keywords: ApMV, codon usage patterns, mutation pressure, natural selection, host adaptation
Understanding the evolution of virus-host interactions is important because viruses evolve rapidly through genetic recombination and mutation, can adapt to new or resistant hosts (Davino et al. 2017; Garcia-Arenal et al. 2001) and to different environmental conditions, and mostly cannot be controlled with effective chemical compounds (Elena et al. 2014). Because virus translation depends on the host cellular machinery, the interaction of a virus with a particular host should be studied in light of its codon usage pattern. A remarkable role of codon usage bias (CUB) in the evolution of viruses has been reported (Angellotti et al. 2007). The codon usage pattern of a virus reflects the evolutionary changes that allow it to optimize its survival and to adapt to the external environment and, most importantly, to its host (Butt et al. 2014). Two major models explain codon usage bias: the natural (translational) selection model and the mutational (neutral) model (Bulmer 1991; Hershberg and Petrov 2008). The natural selection model suggests that synonymous codon usage and transfer RNA (tRNA) abundance are co-adapted to optimize translational efficiency (Zhou et al. 1999). Codon usage adaptation therefore provides efficient use of ribosomes and a maximized growth rate in fast-growing organisms (Hershberg and Petrov 2008). The mutational model hypothesizes that genetic compositional constraints affect the probability of mutational fixation, and this has been observed in numerous RNA viruses (Adams and Antoniw 2003). GC content is likely determined mostly by genome-wide mutation bias rather than by selective forces acting specifically on coding regions. Unfortunately, studies on CUB and its role in the evolution of plant viruses are limited (Adams and Antoniw 2003). Recent advances in sequencing technologies allow the codon usage behavior of viral diseases to be studied (He et al. 2019; He et al. 2017; Liu et al. 2012; Xu et al. 2008).
It is presumed that viral CPs evolved more rapidly than proteins involved in the replication and expression of virus genomes (Callaway et al. 2001), which provides a strong incentive to study the diversity of viruses based on CP genes. Apple mosaic virus is a key species of subgroup III in the genus Ilarvirus (family Bromoviridae) (Bujarski et al. 2012). ApMV causes economic yield losses in pome fruits worldwide. More than 65 species of woody or herbaceous plants belonging to 19 families have been reported as natural or experimental hosts of ApMV (Brunt et al. 1996; Cieslinska and Valasevich 2016; Tzanetakis and Martin 2005). The virus is graft- and mechanically transmissible, persists in infected propagative materials such as scions, rootstocks, or buds, and has no known vector (Fulton 1972). The genome of ApMV is divided into three single-stranded RNA segments, of which RNA3 encodes the coat protein (CP) and the movement protein (MP) (Bujarski et al. 2012). Phylogenetic analysis using complete CP sequences divided ApMV isolates into two major clusters. One cluster comprises isolates from Maloideae and Trebouxia lichen algae, while the second cluster comprises isolates from Prunus, hop, and other woody trees (Grimova et al. 2013). No relation has been shown between the geographic origins and the clustering of ApMV isolates (Crowle et al. 2003; Petrzik 2005).
The genetic variability and population structure of ApMV have already been studied. However, synonymous codon usage patterns and selection pressure analysis, which provide significant information about virus evolution as well as gene expression and function, have not been reported. In this study, patterns of codon usage bias were investigated using 52 complete CP nucleotide sequences of isolates from apple (Malus domestica) and pear (Pyrus sp.), both in the family Rosaceae, and from hazelnut (Corylus sp.), in the family Betulaceae. These analyses reveal novel information about the evolutionary fitness of ApMV.
Fifty-two full-length ApMV CP sequences from apple (n = 36), pear (n = 7), and hazelnut (n = 9) were retrieved from NCBI GenBank. Data on the ApMV isolates, including geographical location, host origin, and time of collection, are shown in Table S1. To clarify the genetic diversity of ApMV, CP sequences were aligned using CLUSTALX2 (Kumar et al. 2018). A maximum likelihood (ML) tree was reconstructed in MEGAX (Kumar et al. 2018) using the K2 + G + I model with 1000 bootstrap replicates. Nucleotide diversity was estimated using the Kimura two-parameter model implemented in MEGAX (Kumar et al. 2018). Pairwise sequence identity was determined using the SDTv1.2 program. The pairwise nucleotide diversity and identity are shown using color plots.
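As an illustration of the distance measure named above, the following is a minimal sketch of the Kimura two-parameter distance between two aligned sequences. It is not the MEGAX implementation; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases pairwise are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Hypothetical helper for illustration: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA and DNA notation alike.
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        if a not in VALID or b not in VALID:
            continue  # assumption: gaps and ambiguous bases are skipped
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The distance grows faster than the raw proportion of differing sites because the model corrects for multiple substitutions at the same site.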
After excluding five codons that carry no synonymous bias, namely AUG (start codon), UGG (encoding Trp), and the three termination codons UAA, UGA, and UAG, the compositional parameters of the ApMV CP sequences were calculated. The total percent nucleotide composition and the overall GC and AU contents were estimated with MEGAX (Kumar et al. 2018). Using the CodonW 1.4.2 package, the following were calculated for the CP gene sequence of each ApMV isolate: the overall nucleotide frequencies (A%, U%, C%, and G%), the nucleotide frequencies at the third position of synonymous codons (A3%, U3%, C3%, and G3%), G+C at the first (GC1), second (GC2), and third (GC3) positions, and G+C at the first and second positions (GC1,2). The codon usage data for the different hosts were obtained from the codon usage database (available at https://hive.biochemistry.gwu.edu/review/codon) (Athey et al. 2017).
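The position-specific G+C measures described above can be sketched as follows. This is an illustrative calculation only, not CodonW's code; the function name `positional_gc` is hypothetical, and CodonW's exact conventions (for example, which codons count toward GC3s) may differ from the simple exclusion rule assumed here.

```python
# The five codons excluded in the text: start, Trp, and the three stops.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def positional_gc(cds):
    """Return G+C fractions at codon positions 1, 2, 3 and the mean of 1 and 2.

    Hypothetical helper: `cds` is an in-frame coding sequence (DNA or RNA).
    Codons in EXCLUDED or containing ambiguous bases are ignored.
    """
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # drop a trailing partial codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    codons = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGU")]
    if not codons:
        raise ValueError("no usable codons")
    n = len(codons)
    gc = [sum(c[pos] in "GC" for c in codons) / n for pos in range(3)]
    return {"GC1": gc[0], "GC2": gc[1], "GC3": gc[2], "GC12": (gc[0] + gc[1]) / 2}
```

For example, `positional_gc("GCUAUGGAA")` ignores the AUG codon and computes the fractions from GCU and GAA only.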
The RSCU value expresses the relative use of each synonymous codon among the set of codons encoding the same amino acid (Sharp and Li 1986). An RSCU value of 1.0 indicates no bias for that codon, values above 1.0 indicate that the codon is used more often than expected, and values below 1.0 indicate that it is used less often than expected. Codons with RSCU values < 0.6 and > 1.6 are considered “underrepresented” and “overrepresented”, respectively (Sharp et al. 1986).
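A minimal sketch of the RSCU calculation described above, assuming the standard genetic code: the observed count of each codon is divided by the mean count of all codons for the same amino acid. The function names `rscu` and `classify` are illustrative and are not taken from CodonW.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for the 59 degenerate sense codons.

    Met (AUG), Trp (UGG) and stop codons are excluded. Amino acids that do
    not occur in the sequence are omitted from the result.
    """
    rna = cds.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count divided by the mean count within the family
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value):
    """Apply the thresholds stated in the text (< 0.6 and > 1.6)."""
    if value > 1.6:
        return "overrepresented"
    if value < 0.6:
        return "underrepresented"
    return "neutral"
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which is a quick sanity check on any table of RSCU values.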
The magnitude of synonymous codon usage bias in the ApMV CP gene was inferred by ENC analysis. ENC values range from 20 (extreme codon usage bias, with a single codon used for each amino acid) to 61 (no bias). Generally, highly expressed genes have lower ENC values and a stronger preference for particular codons, termed optimal codons, whereas lowly expressed genes have higher ENC values, indicating that synonymous codons are used more equally (Wright 1990). The ENC value was determined using CodonW v1.4.2.
The effect of mutational pressure or natural selection on codon usage bias was analyzed by plotting ENC against GC3s values (ENC-plot). Points that lie on the standard curve indicate that mutation pressure is the sole factor driving codon usage bias. If, instead, selection is the main force, the ENC values lie below the standard curve (Wright 1990). In addition, a neutrality analysis was performed to determine the relative influence of natural selection and mutation pressure on codon usage patterns of the ApMV CP gene by plotting the GC1,2s values of the synonymous codons against the GC3s values. GC3 indicates the abundance of G+C at the third codon position, and GC1,2 represents the average of GC1 and GC2. The contribution of mutation pressure is given by the slope of the regression line fitted between the GC3s and GC1,2s contents. A regression line close to the diagonal (slope = 1.0) indicates weak or no external selection pressure. Conversely, deviation of the regression line from the diagonal indicates a considerable effect of natural selection on codon usage bias.
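The two plots described above can be sketched numerically. The standard ENC curve follows Wright (1990), ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s expressed as a fraction; the neutrality slope is an ordinary least-squares fit of GC1,2s on GC3s. The function names are illustrative assumptions, and this is not the code used by the authors.

```python
def expected_enc(gc3s):
    """Expected ENC under mutation pressure alone (Wright 1990).

    `gc3s` is the G+C fraction at synonymous third positions (0 to 1).
    """
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def neutrality_slope(gc3s_values, gc12_values):
    """Least-squares slope of GC1,2s (y) regressed on GC3s (x)."""
    n = len(gc3s_values)
    if n < 2 or n != len(gc12_values):
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(gc3s_values) / n
    mean_y = sum(gc12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3s_values)
    if sxx == 0:
        raise ValueError("GC3s values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(gc3s_values, gc12_values))
    return sxy / sxx
```

As a usage example, a gene with GC3s = 0.5102 has an expected ENC of roughly 60.5 under mutation pressure alone, so an observed ENC near 54 would fall below the standard curve. A slope near 1.0 from `neutrality_slope` would point to mutation pressure, and a slope near 0 to selection.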
The parity rule 2 (PR2) plot shows the influence of natural selection and mutation pressure on the codon usage of each gene by plotting the A3/(A3 + U3) value against the G3/(G3 + C3) value. The center of the PR2 plot, where both coordinates equal 0.5, corresponds to A = U and G = C (Sueoka 1999). When there is no bias between the effects of mutation pressure and selection pressure, the points lie at the center of the plot; otherwise they deviate from it. Furthermore, the main trends in codon usage variation of the ApMV CP sequences were examined by principal component analysis (PCA) (Zhou et al. 1999). PCA plots of the first and second axes were drawn for the isolates according to their phylogroups.
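A minimal sketch of the PR2 coordinates defined above. The function name `pr2_point` is hypothetical. PR2 plots are often restricted to fourfold degenerate codon families; this example simply takes whatever third-position frequencies it is given.

```python
def pr2_point(a3, u3, g3, c3):
    """Return (x, y) for a PR2 plot from third-position nucleotide frequencies.

    x = G3 / (G3 + C3) is the GC bias (abscissa) and
    y = A3 / (A3 + U3) is the AU bias (ordinate).
    The point (0.5, 0.5) corresponds to A = U and G = C.
    """
    if g3 + c3 == 0 or a3 + u3 == 0:
        raise ValueError("both G3 + C3 and A3 + U3 must be positive")
    return g3 / (g3 + c3), a3 / (a3 + u3)
```

For instance, third-position frequencies of A3 = 18.49, U3 = 28.59, G3 = 31.90 and C3 = 21.01 give a point with x above 0.5 and y below 0.5, which is the lower right quadrant (G > C and U > A).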
The codon adaptation index (CAI) values for the ApMV CP sequences were determined using the CAIcal SERVER (http://genomes.urv.cat/CAIcal/RCDI/). CAI values range from 0.0 to 1.0 and indicate the degree of adaptation to the host. A higher CAI value indicates stronger adaptation of a sequence to the host, and vice versa (Puigbò et al. 2010). In addition, a relative codon deoptimization index (RCDI) value of 1.0 indicates that the virus follows the host codon usage pattern, whereas RCDI values greater than 1.0 indicate lower compatibility. The RCDI values were determined using the RCDI/eRCDI server (http://genomes.urv.cat/CAIcal/RCDI/). The influence of the codon usage bias of the hosts on that of the virus was measured by the similarity index (SiD). The SiD was determined in this way:
In this formula
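As a hedged sketch, the similarity index is commonly defined in codon usage studies as R(A,B) = Σ a_i b_i / sqrt(Σ a_i² · Σ b_i²) and D(A,B) = (1 − R(A,B)) / 2, where a_i is the RSCU value of codon i in the virus and b_i is the RSCU value of the same codon in the host, summed over the 59 synonymous codons. The assumption here is that the authors used this common form; the symbols and the function name `similarity_index` are introduced only for this example.

```python
import math


def similarity_index(virus_rscu, host_rscu):
    """D(A,B) = (1 - R(A,B)) / 2, with R the cosine similarity of RSCU vectors.

    Assumed form of SiD, not confirmed by the text. Both arguments map
    codon -> RSCU; only codons present in both mappings are used.
    """
    codons = sorted(set(virus_rscu) & set(host_rscu))
    if not codons:
        raise ValueError("no codons in common")
    dot = sum(virus_rscu[c] * host_rscu[c] for c in codons)
    norm = math.sqrt(sum(virus_rscu[c] ** 2 for c in codons)
                     * sum(host_rscu[c] ** 2 for c in codons))
    if norm == 0:
        raise ValueError("an RSCU vector is all zeros")
    return (1.0 - dot / norm) / 2.0
```

Because RSCU values are non-negative, D under this definition ranges from 0 (identical usage patterns) to 0.5 (no codons in common use).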
Phylogenetic analyses clustered the 52 ApMV isolates into two main groups: apple and pear isolates fell into one group (GI), whereas isolates from hazelnut clustered in another group (GII) (Figure 1a). Nucleotide identity ranged from 88 to 100%, with higher identity (Figure 1b) and lower diversity (Figure 1c) in GI. Nucleotide distances ranged from 0.0 to 13.5% in GI and from 13.5 to 19.6% in GII (Figure 1c).
G and A were the most frequent nucleotides in the ApMV CP sequences, with average compositions of 28.84 ± 0.61% and 26.85 ± 0.54%, respectively, compared with T (U) (24.48 ± 0.84%) and C (19.80 ± 0.72%) (Table S2). In contrast, the composition at the third position of synonymous codons was markedly different. The most frequent nucleotide was G3s (31.90 ± 1.64%), followed by T3s (28.59 ± 1.76%), C3s (21.01 ± 1.57%), and A3s (18.49 ± 1.68%). The AU and GC contents of the CP coding sequences were 51.34 ± 1.22% and 48.65 ± 1.22%, respectively, indicating a slightly AU-biased composition in the ApMV CP gene. The mean GC contents at the first and second codon positions (GC1,2s) and at the third position (GC3s) were 46.53 ± 0.53% and 51.02 ± 0.02%, respectively.
RSCU analysis was performed to estimate the codon usage patterns of the ApMV CP sequences (Table 1). Twelve of the 18 most frequently used codons were G/U-ending (six ending in G and six ending in U), while the remaining six ended in A or C (Table 1). This result indicates that U- and G-ending codons are favored in the ApMV CP gene. Regardless of the ApMV host, an RSCU value > 1.6 was detected for nine of the optimal synonymous codons (UUG, GUG, AGU, CCG, ACG, GCU, CAA, AGG, and GGU), with the highest value for the AGU codon (2.57). The variation in codon usage bias across the ApMV CP gene was assessed from the RSCU of each codon in each ApMV isolate, and the results indicated three main clusters of codons (Figure 2). The first cluster generally included overrepresented codons (RSCU > 1) and contained A/U-ending codons (19 out of 59 codons) and G/C-ending codons (14 out of 59 codons). The second cluster consisted mostly of G/C-ending codons (11 out of 59 codons) together with six A/U-ending codons and was generally underrepresented (RSCU < 1). The last and smallest cluster consisted of five A/U-ending codons (UCA, GUA, GCA, CUU, and UAU) and four G/C-ending codons (CGG, CUC, UGC, and CAG) that were underrepresented. Among the underrepresented codons, UCA and UCG, both of which encode serine, were found in most of the hazelnut isolates.
Table 1 The relative synonymous codon usage value of 59 codons encoding 18 amino acids in the coat protein gene of apple mosaic virus according to hosts
Codon | Amino acid | Apple | Pear | Hazelnut | All |
---|---|---|---|---|---|
UUU | F | 1.04* | 1.05 | 1.11 | 1.07 |
UUC | F | 0.96 | 0.95 | 0.89 | 0.93 |
UUA | L | 1.29 | 2.03 | 1.13 | 1.48 |
UUG | L | 2.26 | 2.03 | 2.38 | 2.22 |
CUU | L | 0.74 | 0.44 | 0.62 | 0.60 |
CUC | L | 0.09 | 0.04 | 0.06 | 0.06 |
CUA | L | 0.51 | 0.75 | 0.62 | 0.63 |
CUG | L | 1.12 | 0.71 | 1.19 | 1.01 |
AUU | I | 1.00 | 1.01 | 0.89 | 0.97 |
AUC | I | 0.90 | 0.89 | 1.00 | 0.93 |
AUA | I | 1.10 | 1.09 | 1.11 | 1.10 |
GUU | V | 0.86 | 1.24 | 0.81 | 0.97 |
GUC | V | 1.10 | 0.63 | 1.06 | 0.93 |
GUA | V | 0.17 | 0.36 | 0.22 | 0.25 |
GUG | V | 1.87 | 1.77 | 1.92 | 1.85 |
UCU | S | 1.16 | 0.85 | 1.20 | 1.07 |
UCC | S | 1.51 | 1.25 | 1.63 | 1.46 |
UCA | S | 0.03 | 0.23 | 0.09 | 0.12 |
UCG | S | 0.11 | 0.45 | 0.00 | 0.19 |
AGU | S | 2.55 | 2.66 | 2.49 | 2.57 |
AGC | S | 0.64 | 0.57 | 0.60 | 0.60 |
CCU | P | 0.79 | 0.91 | 0.85 | 0.85 |
CCC | P | 0.45 | 0.32 | 0.46 | 0.41 |
CCA | P | 0.64 | 1.03 | 0.64 | 0.77 |
CCG | P | 2.13 | 1.74 | 2.05 | 1.97 |
ACU | T | 1.13 | 0.86 | 1.20 | 1.06 |
ACC | T | 0.46 | 0.52 | 0.46 | 0.48 |
ACA | T | 0.73 | 0.67 | 0.80 | 0.73 |
ACG | T | 1.69 | 1.95 | 1.54 | 1.73 |
GCU | A | 1.42 | 1.82 | 1.69 | 1.64 |
GCC | A | 1.26 | 1.27 | 1.16 | 1.23 |
GCA | A | 0.47 | 0.22 | 0.31 | 0.33 |
GCG | A | 0.84 | 0.69 | 0.84 | 0.79 |
UAU | Y | 0.77 | 0.37 | 0.73 | 0.62 |
UAC | Y | 1.23 | 1.63 | 1.27 | 1.38 |
CAU | H | 0.49 | 0.39 | 0.43 | 0.44 |
CAC | H | 1.51 | 1.61 | 1.57 | 1.56 |
CAA | Q | 1.48 | 1.89 | 1.52 | 1.63 |
CAG | Q | 0.52 | 0.11 | 0.48 | 0.37 |
AAU | N | 1.34 | 1.38 | 1.42 | 1.38 |
AAC | N | 0.66 | 0.63 | 0.58 | 0.62 |
AAA | K | 0.54 | 0.59 | 0.56 | 0.56 |
AAG | K | 1.46 | 1.41 | 1.44 | 1.44 |
GAU | D | 1.21 | 1.52 | 1.24 | 1.32 |
GAC | D | 0.79 | 0.48 | 0.76 | 0.68 |
GAA | E | 0.94 | 1.15 | 0.98 | 1.02 |
GAG | E | 1.06 | 0.85 | 1.02 | 0.98 |
UGU | C | 0.57 | 1.04 | 0.50 | 0.70 |
UGC | C | 1.43 | 0.96 | 1.50 | 1.30 |
CGU | R | 0.47 | 0.62 | 0.43 | 0.51 |
CGC | R | 0.28 | 0.62 | 0.38 | 0.43 |
CGA | R | 1.49 | 0.98 | 1.39 | 1.29 |
CGG | R | 0.01 | 0.00 | 0.00 | 0.00 |
AGA | R | 1.37 | 1.73 | 1.39 | 1.50 |
AGG | R | 2.37 | 2.04 | 2.41 | 2.27 |
GGU | G | 1.88 | 1.97 | 2.13 | 1.99 |
GGC | G | 0.79 | 0.54 | 0.84 | 0.72 |
GGA | G | 0.92 | 1.11 | 0.80 | 0.94 |
GGG | G | 0.41 | 0.37 | 0.23 | 0.34 |
*The most frequently used codons are shown in bold.
The magnitude of codon usage bias in the ApMV CP gene was measured by the ENC value. The codon usage bias was low in all ApMV CP coding sequences, with an average ENC value of 54.46 ± 2.04 (Table S2), which represents an approximately constant and conserved genomic composition. The highest and lowest ENC values were found for the CP coding sequences of isolates from apple and hazelnut, respectively (Figure 3).
The main trends in codon usage variation of the ApMV CP gene were examined by PCA (Figure 4a). Among the three hosts, several overlaps were detected between apple and pear isolates, suggesting that the main codon usage trend is similar in these two hosts (Figure 4a). In addition, the principal axes were plotted according to the geographical locations of the ApMV isolates (Figure 4b). In this analysis, the isolates did not cluster according to the geographical locations from which they were isolated (Table S2). The clustering of most apple isolates near the origin of the PCA plot (Figure 4a) suggests that the virus may have originated from the apple host.
When ENC values were plotted against GC3s values, the data points for the three hosts clustered together below the standard ENC curve (Figure 5). Data points that fall below the standard curve indicate that codon usage is affected more by natural selection than by mutation pressure. In addition, the degree of mutational pressure and natural selection acting on codon usage in the ApMV CP gene was determined by neutrality analysis between GC1,2s and GC3s for all of the sequences, and the results were grouped by ApMV host (Figure 6). A significant positive correlation (r2 = 0.5022,
The PR2-bias plot of the ApMV CP gene is shown in Figure 7. Along the ordinate of the PR2 plot, all ApMV CP genes showed a similar distribution, and all of them fell in the lower right area of the plot (the G > C side). The PR2-bias plot indicates unequal use of G and C and of A and U at the third codon position. This unequal nucleotide use indicates that a combination of mutation pressure and natural selection drives the codon usage patterns in the CP gene, with selection pressure playing the more important role (Figure 7).
CAI and RCDI analyses were performed to assess the codon usage optimization and host adaptation of ApMV. The average CAI values of the CP coding sequences were 0.693, 0.678, and 0.630 for hazelnut, apple, and pear, respectively (Figure 8). These results show that ApMV host adaptation was highest for hazelnut and lowest for pear. In addition, the average RCDI values were highest for pear (1.975), followed by apple (1.792) and hazelnut (1.715), which shows that codon usage deoptimization was greatest for pear (Figure 8). The SiD values were also calculated to investigate how the hosts’ codon usage patterns influence the ApMV CP codon usage pattern (Figure 9). The SiD value of hazelnut was greater than those of apple and pear, suggesting that hazelnut had a greater influence on the ApMV CP gene than apple and pear.
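To make the CAI values above easier to interpret, the following is a minimal sketch of how a CAI is computed: each codon receives a relative adaptiveness w (its frequency in the host reference set divided by the frequency of the most used synonymous codon), and the CAI is the geometric mean of w over the codons of the gene. This is not the CAIcal server's code; the function names, the toy input and the decision to skip codons without a positive weight are assumptions.

```python
import math


def relative_adaptiveness(host_counts, families):
    """Map codon -> w, given host codon counts and synonymous families.

    `families` is a list of tuples of synonymous codons (hypothetical input).
    """
    weights = {}
    for family in families:
        best = max(host_counts.get(c, 0) for c in family)
        if best == 0:
            continue  # family unused in the host reference set
        for codon in family:
            weights[codon] = host_counts.get(codon, 0) / best
    return weights


def cai(codons, weights):
    """Geometric mean of w over the codons of a gene.

    Assumption: codons without a positive weight (for example AUG, UGG,
    or codons unused by the host) are skipped rather than penalized.
    """
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no codons with a positive weight")
    return math.exp(sum(logs) / len(logs))
```

A gene that uses only the host's preferred codon in every family reaches the maximum CAI of 1.0, and any use of less preferred codons lowers the value.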
Identification of codon usage patterns provides important information about host-pathogen co-evolution, such as the adaptation of pathogens to their hosts and the molecular evolution of genes (Butt et al. 2016; He et al. 2019; Pandit and Sinha 2011; Zhang et al. 2019). Compared with eukaryotic and prokaryotic organisms, the importance of CUB in the evolution of plant viruses has received less attention. In this study, we analyzed synonymous codon usage in CP sequences from 52 ApMV isolates in order to understand the molecular evolution of the virus under the influence of multiple viral and host factors. It has previously been indicated that codon usage bias, or the preference for one type of codon over another, can be significantly influenced by overall genomic composition (Jenkins and Holmes 2003). Nucleotide composition analysis indicated that the ApMV CP gene is AU-rich. However, codons with U or G in the third position appear to be preferred in the ApMV CP gene, which indicates possible codon usage bias (Table 1 and Figure 1). The uneven usage of A3/U3 and G3/C3 nucleotides in the AU-rich CP genes in this study shows that the compositional patterns of the ApMV CP sequences are more complex than the commonly observed GC- and/or AU-rich compositions of most virus genes. This unequal use of nucleotides indicates overlapping influences of mutational pressure and natural selection on codon preferences in the present CP gene sequences, as previously reported for
Given the bias toward G- and U-ending codons in the ApMV CP gene sequences, we compared this bias among the different hosts of ApMV using ENC analyses. Generally, a smaller ENC value indicates stronger codon usage bias, and ENC values below 35 are characteristic of genes with considerable codon bias. Here, the mean ENC value was 54.46 (Table S2), which indicates a slightly biased, relatively conserved, and stable coding sequence composition among the different isolates. In addition, among the three hosts, isolates from apple had higher mean ENC values and therefore lower codon usage bias than isolates from hazelnut and pear (Figure 3). Low codon usage bias has previously been reported for some plant viruses, including Begomoviruses (Xu et al. 2008),
The main trends in codon usage variation of the ApMV CP gene were examined by PCA of RSCU values (Figure 4a). PCA across the three hosts showed several overlaps between apple and pear isolates, which suggests that the main codon usage trend is similar in these two hosts. In addition, we plotted the principal axes according to host and geographical origin. The clustering of most apple isolates near the origin of the PCA plot (Figure 4a) suggests that the virus may have originated from the apple host. In contrast, hazelnut isolates might have evolved independently due to biological variation and/or dispensation diversities. ApMV was first isolated and described from apple in the early 1930s (Bradford and Joly 1933) and later from other hosts, including pear and hazelnut (Brunt et al. 1996). Based on the principal component analysis (PCA) plots, it was inferred that ApMV originated from apple trees in Europe (Figure 4b, Figure S1). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. The apple tree originated in Central Asia, where its wild ancestor, Malus sieversii, is still found today. Apples have been grown for thousands of years in Asia and Europe and were brought to North America by European colonists. The grouping of ApMV isolates separated by thousands of miles within a single group (Figure 1a and 1b) indicates an important role for the movement of ApMV’s natural hosts. This analysis shows that ApMV isolates might have evolved independently in two clusters after diverging from a common ancestor. In addition, the natural hosts present within the area of infection and the susceptibility of those hosts may have affected codon usage patterns in the ApMV CP gene. Besides nucleotide composition frequencies, the ENC plot is used to identify differences in codon usage among genes in various organisms (Comeron and Aguadé 1998).
When the ENC and GC3s values of the ApMV CP gene were plotted, none of the isolates fell on the standard curve (Figure 5), indicating that selection pressure is the major factor driving codon usage bias in the CP gene of ApMV. The neutrality plot (GC3s versus GC1,2s values) was used to assess the effects of mutation pressure and natural selection on codon usage patterns and showed that the influence of natural selection dominates over that of mutation pressure (Figure 6). Mutational pressure has been shown to have a major role in the CUB of plant viruses (Adams and Antoniw 2003). However, the present study shows that both natural selection and mutational pressure influence the CUB of plant viruses (Chakraborty et al. 2015).
It has been proposed that if mutation pressure alone shaped synonymous codon usage bias, the frequencies of A and U, and likewise of G and C, would be equal at the third position of synonymous codons (Wang et al. 2016). PR2 plot analysis indicated that these complementary nucleotides were not used equally at the third position of synonymous codons (Figure 7). The AU bias in the ApMV CP gene points to the potential influence of natural selection on codon usage patterns. Pathogen-host interactions can affect the dynamics, emergence, genetic divergence, and evolution of infectious diseases (Wang et al. 2016; Zhang et al. 2013). CAI is considered an index of gene expression and can be used to evaluate the adaptation of viral genes to their hosts. The highest CAI value was calculated for hazelnut, indicating that natural selection from this host has influenced the codon usage patterns (Figure 8). As inferred from the SiD analysis (Figure 9), hazelnut has a greater effect on shaping ApMV RSCU patterns, which is in accord with the CAI analysis.
Although apple has always been suggested to be the primary ApMV host, a strong link between ApMV and hazelnut was observed in this study. Overall codon usage within the ApMV CP gene is slightly biased. The evolution of ApMV probably reflects a dynamic process of mutation and natural selection that adapts its codon usage to different environments and hosts. This study contributes to the understanding of plant virus evolution, reveals novel information about the evolutionary fitness of ApMV, and may help in finding better management strategies for ApMV.
We gratefully thank Dr. Hassan Ebrahimi, Department of Advanced Technology Fusion, Graduate School of Science and Engineering, Saga University, 1 Honjo-Machi, Saga 840-8502, Japan, for kindly supporting the CodonW analysis and the mathematical equations in this research.
The Iranian Research Institute of Plant Protection-IRIPP (Project No. 961463) partly supported this work.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The data sets analyzed during the present study are available in the GenBank repository (https://www.ncbi.nlm.nih.gov/)
No custom or special code or mathematical algorithm was used in this study.
All authors contributed equally.
The authors confirm that the ethical policy of the journal, as mentioned on the journal’s author guidelines page, was ensured, and no ethical approval was required for this study as no samples or questionnaires were collected from animals or humans.
Written informed consent for study participation was obtained from all individual participants.
Written informed consent for study publication was obtained from all individual participants.
R. Pourrahim ・Sh. Farzadfar
Plant Virus Research Department, Iranian Research Institute of Plant Protection (IRIPP), Agricultural Research, Education and Extension Organization (AREEO), Tehran, Iran
Correspondence to:e-mail: pourrahim@yahoo.com
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The genetic variability and population structure of apple mosaic virus (ApMV) have been studied; however, synonymous codon usage patterns influencing the survival rates and fitness of ApMV have not been reported. Based on phylogenetic analyses of 52 ApMV coat protein (CP) sequences obtained from apple, pear, and hazelnut, ApMV isolates were clustered into two groups. High molecular diversity in GII may indicate their recent expansion. A constant and conserved genomic composition of the CP sequences was inferred from the low codon usage bias. Nucleotide composition and relative synonymous codon usage (RSCU) analysis indicated that the ApMV CP gene is AU-rich, but G- and U-ending codons are favored while coding amino acids. This unequal use of nucleotides together with parity rule 2 and the effective number of codon (ENC) plots indicate that mutation pressure together with natural selection drives codon usage patterns in the CP gene. However, in this combination, selection pressure plays a more crucial role. Based on principal component analysis plots, ApMV seems to have originated from apple trees in Europe. However, according to the relative codon deoptimization index and codon adaptation index (CAI) analyses, ApMV exhibited the greatest fitness to hazelnut. As inferred from the results of the similarity index analysis, hazelnut has a major role in shaping ApMV RSCU patterns, which is consistent with the CAI analysis results. This study contributes to the understanding of plant virus evolution, reveals novel information about ApMV evolutionary fitness, and helps find better ApMV management strategies.
Keywords: ApMV, codon usage patterns, mutation pressure, natural selection, host adaptation
Understanding the evolution of virus-host interactions is so important, due to rapid evolution through genetic recombination, mutation, the potential of adaption to new or resistant hosts (Davino et al. 2017; Garcia-Arenal et al. 2001), fast adaptation to the different environmental conditions, and mostly lack effective chemical compounds (Elena et al. 2014). As the virus translation is dependent on the host cellular machinery, the interaction of a virus with a particular host must be studied based on its codon usage pattern. A remarkable role of codon usage bias (CUB) in the evolution of viruses was reported (Angellotti et al. 2007). The codon usage pattern of viruses indicates the evolutionary changes that allow the viruses to optimize their survival and better adapt toward fitness to the external environment and, most importantly, their host (Butt et al. 2014). Natural/translational selection and the mutational/neutral model are two major models, which explain the codon usage bias (Bulmer 1991; Hershberg and Petrov 2008). The natural selection model suggests that there is a co-adaptation of synonymous codon usage and the transfer RNA (tRNA) abundance to optimize translational efficiency (Zhou et al. 1999). Therefore, the efficient use of ribosomes and maximized growth rate of fast-growing organisms will be provided by the codon usage adaptation (Hershberg and Petrov 2008). The mutational model hypothesizes that genetic compositional constraints affect the possibility of mutational fixation, and this was observed in numerous RNA viruses (Adams and Antoniw 2003). The GC content is probably to be determined mostly by genome-wide mutation bias rather than by selective forces acting specifically on coding regions. Unfortunately, the studies on CUB and its role in the evolution of plant viruses are limited (Adams and Antoniw 2003). The recent advancement in sequencing technologies allows studying the codon usage behavior of viral diseases (He et al. 2019; He et al. 
2017; Liu et al. 2012; Xu et al. 2008). It is presumed that viral CP evolved more rapidly than proteins involved in replication and expression of virus genomes (Callaway et al. 2001), thus providing a strong incentive to study the diversity of viruses based on CP genes. Apple mosaic virus is a key species of subgroup III in the Ilarvirus genus (Bromoviridae family) (Bujarski et al. 2012). ApMV causes economic yield losses in pome fruits worldwide. More than 65 species of woody or herbaceous plants belonging to 19 families have been reported as naturally or experimentally host for ApMV (Brunt et al. 1996; Cieslinska and Valasevich 2016; Tzanetakis and Martin 2005). The virus is graft and mechanically transmissible and persists in propagative infected materials such as scion, rootstocks, or buds, and has no known vector (Fulton 1972). The genome of ApMV is divided into three single-stranded RNA segments in which, coat protein (CP) and movement protein (MP) are coded by RNA3 (Bujarski et al. 2012). Phylogenetic analysis using complete CP sequences divided ApMV isolates into two major clusters. One cluster involves isolates from Maloideae and Trebouxia lichen algae while the second cluster involves isolates from Prunus, hop, and the other woody trees (Grimova et al. 2013). No relation has been shown between the geographic origins and clustering of ApMV isolates (Crowle et al. 2003; Petrzik 2005).
The genetic variability and population structure of ApMV have already been studied. However, synonymous codon usage patterns and selection pressure analysis, which provide significant information about virus evolution as well as gene expression and function, have not been reported. In this study, patterns of codon usage bias were investigated using 52 complete CP nucleotide sequences of isolates from apple (Malus domestica) and pear (Pyrus sp.) of the family Rosaceae and hazelnut (Corylus sp.) of the family Betulaceae. These analyses reveal novel information about the evolutionary fitness of ApMV.
Fifty-two full-length ApMV CP sequences from apple (n = 36), pear (n = 7), and hazelnut (n = 9) were retrieved from NCBI GenBank. Data on the ApMV isolates, including geographical location, host origin, and time of collection, are shown in Table S1. To assess the genetic diversity of ApMV, the CP sequences were aligned using CLUSTALX2 (Kumar et al. 2018). A maximum likelihood (ML) tree was reconstructed in MEGAX (Kumar et al. 2018) using the K2 + G + I model with 1000 bootstrap replicates. Nucleotide diversity was estimated using the Kimura two-parameter model implemented in MEGAX (Kumar et al. 2018). Pairwise sequence identity was calculated using the SDTv1.2 program. The pairwise nucleotide diversity and identity are shown as color plots.
After excluding five codons that cannot show usage bias, namely AUG (start codon and the only codon for Met), UGG (the only codon for Trp), and the three termination codons UAA, UGA, and UAG, the compositional parameters of the ApMV CP sequences were calculated. The total percent nucleotide composition and the overall GC and AU contents were estimated in MEGAX (Kumar et al. 2018). Using the CodonW 1.4.2 package, the overall frequencies of nucleotides (A%, U%, C%, and G%), the nucleotide frequencies at the third position of synonymous codons (A3%, U3%, C3%, and G3%), the G+C contents at the first (GC1), second (GC2), and third (GC3) positions, and the G+C content at the first and second positions (GC1,2) were calculated for the CP gene sequence of each ApMV isolate. The codon usage data for the different hosts were obtained from the codon usage database (available at https://hive.biochemistry.gwu.edu/review/codon) (Athey et al. 2017).
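The positional G+C measures described above can be illustrated with a minimal Python sketch. It is not the CodonW implementation: the function name `positional_gc` is hypothetical, and excluding the five codons at every position (rather than only at the third) is an assumption made here for simplicity.

```python
def positional_gc(cds: str) -> dict:
    """Return GC1, GC2, GC3 and GC1,2 (in percent) for an in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    # Codons without synonymous alternatives, plus the three stop codons.
    excluded = {"AUG", "UGG", "UAA", "UAG", "UGA"}
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Keep unambiguous codons only (assumption: ambiguous bases are skipped).
    codons = [c for c in codons if c not in excluded and set(c) <= set("ACGU")]
    if not codons:
        raise ValueError("no usable codons in sequence")
    gc = [
        100.0 * sum(c[pos] in "GC" for c in codons) / len(codons)
        for pos in range(3)
    ]
    return {"GC1": gc[0], "GC2": gc[1], "GC3": gc[2], "GC12": (gc[0] + gc[1]) / 2}
```

Applied to each CP sequence in turn, the resulting GC1,2 and GC3 values are the quantities used in the neutrality analysis.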
The RSCU value expresses the usage of a synonymous codon relative to the usage expected if all codons encoding the same amino acid were used equally (Sharp and Li 1986). An RSCU value of 1.0 indicates no bias for that codon, values above 1.0 indicate that the codon is used more often than expected, and values below 1.0 indicate that it is used less often than expected. Codons with RSCU values < 0.6 and > 1.6 are regarded as “underrepresented” and “overrepresented”, respectively (Sharp et al. 1986).
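As an illustration of this definition, the following Python sketch computes RSCU for a single coding sequence as the observed count of a codon divided by the mean count of all codons for the same amino acid. The function name `rscu` and the use of the standard genetic code are assumptions of this sketch; the values in this study were obtained with dedicated software.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU values for the 59 synonymous codons of one sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    # Group codons by amino acid, leaving out stops, Met and Trp.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values
```

For example, a sequence in which alanine is encoded three times by GCU and once by GCC gives RSCU values of 3.0 for GCU, 1.0 for GCC, and 0.0 for GCA and GCG.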
The magnitude of synonymous codon usage bias in the ApMV CP gene was inferred by ENC analysis. ENC values range from 20 (extreme codon usage bias, with only one codon used per amino acid) to 61 (no bias). Generally, highly expressed genes have lower ENC values and a stronger preference for particular codons, termed optimal codons, whereas a higher ENC value in lowly expressed genes indicates that synonymous codons are used more equally (Wright 1990). The ENC value was determined using CodonW v1.4.2.
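A minimal sketch of the ENC statistic, following the commonly cited formulation of Wright (1990), is given below. It is illustrative only and may differ from CodonW in how edge cases are handled; the function name `enc`, the skipping of non-positive homozygosity values, and the cap at 61 are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}


def enc(cds: str) -> float:
    """Effective number of codons for one in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    # Codon homozygosity F per amino acid, grouped by degeneracy class (2, 3, 4, 6).
    by_class = defaultdict(list)
    for codons in families.values():
        n = sum(counts[c] for c in codons)
        if n > 1:
            sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
            f = (n * sum_p2 - 1) / (n - 1)
            if f > 0:  # assumption: skip values that would make 1/F undefined
                by_class[len(codons)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    # If Ile (the only three-fold family) is unusable, average the 2- and 4-fold classes.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if not all(k in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("sequence too short or too sparse to estimate ENC")
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)
```

A sequence that uses a single codon for every amino acid returns 20, and a sequence that uses all synonymous codons equally returns the upper limit of 61.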
The ENC values were plotted against the GC3s values (ENC-plot) to analyze the effect of mutational pressure or natural selection on codon usage bias. Points lying on the standard curve indicate that mutation pressure is the sole factor driving the codon usage bias. If selection is the main force, the ENC values lie below the standard curve (Wright 1990). In addition, a neutrality analysis was performed to determine the relative influence of natural selection and mutation pressure on the codon usage patterns of the ApMV CP gene by plotting the GC1,2s values of the synonymous codons against the GC3s values. GC3 indicates the abundance of G+C at the third codon position, and GC1,2 represents the average of GC1 and GC2. The slope of the regression line between the GC3s and GC1,2s contents reflects the mutation pressure. A regression line close to the diagonal (slope = 1.0) indicates weak or no external selection pressure. Conversely, deviation of the regression line from the diagonal indicates a considerable effect of natural selection on codon usage bias.
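The two diagnostics can be sketched in a few lines of Python. The standard curve is taken here to be the expectation given by Wright (1990) for codon usage constrained only by GC3s, ENC = 2 + s + 29 / (s² + (1 − s)²), with s the GC3s fraction; the function names are hypothetical and the regression is an ordinary least-squares fit, which is assumed rather than confirmed to be the fit used in this study.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage is shaped by GC3s alone (gc3s as a fraction, 0 to 1)."""
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def regression_slope(x: list, y: list) -> float:
    """Least-squares slope of y on x, e.g. GC1,2s (y) against GC3s (x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx
```

An isolate whose observed ENC is clearly lower than `expected_enc` at its GC3s value falls below the standard curve, and a neutrality slope far below 1.0 points to selection rather than mutation pressure alone.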
The parity rule 2 (PR2) plot shows the influence of natural selection and mutation pressure on the codon usage of each gene by plotting A3/(A3 + U3) against G3/(G3 + C3). The center of the PR2 plot, where both coordinates are 0.5, corresponds to A = U and G = C (Sueoka 1999). If mutation pressure and selection pressure introduce no bias, the points lie at the center of the plot; deviation from the center indicates bias. Furthermore, the main trends in codon usage variation among the ApMV CP sequences were examined by principal component analysis (PCA) (Zhou et al. 1999). The first and second principal axes were plotted for the isolates according to their phylogroups.
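The PR2 coordinates of a single sequence can be computed as in the sketch below. Restricting the count to four-fold degenerate codon families is a common convention and is an assumption here, since the text does not state which codons were used; `pr2_point` is a hypothetical name.

```python
from collections import Counter

# First two bases of the four-fold degenerate families
# (Ala, Arg CGN, Gly, Leu CUN, Pro, Ser UCN, Thr, Val).
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CU", "CC", "UC", "AC", "GU"}


def pr2_point(cds: str) -> tuple:
    """Return (G3/(G3+C3), A3/(A3+U3)): abscissa and ordinate of the PR2 plot."""
    seq = cds.upper().replace("T", "U")
    third = Counter(
        seq[i + 2]
        for i in range(0, len(seq) - len(seq) % 3, 3)
        if seq[i:i + 2] in FOURFOLD_PREFIXES
    )
    gc = third["G"] + third["C"]
    au = third["A"] + third["U"]
    if gc == 0 or au == 0:
        raise ValueError("not enough four-fold degenerate codons")
    return third["G"] / gc, third["A"] / au
```

A point at (0.5, 0.5) corresponds to A = U and G = C; a point to the right of and below the center indicates G > C and A < U at the third position.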
The codon adaptation index (CAI) of the ApMV CP sequences was determined using the CAIcal server (http://genomes.urv.cat/CAIcal/RCDI/). CAI values range from 0.0 to 1.0 and indicate the degree of adaptation to the host: the higher the CAI value of a sequence, the stronger its adaptation to the host, and vice versa (Puigbò et al. 2010). In addition, a relative codon deoptimization index (RCDI) value of 1.0 indicates that the virus follows the host codon usage pattern, whereas RCDI values above 1.0 indicate lower compatibility. The RCDI values were determined using the RCDI/eRCDI server (http://genomes.urv.cat/CAIcal/RCDI/). The influence of the codon usage bias of the hosts was measured by the similarity index (SiD). The SiD was determined as follows:
In this formula
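A minimal Python sketch of the similarity index is given here under an explicit assumption: it uses the definition common in codon usage studies, R(A,B) = Σ a_i b_i / sqrt(Σ a_i² · Σ b_i²) and D(A,B) = (1 − R(A,B)) / 2, where a_i and b_i are the RSCU values of the same synonymous codon in the virus and in the host. Whether this matches the exact formula used in this study is not confirmed by the text, and the function name is hypothetical.

```python
import math


def similarity_index(virus_rscu: dict, host_rscu: dict) -> float:
    """D(A,B) = (1 - R(A,B)) / 2, with R the cosine similarity of two RSCU vectors."""
    # Compare only codons present in both tables (normally the 59 synonymous codons).
    codons = sorted(set(virus_rscu) & set(host_rscu))
    dot = sum(virus_rscu[c] * host_rscu[c] for c in codons)
    norm = math.sqrt(
        sum(virus_rscu[c] ** 2 for c in codons)
        * sum(host_rscu[c] ** 2 for c in codons)
    )
    if norm == 0:
        raise ValueError("RSCU vectors must not be all zero")
    return (1.0 - dot / norm) / 2.0
```

Because RSCU values are non-negative, R lies between 0 and 1 and D between 0 and 0.5 under this definition.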
Phylogenetic analyses clustered the 52 ApMV isolates into two main groups: apple and pear isolates fell into one group (GI), whereas isolates from hazelnut clustered in the other (GII) (Figure 1a). Nucleotide identity ranged from 88 to 100%, with higher identity (Figure 1b) and lower diversity (Figure 1c) in GI. Nucleotide distances for GI and GII were 0.0 to 13.5% and 13.5 to 19.6%, respectively (Figure 1c).
G and A were the most frequent nucleotides in the ApMV CP sequences, with average compositions of 28.84 ± 0.61% and 26.85 ± 0.54%, respectively, compared with T (U) (24.48 ± 0.84%) and C (19.80 ± 0.72%) (Table S2). In contrast, the composition at the third position of synonymous codons was markedly different. The most frequent nucleotide was G3s (31.90 ± 1.64%), followed by T3s (28.59 ± 1.76%), C3s (21.01 ± 1.57%), and A3s (18.49 ± 1.68%). The AU and GC contents of the CP coding sequences were 51.34 ± 1.22% and 48.65 ± 1.22%, respectively, indicating an AU-biased composition of the ApMV CP gene. The mean GC contents at the first and second positions (GC1,2s) and at the third position (GC3s) were 46.53 ± 0.53% and 51.02 ± 0.02%, respectively.
RSCU analysis was performed to estimate the codon usage patterns of the ApMV CP sequences (Table 1). Twelve of the 18 most frequently used codons were G/U-ending (six ending in G and six ending in U), while the remaining six ended in A or C (Table 1). This result indicates that U- and G-ending codons are favored in the ApMV CP gene. When all hosts were considered together, RSCU values > 1.6 were detected for nine of the optimal synonymous codons (UUG, GUG, AGU, CCG, ACG, GCU, CAA, AGG, and GGU), with the highest value for AGU (2.57). The variation of codon usage bias across the ApMV CP gene was assessed from the RSCU of each codon in each ApMV isolate, and the results indicated three main clusters of codons (Figure 2). The first cluster generally included overrepresented codons (RSCU > 1) and contained A/U-ending codons (19 of 59 codons) and G/C-ending codons (14 of 59 codons). The second cluster consisted mostly of G/C-ending codons (11 of 59 codons) together with six A/U-ending codons and was generally underrepresented (RSCU < 1). The last and smallest cluster consisted of five A/U-ending codons (UCA, GUA, GCA, CUU, and UAU) and four G/C-ending codons (CGG, CUC, UGC, and CAG) that were underrepresented. Among the underrepresented codons, UCA and UCG, both of which encode serine, were found in most of the hazelnut isolates.
Table 1. Relative synonymous codon usage (RSCU) values of the 59 codons encoding 18 amino acids in the coat protein gene of apple mosaic virus, by host.
Codon | aa | Apple | Pear | Hazelnut | All |
---|---|---|---|---|---|
UUU | F | 1.04* | 1.05 | 1.11 | 1.07 |
UUC | F | 0.96 | 0.95 | 0.89 | 0.93 |
UUA | L | 1.29 | 2.03 | 1.13 | 1.48 |
UUG | L | 2.26 | 2.03 | 2.38 | 2.22 |
CUU | L | 0.74 | 0.44 | 0.62 | 0.60 |
CUC | L | 0.09 | 0.04 | 0.06 | 0.06 |
CUA | L | 0.51 | 0.75 | 0.62 | 0.63 |
CUG | L | 1.12 | 0.71 | 1.19 | 1.01 |
AUU | I | 1.00 | 1.01 | 0.89 | 0.97 |
AUC | I | 0.9 | 0.89 | 1.00 | 0.93 |
AUA | I | 1.1 | 1.09 | 1.11 | 1.10 |
GUU | V | 0.86 | 1.24 | 0.81 | 0.97 |
GUC | V | 1.1 | 0.63 | 1.06 | 0.93 |
GUA | V | 0.17 | 0.36 | 0.22 | 0.25 |
GUG | V | 1.87 | 1.77 | 1.92 | 1.85 |
UCU | S | 1.16 | 0.85 | 1.20 | 1.07 |
UCC | S | 1.51 | 1.25 | 1.63 | 1.46 |
UCA | S | 0.03 | 0.23 | 0.09 | 0.12 |
UCG | S | 0.11 | 0.45 | 0.00 | 0.19 |
AGU | S | 2.55 | 2.66 | 2.49 | 2.57 |
AGC | S | 0.64 | 0.57 | 0.60 | 0.60 |
CCU | P | 0.79 | 0.91 | 0.85 | 0.85 |
CCC | P | 0.45 | 0.32 | 0.46 | 0.41 |
CCA | P | 0.64 | 1.03 | 0.64 | 0.77 |
CCG | P | 2.13 | 1.74 | 2.05 | 1.97 |
ACU | T | 1.13 | 0.86 | 1.20 | 1.06 |
ACC | T | 0.46 | 0.52 | 0.46 | 0.48 |
ACA | T | 0.73 | 0.67 | 0.80 | 0.73 |
ACG | T | 1.69 | 1.95 | 1.54 | 1.73 |
GCU | A | 1.42 | 1.82 | 1.69 | 1.64 |
GCC | A | 1.26 | 1.27 | 1.16 | 1.23 |
GCA | A | 0.47 | 0.22 | 0.31 | 0.33 |
GCG | A | 0.84 | 0.69 | 0.84 | 0.79 |
UAU | Y | 0.77 | 0.37 | 0.73 | 0.62 |
UAC | Y | 1.23 | 1.63 | 1.27 | 1.38 |
CAU | H | 0.49 | 0.39 | 0.43 | 0.44 |
CAC | H | 1.51 | 1.61 | 1.57 | 1.56 |
CAA | Q | 1.48 | 1.89 | 1.52 | 1.63 |
CAG | Q | 0.52 | 0.11 | 0.48 | 0.37 |
AAU | N | 1.34 | 1.38 | 1.42 | 1.38 |
AAC | N | 0.66 | 0.63 | 0.58 | 0.62 |
AAA | K | 0.54 | 0.59 | 0.56 | 0.56 |
AAG | K | 1.46 | 1.41 | 1.44 | 1.44 |
GAU | D | 1.21 | 1.52 | 1.24 | 1.32 |
GAC | D | 0.79 | 0.48 | 0.76 | 0.68 |
GAA | E | 0.94 | 1.15 | 0.98 | 1.02 |
GAG | E | 1.06 | 0.85 | 1.02 | 0.98 |
UGU | C | 0.57 | 1.04 | 0.50 | 0.70 |
UGC | C | 1.43 | 0.96 | 1.50 | 1.30 |
CGU | R | 0.47 | 0.62 | 0.43 | 0.51 |
CGC | R | 0.28 | 0.62 | 0.38 | 0.43 |
CGA | R | 1.49 | 0.98 | 1.39 | 1.29 |
CGG | R | 0.01 | 0.00 | 0.00 | 0.00 |
AGA | R | 1.37 | 1.73 | 1.39 | 1.50 |
AGG | R | 2.37 | 2.04 | 2.41 | 2.27 |
GGU | G | 1.88 | 1.97 | 2.13 | 1.99 |
GGC | G | 0.79 | 0.54 | 0.84 | 0.72 |
GGA | G | 0.92 | 1.11 | 0.80 | 0.94 |
GGG | G | 0.41 | 0.37 | 0.23 | 0.34 |
*The most frequently used codons are shown in bold.
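The counts quoted in the text can be cross-checked against the "All" column of Table 1 with a short Python sketch. The values are transcribed from the table; the parsing helper and variable names are introduced here for illustration only.

```python
from collections import Counter

# "All" column of Table 1, grouped by amino acid (one-letter code).
TABLE_ALL = """
F UUU=1.07 UUC=0.93
L UUA=1.48 UUG=2.22 CUU=0.60 CUC=0.06 CUA=0.63 CUG=1.01
I AUU=0.97 AUC=0.93 AUA=1.10
V GUU=0.97 GUC=0.93 GUA=0.25 GUG=1.85
S UCU=1.07 UCC=1.46 UCA=0.12 UCG=0.19 AGU=2.57 AGC=0.60
P CCU=0.85 CCC=0.41 CCA=0.77 CCG=1.97
T ACU=1.06 ACC=0.48 ACA=0.73 ACG=1.73
A GCU=1.64 GCC=1.23 GCA=0.33 GCG=0.79
Y UAU=0.62 UAC=1.38
H CAU=0.44 CAC=1.56
Q CAA=1.63 CAG=0.37
N AAU=1.38 AAC=0.62
K AAA=0.56 AAG=1.44
D GAU=1.32 GAC=0.68
E GAA=1.02 GAG=0.98
C UGU=0.70 UGC=1.30
R CGU=0.51 CGC=0.43 CGA=1.29 CGG=0.00 AGA=1.50 AGG=2.27
G GGU=1.99 GGC=0.72 GGA=0.94 GGG=0.34
"""


def parse_table(text: str) -> dict:
    """Parse lines of the form 'AA CODON=value ...' into {aa: {codon: value}}."""
    table = {}
    for line in text.strip().splitlines():
        aa, *entries = line.split()
        table[aa] = {
            codon: float(value) for codon, value in (e.split("=") for e in entries)
        }
    return table


rscu_all = parse_table(TABLE_ALL)
# Most frequently used codon per amino acid and the base it ends in.
preferred = {aa: max(codons, key=codons.get) for aa, codons in rscu_all.items()}
ending_counts = Counter(codon[-1] for codon in preferred.values())
# Codons above the overrepresentation threshold of 1.6.
overrepresented = sorted(
    c for codons in rscu_all.values() for c, v in codons.items() if v > 1.6
)
```

With these values, the preferred codons end in U (6), G (6), A (3), and C (3), and nine codons exceed an RSCU of 1.6, in agreement with the counts given in the text.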
The magnitude of codon usage bias in the ApMV CP gene was measured by the ENC value. The codon usage bias was low in all ApMV CP coding sequences, with an average ENC value of 54.46 ± 2.04 (Table S2), which suggests an approximately constant and conserved genomic composition. The highest and lowest ENC values were found for the CP coding sequences of isolates from apple and hazelnut, respectively (Figure 3).
The main trends in codon usage variation of the ApMV CP gene were examined by PCA (Figure 4a). Among the three hosts, several overlaps were detected between apple and pear isolates, suggesting that the main codon usage trend is similar in these two hosts (Figure 4a). In addition, the principal axes were plotted according to the geographical locations of the ApMV isolates (Figure 4b). In this analysis, the isolates did not cluster according to the geographical locations from which they were isolated (Table S2). The clustering of most apple isolates near the origin of the PCA plot (Figure 4a) suggests a possible origin of this virus in the apple host.
When ENC values were plotted against GC3s values, the data points of all three hosts clustered together below the standard ENC curve (Figure 5). Data points falling below the standard curve indicate that codon usage is affected more by natural selection than by mutation pressure. In addition, the degree of mutational pressure and natural selection on codon usage in the ApMV CP gene was determined by neutrality analysis between GC1,2s and GC3s for all sequences, and the results were grouped by ApMV host (Figure 6). A significant positive correlation (r2 = 0.5022,
The PR2-bias plot of the ApMV CP gene is shown in Figure 7. All ApMV CP genes showed a similar distribution along the ordinate of the PR2 plot, and all of them fell in the lower right area of the plot (the G > C side). The PR2-bias plot thus indicates unequal usage of G versus C and of A versus U at the third nucleotide position. This unequal use of nucleotides indicates that a combination of mutation pressure and natural selection drives the codon usage patterns in the CP gene, with selection pressure playing the more important role (Figure 7).
The CAI and RCDI analyses were performed to assess the codon usage optimization and host adaptation of ApMV. The average CAI values of the CP coding sequences were 0.693, 0.678, and 0.630 for hazelnut, apple, and pear, respectively (Figure 8). These results show that ApMV host adaptation was highest for hazelnut and lowest for pear. In addition, the average RCDI values were highest for pear (1.975), followed by apple (1.792) and hazelnut (1.715), showing that codon usage deoptimization was greatest for pear (Figure 8). The SiD values were also calculated to investigate how the codon usage patterns of the hosts influence the ApMV CP codon usage pattern (Figure 9). The SiD value of hazelnut was greater than those of apple and pear, suggesting that hazelnut had a greater influence on the codon usage of the ApMV CP gene than apple and pear.
Identification of codon usage patterns provides important information about host-pathogen co-evolution, such as the adaptation of pathogens to hosts and the molecular evolution of genes (Butt et al. 2016; He et al. 2019; Pandit and Sinha 2011; Zhang et al. 2019). Compared with eukaryotic and prokaryotic organisms, the importance of CUB in the evolution of plant viruses has received less attention. In this study, we analyzed synonymous codon usage in the CP sequences of 52 ApMV isolates in order to understand the molecular evolution of the virus under the influence of multiple viral and host factors. It has previously been shown that codon usage bias, the preference for one codon over another, can be significantly influenced by overall genomic composition (Jenkins and Holmes 2003). Nucleotide composition analysis indicated that the ApMV CP gene is AU-rich. However, codons with U or G in the third position appear to be preferred in the ApMV CP gene, which indicates possible codon usage bias (Table 1 and Figure 1). The uneven usage of A3/U3 and G3/C3 nucleotides in the AU-rich CP genes in this study shows that the compositional patterns of the ApMV CP sequences are more complex than the commonly observed GC- and/or AU-rich compositions of most virus genes. This unequal use of nucleotides indicates overlapping influences of mutational pressure and natural selection on the codon preferences in the present CP gene sequences, as previously reported for
Given the codon bias toward G- and U-ending codons in the ApMV CP gene sequences, we compared this bias among the different hosts of ApMV using ENC analyses. Generally, a smaller ENC value indicates a stronger codon usage bias, and ENC values below 35 characterize genes with considerable codon bias. Here, the mean ENC value was 54.46 (Table S2), which indicates a slightly biased, relatively conserved, and stable coding sequence composition among different isolates. In addition, among the three hosts, isolates from apple had higher mean ENC values and thus a lower codon usage bias than isolates from hazelnut and pear (Figure 3). Low codon usage bias has previously been reported for some plant viruses, including Begomoviruses (Xu et al. 2008),
The main trends in codon usage variation of the ApMV CP gene were examined by PCA of the RSCU values (Figure 4a). PCA of the three hosts indicated several overlaps between apple and pear isolates, which suggests that the main codon usage trend is similar in these two hosts. In addition, we plotted the principal axes according to host and geographical location. The clustering of most apple isolates near the origin of the PCA plot (Figure 4a) suggests a possible origin of this virus in the apple host. In contrast, hazelnut isolates might have evolved independently due to biological variation and/or dispensation diversities. ApMV was first isolated and described from apple in the early 1930s (Bradford and Joly 1933) and later from other hosts, including pear and hazelnut (Brunt et al. 1996). Based on the PCA plots, it was inferred that ApMV originated from apple trees in Europe (Figure 4b, Figure S1). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. The apple tree originated in Central Asia, where its wild ancestor, Malus sieversii, is still found today. Apples have been grown for thousands of years in Asia and Europe and were brought to North America by European colonists. The grouping of ApMV isolates separated by thousands of miles within a single group (Figure 1a and 1b) indicates an important role for the movement of the natural hosts of ApMV. This analysis shows that ApMV isolates might have evolved independently in two clusters after diverging from a common ancestor. In addition, the role of natural hosts within the area of infection and the susceptibility of hosts may have affected the codon usage patterns in the ApMV CP gene. Besides the nucleotide composition frequencies, the ENC plot is used to identify codon usage differentiation among genes in various organisms (Comeron and Aguadé 1998).
When the ENC and GC3s values of the ApMV CP gene were plotted, none of the isolates fell on the standard curve (Figure 5), indicating that selection pressure is the major factor driving the codon usage bias in the CP gene of ApMV. The neutrality plot (GC3s versus GC1,2s values), which was used to assess the effects of mutation pressure and natural selection on codon usage patterns, showed that the influence of natural selection dominates over mutation pressure (Figure 6). Mutational pressure has been reported to have a major role in the CUB of plant viruses (Adams and Antoniw 2003). However, the present study shows that both natural selection and mutational pressure influence the CUB in plant viruses, as also reported previously (Chakraborty et al. 2015).
It has been proposed that if mutation pressure alone shaped synonymous codon usage bias, A and U/T, and likewise G and C, should be used with equal frequency at the third position of synonymous codons (Wang et al. 2016). PR2 plot analysis indicated that these nucleotides were not used equally at the third codon position (Figure 7). The AU bias in the ApMV CP gene demonstrates the potential influence of natural selection on codon usage patterns. Pathogen-host interactions can affect the dynamics, emergence, genetic divergence, and evolution of infectious diseases (Wang et al. 2016; Zhang et al. 2013). CAI is considered an index of gene expression and can be used to evaluate the adaptation of viral genes to their hosts. The highest CAI value was calculated for hazelnut, indicating that natural selection from this host has influenced the codon usage patterns (Figure 8). As inferred from the SiD analysis (Figure 9), hazelnut has a greater effect on shaping ApMV RSCU patterns, which agrees with the CAI analysis.
Although apple has always been suggested to be the primary ApMV host, a strong link between ApMV and hazelnut was observed in this study. Our findings show that overall codon usage within the ApMV CP gene is slightly biased. The evolution of ApMV probably reflects a dynamic process of mutation and natural selection that adapts its codon usage to different environments and hosts. This study contributes to the understanding of plant virus evolution, reveals novel information about the evolutionary fitness of ApMV, and helps find better ApMV management strategies.
We gratefully thank Dr. Hassan Ebrahimi, Department of Advanced Technology Fusion, Graduate School of Science and Engineering, Saga University, 1 Honjo-Machi, Saga 840-8502, Japan, for kindly supporting the CodonW analysis and mathematical equations in this research.
The Iranian Research Institute of Plant Protection-IRIPP (Project No. 961463) partly supported this work.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The data sets analyzed during the present study are available in the GenBank repository (https://www.ncbi.nlm.nih.gov/).
No custom or special code or mathematical algorithm was used in this study.
All authors contributed equally.
The authors confirm that the ethical policy of the journal, as mentioned on the journal’s author guidelines page, was ensured, and no ethical approval was required for this study as no samples or questionnaires were collected from animals or humans.
Written informed consent for study participation was obtained from all individual participants.
Written informed consent for study publication was obtained from all individual participants.
Journal of
Plant Biotechnology