Damla PehlivanT., Maryam MounadR.

Introduction

Changes that occur in an organism’s genome are called genetic mutations. Errors during DNA replication can give rise to mutations. Mutations can also be caused by UV light and by exposure to certain environmental factors, such as chemical mutagens like ethidium bromide1. Organisms, environmental factors, and mutations can all act as pathogens and, consequently, cause disease. A pathogen’s ability to cause disease is called pathogenicity2. Pathogens with high pathogenicity have a strong disease-causing effect on other organisms. The pathogenicity of a mutation depends on its location in the gene, and predicting it is crucial for finding and prioritising pathogenic mutations. For this purpose, in silico tools are widely used in clinical genetics.

Overview of DNA Regions in the Genome

The human genome can be broadly divided into coding DNA and non-coding DNA. About 98% of the human genome is non-coding and consists of intergenic regions, introns, and UTRs (untranslated regions)3. Different RNA types are transcribed from these regions: rRNA (ribosomal RNA), which helps translation; tRNA (transfer RNA), which carries amino acids to the ribosome; snRNA (small nuclear RNA), which takes part in splicing out introns; snoRNA (small nucleolar RNA), which participates in nucleotide modification; miRNA (microRNA), which affects gene regulation; piRNA (piwi-interacting RNA), which is thought to silence transposons; siRNA (small interfering RNA), which takes part in gene silencing; and lncRNA (long non-coding RNA), which is involved in epigenetic regulation4-12. Only about 2% of the human genome encodes proteins, and these regions are called exons3. For instance, even though the 2q24.3-q32.3 region consists of 30 million bases, only a small part of it carries protein-coding sequences (Figure 1), and these protein-coding genes primarily consist of intragenic regions (Figure 2)13,14.

Figure 1. Chromosome 2q24.3-q32.3 region. Genomic positions are shown at the top in Mb (megabases). Cytogenetic positions are shown in red. Protein-coding genes are shown in blue.

Figure 2. Gene structure and the mechanisms of transcription, post-transcriptional modification, and translation. Genes essentially contain six different regions that have two main tasks. Intragenic and intergenic regions and the UTR make up the regulatory sequence of genes, whereas introns and exons make up the open reading frame (ORF). Non-coding regions contain enhancer, silencer, and promoter regions, and the UTR is transcribed but not translated. Introns in the ORF are removed by splicing and are not involved in translation14.

The transcribed region of a human gene is composed of UTR, intron, and exon regions. The 5’UTR, 3’UTR, and exonic regions are retained in the mature mRNA after splicing, while introns are removed3. Through alternative splicing, a single gene can produce many mRNA isoforms (Figure 3)15. Only the coding sequence carried by the exons of these mRNAs is translated into protein.

Figure 3. Five different alternative splicing mechanisms. Exons are joined in unique combinations at alternative splice sites before translation. These unique combinations give rise to many isoforms of a protein15.
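To make the idea of splicing and isoforms concrete, the following is a minimal Python sketch of one of the five mechanisms, exon skipping. The toy pre-mRNA, the exon coordinates, and the function names `splice` and `exon_skipping_isoforms` are illustrative assumptions, not data or code from the cited sources; exons are written in upper case and introns in lower case only for readability.

```python
from itertools import combinations

# Hypothetical toy pre-mRNA: four exons (upper case) separated by three introns.
PRE_MRNA = "AAAgtccagCCCgtttagGGGgtaaagTTT"
# Exon coordinates as 0-based, half-open (start, end) intervals (assumed for this toy).
EXONS = [(0, 3), (9, 12), (18, 21), (27, 30)]


def splice(pre_mrna, exons, keep=None):
    """Join the selected exons in order; everything else (introns) is dropped."""
    if keep is None:
        keep = range(len(exons))
    return "".join(pre_mrna[exons[i][0]:exons[i][1]] for i in keep)


def exon_skipping_isoforms(pre_mrna, exons):
    """Enumerate isoforms that keep the first and last exon and any
    subset of the internal exons (exon skipping only)."""
    internal = range(1, len(exons) - 1)
    isoforms = set()
    for r in range(len(exons) - 1):  # r = number of internal exons kept
        for subset in combinations(internal, r):
            keep = [0, *subset, len(exons) - 1]
            isoforms.add(splice(pre_mrna, exons, keep))
    return sorted(isoforms)


full_length = splice(PRE_MRNA, EXONS)
isoforms = exon_skipping_isoforms(PRE_MRNA, EXONS)
print(full_length)
print(isoforms)
```

With two internal exons the sketch yields four isoforms; real genes combine exon skipping with the other mechanisms in Figure 3, so the number of possible isoforms grows much faster.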

Applications of Sequencing Technologies

Advances in NGS (next-generation sequencing) have made it more efficient to predict possibly pathogenic genes and the mutations in them. When searching for candidate disease-causing mutations, exonic regions are prioritized. Thus, WES (whole-exome sequencing) is preferred over WGS (whole-genome sequencing) in clinical research. If WES yields no results, WGS is used to analyze non-coding regions. As the technology has advanced, sequencing has also become more affordable, leading to an increase in genome sequencing projects.

In complex genetic diseases, such as cancer and type 2 diabetes, GWAS (genome-wide association studies) based on SNPs (single nucleotide polymorphisms) are used to analyze intragenic regions, because the variants involved are frequent in the population. WES, by contrast, is used for monogenic diseases, since these are attributed mainly to variants in exonic regions16.

Tools Used to Detect Exonic Regions

Single-base changes in exonic regions can be of different types; for example, missense mutations change the amino acid sequence of the polypeptide, whereas silent mutations do not. Exonic regions may also carry indel mutations: deletions (loss of one or more base pairs) and insertions (addition of one or more base pairs). Indels can shift the reading frame, and are then called frameshift mutations. In addition, some mutations affect the start or stop codon: a start-loss mutation removes the start codon, a stop-loss mutation removes the stop codon, and a stop-gain mutation introduces a new stop codon. Predicting the effect of these exonic mutations is a significant step in finding disease-causing variants. The online tools listed in Table 1 have made this prediction more efficient. Each of these tools uses a different algorithm and scoring system17-22. Our next article will be about the differences between these online tools and how to use them.
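The mutation types described above can be told apart by comparing a reference coding sequence with its mutated version. The following Python sketch illustrates that logic under simplifying assumptions: the reference is a complete coding sequence that starts with ATG and ends with a stop codon, the standard genetic code applies, and the function names `translate` and `classify` and the toy sequences are hypothetical. It is not how any of the tools in Table 1 work internally.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate every complete codon; '*' marks a stop codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify(ref_cds, alt_cds):
    """Name the consequence of changing ref_cds into alt_cds."""
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"          # indel that shifts the reading frame
    if len(alt_cds) != len(ref_cds):
        return "in-frame indel"      # whole codons inserted or deleted
    ref_prot, alt_prot = translate(ref_cds), translate(alt_cds)
    if ref_prot == alt_prot:
        return "silent"
    if ref_prot[0] == "M" and alt_prot[0] != "M":
        return "start-loss"
    if ref_prot.endswith("*") and not alt_prot.endswith("*"):
        return "stop-loss"
    if "*" in alt_prot[:-1]:
        return "stop-gain"           # premature stop codon
    return "missense"


REF = "ATGTGGAAATAA"  # hypothetical toy CDS encoding M-W-K-stop
print(classify(REF, "ATGTGGGAATAA"))  # Lys codon AAA changed to GAA
print(classify(REF, "ATGTGAAAATAA"))  # Trp codon TGG changed to TGA
```

The order of the checks matters: length changes are tested first because an indel makes a codon-by-codon comparison with the reference meaningless.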

Table 1. Online tools for predicting the effect of mutations in different regions, with their scoring systems and websites17-22.

| Tool | Mutation type and its region | Scoring system | Website |
| --- | --- | --- | --- |
| SIFT (Sorting Intolerant From Tolerant) | All mutations in the exonic regions | 0.0 – 1.0 (harmful – tolerable) | https://sift.bii.a-star.edu.sg |
| PROVEAN (Protein Variation Effect Analyzer) | All mutations in the exonic region | > -2.5 (benign), < -2.5 (harmful) | PROVEAN Home (jcvi.org) |
| Mutation Taster-2 | All mutations in the intronic, exonic and UTR regions | Pathogenic and polymorphism probability (between 0 – 1) | https://www.mutationtaster.org/ |
| CADD (Combined Annotation Dependent Depletion) | All mutations in the exonic region | 1.0 – 99.00 (>20 possibly pathogenic) | CADD – Combined Annotation Dependent Depletion (washington.edu) |
| M-CAP (Mendelian Clinically Applicable Pathogenicity) | All point mutations in the exonic region | >0.025 possibly pathogenic | M-CAP: Mendelian Clinically Applicable Pathogenicity Score, Bejerano Lab, Stanford University |
| PolyPhen 2 (Polymorphism Phenotyping) | All point mutations in the exonic region | 0.0 – 1.0 (benign – pathogenic) | PolyPhen-2: prediction of functional effects of human nsSNPs (harvard.edu) |
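Because the tools use different scales and directions, it can help to encode the cutoffs explicitly before comparing results. The Python sketch below does this for the three tools whose cutoffs are stated in Table 1 (PROVEAN, CADD, and M-CAP); the variant names and scores are invented for illustration, and the names `Cutoff`, `is_flagged`, and `prioritise` are hypothetical. Counting how many tools flag a variant is only one simple way to rank candidates, not a method prescribed by the tools themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoff:
    threshold: float
    flagged_if_above: bool  # direction of the scale


# Cutoffs as listed in Table 1.
CUTOFFS = {
    "PROVEAN": Cutoff(-2.5, flagged_if_above=False),  # below -2.5: harmful
    "CADD": Cutoff(20.0, flagged_if_above=True),      # above 20: possibly pathogenic
    "M-CAP": Cutoff(0.025, flagged_if_above=True),    # above 0.025: possibly pathogenic
}


def is_flagged(tool, score):
    """True if the score lies on the pathogenic side of the tool's cutoff."""
    rule = CUTOFFS[tool]
    if rule.flagged_if_above:
        return score > rule.threshold
    return score < rule.threshold


def prioritise(variants):
    """Rank variants by the number of tools that flag them (highest first)."""
    votes = {
        name: sum(is_flagged(tool, score) for tool, score in scores.items())
        for name, scores in variants.items()
    }
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical variants with made-up scores, for illustration only.
example = {
    "variant_A": {"PROVEAN": -4.1, "CADD": 27.0, "M-CAP": 0.30},
    "variant_B": {"PROVEAN": -0.8, "CADD": 22.5, "M-CAP": 0.01},
    "variant_C": {"PROVEAN": 1.2, "CADD": 3.4, "M-CAP": 0.004},
}
ranking = prioritise(example)
print(ranking)
```

Keeping the direction of each scale next to its threshold avoids the common mistake of reading a low PROVEAN score as benign.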

Conclusion

Unusual exonic variants can be investigated using the results of the online tools mentioned above. Each tool has its own scoring system, and where a variant falls on that scale indicates its predicted pathogenicity. Based on the results, the researcher can prioritize variants and identify those likely to be pathogenic. For example, the c.1302G>A, p.(Trp434*) mutation in the EDAR gene introduces a new stop codon, which truncates the polypeptide chain. This mutation causes tooth agenesis and is therefore pathogenic, with a 0.99 probability according to Mutation Taster-2 and a 9.856 score according to PROVEAN23.
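The relationship between the cDNA notation c.1302G>A and the protein notation p.(Trp434*) can be checked with simple codon arithmetic. The Python sketch below assumes the standard genetic code, in which TGG is the only tryptophan codon, and does not use the actual EDAR sequence; the function names are hypothetical, and the regular expression handles only simple substitutions of this form.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_substitution(hgvs_c):
    """Parse a simple cDNA substitution such as 'c.1302G>A'."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"not a simple substitution: {hgvs_c}")
    return int(match.group(1)), match.group(2), match.group(3)


def codon_location(cdna_pos):
    """Return (codon number, position within codon), both 1-based."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def mutate_codon(codon, offset, ref, alt):
    """Replace the base at the 1-based offset, checking the reference base."""
    if codon[offset - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:offset - 1] + alt + codon[offset:]


pos, ref, alt = parse_substitution("c.1302G>A")
codon_number, offset = codon_location(pos)
# TGG is the only tryptophan codon in the standard genetic code.
new_codon = mutate_codon("TGG", offset, ref, alt)
print(codon_number, offset, new_codon, new_codon in STOP_CODONS)
```

Position 1302 is the third base of codon 434, so the G>A change turns TGG into the stop codon TGA, which agrees with the p.(Trp434*) annotation.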

Acknowledgment

Finally, we would like to thank the author and the editor of the Turkish version for their contributions. The review article is available in Turkish on www.bioinforange.com.

https://www.bioinforange.com/bioinforeviews/biyoinformatik/yazilimlar/insan-genomundaki-mutasyonlarin-patojenitesini-yordayan-cevrimici-araclar/

References

1. Karki R, Pandya D, Elston RC, Ferlini C. Defining “mutation” and “polymorphism” in the era of personal genomics. BMC Med Genomics. Jul 2015;8:37. https://doi.org/10.1186/s12920-015-0115-z

2. Shapiro-Ilan DI, Fuxa JR, Lacey LA, Onstad DW, Kaya HK. Definitions of pathogenicity and virulence in invertebrate pathology. J Invertebr Pathol. Jan 2005;88(1):1-7. https://doi.org/10.1016/j.jip.2004.10.003

3. Makałowski W. The human genome structure and organization. Acta Biochim Pol. 2001;48(3):587-98

4. Brosius J, Raabe CA. What is an RNA? A top layer for RNA classification. RNA Biol. 2016;13(2):140-4. https://doi.org/10.1080/15476286.2015.1128064

5. Brimacombe R, Stiege W. Structure and function of ribosomal RNA. Biochem J. Jul 1985;229(1):1-17. https://doi.org/10.1042/bj2290001

6. O’Donoghue P, Ling J, Söll D. Transfer RNA function and evolution. RNA Biol. 2018;15(4-5):423-426. https://doi.org/10.1080/15476286.2018.1478942  

7. Bohnsack MT, Sloan KE. Modifications in small nuclear RNAs and their roles in spliceosome assembly and function. Biol Chem. Oct 2018;399(11):1265-1276. https://doi.org/10.1515/hsz-2018-0205

8. Scott MS, Ono M. From snoRNA to miRNA: Dual function regulatory non-coding RNAs. Biochimie. Nov 2011;93(11):1987-92. https://doi.org/10.1016/j.biochi.2011.05.02  

9. Pillai RS. MicroRNA function: multiple mechanisms for a tiny RNA? RNA. Dec 2005;11(12):1753-61. https://doi.org/10.1261/rna.2248605

10. Huang Y, Bai JY, Ren HT. PiRNAs biogenesis and its functions. Bioorg Khim. May-Jun 2014;40(3):320-6

11. Dana H, Chalbatani GM, Mahmoodzadeh H, et al. Molecular Mechanisms and Biological Functions of siRNA. Int J Biomed Sci. Jun 2017;13(2):48-57 

12. Zhang X, Wang W, Zhu W, et al. Mechanisms and Functions of Long Non-Coding RNAs at Multiple Regulatory Levels. Int J Mol Sci. Nov 2019;20(22) https://doi.org/10.3390/ijms20225573  

13. Dimitrov B, Balikova I, de Ravel T, et al. 2q31.1 microdeletion syndrome: redefining the associated clinical phenotype. J Med Genet. Feb 2011;48(2):98-104. https://doi.org/10.1136/jmg.2010.079491  

14. Shafee T, Lowe R. Eukaryotic and prokaryotic gene structure. WikiJournal of Medicine. 2017;4(1):2. https://doi.org/10.15347/wjm/2017.002

15. Wang Y, Liu J, Huang BO, et al. Mechanism of alternative splicing and its regulation. Biomed Rep. Mar 2015;3(2):152-158. https://doi.org/10.3892/br.2014.407  

16. Sun W, Zheng W, Simeonov A. Drug discovery and development for rare genetic disorders. Am J Med Genet A. Sep 2017;173(9):2307-2322. https://doi.org/10.1002/ajmg.a.38326   

17. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. Jul 2003;31(13):3812-4. https://doi.org/10.1093/nar/gkg509  

18. Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. Aug 2015;31(16):2745-7. https://doi.org/10.1093/bioinformatics/btv19  

19. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. Apr 2014;11(4):361-2. https://doi.org/10.1038/nmeth.2890

20. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. Jan 2019;47(D1):D886-D894. https://doi.org/10.1093/nar/gky1016

21. Jagadeesh KA, Wenger AM, Berger MJ, et al. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet. Dec 2016;48(12):1581-1586. https://doi.org/10.1038/ng.3703

22. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods. Apr 2010;7(4):248-9. https://doi.org/10.1038/nmeth0410-248

23. Mumtaz S, Nalbant G, Yıldız Bölükbaşı E, et al. Novel EDAR mutation in tooth agenesis and variable associated features. Eur J Med Genet. Sep 2020;63(9):103926. https://doi.org/10.1016/j.ejmg.2020.10392 

Bioinfocodes 2021 All Rights Reserved - Mehmet Çalıseki