Advanced perspectives of biocomputational gene discovery
A Review
DOI:
https://doi.org/10.59317/efnfpt15
Keywords:
Gene prediction, Bioinformatics tools, In silico approaches
Abstract
Gene discovery is one of the most significant steps in understanding and analyzing an organism’s genome after it has been sequenced. The process comprises locating intron structures, exon structures, and open reading frames (ORFs). Ideally, it identifies all of the genes computationally with close to 100% reliability, which can substantially reduce the amount of wet-lab experimentation required. In eukaryotic genomes, genes have a mosaic organization: the coding sequence is divided into exons by intervening noncoding regions (introns). Several methods are available for gene finding, including laboratory-based approaches, feature-based approaches, homology-based approaches, and statistical and HMM-based approaches. In this paper, we discuss in silico approaches to gene prediction and the bioinformatics tools available for gene finding.
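As a small illustration of the ORF-location step mentioned in the abstract, the following Python sketch scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is not taken from the paper or from any of the tools it reviews: the function name find_orfs, the min_codons threshold, and the use of the standard stop codons TAA, TAG, and TGA are assumptions made for this example, and the reverse strand and intron/exon structure are deliberately ignored.

```python
# Minimal ORF scan (illustrative sketch, not a gene predictor).
# Assumptions: standard genetic code, forward strand only, no introns.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return a sorted list of (start, end, frame) tuples.

    start is 0-based and inclusive, end is exclusive and includes the
    stop codon. min_codons (hypothetical parameter) is the minimum
    number of codons before the stop, counting the start codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon, skipping any incomplete tail.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG"))
```

For the toy input "ATGAAATAG" the function reports a single ORF spanning the whole sequence in frame 0. Real gene finders add much more on top of this, such as splice-site models and coding statistics, so the sketch only shows the simplest signal involved.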