Predicting Gene Models Based on Evolutionary Conservation

Advances in DNA sequencing have made it easier to generate genome assemblies for any species. Genome annotation has not advanced at the same rate, and many gene models (genomic regions that are transcribed and translated) predicted in individual plant genomes may be erroneous.
Although the large numbers of genes annotated as unique to individual species would suggest that de novo gene birth occurs at a high rate following speciation, studies suggest this is not the case: protein-coding genes share sequence identity between species. Researchers applied this concept of evolutionary conservation to introduce a new method that supports the accurate prediction of gene models. Through comparative genomic analysis, they produced a representative gene set of 15,345 gene models from 12 legume species and used it to identify potentially erroneous gene models in each species’ annotation. If a gene did not have a match in the conserved set, the researchers considered it to have low confidence and to require further validation.
Using this new method, representative gene sets can be developed for any species and applied to support more accurate gene model predictions.
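The flagging step described above can be illustrated with a minimal sketch in Python. It is not the authors' pipeline. It assumes that sequence matches between annotated genes and the representative set have already been computed by some comparison tool, and every name in it (`flag_low_confidence`, `matches`, the gene and representative IDs) is hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set


def flag_low_confidence(
    annotated_genes: Iterable[str],
    matches: Dict[str, Set[str]],
    conserved_set: Set[str],
) -> List[str]:
    """Return annotated genes that have no match in the conserved set.

    `matches` maps each gene ID to the IDs of the sequences it matched.
    How a "match" is defined (tool, thresholds) is an assumption left
    outside this sketch.
    """
    low_confidence = []
    for gene in annotated_genes:
        hits = matches.get(gene, set())
        # A gene is kept as supported only if at least one of its hits
        # belongs to the conserved representative set.
        if not (hits & conserved_set):
            low_confidence.append(gene)
    return low_confidence


# Hypothetical toy data, not taken from the study.
conserved_set = {"REP0001", "REP0002", "REP0003"}
annotated_genes = ["geneA", "geneB", "geneC", "geneD"]
matches = {
    "geneA": {"REP0001"},  # matches a conserved representative
    "geneB": {"REP9999"},  # hit falls outside the conserved set
    "geneC": set(),        # no hits at all
    # geneD has no entry, which is treated as having no hits
}

flagged = flag_low_confidence(annotated_genes, matches, conserved_set)
print(flagged)  # ['geneB', 'geneC', 'geneD']
```

In this sketch a flagged gene is not discarded. As in the text, it is only marked as low confidence and left for further validation.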
Adapted from Tay Fernandez, C. G., Bayer, P. E., Petereit, J., Varshney, R., Batley, J., & Edwards, D. (2023). The conservation of gene models can support genome annotation. The Plant Genome, 16, e20377. https://doi.org/10.1002/tpg2.20377
Text © The authors. CC BY-NC-ND 4.0. Except where otherwise noted, images are subject to copyright. Any reuse without express permission from the copyright owner is prohibited.