
K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes (Record no. 2318)

MARC details
042 ## - AUTHENTICATION CODE
Authentication code dc
100 10 - MAIN ENTRY--PERSONAL NAME
Personal name Contreras-Moreira, Bruno
Relator term author
245 00 - TITLE STATEMENT
Title K-mer counting and curated libraries drive efficient annotation of repeats in plant genomes
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Date of publication, distribution, etc. 2021-11-01.
500 ## - GENERAL NOTE
General note /pmc/articles/PMC7614178/
500 ## - GENERAL NOTE
General note /pubmed/34562304
520 ## - SUMMARY, ETC.
Summary, etc. The annotation of repetitive sequences within plant genomes can help in the interpretation of observed phenotypes. Moreover, repeat masking is required for tasks such as whole-genome alignment, promoter analysis, or pangenome exploration. Although homology-based annotation methods are computationally expensive, k-mer strategies for masking are orders of magnitude faster. Here, we benchmarked a two-step approach, where repeats were first called by k-mer counting and then annotated by comparison to curated libraries. This hybrid protocol was tested on 20 plant genomes from Ensembl, with the k-mer-based Repeat Detector (Red) and two repeat libraries (REdat, last updated in 2013, and nrTEplants, curated for this work). Custom libraries produced by RepeatModeler were also tested. We obtained repeated genome fractions that matched those reported in the literature but with shorter repeated elements than those produced directly by sequence homology. Inspection of the masked regions that overlapped genes revealed no preference for specific protein domains. Most Red-masked sequences could be successfully classified by sequence similarity, with the complete protocol taking less than 2 h on a desktop Linux box. A guide to curating your own repeat libraries and the scripts for masking and annotating plant genomes can be obtained at https://github.com/Ensembl/plant-scripts.
540 ## - TERMS GOVERNING USE AND REPRODUCTION NOTE
Terms governing use and reproduction This work is licensed under a CC BY 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
546 ## - LANGUAGE NOTE
Language note en
690 ## - LOCAL SUBJECT ADDED ENTRY--TOPICAL TERM (OCLC, RLIN)
Topical term or geographic name as entry element Article
655 7# - INDEX TERM--GENRE/FORM
Genre/form data or focus term Text
Source of term local
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Filippi, Carla V
Relator term author
9 (RLIN) 3003
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Naamati, Guy
Relator term author
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name García Girón, Carlos
Relator term author
9 (RLIN) 3005
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Allen, James E
Relator term author
700 10 - ADDED ENTRY--PERSONAL NAME
Personal name Flicek, Paul
Relator term author
786 0# - DATA SOURCE ENTRY
Note The Plant Genome
856 41 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://dx.doi.org/10.1002/tpg2.20143
Public note Connect to this object online.
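The 520 summary describes a two-step protocol in which repeats are first called by k-mer counting and only then classified against curated libraries. The sketch below illustrates the first step alone, assuming a simple frequency threshold: it is not the Red (Repeat Detector) algorithm, and the function names and the `k`/`min_count` parameters are hypothetical illustrations, not code from the plant-scripts repository.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mask_repeats(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Soft-mask (lowercase) bases covered by k-mers seen at least min_count times.

    Hypothetical parameters: k and min_count are illustrative values,
    not defaults taken from Red or from the published protocol.
    """
    seq = seq.upper()
    counts = count_kmers(seq, k)
    repeated = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        # A frequent k-mer marks all k bases it spans as repetitive.
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                repeated[j] = True
    return "".join(b.lower() if r else b for b, r in zip(seq, repeated))


def masked_fraction(masked: str) -> float:
    """Fraction of bases that were soft-masked (the 'repeated genome fraction')."""
    return sum(c.islower() for c in masked) / len(masked) if masked else 0.0
```

For example, `mask_repeats("GATTACA" * 3)` soft-masks the whole tandem array, while a sequence with no recurring 4-mers, such as `"ACGTTGCA"`, is returned unchanged. In the protocol summarized above, the masked segments would then be passed to the second, similarity-based step for classification against libraries such as REdat or nrTEplants.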
