UPS-indel: A better approach for finding indel redundancy
Layne T. Watson
An indel, the insertion or deletion of base pairs in an organism's sequence, is a very common form of genetic variation in the human genome. Because indels contribute to genetic diversity and human disease, they are an important topic in the genome research community. With progress in Next Generation Sequencing (NGS), many indel calling tools have been developed, and various databases store their results for future research. Different indels, though differing in allele sequence and position, can be biologically equivalent when they lead to the same altered sequence. Storing these biologically equivalent indels as distinct entries in databases causes data redundancy. Previous research showed that about 10% of human indels stored in dbSNP are redundant because there is no unified system for identifying and representing equivalent indels. In this paper we describe UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be identified by a simple comparison of the coordinates the positioning system generates. Applying UPS-indel, we find that nearly 15% of the indels in dbSNP (version 142) across all human chromosomes are redundant, a higher rate than previously reported. UPS-indel is written in C++ and is freely available at http://bench.cs.vt.edu/ups-indel.
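To make the notion of equivalence concrete, the following minimal C++ sketch checks whether two indels lead to the same altered sequence by applying each one to a reference string and comparing the results. It is a brute-force illustration of the definition only, not the UPS-indel positioning algorithm; the `Indel` struct, the 0-based `pos` convention, and the function names `apply_indel` and `equivalent` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record used only for this illustration;
// it is not the data model of UPS-indel or dbSNP.
struct Indel {
    bool is_insertion;   // true: insertion, false: deletion
    std::size_t pos;     // 0-based offset into the reference:
                         //   insertion: allele is inserted before ref[pos]
                         //   deletion:  ref[pos] is the first deleted base
    std::string allele;  // inserted or deleted bases
};

// Return the altered sequence produced by applying one indel to ref.
std::string apply_indel(const std::string& ref, const Indel& v) {
    assert(v.pos <= ref.size());
    std::string out = ref;
    if (v.is_insertion) {
        out.insert(v.pos, v.allele);
    } else {
        // A deletion must match the reference bases it removes.
        assert(ref.compare(v.pos, v.allele.size(), v.allele) == 0);
        out.erase(v.pos, v.allele.size());
    }
    return out;
}

// Two indels are treated as equivalent when they yield the same
// altered sequence on the same reference.
bool equivalent(const std::string& ref, const Indel& a, const Indel& b) {
    return apply_indel(ref, a) == apply_indel(ref, b);
}
```

For example, on the reference `ACAG`, inserting `CA` at offset 1 and inserting `AC` at offset 2 differ in both allele and position, yet both produce `ACACAG`, so `equivalent` returns true. Comparing full altered sequences is impractical at chromosome scale, which is why a coordinate-based comparison such as the one the abstract describes is attractive.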
- Date of publication: October 13, 2016
- IEEE Computational Advances in Bio and Medical Sciences