Last call #ASHG15. If bioinformatics and a better graph-based genome are among your interests, you’ll want to meet Dr. Wan-Ping Lee of Seven Bridges Genomics at 3pm Friday 10/9 in Room 316. Here’s a brief introduction:
Current short-read mapping algorithms use species-specific genome reference sequences to align reads from an individual. Many reads fail to map or are incorrectly mapped because each new genome typically contains many genetic variations not captured by the reference sequence. Our graph-based solution aims to map reads accurately across a broader range of variation than reference-only methods.
As the volume of analysis grows, many high-profile projects are discovering novel variants in ever greater numbers. Accounting for those variants when aligning new reads is vital to improving sensitivity. After all, most variants found in a single individual are shared with other members of the species.
That’s why we are developing a whole-genome read mapper that can take known variations into account, in addition to the genome reference. Our approach is to construct a directed acyclic graph (DAG) representing the reference sequence and allelic alternates. The mapper works in two phases. In the first, a read localization step, it identifies regions of the DAG where a read is likely to map. In the second, a local alignment step, it aligns the read against the DAG using a graph-aware extension of the Smith-Waterman optimal alignment algorithm.
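The second phase can be illustrated with a minimal Python sketch of Smith-Waterman local alignment extended to a DAG. It is not the mapper's actual implementation: the function name `align_read_to_dag`, the node and edge dictionaries, the linear gap penalty, and the scoring values are all assumptions chosen for illustration. The only change from the linear algorithm is that the first base of each node takes its previous DP row from every predecessor node, rather than from a single preceding base.

```python
from typing import Dict, List


def align_read_to_dag(
    read: str,
    nodes: Dict[str, str],        # hypothetical layout: node id -> non-empty sequence
    edges: Dict[str, List[str]],  # hypothetical layout: node id -> successor node ids
    topo_order: List[str],        # node ids in topological order
    match: int = 2,               # illustrative scores, not taken from the source
    mismatch: int = -2,
    gap: int = -3,                # linear gap penalty (an assumption)
) -> int:
    """Return the best local alignment score of `read` against any path in the DAG."""
    # Invert the edge list so each node knows its predecessors.
    preds: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].append(src)

    zero_row = [0] * (len(read) + 1)
    last_row: Dict[str, List[int]] = {}  # DP row after each node's final base
    best = 0

    for node in topo_order:
        # Rows that may precede this node's first base: one per predecessor,
        # or an all-zero row for a source node.
        prev_rows = [last_row[p] for p in preds[node]] or [zero_row]
        for base in nodes[node]:
            row = [0] * (len(read) + 1)
            for j in range(1, len(read) + 1):
                sub = match if read[j - 1] == base else mismatch
                score = 0  # local alignment: scores never drop below zero
                for prev in prev_rows:
                    score = max(score, prev[j - 1] + sub, prev[j] + gap)
                score = max(score, row[j - 1] + gap)
                row[j] = score
                best = max(best, score)
            # Inside a node, each base has exactly one predecessor base.
            prev_rows = [row]
        last_row[node] = row
    return best


# A small graph with one SNP bubble: ACGT -> (A | G) -> TTCA
nodes = {"n1": "ACGT", "ref": "A", "alt": "G", "n2": "TTCA"}
edges = {"n1": ["ref", "alt"], "ref": ["n2"], "alt": ["n2"]}
order = ["n1", "ref", "alt", "n2"]

graph_score = align_read_to_dag("ACGTGTTCA", nodes, edges, order)
linear_score = align_read_to_dag("ACGTGTTCA", {"r": "ACGTATTCA"}, {}, ["r"])
print(graph_score, linear_score)
```

Here the read carries the alternate allele G, so it aligns without mismatches through the `alt` branch, whereas the same read scored against the reference path alone is penalized for a mismatch at the variant site.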
The power of this new read mapper is evident in the detection of mobile element insertions (MEIs) in a human sample. When we construct a DAG from known MEI sites in the YRI population of the 1000 Genomes Project, we are able to detect over 95% of such sites present in a simulated genome. With mappings that take known MEIs into account, we are able to eliminate more than 95% of the SNPs and INDELs that are falsely called at or near MEI sites in traditionally mapped sequence alignments.
If you’re not at #ASHG, please check back here for more details and slides.