[Seminar] Indexing by Kernelization
Department of Computer Science, University of Helsinki
■ 호스트:박근수 교수 (880-1828,x1828)
Advances in DNA sequencing mean databases of thousands of human genomes will soon be commonplace. This talk will introduce a simple technique for reducing the size of conventional indexes on such highly repetitive texts. Given upper bounds on pattern lengths and edit distances, we preprocess the text with LZ77 to obtain a filtered text - or kernel - for which we store a conventional index. Later, given a query, we find all matches in the filtered text, then use their positions and the structure of the LZ77 parse to find all matches in the original text. Our experiments show this also significantly reduces both index size and query times.
Simon J. Puglisi is Academy of Finland Fellow in the Department of Computer Science at the University of Helsinki. Prior to that he held a Newton Fellowship at King's College London, and earlier still an Australian Postdoctoral Fellowship at Royal Melbourne Institute of Technology where he was a member of the Search Engine Group. He obtained his PhD in December 2007 from Curtin University, Western Australia. His current research focuses on efficient algorithms and data structures for searching and manipulating large strings, trees, and graphs, and applications in bioinformatics, information retrieval, and data mining. Frequent themes are the interplay between search and compression, and the boundary of theory and practice. He is the author of over 80 peer-reviewed articles, and served as PC chair of the 2015 Symposium on String Processing and Information Retrieval.