BIGDATA: F: DKA: DKM: Novel Out-of-core and Parallel Algorithms for Processing Biological Big Data

This project is funded by NSF (1447711). Its specific major goals are:
1) to develop out-of-core algorithms for fundamental problems in Biological Big Data (BBD) processing, such as sorting, graph problems, data structures, sequence analysis, and clustering;
2) to develop out-of-core algorithms for macro-problems, including de novo genome and metagenome assembly, variant calling, and split-read mapping; and
3) to implement the out-of-core and parallel algorithms and develop a software library.


Start Date: September 1, 2014


Sanguthevar Rajasekaran (University of Connecticut)
Reda Ammar (University of Connecticut)
Jinbo Bi (University of Connecticut)
Joerg Graf (University of Connecticut)
Sartaj Sahni (University of Florida)
George Weinstock (Jackson Laboratory)
Yufeng Wu (University of Connecticut)

  1. C. Chu, X. Li, and Y. Wu, SpliceJumper: a classification-based approach for calling splicing junctions from RNA-seq data, BMC Bioinformatics and Genomics Special Issues for ICCABS 2014, under review, 2015.
  2. M. Nicolae, S. Pathak and S. Rajasekaran, LFQC: A lossless compression algorithm for FASTQ files, Bioinformatics, 2015, doi: 10.1093/bioinformatics/btv384.
  3. S. Saha and S. Rajasekaran, NRRC: A Non-referential Reads Compression Algorithm, Proc. ISBRA, 2015, pp. 297-308.
  4. S. Saha and S. Rajasekaran, ERGC: An efficient referential genome compression algorithm, Bioinformatics, to appear, 2015.
  5. J. Sun, H.R. Kranzler, and J. Bi, Refining Multivariate Disease Phenotype for High Chip Heritability, invited to submit to BMC Medical Genomics, 2015.
  6. J. Sun, J. Lu, T. Xu, and J. Bi, Multi-view Sparse Co-clustering via Proximal Alternating Linearized Minimization, Journal of Machine Learning Research, special issue on the International Conference on Machine Learning, pp. 757-766, Lille, France, 2015.
  7. T. Xu, J. Sun, and J. Bi, Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction, Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, Sydney, August 2015.