If you know me personally, then you know how often I complain about the gap between theory and practice in bioinformatics. Recently, I came across a research area that seeks to address this gap, though not in bioinformatics specifically. It is called experimental algorithms, or algorithm engineering. Briefly, “Experimental Algorithmics studies algorithms and data structures by joining experimental studies with the traditional theoretical analyses” (Moret, 2000). I had heard about this before in passing, but I finally took the time to look up details about what this is. I wish I had done this years earlier.
There is a lot that has been written about it, so I don’t think I have anything to add for now. But, I decided to make a post that collects all the resources I found and a few questions I had. This is partially because it will help me organize my thoughts, but also might be a good starting point for someone who, like me, wants to learn more about it. Please note that this list is definitely not comprehensive or representative. Please leave a comment to add things that I missed or if you have any other thoughts.
What is algorithm engineering / experimental algorithms and what problems is it intended to address?
- Sanders, Peter. “Algorithm engineering–an attempt at a definition using sorting as an example.” ALENEX 2010.
- Moret, Bernard ME. “Towards a discipline of experimental algorithmics.” Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges 59 (2002): 197-213.
- On experimental algorithmics: an interview with Catherine McGeoch and Bernard Moret
- Chapter 1 from Müller-Hannemann, Matthias, and Stefan Schirra. Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice. Springer, 2001.
What is the methodology for experimentally evaluating an algorithm?
- A whole book about it: Müller-Hannemann, Matthias, and Stefan Schirra. Algorithm Engineering: Bridging the Gap Between Algorithm Theory and Practice. Springer, 2001.
- Chapter 4.8 struck me as particularly interesting. It talked about how to try to extrapolate asymptotic behavior from experiments.
- Another book about it: A Guide to Experimental Algorithmics by Catherine C. McGeoch
- (this book is behind a paywall that even Penn State and scihub are not able to penetrate…but there is a PDF floating out there if you google it)
- A recent article: Angriman, Eugenio, et al. “Guidelines for experimental algorithmics: A case study in network analysis.” Algorithms 12.7 (2019): 127.
Some classes that teach algorithm engineering:
What conferences / journals publish work in experimental algorithms / algorithm engineering?
- SIAM Symposium on Algorithm Engineering and Experiments (ALENEX)
- European Symposium on Algorithms (ESA): Engineering and Application Track
- Symposium on Experimental Algorithms (SEA)
- ACM Journal of Experimental Algorithmics (JEA)
What about bioinformatics?
- Here is an example of an older paper that presents itself in the mold of algorithm engineering:
- Moret, Bernard ME, David A. Bader, and Tandy Warnow. “High-performance algorithm engineering for computational phylogenetics.” The Journal of Supercomputing 22.1 (2002): 99-111.
- Here are two recent k-mer data structure bioinformatics papers
- Limasset, Antoine, et al. “Fast and scalable minimal perfect hashing for massive key sets.” 16th International Symposium on Experimental Algorithms. Vol. 11. 2017.
- Zentgraf, Jens, Timm, Henning , and Rahmann, Sven. “Cost-optimal assignment of elements in genome-scale multi-way bucketed Cuckoo hash tables.” ALENEX 2020.
Questions that I am left with:
- What does a paper in algorithm engineering / experimental algorithms look like? How is it different from other papers?
- I found this question difficult to answer. I looked through the recent ALENEX proceedings to get an idea. Compared to SODA papers, the difference was clear — there was always an experimental analysis, often extensive. But beyond that? It seemed that many bioinformatics papers published in a journal like Oxford Bioinformatics or in a conference like WABI and RECOMB could have been ALENEX papers. At least, the papers without much biology. I’m thinking for example of my papers on the compaction of de Bruijn graphs or on other papers for data structures for k-mers. Is it the case that for a paper to belong to the algorithm engineering community it should study an algorithm for a problem of BROAD interest, i.e. one that is relevant to more than just one application domain like bioinformatics? I don’t know.
- What is the relationship of algorithm engineering / experimental algorithms to Data Science?
- It seems like there is an intersection, e.g. the McGeoch book has a chapter on “Data Analysis” that would today be part of a Data Science course. Both fields differentiate themselves from Statistics and Computer Science by focusing on empirical performance, rather than theoretical properties of the algorithm or the data.
- What is the relationship of algorithm engineering / experimental algorithms to Computational Science?
- What are the big open questions in the field of algorithm engineering / experimental algorithms?
- Bioinformaticians often get asked this question, and it’s not an easy one to answer for us. I’m not sure if it is even a fair question — should a field necessarily have major open questions? It’s not a natural science, and maybe a field that is engineering rather than science does not need to have such questions. Either way, are there such questions in algorithm engineering / experimental algorithms?