Computing Triplet and Quartet Distances Between Trees

Talk by Gerth Stølting Brodal, Lector at Aarhus University

Abstract

The triplet and quartet distances are distance measures to compare two rooted and two unrooted trees, respectively. The leaves of the two trees should have the same set of n labels. The distances are defined by enumerating all subsets of three labels (triplets) and four labels (quartets), respectively, and counting how often the induced topologies in the two input trees are different.

We will present efficient algorithms for computing these distances, and how to compute the triplet distance in time O(n log n) and the quartet distance in time O(d n log n), where d is the maximal degree of any node in the two trees. Within the same time bounds, our framework also allows us to compute the parameterized triplet and quartet distances, where a parameter is introduced to weight resolved (binary) topologies against unresolved (non-binary) topologies.

The previous best algorithm for computing the triplet and parameterized triplet distances have O(n^2 ) running time, while the previous best algorithms for computing the quartet distance include an O(d^9 n log n) time algorithm and an O(n^2.688 ) time algorithm, where the latter can also compute the parameterized quartet distance. Since d< n, our algorithms improve on all these algorithms.

We will also present an experimental evaluation of the algorithms.  The typical overhead is about 2 KB and 10 KB per node in the input trees for triplet and quartet computations, respectively. This allows us to compute the distance measures for trees with up to millions of nodes. The limiting factor is the amount of memory available. With 31 GB of memory all our input instances can be solved within a few minutes.

Scientific host: Pawel Winter