1. Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time
    (with Martin Hirzel and Scott Schneider)
    This paper presents DABA, a novel algorithm for aggregating FIFO sliding windows that requires only O(1) aggregation steps per operation in the worst case (not just on average). By supporting any monoidal operator, DABA asymptotically improves the performance of sliding-window aggregation without restricting the operator to be invertible. Our experimental results demonstrate that these theoretical improvements hold in practice. DABA is a substantial improvement over the state of the art in terms of both latency and throughput.
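    DABA itself is intricate, but the monoid-only setting it improves on can be illustrated with the classic two-stacks trick, which achieves amortized (not worst-case) O(1) aggregation steps per operation for any associative operator. A minimal sketch, with class and parameter names of my own choosing, not from the paper:

    ```python
    class TwoStacks:
        """Sliding-window aggregation for any monoid (op, identity).
        Amortized O(1) per push/pop/query; DABA improves this to worst-case O(1)."""

        def __init__(self, op, identity):
            self.op = op
            self.identity = identity
            self.front = []  # (value, aggregate of this element and all below), oldest on top
            self.back = []   # (value, aggregate of all below and this element), newest on top

        def push(self, v):
            agg = self.op(self.back[-1][1], v) if self.back else v
            self.back.append((v, agg))

        def pop(self):
            # Caller must ensure the window is nonempty.
            if not self.front:
                # Flip the back stack so the oldest element surfaces on front.
                while self.back:
                    v, _ = self.back.pop()
                    agg = self.op(v, self.front[-1][1]) if self.front else v
                    self.front.append((v, agg))
            self.front.pop()

        def query(self):
            f = self.front[-1][1] if self.front else self.identity
            b = self.back[-1][1] if self.back else self.identity
            return self.op(f, b)  # front elements are older, so order is preserved
    ```

    Each element is pushed and popped at most twice across the flip, which is where the amortized bound comes from; the flip is also exactly the occasional O(n) spike that DABA's worst-case guarantee eliminates.
    
    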
  2. Tutorial: Sliding-Window Aggregation Algorithms
    (with Martin Hirzel and Scott Schneider)
    We summarize advances in streaming aggregation on a sliding window.
  3. Streaming k-Means Clustering with Fast Queries
    (with Yu Zhang and Srikanta Tirthapura)
    We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current clustering query. We present both theoretical analysis and detailed experiments demonstrating their correctness and efficiency.
  4. Parallel Shortest Paths Using Radius Stepping
    (with Guy E. Blelloch, Yan Gu, and Yihan Sun)
    We present an improved parallel algorithm for shortest paths on weighted and unweighted undirected graphs. It is based on Delta-stepping but picks a different "Delta" to step by in each round, providing provable guarantees on work and depth. We also show experimentally that on many real-world graphs it performs even better than the theoretical bounds suggest.
  5. Work-Efficient Parallel Union-Find with Applications to Incremental Graph Connectivity
    (with Natcha Simsiri, Srikanta Tirthapura, and Kun-Lung Wu)
    We present data structures and algorithms for maintaining a union-find data structure when union and find operations arrive and are processed in bulk, in parallel. We show that this requires no more work than processing each operation individually, representing an opportunity to scale well with the number of cores.
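    For context, here is the standard sequential union-find baseline (union by rank with path compression) against which work efficiency is measured; this sketch is mine and does not show the paper's parallel bulk processing itself:

    ```python
    class UnionFind:
        """Sequential union-find with union by rank and path compression."""

        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path halving: point each visited node at its grandparent.
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]
                x = self.parent[x]
            return x

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return False  # already connected
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx  # attach shorter tree under taller
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1
            return True
    ```

    Incremental graph connectivity is a direct client: feed edge arrivals to `union` and answer connectivity queries with `find`.
    
    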
  6. Multicore Triangle Computations Without Tuning
    (with Julian Shun)
    We present multicore parallel algorithms for triangle counting, enumeration, and related variants that achieve cache friendliness through cache-oblivious design. We show experimentally that the algorithms are very fast and scale well to tens of cores.
  7. General Incremental Sliding-Window Aggregation
    PVLDB 8(7), 2015
    (with Martin Hirzel, Scott Schneider, and Kun-Lung Wu)
  8. Functional programming for dynamic and large data with self-adjusting computation
    (with Umut A. Acar and Yan Chen)
    We describe two techniques to reduce the memory requirements of self-adjusting programs. First, we show how to track dependencies in self-adjusting computation more precisely. Second, we provide a knob to control the granularity at which cells in modifiable lists are tracked. Together, these allow the user to tune space consumption, trading space for time.
  9. Parallel Streaming Frequency-based Aggregates
    (with Srikanta Tirthapura and Kun-Lung Wu)
    We describe a bulk-parallel model for streaming computation and present several algorithms related to frequency that are work-efficient compared to their sequential counterparts.
  10. Nearly-Linear Work Parallel SDD Solvers, Low-Diameter Decomposition, and Low-Stretch Subgraphs
    Theory of Computing Systems (TOCS), 2014
    (with Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L. Miller, and Richard Peng)
    We present the design and analysis of a nearly-linear work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. This builds on algorithms for finding low-diameter decomposition and finding low-stretch spanning trees/subgraphs.
  11. Parallel triangle counting in massive streaming graphs
    (with A. Pavan and Srikanta Tirthapura)
    We show how to parallelize the approximate triangle counting algorithm from our VLDB'13 paper in a cache-oblivious manner.
  12. Counting and Sampling Triangles from a Graph Stream
    PVLDB 6(14), 2013
    (with A. Pavan, Srikanta Tirthapura, and Kun-Lung Wu)
    We present a technique for sampling a triangle as well as estimating the triangle count in a streaming graph. We further describe an implementation that yields nearly constant amortized time per edge arrival. We also show how the sampling and count estimation techniques extend to higher-order cliques.
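    The paper's neighborhood-sampling estimator is streaming-specific; to illustrate the general shape of sampling-based triangle estimation, here is a simpler classical edge-plus-vertex estimator, shown offline rather than on a stream (function and variable names are mine):

    ```python
    def triangle_estimate(edges, vertices, sampled_edge, sampled_vertex):
        """Single-sample unbiased estimator of the triangle count:
        returns m*n/3 if the sampled edge plus the sampled vertex close a
        triangle, else 0. Averaged over uniform samples, the expectation is
        exactly the number of triangles (each triangle succeeds for 3 of the
        m*n (edge, vertex) pairs)."""
        u, v = sampled_edge
        w = sampled_vertex
        edge_set = {frozenset(e) for e in edges}
        closes = (w not in (u, v)
                  and frozenset((u, w)) in edge_set
                  and frozenset((v, w)) in edge_set)
        return len(edges) * len(vertices) / 3 if closes else 0.0
    ```

    Variance, not bias, is the hard part; neighborhood sampling and its parallelization are about driving the variance (and per-edge cost) down.
    
    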
  13. Parallel Probabilistic Tree Embeddings, k-Median, and Buy-at-Bulk Network Design
    (with Guy E. Blelloch and Anupam Gupta)
    We describe a parallel algorithm for computing an embedding of an n-point metric space into a distribution over trees with O(log n) expected stretch. Using this, we give parallel algorithms for k-median and buy-at-bulk network design.
  14. Faster and Simpler Width-Independent Parallel Algorithms for Positive Semidefinite Programming
    (with Richard Peng)
    We give a simple parallel algorithm for approximately solving positive SDPs, improving upon the result of Jain and Yao (FOCS'11).
  15. Parallel and I/O Efficient Set Covering Algorithms
    (with Guy E. Blelloch and Harsha Simhadri)
    We design, analyze, and implement a parallel and I/O efficient algorithm for set cover, mimicking the greedy set cover algorithm. The idea also applies to max cover and min-sum set cover.
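    The sequential greedy algorithm being mimicked repeatedly picks the set covering the most still-uncovered elements; a minimal sketch of that baseline (names are illustrative, and this shows neither the parallelism nor the I/O efficiency):

    ```python
    def greedy_set_cover(universe, sets):
        """Greedy set cover: sets is a dict name -> set of elements.
        Returns the names chosen, in selection order. Achieves the classic
        H_n ~ ln(n) approximation ratio."""
        uncovered = set(universe)
        cover = []
        while uncovered:
            # Pick the set with the largest marginal coverage.
            best = max(sets, key=lambda name: len(sets[name] & uncovered))
            if not sets[best] & uncovered:
                break  # remaining elements are uncoverable
            cover.append(best)
            uncovered -= sets[best]
        return cover
    ```

    The challenge the paper addresses is that this loop is inherently sequential: each pick depends on all earlier picks, so the parallel version must find many "nearly best" sets per round without hurting the approximation guarantee.
    
    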
  16. Problem Based Benchmark Suite
    (with Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, and Harsha Simhadri)
    We present a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and a set of input distributions.
  17. Efficient Parallel Approximation Algorithms
    PhD Thesis
  18. Near Linear-Work Parallel SDD Solvers, Low-Diameter Decomposition, and Low-Stretch Subgraphs
    (with Guy E. Blelloch, Anupam Gupta, Ioannis Koutis, Gary L. Miller, and Richard Peng)
    This paper presents a parallel algorithm for solving a class of linear systems known as symmetric diagonally dominant (SDD). On input an n-by-n matrix A with m nonzero entries, our algorithm can compute a solution vector in near linear-work and approximately m^{1/3} depth. In the process, we develop algorithms for low-diameter decomposition and low-stretch spanning trees and subgraphs. The solver, together with known reductions, gives rise to improved parallel algorithms for many graph problems (some of which were believed to be difficult to parallelize).
  19. Linear-Work Greedy Parallel Approximate Set Cover and Variants
    (with Guy E. Blelloch and Richard Peng)
    We present parallel greedy approximation algorithms for set cover and related problems. These algorithms build on an algorithm for solving a graph problem we formulate and study called Maximal Nearly Independent Set (MaNIS)---a graph abstraction of a key component in existing work on parallel set cover. We then derive a randomized algorithm for MaNIS that does linear work (in m, the number of edges) with O(\log^2 m) depth. Using this, we obtain RNC algorithms for set cover, max cover, min-sum set cover, asymmetric k-center, and metric facility location.
  20. Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid
    (with Guy E. Blelloch, Ioannis Koutis, and Gary L. Miller)
    We present a sparse-matrix representation, called Hierarchical Diagonal Blocking, which captures many of the existing optimization techniques for SpMV in a common representation. Using this together with precision-reduction techniques, we are able to reduce the bandwidth requirements and gain substantial speedups on multicore machines. As an application, we apply these techniques to a combinatorial multigrid linear solver.
  21. Parallel Approximation Algorithms for Facility-Location Problems
    (with Guy E. Blelloch)
    We present the design and analysis of parallel approximation algorithms for facility-location problems, including (metric) facility location, k-median, k-means, and k-center. The algorithms are in NC or RNC, are nearly work efficient compared to their sequential counterparts, and have low cache complexity. Our emphasis is on understanding how to parallelize the different techniques used in approximation algorithms.
  22. Traceable Data Types for Self-Adjusting Computation
    (with Umut A. Acar, Guy E. Blelloch, Ruy Ley-Wild, and Duru Turkoglu)
    We show how tracking dependencies at the granularity of data structuring operations can be done in self-adjusting computation, which previously tracked operations at the memory cell level. This work describes a formal semantics, data structure implementations, and extensions to the current library, and experimentally evaluates it on a broad range of benchmarks.
  23. Efficient Similarity Estimation for Systems Exploiting Data Redundancy
    (with David G. Andersen, Michael Kaminsky, and Himabindu Pucha)
    We present a technique for estimating how similar two or more files are, using a compact representation ("handprinting") of each file. This work gives a theoretical analysis of the technique and, using that as a guideline, puts forth a rule-of-thumb recommendation. It also empirically evaluates the approach on large datasets.
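    Handprinting is the paper's own construction; as a rough illustration of the general idea of estimating similarity from compact per-file summaries, here is a MinHash-style resemblance estimator, which is a different, classical technique, not the paper's:

    ```python
    import hashlib

    def shingles(data, k=4):
        """Break a string into its set of overlapping k-character substrings."""
        return {data[i:i + k] for i in range(len(data) - k + 1)}

    def minhash_signature(items, num_hashes=64):
        """Compact summary: the minimum hash of the set under each of
        num_hashes independent hash functions (seeded SHA-1 here)."""
        return [min(hashlib.sha1(f"{seed}:{x}".encode()).digest() for x in items)
                for seed in range(num_hashes)]

    def estimated_resemblance(sig_a, sig_b):
        """Fraction of matching signature slots estimates Jaccard similarity."""
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
    ```

    The key property, which handprinting shares in spirit, is that similarity is estimated from the fixed-size summaries alone, without ever comparing the files byte by byte.
    
    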
  24. An Experimental Analysis of Self-Adjusting Computation
    ACM Transactions on Programming Languages and Systems (TOPLAS)
    (with Umut A. Acar, Guy E. Blelloch, Matthias Blume, and Robert Harper)
    This work describes the design and implementation of a framework for self-adjusting computation, combining and expanding on our PLDI'06 and ML'05 papers. The framework implemented as a Standard ML library is empirically evaluated on a reasonably large set of benchmarks.
  25. Simpler Analyses of Local Search Algorithms for Facility Location
    (with Anupam Gupta)
    We study local search algorithms for metric instances of facility location problems. We give a proof of the $k$-median result which avoids Arya et al.'s coupling argument. We also show that for the problem of opening $k$ facilities $F$ to minimize the objective function $\Phi_p(F) = \big(\sum_{j \in V} d(j, F)^p\big)^{1/p}$, the natural swap-based local-search algorithm is a $\Theta(p)$-approximation.
  26. Robust Kinetic Convex Hulls in 3D
    (with Umut A. Acar, Guy E. Blelloch, and Duru Turkoglu)
    We apply advances in self-adjusting computation to provide a robust motion simulation technique that combines kinetic event-based scheduling and the classic idea of fixed-time sampling. The idea is to divide time into a lattice of fixed-size intervals and process events at the resolution of an interval. We apply the approach to the problem of kinetic maintenance of convex hulls in 3D and experimentally evaluate its effectiveness.
  27. All-Norms and All-L_p-Norms Approximation Algorithms
    FSTTCS 2008
    (with Daniel Golovin, Anupam Gupta, and Amit Kumar)
    We examine approximation algorithms that simultaneously perform well on all norms, or on all L_p norms. We show that the greedy algorithm for set cover simultaneously O(p) approximates the L_p norm of the cover time vector for all values of p, and that no efficient algorithm can do better than O(p) for any fixed value of p. We also present algorithms and structural results for other problems such as k-facility-location, TSP, and average flow-time minimization.
  28. Kinetic 3D convex hulls via self-adjusting computation
    (with Umut A. Acar and Guy E. Blelloch)
    We illustrate a solution to kinetic 3D convex hulls using self-adjusting computation. First introduced by Basch, Guibas, and Hershberger, the kinetic approach to motion simulations requires maintaining a data structure along with a set of certificates: each certificate is a comparison and its failure time (the time at which the outcome of the comparison changes). To simulate motion, an event scheduler updates the certificates in chronological order of their failure times and invokes an update procedure that keeps the data structure consistent with the certificates.
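    The certificate-and-event-scheduler loop described above can be shown on a toy problem: maintaining the maximum of points moving linearly in 1D. This sketch is mine, rescans all points at each certificate failure instead of using a real kinetic data structure, and only illustrates the event-driven control flow:

    ```python
    import math

    class Point:
        """A 1D point with linear motion x(t) = a + b*t."""
        def __init__(self, a, b):
            self.a, self.b = a, b

        def at(self, t):
            return self.a + self.b * t

    def crossing_time(p, q, now):
        """Failure time of the certificate "p is above q": the earliest
        t > now at which the two linear trajectories cross."""
        if p.b == q.b:
            return math.inf
        t = (q.a - p.a) / (p.b - q.b)
        return t if t > now else math.inf

    def kinetic_max(points, t_end):
        """Track which point is the maximum over [0, t_end].
        Returns a list of (time, leader) pairs, one per leadership change."""
        now = 0.0
        leader = max(points, key=lambda p: p.at(now))
        history = [(now, leader)]
        while True:
            # One certificate per non-leader; the earliest failure is the next event.
            t_fail = min((crossing_time(leader, q, now)
                          for q in points if q is not leader),
                         default=math.inf)
            if t_fail >= t_end:
                return history
            now = t_fail + 1e-9  # step just past the event, then repair
            leader = max(points, key=lambda p: p.at(now))
            history.append((now, leader))
    ```

    A real kinetic data structure would repair only the certificates invalidated by the event; the self-adjusting-computation approach gets that locality automatically via change propagation.
    
    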
  29. Kinetic Algorithms via Self-adjusting Computation
    (with Umut A. Acar, Guy E. Blelloch, and Jorge L. Vittes)
    Define a static algorithm as an algorithm that computes some combinatorial property of its input consisting of static, i.e., non-moving, objects. In this paper, we describe a technique for syntactically transforming static algorithms into kinetic algorithms, which compute properties of moving objects. The technique offers capabilities for composing kinetic algorithms, for integrating dynamic and kinetic changes, and for ensuring robustness even with fixed-precision floating-point arithmetic. To evaluate the effectiveness of the approach, we implement a library for performing the transformation, transform a number of algorithms, and give an experimental evaluation. The results show that the technique performs well in practice.
  30. An experimental analysis of self-adjusting computation
    (with Umut A. Acar, Guy E. Blelloch, and Matthias Blume)
    Dependence graphs and memoization can be used to efficiently update the output of a program as the input changes dynamically. Recent work has studied techniques for combining these approaches to effectively dynamize a wide range of applications. Toward this end various theoretical results were given. In this paper we describe the implementation of a library based on these ideas, and present experimental results on the efficiency of this library on a variety of applications. The results of the experiments indicate that the approach is effective in practice, often requiring orders of magnitude less time than recomputing the output from scratch. We believe this is the first experimental evidence that incremental computation of any type is effective in practice for a reasonably broad set of applications.
  31. A Library for Self-Adjusting Computation
    (with Umut A. Acar, Guy E. Blelloch, Matthias Blume, and Robert Harper)
    We present a Standard ML library for writing programs that automatically adjust to changes to their data. The library combines modifiable references and memoization to achieve efficient updates. We describe an implementation of the library and apply it to the problem of maintaining the convex hull of a dynamically changing set of points. Our experiments show that the overhead of the library is small, and that self-adjusting programs can adjust to small changes three orders of magnitude faster than recomputing from scratch. The implementation relies on invariants that could be enforced by a modal type system. We show, using an existing language, abstract interfaces for modifiable references and for memoization that ensure the same safety properties without the use of modal types. The interface for memoization, however, does not scale well, suggesting that a language-based approach may be preferable after all.
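    The modifiable-reference idea can be caricatured in a few lines: reads register dependencies, and writes rerun the computations that depend on the changed value. This toy is in Python rather than Standard ML, has no memoization or dependence graph, and only gestures at the mechanism; all names are mine:

    ```python
    class Mod:
        """A toy modifiable reference: reading via a callback registers a
        dependency, and writing re-runs every registered reader."""

        def __init__(self, value):
            self.value = value
            self.readers = []

        def read(self, fn):
            self.readers.append(fn)   # remember the dependency
            fn(self.value)            # run the reader once now

        def write(self, value):
            if value != self.value:   # propagate only actual changes
                self.value = value
                for fn in self.readers:
                    fn(value)

    # Example: an output cell that adjusts whenever its input changes.
    inp = Mod(3)
    out = Mod(None)
    inp.read(lambda v: out.write(v * v))
    ```

    The real library's contribution is making this propagation both efficient (via memoization and selective re-execution) and safe (via the invariants a modal type system would enforce).
    
    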