Fast hashing with strong concentration bounds

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Standard

Fast hashing with strong concentration bounds. / Aamand, Anders; Knudsen, Jakob Bæk Tejs; Knudsen, Mathias Bæk Tejs; Rasmussen, Peter Michael Reichstein; Thorup, Mikkel.

STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. red. / Konstantin Makarychev; Yury Makarychev; Madhur Tulsiani; Gautam Kamath; Julia Chuzhoy. Association for Computing Machinery, 2020. s. 1265-1278 (Proceedings of the Annual ACM Symposium on Theory of Computing).

Publikation: Bidrag til bog/antologi/rapportKonferencebidrag i proceedingsForskningfagfællebedømt

Harvard

Aamand, A, Knudsen, JBT, Knudsen, MBT, Rasmussen, PMR & Thorup, M 2020, Fast hashing with strong concentration bounds. i K Makarychev, Y Makarychev, M Tulsiani, G Kamath & J Chuzhoy (red), STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery, Proceedings of the Annual ACM Symposium on Theory of Computing, s. 1265-1278, 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, USA, 22/06/2020. https://doi.org/10.1145/3357713.3384259

APA

Aamand, A., Knudsen, J. B. T., Knudsen, M. B. T., Rasmussen, P. M. R., & Thorup, M. (2020). Fast hashing with strong concentration bounds. I K. Makarychev, Y. Makarychev, M. Tulsiani, G. Kamath, & J. Chuzhoy (red.), STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (s. 1265-1278). Association for Computing Machinery. Proceedings of the Annual ACM Symposium on Theory of Computing https://doi.org/10.1145/3357713.3384259

Vancouver

Aamand A, Knudsen JBT, Knudsen MBT, Rasmussen PMR, Thorup M. Fast hashing with strong concentration bounds. I Makarychev K, Makarychev Y, Tulsiani M, Kamath G, Chuzhoy J, red., STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. Association for Computing Machinery. 2020. s. 1265-1278. (Proceedings of the Annual ACM Symposium on Theory of Computing). https://doi.org/10.1145/3357713.3384259

Author

Aamand, Anders ; Knudsen, Jakob Bæk Tejs ; Knudsen, Mathias Bæk Tejs ; Rasmussen, Peter Michael Reichstein ; Thorup, Mikkel. / Fast hashing with strong concentration bounds. STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. red. / Konstantin Makarychev ; Yury Makarychev ; Madhur Tulsiani ; Gautam Kamath ; Julia Chuzhoy. Association for Computing Machinery, 2020. s. 1265-1278 (Proceedings of the Annual ACM Symposium on Theory of Computing).

Bibtex

@inproceedings{6fb53196f1d04037bad029d099578c83,
title = "Fast hashing with strong concentration bounds",
abstract = "Previous work on tabulation hashing by P{\c C}tra{\c s}cu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of c=O(1) characters, e.g., a 64-bit key as c=8 characters of 8-bits. The character domain ς should be small enough that character tables of size |ς| fit in fast cache. The schemes then use O(1) tables of this size, so the space of tabulation hashing is O(|ς|). However, the concentration bounds by P{\c C}tra{\c s}cu and Thorup only apply if the expected sums are g‰ |ς|. To see the problem, consider the very simple case where we use tabulation hashing to throw n balls into m bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if n=m, for then the expected value is 1. However, if m=2, as when tossing n unbiased coins, the expected value n/2 is ≫ |ς| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.",
keywords = "Chernoff bounds, Concentration bounds, Hashing, Sampling, Streaming algorithms",
author = "Anders Aamand and Knudsen, {Jakob B{\ae}k Tejs} and Knudsen, {Mathias B{\ae}k Tejs} and Rasmussen, {Peter Michael Reichstein} and Mikkel Thorup",
year = "2020",
doi = "10.1145/3357713.3384259",
language = "English",
series = "Proceedings of the Annual ACM Symposium on Theory of Computing",
publisher = "Association for Computing Machinery",
pages = "1265--1278",
editor = "Konstantin Makarychev and Yury Makarychev and Madhur Tulsiani and Gautam Kamath and Julia Chuzhoy",
booktitle = "STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing",
note = "52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020 ; Conference date: 22-06-2020 Through 26-06-2020",

}

RIS

TY - GEN

T1 - Fast hashing with strong concentration bounds

AU - Aamand, Anders

AU - Knudsen, Jakob Bæk Tejs

AU - Knudsen, Mathias Bæk Tejs

AU - Rasmussen, Peter Michael Reichstein

AU - Thorup, Mikkel

PY - 2020

Y1 - 2020

N2 - Previous work on tabulation hashing by PÇtraşcu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of c=O(1) characters, e.g., a 64-bit key as c=8 characters of 8-bits. The character domain ς should be small enough that character tables of size |ς| fit in fast cache. The schemes then use O(1) tables of this size, so the space of tabulation hashing is O(|ς|). However, the concentration bounds by PÇtraşcu and Thorup only apply if the expected sums are g‰ |ς|. To see the problem, consider the very simple case where we use tabulation hashing to throw n balls into m bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if n=m, for then the expected value is 1. However, if m=2, as when tossing n unbiased coins, the expected value n/2 is ≫ |ς| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.

AB - Previous work on tabulation hashing by PÇtraşcu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of c=O(1) characters, e.g., a 64-bit key as c=8 characters of 8-bits. The character domain ς should be small enough that character tables of size |ς| fit in fast cache. The schemes then use O(1) tables of this size, so the space of tabulation hashing is O(|ς|). However, the concentration bounds by PÇtraşcu and Thorup only apply if the expected sums are g‰ |ς|. To see the problem, consider the very simple case where we use tabulation hashing to throw n balls into m bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if n=m, for then the expected value is 1. However, if m=2, as when tossing n unbiased coins, the expected value n/2 is ≫ |ς| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.

KW - Chernoff bounds

KW - Concentration bounds

KW - Hashing

KW - Sampling

KW - Streaming algorithms

UR - http://www.scopus.com/inward/record.url?scp=85086766216&partnerID=8YFLogxK

U2 - 10.1145/3357713.3384259

DO - 10.1145/3357713.3384259

M3 - Article in proceedings

AN - SCOPUS:85086766216

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 1265

EP - 1278

BT - STOC 2020 - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing

A2 - Makarychev, Konstantin

A2 - Makarychev, Yury

A2 - Tulsiani, Madhur

A2 - Kamath, Gautam

A2 - Chuzhoy, Julia

PB - Association for Computing Machinery

T2 - 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020

Y2 - 22 June 2020 through 26 June 2020

ER -

ID: 258498058