Stack Overflow: What are the lesser known but useful data structures?
[+796] [83] f3lix
[2009-02-01 11:12:25]
[ language-agnostic data-structures computer-science ]
[ https://stackoverflow.com/questions/500607/what-are-the-lesser-known-but-useful-data-structures ]

There are some data structures around that are really useful but are unknown to most programmers. Which ones are they?

Everybody knows about linked lists, binary trees, and hashes, but what about Skip lists [1] and Bloom filters [2], for example? I would like to know more data structures that are not so common, but are worth knowing because they rely on great ideas and enrich a programmer's toolbox.

PS: I am also interested in techniques like Dancing links [3] which make clever use of properties of a common data structure.

EDIT: Please try to include links to pages describing the data structures in more detail. Also, try to add a couple of words on why a data structure is cool (as Jonas Kölker [4] already pointed out). Also, try to provide one data-structure per answer. This will allow the better data structures to float to the top based on their votes alone.

[+271] [2009-02-01 11:24:12] David Phillips

Tries [1], also known as prefix trees (crit-bit trees [2] are a compact binary variant), have existed for over 40 years but are still relatively unknown. A very cool use of tries is described in "TRASH - A dynamic LC-trie and hash data structure" [3], which combines a trie with a hash function.
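
To make the idea concrete, here is a minimal sketch of a trie in Python. The class name `Trie` and the methods `insert`, `contains` and `with_prefix` are made up for this illustration; it stores one dict of children per node and does none of the compression that practical implementations use.

```python
class Trie:
    """Minimal prefix tree: one dict of children per node (illustrative only)."""

    def __init__(self):
        self.children = {}      # maps a character to a child Trie
        self.is_word = False    # True if a stored key ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix from the root; return the node reached, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix):
        """Return all stored words that start with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)
```

Lookup cost depends on the length of the key rather than on the number of stored keys, which is what makes prefix queries such as autocomplete cheap.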

[1] http://en.wikipedia.org/wiki/Trie
[2] http://cr.yp.to/critbit.html
[3] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.96.2143

Tries are a good one for sure; I only remember what they are because it was on my Data Structures and Algorithms exam. - Mark Davidson
(12) very commonly used by spell-checkers - Steven A. Lowe
Burst tries are also an interesting variant, where you use only a prefix of the strings as nodes and otherwise store lists of strings in the nodes. - Torsten Marek
The regex engine in Perl 5.10 automatically creates tries. - Brad Gilbert
In my experience tries are painfully expensive, given that a pointer is generally longer than a char, which is a shame. They're only suitable for certain data-sets. - Joe
We make use of tries in the project I work on. We use them for partitioning 2D space and then quickly determining which partition contains a given point. - Scottie T
@Joe: Those are problems with naive trie implementations. In practice, you usually have a much higher branching factor than a single char and you usually compress lines in the tree in order to store common sequences of chars (like syllables) in a single node. - Jon Harrop
Tries are used in the T9 cell-phone auto-complete, yes? - Paul Nathan
(18) Since no SO question, regardless of topic, is complete without someone mentioning jQuery.... John Resig, creator of jQuery, has an interesting data structure series of posts where he looks at various trie implementations among others: ejohn.org/blog/revised-javascript-dictionary-search - Oskar Austegard
(4) "very commonly used by spell-checkers" - it's at least funny to see my spell checker claim not to know tries. - dascandy
Are tries really relatively unknown? - Gravity
1
[+231] [2009-02-01 19:11:07] lacop

Bloom filter [1]: Bit array of m bits, initially all set to 0.

To add an item you run it through k hash functions that will give you k indices in the array which you then set to 1.

To check if an item is in the set, compute the k indices and check if they are all set to 1.

Of course, this gives some probability of false positives (according to Wikipedia, about 0.6185^(m/n) when k is chosen optimally, where n is the number of inserted items). False negatives are not possible.

Removing an item is impossible in a plain Bloom filter, but you can implement a counting Bloom filter, which replaces the bit array with an array of ints that are incremented on insert and decremented on removal.
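
The add and check steps described above can be sketched in a few lines of Python. This is only an illustration: the names `BloomFilter` and `might_contain` are invented here, and deriving the k hash functions by salting SHA-256 with the function index is an assumption of this sketch, not something the answer prescribes.

```python
import hashlib


class BloomFilter:
    """Sketch of a Bloom filter with m bits and k salted hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity rather than compactness

    def _indices(self, item):
        # Assumption: k hash functions are simulated by prefixing the item
        # with the function index before hashing with SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[idx] for idx in self._indices(item))
```

A cryptographic hash is slower than necessary here; it was picked only because it ships with the standard library and gives stable results across runs.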

[1] http://en.wikipedia.org/wiki/Bloom_filter

(20) You forget to mention their use with dictionaries :) You can squeeze a full dictionary into a bloom filter with about 512k, like a hashtable without the values - Chris S
(8) Google cites the use of Bloom filters in their implementation of BigTable. - Brian Gianforcaro
(4) So this is useful because it allows us to cheaply test for the existence of an element in a set? (I'm new to bloom filters.) - Petrus Theron
(16) @FreshCode It actually lets you cheaply test for the absence of an element in the set since you can get false positives but never false negatives - Tom Savage
(26) @FreshCode As @Tom Savage said, it's more useful when checking for negatives. For example, you can use it as a fast and small (in terms of memory usage) spell checker. Add all of the words to it and then try to look up words the user enters. If you get a negative it means it's misspelled. Then you can run some more expensive check to find closest matches and offer corrections. - lacop
(1) Google Chrome implements the Safe Browsing filter using a bloom filter. Isn't this case more appropriate for checking for positives? - Abhinav Sarkar
(5) @abhin4v: Bloom filters are often used when most requests are likely to return an answer of "no" (such as the case here), meaning that the small number of "yes" answers can be checked with a slower exact test. This still results in a big reduction in the average query response time. Don't know if Chrome's Safe Browsing does that, but that would be my guess. - j_random_hacker
Set union/intersection are pretty straightforward as well. (bitwise OR and AND respectively) - fulmicoton
I learnt about Bloom filters through this Prag Prog Code Kata. - Skilldrick
@ Skilldrick: thanks for sharing link - Bhanu Krishnan
good link explaining bloom filters igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-rub‌​y - Pete Brumm
(1) Bloom filters are so cool that I sometimes want to explain them to non-programmers. You're a graffiti artist in a city where everyone uses black spraypaint on white walls. You always paint your tag the same way. So does every other tagger. Everyone paints over existing tags on a wall. You need to decide whether you've already tagged this wall. You can rule out a wall that has white in areas your tag would have covered. But you can't guarantee you've tagged a wall even if every part of your tag is black. If you understand why this must be true, you understand Bloom filters. - sowbug
2
[+140] [2009-02-18 20:01:37] Patrick

Rope [1]: It's a string that allows for cheap prepends, substrings, middle insertions and appends. I've really only had use for it once, but no other structure would have sufficed. Prepends on regular strings and arrays were just far too expensive for what we needed to do, and reversing everything was out of the question.

[1] http://en.wikipedia.org/wiki/Rope_%28data_structure%29

I've had thoughts of something like this for my own uses. Nice to know it's already been implemented somewhere else. - Kibbee
(15) There's an implementation in the SGI STL (1998): sgi.com/tech/stl/Rope.html - quark
(2) Without knowing what it was called I recently wrote something very similar to this for Java - performance has been excellent: code.google.com/p/mikeralib/source/browse/trunk/Mikera/src/… - mikera
Rope is pretty rare: stackoverflow.com/questions/1863440/… - Will
There was a nice article about ropes on Good Math, Bad Math: scienceblogs.com/goodmath/2009/01/… - Kuroki Kaze
(6) Mikera's link is stale, here's the current. - aptwebapps
3
[+128] [2009-02-18 19:53:04] mmcdole

Skip lists [1] are pretty neat.

Wikipedia [2]
A skip list is a probabilistic data structure, based on multiple parallel, sorted linked lists, with efficiency comparable to a binary search tree (order log n average time for most operations).

They can be used as an alternative to balanced trees (using probabilistic balancing rather than strict enforcement of balancing). They are easy to implement and faster than, say, a red-black tree. I think they should be in every good programmer's toolchest.
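
For a feel of how little code is involved, here is a rough Python sketch of a skip list with insertion, membership test and in-order iteration. The names (`SkipList`, `MAX_LEVEL`, the `seed` parameter) and the coin-flip probability of 0.5 are choices made for this illustration; duplicates are simply stored twice and deletion is left out.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumption: plenty for small examples

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1                            # levels currently in use

    def _random_level(self):
        # Flip a fair coin: each extra level is half as likely as the previous one.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        update = [self._head] * self.MAX_LEVEL   # last node before key on each level
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        node = self._head
        for i in reversed(range(self._level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

The search starts on the sparsest level and drops down one level each time the next key would overshoot, which is where the expected O(log n) behaviour comes from.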

If you want to get an in-depth introduction to skip-lists here is a link to a video [3] of MIT's Introduction to Algorithms lecture on them.

Also, here [4] is a Java applet demonstrating Skip Lists visually.

[1] http://en.wikipedia.org/wiki/Skip_list
[2] http://en.wikipedia.org/wiki/Skip_list
[3] http://video.google.com/videoplay?docid=-6710586843601387849
[4] http://iamwww.unibe.ch/~wenger/DA/SkipList/

+1 Qt uses skip lists rather than RB-trees for its sorted maps & sets. Yep, they're nifty (in imperative languages, anyway). - Michael Ekstrand
(2) Redis uses skip lists to implement "Sorted Sets". - antirez
Skip lists are probably my favorite data structure to use when I need a good data structure and I have no guarantees as to the order of the data, and I want a simpler implementation than other "balanced" data structures. Such a good thing. - earino
Interesting side-note: If you add enough levels to your skip lists, you essentially end up with a B-tree. - Riyad Kalla
4
[+92] [2009-02-01 12:23:44] Yuval F

Spatial Indices [1], in particular R-trees [2] and KD-trees [3], store spatial data efficiently. They are good for geographical map coordinate data and VLSI place and route algorithms, and sometimes for nearest-neighbor search.

Bit Arrays [4] store individual bits compactly and allow fast bit operations.

[1] http://en.wikipedia.org/wiki/Spatial_index
[2] http://en.wikipedia.org/wiki/R-tree
[3] http://en.wikipedia.org/wiki/Kd-tree
[4] http://en.wikipedia.org/wiki/Bit_array

(6) Spatial indices are also useful for N-body simulations involving long-range forces like gravity. - Justin Peel
Ah if only I had found this sooner that would have been very useful. Excellent data structures and complementary algorithms in here. - Robert Massaioli
5
[+87] [2010-05-22 23:02:18] Don Stewart

Zippers [1] - derivatives of data structures that modify the structure to have a natural notion of 'cursor' -- current location. These are really useful as they guarantee indices cannot be out of bounds -- used, e.g., in the xmonad window manager [2] to track which window has focus.

Amazingly, you can derive them by applying techniques from calculus [3] to the type of the original data structure!
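
As a rough illustration of the cursor idea, here is a list zipper sketched in Python. All names (`ListZipper`, `from_list`, `move_left`, `move_right`, `replace`, `to_list`) are invented for this example, and tuples stand in for the persistent linked lists a functional language would use, so the slicing here is not O(1) as it would be in Haskell.

```python
from typing import NamedTuple, Tuple


class ListZipper(NamedTuple):
    left: Tuple      # elements before the focus, nearest first
    focus: object    # the element under the cursor
    right: Tuple     # elements after the focus, nearest first


def from_list(items):
    if not items:
        raise ValueError("a zipper needs at least one element to focus on")
    return ListZipper((), items[0], tuple(items[1:]))


def move_right(z):
    # Moving past the end is a no-op, so the focus always exists.
    if not z.right:
        return z
    return ListZipper((z.focus,) + z.left, z.right[0], z.right[1:])


def move_left(z):
    if not z.left:
        return z
    return ListZipper(z.left[1:], z.left[0], (z.focus,) + z.right)


def replace(z, value):
    """Return a new zipper with the focused element replaced."""
    return z._replace(focus=value)


def to_list(z):
    return list(reversed(z.left)) + [z.focus] + list(z.right)
```

There is no index that could go out of range: the structure itself always contains exactly one focused element.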

[1] http://www.haskell.org/haskellwiki/Zipper
[2] http://donsbot.wordpress.com/2007/05/17/roll-your-own-window-manager-tracking-focus-with-a-zipper/
[3] http://en.wikibooks.org/wiki/Haskell/Zippers#Mechanical_Differentiation

(2) this is only useful in functional programming (in imperative languages you just keep a pointer or an index). Also tbh I still don't get how Zippers really work. - Stefan Monov
(4) @Stefan the point is that you don't need to keep a separate index or pointer now. - Don Stewart
6
[+69] [2009-02-01 12:12:30] Jonas Kölker

Here are a few:

  • Suffix tries. Useful for almost all kinds of string searching (http://en.wikipedia.org/wiki/Suffix_trie#Functionality). See also suffix arrays; they're not quite as fast as suffix trees, but a whole lot smaller.

  • Splay trees (as mentioned above). The reason they are cool is threefold:

    • They are small: you only need the left and right pointers like you do in any binary tree (no node-color or size information needs to be stored)
    • They are (comparatively) very easy to implement
    • They offer optimal amortized complexity for a whole host of "measurement criteria" (log n lookup time being the one everybody knows). See http://en.wikipedia.org/wiki/Splay_tree#Performance_theorems
  • Heap-ordered search trees: you store a bunch of (key, prio) pairs in a tree, such that it's a search tree with respect to the keys, and heap-ordered with respect to the priorities. One can show that such a tree has a unique shape (and it's not always fully packed up-and-to-the-left). With random priorities, it gives you expected O(log n) search time, IIRC.

  • A niche one is adjacency lists for undirected planar graphs with O(1) neighbour queries. This is not so much a data structure as a particular way to organize an existing data structure. Here's how you do it: every planar graph has a node with degree at most 5. Pick such a node, put its neighbors in its neighbor list, remove it from the graph, and recurse until the graph is empty. When given a pair (u, v), look for u in v's neighbor list and for v in u's neighbor list. Both have size at most 5, so this is O(1).

By the above algorithm, if u and v are neighbors, you won't have both u in v's list and v in u's list. If you need this, just add each node's missing neighbors to that node's neighbor list, but store how much of the neighbor list you need to look through for fast lookup.
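
The planar adjacency trick can be sketched as follows in Python. The function names `build_compact_adjacency` and `adjacent` are hypothetical, the input is assumed to be a dict mapping each vertex to a set of neighbours, and the construction below is a naive quadratic one chosen for readability rather than speed.

```python
def build_compact_adjacency(graph):
    """graph: dict mapping each vertex to a set of its neighbours (undirected).

    Returns a dict of short neighbour lists in which every edge is stored
    exactly once, on the endpoint that was removed first.
    """
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    compact = {}
    while remaining:
        # Pick a vertex of minimum remaining degree; for a planar graph
        # this degree is bounded by a small constant.
        v = min(remaining, key=lambda x: len(remaining[x]))
        compact[v] = list(remaining[v])
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return compact


def adjacent(compact, u, v):
    # The edge, if present, sits in exactly one of the two short lists.
    return v in compact[u] or u in compact[v]
```

Because removing a vertex from a planar graph leaves a planar graph, every list built this way stays short, so both membership tests in `adjacent` take constant time.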


(79) You couldn't just list one per answer, could ya? Makes it easier for the single, good data structures to float to the top. - KingNestor
(6) I could, but I'm still not /quite/ getting the hang of this weird wiki/forum hybrid. I'm not gonna edit right now (eat-sleep-rinse-repeat), and I'm probably gonna forget to do it later ;) - Jonas Kölker
The Heap ordered search tree is called a treap. One trick you can do with these is change the priority of a node to push it to the bottom of the tree where it's easier to delete. - paperhorse
(1) "The Heap ordered search tree is called a treap." -- In the definition I've heard, IIRC, a treap is a heap-ordered search tree with random priorities. You could choose other priorities, depending on the application... - Jonas Kölker
(2) A suffix trie is almost but not quite the same as the much cooler suffix tree, which has strings and not individual letters on its edges and can be built in linear time(!). Also despite being asymptotically slower, in practice suffix arrays are often much faster than suffix trees for many tasks because of their smaller size and fewer pointer indirections. Love the O(1) planar graph lookup BTW! - j_random_hacker
@j_random_hacker: suffix arrays are not asymptotically slower. Here is ~50 lines of code for linear suffix array construction: cs.helsinki.fi/u/tpkarkka/publications/icalp03.pdf - Edward KMETT
(1) @Edward Kmett: I have in fact read that paper, it was quite a breakthrough in suffix array construction. (Although it was already known that linear time construction was possible by going "via" a suffix tree, this was the 1st undeniably practical "direct" algorithm.) But some operations outside of construction are still asymptotically slower on a suffix array unless a LCA table is also built. That can also be done in O(n), but you lose the size and locality benefits of the pure suffix array by doing so. - j_random_hacker
@j_random_hacker: Fair enough. Out of context it wasn't clear to me which asymptotics you were referring to. - Edward KMETT
7
[+65] [2009-12-14 23:16:09] zebrabox

I think lock-free alternatives to standard data structures, e.g. lock-free queues, stacks and lists, are much overlooked.
They are increasingly relevant as concurrency becomes a higher priority, and they are a much more admirable goal than using mutexes or locks to handle concurrent reads/writes.

Here are some links:
http://www.cl.cam.ac.uk/research/srg/netos/lock-free/
http://www.research.ibm.com/people/m/michael/podc-1996.pdf [Links to PDF]
http://www.boyet.com/Articles/LockfreeStack.html

Mike Acton's [1] (often provocative) blog has some excellent articles on lock-free design and approaches

[1] http://cellperformance.beyond3d.com/articles/index.html

Lock-free alternatives are so important in today's multi-core, very parallel, scalability addicted world :-) - earino
Well, a disruptor does actually a better job in most cases. - deadalnix
@deadalnix Yeah, just been reading about disruptors. Interesting stuff! - zebrabox
8
[+55] [2009-02-18 20:17:50] Dana

I think Disjoint Set [1] is pretty nifty for cases when you need to divide a bunch of items into distinct sets and query membership. Good implementations of the Union and Find operations result in amortized costs that are effectively constant (the inverse of Ackermann's function, if I recall my data structures class correctly).
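
A minimal Python sketch of the structure, assuming the items are the integers 0..n-1. The class name `DisjointSet` is made up here; the two optimisations shown (union by size and path compression) are the standard ones that give the near-constant amortized bound mentioned above.

```python
class DisjointSet:
    """Union-find over the integers 0..n-1 (illustrative sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # every item starts as its own root
        self.size = [1] * n

    def find(self, x):
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra           # union by size: attach the smaller tree
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```

Two items are in the same set exactly when `find` returns the same root for both, which is how connectivity checks (such as the dungeon generator mentioned in the comments) would use it.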

[1] http://en.wikipedia.org/wiki/Disjoint-set_data_structure

(8) This is also called the "union-find data structure." I was in awe when I first learned about this clever data structure in algorithms class... - BlueRaja - Danny Pflughoeft
I wouldn't really call it 'lesser known', but +1 for mentioning it. - MAK
union-find-delete extensions allow a constant-time delete as well. - Peaker
(4) I used a Disjoint Set for my Dungeon generator, to ensure all rooms are reachable by passages :) - goldenratio
9
[+52] [2009-06-17 21:38:51] Adam Rosenfield

Fibonacci heaps [1]

They're used in some of the fastest known algorithms (asymptotically) for a lot of graph-related problems, such as the Shortest Path problem. Dijkstra's algorithm runs in O(E log V) time with standard binary heaps; using Fibonacci heaps improves that to O(E + V log V), which is a huge speedup for dense graphs. Unfortunately, though, they have a high constant factor, which often makes them impractical.

[1] http://en.wikipedia.org/wiki/Fibonacci_heap

High constant factor as you said, and hard to implement well according to a friend who had to. Finally not that cool, but still, maybe worth knowing. - p4bl0
These guys here made them run competitively in comparison to other heap kinds: cphstl.dk/Presentation/SEA2010/SEA-10.pdf There is a related data structure called Pairing Heaps that's easier to implement and that offers pretty good practical performance. However, the theoretical analysis is partially open. - Manuel
From my experience with Fibonacci heaps, I found out that costly operation of memory allocations makes it less efficient than a simple binary heap backended by an array. - jutky
10
[+44] [2009-02-18 20:26:32] spoulson

Anyone with experience in 3D rendering should be familiar with BSP trees [1]. Generally, it's a method of structuring a 3D scene so that it is manageable for rendering, given the camera coordinates and bearing.

Binary space partitioning (BSP) is a method for recursively subdividing a space into convex sets by hyperplanes. This subdivision gives rise to a representation of the scene by means of a tree data structure known as a BSP tree.

In other words, it is a method of breaking up intricately shaped polygons into convex sets, or smaller polygons consisting entirely of non-reflex angles (angles smaller than 180°). For a more general description of space partitioning, see space partitioning.

Originally, this approach was proposed in 3D computer graphics to increase the rendering efficiency. Some other applications include performing geometrical operations with shapes (constructive solid geometry) in CAD, collision detection in robotics and 3D computer games, and other computer applications that involve handling of complex spatial scenes.

[1] http://en.wikipedia.org/wiki/Binary_space_partitioning

(2) John Carmack FTW - BlueRaja - Danny Pflughoeft
... and the related octrees and kd-trees. - Lloeki
11
[+43] [2009-02-22 04:47:47] Lurker Indeed

Huffman trees [1] - used for compression.

[1] http://en.wikipedia.org/wiki/Huffman_coding

Although it is interesting, isn't this sort of an 'Intro to Algorithms', here-is-an-example-of-a-greedy-algo type topic? - rshepherd
12
[+38] [2010-01-29 11:54:18] huitseeker

Have a look at Finger Trees [1], especially if you're a fan of the previously mentioned [2] purely functional data structures. They're a functional representation of persistent sequences supporting access to the ends in amortized constant time, and concatenation and splitting in time logarithmic in the size of the smaller piece.

As per the original article [3]:

Our functional 2-3 finger trees are an instance of a general design technique introduced by Okasaki (1998), called implicit recursive slowdown. We have already noted that these trees are an extension of his implicit deque structure, replacing pairs with 2-3 nodes to provide the flexibility required for efficient concatenation and splitting.

A Finger Tree can be parameterized with a monoid [4], and using different monoids will result in different behaviors for the tree. This lets Finger Trees simulate other data structures.

[1] http://en.wikipedia.org/wiki/Finger_trees
[2] https://stackoverflow.com/questions/500607/what-are-the-lesser-known-but-cool-data-structures/500633#500633
[3] http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
[4] http://en.wikipedia.org/wiki/Monoid

Have a look at this duplicate answer, it's well worth reading ! - huitseeker
13
[+34] [2009-03-17 18:30:42] cdonner

Circular or ring buffer [1] - used for streaming, among other things.
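
A small Python sketch of a fixed-capacity ring buffer, for illustration. The class name `RingBuffer` and the policy of overwriting the oldest element when full are assumptions of this example; other designs block or raise instead.

```python
class RingBuffer:
    """Fixed-capacity FIFO that overwrites the oldest item when full (sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._start = 0   # index of the oldest element
        self._count = 0   # number of stored elements

    def push(self, item):
        cap = len(self._data)
        end = (self._start + self._count) % cap   # slot just past the newest element
        self._data[end] = item
        if self._count == cap:
            # Buffer was full: the oldest element was just overwritten.
            self._start = (self._start + 1) % cap
        else:
            self._count += 1

    def pop(self):
        """Remove and return the oldest element."""
        if self._count == 0:
            raise IndexError("pop from empty ring buffer")
        item = self._data[self._start]
        self._data[self._start] = None
        self._start = (self._start + 1) % len(self._data)
        self._count -= 1
        return item

    def __len__(self):
        return self._count
```

No element is ever moved: only the two indices wrap around the fixed array, which is why the structure suits streaming data.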

[1] http://en.wikipedia.org/wiki/Circular_buffer

(6) This is most common data structure for buffering data. I think that this one is first one introduced in our class. - Luka Rahne
(4) Also, disgustingly, somehow managed to be patented (at least when used for video). ip.com/patent/USRE36801 - David Eison
Based on reading the link, I don't think the data structure itself is patented, but some invention based on it. I agree that this is definitely a very under-used data structure. - Gravity
14
[+33] [2010-01-28 20:03:29] BlueRaja - Danny Pflughoeft

I'm surprised no one has mentioned Merkle trees (i.e. hash trees [1]).

Used in many cases (P2P programs, digital signatures) where you want to verify the hash of a whole file when you only have part of the file available to you.
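
To show how a single chunk can be checked against the hash of the whole file, here is a hedged Python sketch. The function names (`merkle_root`, `merkle_proof`, `verify`) are invented, SHA-256 is an arbitrary choice, and duplicating the last hash on odd-sized levels is just one convention among several, with known pitfalls in adversarial settings.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Assumption: on an odd-sized level the last hash is paired with itself.
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(chunks):
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [_h(c) for c in chunks]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(chunks, index):
    """Sibling hashes needed to recompute the root from chunks[index]."""
    level = [_h(c) for c in chunks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1                       # neighbour within the pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(chunk, proof, root):
    h = _h(chunk)
    for sibling, sibling_is_left in proof:
        h = _h(sibling + h) if sibling_is_left else _h(h + sibling)
    return h == root
```

A peer that trusts only the root can verify one chunk with a proof of about log2(n) hashes instead of downloading all n chunks.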

[1] http://en.wikipedia.org/wiki/Hash_tree

I thought of 'hash trees' when I first learned about P2P. I didn't know they had such an obvious name. :) - Mateen Ulhaq
15
[+32] [2009-02-01 13:27:26] Jonas Kölker

<zvrba> Van Emde-Boas trees

I think it'd be useful to know why they're cool. In general, the question "why" is the most important to ask ;)

My answer is that they give you O(log log n) dictionaries with {1..n} keys, independent of how many of the keys are in use. Just like repeated halving gives you O(log n), repeated sqrting gives you O(log log n), which is what happens in the vEB tree.


(3) I fully agree that "why" is important, but that was not included in the question ;) - zvrba
They are nice from a theoretical point of view. In practice, however, it's quite tough to get competitive performance out of them. The paper I know got them to work well up to 32 bit keys (citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.7403) but the approach will not scale to more than maybe 34-35 bits or so and there is no implementation of that. - Manuel
Another reason why they are cool is that they are a key building block for a number of cache-oblivious algorithms. - Edward KMETT
16
[+31] [2009-02-01 11:27:36] starblue

How about splay trees [1]?

Also, Chris Okasaki's purely functional data structures [2] come to mind.

[1] http://en.wikipedia.org/wiki/Splay_tree
[2] http://okasaki.blogspot.com/2008/02/ten-years-of-purely-functional-data.html

17
[+29] [2009-05-10 21:56:05] A. Levy

An interesting variant of the hash table is called Cuckoo Hashing [1]. It uses multiple hash functions instead of just 1 in order to deal with hash collisions. Collisions are resolved by removing the old object from the location specified by the primary hash, and moving it to a location specified by an alternate hash function. Cuckoo Hashing allows for more efficient use of memory space because you can increase your load factor up to 91% with only 3 hash functions and still have good access time.

[1] http://en.wikipedia.org/wiki/Cuckoo_hashing

(5) Check hopscotch hashing claimed to be faster. - chmike
18
[+27] [2011-01-15 12:18:07] marcog

A min-max heap [1] is a variation of a heap [2] that implements a double-ended priority queue. It achieves this by a simple change to the heap property: a tree is said to be min-max ordered if every element on a min level is less than all of its descendants and every element on a max level is greater than all of its descendants. Min and max levels alternate, starting with a min level at the root.

http://internet512.chonbuk.ac.kr/datastructure/heap/img/heap8.jpg

[1] http://cg.scs.carleton.ca/~morin/teaching/5408/refs/minmax.pdf
[2] http://en.wikipedia.org/wiki/Heap_%28data_structure%29

Tricky to implement. Even the best programmers can get it wrong. - finnw
(2) min-max heap is beautiful! Used in AI as well :) - Ricko M
19
[+26] [2011-03-24 22:20:26] btilly

I like Cache Oblivious datastructures [1]. The basic idea is to lay out a tree in recursively smaller blocks so that caches of many different sizes will take advantage of blocks that conveniently fit in them. This leads to efficient use of caching at every level, from L1 cache through RAM to big chunks of data read off disk, without needing to know the sizes of any of those caching layers.

[1] http://blogs.msdn.com/b/devdev/archive/2007/06/12/cache-oblivious-data-structures.aspx

(1) +1 Of course!!! - Jon Harrop
+1 – imho, currently underappreciated. - klickverbot
Interesting transcription from that link: "The key is the van Emde Boas layout, named after the van Emde Boas tree data structure conceived in 1977 by Peter van Emde Boas" - sergiol
20
[+23] [2010-05-23 17:21:19] Lucas

Left Leaning Red-Black Trees [1]. A significantly simplified implementation of red-black trees by Robert Sedgewick published in 2008 (~half the lines of code to implement). If you've ever had trouble wrapping your head around the implementation of a Red-Black tree, read about this variant.

Very similar (if not identical) to Andersson Trees.

[1] http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf

21
[+22] [2010-09-19 17:54:55] Marko Tintor

Work Stealing Queue

Lock-free data structure for dividing the work equally among multiple threads. See: Implementation of a work stealing queue in C/C++? [1]

[1] https://stackoverflow.com/questions/2101789/implementation-of-a-work-stealing-queue-in-c-c

22
[+19] [2010-07-23 06:15:19] Edward KMETT

Bootstrapped skew-binomial heaps [1] by Gerth Stølting Brodal and Chris Okasaki:

Despite their long name, they provide asymptotically optimal heap operations, even in a functional setting.

  • O(1) size, union, insert, minimum
  • O(log n) deleteMin

Note that union takes O(1) rather than O(log n) time unlike the more well-known heaps that are commonly covered in data structure textbooks, such as leftist heaps [2]. And unlike Fibonacci heaps [3], those asymptotics are worst-case, rather than amortized, even if used persistently!

There are multiple [4] implementations [5] in Haskell.

They were jointly derived by Brodal and Okasaki, after Brodal came up with an imperative heap [6] with the same asymptotics.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973
[2] http://hackage.haskell.org/packages/archive/heap/0.5.0/doc/html/Data-Heap.html
[3] http://en.wikipedia.org/wiki/Fibonacci_heap
[4] http://hackage.haskell.org/packages/archive/heaps/0.2/doc/html/Data-Heap.html
[5] http://hackage.haskell.org/package/meldable-heap
[6] http://www.brics.dk/~gerth/pub/wads95.html

23
[+18] [2009-02-01 13:29:08] Jasper Bekkers
  • Kd-Trees [1], a spatial data structure used (amongst others) in real-time raytracing. It has the downside that triangles that cross the splitting planes between the different spaces need to be clipped. Generally BVHs are faster because they are more lightweight.
  • MX-CIF Quadtrees [2], store bounding boxes instead of arbitrary point sets by combining a regular quadtree with a binary tree on the edges of the quads.
  • HAMT [3], hierarchical hash map with access times that generally exceed O(1) hash-maps due to the constants involved.
  • Inverted Index [4], quite well known in the search-engine circles, because it's used for fast retrieval of documents associated with different search-terms.

Most, if not all, of these are documented on the NIST Dictionary of Algorithms and Data Structures [5]

[1] http://en.wikipedia.org/wiki/Kd-tree
[2] http://donar.umiacs.umd.edu/quadtree/rectangles/cifquad.html
[3] http://en.wikipedia.org/wiki/Hash_array_mapped_trie
[4] http://en.wikipedia.org/wiki/Inverted_index
[5] http://www.nist.gov/dads/

Added links to the datastructures for you. - mmcdole
What happened to "one data structure per answer"? - Marius Gedminas
24
[+18] [2010-07-23 00:04:15] community_owned

Ball Trees. Just because they make people giggle.

A ball tree is a data structure that indexes points in a metric space. Here's an article on building them. [1] They are often used for finding nearest neighbors to a point or accelerating k-means.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.8209&rep=rep1&type=pdf

Think you could give a link too? - j_random_hacker
(1) There you go. They are a little obscure, I grant you. - community_owned
Thanks! (pad, pad, pad) - j_random_hacker
These are also commonly known as "vantage point" trees or vp-trees. en.wikipedia.org/wiki/Vp-tree - Edward KMETT
25
[+17] [2010-05-23 21:52:47] kerkeslager

Not really a data structure; more of a way to optimize dynamically allocated arrays, but the gap buffers [1] used in Emacs are kind of cool.
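
A gap buffer is simple enough to sketch in Python. The class `GapBuffer` and its method names are made up for this example and do not mirror how Emacs implements it; the point is only that edits at the cursor touch the gap and nothing else.

```python
class GapBuffer:
    """Text buffer with a movable gap at the cursor (illustrative sketch)."""

    def __init__(self, text="", gap_size=16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)      # cursor position
        self._gap_end = len(self._buf)   # one past the last gap slot

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def move_cursor(self, pos):
        pos = max(0, min(pos, len(self)))
        while self._gap_start > pos:     # move gap left: shift chars to its right side
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:     # move gap right: shift chars to its left side
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, ch):
        if self._gap_start == self._gap_end:
            self._grow()
        self._buf[self._gap_start] = ch
        self._gap_start += 1

    def delete_before_cursor(self):
        # Deleting just widens the gap; no characters are moved.
        if self._gap_start > 0:
            self._gap_start -= 1

    def _grow(self):
        extra = max(16, len(self._buf))
        self._buf[self._gap_start:self._gap_start] = [None] * extra
        self._gap_end = self._gap_start + extra

    def text(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])
```

Typing at one spot costs O(1) per character; the cost of shifting is paid only when the cursor jumps, in proportion to the distance moved.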

[1] http://en.wikipedia.org/wiki/Gap_buffer

(1) I would definitely consider that to be a data structure. - Christopher Barber
For anyone interested, this is exactly how the Document (e.g. PlainDocument) models backing the Swing text components are implemented as well; before 1.2 I believe the document models were straight Arrays, which led to horrible insertion performance for large documents; as soon as they moved to Gap Buffers, all was right with the world again. - Riyad Kalla
26
[+16] [2010-07-23 03:45:43] eordano

Fenwick Tree. It's a data structure for keeping track of the sum of all elements of a vector between two given indexes i and j. The trivial solution, precalculating the prefix sums from the beginning, doesn't allow you to update an item efficiently (you have to do O(n) work to keep up).

Fenwick Trees allow you to update and query in O(log n), and how it works is really cool and simple. It's really well explained in Fenwick's original paper, freely available here:

http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol24/issue3/spe884.pdf

Its father, the RMQ (range minimum query) tree, is also very cool: it allows you to keep info about the minimum element between two indexes of the vector, and it also supports update and query in O(log n). I like to teach the RMQ tree first and then the Fenwick Tree.
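
For reference, a Fenwick tree fits in a couple of dozen lines of Python. The class and method names below (`FenwickTree`, `add`, `prefix_sum`, `range_sum`) are chosen for this sketch; indexes are 0-based on the outside and 1-based internally, as is conventional.

```python
class FenwickTree:
    """Binary indexed tree for point updates and range sums (sketch)."""

    def __init__(self, values):
        self._n = len(values)
        self._tree = [0] * (self._n + 1)   # slot 0 is unused
        for i, v in enumerate(values):
            self.add(i, v)

    def add(self, index, delta):
        """Add delta to the element at 0-based index."""
        i = index + 1
        while i <= self._n:
            self._tree[i] += delta
            i += i & -i        # jump to the next node responsible for this index

    def prefix_sum(self, count):
        """Sum of the first count elements."""
        total = 0
        i = count
        while i > 0:
            total += self._tree[i]
            i -= i & -i        # strip the lowest set bit
        return total

    def range_sum(self, i, j):
        """Sum of elements i..j inclusive (0-based)."""
        return self.prefix_sum(j + 1) - self.prefix_sum(i)
```

The expression `i & -i` isolates the lowest set bit of `i`, which is the whole trick: each slot covers a block whose length is that bit, so both loops run at most about log2(n) times.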


I'm afraid this is a duplicate. Perhaps you'd want to add to the previous answer ? - huitseeker
Also related are Segment Trees, which are useful for doing all sorts of range queries. - dhruvbird
@eordano: dead link ... - sergiol
27
[+14] [2009-02-01 13:06:37] zvrba

Van Emde-Boas trees [1]. I even have a C++ implementation [2] of it, for up to 2^20 integers.

[1] http://en.wikipedia.org/wiki/Van_Emde_Boas_tree
[2] http://zvrba.net/software/algorithm_toolbox.html

(1) duplicate answer - huitseeker
28
[+13] [2010-05-22 23:56:40] esad

Nested sets [1] are nice for representing trees in relational databases and running queries on them. For instance, ActiveRecord (Ruby on Rails' default ORM) comes with a very simple nested set plugin [2], which makes working with trees trivial.

[1] http://en.wikipedia.org/wiki/Nested_set_model
[2] http://api.rubyonrails.org/classes/ActiveRecord/Acts/NestedSet/ClassMethods.html

+1 Yep, nested sets rule ;-) - alexander.biskop
but are they lesser known? - naugtur
29
[+12] [2010-05-23 07:31:25] mpen

It's pretty domain-specific, but half-edge data structure [1] is pretty neat. It provides a way to iterate over polygon meshes (faces and edges) which is very useful in computer graphics and computational geometry.

[1] http://www.flipcode.com/archives/The_Half-Edge_Data_Structure.shtml

30
[+12] [2010-05-24 14:07:11] user20493

Scapegoat trees. A classic problem with plain binary trees is that they become unbalanced (e.g. when keys are inserted in ascending order.)

Balanced binary trees (AKA AVL trees) waste a lot of time balancing after each insertion.

Red-Black trees stay balanced, but require an extra bit of storage for each node.

Scapegoat trees stay balanced like red-black trees, but don't require ANY additional storage. They do this by analyzing the tree after each insertion, and making minor adjustments. See http://en.wikipedia.org/wiki/Scapegoat_tree.


31
[+12] [2011-01-15 12:12:08] marcog

An unrolled linked list [1] is a variation on the linked list which stores multiple elements in each node. It can drastically increase cache performance, while decreasing the memory overhead associated with storing list metadata such as references. It is related to the B-tree.

record node {
    node next       // reference to next node in list
    int numElements // number of elements in this node, up to maxElements
    array elements  // an array of numElements elements, with space allocated for maxElements elements
}
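
The record above can be sketched in Python roughly as follows. This is only an illustration, not a tuned implementation: the names Node, UnrolledLinkedList and max_elements are my own, it only supports appending at the tail, and it assumes max_elements is at least 2. When the tail node is full, the upper half of its elements moves into a fresh node.

```python
class Node:
    def __init__(self):
        self.elements = []   # holds at most max_elements items
        self.next = None     # reference to the next node in the list


class UnrolledLinkedList:
    def __init__(self, max_elements=4):
        # Assumption for this sketch: max_elements >= 2.
        self.max_elements = max_elements
        self.head = self.tail = Node()

    def append(self, value):
        node = self.tail
        if len(node.elements) == self.max_elements:
            # Tail is full: split it, moving the upper half to a new node.
            half = self.max_elements // 2
            new_node = Node()
            new_node.elements = node.elements[half:]
            del node.elements[half:]
            node.next = new_node
            self.tail = node = new_node
        node.elements.append(value)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield from node.elements
            node = node.next
```

Iterating touches one small array per node instead of one pointer per element, which is where the cache benefit would come from in a language with contiguous arrays.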
[1] http://en.wikipedia.org/wiki/Unrolled_linked_list

32
[+11] [2010-07-23 05:57:29] Edward KMETT

2-3 Finger Trees [1] by Hinze and Paterson [2] are a great functional data structure swiss-army knife with great asymptotics for a wide range of operations. While complex, they are much simpler than the imperative structures that preceded them, described in "Persistent lists with catenation via recursive slow-down" [3] by Kaplan and Tarjan.

They work as a catenable deque with O(1) access to either end, O(log min(n,m)) append, and provide O(log min(n,length - n)) indexing with direct access to a monoidal prefix sum over any portion of the sequence.

Implementations exist in Haskell [4], Coq [5], F# [6], Scala [7], Java [8], C [9], Clojure [10], C# [11] and other languages.

You can use them to implement priority search queues [12], interval maps [13], ropes with fast head access [14], maps, sets, catenable sequences [15] or pretty much any structure where you can phrase it as collecting a monoidal [16] result over a quickly catenable/indexable sequence.

I also have some slides [17] describing their derivation and use.

[1] http://en.wikipedia.org/wiki/Finger_tree
[2] http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
[3] http://portal.acm.org/citation.cfm?id=225058.225090
[4] http://hackage.haskell.org/package/fingertree-0.0.1.0
[5] http://coq.inria.fr/distrib/v8.2/contribs-20090619/FingerTrees.toc.html
[6] http://v2matveev.blogspot.com/2010/03/data-structures-finger-tree-part-1.html
[7] http://scala.sygneca.com/code/finger-trees
[8] http://functionaljava.googlecode.com/svn/artifacts/3.0/javadoc/fj/data/fingertrees/MakeTree.html
[9] http://swt.informatik.uni-freiburg.de/data/ls_swt/theses/developing-and-verifying-of-fingertrees
[10] http://github.com/Chouser/finger-tree/blob/master/finger_tree.clj
[11] http://blogs.msdn.com/b/ericlippert/archive/2008/02/12/immutability-in-c-part-eleven-a-working-double-ended-queue.aspx
[12] http://hackage.haskell.org/package/fingertree-psqueue
[13] http://hackage.haskell.org/packages/archive/fingertree/0.0.1.0/doc/html/Data-IntervalMap-FingerTree.html
[14] http://hackage.haskell.org/packages/archive/rope/0.6.1/doc/html/Data-Rope.html
[15] http://hackage.haskell.org/packages/archive/containers/0.2.0.1/doc/html/Data-Sequence.html
[16] http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Data-Monoid.html
[17] http://comonad.com/reader/wp-content/uploads/2010/04/Finger-Trees.pdf

while you add valuable info, your answer would perhaps have been better formulated by editing the one it is duplicating - huitseeker
I didn't spot yours until after I'd written mine, and the prospect of merging 20-odd references together and dealing with whatever feelings I'd bruise by dropping the reference to markcc's incorrect original summary article in someone else's post were more than I'd cared to fiddle with. Feel free to fold it in and I'll delete this one though. - Edward KMETT
33
[+10] [2009-02-01 14:03:28] Marko Tintor

Pairing heaps [1] are a type of heap data structure with relatively simple implementation and excellent practical amortized performance.

[1] http://en.wikipedia.org/wiki/Pairing_heap

(3) The source code of the book "Data Structures and Algorithm Analysis in Java/C++" seems to include implementations of Pairing-Heaps users.cs.fiu.edu/~weiss/dsaa_c++3/code users.cs.fiu.edu/~weiss/dsaajava2/code - f3lix
I have written an implementation of pairing heap based on Weiss's book. But i don't use the extra array in extract-min as Weiss did. So it looks more neat and easy to understand. It is on my blog (the text is in Chinese, but the code is pure English) if anyone is interested in it: blog.csdn.net/ljsspace/article/details/6751900 - jscoot
34
[+10] [2010-01-28 20:21:54] MAK

One lesser known, but pretty nifty data structure is the Fenwick Tree [1] (also sometimes called a Binary Indexed Tree [2] or BIT). It stores cumulative sums and supports O(log(n)) operations. Although cumulative sums might not sound very exciting, it can be adapted to solve many problems requiring a sorted/log(n) data structure.

IMO, its main selling point is the ease with which it can be implemented [3]. Very useful in solving algorithmic problems that would otherwise involve coding a red-black/AVL tree.
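
To give a feel for how little code is involved, here is a minimal sketch for point updates and prefix sums. The class and method names are my own choice, and indices are 0-based on the outside with a 1-based internal array.

```python
class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # index 0 is unused

    def add(self, i, delta):
        """Add delta to element i (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i             # jump to the next node covering i

    def prefix_sum(self, count):
        """Sum of the first `count` elements, i.e. indices [0, count)."""
        total = 0
        while count > 0:
            total += self.tree[count]
            count -= count & -count  # strip the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Sum of elements with indices in [lo, hi)."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)
```

Both loops run at most once per bit of n, which is where the O(log(n)) bound comes from.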

[1] http://en.wikipedia.org/wiki/Fenwick_tree
[2] http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees
[3] http://www.algorithmist.com/index.php/Fenwick_tree

(2) +1, can't believe so few people know of this structure. It's extremely easy to implement; it's pretty much a magical unicorn that can solve any problem. - Martín Fixman
35
[+10] [2011-03-24 16:02:54] Jonathan

I really really love Interval Trees [1]. They allow you to take a bunch of intervals (i.e. start/end times, or whatever) and query for which intervals contain a given time, or which intervals were "active" during a given period. Querying can be done in O(log n) (plus the number of matching intervals reported) and pre-processing is O(n log n).

[1] http://en.wikipedia.org/wiki/Interval_tree

(1) Cool. This ought to be standard in all RDBMS. Know of any that supports it? - Esben Skov Pedersen
I'm afraid I don't! I had to implement my own in Python. I understand they're widely used in bioinformatics for gene finding also. - Jonathan
36
[+10] [2011-03-24 16:20:20] yonkeltron

XOR Linked List [1] stores the XOR of the previous and next pointers in a single field per node, which lessens the storage requirements of a doubly-linked list. Kind of obscure but neat!
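
Python has no raw pointers, so the sketch below fakes them: list indices stand in for addresses and index 0 plays the role of the null pointer. All names here are made up for illustration. The trick itself is the same as in C: given the node you came from, XOR-ing it with the stored link yields the node on the other side.

```python
class XorList:
    """XOR linked list simulated with array indices instead of addresses."""

    def __init__(self):
        self.values = [None]  # slot 0 is the stand-in for NULL
        self.links = [0]      # links[i] == prev_index XOR next_index
        self.head = 0
        self.tail = 0

    def append(self, value):
        idx = len(self.values)
        self.values.append(value)
        self.links.append(self.tail)      # prev = old tail, next = 0
        if self.tail:
            self.links[self.tail] ^= idx  # old tail's next goes from 0 to idx
        else:
            self.head = idx
        self.tail = idx

    def _walk(self, start):
        prev, cur = 0, start
        while cur:
            yield self.values[cur]
            # Recover the neighbour we did not come from.
            prev, cur = cur, self.links[cur] ^ prev

    def forward(self):
        return list(self._walk(self.head))

    def backward(self):
        return list(self._walk(self.tail))
```

The same _walk routine serves both directions, since the structure is symmetric; only the starting end differs.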

[1] http://en.wikipedia.org/wiki/XOR_linked_list

37
[+9] [2010-07-22 17:21:54] David Seiler

Splash Tables [1] are great. They're like a normal hash table, except they guarantee constant-time lookup and can handle 90% utilization without losing performance. They're a generalization of the Cuckoo Hash [2] (also a great data structure). They do appear to be patented [3], but as with most pure software patents I wouldn't worry too much.

[1] http://crpit.com/confpapers/CRPITV91Askitis.pdf
[2] http://en.wikipedia.org/wiki/Cuckoo_hashing
[3] http://www.faqs.org/patents/app/20080235488

38
[+8] [2009-02-18 20:29:37] erickson

Enhanced hashing algorithms are quite interesting. Linear hashing [1] is neat, because it allows splitting one "bucket" in your hash table at a time, rather than rehashing the entire table. This is especially useful for distributed caches. However, with most simple splitting policies, you end up splitting all buckets in quick succession, and the load factor of the table oscillates pretty badly.

I think that spiral hashing [2] is really neat too. Like linear hashing, one bucket at a time is split, and a little less than half of the records in the bucket are put into the same new bucket. It's very clean and fast. However, it can be inefficient if each "bucket" is hosted by a machine with similar specs. To utilize the hardware fully, you want a mix of less- and more-powerful machines.

[1] http://en.wikipedia.org/wiki/Linear_hash
[2] http://portal.acm.org/citation.cfm?id=901315&dl=GUIDE&coll=GUIDE&CFID=3731301&CFTOKEN=99586792

(1) I had to use linear hashing in a database class! Cool stuff. - Thomas Eding
39
[+8] [2009-02-19 01:13:10] Zuu

Binary decision diagram [1] is one of my favorite data structures, or in fact Reduced Ordered Binary Decision Diagram (ROBDD).

These kinds of structures can for instance be used for:

  • Representing sets of items and performing very fast logical operations on those sets.
  • Representing any boolean expression, with the intention of finding all solutions for the expression

Note that many problems can be represented as a boolean expression. For instance the solution to a sudoku can be expressed as a boolean expression. And building a BDD for that boolean expression will immediately yield the solution(s).

[1] http://en.wikipedia.org/wiki/Binary_decision_diagram

I thought sudoku was NP-hard? - Autodidact
Knuth even talks about these. (Under Computer Musings) scpd.stanford.edu/knuth/index.jsp Also, there are Zero Suppressed BDDs, which also have useful properties. - Theo Belaire
40
[+8] [2010-11-23 03:20:16] Andrew Whitaker

The Region Quadtree

(quoted from Wikipedia [1])

The region quadtree represents a partition of space in two dimensions by decomposing the region into four equal quadrants, subquadrants, and so on with each leaf node containing data corresponding to a specific subregion. Each node in the tree either has exactly four children, or has no children (a leaf node).

Quadtrees like this are good for storing spatial data, e.g. latitude and longitude or other types of coordinates.

This was by far my favorite data structure in college. Coding this guy and seeing it work was pretty cool. I highly recommend it if you're looking for a project that will take some thought and is a little off the beaten path. Anyway, it's a lot more fun than the standard BST derivatives that you're usually assigned in your data structures class!

In fact, as a bonus, I've found the notes from the lecture leading up to the class project (from Virginia Tech) here (pdf warning) [2].

[1] http://en.wikipedia.org/wiki/Quadtree#The_region_quadtree
[2] http://courses.cs.vt.edu/~cs3114/Spring10/Notes/T06.PRQuadTrees.pdf

(1) This was already (implicitly) mentioned here. Personally, I find quadtrees to be the obvious solution - R-trees are usually more efficient, and in my opinion much cooler. - BlueRaja - Danny Pflughoeft
41
[+7] [2009-02-02 21:34:11] Rafał Dowgird

I like treaps - for the simple, yet effective idea of superimposing a heap structure with random priority over a binary search tree in order to balance it.
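
A rough sketch of insertion, to show how small the idea is. The names are my own, duplicates are simply ignored, and deletion is left out. Each node gets a random priority; after an ordinary BST insert, rotations restore the max-heap order on priorities.

```python
import random


class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random heap priority
        self.left = None
        self.right = None


def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key):
    """Insert key into the treap rooted at node and return the new root."""
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Even when keys arrive in ascending order, the random priorities make a degenerate, list-like tree very unlikely.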


42
[+6] [2009-03-24 21:34:13] user82238

Counted unsorted balanced btrees.

Perfect for text editor buffers.

http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html


43
[+6] [2009-06-13 10:56:39] bill

Fast Compact tries:

[1] http://judy.sourceforge.net
[2] http://members.optusnet.com.au/~askitisn/

44
[+6] [2010-07-23 08:03:25] moritz

I sometimes use Inversion Lists to store ranges, and they are often used to store character classes in regular expressions. See for example http://www.ibm.com/developerworks/linux/library/l-cpinv.html

Another nice use case is for weighted random decisions. Suppose you have a list of symbols and associated probabilities, and you want to pick them at random according to these probabilities:

   a => 0.1
   b => 0.5
   c => 0.4

Then you do a running sum of all the probabilities:

  (0.1, 0.6, 1.0)

This is your inversion list. You generate a random number between 0 and 1, and find the index of the next higher entry in the list. You can do that with a binary search, because it's sorted. Once you've got the index, you can look up the symbol in the original list.

If you have n symbols, you have O(n) preparation time, and then O(log(n)) access time for each randomly chosen symbol - independently of the distribution of weights.
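
The weighted pick described above might look like this in Python, using the standard bisect and itertools modules. The function name make_picker and the optional r argument (handy for testing with a fixed number) are my own additions.

```python
import bisect
import itertools
import random


def make_picker(weights):
    """weights: dict mapping symbol -> weight. Returns a pick() function."""
    symbols = list(weights)
    # Running sum of the weights: this is the inversion list.
    cumulative = list(itertools.accumulate(weights[s] for s in symbols))
    total = cumulative[-1]

    def pick(r=None):
        if r is None:
            r = random.random() * total
        # Index of the first running sum strictly greater than r.
        index = bisect.bisect_right(cumulative, r)
        return symbols[min(index, len(symbols) - 1)]

    return pick


pick = make_picker({"a": 0.1, "b": 0.5, "c": 0.4})
```

Calling pick() with no argument draws a fresh random number; pick(0.3) lands in the second range and returns "b".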

A variation of inversion lists uses negative numbers to indicate the endpoint of ranges, which makes it easy to count how many ranges overlap at a certain point. See http://www.perlmonks.org/index.pl?node_id=841368 for an example.


45
[+6] [2011-01-15 11:39:40] huitseeker

Arne Andersson trees [1] are a simpler alternative to red-black trees, in which only right links can be red. This greatly simplifies maintenance, while keeping performance on par with red-black trees. The original paper gives a nice and short implementation [2] for insertion and deletion.

[1] http://en.wikipedia.org/wiki/Aa_tree
[2] http://user.it.uu.se/~arnea/abs/simp.html

IIRC, unbox the red nodes and you have a 2-3 finger tree. - Jon Harrop
46
[+6] [2011-03-24 16:19:07] pathikrit

DAWG [1]s are a special kind of Trie where similar child trees are compressed into single parents. I extended modified DAWGs and came up with a nifty data structure called ASSDAWG (Anagram Search Sorted DAWG). The way this works is whenever a string is inserted into the DAWG, it is bucket-sorted first and then inserted and the leaf nodes hold an additional number [2] indicating which permutations are valid if we reach that leaf node from root. This has 2 nifty advantages:

  1. Since I sort the strings before insertion and since DAWGs naturally collapse similar sub trees, I get high level of compression (e.g. "eat", "ate", "tea" all become 1 path a-e-t with a list of numbers at the leaf node indicating which permutations of a-e-t are valid).
  2. Searching for anagrams of a given string is super fast and trivial now as a path from root to leaf holds all the valid anagrams of that path at the leaf node using permutation-numbers.
[1] http://en.wikipedia.org/wiki/Directed_acyclic_word_graph
[2] http://en.wikipedia.org/wiki/Factorial_number_system

nice idea! also, don't miss the gaddag [en.wikipedia.org/wiki/GADDAG] - Martin DeMello
(you do, however, lose the ability to do pattern matches. and i'm not entirely sure how you'd do anagrams with . and * in them) - Martin DeMello
I used this to develop a multi-language Scrabble program. To handle blank tiles - when you are descending down from root to leaf, at each node, check how many blanks you have and you can choose to use up a blank to go to a different letter. - pathikrit
47
[+5] [2009-04-02 09:38:26] Antonio

I like suffix trees [1] and suffix arrays [2] for string processing, skip lists [3] for balanced lists and splay trees [4] for self-balancing trees

[1] http://en.wikipedia.org/wiki/Suffix_tree
[2] http://en.wikipedia.org/wiki/Suffix_array
[3] http://codingplayground.blogspot.com/2009/01/generic-skip-list-skiplist.html
[4] http://en.wikipedia.org/wiki/Splay_trees

48
[+5] [2009-04-02 09:58:24] Yngve Sneen Lindal

Take a look at the sideways heap, presented by Donald Knuth.

http://stanford-online.stanford.edu/seminars/knuth/071203-knuth-300.asx


49
[+5] [2010-05-25 07:07:47] Ashish

BK-Trees, or Burkhard-Keller Trees [1] are a tree-based data structure which can be used to quickly find near-matches to a string.
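
Here is a small sketch of the idea, assuming Levenshtein distance as the metric. The class layout and names are mine, not taken from the linked article. Each child edge is labelled with its distance to the parent, and the triangle inequality lets a query skip every edge whose label falls outside [d - max_dist, d + max_dist].

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # each node is a (word, {distance: child}) pair

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_dist:
                results.append(candidate)
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return results
```

Any other distance function can be passed in, as long as it is a true metric.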

[1] http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees

50
[+5] [2011-03-24 12:23:18] Sriram Srinivasan

Fenwick trees [1] (or binary indexed trees) are a worthy addition to one's toolkit. If you have an array of counters and you need to constantly update them while querying for cumulative counts (as in PPM compression), Fenwick trees will do all operations in O(log n) time and require no extra space. See also this topcoder tutorial [2] for a good introduction.

[1] http://en.wikipedia.org/wiki/Fenwick_tree
[2] http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees

51
[+5] [2011-04-17 02:06:38] Fantius

Zobrist Hashing [1] is a hash function generally used for representing a game board position (like in Chess) but surely has other uses. One nice thing about it is that it can be incrementally updated as the board is updated.
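
A minimal sketch of the incremental update, with a made-up ZobristBoard class: every (square, piece) pair gets a fixed random 64-bit number, and the hash of a position is the XOR of the numbers for all occupied squares. Because XOR is its own inverse, placing or removing a piece is a single XOR. The seed parameter is only there so that two boards share the same table.

```python
import random


class ZobristBoard:
    def __init__(self, num_squares, pieces, seed=0):
        rng = random.Random(seed)
        # One fixed random 64-bit number per (square, piece) combination.
        self.table = {(sq, p): rng.getrandbits(64)
                      for sq in range(num_squares) for p in pieces}
        self.squares = {}   # square -> piece currently on it
        self.hash = 0       # hash of the empty board

    def remove(self, square):
        piece = self.squares.pop(square)
        self.hash ^= self.table[(square, piece)]

    def place(self, square, piece):
        if square in self.squares:
            self.remove(square)   # e.g. a capture
        self.squares[square] = piece
        self.hash ^= self.table[(square, piece)]

    def move(self, src, dst):
        piece = self.squares[src]
        self.remove(src)
        self.place(dst, piece)
```

The hash depends only on the resulting position, not on the order of moves that produced it, which is what makes it useful for transposition tables.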

[1] http://en.wikipedia.org/wiki/Zobrist_hashing

52
[+4] [2009-02-02 08:59:29] mdm

Splay Trees are cool. They reorder themselves in a way that moves the most often queried elements closer to the root.


this is a duplicate answer Perhaps you'd want to add complementary elements to the previous answer ? - huitseeker
53
[+4] [2009-05-13 14:36:49] user97214

Getting away from all these graph structures, I just love the simple Ring-Buffer.

When properly implemented you can seriously reduce your memory footprint while maintaining performance and sometimes even improving it.


(6) Explaining the properties of a Ring-Buffer or adding a link to more information would be helpful to people who don't know what it is...which is kind of the point of this question! - A. Levy
(2) Duplicates an existing answer above. - Steve Guidi
54
[+4] [2010-01-17 10:49:13] Firas Assaad

You can use a min-heap to find the minimum element in constant time, or a max-heap to find the maximum element. But what if you wanted to do both operations? You can use a Min-Max heap [1] to do both operations in constant time. It works by using min-max ordering: alternating between min-heap and max-heap comparisons on consecutive tree levels.

[1] http://www.cs.otago.ac.nz/staffpriv/mike/Papers/MinMaxHeaps/MinMaxHeaps.pdf

(1) This is one of my favourite data structures. There is even a variant called a min-max-median heap which allows O(1) retrieval of any of the three. - Martin
How are these related to finger trees? - Jon Harrop
55
[+4] [2010-07-17 17:17:21] Vaibhav Bajpai

Persistent Data Structures [1]

[1] http://en.wikipedia.org/wiki/Persistent_data_structure

56
[+4] [2010-09-22 18:31:20] karlphillip

B* tree [1]

It's a variety of B-tree that is efficient for searching at the cost of a more expensive insertion.

[1] http://en.wikipedia.org/wiki/B*

57
[+4] [2011-03-24 14:33:37] user201295

Per the Bloom Filter mentions, Deletable Bloom Filters (DlBF) are in some ways better than basic counting variants. See http://arxiv.org/abs/1005.0352


58
[+3] [2009-05-10 22:03:00] DanC89

Skip lists are actually pretty awesome: http://en.wikipedia.org/wiki/Skip_list


59
[+3] [2010-05-23 08:43:09] ade

Priority deque is cheaper than having to maintain 2 separate priority queues with different orderings. http://www.alexandria.ucsb.edu/middleware/javadoc/edu/ucsb/adl/middleware/PriorityDeque.html http://cphstl.dk/Report/Priority-deque/cphstl-report-2001-14.pdf


60
[+3] [2011-03-24 22:34:45] GWW

I think the FM-index [1] by Paolo Ferragina and Giovanni Manzini is really cool, especially in bioinformatics. It's essentially a compressed full-text index that combines a suffix array with a Burrows-Wheeler transform of the reference text. The index can be searched without decompressing the whole index.

[1] http://en.wikipedia.org/wiki/FM-index

61
[+3] [2011-03-25 05:17:33] st0le

Ternary Search Tree [1]

  • Quick prefix search (for incremental autocomplete,etc)
  • Partial Matching (When you want to find all words within X hamming distance of a string)
  • Wildcard Searches

Quite Easy to implement.

[1] http://en.wikipedia.org/wiki/Ternary_search_tree

Endorsed by the venerable Jon Bentley! - luser droog
62
[+3] [2011-03-27 05:35:59] dhruvbird

A queue implemented using 2 stacks is pretty space efficient (as opposed to using a linked list, which has at least one extra pointer/reference of overhead per element).

How to implement a queue using two stacks? [1]

This has worked well for me when the queues are huge. If I save 8 bytes on a pointer, it means that queues with a million entries save about 8MB of RAM.
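
For reference, a sketch of the two-stack queue, using Python lists as the array-backed stacks (class and method names are mine). Items are pushed onto an inbox stack; when the outbox stack is empty, the inbox is poured into it, which reverses the order so the oldest item ends up on top.

```python
class TwoStackQueue:
    def __init__(self):
        self._inbox = []    # receives newly enqueued items
        self._outbox = []   # holds items in dequeue order (oldest on top)

    def enqueue(self, item):
        self._inbox.append(item)

    def dequeue(self):
        if not self._outbox:
            # Pour the inbox into the outbox, reversing the order.
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()

    def __len__(self):
        return len(self._inbox) + len(self._outbox)
```

Each item is moved between the stacks at most once, so enqueue and dequeue are both amortized O(1).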

[1] https://stackoverflow.com/questions/69192/using-stack-as-queue

63
[+2] [2010-02-17 19:03:09] Rudolf Olah

A proper string data structure. Almost every programmer settles for whatever native support that a language has for the structure and that's usually inefficient (especially for building strings, you need a separate class or something else).

The worst is treating a string as a character array in C and relying on the terminating NUL byte for safety.


(3) The other extreme is C++, where every self-respecting library comes with its own string data type, which is of course incompatible with all the other string types except const char*. Personally, I prefer the environments where I don't have to spend so much time converting strings from one type to another. - Niki
That's one of the reasons I say wonders of C#. It's very rare the case where you need anything beyond string or StringBuilder native types for text. - sergiol
64
[+2] [2010-05-23 02:23:35] Gregable

PQ-Trees [1]

[1] http://knol.google.com/k/pq-trees-and-the-consecutive-ones-property

65
[+2] [2010-05-23 03:02:56] CJJ

I personally find sparse matrix data structures to be very interesting. http://www.netlib.org/linalg/html_templates/node90.html

The famous BLAS libraries use these. And when you deal with linear systems that contain 100,000's of rows and columns, it becomes critical to use these. Some of these also resemble the compact grid (basically like a bucket-sorted grid) which is common in computer graphics. http://www.cs.kuleuven.be/~ares/publications/LD08CFRGRT/LD08CFRGRT.pdf

Also as far as computer graphics is concerned, MAC grids are somewhat interesting, but only because they're clever. http://www.seas.upenn.edu/~cis665/projects/Liquation_665_Report.pdf


66
[+2] [2010-05-23 08:44:38] ade

Delta list/delta queue are used in programs like cron or event simulators to work out when the next event should fire. http://everything2.com/title/delta+list http://www.cs.iastate.edu/~cs554/lec_notes/delta_clock.pdf


How is it better than a priority queue? - Bruno Martinez
67
[+2] [2010-07-22 18:16:52] John Scipione

Bucket Brigade

They are used extensively in Apache. Basically they are a linked list that loops around on itself in a ring. I am not sure if they are used outside of Apache and Apache modules but they fit the bill as a cool yet lesser known data structure. A bucket is a container for some arbitrary data and a bucket brigade is a collection of buckets. The idea is that you want to be able to modify and insert data at any point in the structure.

Let's say that you have a bucket brigade that contains an html document with one character per bucket. You want to convert all the < and > symbols into &lt; and &gt; entities. The bucket brigade allows you to insert some extra buckets in the brigade when you come across a < or > symbol in order to fit the extra characters required for the entity. Because the bucket brigade is in a ring you can insert backwards or forwards. This is much easier to do (in C) than using a simple buffer.

Some reference on bucket brigades below:

Apache Bucket Brigade Reference [1]

Introduction to Buckets and Brigades [2]

[1] http://apr.apache.org/docs/apr/trunk/group___a_p_r___util___bucket___brigades.html
[2] http://www.apachetutor.org/dev/brigades

(7) Sounds like a marketing name for a circular linked list - BlueRaja - Danny Pflughoeft
Yeah, it sounds like a circular linked list with a variant record type along with some O(1) insertion properties. - Paul Nathan
This is also a popular way to implement queues and deques. - Jon Harrop
68
[+2] [2011-03-24 14:38:56] Trey Jackson

A corner-stitched data structure [1]. From the summary:

Corner stitching is a technique for representing rectangular two-dimensional objects. It appears to be especially well-suited for interactive editing systems for VLSI layouts. The data structure has two important features: first, empty space is represented explicitly; and second, rectangular areas are stitched together at their corners like a patchwork quilt. This organization results in fast algorithms (linear time or better) for searching, creation, deletion, stretching, and compaction. The algorithms are presented under a simplified model of VLSI circuits, and the storage requirements of the structure are discussed. Measurements indicate that corner stitching requires approximately three times as much memory space as the simplest possible representation.

[1] http://www.eecs.berkeley.edu/Pubs/TechRpts/1983/6352.html

69
[+2] [2011-03-24 14:59:05] Lukáš Nalezenec

Burrows–Wheeler transform [1] (block-sorting compression)

It's an essential algorithm for compression. Let's say that you want to compress the lines of a text file. You might object that if you sort the lines, you lose information. But BWT works like this: it reduces entropy a lot by sorting the input, keeping integer indexes to recover the original order.
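
For the curious, here is the textbook naive version of the transform on a single string, as a sketch only. It assumes a sentinel character ("$" by default, my choice) that does not occur in the input and marks the end of the text, instead of storing an integer index. It sorts all rotations of the string and keeps the last column; real compressors use suffix arrays rather than materialising every rotation.

```python
def bwt(text, end="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if end in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, end="$"):
    """Rebuild the rotation table one column at a time, then pick the original."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    for row in table:
        if row.endswith(end):
            return row[:-1]
    raise ValueError("sentinel not found")
```

For example, bwt("banana") gives "annb$aa": equal letters end up next to each other, which is what makes the output friendlier to run-length or move-to-front coding.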

[1] http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform

(2) BWT is purely an algorithm and not a data structure though. - Jon Harrop
(2) @Jon, you're technically right, but why make the distinction? In designing data structures, I often find that a data structure and the algorithm around it go hand-in-hand. That one implies the other. Put another way, isn't the whole point of a data structure the operations you can perform on it and their runtime and memory use? I could say that the data structure for the Burrows-Wheeler transform plus run-length encoding is a data structure for representing arbitrary strings whose memory use (unlike a regular character array) is less than O(n) for many strings. And that's interesting. - Jonathan Tran
70
[+2] [2011-03-24 15:05:42] juancn

PATRICIA - Practical Algorithm to Retrieve Information Coded in Alphanumeric, D.R.Morrison (1968).

A PATRICIA tree is related to a Trie. The problem with Tries is that when the set of keys is sparse, i.e. when the actual keys form a small subset of the set of potential keys, as is very often the case, many (most) of the internal nodes in the Trie have only one descendant. This causes the Trie to have a high space-complexity.

http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/


71
[+2] [2011-03-24 17:09:35] Daniel Trebbien

I am not sure if this data structure has a name, but the proposed tokenmap [1] data structure for inclusion into Boost is kind of interesting. It is a dynamically resizable map where look-ups are not only O(1), they are simple array accesses. I wrote most of the background material [2] on this data structure which describes the fundamental principle behind how it works.

Something like a tokenmap is used by operating systems to map file or resource handles to data structures representing the file or resource.

[1] http://svn.boost.org/svn/boost/sandbox/tokenmap/libs/tokenmap/doc/html/index.html
[2] http://svn.boost.org/svn/boost/sandbox/tokenmap/libs/tokenmap/doc/html/tokenmap/background.html

72
[+2] [2011-05-19 01:00:52] hugomg

Disjoint Set Forests [1] allow fast membership queries and union operations and are most famously used in Kruskal's Algorithm [2] for minimum spanning trees.

The really cool thing is that both operations have amortized running time proportional to the inverse of the Ackermann Function [3], making this the "fastest" non-constant time data structure.
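
A compact sketch with the two standard optimisations, union by rank and path compression, which together give the inverse-Ackermann bound mentioned above. The class and method names are my own; elements are assumed to be the integers 0..n-1.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # upper bound on each tree's height

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Merge the sets containing a and b. Returns False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra          # attach the shallower tree under the deeper
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

    def same_set(self, a, b):
        return self.find(a) == self.find(b)
```

In Kruskal's algorithm, union returning False is exactly the signal that an edge would close a cycle and should be skipped.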

[1] http://en.wikipedia.org/wiki/Disjoint-set_data_structure
[2] http://en.wikipedia.org/wiki/Kruskal%27s_algorithm
[3] http://en.wikipedia.org/wiki/Ackermann_function

73
[+1] [2009-02-18 19:47:57] Zuu
  • Binary decision diagram (my very favorite data structure, good for representing boolean equations, and solving them. Effective for a great lot of things)
  • Heaps (a tree where the parent of a node always maintains some relation to the children of the node, for instance, the parent of a node is always greater than each of its children (max-heap) )
  • Priority Queues (really just min-heaps and max-heaps, good for maintaining order among a lot of elements where e.g. the item with the highest value is supposed to be removed first)
  • Hash tables, (with all kinds of lookup strategies, and bucket overflow handling)
  • Balanced binary search trees (Each of these have their own advantages)
    • RB-trees (overall good, when inserting, lookup, removing and iterating in an ordered fashion)
    • Avl-trees (faster for lookup than RB, but otherwise very similar to RB)
    • Splay-trees (faster for lookup when recently used nodes are likely to be reused)
    • Fusion-tree (Exploiting fast multiplication for getting even better lookup times)
    • B+Trees (Used for indexing in databases and file systems, very efficient when latency to read/write from/to the index is significant).
  • Spatial indexes ( Excellent for querying for whether points/circles/rectangles/lines/cubes is in close proximity to or contained within each other)
    • BSP tree
    • Quadtree
    • Octree
    • Range-tree
    • Lots of similar but slightly different trees, and different dimensions
  • Interval trees (good for finding overlapping intervals, linear)
  • Graphs
    • adjacency list (basically a list of edges)
    • adjacency matrix (a table representing directed edges of a graph with a single bit per edge. Very fast for graph traversal)

These are the ones I can think of. There are even more on Wikipedia's data structures category page [1]

[1] http://en.wikipedia.org/wiki/Category:Data_structures

Thanks for giving constructive critique before downvoting this answer </sarcasm> - Zuu
Not the downvoter, but I'd guess it's because Heaps, PQs, Hash Tables and Binary Trees aren't what you'd call lesser known. - Dana
(16) @Zuu, Ok, I'll give some constructive criticism. You provided many data structures, of which only a small fraction could be considered "lesser known". There are no links in your post and it generally misses the entire point of the question. - mmcdole
It's hard to tell what people would understand by 'lesser known'. Some people barely know what a balanced tree is. And while people might know the term 'heap' they don't know it's a general data structure that can actually be used with sense in a given application. - Zuu
What goes for the links, sure i could look it all up, but i was nice enough to categorize them as well as linking to an index of data structures on Wikipedia. Also note that the last part of the question was added after i posted this answer :-) - Zuu
BDDs. Fear the most mindwarping datastructure :) - Tetha
What do you mean by saying Interval trees are "linear"? - Jonathan Graehl
@Zuu: Why don't you create new entries for elsewhere unmentioned Collections? - user unknown
@user unknown, I don't intend to 'maintain' my answers. I find it rude that questions can be change in the first place, effectively leaving all existing answers irrelevant. This answer reflects the original question. If anyone wants to split it up, they're free to do so :-) - Zuu
74
[+1] [2009-06-17 21:32:43] DShook

Binomial heaps [1] have a lot of interesting properties, the most useful of which is fast merging.

[1] http://en.wikipedia.org/wiki/Binomial_heap

75
[+1] [2010-05-25 16:44:15] Kelly S. French

Environment tracking recursive structures.

Compilers use a structure that is recursive but not like a tree. Inner scopes have a pointer to an enclosing scope so the nesting is inside-out. Verifying whether a variable is in scope is a recursive call from the inside scope to the enclosing scope.

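import java.util.HashMap;

// One Env per lexical scope; outer points at the enclosing scope.
public class Env
{
    HashMap<String, Object> map;
    Env                     outer;

    // Creates the outermost scope, which has no enclosing scope.
    Env()
    {
        this(null);
    }

    // Creates a scope nested inside o.
    Env(Env o)
    {
        outer = o;
        map = new HashMap<>();
    }

    // Binds key in this scope only, shadowing any outer binding.
    void put(String key, Object value)
    {
        map.put(key, value);
    }

    // Looks key up in this scope, then in each enclosing scope in turn.
    Object get(String key)
    {
        if (map.containsKey(key))
        {
            return map.get(key);
        }
        if (outer != null)
        {
            return outer.get(key);
        }
        return null;
    }

    // Enters a new inner scope.
    Env push()
    {
        return new Env(this);
    }

    // Leaves this scope and returns the enclosing one.
    Env pop()
    {
        return outer;
    }
}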

I'm not sure if this structure even has a name. I call it an inside-out list.


(1) Isn't this essentially a Linked List of HashMap<String, Object>? - Wesley Wiser
It ends up being more of a tree, i.e. a HashMap of HashMaps. You can't even declare it statically because you don't know the depth ahead of time. If you tried, it would look like this: HashMap<String, HashMap<String, HashMap<String, HashMap<String, ...>>> > - Kelly S. French
One usually uses a Stack<HashMap<String,SymbolInformation>> for this, and one usually makes one pass through the AST, pushing and popping the stack as various scopes are encountered, and updating the hashes as new variables are added. If one really does want to keep the data structure around for lots of different branches of the tree at once, one uses a list in the Lisp sense, where the list nodes are immutable, and the later nodes in the list are shared by multiple lists. - Ken Bloom
76
[+1] [2010-07-13 23:05:05] Quonux

There is a clever data structure that stores its elements in arrays, with the arrays themselves linked together in a linked list (or another array).

This has the advantage that iterating over the elements is very fast (faster than with a pure linked list), and the cost of moving the arrays around in memory and of (de-)allocating them is kept to a minimum. (Because of this, the data structure is useful for simulations.)
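
A minimal sketch of the idea, assuming fixed-capacity int arrays chained in a singly linked list. The class name ChunkedIntList and the chunk capacity are my own choices for illustration, not something taken from the linked article.

```java
// Sketch only: fixed-capacity int arrays ("chunks") chained in a singly linked list.
class ChunkedIntList {
    private static final class Chunk {
        final int[] data;
        int count;      // number of used slots in data
        Chunk next;
        Chunk(int capacity) { data = new int[capacity]; }
    }

    private final int chunkCapacity;
    private final Chunk head;
    private Chunk tail;
    private int size;

    ChunkedIntList(int chunkCapacity) {
        if (chunkCapacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.chunkCapacity = chunkCapacity;
        head = tail = new Chunk(chunkCapacity);
    }

    // Appending never moves existing elements: a full tail just gets a new chunk linked in.
    void add(int value) {
        if (tail.count == chunkCapacity) {
            tail.next = new Chunk(chunkCapacity);
            tail = tail.next;
        }
        tail.data[tail.count++] = value;
        size++;
    }

    int size() { return size; }

    // Iteration scans each array contiguously and follows one pointer per chunk.
    long sum() {
        long total = 0;
        for (Chunk c = head; c != null; c = c.next) {
            for (int i = 0; i < c.count; i++) total += c.data[i];
        }
        return total;
    }
}
```

Compared with a growable array, nothing is copied when the structure grows; compared with a plain linked list, there is one pointer hop per chunk rather than one per element.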

I know about it from here:

http://software.intel.com/en-us/blogs/2010/03/26/linked-list-verses-array/

"...and that an additional array is allocated and linked in to the cell list of arrays of particles. This is similar in some respects to how TBB implemented its concurrent container." (The post is about the performance of linked lists vs. arrays.)


In C++'s standard library, this is known as a deque: cplusplus.com/reference/stl/deque - Josh Townzen
This brings en.wikipedia.org/wiki/VList to my mind ... - f3lix
77
[+1] [2010-07-17 17:33:06] Jochen

Someone else already proposed Burkhard-Keller-Trees, but I thought I might mention them again in order to plug my own implementation. :)

http://well-adjusted.de/mspace.py/index.html

There are faster implementations around (see ActiveState's Python recipes or implementations in other languages), but I think/hope my code helps to understand these data structures.

By the way, both BK and VP trees can be used for much more than searching for similar strings. You can do similarity searches for arbitrary objects as long as you have a distance function that satisfies a few conditions (positivity, symmetry, triangle inequality).
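
To illustrate the point about arbitrary objects, here is a minimal BK-tree sketch that is generic over the element type and takes the distance function as a parameter. It is not the code from the linked page; the class name and the choice of an integer-valued metric are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

// Sketch only: a BK-tree over any type T with an integer-valued metric.
class BkTree<T> {
    private final ToIntBiFunction<T, T> distance;
    private T value;   // null until the first insert
    private final Map<Integer, BkTree<T>> children = new HashMap<>();

    BkTree(ToIntBiFunction<T, T> distance) { this.distance = distance; }

    // Each child subtree is keyed by its distance to this node's value.
    void add(T item) {
        if (value == null) { value = item; return; }
        int d = distance.applyAsInt(item, value);
        if (d == 0) return;   // already present
        children.computeIfAbsent(d, k -> new BkTree<>(distance)).add(item);
    }

    // Returns every stored item within maxDist of the query.
    List<T> search(T query, int maxDist) {
        List<T> out = new ArrayList<>();
        collect(query, maxDist, out);
        return out;
    }

    private void collect(T query, int maxDist, List<T> out) {
        if (value == null) return;
        int d = distance.applyAsInt(query, value);
        if (d <= maxDist) out.add(value);
        // Triangle inequality: matches can only sit in children keyed d - maxDist .. d + maxDist.
        for (int k = Math.max(1, d - maxDist); k <= d + maxDist; k++) {
            BkTree<T> child = children.get(k);
            if (child != null) child.collect(query, maxDist, out);
        }
    }
}
```

For example, `new BkTree<Integer>((a, b) -> Math.abs(a - b))` indexes plain integers by absolute difference, which satisfies all three conditions; swapping in an edit distance over strings gives the familiar spell-checking use.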


Perhaps you may want to edit the previous answer ? - huitseeker
78
[+1] [2010-07-22 19:17:28] TMN

I've had good luck with WPL trees [1] before: a tree variant that minimizes the weighted path length of the branches. Weight is determined by node access, so that frequently accessed nodes migrate closer to the root. I'm not sure how they compare to splay trees, as I've never used those.

[1] http://comjnl.oxfordjournals.org/cgi/content/short/34/5/444

79
[+1] [2011-03-24 16:52:19] habeanf

Half edge data structure [1] and winged edge [2] for polygonal meshes.

Useful for computational geometry algorithms.
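
A minimal sketch of the half-edge idea, assuming faces are given as vertex indices in counter-clockwise order. The field names (origin, face, next, twin) follow common convention, and the HalfEdgeMesh helper is hypothetical rather than taken from either link.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One directed side of an edge.
class HalfEdge {
    int origin;      // vertex index this half-edge starts at
    int face;        // face lying to the left of this half-edge
    HalfEdge next;   // next half-edge around the same face
    HalfEdge twin;   // opposite half-edge on the neighbouring face, null on a boundary
}

class HalfEdgeMesh {
    // Maps a directed (from, to) vertex pair to its half-edge so twins can be paired up.
    private final Map<Long, HalfEdge> byEndpoints = new HashMap<>();

    private static long key(int from, int to) {
        return ((long) from << 32) | (to & 0xffffffffL);
    }

    // Adds a face from vertex indices in counter-clockwise order; returns its first half-edge.
    HalfEdge addFace(int face, int... vertices) {
        int n = vertices.length;
        HalfEdge[] loop = new HalfEdge[n];
        for (int i = 0; i < n; i++) {
            loop[i] = new HalfEdge();
            loop[i].origin = vertices[i];
            loop[i].face = face;
        }
        for (int i = 0; i < n; i++) {
            int from = vertices[i], to = vertices[(i + 1) % n];
            loop[i].next = loop[(i + 1) % n];
            HalfEdge opposite = byEndpoints.get(key(to, from));
            if (opposite != null) { loop[i].twin = opposite; opposite.twin = loop[i]; }
            byEndpoints.put(key(from, to), loop[i]);
        }
        return loop[0];
    }

    // Vertices of the face containing start, found by following next pointers.
    static List<Integer> faceVertices(HalfEdge start) {
        List<Integer> out = new ArrayList<>();
        HalfEdge e = start;
        do { out.add(e.origin); e = e.next; } while (e != start);
        return out;
    }

    // Faces sharing an edge with the face containing start, found through twin pointers.
    static List<Integer> adjacentFaces(HalfEdge start) {
        List<Integer> out = new ArrayList<>();
        HalfEdge e = start;
        do { if (e.twin != null) out.add(e.twin.face); e = e.next; } while (e != start);
        return out;
    }
}
```

The appeal is that adjacency queries such as "which faces border this one" become short pointer walks instead of searches over the whole mesh.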

[1] http://www.cgafaq.info/wiki/Half_edge_general
[2] http://en.wikipedia.org/wiki/Winged_edge

80
[+1] [2011-04-01 05:37:57] jyt

I think Cycle Sort [1] is a pretty neat sorting algorithm.

It's a sorting algorithm designed to minimize the total number of writes. This is particularly useful when you're dealing with flash memory, where the life-span of the memory is limited by the number of writes. Here is the Wikipedia article [2], but I recommend going to the first link. (nice visuals!)
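
For reference, a minimal sketch of the textbook form of cycle sort on an int array. The class name and the returned write counter are my own additions, so that the number of array writes can be observed.

```java
// Sketch only: in-place cycle sort that reports how many array writes it performed.
class CycleSort {
    static int sort(int[] a) {
        int writes = 0;
        for (int start = 0; start < a.length - 1; start++) {
            int item = a[start];

            // The final position of item is start plus the number of smaller elements after it.
            int pos = start;
            for (int i = start + 1; i < a.length; i++) if (a[i] < item) pos++;
            if (pos == start) continue;          // already in place: no write at all

            while (item == a[pos]) pos++;        // skip past duplicates
            int tmp = a[pos]; a[pos] = item; item = tmp;
            writes++;

            // Keep placing the displaced element until the cycle closes at start.
            while (pos != start) {
                pos = start;
                for (int i = start + 1; i < a.length; i++) if (a[i] < item) pos++;
                while (item == a[pos]) pos++;
                tmp = a[pos]; a[pos] = item; item = tmp;
                writes++;
            }
        }
        return writes;
    }
}
```

Every write puts a value directly into its final slot, so the array is written at most once per element. The price is a quadratic number of comparisons.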

[1] http://corte.si/posts/code/cyclesort/index.html
[2] http://en.wikipedia.org/wiki/Cycle_sort

81
[0] [2011-03-24 22:58:44] Jon Harrop

Right-angle triangle networks (RTINs) [1]

Beautifully simple way to adaptively subdivide a mesh. Split and merge operations are just a few lines of code each.

[1] http://geoinformatics.fsv.cvut.cz/gwiki/Modern_Algorithms_for_Real-Time_Terrain_Visualization_on_Commodity_Hardware#RTIN

82
[0] [2011-09-06 10:11:31] jscoot

I stumbled on another data structure, the Cartesian tree [1], when I read about some algorithms related to RMQ (range minimum query) and LCA (lowest common ancestor). In a Cartesian tree, the lowest common ancestor of two nodes is the minimum node between them. It is useful for converting an RMQ problem to an LCA problem.
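
A minimal sketch of that property, assuming a min-ordered Cartesian tree built with the usual stack-based linear-time construction. The LCA here is a deliberately naive parent-pointer walk, just to show the reduction; a real RMQ solution would plug in a faster LCA structure. All names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: min-ordered Cartesian tree stored as a parent array over the input indices.
class CartesianTree {
    final int[] values;
    final int[] parent;   // parent[i] is the index of i's parent, -1 for the root

    CartesianTree(int[] input) {
        values = input.clone();
        parent = new int[values.length];
        Deque<Integer> spine = new ArrayDeque<>();   // right spine of the tree, deepest node on top
        for (int i = 0; i < values.length; i++) {
            int last = -1;
            // Larger elements on the spine end up in the left subtree of the new node.
            while (!spine.isEmpty() && values[spine.peek()] > values[i]) last = spine.pop();
            parent[i] = spine.isEmpty() ? -1 : spine.peek();
            if (last != -1) parent[last] = i;
            spine.push(i);
        }
    }

    // Naive LCA: mark the ancestors of i, then climb from j until a marked node is hit.
    int lca(int i, int j) {
        boolean[] onPath = new boolean[parent.length];
        for (int v = i; v != -1; v = parent[v]) onPath[v] = true;
        int v = j;
        while (!onPath[v]) v = parent[v];
        return v;
    }

    // The minimum of values[lo..hi] is the value at the LCA of the two endpoints.
    int rangeMin(int lo, int hi) {
        return values[lca(lo, hi)];
    }
}
```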

[1] http://en.wikipedia.org/wiki/Cartesian_tree

83