Skip to main content

Data Structures Overview

Time and Space Complexity Overviews

Click on a tab below to see the worst or average case time complexities for access, search, insertion, and deletion operations for common data structures. The worst case tab includes space complexity details for each data structure. The last tab includes details about the best, average, and worst case time complexities for various array sorting algorithms as well as a column with space complexity details.

Data StructureAccessSearchInsertionDeletionSpace
ArrayO(1)\perfVeryGood{O}{1}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}
StackO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(1)\perfVeryGood{O}{1}O(1)\perfVeryGood{O}{1}O(n)\perfAverage{O}{n}
QueueO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(1)\perfVeryGood{O}{1}O(1)\perfVeryGood{O}{1}O(n)\perfAverage{O}{n}
Singly-Linked ListO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(1)\perfVeryGood{O}{1}O(1)\perfVeryGood{O}{1}O(n)\perfAverage{O}{n}
Doubly-Linked ListO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(1)\perfVeryGood{O}{1}O(1)\perfVeryGood{O}{1}O(n)\perfAverage{O}{n}
Skip ListO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(nlogn)\perfBad{O}{n\log n}
Hash TableN/A\colorbox{gray}{N/A}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}
Binary Search TreeO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}
Cartesian TreeN/A\colorbox{gray}{N/A}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}
B-TreeO(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(n)\perfAverage{O}{n}
Red-Black TreeO(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(n)\perfAverage{O}{n}
Splay TreeN/A\colorbox{gray}{N/A}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(n)\perfAverage{O}{n}
AVL TreeO(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(logn)\perfGood{O}{\log n}O(n)\perfAverage{O}{n}
KD TreeO(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}O(n)\perfAverage{O}{n}

What is a data structure?

A data structure, as its name implies, is a way of structuring data. Since we are dealing with computers, we are ultimately talking about a way of specifically structuring data inside random-access memory (RAM). As the Wiki article notes:

In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data (i.e., it is an algebraic structure about data).

Abstract data types (ADTs)

The Wiki page for ADTs lists a number of common ADTs, remarked on more below in a tabular format, that have proven to be useful across a great variety of applications. It's important to note the ADT list provided is not at all exhaustive, and among the ADTs listed, each one may be defined in different, non-equivalent ways.

Consider the Queue ADT. A queue is a list that implements a first-in-first-out (FIFO) policy on insertions and deletions; that is, the most important behaviors anyone might expect of a queue is that it's a list where elements can only be added to the end of the list and can only be removed from the front of the list. The "Add" operation might be called any number of things from one language to the next (e.g., add, enqueue, enter, append, etc.), and the same is true for the "Remove" operation (e.g., remove, popleft, dequeue, leave, etc.). But the most important thing is that by "Queue" we mean some interface that allows us to interact with a list-like object where the first element added will also be the first element removed.

If we want to implement the Queue ADT, then we will encounter many more variations in both choice and design:

  • Will our Queue ADT only allow for an "Add" and "Remove" interface or should we add additional operations to be able to check whether or not the queue is empty (e.g., isEmpty), calculate how many elements are in the queue at any given time (e.g., size), provide access to the values of the elements at the end or the beginning of the queue without altering the queue's value (e.g., peekleft or peekright, respectively)?
  • What data structure will we use to implement the Queue ADT? An array? A linked list? There are many choices, some better-suited than others (e.g., arrays incur an O(n)O(n) time cost when removing elements at the front whereas for linked lists the cost is O(1)O(1)).
Finding the source code for Python's built-in functions and types

As one helpful Stack Overflow post notes, you can generally find the source code for Python's built-in functions and built-in types at the following locations:

ADTDescription and Python implementation reference(s)
StringTLDR: '' (string literal results in an str object)      

A string is a sequence of characters. From the Python docs: "Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points."
ListTLDR: [] (list literal results in a list sequence type)      

A list or sequence is an ADT that represents a finite number of ordered values, where the same value may occur more than once. Python's default list type is a dynamic array that allows efficient resizing for dynamic data storage as opposed to an array data structure that has a fixed size.

In Python, the docs indicate there are three basic sequence types: lists, tuples, and range objects (list, tuple, range). There are additional sequence types tailored for processing of binary data (i.e., bytes, bytearray, and memoryview) and text strings (i.e., str).

Python discusses common sequence operations and provides a table of operations supported by most sequences types. They even provide an abstract base class (ABC) to make it easier to correctly implement the commonly supported sequence operations on custom sequence types: collections.abc.Sequence.

Finally, it's important to note that a list can be implemented in various ways, not just with an array or a dynamic array. A linked list is a popular implementation of the List ADT when insertions and deletions are frequent and they predominately occur at the beginning of the list. If, however, direct access is important, as is often the case, then an array will likely be a more suitable implementation than a linked list. Python does not have a LinkedList class; instead, Python provides a deque which is implemented as a doubly-linked list under the hood.
QueueTLDR: collections.deque      

A queue is a collection of entities that are maintained in a sequence and can be modified by the addition of entities at one end of the sequence (enqueue) and the removal of entities from the other end of the sequence (dequeue). These operations make a queue "FIFO" or first-in-first-out.

Interestingly, Python has a queue module, but it is unlikely to be what you want if what you're primarily interested in is a data structure that allows O(1)O(1) insertions and deletions (i.e., enqueuing from the right and dequeuing from the left). As this post notes, "queue.Queue is intended for allowing different threads to communicate using queued messages/data, whereas collections.deque is simply intended as a data structure." Hence, the queue module is primarily a tool for using thread-safe queues.

If you don't care about concurrency or threading and you don't need the thread-safety features provided by the queue module, then deque is the clear choice. It doesn't include the overhead associated with thread-safe operations like the queue module does. The Python docs for queue even suggest deque as an alternative for this very reason: "collections.deque is an alternative implementation of unbounded queues with fast atomic append() and popleft() operations that do not require locking and also support indexing."
Double-ended queueTLDR: collections.deque      

A double-ended queue is an ADT that generalizes a queue, for which elements can be added to or removed from either the front (head) or back (tail).
Priority queueTLDR: [] or heapify(your_list) (requires heapq module)      

A priority queue is an ADT similar to a regular queue where each element in the queue has an associated priority. Elements are dequeued from the priority queue based on their associated value, with higher priority elements being dequeued before lower priority elements.

To ensure these ADT-defining operations are executed efficiently for priority queues, implementations of priority queues often rely on a heap data structure even though another method might be to use an unordered array. A heap is a tree-based data structure that satisfies the heap property: "In a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C."

The Python docs for heapq note the pop method returns the smallest item, not the largest, meaning Python's heap implementation is that of a min heap, as described above. If, however, a max heap is needed, then items for the min heap can be multiplied by -1 to simulate the behavior of a max heap (but be sure to multiply an item by -1 whenever you push or pop an item from the heap!).

It's probably worth mentioning that Python does have a queue.PriorityQueue class, but this should be avoided unless you're dealing with concurrency and worried about thread safety (for reasons remarked on above when discussing the Queue ADT). Using the heapq module provides the same data management benefits without the thread safety overhead.
Double-ended priority queueTLDR: No built-in support (but consider depq on PyPi)      

There's no nice reference/answer for a good Python implementation of the double-ended priority queue (DEPQ) ADT. It's tricky (the Wiki page is illustrative as to why this is the case), but the depq package on PyPi is probably worth checking out. It might also be worth trying to implement this ADT yourself!

Min-max heaps may prove fruitful if you try to implement this ADT yourself. There's an article that discusses min-max heaps and generalized priority queues that may be helpful as a starting point. There's also a detailed discussion of DEPQs in [6] where special attention is paid to using symmetric min-max heaps to implement DEPQs. Whatever way you choose, this is not a straightforward ADT to try to implement.
StackTLDR: [] (list literal results in a list sequence type that can easily be used as a stack)      

A stack is an ADT that serves as a collection of elements with two main operations: 1) push, which adds an element to the collection, and 2) pop, which removes the most recently added element. The Stack ADT is cleanly implemented in Python by simply using a list. As the docs note: "The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved ('last-in, first-out'). To add an item to the top of the stack, use append(). To retrieve an item from the top of the stack, use pop() without an explicit index." Sure, it's possible to implement the Stack ADT using a linked list or something else, but the list is the preferred way for most use cases.
SetTLDR: set()      

A set is an ADT that can store unique values, without any particular order. Python's set is a hash table-based implementation of the Set ADT.
MultisetTLDR: collections.Counter      

A multiset (also known as a "bag") is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset. Python does not have a built-in implementation of the Multiset ADT, but it's possible to use Counter instances as multisets. Specifically, the keys in a Counter instance are unique (so they're equivalent to a set) and the counts hold the multiplicity or the number of instances of each element.
MapTLDR: {} (dictionary literal results in a dict object)      

A map (also called an associative array, symbol table, dictionary, etc.) is an ADT that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. dict is a mapping type in Pyton that accomplishes exactly this: "A mapping object maps hashable values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the dictionary."
MultimapTLDR: collections.defaultdict(list OR set)      

A multimap is a generalization of a map or associative array ADT in which more than one value may be associated with and returned for a given key. Both map and multimap are particular cases of containers. Often the multimap is implemented as a map with lists or sets as the map values, which is exactly what defaultdict(list OR set) accomplishes (respectively).
CollectionTLDR: collections    

A collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion. Python's collections module implements specialized container datatypes that provide alternatives to Python's general purpose built-in containers, namely dict, list, set, and tuple. The collections module is basically a buffet of specialized container data types.
ContainerTLDR: dict, list, set, tuple, collections (module)      

A container is a class or a data structure whose instances are collections of other objects. In other words, they store objects in an organized way that follows specific access rules. In Python, examples of built-in containers include dict, list, set, and tuple, but more container types are available in the collections module.
TreeTLDR: No built-in support  

A tree is a hierarchical structure of nodes, where each node has a value and a list of references to other nodes (children). Trees are not directly implemented in Python but can be built most notably by using classes.
GraphTLDR: No built-in support    

A graph represents a set of connections (edges) between nodes (vertices). There's no direct built-in Python implementation, but graphs can be represented in various ways, namely edge lists, adjacency lists, or adjacency matrices (among other representations).