Templates for Data Structures and Algorithms
Contents
| Symbol | Designation |
| --- | --- |
| ✓ | This is a problem from LeetCode's interview crash course. |
| ✠ | This is a problem stemming from work done through Interviewing IO. |
| ★ | A right-aligned ★ (one or more) indicates my own personal designation as to the problem's relevance, importance, priority in review, etc. |
- Backtracking
- Binary search
- Dynamic programming
- Graphs
- Greedy algorithms
- Heaps
- Linked lists
- Matrices
- Sliding window
- Stacks and queues
- Trees
- Manually determine order of nodes visited ("tick trick")
- Pre-order traversal
- Post-order traversal
- In-order traversal
- Level-order traversal
- Level-order (BFS)
- Induction (solve subtrees recursively, aggregate results at root)
- Traverse-and-accumulate (visit nodes and accumulate information in nonlocal variables)
- Combining templates: induction and traverse-and-accumulate
- Two pointers
- Miscellaneous
Backtracking
Remarks
TBD
def fn(curr, OTHER_ARGUMENTS...):
    if (BASE_CASE):
        # modify the answer
        return

    ans = 0
    for (ITERATE_OVER_INPUT):
        # modify the current state
        ans += fn(curr, OTHER_ARGUMENTS...)
        # undo the modification of the current state

    return ans
Examples
LC 46. Permutations (✓) ★★
Given an array nums
of distinct integers, return all the possible permutations. You can return the answer in any order.
class Solution:
    def permute(self, nums: List[int]) -> List[List[int]]:
        def backtrack(curr):
            if len(curr) == len(nums):
                permutations.append(curr[:])  # note that we append a _copy_ of curr
                return
            for num in nums:
                if num not in curr:
                    curr.append(num)
                    backtrack(curr)
                    curr.pop()

        permutations = []
        backtrack([])
        return permutations
What actually is a permutation of nums
? It's essentially all possible orderings of the elements of nums
such that no element is duplicated. A backtracking strategy to generate all permutations sounds promising — what would the base case be? It would be when the current permutation being generated, say curr
, has the same length as the input array nums
: curr.length == nums.length
(of course, this assumes we've done our due diligence and have prevented duplicates from being added to curr
). The base case of curr.length == nums.length
means we have completed the process of generating a permutation and we cannot go any further; specifically, if we look at the process of generating permutations as a tree, then completing the generation of a permutation means we have reached a leaf node, as illustrated in the following image for the input nums = [1, 2, 3]
:
Building all permutations for this problem, where the input is an array of numbers, means we need all elements to make an appearance at the first index, all other elements to make an appearance at the second index, and so on. Hence, we should loop over all elements of nums
for each call to our backtrack
function, where we should always check to see if a number is already in curr
before adding it to curr
. Each call to backtrack
is like visiting a node in the tree of candidates being generated. The leaves are the base cases/answers to the problem.
For the solution given above, if we simply add print(curr)
after the line curr.append(num)
, then we can very clearly see how each call to backtrack
is like visiting a node in the tree (it's like performing a DFS on an imaginary tree):
[1]
[1, 2]
[1, 2, 3] # complete permutation (leaf node)
[1, 3]
[1, 3, 2] # complete permutation (leaf node)
[2]
[2, 1]
[2, 1, 3] # complete permutation (leaf node)
[2, 3]
[2, 3, 1] # complete permutation (leaf node)
[3]
[3, 1]
[3, 1, 2] # complete permutation (leaf node)
[3, 2]
[3, 2, 1] # complete permutation (leaf node)
The entire list of complete permutations is then returned as the answer:
[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
The time complexity above effectively amounts to roughly O(n^2 * n!), because we iterate through all of nums for each call to backtrack, and membership to curr is a linear cost when curr is an array, and then we're guaranteed to make n! calls to backtrack that complete permutations, and each call to backtrack then results in up to n more calls, and so forth. If we wanted to make a micro-optimization, then we could introduce a hash set to make the membership checks on curr O(1) instead of O(n), but this change pales in comparison to the factorial cost of calling backtrack so many times:
class Solution:
    def permute(self, nums: List[int]) -> List[List[int]]:
        def backtrack(curr):
            if len(curr) == len(nums):
                permutations.append(curr[:])
                return
            for num in nums:
                if num not in lookup:
                    curr.append(num)
                    lookup.add(num)
                    backtrack(curr)
                    curr.pop()
                    lookup.remove(num)

        lookup = set()
        permutations = []
        backtrack([])
        return permutations
LC 78. Subsets (✓) ★★
Given an integer array nums
of unique elements, return all possible subsets (the power set). The solution set must not contain duplicate subsets. Return the solution in any order.
class Solution:
    def subsets(self, nums: List[int]) -> List[List[int]]:
        def backtrack(curr, num_idx):
            subs.append(curr[:])
            if num_idx == len(nums):  # note that this base case is implied
                return                # since `backtrack` is only called in the
                                      # for loop when `num_idx < len(nums)`
            for i in range(num_idx, len(nums)):
                curr.append(nums[i])
                backtrack(curr, i + 1)
                curr.pop()

        subs = []
        backtrack([], 0)
        return subs
This problem is quite similar to LC 46. Permutations but with some very notable differences, namely container length and element order:
- Length: A subset can have any length from 0 through n (where n is the size of the input array of distinct integers), inclusive, but a permutation has a fixed length of n.
- Order: The containers [1, 2, 3] and [3, 2, 1] are considered to be different permutations but the same subset.
If we have a problem where containers like those above are considered to be duplicates and we do not want to consider duplicates (e.g., such as this problem concerned with finding subsets), then a common "trick" is to add a rule where each call of the backtrack
function allows us to only consider elements that come after the previously processed element:
This is a very common method of avoiding duplicates in backtracking problems — have an integer argument that represents a starting point for iteration at each function call.
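In skeletal form, the rule looks something like the following (the names nums, curr, and backtrack are illustrative, mirroring the solutions in this section):

def backtrack(curr, start):
    # record or process curr as the problem requires
    for i in range(start, len(nums)):
        curr.append(nums[i])
        backtrack(curr, i + 1)  # only elements AFTER index i may be considered next
        curr.pop()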
For example, in this problem, we start with the root being the empty container, []
:
[]
With nums = [1, 2, 3]
, we can clearly consider each element as the beginning of its own subset:
[ ]
/ | \
[1] [2] [3]
Now what? Remember that calling backtrack
is like moving to another node; hence, to respect the strategy remarked on above, when we move to another node, that node should only involve elements that come after the one we have just processed. This means the tree of possibilities above should end up looking like the following:
[ ] # level subsets: []
/ | \
[1] [2] [3] # level subsets: [1], [2], [3]
/ \ |
[2] [3] [3] # level subsets: [1,2], [1,3], [2,3]
|
[3] # level subsets: [1,2,3]
The actual order of subset generation from our solution code is not hard to anticipate in light of the strategy we've been discussing, where, again, we're basically doing a DFS on an imaginary tree, and once we hit the last indexed element (i.e., when num_idx == len(nums)
) we move back up the tree from child to parent:
[
    [],
    [1],
    [1,2],
    [1,2,3],
    [1,3],
    [2],
    [2,3],
    [3]
]
The order conjectured above is confirmed by the return value of our solution when nums = [1, 2, 3]
:
[[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
The process becomes even clearer if we add the print statement print(subs)
after subs.append(curr[:])
as well as the print statement print('COMPLETED')
after if num_idx == len(nums):
. Making these modifications and running the solution code again on the input nums = [1,2,3]
results in the following being printed to standard output:
[[]]
[[], [1]]
[[], [1], [1, 2]]
[[], [1], [1, 2], [1, 2, 3]]
COMPLETED
[[], [1], [1, 2], [1, 2, 3], [1, 3]]
COMPLETED
[[], [1], [1, 2], [1, 2, 3], [1, 3], [2]]
[[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3]]
COMPLETED
[[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
COMPLETED
[[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]
Note that the text 'COMPLETED'
is only ever printed after an element/subset has been added to subs
that ends with 3
, which corresponds to the leaves of the tree shown earlier.
To summarize, when generating permutations, we had a length requirement, where we needed to use all of the elements in the input; hence, we only considered leaf nodes as part of the actual returned answer. With subsets, however, there is no length requirement; thus, every node should be in the returned answer, including the root node, which is why the very first line of the backtrack
function is to add a copy of curr
to the returned answer subs
.
LC 77. Combinations (✓) ★★
Given two integers n
and k
, return all possible combinations of k
numbers out of 1
... n
. You may return the answer in any order.
class Solution:
    def combine(self, n: int, k: int) -> List[List[int]]:
        def backtrack(curr, start_val):
            if len(curr) == k:
                combinations.append(curr[:])
                return
            for i in range(start_val, n + 1):
                curr.append(i)
                backtrack(curr, i + 1)
                curr.pop()

        combinations = []
        backtrack([], 1)
        return combinations
It can be very helpful to sketch out a tree of possibilities if we suspect backtracking may be an effective strategy for coming up with a solution (just like sketching out things for general tree problems, linked list problems, etc.).
The root of our tree would be []
, the empty list. Unlike in LC 78. Subsets, there is an explicit length requirement in terms of the elements that can be added to the container we must ultimately return; hence, not all nodes in the tree of possibilities will represent entities that should be added to our answer array.
What is clear is that each entry added to the list we must return must have a length of k
, and it must contain numbers from the list [1, ..., n]
, inclusive, where no number in the k
-length list is duplicated. This means we should consider combining some of the strategy points we used in both LC 46. Permutations and LC 78. Subsets, namely using a length requirement as part of the base case as well as preventing duplicates from being considered, respectively.
The thinking goes something like the following: start with the empty list, []
, as the root of the tree:
[]
Each k
-length list can have a number from [1, ..., n]
, inclusive; thus, the root should have n
children, namely nodes with values from 1
through n
(the example below uses the values of n = 4, k = 2
from the first example on LeetCode):
[ ]
[1] [2] [3] [4]
Now what? A complete solution will be generated only when the path has length k
, but each path right now has a length of 1
(but k = 2
for this example). To keep generating paths (i.e., possible solutions), we need to avoid considering duplicates, which means for each node value i
, we only subsequently consider node values [i + 1, ..., n]
, which results in our overall tree looking like the following:
[ ]
[1] [2] [3] [4]
[2] [3] [4] [3] [4] [4]
Since the input specifies k = 2
, the tree above suffices to report the following as the combinations list:
[
[1,2],
[1,3],
[1,4],
[2,3],
[2,4],
[3,4]
]
LC 797. All Paths From Source to Target (✓) ★
Given a directed acyclic graph (DAG) of n
nodes labeled from 0
to n - 1
, find all possible paths from node 0
to node n - 1
, and return them in any order.
The graph is given as follows: graph[i]
is a list of all nodes you can visit from node i
(i.e., there is a directed edge from node i
to node graph[i][j]
).
class Solution:
    def allPathsSourceTarget(self, graph: List[List[int]]) -> List[List[int]]:
        def backtrack(curr_path, node):
            if node == len(graph) - 1:
                paths.append(curr_path[:])
                return
            for neighbor in graph[node]:
                curr_path.append(neighbor)
                backtrack(curr_path, neighbor)
                curr_path.pop()

        paths = []
        backtrack([0], 0)
        return paths
This is a great backtracking problem because backtracking is not necessarily the obvious first strategy of attack. Even if we do think of backtracking, how exactly should we proceed? This becomes much clearer (again!) once we start to sketch out the tree of possibilities. What would the root of our tree be? It has to be node 0
since our goal is to find all paths from node 0
to node n - 1
. This also suggests something about our base case: we should terminate path generation whenever node n - 1
is reached. We don't have to worry about cycles or anything like that since we're told the graph is a DAG. Additionally, the graph is already provided as an adjacency list which makes traversing each node's neighbors quite easy.
So what's the strategy? Let's start with our root node (which needs to be part of every solution):
0
Now what? Each neighbor of node 0
needs to be considered (we're trying to get to node n - 1
from node 0
in whatever way is possible, which means exploring all possible paths). Let's use the input of the second example on LeetCode for a concrete illustration: graph = [[4,3,1],[3,2,4],[3],[4],[]]
. This graph looks like the following:
As stated above, from 0
we need to consider each of its neighbors:
0
/ | \
1 3 4
The tree above makes it clear [0,4]
will be one possible path, but what are the other possible paths? It looks like leaf nodes will be the complete solutions that need to be added to our answer that we ultimately return. For each node that is not 4
(i.e., n - 1 == 4
in this case), we need to consider each possible neighbor (this will not be endless because we're told the graph is a DAG). Considering the neighbors for each node means our tree of possibilities ultimately ends up looking as follows:
0
/ | \
1 3 4
/ | \ |
2 3 4 4
| |
3 4
|
4
Hence, the set of possible paths is as expected:
[[0,4],[0,3,4],[0,1,3,4],[0,1,2,3,4],[0,1,4]]
LC 17. Letter Combinations of a Phone Number (✓) ★★
Given a string containing digits from 2-9
inclusive, return all possible letter combinations that the number could represent. Return the answer in any order.
A mapping of digit to letters (just like on the telephone buttons) is given below. Note that 1 does not map to any letters.
class Solution:
    def letterCombinations(self, digits: str) -> List[str]:
        if not digits:
            return []

        keypad = {
            '2': 'abc',
            '3': 'def',
            '4': 'ghi',
            '5': 'jkl',
            '6': 'mno',
            '7': 'pqrs',
            '8': 'tuv',
            '9': 'wxyz'
        }

        def backtrack(curr, start_idx):
            if len(curr) == len(digits):
                combinations.append("".join(curr))
                return
            for i in range(start_idx, len(digits)):
                digit = digits[i]
                for letter in keypad[digit]:
                    curr.append(letter)
                    backtrack(curr, i + 1)
                    curr.pop()

        combinations = []
        backtrack([], 0)
        return combinations
As with most other backtracking problems, it helps if we start by sketching out the tree of possibilities, where we can imagine the root being an empty string:
''
Now let's consider the input digits = "23"
, the input for the first example on LeetCode. The desired output is ["ad","ae","af","bd","be","bf","cd","ce","cf"]
. How can this be achieved? The first digit is 2
, which means 'a'
, 'b'
, and 'c'
are valid starting letters for combinations:
' '
/ | \
'a' 'b' 'c'
We do not want to add duplicates of these letters for subsequent possible combinations but instead consider the potential letters arising from the next digit (i.e., we want to prevent processing duplicates by only ever processing digits after the current digit). The next digit in this case is 3
which corresponds to possible letters 'd'
, 'e'
, and 'f'
. Our tree now looks like the following:
' '
/ | \
'a' 'b' 'c'
/ | \ / | \ / | \
'd' 'e' 'f' 'd' 'e' 'f' 'd' 'e' 'f'
The tree of possibilities above makes it clear the letter combinations we should return are as follows, as expected:
[
'ad',
'ae',
'af',
'bd',
'be',
'bf',
'cd',
'ce',
'cf',
]
LC 39. Combination Sum (✓) ★
Given an array of distinct integers candidates
and a target integer target
, return a list of all unique combinations of candidates
where the chosen numbers sum to target
. You may return the combinations in any order.
The same number may be chosen from candidates
an unlimited number of times. Two combinations are unique if the frequency of at least one of the chosen numbers is different.
It is guaranteed that the number of unique combinations that sum up to target
is less than 150
combinations for the given input.
class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        def backtrack(path, path_sum, next_path_node_idx):
            if path_sum == target:
                paths.append(path[:])
                return
            for i in range(next_path_node_idx, len(candidates)):
                candidate = candidates[i]
                if path_sum + candidate <= target:
                    path.append(candidate)
                    backtrack(path, path_sum + candidate, i)
                    path.pop()

        paths = []
        backtrack([], 0, 0)
        return paths
This problem is similar to LC 77. Combinations, but now we are tasked with generating all combinations of candidates
such that each candidate number is allowed however many times we want, but the sum of the numbers for each combination/mixture of candidates has to equal the given target
.
Start by drawing a tree!
[]
Each number in candidates
may be part of a path sum we want to return, where the sum of the elements in the path equates to target
. Let's use the input provided in the first example on LeetCode: candidates = [2,3,6,7], target = 7
:
[ ]
/ / \ \
2 3 6 7
The problem description indicates any number may be reused however many times we desire, but we still don't want to generate duplicates such as [2,2,3]
, [2,3,2]
, etc. How can we avoid doing this? We can use the same strategy that many other backtracking problems use (i.e., ensure we only begin processing elements that are either the current element itself or elements that come after the current element):
[ ]
/ / \ \
2 3 6 7
/ / \ \ /|\ / \ |
2 3 6 7 3 6 7 6 7 7
..................................
When do we stop the process of generating paths? Since all possible values in candidates
are positive, this necessarily means a path is no longer valid if its path sum exceeds the given target
value.
Note that the following lines in the solution above are important:
# ...
for i in range(next_path_node_idx, len(candidates)):
    candidate = candidates[i]
    if path_sum + candidate <= target:
        path.append(candidate)
        # ...
It might be tempting to modify path_sum
with the candidate
value before entering the if
block, but this would be a mistake. Why? Because the list of candidates
is not necessarily ordered; hence, if we made the update path_sum += candidate
and the new path_sum
exceeded target
, then we would no longer consider that path, but this would also prevent us from exploring other branches using candidates of a possibly lesser value where path_sum + candidate
did not exceed target
.
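To make the pitfall concrete, here is a sketch of the tempting but buggy variant (hypothetical code, for illustration only). Because path_sum is mutated across loop iterations, later candidates get checked against an inflated running sum:

# BUGGY variant (do not use)
for i in range(next_path_node_idx, len(candidates)):
    path_sum += candidates[i]  # pollutes the sum seen by iterations i + 1, i + 2, ...
    if path_sum <= target:
        path.append(candidates[i])
        backtrack(path, path_sum, i)
        path.pop()
    # path_sum is never restored, so branches using smaller candidates
    # later in the loop may be wrongly pruned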
LC 79. Word Search (✓) ★★★
Given an m x n
board
and a word
, find if the word exists in the grid.
The word can be constructed from letters of sequentially adjacent cells, where "adjacent" cells are horizontally or vertically neighboring. The same letter cell may not be used more than once.
class Solution:
    def exist(self, board: List[List[str]], word: str) -> bool:
        def valid(row, col):
            return 0 <= row < m and 0 <= col < n

        def backtrack(row, col, word_idx, seen):
            if word_idx == len(word):
                return True
            for dr, dc in dirs:
                next_row, next_col = row + dr, col + dc
                if valid(next_row, next_col) and (next_row, next_col) not in seen:
                    next_char = board[next_row][next_col]
                    if next_char == word[word_idx]:
                        seen.add((next_row, next_col))
                        if backtrack(next_row, next_col, word_idx + 1, seen):
                            return True
                        seen.remove((next_row, next_col))
            return False

        m = len(board)
        n = len(board[0])
        dirs = [(1,0),(-1,0),(0,1),(0,-1)]
        for row in range(m):
            for col in range(n):
                if board[row][col] == word[0] and backtrack(row, col, 1, {(row, col)}):
                    return True
        return False
This problem has a very DFS feel to it, but what makes this a backtracking problem and not a strict DFS problem is that we may visit a square multiple times on the same initial function call. For example, suppose we're looking for the word CAMERA
in the following grid:
CAM
ARE
MPS
We may start exploring as follows (each |
represents the path taken):
C A M
|
A R E
|
M P S
The M
-letter above does not have E
as a neighbor so we remove the M
from consideration and move back up to A
, which also does not have an M
-neighbor except the one we just visited, which we know leads to nothing. So we remove A
as well and we're back at C
, and we note C
does have another A
-neighbor it can visit. And we end up seeing the following as a valid path:
C -> A -> M
          |
A <- R <- E

M    P    S
This path visits the A
under C
again in order to come up with a valid path. Thus, while we may not be allowed to use a square more than once for the answer, there are possibly multiple ways to use a square to form different candidates, as illustrated above.
We should still use a set seen
like we would normally do with DFS solutions so as to avoid using the same letter in the same path. Unlike in DFS solutions, however, we will remove from the set seen
when backtracking, once it's been determined that a path solution is no longer possible — we should only ever traverse an edge in the graph if we know that the resultant path could lead to an answer. Thus, we can also pass an index variable word_idx
that indicates we are currently looking for word[word_idx]
, and then only move to (next_row, next_col)
if word[word_idx]
is the correct letter.
Since the answer could start from any square whose letter matches word[0]
, we need to systematically start our backtracking approach from any square with word[0]
as its letter. If we exhaust all such squares and never find word
, then we should return false
. Of course, if we do find word
in the midst of one of our backtracking approaches, then we return true
immediately.
LC 52. N-Queens II (✓) ★★★
The n-queens puzzle is the problem of placing n
queens on an n x n
chessboard such that no two queens attack each other.
Given an integer n
, return the number of distinct solutions to the n-queens puzzle.
class Solution:
    def totalNQueens(self, n: int) -> int:
        def backtrack(row, cols, diagonals, anti_diagonals):
            # base case: n queens have been placed
            if row == n:
                return 1

            solution = 0
            for col in range(n):
                diagonal = row - col
                anti_diagonal = row + col

                # queen is not placeable, move to considering next column
                if (col in cols or diagonal in diagonals or anti_diagonal in anti_diagonals):
                    continue

                # add the queen to the board
                cols.add(col)
                diagonals.add(diagonal)
                anti_diagonals.add(anti_diagonal)

                # move to next row with the updated board state
                solution += backtrack(row + 1, cols, diagonals, anti_diagonals)

                # remove the queen from the board since we have
                # explored all valid paths using the above backtrack function call
                cols.remove(col)
                diagonals.remove(diagonal)
                anti_diagonals.remove(anti_diagonal)

            return solution

        return backtrack(0, set(), set(), set())
This problem is difficult. And it's often used to explain backtracking at various learning levels (e.g., university and college classes). The solution to this problem is informative, especially in regards to how we might want to go about modeling backtracking solutions to other problems.
Whenever we make a call to backtrack
, we're doing so by describing the state of whatever it is that is important for consideration in solving the problem at hand. For some problems, maybe that state involves a string and the number of used left and right parentheses (e.g., LC 22. Generate Parentheses), maybe the state involves the row and column of a grid under consideration along with the cells already seen and the index of a word that represents a character for which we are searching (e.g., LC 79. Word Search). And maybe, as in this problem, the state needs to involve a chessboard!
What state should we consider tracking? We need to somehow keep track of rows and columns in some way. The trickier part about this problem involves the diagonals. Even though we have a board in this problem, this is not like a graph problem where we need to conduct a DFS or BFS — fundamentally, our job at each step is to place a queen on the board so that it is not under attack. This is the backtracking part where we narrow down the range of possibilities we consider — the brute force approach would involve generating all board possibilities with queens and then narrow down which solutions actually work, but that would involve placing queens on the board in positions that are being attacked, which means the rest of the generation of possibilities is useless. We want to place queens in positions where they are not being attacked and where whatever subsequent positions we consider can actually lead to a solution. Only once we find a solution do we start to undo queen placement and try other queen placements, but the key is that every placement must be one that could potentially lead to success.
With everything in mind above, how should we proceed? First, it's helpful to just think about how queens move on a chessboard: unlimited moves vertically, horizontally, diagonally (top left to bottom right), and anti-diagonally (top right to bottom left). If we are given n
queens on an n x n
chessboard, then it's clear that, at the end of the board configuration, each row must have a queen and each column must have a queen. The trickiest part is managing the diagonals effectively.
But we can start by working with the insight above about rows and columns. Each time backtrack
is called should result in the placement of a queen on a new row; that is, we'll pass row
as an argument to backtrack
, and when we have finally reached row == n
(i.e., rows 0
through n - 1
have been filled with queens), then we will know we have found a valid solution. Finding a workable position on any given row means processing all column positions on that row until we have found a position that is not under attack.
What about columns? As noted above, for each row being processed, we will iterate through all row positions (i.e., across all columns) until we find a column position that is not under attack. The row we are processing is definitely not under attack due to how we're processing rows, but how do we know whether or not each column is under attack? If another queen is on any column we're considering for a current row, then that column position is invalid because that other queen's line of attack runs through the current position we're considering. We need to consider the next column position. To keep track of which columns are already occupied by queens (i.e., which columns would be considered to be under attack when we're trying to add a new queen), whenever we add a queen to the board, we should add the column position of the queen to a set cols
that we can use when we're trying to add subsequent queens.
Cool! We consider rows sequentially so it's guaranteed each row we're considering is not under attack. We have a set cols
that shows which columns are occupied. It seems like we need a similar "strategy" for diagonals and anti-diagonals that we currently have for columns; that is, whenever we add a queen to the board, we should also note which diagonal and anti-diagonal just became occupied. But how do we effectively apply a singular label or marker to a diagonal or anti-diagonal that spans multiple positions on the board?
This seems really tricky to do at first until we make note of a brilliant observation concerning diagonal and anti-diagonal movement (R
and C
below represent row and column position, respectively):
- diagonal movement (top left to bottom right): If we start at any cell (R, C) and move one position up, then we will be at cell (R - 1, C - 1). If we move one position down, then we will be at (R + 1, C + 1). In general, if we let dr and dc represent the change in row or column value, respectively, then we will find ourselves always moving from cell (R, C) to (R + dr, C + dc) on a given diagonal. Importantly, note that dr == dc since a change in any direction affects each value in the same way (e.g., moving 4 spaces up means dr == dc == -4 since the row and column values both decrease by 4). But how does any of this help in service of creating a unique label for a diagonal? The insight lies in how the row and column values relate to each other. Let (R, C) be the coordinates of a cell on a diagonal, and let's label the cell as having value R - C. Then what happens whenever we move from cell (R, C) to cell (R + dr, C + dc)? Since dr == dc, we have (R + dr) - (C + dc) = R + dr - C - dr = R - C; that is, the label for any cell on a diagonal is the same. Hence, we should label visited diagonals by adding row - col values to a diagonals set.
- anti-diagonal movement (top right to bottom left): We can make a similar argument to the one above about effectively labeling anti-diagonals. Let (R, C) represent any given cell on an anti-diagonal. Then moving up a cell would take us to cell (R - 1, C + 1) while moving down a cell would take us to cell (R + 1, C - 1); that is, the row and column values of a cell change at the same rate in an inversely proportional manner. We essentially have dr == dc again, but there needs to be a sign difference in how we represent a movement from (R, C) to another cell: (R + dr, C - dc). Then (R + dr) + (C - dc) = R + dr + C - dr = R + C. Hence, we should label visited anti-diagonals by adding row + col values to an anti-diagonals set.
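A quick sanity check of this labeling scheme (the cell coordinates below are illustrative, not part of the solution above):

# every cell on a diagonal shares the label row - col;
# every cell on an anti-diagonal shares the label row + col
diagonal_cells = [(0, 0), (1, 1), (2, 2), (3, 3)]
assert len({r - c for r, c in diagonal_cells}) == 1       # all labels equal (0)

anti_diagonal_cells = [(0, 3), (1, 2), (2, 1), (3, 0)]
assert len({r + c for r, c in anti_diagonal_cells}) == 1  # all labels equal (3)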
LC 22. Generate Parentheses (✓) ★★★
Given n
pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
class Solution:
    def generateParenthesis(self, n: int) -> List[str]:
        def backtrack(curr_str, left_count, right_count):
            if len(curr_str) == 2 * n:
                valid_parens.append("".join(curr_str))
                return
            if left_count < n:
                curr_str.append('(')
                backtrack(curr_str, left_count + 1, right_count)
                curr_str.pop()
            if left_count > right_count:
                curr_str.append(')')
                backtrack(curr_str, left_count, right_count + 1)
                curr_str.pop()

        valid_parens = []
        backtrack([], 0, 0)
        return valid_parens
The editorial for this solution on LeetCode is quite good. The key insights (with the first being somewhat minor but the second one being critical):
- We should append the string we're building whenever its length is 2n because the string we're building, when taking a backtracking approach anyway, must have the potential to be valid and a complete solution at every point. Hence, once/if the candidate string reaches a length of 2n, then we know the string satisfies the constraints of the problem and is a complete solution.
- As a starting point, we can add as many left parentheses as we want without fear of producing an invalid string (so long as the number of left parentheses we add doesn't exceed n). We then need to start adding right parentheses. But how can we add right parentheses in general? We should only ever consider adding a right parenthesis when the total number of left parentheses exceeds the number of right parentheses. That way, when we add a right parenthesis, there's a chance the subsequent string could be a well-formed parenthetical string of length 2n, as desired.
The LeetCode editorial linked above shows how sketching out a tree of possibilities is very useful for this problem, where the following illustration is for the beginning of the case where n = 2
:
We can start to see how the logic of the solution above makes sense. For the sake of completeness and concreteness, consider the input of the first example on LeetCode, n = 3
, and the corresponding desired output:
["((()))","(()())","(())()","()(())","()()()"]
We can make our own diagram that shows how each solution is built (the x
indicates that a potential solution path is no longer pursued since further work on that path cannot possibly lead to a correct answer):
_____________________________________________________"
/ \
_________________________________________________(________________________________________________ x
/ \
_______________________((_______________________ _______________________()
/ \ / \
(((___ _________________(()________________ _________________()(________________ x
/ \ / \ / \
x ((()____ (()(____ _________(()) ()((____ _________()()
/ \ / \ / \ / \ / \
x ((())__ x (()()__ (())(__ x x ()(()__ ()()(__ x
/ \ / \ / \ / \ / \
x ((())) x (()()) x (())() x ()(()) x ()()()
As a note of reference, the tree above was generated using the binarytree
package:
from binarytree import build2

my_tree = build2([
    '"',
    "(", 'x',
    "((", "()", None, None,
    "(((", "(()", "()(", "x",
    "x", "((()", "(()(", "(())", "()((", "()()", None, None,
    None, None, "x", "((())", "x", "(()()", "(())(", "x", "x", "()(()", "()()(", "x",
    None, None, "x", "((()))", None, None, "x", "(()())", "x", "(())()", None, None, None, None, "x", "()(())", "x", "()()()"
])
root = my_tree.levelorder[0]
print(root)
LC 967. Numbers With Same Consecutive Differences (✓) ★★
Return all non-negative integers of length n
such that the absolute difference between every two consecutive digits is k
.
Note that every number in the answer must not have leading zeros. For example, 01
has one leading zero and is invalid.
You may return the answer in any order.
class Solution:
    def numsSameConsecDiff(self, n: int, k: int) -> List[int]:
        def digits_to_int(digits):
            ans = digits[0]
            for i in range(1, len(digits)):
                ans = (ans * 10) + digits[i]
            return ans

        def backtrack(digit_arr):
            if len(digit_arr) == n:
                res.append(digits_to_int(digit_arr))
                return
            for num in range(10):
                if abs(digit_arr[-1] - num) == k:
                    digit_arr.append(num)
                    backtrack(digit_arr)
                    digit_arr.pop()

        res = []
        for start_digit in range(1, 10):
            backtrack([start_digit])
        return res
The impossibility of leading zeros can complicate things if we're not careful. We still need to consider the digit 0
as part of potential constraint-satisfying integers. An easy way to handle this is to completely prevent 0
from being a possible leading digit at the outset. Execute the backtrack
function for digit arrays that begin with each number 1
through 9
, inclusive, and append complete solutions to an overall results array, res
, that we ultimately return. It's easiest to manage the digits if we build each integer using a digits array and then return the actual integer once the digits array represents a constraint-satisfying integer.
LC 216. Combination Sum III (✓) ★
Find all valid combinations of k
numbers that sum up to n
such that the following conditions are true:
- Only numbers 1 through 9 are used.
- Each number is used at most once.
Return a list of all possible valid combinations. The list must not contain the same combination twice, and the combinations may be returned in any order.
class Solution:
    def combinationSum3(self, k: int, n: int) -> List[List[int]]:
        def backtrack(nums_arr, curr_sum, next_num):
            if len(nums_arr) == k and curr_sum == n:
                res.append(nums_arr[:])
                return
            for num in range(next_num, 10):
                if curr_sum + num <= n:
                    nums_arr.append(num)
                    backtrack(nums_arr, curr_sum + num, num + 1)
                    nums_arr.pop()

        res = []
        backtrack([], 0, 1)
        return res
The valid combinations can only involve positive integers, which means the sum we're building can only get bigger. Hence, we should only consider as possibilities combinations of integers for which the sum does not exceed n
. Also, the numbers used as part of the combination need to be unique, which means whenever we start with a certain number we should only ever consider subsequent integer values.
Binary search
Array
Remarks
TBD
def binary_search(arr, target):
    left = 0              # starting index of search range (inclusive)
    right = len(arr) - 1  # ending index of search range (inclusive)
    result = -1           # result of -1 indicates target has not been found yet

    while left <= right:                   # continue search while range is valid
        mid = left + (right - left) // 2   # prevent potential overflow
        if arr[mid] < target:              # target is in right half
            left = mid + 1                 # (move `left` to narrow search range to right half)
        elif arr[mid] > target:            # target is in left half
            right = mid - 1                # (move `right` to narrow search range to left half)
        else:                              # target found (i.e., arr[mid] == target)
            result = mid                   # store index where target is found (early return, if desired)
            right = mid - 1                # narrow search range to left half to find FIRST occurrence
            # left = mid + 1               # use instead to find LAST occurrence

    if result != -1:
        return result   # target was found; return its index
    else:
        return left     # target was not found; return insertion point to maintain sorted order

# NOTES:
# Only one of `right = mid - 1` and `left = mid + 1` should be active in the final else clause:
# - `right = mid - 1` active (with `left = mid + 1` commented out) searches for the FIRST occurrence of target
# - `left = mid + 1` active (with `right = mid - 1` commented out) searches for the LAST occurrence of target
Examples
LC 704. Binary Search
Given an array of integers nums
which is sorted in ascending order, and an integer target
, write a function to search target
in nums
. If target
exists, then return its index. Otherwise, return -1
.
class Solution:
    def search(self, nums: List[int], target: int) -> int:
        left = 0
        right = len(nums) - 1
        while left <= right:
            mid = left + (right - left) // 2
            if target < nums[mid]:    # target in left half, move right boundary
                right = mid - 1
            elif target > nums[mid]:  # target in right half, move left boundary
                left = mid + 1
            else:
                return mid            # target at current mid position, return
        return -1
The code above is the template for a basic binary search. We're guaranteed that the numbers in nums
are unique, which means target
, if it exists, will be found and the first index at which it occurs (and only index) will be returned.
LC 74. Search a 2D Matrix
Write an efficient algorithm that searches for a value in an m x n
matrix. This matrix has the following properties:
- Integers in each row are sorted from left to right.
- The first integer of each row is greater than the last integer of the previous row.
class Solution:
    def searchMatrix(self, matrix: List[List[int]], target: int) -> bool:
        m = len(matrix)
        n = len(matrix[0])
        left = 0
        right = m * n - 1
        while left <= right:
            mid = left + (right - left) // 2
            val = matrix[mid // n][mid % n]
            if target < val:
                right = mid - 1
            elif target > val:
                left = mid + 1
            else:
                return True
        return False
This problem is very similar to the following classic binary search problem: LC 704. Binary Search. The main difference is that in this problem we basically need to view the 2D array matrix
as virtually flattened into a 1D array with index values idx
bounded by 0 <= idx <= m * n - 1
. The key part of the solution above is, of course, the following line: val = matrix[mid // n][mid % n]
. This lets us seamlessly find a cell value in the virtually flattened matrix by converting a given index value to the appropriate row and column in the original input matrix. We could make this more explicit by abstracting away the logic in the line above into its own function:
def index_to_cell_val(idx):
    row = idx // n
    col = idx % n
    return matrix[row][col]
The idea is that we can binary search on the virtually flattened 1D array.
LC 35. Search Insert Position (✓)
Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.
class Solution:
    def searchInsert(self, nums: List[int], target: int) -> int:
        left = 0
        right = len(nums) - 1
        while left <= right:
            mid = left + (right - left) // 2
            if target < nums[mid]:
                right = mid - 1
            elif target > nums[mid]:
                left = mid + 1
            else:
                return mid
        return left
Distinctness of the input integers in nums
ensures returning mid
will be the index of target
if it's found; otherwise, we return left
, which will be the index where we should insert target
in order to keep a sorted array.
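For reference, this is exactly the behavior of bisect_left from Python's standard library (the arrays below come from the problem's examples):

from bisect import bisect_left

assert bisect_left([1, 3, 5, 6], 5) == 2  # target found: its index is returned
assert bisect_left([1, 3, 5, 6], 2) == 1  # target absent: the insertion point is returned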
LC 633. Sum of Square Numbers ★
Given a non-negative integer c
, decide whether there're two integers a
and b
such that a^2 + b^2 = c.
class Solution:
    def judgeSquareSum(self, c: int) -> bool:
        def binary_search(left, right, target):
            while left <= right:
                b = left + (right - left) // 2
                if target < b * b:
                    right = b - 1
                elif target > b * b:
                    left = b + 1
                else:
                    return True
            return False

        a = 0
        while a * a <= c:
            b_squared = c - a * a
            if binary_search(0, b_squared, b_squared):
                return True
            a += 1
        return False
Binary search can crop up in all sorts of unexpected places. This is one of them. The idea is that we iteratively search the space [0, c - a^2]
for a value of b
such that b^2 == c - a^2
so that a^2 + b^2 == c
, as desired. The funkier part of the binary search solution is what the usual mid
denotes in the binary search itself, namely the b
-value we're looking for such that b^2 == target
, where target == c - a^2
. Hence, when we make adjustments to the left or right endpoints, we're actually comparing the target
value against b * b
where b
takes the role of the normal mid
value. If it's ever the case that the target
is neither less than b * b
nor greater than b * b
, then we've found an integer b
-value that satisfies the equation (and we've done so using binary search!).
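As an aside, the same check can be performed without a hand-rolled binary search by using math.isqrt; the following is a sketch of that alternative, not the solution above:

from math import isqrt

def judge_square_sum(c: int) -> bool:
    a = 0
    while a * a <= c:
        b = isqrt(c - a * a)    # integer square root of the remainder
        if a * a + b * b == c:  # exact only if the remainder is a perfect square
            return True
        a += 1
    return False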
LC 2300. Successful Pairs of Spells and Potions
You are given two positive integer arrays spells
and potions
, of length n
and m
, respectively, where spells[i]
represents the strength of the i
th spell and potions[j]
represents the strength of the j
th potion.
You are also given an integer success
. A spell and potion pair is considered successful if the product of their strengths is at least success
.
Return an integer array pairs
of length n
where pairs[i]
is the number of potions that will form a successful pair with the i
th spell.
class Solution:
    def successfulPairs(self, spells: List[int], potions: List[int], success: int) -> List[int]:
        def successful_potions(spell_strength):
            # need spell * potion >= success (i.e., potion >= threshold)
            threshold = success / spell_strength
            left = 0
            right = len(potions)
            while left < right:
                mid = left + (right - left) // 2
                if threshold <= potions[mid]:
                    right = mid
                else:
                    left = mid + 1
            return len(potions) - left

        potions.sort()
        return [ successful_potions(spell) for spell in spells ]
For any potion to actually be successful, it must be greater than or equal to success / spell
for any given spell. Since we process potions
for each spell in spells
, this suggests we should pre-process potions
by sorting. Then we can conduct a binary search (where the target is equal to success / spell
for a given spell) to determine where the insertion point would need to be in order for a new potion to be successful. We want everything to the right of that point. Hence, if len(potions)
is the length of the potions array and we determine that the leftmost insertion point should occur at i
, then we want everything from [i, len(potions) - 1]
; that is, (len(potions) - 1) - i + 1 == len(potions) - i
, where the first + 1
reflects inclusivity of i
. An example will make this clear.
Let's say we sort potions
and have potions = [1, 2, 3, 4, 5]
, and success = 7
. We have a spell with a strength of 3
. To form a successful pair, we need a potion with a strength of at least 7 / 3 = 2.3333
. If we do a binary search for this value on potions
, we will find an insertion index of 2
. Every potion on this index and to the right can form a successful pair. There are 3
indices in total (the potions with strength 3
, 4
, 5
). In general, if there are m
potions, the final index is m - 1
. If the insertion index is i
, then the range [i, m - 1]
has a size of (m - 1) - i + 1 = m - i
.
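The hand-rolled binary search above can also be expressed with the standard library's bisect_left, where the ceiling division -(-success // spell) keeps the threshold in integer arithmetic and sidesteps floating-point concerns (a sketch of the same idea, with a hypothetical function name):

from bisect import bisect_left
from typing import List

def successful_pairs(spells: List[int], potions: List[int], success: int) -> List[int]:
    potions.sort()
    m = len(potions)
    # spell * potion >= success  <=>  potion >= ceil(success / spell)
    return [m - bisect_left(potions, -(-success // spell)) for spell in spells]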
LC 2389. Longest Subsequence With Limited Sum (✓) ★
You are given an integer array nums
of length n
, and an integer array queries of length m
.
Return an array answer
of length m
where answer[i]
is the maximum size of a subsequence that you can take from nums
such that the sum of its elements is less than or equal to queries[i]
.
A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.
class Solution:
    def answerQueries(self, nums: List[int], queries: List[int]) -> List[int]:
        def binary_search(query):
            left = 0
            right = len(nums)
            while left < right:
                mid = left + (right - left) // 2
                if query < nums[mid]:
                    right = mid
                else:
                    left = mid + 1
            return left

        nums.sort()
        for i in range(1, len(nums)):
            nums[i] += nums[i - 1]
        return [ binary_search(query) for query in queries ]
This problem invites us to dust off our knowledge of prefix sums — because that's really what we need to effectively answer this problem. We need to sort the input nums
, create a prefix sum (either mutate the input directly, as above, or create a new array), and then conduct a binary search on the prefix sum where each time we try to find what would need to be the rightmost insertion point if we were to add query
to the prefix sum array.
Why does this work? Suppose we had the prefix sum [0, 1, 3, 5, 5, 7, 9], and the query value we were given was 5. Where would we need to insert 5 in the prefix sum above to maintain sorted order so that 5 was as far right as possible? It would need to be at index i == 5 (right after the other two 5 values). This means the original numbers in nums responsible for the [0, 1, 3, 5, 5] part of the prefix sum can all be kept so that their sum is less than or equal to the query value 5. That is why the solution above works.
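Because the prefix sum of the sorted (positive) inputs is itself sorted, the same idea compresses nicely with accumulate and bisect_right (a sketch with a hypothetical function name, not the solution above):

from bisect import bisect_right
from itertools import accumulate
from typing import List

def answer_queries(nums: List[int], queries: List[int]) -> List[int]:
    prefix = list(accumulate(sorted(nums)))  # prefix[i] = sum of the (i + 1) smallest values
    return [bisect_right(prefix, query) for query in queries]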
Solution space
Remarks
As noted in a LeetCode editorial on binary search on solution spaces, we need a few conditions to be met in order to effectively conduct our search:
- Possibility/condition/check/feasible function can execute in roughly O(n) time — we can quickly, in O(n) or better, verify if the task is possible for a given threshold value, threshold; that is, we define a function, possible(threshold), that returns a boolean that indicates if the given task is possible or impossible when given the specific threshold value.
- Max/min characteristic when task is possible given the specific threshold value — if the task is possible for a number threshold and we are looking for
  - a maximum, then it is also possible for all numbers less than threshold.
  - a minimum, then it is also possible for all numbers greater than threshold.
- Max/min characteristic when task is impossible given the specific threshold value — if the task is impossible for a number threshold and we are looking for
  - a maximum, then it is also impossible for all numbers greater than threshold.
  - a minimum, then it is also impossible for all numbers less than threshold.
The above depictions can be somewhat difficult to envision at first so it can be helpful to draw out a very simple outline as if we're on a number line from 0 to infinity, left to right, as demonstrated below.
Looking for a maximum threshold:
Example use case (illegal parking): Maximize time spent parked illegally without getting a ticket. Under various conditions (e.g., parking enforcers, location, etc.), we can imagine this being possible for a certain amount of time before it becomes impossible. We'd like to maximize the POSSIBLE amount of time we do not have to worry about getting a ticket before it becomes IMPOSSIBLE to avoid getting a ticket:
-------------------------
| Possible | Impossible |
-------------------------
0          ^       ...inf
           (threshold binary searched for)
As can be seen above, given a threshold
amount of time, our task of going undetected when parked illegally is
- possible for all numbers less than or equal to
threshold
- impossible for all numbers greater than
threshold
Looking for a minimum threshold:
Example use case (mandatory online trainings): Minimize time spent on a mandatory online training page before clicking to continue without arousing suspicion. Many online training requirements are modules that are "click-through" in nature, where an employee must complete the module but should not "click to continue" until a sufficient amount of time has elapsed to indicate the employee has possibly consumed all of the information on the page. The goal is to minimize the amount of time spent on any given page. We can imagine this being impossible for a certain amount of time before it becomes possible. We'd like to minimize the POSSIBLE amount of time we are required to be on any given training page where it is IMPOSSIBLE to avoid doing so until a certain amount of time has elapsed:
-------------------------
| Impossible | Possible |
-------------------------
0            ^     ...inf
             (threshold binary searched for)
As can be seen above, given a threshold
amount of time, our task of having to remain on a given training page before being allowed to continue making progress through the training is
- impossible for all numbers less than
threshold
- possible for all numbers greater than or equal to
threshold
TAKEAWAY:
- Minimum: When searching to minimize a value on a solution space, our goal is to find a value, threshold, where the condition we're testing for is possible (i.e., possible(threshold) returns true) and threshold is minimized within the region of possibilities. Specifically, if we let l and r represent the smallest and largest possible solutions in the solution space, respectively, then we're essentially searching for the threshold value, say x, between l and r such that possible(x) returns true but any smaller value of x, say x - ε, results in possible(x - ε) returning false. We can use our previous illustration to capture this:

  l            x          r
  -------------------------
  | Impossible | Possible |
  -------------------------

- Maximum: When searching to maximize a value on a solution space, our goal is to find a value, threshold, where the condition we're testing for is possible (i.e., possible(threshold) returns true) and threshold is maximized within the region of possibilities. Specifically, if we let l and r represent the smallest and largest possible solutions in the solution space, respectively, then we're essentially searching for the threshold value, say x, between l and r such that possible(x) returns true but any larger value of x, say x + ε, results in possible(x + ε) returning false. We can use our previous illustration to capture this:

  l          x            r
  -------------------------
  | Possible | Impossible |
  -------------------------
The template below makes everything discussed above feasible:
def binary_search_sol_space(arr):
    def possible(threshold):
        # this function is implemented depending on the problem
        return BOOLEAN

    left = MINIMUM_POSSIBLE_ANSWER   # minimum possible value in solution space (inclusive)
    right = MAXIMUM_POSSIBLE_ANSWER  # maximum possible value in solution space (inclusive)
    result = -1                      # desired result (-1 to indicate no valid value found yet)

    while left <= right:             # continue search while range is valid
        mid = left + (right - left) // 2
        if possible(mid):
            result = mid             # mid satisfies condition; update result
            right = mid - 1          # adjust right to find smaller valid value (minimization)
        else:
            left = mid + 1           # mid doesn't satisfy condition; search right half
            # IMPORTANT: swap `right = mid - 1` and `left = mid + 1`
            # if looking to maximize valid value (i.e., instead of minimize)

    return result                    # return best value found satisfying condition
Above, left, right, and result stand for l, r, and x, respectively, in regards to the notation we used previously to visualize the solution space on which we are binary searching. A few things worth noting about the template above:
- Line 13: This is where result is updated. Note how result is only updated once possible(mid) is true for some mid value; that is, if what we're looking to minimize or maximize is actually not possible, then result will never be updated, and a value of -1 will be returned to indicate no valid value was found.
- Lines 14 and 16: Whether or not these lines should be swapped depends on if the problem at hand is a minimization (no swap) or maximization (swap) problem. Specifically, the template above, in its default state, is set up for minimization problems. Why? Because once a valid mid value is found, we narrow the search space to the left with right = mid - 1, which corresponds to trying to find a smaller valid value (minimization). On the other hand, if we're trying to maximize the valid values we find, then we need to narrow the search space to the right with left = mid + 1, which corresponds to trying to find a larger valid value (maximization).
Thankfully, the template above is quite similar to the template for binary searching on arrays, which means less effort needs to be devoted to memorization, and more time can be spent on understanding.
Examples
LC 875. Koko Eating Bananas (✓) ★
Koko loves to eat bananas. There are n
piles of bananas, the i
th pile has piles[i]
bananas. The guards have gone and will come back in h
hours.
Koko can decide her bananas-per-hour eating speed of k
. Each hour, she chooses some pile of bananas and eats k
bananas from that pile. If the pile has less than k
bananas, she eats all of them instead and will not eat any more bananas during this hour.
Koko likes to eat slowly but still wants to finish eating all the bananas before the guards return.
Return the minimum integer k
such that she can eat all the bananas within h
hours.
class Solution:
    def minEatingSpeed(self, piles: List[int], h: int) -> int:
        def possible(speed):
            hours_spent = 0
            for banana in piles:
                hours_spent += -(banana // -speed)
                if hours_spent > h:
                    return False
            return True

        left = 1
        right = max(piles)
        while left < right:
            mid = left + (right - left) // 2
            if possible(mid):
                right = mid
            else:
                left = mid + 1
        return left
The idea is for Koko to go as slowly as possible while still being able to eat all bananas within h
hours. Our goal is to find the speed at which Koko can eat all bananas but as soon as we decrease the speed it becomes impossible (that will give us the minimized speed for which eating all bananas is possible).
Hence, we can binary search on the solution space where the solution space is comprised of speeds in the range [min_possible_speed, max_possible_speed]
, inclusive. What would make sense as a minimum possible speed? The speed needs to be an integer and it clearly can't be 0
; hence, the minimum possible speed is 1 so we set left = 1
. What about the maximum possible speed? Each pile can be consumed within a single hour if the speed is the size of the pile with the greatest number of bananas; hence, the maximum possible speed we should account for is max(piles)
so we set right = max(piles)
.
All that is left to do now is to greedily search for the leftmost value in the solution space that satisfies the possible
constraint.
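One idiom in the solution worth calling out: -(banana // -speed) is ceiling division expressed via floor division on a negated operand, which avoids floating-point arithmetic entirely. A quick check:

import math

banana, speed = 7, 3
assert -(banana // -speed) == math.ceil(banana / speed) == 3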
LC 1631. Path With Minimum Effort (✓) ★★
You are a hiker preparing for an upcoming hike. You are given heights
, a 2D array of size rows x columns
, where heights[row][col]
represents the height of cell (row, col)
. You are situated in the top-left cell, (0, 0)
, and you hope to travel to the bottom-right cell, (rows-1, columns-1)
(i.e., 0-indexed). You can move up, down, left, or right, and you wish to find a route that requires the minimum effort.
A route's effort is the maximum absolute difference in heights between two consecutive cells of the route.
Return the minimum effort required to travel from the top-left cell to the bottom-right cell.
class Solution:
    def minimumEffortPath(self, heights: List[List[int]]) -> int:
        def valid(row, col):
            return 0 <= row < m and 0 <= col < n

        def possible(route_max_abs_diff):
            seen = {(0,0)}
            def dfs(row, col):
                if row == m - 1 and col == n - 1:
                    return True
                for dr, dc in dirs:
                    next_row, next_col = row + dr, col + dc
                    if valid(next_row, next_col) and (next_row, next_col) not in seen:
                        cell_val = heights[row][col]
                        adjacent_cell_val = heights[next_row][next_col]
                        if abs(adjacent_cell_val - cell_val) <= route_max_abs_diff:
                            seen.add((next_row, next_col))
                            if dfs(next_row, next_col):
                                return True
                return False
            return dfs(0, 0)

        m = len(heights)
        n = len(heights[0])
        dirs = [(1,0),(-1,0),(0,1),(0,-1)]
        left = 0
        right = max([element for row in heights for element in row]) - 1
        while left < right:
            mid = left + (right - left) // 2
            if possible(mid):
                right = mid
            else:
                left = mid + 1
        return left
This is a great problem for a variety of reasons. It can be somewhat tricky at first though. The idea is that if we can find a path from the top left to the bottom right using a certain amount of effort
, then we can certainly find a path that works for any greater amount of effort (i.e., effort + 1
, effort + 2
, etc.). But can we find a valid path using less effort? That's the real question.
If we let our solution space be the amount of effort required for a valid path, then we can binary search from the minimum possible amount of effort to the maximum possible amount of effort, inclusive. The goal is to find the effort amount for a valid path where any effort less than that would not result in a valid path.
What are the min/max effort bounds? The minimum possible effort is 0
because all of the numbers in the path could be the same. What about the maximum possible effort? That would be the maximum value in the matrix minus the minimum possible cell value, which is 1 per the constraints:
max([element for row in heights for element in row]) - 1
We could also observe the constraint 1 <= heights[i][j] <= 10^6
, which means either of the following options would be valid for the problem at hand:
# option 1
left = 0
right = max([element for row in heights for element in row]) - 1
# option 2
left = 0
right = 10 ** 6 - 1
The first option costs an extra full scan of the matrix, while the second option could result in a maximum boundary that is much larger than what we need. Since binary search is logarithmic in the size of the search space, neither option is an issue in practice.
Finally, it should be noted that an iterative, stack-based DFS is also quite effective here and avoids the overhead (and recursion depth limits) imposed by the call stack in the recursive approach:
class Solution:
def minimumEffortPath(self, heights: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n
def possible(route_max_abs_diff):
seen = {(0,0)}
def dfs(start_row, start_col):
stack = [(start_row, start_col)]
while stack:
row, col = stack.pop()
if row == m - 1 and col == n - 1:
return True
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
if valid(next_row, next_col) and (next_row, next_col) not in seen:
cell_val = heights[row][col]
next_cell_val = heights[next_row][next_col]
if abs(next_cell_val - cell_val) <= route_max_abs_diff:
seen.add((next_row, next_col))
stack.append((next_row, next_col))
return False
return dfs(0, 0)
m = len(heights)
n = len(heights[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
left = 0
right = max([element for row in heights for element in row]) - 1
while left < right:
mid = left + (right - left) // 2
if possible(mid):
right = mid
else:
left = mid + 1
return left
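As a quick sanity check, LeetCode's first example reports a minimum effort of 2 (this assumes either Solution class above is in scope):

```python
# the route [1,3,5,3,5] has a maximum absolute difference of 2, and no route does better
print(Solution().minimumEffortPath([[1, 2, 2], [3, 8, 2], [5, 3, 5]]))  # 2
```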
LC 1870. Minimum Speed to Arrive on Time (✓) ★
You are given a floating-point number hour
, representing the amount of time you have to reach the office. To commute to the office, you must take n
trains in sequential order. You are also given an integer array dist
of length n
, where dist[i]
describes the distance (in kilometers) of the i
th train ride.
Each train can only depart at an integer hour, so you may need to wait in between each train ride.
- For example, if the
1
st train ride takes1.5
hours, you must wait for an additional0.5
hours before you can depart on the2
nd train ride at the 2 hour mark.
Return the minimum positive integer speed (in kilometers per hour) that all the trains must travel at for you to reach the office on time, or -1
if it is impossible to be on time.
Tests are generated such that the answer will not exceed 10^7
and hour
will have at most two digits after the decimal point.
class Solution:
def minSpeedOnTime(self, dist: List[int], hour: float) -> int:
if len(dist) > -(hour // -1):
return -1
def possible(speed):
hours_spent = 0
for i in range(len(dist) - 1):
distance = dist[i]
hours_spent += -(distance // -speed)
if hours_spent > hour:
return False
hours_spent += dist[-1] / speed
return hours_spent <= hour
left = 1
right = 10 ** 7
while left < right:
mid = left + (right - left) // 2
if possible(mid):
right = mid
else:
left = mid + 1
return left
This one is a bit wonky to reason about at first, in part because of how the time elapsed is evaluated. It's similar to LC 875. Koko Eating Bananas in that we take the ceiling of the time spent on each of the first n - 1 trains, but here the time spent on the last train is not rounded up (we rounded up the time spent on all piles for the Koko eating bananas problem).
When will it be impossible to reach the office on time? It's not when len(dist) > hour, as LeetCode's second example with input dist = [1,3,2], hour = 2.7 shows. However, it will be impossible if len(dist) > ceil(hour). The third example, with input dist = [1,3,2], hour = 1.9, illustrates this: the earliest the third train can depart is at the 2 hour mark.
The idea is to conduct a binary search on the range of speeds [min_speed_possible, max_speed_possible]
for which we'll be able to make the trip on time (i.e., where the total number of hours spent is less than or equal to hour
), where we want to minimize the speed required to make the trip on time. If we can make the trip on time for a given speed
, then we can definitely make the trip on time if we increase the speed to speed + 1
. We want to determine when making it on time is possible for speed
but impossible for any valid speed value less than this. Binary search it is!
What's the minimum possible speed? We're told the speed reported must be a positive integer so we set left = 1
. What about the maximum possible speed? We're told that the answer will not exceed 10 ** 7
; hence, we set right = 10 ** 7
.
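The floor-division ceiling trick used in possible also extends to floats, which is how len(dist) > -(hour // -1) checks len(dist) > ceil(hour) without importing math. A quick check of the trick and of the two examples cited above (the Solution calls assume the class above is in scope):

```python
print(-(2.7 // -1))  # 3.0, i.e., ceil(2.7)
print(-(1.9 // -1))  # 2.0, i.e., ceil(1.9)

print(Solution().minSpeedOnTime([1, 3, 2], 2.7))  # 3
print(Solution().minSpeedOnTime([1, 3, 2], 1.9))  # -1 (three trains cannot depart within ceil(1.9) = 2 hours)
```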
LC 1283. Find the Smallest Divisor Given a Threshold (✓)
Given an array of integers nums
and an integer threshold
, we will choose a positive integer divisor
, divide all the array by it, and sum the division's result. Find the smallest divisor
such that the result mentioned above is less than or equal to threshold
.
Each result of the division is rounded to the nearest integer greater than or equal to that element. (For example: 7/3 = 3
and 10/2 = 5
).
It is guaranteed that there will be an answer.
class Solution:
def smallestDivisor(self, nums: List[int], threshold: int) -> int:
def possible(divisor):
running_sum = 0
for num in nums:
running_sum += -(num // -divisor)
if running_sum > threshold:
return False
return True
left = 1
right = max(nums)
while left < right:
mid = left + (right - left) // 2
if possible(mid):
right = mid
else:
left = mid + 1
return left
This problem is quite similar to LC 875. Koko Eating Bananas. Our solution space is a range of possible divisors, and our goal is to minimize the divisor so that the running sum obtained by dividing each number in nums
by divisor
(and performing the subsequent rounding up to the nearest integer) never exceeds threshold
.
If we can achieve this task for divisor
, then we can definitely achieve the same task by increasing the value of divisor
(e.g., divisor + 1
). Our goal, then, is to find a divisor
such that the task is possible but as soon as we decrease the value of divisor the task becomes impossible.
What would the minimum divisor be for our solution space? We're told it must be a positive integer; hence, we set left = 1
. What about the maximum divisor? Since each division result gets rounded up to the nearest integer, the smallest the running sum could be would occur if we chose the divisor to be the maximum value in nums
. Then every division would result in exactly 1, so the running sum would equal nums.length, and since we're told nums.length <= threshold <= 10^6
, we let right = max(nums)
.
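A quick check against LeetCode's first example (assuming the Solution class above is in scope):

```python
# divisor 5 gives 1 + 1 + 1 + 2 = 5 <= 6, while divisor 4 gives 1 + 1 + 2 + 3 = 7 > 6
print(Solution().smallestDivisor([1, 2, 5, 9], 6))  # 5
```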
LC 410. Split Array Largest Sum (✓) ★★★
Given an array nums
which consists of non-negative integers and an integer m
, you can split the array into m
non-empty continuous subarrays.
Write an algorithm to minimize the largest sum among these m
subarrays.
class Solution:
def splitArray(self, nums: List[int], k: int) -> int:
def possible(max_sum):
num_subarrays = 0
subarray_sum = 0
idx = 0
while idx < len(nums):
if nums[idx] > max_sum:
return False
subarray_sum += nums[idx]
if subarray_sum > max_sum:
subarray_sum = nums[idx]
num_subarrays += 1
if num_subarrays > k:
return False
idx += 1
return (num_subarrays + 1) <= k
left = 0
right = sum(nums)
while left < right:
mid = left + (right - left) // 2
if possible(mid):
right = mid
else:
left = mid + 1
return left
This problem is excellent and somewhat similar to LC 1231. Divide Chocolate in nature. The tip-off that the problem may involve binary searching on a solution space is the fact that if a maximum subarray sum of subarray_sum works as a solution to the problem, then any larger value of subarray_sum will certainly work as well. Our goal is to find a subarray sum that, when decreased at all, makes it impossible to fulfill the requirements of the problem. All of this implies we should be able to conduct a binary search on the solution space of candidate values for the minimized largest subarray sum.
What would the smallest possible subarray sum value be? Since values of 0
are allowed in nums
, we should set left = 0
. What about the largest possible subarray sum value? No subarray can have a larger sum than the entire array; hence, we set right = sum(nums)
.
The harder part for this problem is designing the possible
function effectively. The intuition is that we construct the k
subarray sums in a greedy fashion, where we keep adding to one of the k
subarrays until the given max_sum
has been exceeded, at which point we move on to constructing the next subarray sum. If the number of subarrays we must use to accomplish this ever exceeds k
, then we issue an early return of false
. If, however, we process all values and the total number of subarrays used is less than or equal to k
, then we return true
because k
will always be less than or equal to nums.length
per the constraint 1 <= k <= min(50, nums.length)
; that is, if somehow we've processed all values in nums
and have never exceeded max_sum
for a subarray sum and the number of subarrays used is much smaller than k
, then we can simply distribute the values in the subarrays to fill up the remaining empty subarrays until the total number of subarrays equals k
(the sum of the subarrays from which values are borrowed can only decrease due to the constraint 0 <= nums[i] <= 10^6
).
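A quick check against LeetCode's first example (assuming the Solution class above is in scope):

```python
# the best split is [7,2,5] and [10,8], whose largest subarray sum is 18
print(Solution().splitArray([7, 2, 5, 10, 8], 2))  # 18
```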
LC 1482. Minimum Number of Days to Make m Bouquets★★
Given an integer array bloomDay
, an integer m
and an integer k
.
We need to make m
bouquets. To make a bouquet, you need to use k
adjacent flowers from the garden.
The garden consists of n
flowers, the i
th flower will bloom on day bloomDay[i] and can then be used in exactly one bouquet.
Return the minimum number of days you need to wait to be able to make m
bouquets from the garden. If it is impossible to make m
bouquets return -1
.
class Solution:
def minDays(self, bloomDay: List[int], m: int, k: int) -> int:
def possible(days):
bouquets_formed = flower_count = 0
for flower in bloomDay:
if flower <= days:
flower_count += 1
else:
flower_count = 0
if flower_count == k:
bouquets_formed += 1
flower_count = 0
if bouquets_formed == m:
return True
return False
# not enough flowers for required number of bouquets
if len(bloomDay) < (m * k):
return -1
left = min(bloomDay)
right = max(bloomDay)
while left < right:
mid = left + (right - left) // 2
if possible(mid):
right = mid
else:
left = mid + 1
return left
As usual, the primary difficulty in this problem is identifying it as having a nice binary search solution. The idea is to binary search on the solution space where the solution space is identified as being binary searchable as follows: if I can form m
bouquets in d
days, then I can definitely form m
bouquets in > d
days. We want to find the minimum number for d
such that trying to form m
bouquets in any fewer days is impossible. We can binary search for that number of days, as shown above.
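A quick check against LeetCode's first two examples (assuming the Solution class above is in scope):

```python
# by day 3, the flowers that bloomed on days 1, 3, and 2 give three bouquets of size 1
print(Solution().minDays([1, 10, 3, 10, 2], 3, 1))  # 3

# 5 flowers can never yield 3 bouquets of 2 adjacent flowers each
print(Solution().minDays([1, 10, 3, 10, 2], 3, 2))  # -1
```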
LC 1231. Divide Chocolate (✓) ★★★
You have one chocolate bar that consists of some chunks. Each chunk has its own sweetness given by the array sweetness
.
You want to share the chocolate with your K
friends so you start cutting the chocolate bar into K+1
pieces using K
cuts, each piece consists of some consecutive chunks.
Being generous, you will eat the piece with the minimum total sweetness and give the other pieces to your friends.
Find the maximum total sweetness of the piece you can get by cutting the chocolate bar optimally.
class Solution:
def maximizeSweetness(self, sweetness: List[int], k: int) -> int:
def possible(min_sweetness):
pieces = 0
piece_sweetness = 0
for chunk_sweetness in sweetness:
piece_sweetness += chunk_sweetness
if piece_sweetness >= min_sweetness:
pieces += 1
piece_sweetness = 0
if pieces == k + 1:
return True
return False
left = 1
right = sum(sweetness) + 1
while left < right:
mid = left + (right - left) // 2
if not possible(mid):
right = mid
else:
left = mid + 1
return left - 1
This is such a fantastic problem, but it is quite difficult. As usual (for binary search problems on solution spaces anyway), constructing the possible
function is where much of the difficulty lies. Our goal is to ensure we can actually come up with k + 1
pieces of chocolate to distribute where each piece meets or exceeds the required min_sweetness
threshold. If we can do that, then we're in business. But, of course, as the problem indicates, we want to maximize the minimum sweetness of our own piece of chocolate (since all other pieces distributed must have the same or more sweetness compared to our own).
Hence, we need to binary search on a range of possible minimum sweetness values. What would the smallest possible sweetness be? We're told from the constraint that every chunk of the chocolate has a sweetness of at least 1
; hence, we set left = 1
. What about the largest possible sweetness? Note that k == 0
is possible, which means there's a possibility where we need to share the chocolate bar with no one — in such a case, we would want to consume all of the sweetness, sum(sweetness)
. But since we're binary searching a solution space for a maximum value, we need to set right = sum(sweetness) + 1
as opposed to right = sum(sweetness)
. Why? Because we might miss the maximum value in an off-by-one error otherwise; for example, consider the input sweetness = [5,5], k = 0
. The while loop terminates once left == right
and right == sum(sweetness)
, but we return left - 1
, which is equal to 10 - 1 == 9
instead of the obviously correct answer of 10
.
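The off-by-one discussion above is easy to verify directly (assuming the Solution class above is in scope):

```python
# with right = sum(sweetness) + 1, the search can push left to 11 and return 10;
# with right = sum(sweetness), left would be capped at 10 and we would return 9
print(Solution().maximizeSweetness([5, 5], 0))  # 10
```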
LC 1552. Magnetic Force Between Two Balls★★
In universe Earth C-137, Rick discovered a special form of magnetic force between two balls if they are put in his newly invented basket. Rick has n empty baskets; the i th basket is at position[i]. Morty has m balls and needs to distribute the balls into the baskets such that the minimum magnetic force between any two balls is maximum.
Rick stated that magnetic force between two different balls at positions x
and y
is |x - y|
.
Given the integer array position
and the integer m
. Return the required force.
class Solution:
def maxDistance(self, position: List[int], m: int) -> int:
def possible(min_mag_force):
balls_placed = 1
prev_ball_pos = position[0]
for idx in range(1, len(position)):
curr_ball_pos = position[idx]
if curr_ball_pos - prev_ball_pos >= min_mag_force:
balls_placed += 1
prev_ball_pos = curr_ball_pos
if balls_placed == m:
return True
return False
position.sort()
left = 1
right = (position[-1] - position[0]) + 1
while left < right:
mid = left + (right - left) // 2
if not possible(mid):
right = mid
else:
left = mid + 1
return left - 1
The problem description is one of the hardest things about this problem. But after fighting to understand it, our thinking will gradually start to resemble what's included in the hints and point us to binary search on the solution space as a good strategy:
Hint 1: If you can place balls such that the answer is x,
then you can do it for y where y < x.
Hint 2: Similarly, if you cannot place balls such that
the answer is x then you can do it for y where y > x.
Hint 3: Binary search on the answer and greedily see if it is possible.
This problem is quite similar to LC 1231. Divide Chocolate in many ways, but instead of trying to maximize the sweetness of our least sweet piece of chocolate amongst k
friends, we are trying to maximize the magnetic force between the least magnetically attracted pair of balls. The idea is to binary search on possible answer values for the minimum magnetic force required and to maximize that value as much as possible.
Hence, our first task is to build a possible
function to determine whether or not the task at hand is possible for some given magnetic force, and we greedily try to determine the possibility; that is, we place a ball whenever the magnetic force provided has been met or exceeded (this ensures the magnetic force provided is, indeed, minimum). All balls placed must have at least min_mag_force
magnetic force between them. Our goal is to maximize that value.
The smallest possible magnetic force would be when we have n
positions and m == n
balls, where the positions are all spaced a single unit apart. That would give us a magnetic force of 1
, meaning we should set left = 1
. The largest possible magnetic force would be the last position value minus the first position value (assuming the position
array to be sorted at that point). Since the right endpoint needs to be included and we're always returning left - 1
, we should set right = (position[-1] - position[0]) + 1
.
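A quick check against LeetCode's first example (assuming the Solution class above is in scope):

```python
# placing balls at positions 1, 4, and 7 yields pairwise forces of 3, 3, and 6
print(Solution().maxDistance([1, 2, 3, 4, 7], 3))  # 3
```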
Dynamic programming
Remarks
TBD
def fn(arr):
# 1. define a function that will compute/contain
# the answer to the problem for any given state
def dp(STATE):
# 3. use base cases to make the recurrence relation useful
if BASE_CASE:
return 0
if STATE in memo:
return memo[STATE]
# 2. define a recurrence relation to transition between states
ans = RECURRENCE_RELATION(STATE)
memo[STATE] = ans
return ans
memo = {}
return dp(STATE_FOR_WHOLE_INPUT)
def fn(arr):
# 1. initialize a table (array, list, etc.)
# to store solutions of subproblems.
dp_table = INITIALIZE_TABLE()
# 2. fill the base cases into the table.
dp_table = FILL_BASE_CASES(dp_table)
# 3. iterate over the table in a specific order
# to fill in the solutions of larger subproblems.
for STATE in ORDER_OF_STATES:
dp_table[STATE] = CALCULATE_STATE_FROM_PREVIOUS_STATES(dp_table, STATE)
# 4. the answer to the whole problem is now in the table,
# typically at the last entry or a specific position.
return dp_table[FINAL_STATE_OR_POSITION]
# example usage
arr = [INPUT_DATA]
result = fn(arr)
Examples
LC 746. Min Cost Climbing Stairs
You are given an integer array cost
where cost[i]
is the cost of ith
step on a staircase. Once you pay the cost, you can either climb one or two steps.
You can either start from the step with index 0
, or the step with index 1
.
Return the minimum cost to reach the top of the floor.
class Solution:
def minCostClimbingStairs(self, cost: List[int]) -> int:
def dp(step):
if step in memo:
return memo[step]
step_cost = min(dp(step - 1) + cost[step - 1], dp(step - 2) + cost[step - 2])
memo[step] = step_cost
return step_cost
memo = dict()
memo[0] = memo[1] = 0
return dp(len(cost))
The solution above may be the simplest, but there are multiple ways of going about this problem. What follows is mostly meant for potential use in the future.
This problem highlights how we can interpret the same problem in two fundamentally different ways and still end up with a correct answer. Specifically, we may interpret the line
You can either start from the step with index
0
, or the step with index1
.
from the problem stem in two notable ways:
- We start after step
0
, and we have to reach stepn
(i.e., we consider the top of the floor to be the step beyond the last step,n - 1
), where the cost of each step is taken into account once we have departed from that step. In this sense, step0
and step1
both cost0
because we're told we can start from the step with index0
or the step with index1
— it costs nothing to get to that step, and the cost of that step is only considered once we've left it. - We start before step
0
, and we have to reach the last step, stepn - 1
, where the cost of each step is taken into account once landed on. In this sense, choosing to go to step0
at the beginning means it costscost[0]
to do so; similarly, choosing to go to step1
instead means it costscost[1]
to do so. The goal, then, is to minimize the cost it takes to get to either stepn - 2
orn - 1
because once we get to either of those steps and that step's cost is taken into account, we can reach the top of the floor, as desired.
Which interpretation we choose to go with ultimately does not matter in terms of the correctness of our result, but the differences will manifest themselves in our implementation(s).
Generally speaking, a DP solution may be implemented top-down with memoization or bottom-up with tabulation. Furthermore, the recurrence used to characterize how subproblems are broken down into smaller and smaller subproblems may be either a backward recurrence (i.e., the usual approach where index values decrease) or a forward recurrence (i.e., more of a chronological approach, where index values increase).
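To make the distinction concrete, here is a bottom-up (tabulated) sketch of the same staircase problem, following the table-filling template above; min_cost_climbing_stairs is an illustrative standalone function rather than LeetCode's required class method:

```python
def min_cost_climbing_stairs(cost):
    # dp[i] is the minimum cost to stand on step i, where step n is the top of the floor
    n = len(cost)
    dp = [0] * (n + 1)  # base cases: starting on step 0 or step 1 costs nothing
    for i in range(2, n + 1):
        dp[i] = min(dp[i - 1] + cost[i - 1], dp[i - 2] + cost[i - 2])
    return dp[n]

print(min_cost_climbing_stairs([10, 15, 20]))                           # 15
print(min_cost_climbing_stairs([1, 100, 1, 1, 1, 100, 1, 1, 100, 1]))   # 6
```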
Graphs
BFS
Remarks
Assume the nodes are numbered from 0
to n - 1
and the graph is given as an adjacency list. Depending on the problem, you may need to convert the input into an equivalent adjacency list before using the templates.
Basic motivation, concepts, and considerations for BFS
The following observations are largely derived from this video. The underlying graph referred to for the rest of this note is assumed to be represented as an adjacency list of index arrays.
The basic idea behind BFS is that you have a starting vertex and you explore vertices in order of how many links they are away from it: first explore the starting vertex, then all vertices one link away (i.e., incident edges involving the start vertex and its immediate neighbors), then all vertices two links away, and so forth, until you explore all vertices reachable from the start vertex. That's it. We can go ahead and come up with some basic pseudocode to model how we might do this.
ResetGraph()
for v in V
v.discovered = False
BFS(startVertex)
ResetGraph()
startVertex.discovered = True
Q = new Queue()
Q.enqueue(startVertex)
while(not Q.isEmpty())
u = Q.dequeue()
for each v such that u -> v
if(not v.discovered)
v.discovered = True
Q.enqueue(v)
Running the algorithm above implies a specific "breadth first search tree". The root is the start node, and if one node discovers another node (i.e., u -> v
), then it's that node's parent in the tree (i.e., u
is considered to be v
's parent in the tree). We can imagine that, for the same graph, we could get a different breadth first search tree based on whatever node we picked as the start node (it can get even wilder if we pick several start nodes, something that happens when executing a multi-source BFS).
A more detailed implementation of BFS would be as follows, where we not only create the breadth first search tree but also store the number of links needed to reach each vertex.
ResetGraph()
for v in V
v.discovered = False
v.dist = +inf
v.pi = nil
BFS(startVertex)
ResetGraph()
startVertex.discovered = True
startVertex.dist = 0
Q = new Queue()
Q.enqueue(startVertex)
while(not Q.isEmpty())
u = Q.dequeue()
for each v such that u -> v
if(not v.discovered)
v.discovered = True
v.dist = u.dist + 1
v.pi = u
Q.enqueue(v)
Of course, for the scheme above to work, assume there's space to store extra information with each vertex, v
, namely the distance needed to reach the vertex as well as the predecessor parent for that vertex, v.dist
and v.pi
, respectively. This information can be associated with the vertex itself so that we end up representing a vertex basically as a list, where each slot represents, say, the vertex index, the distance to that vertex, and then the predecessor for that vertex: [v_index, v_dist, v_pred]
. These items then get enqueued, dequeued, and updated accordingly. But it's more common to leave the vertex indices to themselves and to instead initialize dist
and pred
arrays of length n
, where n
represents the total number of vertices in the graph (remember we're assuming the underlying graph is an adjacency list of index arrays, where nodes are labeled according to their index).
Certainly an observation worth making is that vertices with a smaller distance value must be dequeued and processed before any vertex with a greater distance value is dequeued and processed. Predecessor values are recorded in part to aid in the process of potentially reconstructing the shortest path from the start vertex to one of its reachable vertices.
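Here is a minimal Python sketch of the pseudocode above, assuming the adjacency-list-of-index-arrays representation (dist doubles as the discovered marker, with -1 standing in for +inf):

```python
from collections import deque

def bfs(graph, start):
    n = len(graph)
    dist = [-1] * n   # -1 marks undiscovered vertices
    pred = [-1] * n   # predecessor of each vertex in the BFS tree
    dist[start] = 0
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if dist[nbr] == -1:  # undiscovered
                dist[nbr] = dist[node] + 1
                pred[nbr] = node
                queue.append(nbr)
    return dist, pred
```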
Reconstructing the shortest path
This can be done recursively as follows:
FindPath(s, t)
if (s == t)
return new List().add(t)
if (t.pi == nil) # no path exists
return nil
return FindPath(s, t.pi).add(t)
Or iteratively (perhaps more intuitively), where the implementation below assumes we've maintained a pred
array where pred[x]
houses the predecessor for node x
if it exists or is -1
otherwise:
ReconstructPath(s, t, pred)
path = new Stack()
path.push(t)
while path.peek() != s:
if pred[path.peek()] == -1:
return None # no path exists from s to t
path.push(pred[path.peek()])
reverse path # obtain original path direction from s to t
return path
In CLRS (i.e., [21]), the PrintPath
procedure is provided to simply print the path from the vertex s
to the vertex v
(i.e., as opposed to really reconstructing the path):
PrintPath(G, s, v)
if v == s:
print s
else if v.pi == nil:
print "no path from" s "to" v "exists
else PrintPath(G, s, v.pi)
print v
We can use Python to actually implement the PrintPath
procedure but in a way where we actually obtain the path:
def shortest_path(s, t, path, pred):
    # note: the path accumulates from t back to s, so the returned list is in
    # reverse order; reverse it to obtain the path from s to t
    if s == t:
        path.append(s)
        return path
    elif pred[t] == -1:
        return []  # shortest path from s to t does not exist
    else:
        path.append(t)
        return shortest_path(s, pred[t], path, pred)
In most cases, however, we would probably be more inclined to reconstruct the path in an iterative fashion, where the code is more readable and we avoid some of the overhead associated with recursion:
def shortest_path(s, t, pred):
path = [t]
while path[-1] != s:
parent = pred[path[-1]]
if parent == -1:
return [] # shortest path from s to t does not exist
path.append(parent)
path.reverse()
return path
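A quick usage check with a small hypothetical pred array (node 0 is the BFS source here):

```python
pred = [-1, 0, 1, 1]              # hypothetical BFS tree: 0 -> 1, 1 -> 2, 1 -> 3
print(shortest_path(0, 3, pred))  # [0, 1, 3]
print(shortest_path(2, 3, pred))  # [] (no path from 2 to 3 in this tree)
```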
If we look at CLRS, we will note that there's only a minor change from the "full implementation" algorithm above and how it appears in CLRS: instead of marking vertices as discovered or undiscovered, they use three colors:
- White: Undiscovered
- Gray: Discovered but not yet finished (i.e., when it's in the queue waiting to be dequeued)
- Black: Finished (i.e., dequeued and all unvisited child nodes enqueued)
That's it:
ResetGraph()
for v in V
v.color = White
v.dist = +inf
v.pi = nil
BFS(startVertex)
ResetGraph()
startVertex.color = Gray
startVertex.dist = 0
Q = new Queue()
Q.enqueue(startVertex)
while(not Q.isEmpty())
u = Q.dequeue()
for each v such that u -> v
if(v.color == White)
v.color = Gray
v.dist = u.dist + 1
v.pi = u
Q.enqueue(v)
u.color = Black
from collections import deque
def fn(graph):
queue = deque([START_NODE])
seen = {START_NODE}
ans = 0
while queue:
node = queue.popleft()
# do some logic
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
queue.append(neighbor)
return ans
Examples
LC 1091. Shortest Path in Binary Matrix (✓)
Given an n x n
binary matrix grid
, return the length of the shortest clear path in the matrix. If there is no clear path, return -1
.
A clear path in a binary matrix is a path from the top-left cell (i.e., (0, 0)
) to the bottom-right cell (i.e., (n - 1, n - 1)
) such that:
- All the visited cells of the path are
0
. - All the adjacent cells of the path are 8-directionally connected (i.e., they are different and they share an edge or a corner).
The length of a clear path is the number of visited cells of this path.
class Solution:
def shortestPathBinaryMatrix(self, grid: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < n and 0 <= col < n and grid[row][col] == 0
dirs = [(1,0),(1,1),(1,-1),(-1,0),(-1,1),(-1,-1),(0,-1),(0,1)]
n = len(grid)
seen = {(0,0)}
if grid[0][0] != 0 or grid[n-1][n-1] != 0:
return -1
queue = deque([(0,0,1)])
while queue:
row, col, path_length = queue.popleft()
if row == n - 1 and col == n - 1:
return path_length
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
next_node = (next_row, next_col)
if valid(*next_node) and next_node not in seen:
queue.append((*next_node, path_length + 1))
seen.add(next_node)
return -1
It's fairly conventional in BFS solutions for graphs to encode with each node additional information like the current level for that node or some other kind of stateful data. Here, though, we do not need to record anything other than each node's position in the seen set because whenever we encounter a node it will be in the fewest steps possible (i.e., the trademark of BFS solutions: finding shortest paths).
LC 863. All Nodes Distance K in Binary Tree (✓)
We are given a binary tree (with root node root
), a target
node, and an integer value K
.
Return a list of the values of all nodes that have a distance K
from the target
node. The answer can be returned in any order.
class Solution:
def distanceK(self, root: TreeNode, target: TreeNode, k: int) -> List[int]:
def build_parent_lookup(node, parent = None):
if not node:
return
parent_lookup[node] = parent
build_parent_lookup(node.left, node)
build_parent_lookup(node.right, node)
parent_lookup = dict()
build_parent_lookup(root)
seen = {target}
queue = deque([target])
for _ in range(k):
num_nodes_this_level = len(queue)
for _ in range(num_nodes_this_level):
node = queue.popleft()
for neighbor in [node.left, node.right, parent_lookup[node]]:
if neighbor and neighbor not in seen:
seen.add(neighbor)
queue.append(neighbor)
return [ node.val for node in queue ]
The key in the solution above is to recognize that a BFS traversal will give us exactly what we want if we have some way to reference each node's parent node. The build_parent_lookup
function, which uses DFS to build a hashmap lookup for each node's parent node, gives us this. Much of the rest of the problem then becomes a standard BFS traversal.
LC 542. 01 Matrix (✓)
Given a matrix consisting of 0
and 1
, find the distance of the nearest 0
for each cell.
The distance between two adjacent cells is 1
.
class Solution:
def updateMatrix(self, mat: List[List[int]]) -> List[List[int]]:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and mat[row][col] == 1
m = len(mat)
n = len(mat[0])
res = [[0] * n for _ in range(m)]
dirs = [(-1,0),(1,0),(0,1),(0,-1)]
seen = set()
queue = deque()
for i in range(m):
for j in range(n):
if mat[i][j] == 0:
seen.add((i, j))
queue.append((i, j, 0))
while queue:
row, col, dist = queue.popleft()
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
next_node = (next_row, next_col)
if valid(*next_node) and next_node not in seen:
res[next_row][next_col] = dist + 1
seen.add(next_node)
queue.append((*next_node, dist + 1))
return res
The solution above takes advantage of a so-called multi-source BFS (i.e., a BFS traversal with multiple starting sources). The solution also takes advantage of the fact that cells with a value of 0
do not need to be updated; hence, our BFS can start from all nodes with a value of 0
and explore outwards, updating non-zero nodes (i.e., just nodes with value 1
for this problem) with the distance so far from a node with value 0
. Our intuition tells us to start from the nodes with value 1
, but it's much easier to start from the nodes with value 0
.
Additionally, in the valid
function above, the condition and mat[row][col] == 1
is not necessary since all nodes with value 0
are added to the seen
set before exploring outwards, which means all subsequent nodes we'll explore that are both valid and not in seen
will have a non-zero value (i.e., 1
for this problem). This conditional of the valid
function is only kept for the sake of clarity, but it's worth noting that it's not necessary here.
LC 1293. Shortest Path in a Grid with Obstacles Elimination (✓) ★★
Given a m * n
grid, where each cell is either 0
(empty) or 1
(obstacle). In one step, you can move up, down, left or right from and to an empty cell.
Return the minimum number of steps to walk from the upper left corner (0, 0)
to the lower right corner (m-1, n-1)
given that you can eliminate at most k
obstacles. If it is not possible to find such walk return -1
.
class Solution:
def shortestPath(self, grid: List[List[int]], k: int) -> int:
# ensures the next node to be visited is in bounds
def valid(row, col):
return 0 <= row < m and 0 <= col < n
m = len(grid)
n = len(grid[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
seen = {(0,0,k)}
queue = deque([(0,0,k,0)])
while queue:
row, col, rem, steps = queue.popleft()
# only valid nodes exist in the queue
if row == m - 1 and col == n - 1:
return steps
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
next_node = (next_row, next_col)
if valid(*next_node):
next_val = grid[next_row][next_col]
# if the next value is not an obstacle, then proceed with visits as normal
if next_val == 0:
if (*next_node, rem) not in seen:
seen.add((*next_node, rem))
queue.append((*next_node, rem, steps + 1))
# the next value is an obstacle: can we still remove obstacles? if so, proceed with visits
else:
if rem > 0 and (*next_node, rem - 1) not in seen:
seen.add((*next_node, rem - 1))
queue.append((*next_node, rem - 1, steps + 1))
return -1
This is an excellent problem for thinking through how a node's state should be recorded in the seen
set; that is, the majority of BFS and DFS traversals on matrix graphs simply record a node's position (i.e., row and column) because the node's position fully describes the state we do not want to visit again. But for some problems, like this one, it's helpful to record more information than just a node's position. Specifically, the state we do not want to visit more than once is a node's position in addition to the number of remaining obstacles we can remove.
Thinking about using the seen
set to record states we do not want to visit multiple times is much more accurate and reflective of our actual goal — only perform computation when absolutely necessary.
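A quick check against LeetCode's first example, where eliminating the single obstacle at (3, 2) shortens the walk to 6 steps (assuming the Solution class above is in scope):

```python
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
        [0, 1, 1],
        [0, 0, 0]]
print(Solution().shortestPath(grid, 1))  # 6
```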
LC 1129. Shortest Path with Alternating Colors (✓) ★★
Consider a directed graph, with nodes labelled 0, 1, ..., n-1
. In this graph, each edge is either red or blue, and there could be self-edges or parallel edges.
Each [i, j]
in red_edges
denotes a red directed edge from node i
to node j
. Similarly, each [i, j]
in blue_edges
denotes a blue directed edge from node i
to node j
.
Return an array answer
of length n
, where each answer[X]
is the length of the shortest path from node 0
to node X
such that the edge colors alternate along the path (or -1
if such a path doesn't exist).
class Solution:
def shortestAlternatingPaths(self, n: int, redEdges: List[List[int]], blueEdges: List[List[int]]) -> List[int]:
def build_graph(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
graph[node].append(neighbor)
return graph
RED_GRAPH = build_graph(redEdges)
BLUE_GRAPH = build_graph(blueEdges)
RED = 1
BLUE = 0
ans = [float('inf')] * n
seen = {(0,RED), (0,BLUE)}
queue = deque([(0,RED,0),(0,BLUE,0)])
while queue:
node, color, steps = queue.popleft()
ans[node] = min(ans[node], steps)
alt_color = 1 - color
graph = RED_GRAPH if alt_color == 1 else BLUE_GRAPH
for neighbor in graph[node]:
if (neighbor, alt_color) not in seen:
seen.add((neighbor, alt_color))
queue.append((neighbor, alt_color, steps + 1))
return [ val if val != float('inf') else -1 for val in ans ]
This is such a great problem in so many ways. The idea is to execute a "semi-multi-source" BFS, where we start with node 0
as if it's red as well as if it's blue. Then we expand outwards.
We also take advantage of a nice numerical trick: 1 - 1 == 0
, and 1 - 0 == 1
. This allows us to effectively (and efficiently) alternate between colors.
LC 1926. Nearest Exit from Entrance in Maze (✓)
You are given an m x n
matrix maze (0-indexed) with empty cells (represented as '.'
) and walls (represented as '+'
). You are also given the entrance
of the maze, where entrance = [entrancerow, entrancecol]
denotes the row and column of the cell you are initially standing at.
In one step, you can move one cell up, down, left, or right. You cannot step into a cell with a wall, and you cannot step outside the maze. Your goal is to find the nearest exit from the entrance
. An exit is defined as an empty cell that is at the border of the maze
. The entrance
does not count as an exit.
Return the number of steps in the shortest path from the entrance
to the nearest exit, or -1
if no such path exists.
class Solution:
def nearestExit(self, maze: List[List[str]], entrance: List[int]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and maze[row][col] == '.'
m = len(maze)
n = len(maze[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
start_node = tuple(entrance)
seen = {start_node}
queue = deque([(*start_node, 0)])
exit_rows = {0, m - 1}
exit_cols = {0, n - 1}
while queue:
row, col, moves = queue.popleft()
if row in exit_rows or col in exit_cols:
if row != start_node[0] or col != start_node[1]:
return moves
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
next_node = (next_row, next_col)
if valid(*next_node) and next_node not in seen:
seen.add(next_node)
queue.append((*next_node, moves + 1))
return -1
There are several variables to initialize before the traversal proper, and that is okay; the BFS itself is otherwise standard.
LC 909. Snakes and Ladders (✓)
On an N x N
board
, the numbers from 1
to N*N
are written boustrophedonically starting from the bottom left of the board, and alternating direction each row. For example, for a 6 x 6 board, the numbers are written as follows:
You start on square 1
of the board (which is always in the last row and first column). Each move, starting from square x
, consists of the following:
- You choose a destination square
S
with numberx+1
,x+2
,x+3
,x+4
,x+5
, orx+6
, provided this number is<= N*N
.
- (This choice simulates the result of a standard 6-sided die roll: ie., there are always at most 6 destinations, regardless of the size of the board.)
- If
S
has a snake or ladder, you move to the destination of that snake or ladder. Otherwise, you move toS
.
A board square on row r
and column c
has a "snake or ladder" if board[r][c] != -1
. The destination of that snake or ladder is board[r][c]
.
Note that you only take a snake or ladder at most once per move: if the destination to a snake or ladder is the start of another snake or ladder, you do not continue moving. (For example, if the board is [[4,-1],[-1,3]]
, and on the first move your destination square is 2
, then you finish your first move at 3
, because you do not continue moving to 4
.)
Return the least number of moves required to reach square N*N
. If it is not possible, return -1
.
class Solution:
def snakesAndLadders(self, board: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < n and 0 <= col < n
def label_to_cell(num):
row = (num - 1) // n
col = (num - 1) % n
if row % 2 == 1:
col = (n - 1) - col
return [(n - 1) - row, col]
n = len(board)
seen = {1}
queue = deque([(1,0)])
while queue:
label, moves = queue.popleft()
if label == n ** 2:
return moves
for next_label in range(label + 1, min(label + 6, n ** 2) + 1):
row, col = label_to_cell(next_label)
if valid(row, col) and next_label not in seen:
seen.add(next_label)
if board[row][col] != -1:
queue.append((board[row][col], moves + 1))
else:
queue.append((next_label, moves + 1))
return -1
This is a rough one, mostly because of the convoluted problem description. The trickiest part is coming up with a performant way of converting from label to cell:
def label_to_cell(num):
row = (num - 1) // n
col = (num - 1) % n
if row % 2 == 1:
col = (n - 1) - col
return [(n - 1) - row, col]
Coming up with the function above takes some patience and experimentation at first. Another challenge is the mental shift away from recording in the seen set the position of the node we're processing; in this case, we only care about which node labels we've previously seen. As we explore neighbors, we add each label to the seen set, but the items we add to the queue will depend on whether or not we've encountered a snake or ladder.
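The conversion is easy to spot-check in isolation; here is a standalone version of the helper (with n passed explicitly) and a few boustrophedon corner cases on a 6 x 6 board:

```python
def label_to_cell(num, n):
    row = (num - 1) // n
    col = (num - 1) % n
    if row % 2 == 1:  # odd rows (counting from the bottom) run right to left
        col = (n - 1) - col
    return [(n - 1) - row, col]

print(label_to_cell(1, 6))   # [5, 0]: square 1 is at the bottom-left corner
print(label_to_cell(7, 6))   # [4, 5]: the second row from the bottom starts on the right
print(label_to_cell(36, 6))  # [0, 0]: square N*N ends at the top-left on a 6 x 6 board
```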
LC 1376. Time Needed to Inform All Employees (✓) ★
A company has n
employees with a unique ID for each employee from 0
to n - 1
. The head of the company is the one with headID
.
Each employee has one direct manager given in the manager
array where manager[i]
is the direct manager of the i
th employee, manager[headID] = -1
. Also, it is guaranteed that the subordination relationships have a tree structure.
The head of the company wants to inform all the company employees of an urgent piece of news. He will inform his direct subordinates, and they will inform their subordinates, and so on until all employees know about the urgent news.
The i
th employee needs informTime[i]
minutes to inform all of his direct subordinates (i.e., After informTime[i]
minutes, all his direct subordinates can start spreading the news).
Return the number of minutes needed to inform all the employees about the urgent news.
class Solution:
def numOfMinutes(self, n: int, headID: int, manager: List[int], informTime: List[int]) -> int:
# treat the graph as directed (from managers to subordinates)
def build_adj_list(manager_arr):
graph = defaultdict(list)
for emp in range(n):
graph[manager_arr[emp]].append(emp)
del graph[-1]
return graph
graph = build_adj_list(manager)
minutes_needed = informTime[headID]
seen = {headID}
queue = deque([(headID, 0)]) # unique base case
while queue:
manager, time_so_far = queue.popleft()
for subordinate in graph[manager]:
minutes_needed = max(minutes_needed, time_so_far + informTime[manager])
seen.add(subordinate)
queue.append((subordinate, time_so_far + informTime[manager]))
return minutes_needed
This seems like a mostly natural BFS problem, but one of the tricks involves effectively handling how queue
is initialized. It's tempting to do queue = deque([(headID, informTime[headID])])
, but this would be wrong when subordinates exist because we almost certainly end up overcounting (this is because time_so_far + informTime[manager]
is the time so far for each subordinate node of a manager).
Sometimes BFS problems can be tricky because of how we handle queue initialization. This is definitely one of those problems.
LC 994. Rotting Oranges (✓) ★★
You are given an m x n
grid
where each cell can have one of three values:
0
representing an empty cell,1
representing a fresh orange, or2
representing a rotten orange.
Every minute, any fresh orange that is 4-directionally adjacent to a rotten orange becomes rotten.
Return the minimum number of minutes that must elapse until no cell has a fresh orange. If this is impossible, return -1
.
class Solution:
def orangesRotting(self, grid: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid[row][col] == 1
def find_fresh_and_rotten(mat):
fresh = 0
rotten = set()
for i in range(m):
for j in range(n):
if mat[i][j] == 1:
fresh += 1
elif mat[i][j] == 2:
rotten.add((i,j))
return fresh, rotten
m = len(grid)
n = len(grid[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
fresh, seen = find_fresh_and_rotten(grid)
if fresh == 0:
return 0
queue = deque([(*rotten, 0) for rotten in seen])
while queue:
row, col, mins = queue.popleft()
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
if valid(next_row, next_col) and (next_row, next_col) not in seen:
seen.add((next_row, next_col))
fresh -= 1
if fresh == 0:
return mins + 1
queue.append((next_row, next_col, mins + 1))
return -1
The solution editorial makes it clear there are several ways of going about solving this problem. The approach above is arguably more straightforward than the solutions provided in the editorial though.
The key idea is to pre-process the grid in order to find the total number of fresh oranges as well as where the rotten oranges are located (if no fresh oranges are found, then immediately return 0
) — we then execute a multi-source BFS from each rotten orange, and we keep track of how many fresh oranges remain as each cell is processed (if the counter ever reaches 0
, then we immediately return the number of minutes required at that point).
DFS (recursive)
Remarks
Assume the nodes are numbered from 0
to n - 1
and the graph is given as an adjacency list. Depending on the problem, you may need to convert the input into an equivalent adjacency list before using the templates.
Basic motivation, concepts, and considerations for DFS
The following observations are largely derived from this video. The underlying graph referred to for the rest of this note is assumed to be represented as an adjacency list of index arrays.
If you're looking at a graph from above, then breadth first search is pretty intuitive. It expands from your search vertex like a wave in a pond. But lots of times you're not looking at a graph from above — you're looking at it from within (e.g., you're exploring a maze, an area in a video game, etc.).
The simple implementation for DFS is somewhat similar to BFS:
ResetGraph(G)
for v in V
v.discovered = False
DFSVertex(u)
u.discovered = True
for each v such that u -> v
if (not v.discovered)
DFSVertex(v)
DFS(G, startVertex)
ResetGraph(G)
DFSVertex(startVertex)
We start from a single vertex and then go to adjacent vertices. If the next vertex has several adjacent vertices, it doesn't matter — we just choose one of them and keep exploring, ignoring other vertices we could have explored along the way. We do this by making recursive calls to DFSVertex for each new vertex.
Like in BFS, if you think about which vertex discovers another, that implies a "depth first search tree", and we can similarly store parent values in a variable:
ResetGraph(G)
for v in V
v.pi = nil
v.discovered = False
DFSVertex(u)
u.discovered = True
for each v such that u -> v
if (not v.discovered)
v.pi = u
DFSVertex(v)
DFS(G, startVertex)
ResetGraph(G)
DFSVertex(startVertex)
Similar to BFS, in DFS we can reconstruct paths from the root of the tree — but the paths are less interesting here because they might have more than the minimum number of links. So instead of counting links, we track when each vertex was discovered (i.e., reached) and finished (i.e., fully explored) with time stamps. Of course, we don't need actual time stamps — just a relative order will do:
ResetGraph(G)
for v in V
v.pi = nil
v.discovered = -1
v.finished = -1
time = 1
DFSVertex(u)
u.discovered = time++
for each v such that u -> v
if (v.discovered < 0)
v.pi = u
DFSVertex(v)
u.finished = time++
DFS(G, startVertex)
ResetGraph(G)
DFSVertex(startVertex)
CLRS pseudocode for DFS
It's probably worth reproducing the pseudocode from [21] to serve as a comparison point. The use of colors is similar to what was used for BFS:
- White: Undiscovered
- Gray: Discovered but not yet finished (i.e., it is actively being explored)
- Black: Finished (i.e., it is done being explored)
With the above in mind, here is the DFS algorithm as it appears in CLRS:
DFS(G)
for each vertex u in G.V
u.color = White
u.pi = nil
time = 0
for each vertex u in G.V
if u.color == White
DFSVisit(G, u)
DFSVisit(G, u)
time = time + 1 # white vertex has just been discovered
u.d = time
u.color = Gray
for each vertex v in G.Adj[u] # explore each edge (u, v)
if v.color == White
v.pi = u
DFSVisit(G, v)
time = time + 1
u.f = time
u.color = Black # blacken u; it is finished
Why would we want to track time stamps when executing a DFS? It turns out that DFS time stamps will be useful for other tasks like topological sorting. Additionally, for some algorithms, you want to run DFS from one vertex while for others you want to run it on the whole graph.
To do that, after initializing the graph once (i.e., with ResetGraph
), run DFS on each previously undiscovered vertex; then, instead of getting a DFS tree rooted at one vertex (i.e., startVertex
) that only includes vertices reachable from that vertex, we get a forest that will always contain all the graph's vertices.
In terms of our earlier pseudocode for the full implementation, this means changing the block
DFS(G, startVertex)
ResetGraph(G)
DFSVertex(startVertex)
to the following:
DFS(G)
ResetGraph(G)
for u in V
if (u.discovered < 0)
DFSVertex(u)
We then get the following as our updated full implementation:
ResetGraph(G)
for v in V
v.pi = nil
v.discovered = -1
v.finished = -1
time = 1
DFSVertex(u)
u.discovered = time++
for each v such that u -> v
if (v.discovered < 0)
v.pi = u
DFSVertex(v)
u.finished = time++
DFS(G)
ResetGraph(G)
for u in V
if (u.discovered < 0)
DFSVertex(u)
It's a useful exercise to consider an example graph and work out how everything above is represented and calculated.
Worked example
Let's consider an example of the algorithm above in action on the following graph:
We'll execute top-level searches on vertices in alphabetical order. Doing so yields the following completely searched graph, as illustrated in the linked video at the top of this note:
As can be seen above, A
, E
, and I
are the roots of their own depth first search trees in this depth first search forest.
To reproduce the final figure above using code, we can first represent the graph by mapping vertices A
through I
to 0
through 8
, inclusive. It also helps to define a lookup table for ease of reference:
graph = [
[3],
[0, 6],
[1, 3, 6],
[1, 5, 6],
[7],
[2],
[],
[3, 4, 5],
[4],
]
lookup = {
-1: ' ',
0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E',
5: 'F',
6: 'G',
7: 'H',
8: 'I',
}
Now we can write the dfs
function to execute a DFS on the entire graph — the inner visit
function is where the DFSVertex
method is implemented (a reformatting of the output is included at the bottom of the code):
def dfs(graph):
n = len(graph)
discovered = [-1] * n
finished = [-1] * n
pred = [-1] * n
time = 0
def visit(node):
nonlocal time
time += 1
discovered[node] = time
for nbr in graph[node]:
if discovered[nbr] < 0:
pred[nbr] = node
visit(nbr)
time += 1
finished[node] = time
for node in range(n):
if discovered[node] < 0:
visit(node)
return discovered, finished, [ lookup[parent] for parent in pred ]
print(dfs(graph))
"""
( A B C D E F G H I
[ 1, 3, 8, 2, 13, 7, 4, 14, 17], # discovered times
[12, 6, 9, 11, 16, 10, 5, 15, 18], # finished times
[ , D, F, A, , D, B, E, ] # pi values
)
"""
The next two aspects of DFS we will consider, namely parenthesis notation and edge classification are remarked on in greater detail in [21]. The relevant snippets from CLRS are reproduced below (it would likely be helpful to look at these snippets before looking at these topics in the context of the example we've been working through).
Parenthesis theorem (CLRS)
Depth-first search yields valuable information about the structure of a graph. Perhaps the most basic property of depth-first search is that the predecessor subgraph G_π does indeed form a forest of trees, since the structure of the depth-first trees exactly mirrors the structure of recursive calls of DFS-VISIT. That is, u = v.pi if and only if DFS-VISIT(G, v) was called during a search of u's adjacency list. Additionally, vertex v is a descendant of vertex u in the depth-first forest if and only if v is discovered during the time in which u is gray.
Another important property of depth-first search is that discovery and finish times have parenthesis structure. If the DFS-VISIT procedure were to print a left parenthesis "(v" when it discovers vertex v and to print a right parenthesis "v)" when it finishes v, then the printed expression would be well formed in the sense that the parentheses are properly nested.
The following theorem provides another way to characterize the parenthesis structure.
In any depth-first search of a (directed or undirected) graph G = (V, E), for any two vertices u and v, exactly one of the following three conditions holds:
- the intervals [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor v is a descendant of the other in the depth-first forest,
- the interval [u.d, u.f] is contained entirely within the interval [v.d, v.f], and u is a descendant of v in a depth-first tree, or
- the interval [v.d, v.f] is contained entirely within the interval [u.d, u.f], and v is a descendant of u in a depth-first tree.
Edge classification (CLRS)
You can obtain important information about a graph by classifying its edges during a depth-first search. For example, Section 20.4 will show that a directed graph is acyclic if and only if a depth-first search yields no "back" edges (Lemma 20.11).
The depth-first forest G_π produced by a depth-first search on graph G can contain four types of edges:
- Tree edges are edges in the depth-first forest G_π. Edge (u, v) is a tree edge if v was first discovered by exploring edge (u, v).
- Back edges are those edges (u, v) connecting a vertex u to an ancestor v in a depth-first tree. We consider self-loops, which may occur in directed graphs, to be back edges.
- Forward edges are those nontree edges (u, v) connecting a vertex u to a proper descendant v in a depth-first tree.
- Cross edges are all other edges. They can go between vertices in the same depth-first tree, as long as one vertex is not an ancestor of the other, or they can go between vertices in different depth-first trees.
The DFS algorithm has enough information to classify some edges as it encounters them. The key idea is that when an edge (u, v) is first explored, the color of vertex v says something about the edge:
- WHITE indicates a tree edge,
- GRAY indicates a back edge, and
- BLACK indicates a forward or cross edge.
The first case is immediate from the specification of the algorithm. For the second case, observe that the gray vertices always form a linear chain of descendants corresponding to the stack of active DFS-VISIT
invocations. The number of gray vertices is 1 more than the depth in the depth-first forest of the vertex most recently discovered. Depth-first search always explores from the deepest gray vertex, so that an edge that reaches another gray vertex has reached an ancestor. The third case handles the remaining possibility.
The entire depth first search from the worked example above can be summarized nicely using parentheses:
An opening parenthesis stands for when the depth first search call is made on a vertex, and the closing parenthesis stands for when that call exits:
The parentheses are properly nested because the inner recursive calls must complete before the code that called them. A child will always be nested in its parent, and a vertex is only the child of at most one vertex, the one that discovered it.
If we just count the parentheses from the beginning, they will match the discovery and finish times:
Recall the fully explored graph for reference and comparison:
In CLRS they do two more things. First, they color vertices in the following manner:
- White: Undiscovered
- Gray: Discovered (but unfinished)
- Black: Finished
Second, they classify edges:
- Tree edges (parent to a child): go to an undiscovered vertex
- Back edges (to an ancestor): to a discovered but unfinished vertex (creates a cycle). During DFS, every back edge completes a cycle. Removing back edges from a graph would remove all cycles.
- Forward edges (to a non-child descendant): go to a finished vertex discovered after the current vertex. A forward edge is basically an edge that goes to an indirect descendant, not a direct child. For the graph we've been considering, the edge from D to G is a forward edge. How can we tell? Because G is finished but its discovery time is after that of the current node, D. The vertex G was discovered and explored during the lifetime of D; it's a descendant of D but not a direct descendant. Node B is the direct descendant of D; node G is the direct descendant of B; and node G is an indirect descendant of D.
- Cross edges (everything else): go to a vertex finished before the current vertex's discovery. It's essentially any edge that's not captured in the edge classifications above. It can go from one branch of a tree to another or even from one tree to another. And there isn't any ancestor or descendant relation between the vertices it links. You can tell because it leads to a vertex that finished before the current vertex was discovered. Its parentheses don't overlap.
If we color our example graph so that tree edges are red, back edges are black, forward edges are blue, and cross edges are green, then we end up with the following:
For undirected graphs, we end up seeing each edge twice, once from each vertex. If we classify the edge the first time we see it, then there won't be any forward or cross edges, only tree and back edges.
Unlike breadth first search, if a graph is more connected, then its depth first search trees tend to be taller and more vine-like. If vertices have lots of outgoing edges, then you can keep finding new vertices to explore, and the tree depth can get large. This brings us to a possible implementation hiccup for DFS — for each recursive call, we push variables onto our program's call stack. Different languages have different limits to how deep the call stack can be. A graph with 20,000 vertices might try to make 20,000 nested recursive DFS calls, which can give you a program stack overflow error. To avoid this, we can use our own stack instead of the implicit call stack with recursion.
We can do this in a few ways, but here is one possible manner:
ResetGraph(G)
for v in V
v.pi = nil
v.discovered = -1
v.finished = -1
time = 1
ExploreVertex(u, S)
if (u.discovered < 0)
u.discovered = time++
S.push(u)
for each v such that u -> v
S.push(u -> v)
else if (u.finished < 0)
u.finished = time++
# else: ignore, top level search of finished vertex
ExploreEdge(u -> v, S)
if (v.discovered < 0)
(u -> v).label = "treeEdge"
v.pi = u
ExploreVertex(v, S)
else if (v.finished < 0)
        (u -> v).label = "backEdge"
else if (v.discovered > u.discovered)
(u -> v).label = "forwardEdge"
else
(u -> v).label = "crossEdge"
DFS(G)
ResetGraph(G)
S = new Stack()
for u in V
S.push(u)
while (not S.isEmpty())
x = S.pop()
if (x.isVertex())
ExploreVertex(x, S)
else
ExploreEdge(x, S)
Let's go through how the pseudocode above is meant to function:
- Line `32`: Here, we make a stack where we can push edges and vertices.
- Lines `33`-`36`: Push all vertices onto the stack and pop while the stack isn't empty.
- Lines `9`-`13`: If we pop an undiscovered vertex, then discover it! Push it again to finish it later, and push its outgoing edges. When we pop it again later, after it's discovered (line `14`), then finish it (line `15`). And if we pop an already finished vertex, then just ignore it (line `16`). That's the equivalent of looping through all vertices but only running depth first search on the undiscovered ones.
- Lines `19`-`28`: When we pop an edge from the stack, if it leads to an undiscovered vertex, then it's a tree edge: label it as such (line `20`) and explore that vertex (line `22`); otherwise, just label the edge and we're done with it (lines `23`-`28`).
In the two lines where we push either all vertices (line `34`) or all edges from a vertex (line `13`), if we push them in the opposite order from how we would normally loop through them in the recursive version of the algorithm, then the version above will give us the same results: same times, same edge classifications, etc. It will all be the same as the recursive version. The code above doesn't look quite as clean, but it's a nice parallel to see that while breadth first search explicitly uses a first-in first-out queue of vertices, depth first search can explicitly use a stack of vertices and edges instead of just implicitly using the program call stack.
If we implement all of the pseudocode above for the iterative version, then we will end up with something like the following (the blocks of code have been highlighted where ordering has intentionally been reversed to ensure the same results as the recursive version):
def dfs_iter(graph, lookup):  # lookup maps node indices to labels, e.g., {0: 'A', 1: 'B', ...}
n = len(graph)
edge_classifications = dict()
discovered = [-1] * n
finished = [-1] * n
pred = [-1] * n
time = 0
def explore_vertex(node):
nonlocal time
if discovered[node] < 0:
time += 1
discovered[node] = time
stack.append(node)
for i in range(len(graph[node]) - 1, -1, -1):
nbr = graph[node][i]
stack.append((node, nbr))
elif finished[node] < 0:
time += 1
finished[node] = time
def explore_edge(edge):
node, nbr = edge
if discovered[nbr] < 0:
edge_classifications[edge] = 'treeEdge'
pred[nbr] = node
explore_vertex(nbr)
elif finished[nbr] < 0:
edge_classifications[edge] = 'backEdge'
elif discovered[nbr] > discovered[node]:
edge_classifications[edge] = 'forwardEdge'
else:
edge_classifications[edge] = 'crossEdge'
stack = []
for node in range(n - 1, -1, -1):
stack.append(node)
while stack:
x = stack.pop()
if not isinstance(x, tuple):
explore_vertex(x)
else:
explore_edge(x)
return discovered, finished, pred, { (lookup[edge[0]], lookup[edge[1]]): edge_classifications[edge] for edge in edge_classifications }
The outcome, formatted manually for the sake of clarity, matches exactly what was produced previously using the recursive approach (the edge classification also matches what we would expect):
"""
( A B C D E F G H I
[ 1, 3, 8, 2, 13, 7, 4, 14, 17], # discovered
[12, 6, 9, 11, 16, 10, 5, 15, 18], # finished
[-1, 3, 5, 0, -1, 3, 1, 4, -1], # predecessors
{ # edge classifications
('A', 'D'): 'treeEdge',
('D', 'B'): 'treeEdge',
('B', 'A'): 'backEdge',
('B', 'G'): 'treeEdge',
('D', 'F'): 'treeEdge',
('F', 'C'): 'treeEdge',
('C', 'B'): 'crossEdge',
('C', 'D'): 'backEdge',
('C', 'G'): 'crossEdge',
('D', 'G'): 'forwardEdge',
('E', 'H'): 'treeEdge',
('H', 'D'): 'crossEdge',
('H', 'E'): 'backEdge',
('H', 'F'): 'crossEdge',
('I', 'E'): 'crossEdge'
}
)
"""
Why is the order reversal in the iterative version important in order to ensure the same results as in the recursive version? Because stacks are LIFO data structures — if we push elements onto the stack in the same order as we would process them recursively, then they will be popped off in reverse order. Consider the worked example we've been dealing with throughout this entire note. If we pushed the vertices `A` through `I` onto the stack in that order, then the first vertex popped off would be `I`, not `A`. But that's not the desired result! Hence, we push vertices `I`, `H`, ..., `B`, `A` onto the stack in that order so they are popped in the order `A`, `B`, ..., `H`, `I`, matching the order in which they are processed in the recursive version. Similarly, the order in which neighboring vertices are processed needs to be reversed as well; that is, we need to push neighbors onto the stack in reverse order — this ensures that when they are popped off the stack, they are processed in the original order.
Consider a generalized example to fully clarify things:

- Adjacency list: Let's say `u` has neighbors `[v1, v2, v3]`.
- Recursive DFS: Processes `v1`, then `v2`, then `v3`.
- Iterative DFS (without reversal):
  - Push `v1`, `v2`, `v3` onto the stack.
  - Pop and process `v3`, `v2`, `v1` (reverse order).
- Iterative DFS (with reversal):
  - Push `v3`, `v2`, `v1` onto the stack.
  - Pop and process `v1`, `v2`, `v3` (original order).
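A quick, self-contained illustration of this difference, using hypothetical neighbor names:

```python
neighbors = ['v1', 'v2', 'v3']

stack = list(neighbors)                    # push v1, v2, v3 (no reversal)
print([stack.pop() for _ in neighbors])    # ['v3', 'v2', 'v1'] -- reverse order

stack = list(reversed(neighbors))          # push v3, v2, v1 (with reversal)
print([stack.pop() for _ in neighbors])    # ['v1', 'v2', 'v3'] -- original order
```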
def fn(graph):
def dfs(node):
ans = 0
# do some logic
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
ans += dfs(neighbor)
return ans
seen = {START_NODE}
return dfs(START_NODE)
Examples
LC 547. Number of Provinces (✓)
There are `n` cities. Some of them are connected, while some are not. If city `a` is connected directly with city `b`, and city `b` is connected directly with city `c`, then city `a` is connected indirectly with city `c`.

A province is a group of directly or indirectly connected cities and no other cities outside of the group.

You are given an `n x n` matrix `isConnected` where `isConnected[i][j] = 1` if the `ith` city and the `jth` city are directly connected, and `isConnected[i][j] = 0` otherwise.

Return the total number of provinces.
class Solution:
def findCircleNum(self, isConnected: List[List[int]]) -> int:
def build_adj_list(adj_mat):
graph = defaultdict(list)
n = len(adj_mat)
for node in range(n):
for neighbor in range(node + 1, n):
if isConnected[node][neighbor] == 1:
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
dfs(neighbor)
graph = build_adj_list(isConnected)
seen = set()
provinces = 0
for city in range(len(isConnected)):
if city not in seen:
provinces += 1
seen.add(city)
dfs(city)
return provinces
Cities are nodes, and provinces are groups of connected cities (i.e., connected components). The idea here is to explore all provinces by starting with each city and seeing how many cities we can explore from that city — every time we have to start a search again from a new city, we increment the number of overall provinces encountered thus far.
LC 200. Number of Islands (✓)
Given an `m x n` 2D binary grid `grid` which represents a map of `'1'`s (land) and `'0'`s (water), return the number of islands.
An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.
class Solution:
def numIslands(self, grid: List[List[str]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid[row][col] == '1'
def dfs(row, col):
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
neighbor = (next_row, next_col)
if neighbor not in seen and valid(*neighbor):
seen.add(neighbor)
dfs(*neighbor)
m = len(grid)
n = len(grid[0])
dirs = [(-1,0),(1,0),(0,-1),(0,1)]
seen = set()
islands = 0
for i in range(m):
for j in range(n):
node = (i, j)
if node not in seen and grid[i][j] == '1':
islands += 1
seen.add(node)
dfs(*node)
return islands
Each "island" is a connected component — our job is to count the total number of connected components. The traversal is a fairly standard DFS traversal on a grid-like graph.
LC 1466. Reorder Routes to Make All Paths Lead to the City Zero (✓)
There are `n` cities numbered from `0` to `n-1` and `n-1` roads such that there is only one way to travel between two different cities (this network forms a tree). Last year, the ministry of transport decided to orient the roads in one direction because they are too narrow.

Roads are represented by `connections` where `connections[i] = [a, b]` represents a road from city `a` to city `b`.

This year, there will be a big event in the capital (city `0`), and many people want to travel to this city.

Your task consists of reorienting some roads such that each city can visit city `0`. Return the minimum number of edges changed.

It's guaranteed that each city can reach city `0` after the reorder.
class Solution:
def minReorder(self, n: int, connections: List[List[int]]) -> int:
roads = set()
def build_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
roads.add((node, neighbor))
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
ans = 0
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
if (node, neighbor) in roads:
ans += 1
ans += dfs(neighbor)
return ans
graph = build_adj_list(connections)
seen = {0}
return dfs(0)
This is a tough one to come up with if you haven't seen it before. The solution approach above is quite clever. The idea is to build an undirected graph in the form of an adjacency list and then to conduct a DFS from node `0`, which means every edge we encounter necessarily leads away from `0`; hence, if that edge appeared in the original road configuration, `roads`, then we know that road's direction must be changed so that it faces toward node `0` instead of away from it.
LC 841. Keys and Rooms (✓)
There are `N` rooms and you start in room `0`. Each room has a distinct number in `0, 1, 2, ..., N-1`, and each room may have some keys to access the next room.

Formally, each room `i` has a list of keys `rooms[i]`, and each key `rooms[i][j]` is an integer in `[0, 1, ..., N-1]` where `N = rooms.length`. A key `rooms[i][j] = v` opens the room with number `v`.

Initially, all the rooms start locked (except for room `0`).

You can walk back and forth between rooms freely.

Return `true` if and only if you can enter every room.
class Solution:
def canVisitAllRooms(self, rooms: List[List[int]]) -> bool:
def dfs(node):
for neighbor in rooms[node]:
if neighbor not in seen:
seen.add(neighbor)
dfs(neighbor)
seen = {0}
dfs(0)
return len(seen) == len(rooms)
It's quite nice that the given input, `rooms`, is already in the form of an adjacency list (as an index array). The key insight is to realize that we can use our `seen` set to determine whether or not all rooms have been visited after conducting a DFS from node `0` (i.e., room `0`); that is, if `seen` is the same length as `rooms` after the DFS from node `0`, then we can say it's possible to visit all rooms (otherwise it's not).
LC 1971. Find if Path Exists in Graph (✓)
There is a bi-directional graph with `n` vertices, where each vertex is labeled from `0` to `n - 1` (inclusive). The edges in the graph are represented as a 2D integer array `edges`, where each `edges[i] = [ui, vi]` denotes a bi-directional edge between vertex `ui` and vertex `vi`. Every vertex pair is connected by at most one edge, and no vertex has an edge to itself.

You want to determine if there is a valid path that exists from vertex `start` to vertex `end`.

Given `edges` and the integers `n`, `start`, and `end`, return `true` if there is a valid path from `start` to `end`, or `false` otherwise.
class Solution:
def validPath(self, n: int, edges: List[List[int]], source: int, destination: int) -> bool:
def build_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
if node == destination:
return True
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
if dfs(neighbor):
return True
return False
graph = build_adj_list(edges)
seen = {source}
return dfs(source)
The solution above is a direct application of a DFS traversal. The hardest part is arguably coming up with an effective way of writing the `dfs` function. It's better not to rely on a nonlocal variable unless we really need to. The idea is that we should stop searching as soon as we encounter a node whose value equals the destination. If that is not the case, then we try to explore further. If our DFS comes up empty, then we return `False`, and that will propagate back up the recursion chain.
LC 323. Number of Connected Components in an Undirected Graph (✓)
You have a graph of `n` nodes. You are given an integer `n` and an array `edges` where `edges[i] = [ai, bi]` indicates that there is an edge between `ai` and `bi` in the graph.

Return the number of connected components in the graph.
class Solution:
def countComponents(self, n: int, edges: List[List[int]]) -> int:
def build_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
dfs(neighbor)
graph = build_adj_list(edges)
seen = set()
cc = 0
for node in range(n):
if node not in seen:
cc += 1
seen.add(node)
dfs(node)
return cc
Counting the number of connected components in a graph via DFS traversal is a very common task. Sometimes the nature of the connected components may be obfuscated at first (i.e., we have to come up with a way to first model the connections and then determine the number of connected components), but that is not the case here.
One thing worth noting in the solution above is how we deftly take care of the case where a node is not represented in the original edge list we're provided. We simply increment the count of the number of connected components, `cc`, as soon as we encounter a node we have not seen before, and we do this for all `n` nodes. For nodes that are not connected to any other nodes, our `dfs` function effectively does not execute.
LC 695. Max Area of Island (✓)
Given a non-empty 2D array `grid` of `0`'s and `1`'s, an island is a group of `1`'s (representing land) connected 4-directionally (horizontal or vertical). You may assume all four edges of the grid are surrounded by water.

Find the maximum area of an island in the given 2D array. (If there is no island, the maximum area is `0`.)
class Solution:
def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid[row][col] == 1
def dfs(row, col):
connected_area = 0
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
next_node = (next_row, next_col)
if valid(*next_node) and next_node not in seen:
seen.add(next_node)
connected_area += 1 + dfs(*next_node)
return connected_area
m = len(grid)
n = len(grid[0])
dirs = [(-1,0),(1,0),(0,1),(0,-1)]
seen = set()
max_area = 0
for i in range(m):
for j in range(n):
if (i, j) not in seen and grid[i][j] == 1:
seen.add((i, j))
max_area = max(max_area, 1 + dfs(i, j))
return max_area
The idea here is to basically find the largest connected component.
LC 2368. Reachable Nodes With Restrictions (✓)
There is an undirected tree with `n` nodes labeled from `0` to `n - 1` and `n - 1` edges.

You are given a 2D integer array `edges` of length `n - 1` where `edges[i] = [ai, bi]` indicates that there is an edge between nodes `ai` and `bi` in the tree. You are also given an integer array `restricted` which represents restricted nodes.

Return the maximum number of nodes you can reach from node `0` without visiting a restricted node.

Note that node `0` will not be a restricted node.
class Solution:
def reachableNodes(self, n: int, edges: List[List[int]], restricted: List[int]) -> int:
restricted = set(restricted)
def build_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
if node not in restricted and neighbor not in restricted:
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
for neighbor in graph[node]:
if neighbor not in seen and neighbor not in restricted:
seen.add(neighbor)
dfs(neighbor)
graph = build_adj_list(edges)
seen = {0}
dfs(0)
return len(seen)
The idea behind the solution above is to start by ensuring our graph only has valid nodes. This means getting rid of all edges that contain one (or both) nodes from the `restricted` list, which we start by "setifying" in order to make O(1) lookups possible.

It's worth reflecting on why it behooves us to get rid of an edge when one of its nodes is from the `restricted` set. If the restricted node is the source, then there's no way to get to its destination. If the restricted node is the destination, then we will not go there from the source. Whatever the case, it is a waste of time and space to consider edges that have one (or both) nodes from the `restricted` set.

At the end, the number of nodes reached from `0` is the length of the set `seen`, which is why we return `len(seen)`. We could just as well have kept track of the number of visited nodes by using the `dfs` function itself:
class Solution:
def reachableNodes(self, n: int, edges: List[List[int]], restricted: List[int]) -> int:
restricted = set(restricted)
def build_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
if node not in restricted and neighbor not in restricted:
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node):
ans = 0
for neighbor in graph[node]:
if neighbor not in seen and neighbor not in restricted:
seen.add(neighbor)
ans += 1 + dfs(neighbor)
return ans
graph = build_adj_list(edges)
seen = {0}
return dfs(0) + 1
LC 1020. Number of Enclaves (✓) ★★★
You are given an `m x n` binary matrix `grid`, where `0` represents a sea cell and `1` represents a land cell.

A move consists of walking from one land cell to another adjacent (4-directionally) land cell or walking off the boundary of the `grid`.

Return the number of land cells in `grid` for which we cannot walk off the boundary of the grid in any number of moves.
class Solution:
def numEnclaves(self, grid: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid[row][col] == 1
def process_perimeter(mat):
first_row = 0
last_row = m - 1
first_col = 0
last_col = n - 1
boundary_lands = set()
for row in [ first_row, last_row ]:
for col in range(n):
if mat[row][col] == 1:
boundary_lands.add((row, col))
for col in [ first_col, last_col ]:
for row in range(1, m - 1):
if mat[row][col] == 1:
boundary_lands.add((row, col))
return boundary_lands
def dfs(row, col):
ans = 0
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
if valid(next_row, next_col) and (next_row, next_col) not in seen:
seen.add((next_row, next_col))
ans += 1 + dfs(next_row, next_col)
return ans
m = len(grid)
n = len(grid[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
seen = set()
enclaves = 0
boundary_lands = process_perimeter(grid)
for boundary_land in boundary_lands:
seen.add(boundary_land)
dfs(*boundary_land)
for i in range(m):
for j in range(n):
if (i, j) not in seen and grid[i][j] == 1:
seen.add((i, j))
enclaves += 1 + dfs(i, j)
return enclaves
The solution above could almost certainly be improved, but it captures the core idea for almost any effective solution to this problem — we need to pre-process the grid to identify all land cells that reside on the boundary. A DFS from each of those cells should be used to identify invalid land cells. There are a few ways of doing this — we could mutate the input grid by letting our DFS mark all boundary land cells and invalid connected land cells with `0` or some other value. Then we simply need to count the number of cells with `1` in them. The following solution is much more optimized for this kind of approach:
class Solution:
def numEnclaves(self, grid: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid[row][col] == 1
def process_perimeter(mat):
first_row = 0
last_row = m - 1
first_col = 0
last_col = n - 1
for row in [ first_row, last_row ]:
for col in range(n):
if mat[row][col] == 1:
dfs(row, col)
for col in [ first_col, last_col ]:
for row in range(1, m - 1):
if mat[row][col] == 1:
dfs(row, col)
def dfs(row, col):
grid[row][col] = 0
for dr, dc in dirs:
next_row, next_col = row + dr, col + dc
if valid(next_row, next_col):
dfs(next_row, next_col)
m = len(grid)
n = len(grid[0])
dirs = [(1,0),(-1,0),(0,1),(0,-1)]
process_perimeter(grid)
enclaves = 0
for i in range(m):
for j in range(n):
enclaves += grid[i][j]
return enclaves
LC 2192. All Ancestors of a Node in a Directed Acyclic Graph (✓)
You are given a positive integer `n` representing the number of nodes of a Directed Acyclic Graph (DAG). The nodes are numbered from `0` to `n - 1` (inclusive).

You are also given a 2D integer array `edges`, where `edges[i] = [fromi, toi]` denotes that there is a unidirectional edge from `fromi` to `toi` in the graph.

Return a list `answer`, where `answer[i]` is the list of ancestors of the `i`th node, sorted in ascending order.

A node `u` is an ancestor of another node `v` if `u` can reach `v` via a set of edges.
class Solution:
def getAncestors(self, n: int, edges: List[List[int]]) -> List[List[int]]:
def build_rev_adj_list(edge_arr):
graph = defaultdict(list)
for node, neighbor in edge_arr:
graph[neighbor].append(node)
return graph
def dfs(node, visited):
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
dfs(neighbor, visited)
return visited
graph = build_rev_adj_list(edges)
return [ sorted(list(dfs(node, set()))) for node in range(n) ]
This is a fun one. The key idea is to invert or reverse the edge directions and then perform a DFS from each node, `0` through `n - 1`, inclusive, to determine the ancestors of each node. Why does this work? Because the ancestors of a target node are whatever nodes have edges leading to the target node; hence, executing a DFS from the target node once the edges have been inverted gives us all ancestral nodes for the target node.
LC 990. Satisfiability of Equality Equations (✓) ★★★
Given an array `equations` of strings that represent relationships between variables, each string `equations[i]` has length `4` and takes one of two different forms: `"a==b"` or `"a!=b"`. Here, `a` and `b` are lowercase letters (not necessarily different) that represent one-letter variable names.

Return `true` if and only if it is possible to assign integers to variable names so as to satisfy all the given equations.
class Solution:
def equationsPossible(self, equations: List[str]) -> bool:
def build_graph(edges):
graph = defaultdict(list)
for equation in edges:
if equation[1] == '=':
node = equation[0]
neighbor = equation[3]
graph[node].append(neighbor)
graph[neighbor].append(node)
return graph
def dfs(node, label):
if label_lookup[node] == -1:
label_lookup[node] = label
for neighbor in graph[node]:
dfs(neighbor, label)
graph = build_graph(equations)
label_lookup = { chr(i): -1 for i in range(97, 122 + 1) }
for char in 'abcdefghijklmnopqrstuvwxyz':
dfs(char, ord(char))
for equation in equations:
if equation[1] == '!':
if label_lookup[equation[0]] == label_lookup[equation[3]]:
return False
return True
Equation problems that require being interpreted as graphs are never very intuitive and always require some creativity. The solution editorial for this problem is quite good and highlights one slick way of approaching this problem with DFS as the underlying mechanism for driving the solution logic.
The key idea is to treat each possible variable (`a` through `z`) as a node and then to use the provided `equations` where `==` is the comparison operator to label all equal variables in the same way (i.e., it's like we're assigning a number or color to each node in a connected component). Once this has been done, we process all equations where `!=` is the comparison operator — if the nodes that cannot be equal are in separate components (i.e., they have different labels), then we are fine; if, however, two nodes cannot be equal but they share the same label, then this means they must also be equal to each other (a contradiction).
LC 1905. Count Sub Islands ★★★

You are given two `m x n` binary matrices `grid1` and `grid2` containing only `0`'s (representing water) and `1`'s (representing land). An island is a group of `1`'s connected 4-directionally (horizontal or vertical). Any cells outside of the grid are considered water cells.

An island in `grid2` is considered a sub-island if there is an island in `grid1` that contains all the cells that make up this island in `grid2`.

Return the number of islands in `grid2` that are considered sub-islands.
class Solution:
def countSubIslands(self, grid1: List[List[int]], grid2: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid2[row][col] == 1
def dfs(row, col):
if not valid(row, col):
return True
grid2[row][col] = 0
is_subisland = grid1[row][col] == 1
for dr, dc in dirs:
is_subisland &= dfs(row + dr, col + dc)
return is_subisland
m = len(grid2)
n = len(grid2[0])
dirs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
sub_islands = 0
for i in range(m):
for j in range(n):
if grid2[i][j] == 1:
if dfs(i, j):
sub_islands += 1
return sub_islands
The approach above is easier to explain if we generously add comments to the code:
class Solution:
def countSubIslands(self, grid1: List[List[int]], grid2: List[List[int]]) -> int:
def valid(row, col):
return 0 <= row < m and 0 <= col < n and grid2[row][col] == 1
def dfs(row, col):
# early return True if DFS hits out-of-bounds cell or non-island/water cell (0) in grid2
# Boundary cells: When the DFS reaches the boundary of the island (i.e., a cell outside the
# grid or a non-island/water cell), it should not cause the island to be disqualified as a sub-island
# Island check: The goal is to verify that all 1s in the connected component we're exploring in grid2
# correspond to 1s in grid1; the DFS should stop (return True) if it hits water or goes out of bounds
# because these conditions do not invalidate the sub-island
# Summary: Essentially, the "if not valid(row, col): return True" condition ensures the DFS only
# continues exploring valid island cells within grid2's bounds and stops when it reaches the edge
# of the island or goes out of bounds. Returning True here is the equivalent of saying, "This path
# is fine, keep checking the rest of the island."
if not valid(row, col):
return True
grid2[row][col] = 0 # mark cell as visited
is_subisland = grid1[row][col] == 1 # does the land cell in grid2 correspond to a land cell in grid1?
for dr, dc in dirs:
is_subisland &= dfs(row + dr, col + dc)
return is_subisland
m = len(grid2)
n = len(grid2[0])
dirs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
sub_islands = 0
for i in range(m):
for j in range(n):
if grid2[i][j] == 1: # start exploring an island in grid2
if dfs(i, j): # all land cells in grid2 correspond to land cells in grid1
sub_islands += 1 # (out of bounds and non-island/water cells do not invalidate sub-islands)
return sub_islands
Time: $O(m \cdot n)$. Let $m$ and $n$ represent the number of rows and columns, respectively. We iterate over all rows and columns in `grid2`, and each cell is visited a constant number of times.

Space: $O(m \cdot n)$. The main space cost is from the recursive call stack, which can grow to the size of the grid.
LC 947. Most Stones Removed with Same Row or Column ★★

On a 2D plane, we place `n` stones at some integer coordinate points. Each coordinate point may have at most one stone.

A stone can be removed if it shares either the same row or the same column as another stone that has not been removed.

Given an array `stones` of length `n` where `stones[i] = [xi, yi]` represents the location of the `i`th stone, return the largest possible number of stones that can be removed.
class Solution:
def removeStones(self, stones: List[List[int]]) -> int:
def build_graph(coords):
graph = defaultdict(list)
for i in range(len(coords)):
x1, y1 = coords[i]
for j in range(i + 1, len(coords)):
x2, y2 = coords[j]
if x1 == x2 or y1 == y2:
graph[(x1, y1)].append((x2, y2))
graph[(x2, y2)].append((x1, y1))
return graph
def dfs(row, col):
for neighbor in graph[(row, col)]:
if neighbor not in seen:
seen.add(neighbor)
dfs(*neighbor)
graph = build_graph(stones)
seen = set()
connected_components = 0
for stone in stones:
stone = tuple(stone)
if stone not in seen:
connected_components += 1
seen.add(stone)
dfs(*stone)
return len(stones) - connected_components
The solution editorial for this problem is quite good in terms of highlighting the core concepts needed for implementing the DFS approach:
- Connected components: Two stones are considered "connected" if they share a row or column, but this connection extends beyond just pairs of stones. If stone A is connected to stone B and stone B is connected to stone C, then all three stones form part of the same group, even if A and C don't directly share a row or column. This concept is akin to connected components in graph theory, where a connected component is a group of nodes where you can reach any node from any other node in the group.
- Calculating remaining stones: Since every stone in a connected component shares a row or column with at least one other stone, we can remove all but one stone. The remaining stone cannot be removed as it no longer shares coordinates with any other stone, having eliminated all others in its component. Therefore, if our 2D plane contains multiple connected components, each can be reduced to a single stone. The maximum number of stones that can be removed can be mathematically expressed as:

  Max removable stones = Total stones - Number of connected components
The implementation thus boils down to two parts:
- Represent the stones as a graph.
- Count the number of connected components in this graph.
For the first part, we can utilize an adjacency list, where for each stone, we maintain a list of all other stones it's connected to (i.e., shares a row or column with). Unfortunately, there does not seem to be a more efficient way of doing this than by means of an $O(n^2)$ approach that compares every pair of stones. We can use DFS for the second part, as shown above.

Time: $O(n^2)$. Let $n$ be the number of stones. We iterate over all pairs of stones.

Space: $O(n^2)$. All stones could be on the same row or column, in which case the adjacency list stores $O(n^2)$ entries.
DFS (iterative)
Remarks
Assume the nodes are numbered from `0` to `n - 1` and the graph is given as an adjacency list. Depending on the problem, you may need to convert the input into an equivalent adjacency list before using the templates.
def fn(graph):
stack = [START_NODE]
seen = {START_NODE}
ans = 0
while stack:
node = stack.pop()
# do some logic
for neighbor in graph[node]:
if neighbor not in seen:
seen.add(neighbor)
stack.append(neighbor)
return ans
Examples
TBD
Topological sort (Kahn's algorithm)
Number of paths between nodes in a DAG
A single-source shortest path (SSSP) algorithm tries to find the shortest path from a `start` node to all other nodes in the graph. We can make a similar assessment concerning the number of paths between nodes in a DAG: we want to find the number of paths from some `start` node to all other nodes in the DAG.
A template for doing so is as follows:
# T: O(V + E); S: O(V)
def num_paths(graph, start):
n = len(graph)
top_order = topological_sort(graph)
count = [0] * n
count[start] = 1
for node in top_order:
for neighbor in graph[node]:
count[neighbor] += count[node]
return count
This template works as follows:
- Perform topological sort: Use the topological sort template (i.e., Kahn's algorithm) to find a topological ordering of the nodes (this assumes our graph is a DAG).
- Initialize path count array: Create an array or map `count` where `count[target]` will store the number of paths from `start` to `target`, where `target` is an arbitrary node in the DAG. Initialize `count[start] = 1` since there is exactly one path from `start` to itself, and initialize `count[target] = 0` for all other nodes `target != start`.
- Process nodes in topological order: Iterate through each node `node` in the topologically sorted order. For each outgoing edge `[node, neighbor]`, update the path count for `neighbor` by adding the current path count of `node`: `count[neighbor] += count[node]`. Importantly, processing nodes in topological order ensures that by the time we process `neighbor`, all possible paths leading to `node` have already been accounted for.
- Retrieve result: After processing all nodes, `count` will contain the total number of distinct paths from `start` to every other node in the DAG. If we are only interested in the number of distinct paths from `start` to one specific node, `target`, then we can return `count[target]` instead of `count`.
As an example, consider the DAG
S → A → T
S → B → T
A → B
which can be represented as an adjacency list as follows (where `S`, `A`, `B`, and `T` are mapped to `0`, `1`, `2`, and `3`, respectively):
graph = {
0: [1, 2], # (S, A), (S, B)
1: [2, 3], # (A, B), (A, T)
2: [3], # (B, T)
3: []
}
Running the template code on this graph with `S` as the start node yields `[1, 1, 2, 3]`, which means there's `1` path from `S` to itself, `1` path from `S` to `A`, `2` paths from `S` to `B`, and `3` paths from `S` to `T`.
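As a quick sanity check of my own, assuming the `num_paths` template above and the `topological_sort` template from later in this section are both in scope:

```python
# 'graph' is the adjacency list defined just above
print(num_paths(graph, 0))  # [1, 1, 2, 3]
```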
SSSPs on DAGs (shortest and longest paths)
Given a `start` node, we can find single-source shortest paths in (weighted) DAGs by first generating a topological ordering of the nodes (e.g., using Kahn's algorithm, as provided in the template) and then relaxing the edges of all nodes in the graph, following the ordering, where both of these steps can be done in linear time proportional to the size of the graph:
# T: O(V + E); S: O(V + E)
def shortest_path_DAG(graph, start):
n = len(graph)
top_order = topological_sort(graph)
distances = [float('inf')] * n
distances[start] = 0
for node in top_order:
for neighbor, weight in graph[node]:
if distances[node] + weight < distances[neighbor]: # relax edge 'node -> neighbor' with 'weight'
distances[neighbor] = distances[node] + weight
return distances
The returned `distances` array will contain the length of the shortest path from `start` to each node. In the case of DAGs, since there are no cycles, we can also find single-source longest paths by simply negating the weight of each edge, finding the shortest path, and then negating the results:
# T: O(V + E); S: O(V + E)
def longest_path_DAG(graph, start):
n = len(graph)
top_order = topological_sort(graph)
distances = [float('inf')] * n
distances[start] = 0
for node in top_order:
for neighbor, weight in graph[node]:
weight = -weight
if distances[node] + weight < distances[neighbor]: # relax edge 'node -> neighbor' with 'weight'
distances[neighbor] = distances[node] + weight
    distances = [-distance for distance in distances]
return distances
In both cases, note that our original template code for producing a topological ordering needs to be slightly modified where we traverse neighboring nodes: the line `for neighbor in graph[node]:` needs to be changed to `for neighbor, _ in graph[node]:` since tuples are now stored in the graph instead of individual nodes (each node's edge weight must be recorded); a sketch of this modification appears below. Additionally, before trying to report a shortest or longest path in a DAG, we need to ensure a topological ordering was actually produced (i.e., no cycle was found for whatever graph we're considering).
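For concreteness, here is a sketch of that modification (the function name is my own; it is just the Kahn template from later in this section with the neighbor loops adjusted to unpack `(neighbor, weight)` tuples, and it would simply take the place of `topological_sort` in the shortest/longest path templates above):

```python
# T: O(V + E); S: O(V)
def topological_sort_weighted(graph):
    n = len(graph)                       # adjacency list of (neighbor, weight) tuples
    in_degree = [0] * n
    for node in range(n):
        for neighbor, _ in graph[node]:  # weights are irrelevant for in-degrees
            in_degree[neighbor] += 1
    deg_zero = [node for node in range(n) if in_degree[node] == 0]
    top_order = []
    while deg_zero:
        node = deg_zero.pop()
        top_order.append(node)
        for neighbor, _ in graph[node]:  # weights are irrelevant when peeling off nodes
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                deg_zero.append(neighbor)
    return top_order if len(top_order) == n else []  # [] signals a cycle was found
```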
William Fiset's video on shortest/longest paths in DAGs provides a nice example of a weighted DAG:
We can map the labelled nodes `A, ..., H` to `0, ..., 7`, respectively, whereby we can then represent the graph as follows:
graph = {
0: [(1, 3), (2, 6)],
1: [(2, 4), (3, 4), (4, 11)],
2: [(3, 8), (6, 11)],
3: [(4, -4), (5, 5), (6, 2)],
4: [(7, 9)],
5: [(7, 1)],
6: [(7, 2)],
7: []
}
Running the code snippets above for shortest and longest paths, where `A` is the start node, gives us the following:
# A B C D E F G H
[0, 3, 6, 7, 3, 12, 9, 11] # shortest path from node A to each node
[0, 3, 7, 15, 14, 20, 18, 23] # longest path from node A to each node
Inventing a topological sort algorithm (Kahn's algorithm)
The following observations and algorithms in pseudocode appear in Kahn's Algorithm for Topological Sorting video on the "Algorithms with Attitude" YouTube channel.
The linked video above does a fantastic job illustrating how one might "invent" a way to find a topological ordering for a graph by a series of observations and iterative improvements. The final observation and algorithm reflect the template for finding topological orderings.
Observation and algorithm (1):
- A vertex with no incoming edges can go first (this suggests the in-degree of graph nodes will play a part in developing an effective algorithm) — recall that a directed edge `A -> B` means `B` "depends" on `A` in the sense that `A` must come before `B` in whatever eventual topological ordering of the nodes we produce.
- A DAG must contain a vertex with no incoming edges or else we are dead in the water from the beginning (if no such node existed, then this would mean every node has a dependency and thus we do not have a sensible starting point); fortunately, by definition, a DAG has to have at least one node with no incoming edges (otherwise we would have a cycle).
- If you want to find a vertex with an in-degree of `0` (i.e., no incoming edges), then start at an arbitrary vertex and follow a path of incoming edges backwards:
  - If we get to a vertex with no incoming edges, then great!
  - If not, then once we've followed $|V|$ edges backwards in a graph with $|V|$ vertices, we must be repeating vertices, which means we must have a cycle.
The observations above make it possible for us to come up with a first stab at an algorithm:
Topsort(G)
while G isn't empty
find u in V with no incoming edges
put u next in the topological ordering
remove u and all its outgoing edges from G
Observation and algorithm (2):
The algorithm above works and its efficiency depends on implementation details. If we get the graph as an outgoing adjacency list (the usual graph representation for search algorithms we are often given or need to build), then finding a vertex with no incoming edges might take a while. And we do that once for each vertex we put in the ordering. To make it fast, maybe we don't need to do that from scratch after every vertex.
To start the algorithm in time linear in the size of the graph, we can create an incoming list for each vertex. With those incoming adjacency lists, we can find all vertices with no incoming edges, and those are the vertices that can go first. We can grab all of those vertices to make an ordered list of vertices that are ready to go, `A` and `C` in the sample graph in the video.
If we let `A` go first, then we use its outgoing adjacency list to figure out which edges to delete from the incoming adjacency lists of `B` and then `E`. When we delete `A`'s edge to vertex `E`, we see it is `E`'s last incoming edge, so we can add `E` to our growing ordered list. Finally, we delete `A` from `G`'s incoming list, and we finally delete `A` from the graph. We continue through the list this way, where each time we get to a vertex, we look at its outgoing edges, delete them from the corresponding incoming edge lists, and if one of them is the last incoming edge for a vertex, then we can add that vertex to our ordered list.
To summarize this second set of observations:
- There's no need to start from scratch each time:
  - Compute the incoming adjacency list just once to start.
  - Grab a set of all 0 in-degree vertices.
  - Find new 0 in-degree vertices as vertices are removed from the graph.
If we incorporate the new observations into pseudocode for an improved algorithm, then we will have something like the following:
Topsort(G)
create incoming edge adjacency list for each vertex (time linear in the graph size)
S = ordered list
add all vertices with no incoming edges to S (order V time)
while not done with S
consider next vertex u from S
put u next in the topological ordering
while removing u and its outgoing edges from G
if vertex v's incoming edge becomes empty
add v to S
This algorithm still follows the idea of the first one, but it's a bit more efficient in its implementation because it decreases some of the repeated work.
Observation and algorithm (3):
Because of some of the changes we just made to the algorithm, we don't need to delete the vertex `u` from the graph anymore. We just need to delete it from the incoming adjacency lists it appears in. We can imagine using those lists that we created at the start of the algorithm as working space, while the outgoing lists don't have to change at all.
Also, that first round of changes introduced a bit of clutter. We don't need to have one ordered list of vertices that are ready to go and another for our topological order, especially since both of them are in the same order. We can drop the extra list.
The observations above are really in service of cleaning things up a bit. To summarize:
- Remove vertices from incoming adjacency lists, not from the graph as a whole
- No need to keep two different ordered sets, with the same order
Topsort(G)
create incoming edge adjacency list for each vertex (time linear in the graph size)
S = ordered list
add all vertices with no incoming edges to S (order V time)
while not done with S
consider next vertex u from S
for each outgoing edge u -> v
remove u from v's incoming list
if v's incoming list is empty
add v to S
Observation and algorithm (4, final algorithm):
Can we do better? How do we really use those incoming lists? The only thing we use each incoming list for is to see if it's empty. We're just keeping these perfect lists, deleting exactly the right edge from each, in order to see if it's empty. We never use a list's contents, only its size. So how about we just track the size? Instead of incoming adjacency lists, track in-degrees and decrement them instead of deleting edges. If a vertex's in-degree goes to `0`, then add it to the list. That's the whole algorithm.
In summary: We don't care what the incoming edges are, only the in-degree.
Topsort(G)
find inDegree for each vertex
S = ordered list
add all vertices with no incoming edges to S
while not done with S
consider next vertex u from S
for each outgoing edge u -> v
decrement v's inDegree
if v's inDegree is 0
add v to S
Comments: Finding all in-degrees takes linear time in the size of the graph. Over the course of the entire algorithm, each vertex is added to the ordered list at most once, either at the start (if it has no incoming edges) or when its in-degree gets decremented to `0`. Each vertex in the list is considered at most once, and each edge from the vertex causes at most one constant-time decrement. So the algorithm takes time linear in the graph size.
Something to note when developing this algorithm: What if you had made those last changes to deal with in-degrees instead of incoming lists but kept the separate topological order alongside the ordered list?
Topsort(G)
find inDegree for each vertex
S = ordered list
add all vertices with no incoming edges to S
while not done with S
consider next vertex u from S
#highlight-next-line
put u next in the topological ordering # add this back
for each outgoing edge u -> v
decrement v's inDegree
if v's inDegree is 0
add v to S
The algorithm still runs in linear time. What if you replaced your ordered list with a queue or even a stack? It still works, and still in linear time. This version, where `S` is a stack, uses the order in which things were removed from the stack, but what if we used the order in which things went into the stack? It still works, and in linear time.
What about cycles?
What if the directed graph we run Kahn's algorithm on has a cycle? Then we won't find a topological ordering, which doesn't exist for a graph with cycles. But what happens if we run Kahn's algorithm anyway? The algorithm won't ever decrement the in-degree of any vertex in a cycle to `0`, and vertices reachable from cycles won't ever get to in-degree `0` either.
The list returned by the algorithm will only include vertices that aren't in or reachable from any cycle. We can just check the size of the returned list to see if we have the whole graph or not. If not, then it depends on why we were trying to find a topological ordering in the first place before we can figure out what we should do next.
# T: O(V + E); S: O(V + E)
def topological_sort(graph):
n = len(graph) # n-vertex graph provided as an adjacency list
in_degree = [0] * n # incoming degree for each node that will decrease as nodes
for node in range(n): # are 'peeled off' and placed in the generated topological order
for neighbor in graph[node]:
in_degree[neighbor] += 1
deg_zero = [] # nodes that have no incoming edges and are ready to be
for node in range(n): # 'peeled off' and placed in the topological ordering
if in_degree[node] == 0:
deg_zero.append(node)
top_order = []
while deg_zero:
node = deg_zero.pop() # 'peel off' node and
top_order.append(node) # place in topological order
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
deg_zero.append(neighbor)
if len(top_order) != n: # cycle exists if all nodes can't be 'peeled off'
return []
return top_order
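As a quick usage example of my own, running this template on the small DAG from the path-counting discussion earlier:

```python
graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: []}  # S, A, B, T mapped to 0, 1, 2, 3
print(topological_sort(graph))                 # [0, 1, 2, 3], i.e., S, A, B, T
```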
Examples
TBD
Topological sort (DFS)
Motivation for DFS-based algorithm for finding a topological sort
The motivation for the DFS-based algorithm for finding a topological sort, as described below, is largely inspired by this video. My goal here is mostly to provide more explanations and working code.
The idea behind Kahn's algorithm for finding a topological sort was to find a vertex that could go first and work from the beginning; of course, we could also try to find something that could go last and work our way towards the first item. To do that, we could modify Kahn's algorithm to work from the back, but that would really just be Kahn's algorithm again in a different way. It might be slightly slower than before because it would first have to convert the graph into an incoming edge adjacency list representation. That incoming vs. outgoing adjacency list is the only non-symmetrical difference between finding the ordering front to back or back to front.
Another way to find a vertex that can go last is to just run depth first search on the outgoing lists until we get to a vertex with no outgoing edges. That vertex can go last. We'll use the same graph from the example exploration in Kahn's algorithm:
But we can rearrange this graph to make the depth first search tree easier to see (and add space below for an ordering):
If we run depth first search alphabetically, then we will start with vertex `A`, and then we find vertex `B` along a tree edge (edges will be colored according to the color scheme used when first discussing DFS: tree edges (red), back edges (black), forward edges (blue), cross edges (green)). The vertex `B` has no outgoing edges, so it can go last in our topological order:
Our depth first search finishes `B` and returns to `A`, which has two other outgoing edges. If we want to, we can continue our depth first search looking for other vertices that have no outgoing edges, but let's stop for just a second and think about what the edges from vertex `A` actually mean. In the graph above, `A` has edges to `B`, `E`, and `G`, so `A` has to go before each of those vertices in the topological order. Even more than that, `A` has to go before any vertex that it can reach in the entire graph, either because it has a direct edge to them or because it has a longer path to them (e.g., vertex `D`). If `A` has to precede `E`, and `E` has to precede `D`, then `A` has to precede `D`. But what vertices are reachable from `A`? Exactly those that we will reach when we run depth first search on `A`. Vertex `A` can go in the topological order as long as it is before everything that is discovered in the course of a depth first search on it. If we continue the search, then we go to `E`, then `D`, and we see a cross edge to `B`; then we go to `F`, which has no outgoing edges, so `F` can go last (i.e., last out of the remaining vertices):
When depth first search returns to `D`, just like with vertex `A`, vertex `D` has to precede everything that it can reach in its depth first search. Continuing that search of `D`, we get to `G`, see that it has no outgoing edges, and then we put it after all vertices waiting to be finished:
Now when we return to `D`, all of its outgoing edges have been explored. Everything that has to follow it in the topological ordering, everything it can reach, `F` and `G`, has already secured a slot in the ordering after `D`. So `D` can go last out of all the unfinished vertices:
We continue our depth first search. `E` has a forward edge and then finishes, so we can put it in the latest unreserved slot:
Vertex `A` has a forward edge to `G` and then finishes, so we can put it in the latest open slot:
Whenever we finish a vertex, we put it in the latest remaining unreserved slot in our topological order. We continue the search on the entire graph, not just the top-level call on vertex `A`. The top-level search ignores `B` because it's already been searched, but `C` hasn't. When `C` finishes, it is added to the front of the order. Everything else is already explored, so all other top-level searches will just move on (i.e., when executing DFS on all nodes of the graph, starting from `A` gives us everything except `C`; once `C` is done, all other vertices will be skipped):
We can now admire our work:
If we repurpose work done in another note (the DFS note on its motivation and a non-recursive way of executing a DFS), then we can define the graph in this example (and a lookup for ease of reference)
graph = [
[1, 4, 6],
[],
[6],
[1, 5, 6],
[3, 6],
[],
[],
]
lookup = {
-1: ' ',
0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E',
5: 'F',
6: 'G',
}
and investigate the non-recursive DFS output:
"""
( A B C D E F G
[ 1, 2, 13, 5, 4, 6, 8], # discovered
[12, 3, 14, 10, 11, 7, 9], # finished
[-1, 0, -1, 4, 0, 3, 3], # predecessors
{ # edge classifications
('A', 'B'): 'treeEdge',
('A', 'E'): 'treeEdge',
('E', 'D'): 'treeEdge',
('D', 'B'): 'crossEdge',
('D', 'F'): 'treeEdge',
('D', 'G'): 'treeEdge',
('E', 'G'): 'forwardEdge',
('A', 'G'): 'forwardEdge',
('C', 'G'): 'crossEdge'
}
)
"""
It's worth noting that the topological sort perfectly aligns with the reverse order in which vertices were finished in the DFS:
# Topological order: C A E D G F B
# finish times
# A B C D E F G <- node correspondence
# [12, 3, 14, 10, 11, 7, 9] <- unordered finish times
# ...
# [14, 12, 11, 10, 9, 7, 3] <- reverse order of finish times
# C A E D G F B <- topological order
We can also confirm all of the edge classifications as they appear in the video:
At this point, it's worth asking: What exactly is the algorithm we used here to produce a topological ordering? It's really quite simple:
- Run depth first search on entire graph
- Set the topological order to be the reverse order of vertex finish times
That's it. Because the graph is acyclic, when depth first search on a vertex completes, everything that vertex can reach has been explored — as long as the vertex is before those things in the ordering, it's set. To topologically sort, you don't really need the start or finish times. You could modify depth first search to return the topological ordering directly.
The following graphic does a decent job of giving some intuition as to why this DFS algorithm works for producing a topological ordering when the graph is acyclic:
Specifically, using parenthesis notation and edge classification (both remarked on in the note for recursive DFS), notice that for tree edges, forward edges, and cross edges, the closing parenthesis for `v` comes before the closing parenthesis for `u`, which is what we want for all edges `u -> v`, since `u -> v` implies that `v` has `u` as a dependency that needs to be resolved, and thus `u` must come earlier in the ordering. For these three edge types, `v` will finish before `u`. So if we take vertices in reverse order of finish time, then any edge from `u` to `v` will put `v` after `u`, just like we need it to be.
The only problem is back edges, where vertex `u` would finish before `v`. But for depth first search on an acyclic graph, there are no back edges, since a back edge completes a cycle between a vertex and its ancestor. So don't worry about back edges (unless we're not guaranteed the input graph is acyclic, in which case we need to implement logic for cycle detection).
Two things are worth considering in regard to possible variations of the implementation we've been discussing:
- Subset of vertices: If we only want to consider vertices reachable from a subset of vertices, maybe even a single vertex, then simply modify the top-level depth first search to only search from that subset or that one vertex. The recursive part doesn't change.
- Non-recursive DFS: If we're possibly at risk of a stack overflow, then it may be beneficial to consider an iterative implementation of DFS instead of the recursive implementation.
Finally, it's important to note that if we run our recursive DFS algorithm for finding a topological ordering on a graph that does have cycles, then unlike Kahn's algorithm, our algorithm will return an order including all vertices. Of course, the order returned isn't a topological order (because that doesn't exist for a graph that has cycles), but it can still be helpful due to some special properties that can be exploited based on finishing times. Kosaraju's algorithm for finding strongly connected components (SCCs) makes use of such a non-topological order.
Revised template when input is guaranteed to be a DAG
If it is guaranteed that the input graph is directed and acyclic (i.e., a DAG), then we do not need to track the visiting status of different nodes because a cycle will not be possible by definition:
# T: O(V + E); S: O(V)
def topsort(graph):
n = len(graph)
top_order = []
visited = [False] * n
def dfs(node):
visited[node] = True
for nbr in graph[node]:
if not visited[nbr]:
dfs(nbr)
top_order.append(node)
for node in range(n):
if not visited[node]:
dfs(node)
    return top_order[::-1]  # reverse finish-time order to obtain the topological order
The code above is simply streamlined for contexts where we know for certain that the input graph does not have a cycle. If the input is not guaranteed to be acyclic, then we need to track each node's visiting status so that back edges (i.e., cycles) can be detected:
# T: O(V + E); S: O(V)
def topsort(graph):
n = len(graph) # graph assumed to be provided as an adjacency list of index arrays
top_order = []
visited = [0] * n # 0: unvisited; 1: visiting; 2: visited
def dfs(node):
visited[node] = 1 # mark as visiting
for nbr in graph[node]:
if visited[nbr] == 0: # visit unvisited neighbor
if not dfs(nbr):
return False
elif visited[nbr] == 1: # found a back edge: graph has a cycle
return False
visited[node] = 2 # mark as visited
top_order.append(node) # add node after all descendants have been visited
return True
for node in range(n):
if not visited[node]:
if not dfs(node): # topological sort not possible
return [] # (return empty list)
    return top_order[::-1]  # reverse finish-time order to obtain the topological order
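And a small usage sketch of my own, exercising both the success case and the cycle case:

```python
dag = [[1, 2], [2, 3], [3], []]  # 0 -> {1, 2}, 1 -> {2, 3}, 2 -> {3}
print(topsort(dag))              # [0, 1, 2, 3]

cyclic = [[1], [0]]              # 0 -> 1 -> 0 forms a cycle
print(topsort(cyclic))           # [] (no topological ordering exists)
```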
Examples
TBD
Dijkstra (lazy)
How Dijkstra and BFS are similar but strikingly different
William Fiset's YouTube video on Dijkstra is a gem and well worth watching.
Interviewing IO fruitfully compares BFS to Dijkstra, highlighting the differences and explaining the motivation (paraphrasing to follow): BFS is easier than Dijkstra because we encounter the nodes ordered by distance (with BFS, it's as if we have a weighted graph where all edge weights equal `1`). Hence, in BFS, we assign the correct distance to nodes as soon as we reach them for the first time (as a neighbor of the current node), and this happens when we add them to the queue. Thus, in BFS, every node in the queue already has the correct shortest distance. This is not the case in Dijkstra; specifically, in lazy Dijkstra, which is the approach most people use, we may find some initial path to a node and later on find a shorter path.
Concrete example of how we can find some initial path to a node and later on find a shorter path
The following screenshot is from William Fiset's linked video above (at 7:21):
The small graph example above shows there are two ways to get from node `0` (the source node) to node `1`:

- `0 -> 1`: distance of `4`
- `0 -> 2 -> 1`: distance of `1 + 2 = 3`

The second path is shorter even though it has more edges. Why does this matter? Because of how the priority queue is being used and maintained. Specifically, we add `(0, 0)` as the first element to the priority queue to indicate that we plan to visit node `0` with a best distance of `0`. Then the algorithm actually starts and we look inside the priority queue for the first time and discover we should visit node `0`. What nodes should we visit after visiting node `0`? As with BFS and DFS, Dijkstra is a search algorithm; specifically, Dijkstra is a search algorithm for the shortest path to each node in a graph from a given source node — we conduct our search by visiting neighbors; that is, we will next want to visit either node `1` or node `2`, with new best distances `4` and `1`, respectively (both of these distances are significantly less than infinity!).
At this point, we've visited all the nodes neighboring node `0`. Our priority queue started with just the lone element `(0, 0)`, which we popped from the priority queue in order to visit all neighboring nodes of node `0`. We visited nodes `1` and `2` in the process, adding `(1, 4)` and `(2, 1)` to the priority queue, respectively. We're now done visiting all neighbors of node `0`, so which node should we visit next?
Dijkstra's algorithm always selects the next most promising node in the priority queue. To do this, simply poll the next best key-value pair from the priority queue, which is `(2, 1)` in this case because the distance `1` in `(2, 1)` is less than the distance `4` in `(1, 4)`. So we pop `(2, 1)` from the priority queue and plan to visit the neighboring nodes of node `2`, namely nodes `1` and `3`. At this point, our priority queue looks something like the following:
~~(0, 0)~~ (1, 4) -> (2, 1) (1, 3) (3, 6)
The struck-through tuple `(0, 0)` indicates that node `0` has been fully processed. We've also updated the distance array for its neighboring nodes as we added them to the priority queue. Then we popped `(2, 1)` from the priority queue as it was the most promising node; the indicator `-> (2, 1)` means node `2` is currently being processed. In the midst of processing node `2`, we added the tuples `(1, 3)` and `(3, 6)` to the priority queue, and we updated the distances array as well, which currently looks like the following:
0 1 2 3 4 # <- node index
[ 0, 3, 1, 6, inf ] # <- distances
Now node `2` has been fully processed and our priority queue looks as follows:
~~(0, 0)~~ (1, 4) ~~(2, 1)~~ -> (1, 3) (3, 6)
This means the next most promising node is node `1`. From node `1`, we can visit node `3` for a cumulative distance of `4` from the source node; hence, we add `(3, 4)` to the priority queue, and then we mark `(1, 3)` as processed:
~~(0, 0)~~ (1, 4) ~~(2, 1)~~ ~~(1, 3)~~ (3, 6) (3, 4)
What node is most promising in the priority queue now? It's `(1, 4)`. But we have already found a better route to get to node `1` since `distances[1]` has a value of `3`, which is less than `4`. Hence, we can ignore this entry in the priority queue.
The example above shows exactly how we can find some initial path to a node (`0 -> 1`) and later on find a shorter path (`0 -> 2 -> 1`). Dijkstra is all about the shortest path to a node being found once that node has been extracted from the priority queue for the first time (e.g., `(1, 3)` was extracted from the priority queue first, and then `(1, 4)` was extracted and ignored).
In Dijkstra, we do not know the real shortest distance to a node until it is extracted from the priority queue, not when it is simply added to the priority queue (as illustrated in the concrete example above).
In eager Dijkstra, we update the priority (i.e., distance) of the nodes in the priority queue when we find a shorter path. In lazy Dijkstra, we simply add the node again with the new priority without removing the previous occurrence of that node in the PQ. Hence, in lazy Dijkstra, nodes can appear multiple times in the priority queue with different priorities. This is not a problem. Why? Because we only care about the first time we extract a node from the priority queue. That first time gives us the shortest distance from the source node to the node just popped from the priority queue. The second time (and any subsequent times) the same node is popped from the priority queue, we simply discard the node as soon as we extract it by means of the following lines from the template:
# ...
curr_dist, node = heapq.heappop(min_heap)
if curr_dist > distances[node]:
continue
# ...
Dijkstra with a target/destination node
If we only care about the distance or shortest path from the `source` to a specific node, `target`, then we can halt the algorithm with an early termination as soon as we extract the target node from `min_heap` (not as soon as we add the target node to `min_heap`). In Dijkstra, unlike in BFS, we only know that we have found the shortest distance to a node once we have extracted it for the first time from the priority queue, `min_heap` in the case of the template:
# T: O(E log V); S: O(V + E)
def dijkstra(graph, source, target):
n = len(graph) # Dijkstra on graph with n nodes
distances = [float('inf')] * n # "infinitely" far from source (unvisited nodes)
distances[source] = 0
min_heap = []
heapq.heappush(min_heap, (0, source)) # heap contents: (d(v, source), v), where
# d gives a distance from source node
# to node v, another node in graph
while min_heap:
curr_dist, node = heapq.heappop(min_heap)
if node == target:
return curr_dist
if curr_dist > distances[node]: # optimization for lazy Dijkstra: ignore current path
continue # if we already found a better one (i.e., node was previously
# extracted from min_heap with a smaller distance)
for neighbor, weight in graph[node]:
dist = curr_dist + weight
if dist < distances[neighbor]: # add neighbor to min_heap if it creates a shorter path
distances[neighbor] = dist
heapq.heappush(min_heap, (dist, neighbor))
return float('inf')
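As a quick usage sketch of the early-termination version (the graph below is a hypothetical adjacency list mirroring the walkthrough example above, not part of the template):

```python
import heapq

# 0 -> 1 costs 4, 0 -> 2 costs 1, 2 -> 1 costs 2, 2 -> 3 costs 5, 1 -> 3 costs 1
graph = [
    [(1, 4), (2, 1)],  # node 0
    [(3, 1)],          # node 1
    [(1, 2), (3, 5)],  # node 2
    [],                # node 3
]

print(dijkstra(graph, 0, 1))  # 3 -- found via 0 -> 2 -> 1, not the direct edge 0 -> 1
```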
Dijkstra with shortest path reconstruction
Dijkstra finds the shortest path from a source node to all other nodes in a graph. If we want to reconstruct the shortest paths themselves, then we need to compute the predecessors of each node in the shortest path tree. A small modification to the template is needed:
# T: O(E log V); S: O(V + E)
def dijkstra(graph, source):
n = len(graph) # Dijkstra on graph with n nodes
distances = [float('inf')] * n # "infinitely" far from source (unvisited nodes)
distances[source] = 0
pred = [None] * n
min_heap = []
heapq.heappush(min_heap, (0, source)) # heap contents: (d(v, source), v), where
# d gives a distance from source node
# to node v, another node in graph
while min_heap:
curr_dist, node = heapq.heappop(min_heap)
if curr_dist > distances[node]: # optimization for lazy Dijkstra: ignore current path
continue # if we already found a better one (i.e., node was previously
# extracted from min_heap with a smaller distance)
for neighbor, weight in graph[node]:
dist = curr_dist + weight
if dist < distances[neighbor]: # add neighbor to min_heap if it creates a shorter path
distances[neighbor] = dist
pred[neighbor] = node
heapq.heappush(min_heap, (dist, neighbor))
return distances, pred
All that remains is to trace the predecessors back from any given node to the source and then reverse that path to get the shortest path from source to destination.
# T: O(E log V); S: O(V + E)
def dijkstra_shortest_path(graph, source, target):
distances, predecessors = dijkstra(graph, source)
path = []
# if target node is not accessible from source node,
# then return an empty path
if distances[target] == float('inf'):
return path
# traverse the previous nodes to build the path
node = target
while node is not None:
path.append(node)
node = predecessors[node]
# reverse the path to get the correct order from start to end
path.reverse()
return path
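For instance, using the same hypothetical graph as in the earlier sketch, the reconstruction returns the actual node sequence (this assumes the predecessor-returning `dijkstra` defined above):

```python
import heapq

graph = [
    [(1, 4), (2, 1)],  # node 0
    [(3, 1)],          # node 1
    [(1, 2), (3, 5)],  # node 2
    [],                # node 3
]

print(dijkstra_shortest_path(graph, 0, 3))  # [0, 2, 1, 3] (total weight 1 + 2 + 1 = 4)
```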
# T: O(E log V); S: O(V + E)
def dijkstra(graph, source):
n = len(graph) # Dijkstra on graph with n nodes (assumed to be adjacency list)
distances = [float('inf')] * n # "infinitely" far from source (unvisited nodes)
distances[source] = 0
min_heap = []
heapq.heappush(min_heap, (0, source)) # heap contents: (d(v, source), v), where
# d gives a distance from source node
# to node v, another node in graph
while min_heap:
curr_dist, node = heapq.heappop(min_heap)
if curr_dist > distances[node]: # optimization for lazy Dijkstra: ignore current path
continue # if we already found a better one (i.e., node was previously
# extracted from min_heap with a smaller distance)
for neighbor, weight in graph[node]:
dist = curr_dist + weight
if dist < distances[neighbor]: # add neighbor to min_heap if it creates a shorter path
distances[neighbor] = dist
heapq.heappush(min_heap, (dist, neighbor))
return distances
Examples
LC 743. Network Delay Time (✓)
You are given a network of `n` nodes, labeled from `1` to `n`. You are also given `times`, a list of travel times as directed edges `times[i] = (ui, vi, wi)`, where `ui` is the source node, `vi` is the target node, and `wi` is the time it takes for a signal to travel from source to target.

We will send a signal from a given node `k`. Return the minimum time it takes for all the `n` nodes to receive the signal. If it is impossible for all the `n` nodes to receive the signal, return `-1`.
class Solution:
def networkDelayTime(self, times: List[List[int]], n: int, k: int) -> int:
def build_graph(edge_list):
graph = defaultdict(list)
for source, destination, time in edge_list:
graph[source].append((destination, time))
return graph
graph = build_graph(times)
time_cost = [float('inf')] * (n + 1)
time_cost[0] = 0
time_cost[k] = 0
min_heap = [(0, k)]
while min_heap:
curr_time, node = heapq.heappop(min_heap)
if curr_time > time_cost[node]:
continue
for neighbor, time in graph[node]:
new_time = curr_time + time
if new_time < time_cost[neighbor]:
time_cost[neighbor] = new_time
heapq.heappush(min_heap, (new_time, neighbor))
min_time = 0
for time in time_cost:
if time == float('inf'):
return -1
min_time = max(min_time, time)
return min_time
In some ways, this is sort of a quintessential Dijkstra problem. We need to minimize the travel time from the source node, `k`, to all other nodes. Since nodes are numbered `1` through `n`, inclusive, we pretend as if there are `n + 1` nodes in order to simplify the mechanics; we subsequently set the time cost for the dummy node and the source node to `0` (i.e., nodes `0` and `k`, respectively).
Time: TBD
Space: TBD
LC 787. Cheapest Flights Within K Stops (✓) ★★★
There are `n` cities connected by `m` flights. Each flight starts from city `u` and arrives at `v` with a price `w`.

Now given all the cities and flights, together with starting city `src` and the destination `dst`, your task is to find the cheapest price from `src` to `dst` with up to `k` stops. If there is no such route, output `-1`.
class Solution:
def findCheapestPrice(self, n: int, flights: List[List[int]], src: int, dst: int, k: int) -> int:
def build_graph(edges):
graph = defaultdict(list)
for start, end, cost in edges:
graph[start].append((end, cost))
return graph
graph = build_graph(flights)
cost = [[float('inf')] * (k + 2) for _ in range(n)] # cheapest cost for node by jumps
cost[src][0] = 0 # 0 cost for 0th jump to starting point
min_heap = [(0, 0, src)] # (cost, stops, node)
while min_heap:
curr_cost, curr_stops, node = heapq.heappop(min_heap)
if node == dst:
return curr_cost
if curr_stops > k:
continue
for neighbor, price in graph[node]:
new_cost = curr_cost + price
new_stops = curr_stops + 1
if new_cost < cost[neighbor][new_stops]:
cost[neighbor][new_stops] = new_cost
heapq.heappush(min_heap, (new_cost, new_stops, neighbor))
return -1
The main twist for Dijkstra in this problem is that the cheapest path to any given node may be valid or invalid based on the number of stops required to get there. We need to handle cheapest path determinations carefully. The "normal" Dijkstra algorithm would suffice if we had no limit on stops (i.e., if we allowed an infinite number of stops) because the problem would simply reduce to a traditional shortest path problem. But when there is a limit on the number of stops, `k`, the problem becomes more complex because it's not just about finding the shortest path but the shortest path within a constrained number of stops. This constraint fundamentally changes the nature of the problem (and hence the nature of our Dijkstra implementation): we can no longer simply use the shortest distance globally (i.e., just using a `distances` array); we need to keep the best possible distance for each node considering the number of stops used to get there.
This means we need to use a data structure that can record both the cumulative total cost to get to a node as well as the number of stops involved. In the solution above, we use the following structure:
cost = [[float('inf')] * (k + 2) for _ in range(n)]
In the context of the flight problem, a "stop" means an intermediate node visited on the path from the source to the destination, not the act of moving from one node to another. Hence, if our path were `0 -> 1 -> 2`, then there is only one "stop" even though three nodes are involved. If we have `n` nodes and `k` stops, then `cost` will look like
cost = [
# [ before first stop, first stop, ..., last stop, after last stop ] # node 0
# [ before first stop, first stop, ..., last stop, after last stop ] # node 1
# ...
# [ before first stop, first stop, ..., last stop, after last stop ] # node n - 1
]
where each of the `n` entries is `k + 2` units long (with all default values equal to `inf`). We fill the `cost` array for each node, where each of the `k + 2` slots indicates the minimum cost for reaching that node in that many stops. Hence, the source node will have a cost of `0` in its first slot because we begin at that point with no associated cost. The first slot for all other nodes is `inf` because there's no way to reach such nodes before the first stop.
For the sake of optimization, as soon as the destination node is popped from the priority queue, we should return `curr_cost` because this cost is guaranteed to represent the cheapest path to the destination node.
Time: TBD.
Space: TBD.
The following solution is also a viable way of approaching this problem:
class Solution:
def findCheapestPrice(self, n: int, flights: List[List[int]], src: int, dst: int, k: int) -> int:
def build_graph(edge_list):
graph = defaultdict(list)
for source, destination, price in edge_list:
graph[source].append((destination, price))
return graph
graph = build_graph(flights)
min_heap = [(0, 0, src)] # cost, stops, node
costs = {(src, 0): 0} # (node, num_stops): cost ... min cost to get to node in num_stops stops
while min_heap:
curr_cost, curr_stops, node = heapq.heappop(min_heap)
if node == dst:
return curr_cost
if curr_stops <= k:
for neighbor, price in graph[node]:
new_cost = curr_cost + price
new_stops = curr_stops + 1
if (neighbor, new_stops) not in costs or new_cost < costs[(neighbor, new_stops)]:
costs[(neighbor, new_stops)] = new_cost
heapq.heappush(min_heap, (new_cost, new_stops, neighbor))
return -1
Time: TBD
Space: TBD
LC 1514. Path with Maximum Probability ★★
You are given an undirected weighted graph of `n` nodes (0-indexed), represented by an edge list where `edges[i] = [a, b]` is an undirected edge connecting the nodes `a` and `b` with a probability of success of traversing that edge `succProb[i]`.

Given two nodes `start` and `end`, find the path with the maximum probability of success to go from `start` to `end` and return its success probability.

If there is no path from `start` to `end`, return `0`. Your answer will be accepted if it differs from the correct answer by at most `1e-5`.
class Solution:
def maxProbability(self, n: int, edges: List[List[int]], succProb: List[float], start_node: int, end_node: int) -> float:
def build_graph(edge_list):
adj_list = defaultdict(list)
for i in range(len(edge_list)):
start, end = edge_list[i]
edge_weight = succProb[i]
adj_list[start].append((end, edge_weight))
adj_list[end].append((start, edge_weight))
return adj_list
graph = build_graph(edges)
max_prob = [0.0] * n
max_prob[start_node] = 1.0
max_heap = [(-1.0, start_node)]
while max_heap:
curr_prob, node = heapq.heappop(max_heap)
curr_prob = -curr_prob
if node == end_node:
return curr_prob
if curr_prob < max_prob[node]:
continue
for neighbor, prob in graph[node]:
new_prob = curr_prob * prob
if new_prob > max_prob[neighbor]:
max_prob[neighbor] = new_prob
heapq.heappush(max_heap, (-new_prob, neighbor))
return 0
Since Python's `heapq` only provides a min heap, we simulate a max heap by pushing the negated path probability for each node onto the min heap. We simply need to be careful to handle the sign of the probability consistently (i.e., negate when pushing onto and popping from the heap).
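To make the negation trick concrete, here is a tiny standalone sketch (the probabilities are arbitrary):

```python
import heapq

max_heap = []
for prob in (0.5, 0.9, 0.25):
    heapq.heappush(max_heap, -prob)  # store negated values in Python's min heap

print(-heapq.heappop(max_heap))  # 0.9 -- the largest probability surfaces first
print(-heapq.heappop(max_heap))  # 0.5
```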
class Solution:
    def maxProbability(self, n: int, edges: List[List[int]], succProb: List[float], start_node: int, end_node: int) -> float:
        def build_graph(edge_list):
            adj_list = defaultdict(list)
            for i in range(len(edge_list)):
                start, end = edge_list[i]
                edge_weight = -math.log(succProb[i]) # -log of the probability: a non-negative edge weight
                adj_list[start].append((end, edge_weight))
                adj_list[end].append((start, edge_weight))
            return adj_list

        graph = build_graph(edges)
        dist = [float('inf')] * n # dist[v]: minimum sum of -log(edge probability) from start_node to v
        dist[start_node] = 0
        min_heap = [(0, start_node)] # an ordinary min heap now (no negation trick needed)
        while min_heap:
            curr_dist, node = heapq.heappop(min_heap)
            if curr_dist > dist[node]:
                continue
            for nei_node, weight in graph[node]:
                new_dist = curr_dist + weight
                if new_dist < dist[nei_node]:
                    dist[nei_node] = new_dist
                    heapq.heappush(min_heap, (new_dist, nei_node))
        return math.exp(-dist[end_node]) # convert the minimized -log-probability back to a probability
The solution above uses logarithms to increase precision. It is largely based on observations highlighted in this solution comment:
I am surprised to see such a long editorial (three different algorithms with a lot of what sounds like chat GPT generated explanations of the code and a plethora of diagrams) that at no point attempts to justify why using these algorithms give the correct answer for the problem.
"We need to find the path from start to end that has the largest product of its edges" - fine
"BFS is an algorithm I know but it's not used on graphs with weighted edges, let's use Dijkstra instead because that one works on weighted graphs" - cool story bro, but Dijkstra is a shortest path algorithm, not a largest product algorithm...
For anyone who is curious why it works and hasn't worked it out themselves:
- Call $P = p_1 p_2 \cdots p_k$ the product of the edges in a path.
- We want to find the path with maximum $P$.
- Since the logarithm is a monotonically growing function, the path with the largest $P$ is also the path with the largest $\log P$ (and the smallest $-\log P$).
- Due to the properties of the logarithm, $\log P = \log p_1 + \log p_2 + \cdots + \log p_k$.
- Negating both sides gives us the following: $-\log P = (-\log p_1) + (-\log p_2) + \cdots + (-\log p_k)$.
- In summary: maximizing $P$, the explicit goal of the problem, is equivalent to minimizing $-\log P$, which is just the sum of the negative logarithms of the edge weights. This equivalent modified problem IS a shortest path problem.
- Furthermore, since $0 < p_i \le 1$, that means $\log p_i \le 0$ and $-\log p_i \ge 0$. Non-negative edges, a requirement for Dijkstra to work properly.
- So yeah, TL;DR: "weighted graph, can't be BFS, let's use Dijkstra" is dumb, but since edge weights are probabilities, finding the maximum product is equivalent to a shortest path problem with non-negative edge weights, and Dijkstra "just works"
ChatGPT sheds some light on why we might want to use logarithms for certain problems:
- Problem: When calculating the product of many terms, especially in probabilities (e.g., in Bayesian inference or machine learning), the product can become extremely small, leading to underflow.
- Solution: Take the logarithm of each term, sum these logarithms, and then exponentiate the result. This approach is particularly useful in algorithms like the Viterbi algorithm or when dealing with partition functions in statistical mechanics.
The equality we take advantage of is as follows:

$$-\log(p_1 p_2 \cdots p_k) = -\log p_1 - \log p_2 - \cdots - \log p_k$$
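To see why working in log space matters numerically, consider a minimal sketch (the path length and edge probabilities are made up):

```python
import math

probs = [0.05] * 400  # a hypothetical path with 400 edges

product = 1.0
for p in probs:
    product *= p  # 0.05 ** 400 is about 1e-521, below the smallest positive float
print(product)    # 0.0 -- the product underflows

neg_log_sum = sum(-math.log(p) for p in probs)
print(neg_log_sum)  # ~1198.29 -- easily representable, and still comparable across paths
```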
Bellman-Ford
Intuition for Bellman-Ford algorithm
Bellman-Ford, despite its innocent and simplistic appearance on the surface, can be rather difficult to learn at first. As a popular video on Bellman-Ford notes:
Of all the shortest path algorithms in graph theory, Bellman-Ford is definitely one of the simplest; yet, I struggled as an undergrad student trying to learn this algorithm, which is part of the reason I'm making this video.
Part of the struggle may be rooted in how this algorithm is taught/introduced in several textbooks. It is informative to look at expositions on this algorithm in [17] (DPV) and [21] (CLRS). Specifically, we look at the introductions and examples provided in both texts, test the examples with our template, and then return to unearth a key revelation that is likely the source of struggle for many people first learning this algorithm.
Description and example in DPV
The exposition that follows is based on what appears in [17] (some parts have been modified so as to ensure these remarks are self-contained).
Dijkstra's algorithm [yes, Dijkstra's algorithm is covered before Bellman-Ford in DPV] works in part because the shortest path from the starting point $s$ to any node $v$ must pass exclusively through nodes that are closer than $v$. This no longer holds when edge lengths can be negative. In the figure below, the shortest path from $S$ to $A$ passes through $B$, a node that is further away! [Comically, one of the edge weights in the printed figure is clearly an error.]
What needs to be changed in order to accommodate this new complication? To answer this, let's take a particular high-level view of Dijkstra's algorithm. A crucial invariant is that the `dist` values it maintains are always either overestimates or exactly correct. They start off at $\infty$, and the only way they ever change is by updating along an edge:
PROCEDURE UPDATE((u, v) in E)
dist(v) = min{dist(v), dist(u) + l(u, v)}
This update operation is simply an expression of the fact that the distance to $v$ cannot possibly be more than the distance to $u$, plus $l(u, v)$. It has the following properties.

- It gives the correct distance to $v$ in the particular case where $u$ is the second-last node in the shortest path to $v$, and $\text{dist}(u)$ is correctly set.
- It will never make $\text{dist}(v)$ too small, and in this sense it is safe. For instance, a slew of extraneous `update`'s can't hurt.

This operation is extremely useful: it is harmless, and if used carefully, will correctly set distances. In fact, Dijkstra's algorithm can be thought of simply as a sequence of `update`'s. We know this particular sequence doesn't work with negative edges, but is there some other sequence that does? To get a sense of the properties this sequence must possess, let's pick a node $t$ and look at the shortest path to it from $s$.
This path can have at most $|V| - 1$ edges. Why? The answer is probably more nuanced than it seems at first. The following explanation is given in [21]:
Can a shortest path contain a cycle? As we have just seen, it cannot contain a negative-weight cycle. Nor can it contain a positive-weight cycle, since removing the cycle from the path produces a path with the same source and destination vertices and a lower path weight. That is, if $p = \langle v_0, v_1, \ldots, v_k \rangle$ is a path and $c = \langle v_i, v_{i+1}, \ldots, v_j \rangle$ is a positive-weight cycle on this path (so that $v_i = v_j$ and $w(c) > 0$), then the path $p' = \langle v_0, v_1, \ldots, v_i, v_{j+1}, \ldots, v_k \rangle$ has weight $w(p') = w(p) - w(c) < w(p)$, and so $p$ cannot be a shortest path from $v_0$ to $v_k$.
That leaves only 0-weight cycles. You can remove a 0-weight cycle from any path to produce another path whose weight is the same. Thus, if there is a shortest path from a source vertex $s$ to a destination vertex $v$ that contains a 0-weight cycle, then there is another shortest path from $s$ to $v$ without this cycle. As long as a shortest path has 0-weight cycles, you can repeatedly remove these cycles from the path until you have a shortest path that is cycle-free. Therefore, without loss of generality, assume that shortest paths have no cycles, that is, they are simple paths.
Since any acyclic path in a graph contains at most $|V|$ distinct vertices [i.e., a path passing through every vertex], it also contains at most $|V| - 1$ edges. Assume, therefore, that any shortest path contains at most $|V| - 1$ edges.
The last paragraph above is the key takeaway. Returning to our description of the shortest path image: the shortest path from $s$ to $t$ can have at most $|V| - 1$ edges per the excerpt above. If the sequence of updates performed includes $(s, u_1), (u_1, u_2), (u_2, u_3), \ldots, (u_{k-1}, t)$, in that order (though not necessarily consecutively), then by the first property the distance to $t$ will be correctly computed. It doesn't matter what other updates occur on these edges, or what happens in the rest of the graph, because updates are safe.
But still, if we don't know all the shortest paths beforehand, how can we be sure to update the right edges in the right order? Here is an easy solution: simply update all the edges, $|V| - 1$ times! The resulting procedure is called the Bellman-Ford algorithm and is shown below in pseudocode:
PROCEDURE ShortestPaths(G, l, s)
Input: Directed graph G = (V, E)
edge lengths {l_e : e in E} with no negative cycles.
Output: For all vertices u reachable from s, dist(u) is set
to the distance from s to u
for all u in V:
dist(u) = +infty
prev(u) = nil
dist(s) = 0
repeat |V| - 1 times:
for all e in E:
update(e)
An example run of the algorithm is shown below:
A note about implementation: for many graphs, the maximum number of edges in any shortest path is substantially less than $|V| - 1$, with the result that fewer rounds of updates are needed. Therefore, it makes sense to add an extra check to the shortest-path algorithm, to make it terminate immediately after any round in which no update occurred.
What about negative cycles? If the length of edge $(E, B)$ in the figure above were changed to $-4$, then the graph would have a negative cycle $A \to E \to B \to A$. In such situations, it doesn't make sense to even ask about shortest paths. There is a path of length 2 from $A$ to $E$. But going round the cycle, there's also a path of length 1, and going round multiple times, we find paths of lengths 0, $-1$, $-2$, and so on.
The shortest-path problem is ill-posed in graphs with negative cycles. As might be expected, our algorithm in the pseudocode above (i.e., Bellman-Ford) works only in the absence of such cycles. But where did this assumption appear in the derivation of the algorithm? Well, it slipped in when we asserted the existence of a shortest path from $s$ to $t$.
Fortunately, it is easy to automatically detect negative cycles and issue a warning. Such a cycle would allow us to endlessly apply rounds of `update` operations, reducing `dist` estimates every time. So instead of stopping after $|V| - 1$ iterations, perform one extra round. There is a negative cycle if and only if some `dist` value is reduced during this final round.
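Here's a minimal sketch of that detection idea using the same adjacency-list convention as our template (the template code used below performs an equivalent check, returning `False` instead):

```python
def has_negative_cycle(graph, start):
    n = len(graph)
    distances = [float('inf')] * n
    distances[start] = 0
    for _ in range(n - 1):  # the usual |V| - 1 rounds of relaxing every edge
        for node in range(n):
            for neighbor, weight in graph[node]:
                if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
                    distances[neighbor] = distances[node] + weight
    for node in range(n):  # one extra round: any further improvement
        for neighbor, weight in graph[node]:  # means a negative cycle is reachable
            if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
                return True
    return False
```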
Testing the DPV example with the template
To effectively test our template code, first convert the example graph into an adjacency list:
Map the letter nodes to their numeric equivalent:
S: 0
A: 1
B: 2
C: 3
D: 4
E: 5
F: 6
G: 7
Then we get the following adjacency list representation of the graph above:
graph = {
0: [(1, 10), (7, 8)],
1: [(5, 2)],
2: [(1, 1), (3, 1)],
3: [(4, 3)],
4: [(5, -1)],
5: [(2, -2)],
6: [(1, -4), (5, -1)],
7: [(6, 1)]
}
Let's now use a modified version of the template code where comments have been removed and we've added a `print` statement to show the `distances` array after each iteration of relaxing all edges:
def bellman_ford(graph, start):
n = len(graph)
distances = [float('inf')] * n
distances[start] = 0
predecessors = [None] * n
for _ in range(n - 1):
edge_updated = False
for node in range(n):
for neighbor, weight in graph[node]:
if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
edge_updated = True
distances[neighbor] = distances[node] + weight
predecessors[neighbor] = node
print(distances)
if not edge_updated:
return distances, predecessors
for node in range(n):
for neighbor, weight in graph[node]:
if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
return False
return distances, predecessors
graph = {
0: [(1, 10), (7, 8)],
1: [(5, 2)],
2: [(1, 1), (3, 1)],
3: [(4, 3)],
4: [(5, -1)],
5: [(2, -2)],
6: [(1, -4), (5, -1)],
7: [(6, 1)]
}
bellman_ford(graph, 0)
We get the following printed to the console (the initial `distances` array, before any iteration has taken place, has been included for the sake of clarity):
# S A B C D E F G
[ 0, inf, inf, inf, inf, inf, inf, inf ] # after iteration 0 (initial distances array)
[ 0, 10, 10, inf, inf, 12, 9, 8 ] # after iteration 1
[ 0, 5, 10, 11, 14, 8, 9, 8 ] # after iteration 2
[ 0, 5, 5, 11, 14, 7, 9, 8 ] # after iteration 3
[ 0, 5, 5, 6, 9, 7, 9, 8 ] # after iteration 4
[ 0, 5, 5, 6, 9, 7, 9, 8 ] # after iteration 5 (no edges updated, terminate early)
If we're first learning about Bellman-Ford, then the output above is probably very confusing. The end result, namely the final `distances` array, is the same as in the book's example: `[0, 5, 5, 6, 9, 7, 9, 8]`. But everything else (i.e., the intermediate results) is very different. Why? We'll return to this example to see why, but let's first look at the example from CLRS to get a hint.
Description and example in CLRS
The terminology and notation used in the following initial description by CLRS may be foreign, but the illustrated example should clear up any confusion.
The Bellman-Ford algorithm solves the single-source shortest-paths problem in the general case in which edge weights may be negative. Given a weighted, directed graph $G = (V, E)$ with source vertex $s$ and weight function $w : E \to \mathbb{R}$, the Bellman-Ford algorithm returns a boolean value indicating whether there is a negative-weight cycle that is reachable from the source. If there is such a cycle, the algorithm indicates that no solution exists. If there is no such cycle, the algorithm produces the shortest paths and their weights.
The procedure BELLMAN-FORD relaxes edges, progressively decreasing an estimate $v.d$ on the weight of a shortest path from the source $s$ to each vertex $v \in V$ until it achieves the actual shortest-path weight $\delta(s, v)$. The algorithm returns TRUE if and only if the graph contains no negative-weight cycles that are reachable from the source.
BELLMAN-FORD(G, w, s)                      # page 612 (CLRS 4th Ed.)
1  INITIALIZE-SINGLE-SOURCE(G, s)
2  for i = 1 to |G.V| - 1
3      for each edge (u, v) in G.E
4          RELAX(u, v, w)
5  for each edge (u, v) in G.E
6      if v.d > u.d + w(u, v)
7          return FALSE
8  return TRUE
The figure below shows the execution of the Bellman-Ford algorithm on a graph with 5 vertices. After initializing the $d$ and $\pi$ values of all vertices in line 1, the algorithm makes $|V| - 1$ passes over the edges of the graph. Each pass is one iteration of the for loop of lines 2-4 and consists of relaxing each edge of the graph once. Figures (b)-(e) show the state of the algorithm after each of the four passes over the edges. After making $|V| - 1$ passes, lines 5-8 check for a negative-weight cycle and return the appropriate boolean value.
The figure above shows the execution of the Bellman-Ford algorithm. The source is vertex $s$. The $d$ values appear within the vertices, and blue edges indicate predecessor values: if edge $(u, v)$ is blue, then $v.\pi = u$. In this particular example, each pass relaxes the edges in the order $(t, x)$, $(t, y)$, $(t, z)$, $(x, t)$, $(y, x)$, $(y, z)$, $(z, x)$, $(z, s)$, $(s, t)$, $(s, y)$. (a) The situation just before the first pass over the edges. (b)-(e) The situation after each successive pass over the edges. Vertices whose shortest-path estimates and predecessors have changed due to a pass are highlighted in orange. The $d$ and $\pi$ values in part (e) are the final values. The Bellman-Ford algorithm returns TRUE in this example.
Testing the CLRS example with the template
Let's see if we can test our template against the example provided in CLRS:
As with the DPV example, we should start by mapping the letter nodes to their numeric equivalent. What number should we assign to each node? The order in which each edge is relaxed suggests a natural labeling scheme:
[
(t, x), (t, y), (t, z), # t: 0
(x, t), # x: 1
(y, x), (y, z), # y: 2
(z, x), (z, s), # z: 3
(s, t), (s, y) # s: 4
]
What will we get when we represent the graph as an adjacency list and use the modified template as before (where we printed the `distances` array after each iteration)? Let's see:
def bellman_ford(graph, start):
# ...
graph = {
0: [(1, 5), (2, 8), (3, -4)],
1: [(0, -2)],
2: [(1, -3), (3, 9)],
3: [(1, 7), (4, 2)],
4: [(0, 6), (2, 7)],
}
bellman_ford(graph, 4)
We get the following printed to the console (the initial `distances` array, before any iteration has taken place, has been included for the sake of clarity):
# t x y z s
[ inf, inf, inf, inf, 0 ] # after iteration 0 (initial distances array)
[ 6, inf, 7, inf, 0 ] # after iteration 1
[ 6, 4, 7, 2, 0 ] # after iteration 2
[ 2, 4, 7, 2, 0 ] # after iteration 3
[ 2, 4, 7, -2, 0 ] # after iteration 4 (|V| - 1 iterations with no negative cycle)
Each line above exactly matches the figures (a)-(e), respectively. How is it possible that our template reproduced exactly what was given in CLRS and was not even close for DPV? The key revelation is provided in the next and final remark.
Key revelation via basic example: order of edge relaxation does not affect the end result but completely determines intermediate results
As another comment mentions in the previously linked video, let's consider the very basic linearly connected graph `0 -> 1 -> 2 -> 3`, where the source node is `0` and all edge weights are `1`. Such a graph may be represented in code with the following adjacency list:
graph = {
0: [(1, 1)],
1: [(2, 1)],
2: [(3, 1)],
3: []
}
Since the order of edge updates in Bellman-Ford is arbitrary, let's consider what would happen in the worst possible case. For the first iteration, we would update the edge `0 -> 1` at the end; that is, let's purposely let the edge `0 -> 1` be the last edge we "randomly" update. For the graph `0 -> 1 -> 2 -> 3` with edge weights of `1`, this means `0 -> 1` is the only edge whose relaxation/processing reduces the distance to a node (i.e., the distance to node `1` is now `1` instead of infinity). Similarly, for iterations 2 and 3 we update `1 -> 2` and `2 -> 3` at the end:
[0, inf, inf, inf] # after iteration 0 (initial distances array)
[0, 1, inf, inf] # after iteration 1
[0, 1, 2, inf] # after iteration 2
[0, 1, 2, 3] # after iteration 3 (|V| - 1 iterations with no negative cycle)
Hence, in the worst case, it can take $|V| - 1$ iterations to propagate the edge weights appropriately. Let's test our template code with this basic graph. If we run the same template code that we ran for the examples in DPV and CLRS, then we get the following:
[0, inf, inf, inf]  # after iteration 0 (initial distances array)
[0, 1, 2, 3]        # after iteration 1
[0, 1, 2, 3]        # after iteration 2 (no edges updated, terminate early)
What's going on? Why aren't we getting the results discussed above? Because the results discussed above were in the worst case, where we purposely delayed processing the edge connected to the source node until the end. Let's examine the template code:
def bellman_ford(graph, start):
n = len(graph)
distances = [float('inf')] * n
distances[start] = 0
predecessors = [None] * n
for _ in range(n - 1):
edge_updated = False
for node in range(n):
for neighbor, weight in graph[node]:
if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
edge_updated = True
distances[neighbor] = distances[node] + weight
predecessors[neighbor] = node
print(distances)
if not edge_updated:
return distances, predecessors
for node in range(n):
for neighbor, weight in graph[node]:
if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
return False
return distances, predecessors
The highlighted line `for node in range(n):` ensures we always start edge relaxations from the node with label `0`, whether or not that node is the source node. Recall the edge labeling from the CLRS example:
[
(t, x), (t, y), (t, z), # t: 0
(x, t), # x: 1
(y, x), (y, z), # y: 2
(z, x), (z, s), # z: 3
(s, t), (s, y) # s: 4
]
Note how the source node, highlighted above, has a node label of `4`. This means that all edges connected to the source are updated at the end when `for node in range(n):` fires. Hence, the only edges relaxed in a meaningful way after the first iteration are those directly connected to the source node.
What if we changed the highlighted line `for node in range(n):` to `for node in [3, 2, 1, 0]:` for the basic graph example? Then all edges directly connected to the source node are processed last. We're basically customizing our template to be as inefficient as possible so as to show how we can ensure the maximum number of iterations occur. And, sure enough, running the template code with this modification results in exactly what we discussed for the worst case scenario:
[0, inf, inf, inf] # after iteration 0 (initial distances array)
[0, 1, inf, inf] # after iteration 1
[0, 1, 2, inf] # after iteration 2
[0, 1, 2, 3] # after iteration 3 (|V| - 1 iterations with no negative cycle)
Using the key revelation to better understand the DPV example
Let's return to the DPV example that started all of this:
Running our template code resulted in the following:
# S A B C D E F G
[ 0, inf, inf, inf, inf, inf, inf, inf ] # after iteration 0 (initial distances array)
[ 0, 10, 10, inf, inf, 12, 9, 8 ] # after iteration 1
[ 0, 5, 10, 11, 14, 8, 9, 8 ] # after iteration 2
[ 0, 5, 5, 11, 14, 7, 9, 8 ] # after iteration 3
[ 0, 5, 5, 6, 9, 7, 9, 8 ] # after iteration 4
[ 0, 5, 5, 6, 9, 7, 9, 8 ] # after iteration 5 (no edges updated, terminate early)
The end result was the same as that in the figure, but the intermediate results were very different. The "key revelation" discussion in the previous widget explains why. The line `for node in range(n):` in the main loop of our template code ensured we started by processing all edges directly connected to whatever node was labelled `0`. Of course, in this case, node `S`, the source node, was labelled as node `0`, which means all edges directly connected to the source node were the first ones to be meaningfully relaxed. This efficiency would usually be considered a good thing. But not when we're trying to better understand an example!
CLRS gave us the exact order in which edges were processed, making it clear they were trying to be as inefficient as possible in order to highlight when $|V| - 1$ iterations could be necessary. Similarly, if we intentionally process the edges connected to the source node last, then maybe we can go about reproducing the table provided in DPV, where $|V| - 1$ iterations are necessary to get the final `distances` array.
After some tinkering, we discover that if we replace the line `for node in range(n):` in the main loop with `for node in [4, 3, 2, 5, 1, 6, 7, 0]:`, then we end up reproducing the table of results provided in the DPV example:
# S A B C D E F G
[ 0, inf, inf, inf, inf, inf, inf, inf ] # after iteration 0 (initial distances array)
[ 0, 10, inf, inf, inf, inf, inf, 8 ] # after iteration 1
[ 0, 10, inf, inf, inf, 12, 9, 8 ] # after iteration 2
[ 0, 5, 10, inf, inf, 8, 9, 8 ] # after iteration 3
[ 0, 5, 6, 11, inf, 7, 9, 8 ] # after iteration 4
[ 0, 5, 5, 7, 14, 7, 9, 8 ] # after iteration 5
[ 0, 5, 5, 6, 10, 7, 9, 8 ] # after iteration 6
[ 0, 5, 5, 6, 9, 7, 9, 8 ] # after iteration 7 (|V| - 1 iterations with no negative cycle)
What's the exact order in which the edges were processed? Since the letter-number mapping for the nodes in this example is `D, C, B, E, A, F, G, S` to `[4, 3, 2, 5, 1, 6, 7, 0]`,
we can use the adjacency list representation of the graph
graph = {
0: [(1, 10), (7, 8)],
1: [(5, 2)],
2: [(1, 1), (3, 1)],
3: [(4, 3)],
4: [(5, -1)],
5: [(2, -2)],
6: [(1, -4), (5, -1)],
7: [(6, 1)]
}
to fully specify the order in which edges are relaxed for each iteration:
# D, C, B, E, A, F, G, S
# [4, 3, 2, 5, 1, 6, 7, 0]
[
(D, E), # D-connected edges
(C, D), # C-connected edges
(B, A), (B, C), # B-connected edges
(E, B), # E-connected edges
(A, E), # A-connected edges
(F, A), (F, E), # F-connected edges
(G, F), # G-connected edges
(S, A), (S, G) # S-connected edges
]
As has been thoroughly demonstrated by now, the order in which edges are processed does not matter in terms of the end result, but the order completely determines the intermediate results of the `distances` array.
Why we do |V| - 1 iterations in Bellman-Ford
One of the better explanations for why exactly we do