2022-04-09
This post is inspired by ADSP: The Podcast, in particular “Episode 72: C++ Algorithm Family Feud!”.
Given a vector of integers of length at least two, find the two largest elements in the vector:
pair<int, int> max2(const vector<int> &input);
Our goal is to solve this problem using algorithms shipped in the C++ standard library.
The podcast hosts debate whether we can return repeated elements when the input has repeated elements. As the problem is stated here, we can: the two largest elements in a vector are two elements that are >= all other elements. For example, for the input {5, 5, 3}, the answer is (5, 5).
We can sort the input and return its first two elements:
pair<int, int> max2(const vector<int> &input) {
  auto sorted = input;
  ranges::sort(sorted, std::ranges::greater());
  return {sorted[0], sorted[1]};
}
This solution has two downsides:
- It is an O(N log N) solution, where N is the size of the input. It is possible to solve this problem in linear time.
- It copies and mutates the input.

Sorting also produces more information than we need: it sorts the entire input when we are only interested in the first two elements of the sorted output. Partial sorting does exactly what we need: it produces the first X elements of the sorted input.
pair<int, int> max2(const vector<int> &input) {
  auto sorted = input;
  ranges::partial_sort(sorted, sorted.begin() + 2,
                       ranges::greater());
  return {sorted[0], sorted[1]};
}
The complexity of this algorithm is O(N log M), where N is the size of the input and M is the number of sorted elements we requested; in this case M = 2, so the complexity is now linear: O(N).
However, we still have the downside of mutating / copying the input.
The nth_element algorithm can be used to find the n-th largest element in the input. Calling it twice, once with n = 0 and once with n = 1, should produce the desired answer. I’ll skip this solution.
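For completeness, here is a sketch of what that skipped solution might look like (my own sketch, not from the original discussion). Like the sorting solutions, it mutates a copy of the input:

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>
using namespace std;

pair<int, int> max2_nth(const vector<int> &input) {
  auto copy = input;
  // With greater<int>, nth_element places at the given position the
  // element that would be there in descending order; n = 0 yields the
  // largest element.
  nth_element(copy.begin(), copy.begin(), copy.end(), greater<int>());
  // Repeat on the remaining range to place the second largest at index 1.
  nth_element(copy.begin() + 1, copy.begin() + 1, copy.end(),
              greater<int>());
  return {copy[0], copy[1]};
}
```

Each call is linear, so the whole thing stays O(N), but it shares the copy/mutate downside of the sorting solutions.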
The accumulate algorithm takes an input value and “combines” it with the other elements in a collection, one at a time. The analogy with left folding is that we think of accumulating as folding the collection, left to right, one element at a time.
In our case, we start with a pair of integers containing the first two elements of the vector. Then we combine this pair with the other input elements, replacing the smaller components of the pair as we go.
The code looks like this:
pair<int, int> max2(const vector<int> &input) {
  auto accumulate_function =
      [](pair<int, int> current_max, int next_value)
          -> pair<int, int> {
    if (next_value >= current_max.first)
      return {next_value, current_max.first};
    if (next_value >= current_max.second)
      return {current_max.first, next_value};
    return current_max;
  };
  // The initial pair must be ordered (first >= second), otherwise the
  // lambda above can return a wrong answer.
  auto initial_value = make_pair(max(input[0], input[1]),
                                 min(input[0], input[1]));
  return accumulate(input.begin() + 2, input.end(),
                    initial_value, accumulate_function);
}
This solution neither modifies the input, nor makes copies of the input. It is also linear in complexity.
Cppreference describes std::reduce as:

  similar to std::accumulate, except out of order

Like std::accumulate, std::reduce combines the elements of the input to produce the final result, but it does so in an unspecified order.
Let’s consider our previous example. We again initialize our answer to the first two elements of input. One possible way to combine the rest of the elements is:
Because these are done in an unspecified order, our accumulate function needs to handle a few different inputs:
- Two ints. See how 0 and 9 were combined.
- A pair and an int. See how <7, 3> and 8 were combined.
- An int and a pair (not shown above).
- Two pairs. See how <8, 7> and <9, 5> were combined.

In other words, the accumulate function needs to work with different input types.
If we also want deterministic output, the accumulate function needs to be associative and commutative. This is not relevant to us but could be, for example, when doing arithmetic on floating point numbers.
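As a quick aside, here is a minimal demonstration (my own example, unrelated to our integer problem) of why reduction order matters for floating point: double addition is not associative, so an out-of-order reduction over doubles need not be deterministic.

```cpp
// Grouping the same three terms differently yields two different
// doubles, because each intermediate sum is rounded separately.
double sum_left() { return (0.1 + 0.2) + 0.3; }
double sum_right() { return 0.1 + (0.2 + 0.3); }
```

Integer max is associative and commutative, which is why our problem is safe to reduce in any order.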
One way to implement four different versions of the accumulate function is with operator overloading:
struct Reducer {
  pair<int, int> operator()(int a, int b) {
    return {std::max(a, b), std::min(a, b)};
  }
  pair<int, int> operator()(pair<int, int> current_max,
                            int next_value) {
    // See `accumulate` lambda in the Accumulate solution.
  }
  pair<int, int> operator()(int next_value,
                            pair<int, int> current_max) {
    return (*this)(current_max, next_value);
  }
  pair<int, int> operator()(pair<int, int> pair1,
                            pair<int, int> pair2);
};
The first overload (int, int) is trivial: we just make a pair.
The second overload (pair<int, int>, int) is just the accumulate lambda we used in the std::accumulate solution.
The third overload (int, pair<int, int>) is identical to the second one, with the arguments swapped.
The fourth overload is equivalent to solving a smaller version of the problem: find the largest two elements out of four. This is tedious to implement by hand; it is safer to call one of our previous solutions and let the compiler optimize it:
pair<int, int> operator()(pair<int, int> pair1,
                          pair<int, int> pair2) {
  auto arr = std::to_array({pair1.first, pair1.second,
                            pair2.first, pair2.second});
  return max2_array(arr);
}
I’m assuming that max2_array is an implementation of the accumulate solution that works with an std::array<int, 4> instead of a vector.
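Putting the pieces together, the complete std::reduce solution might look like the sketch below. The max2_array here is my own minimal stand-in for the helper described above, and the filled-in overload bodies follow the earlier accumulate lambda:

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <utility>
#include <vector>
using namespace std;

// Stand-in for max2_array: the accumulate solution, specialized to a
// four-element std::array.
pair<int, int> max2_array(const array<int, 4> &a) {
  pair<int, int> m{max(a[0], a[1]), min(a[0], a[1])};
  for (int i = 2; i < 4; ++i) {
    if (a[i] >= m.first)
      m = {a[i], m.first};
    else if (a[i] >= m.second)
      m = {m.first, a[i]};
  }
  return m;
}

struct Reducer {
  pair<int, int> operator()(int a, int b) const {
    return {max(a, b), min(a, b)};
  }
  pair<int, int> operator()(pair<int, int> current_max,
                            int next_value) const {
    if (next_value >= current_max.first)
      return {next_value, current_max.first};
    if (next_value >= current_max.second)
      return {current_max.first, next_value};
    return current_max;
  }
  pair<int, int> operator()(int next_value,
                            pair<int, int> current_max) const {
    return (*this)(current_max, next_value);
  }
  pair<int, int> operator()(pair<int, int> pair1,
                            pair<int, int> pair2) const {
    return max2_array({pair1.first, pair1.second,
                       pair2.first, pair2.second});
  }
};

pair<int, int> max2(const vector<int> &input) {
  // The initial pair must be ordered (first >= second) so the
  // (pair, int) overload sees a valid running maximum.
  auto initial_value =
      make_pair(max(input[0], input[1]), min(input[0], input[1]));
  return reduce(input.begin() + 2, input.end(), initial_value, Reducer{});
}
```
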
This should ensure the optimizer does its job well. If you can read LLVM
IR, or are familiar with a RISC-like assembly, here’s the proof. Note how
there are no loops, no branches, and the code is fairly short. Try
changing std::array
to std::vector
and see how
different the code becomes.
The reason the previous solution requires four overloads is that we are mixing two different types: pair<int, int> and int. What if we transformed our input from int to pair<int, int> before doing the reduction?
This is exactly what transform_reduce
does: it applies a
transformation to each element of the input before performing the
reduction.
This idea can be implemented as follows:
pair<int, int> max2(const vector<int> &input) {
  auto transform = [](int a) {
    return make_pair(a, numeric_limits<int>::min());
  };
  auto reduce = [](pair<int, int> pair1,
                   pair<int, int> pair2) {
    auto arr = to_array({pair1.first, pair1.second,
                         pair2.first, pair2.second});
    return max2_array(arr);
  };
  auto initial_value = make_pair(input[0], input[1]);
  return transform_reduce(input.begin() + 2, input.end(),
                          initial_value, reduce, transform);
}
In the above, max2_array
is the same function described
in the reduce solution.
The experiment of using different algorithms to solve the same
problem always teaches me something new; for example, I had no idea that
std::transform_reduce
existed. Similarly, having the
compiler solve the four-element instance of the problem is something
that I only considered because I sat down and thought about the
problem.
I hope you learned something too, or that it helps you with your next NVIDIA interview ;) Definitely check out the ADSP podcast!