2022-04-09

This is inspired by the ADSP: The Podcast, in particular “Episode 72: C++ Algorithm Family Feud!”.

Given a vector of integers of length at least two, find two largest elements in the vector:

`<int, int> max2(const vector<int> &input); pair`

Our goal is to solve this problem using algorithms shipped in the C++ standard library.

The podcast hosts debate whether we can return repeated elements when the input has repeated elements. As the problem is stated *here*, we can: two largest elements in a vector are two elements that are `>=`

all other elements.

We can sort the input and return its first two elements:

```
<int, int> max2(const vector<int> &input) {
pairauto sorted = input;
::sort(sorted, std::ranges::greater());
rangesreturn {sorted[0], sorted[1]};
}
```

This solution has two downsides:

- We need to copy the input, or mutate the original.
- It is not optimal in complexity: this is an
`O(N logN)`

solution, where`N`

is the size of the input. It is possible to solve this with a linear solution.

Sorting produces more information than we need, that is, it sorts the entire input when we are only interested in the first two elements of the sorted input. Partial sorting does exactly what we need: it produces the first X elements in the sorted input.

```
<int, int> max2(const vector<int> &input) {
pairauto sorted = input;
::partial_sort(sorted, sorted.begin() + 2,
ranges::greater());
rangesreturn {sorted[0], sorted[1]};
}
```

The complexity of this algorithm is `O(N log M)`

, where `N`

is the size of the input, and `M`

is the number of sorted elements we requested; in this case `M = 2`

, so the complexity is now linear: `O(N)`

.

However, we still have the downside of mutating / copying the input.

The `nth_element`

algorithm can be used to find the `n-th`

largest element in the input. Calling it twice, once with `n = 0`

and once with `n = 1`

should produce the desired answer. I’ll skip this solution.

The accumulate algorithm takes an input value and “combines” it with the other elements in a collection, one at a time. The analogy with left folding is that we think of accumulating as folding the collection, left to right, one element at a time.

In our case, we start with a pair of integers containing the first two elements of the vector:

Then we combine this pair with the other input elements by replacing the smaller components of the pair:

The code looks like this:

```
<int, int> max2(const vector<int> &input) {
pairauto accumulate_function =
[] (pair<int, int> current_max, int next_value)
-> pair<int, int> {
if (next_value >= current_max.first)
return {next_value, current_max.first};
if (next_value >= current_max.second)
return {current_max.first, next_value};
return current_max;
};
auto initial_value = make_pair(input[0], input[1]);
return accumulate(input.begin() + 2, input.end(),
, accumulate_function);
initial_value}
```

This solution neither modifies the input, nor makes copies of the input. It is also linear in complexity.

Cppreference describes `std::reduce`

as:

similar to

`std::accumulate`

, except out of order

Like `std::accumulate`

, `std::reduce`

combines elements of the input in order to produce the final result, but this is done in an unspecified order.

Let’s consider our previous example:

We again initialize our answer to the first two elements of `input`

. *One* possible way to combine combine the rest of the elements is:

Because these are done in an unspecified order, our accumulate function needs to handle a few different inputs:

- Two
`int`

s. See how`0`

and`9`

were combined. - A
`pair`

and an`int`

. See how`<7, 3>`

and`8`

were combined. - An
`int`

and a`pair`

(not shown above). - Two pairs. See how
`<8, 7>`

and`<9, 5>`

were combined.

In other words, the accumulate function needs to work with different input types.

If we also want *deterministic* output, the accumulate function needs to be associative and commutative. This is not relevant to us but could be, for example, when doing arithmetic on floating point numbers.

One way to implement four different versions of the accumulate function is with operator overloading:

```
struct Reducer {
<int, int> operator()(int a, int b) {
pairreturn {std::max(a, b), std::min(a, b)};
}
<int, int> operator()(pair<int, int> current_max,
pairint next_value) {
// See `accumulate` lambda in the Accumulate solution.
}
<int, int> operator()(int next_value,
pair<int, int> current_max) {
pairreturn (*this)(current_max, next_value);
}
<int, int> operator()(pair<int, int> pair1,
pair<int, int> pair2);
pair};
```

The first overload (`int`

, `int`

) is trivial: we just make a pair.

The second overload (`pair<int, int>`

, `int`

) is just the accumulate lambda we used in the `std::accumulate`

solution.

The third overload (`int`

, `pair<int, int>`

) is identical to the second one, with the arguments swapped.

The fourth overload is equivalent to solving a simple version of the problem: find the largest two elements out of four. This is tedious to implement, it is safer to call one of our previous solutions and let the compiler optimize it:

```
<int, int> operator()(pair<int, int> pair1,
pair<int, int> pair2) {
pairauto arr = std::to_array({pair1.first, pair1.second,
.first, pair2.second});
pair2return max2_array(arr);
}
```

I’m assuming that `max2_array`

is an implementation of the `accumulate`

solution that works with an `std::array<4, int>`

instead of a `vector`

. This should ensure the optimizer does its job well. If you can read LLVM IR, or are familiar with a RISC-like assembly, here’s the proof. Note how there are no loops, no branches, and the code is fairly short. Try changing `std::array`

to `std::vector`

and see how different the code becomes.

The reason the previous solution requires four overloads is that we are mixing two different types: `pair<int, int>`

and `int`

. What if we transformed our input from `int`

to `pair<int, int>`

before doing the reduction?

This is exactly what `transform_reduce`

does: it applies a transformation to each element of the input before performing the reduction.

This idea can be implemented as follows:

```
<int, int> max2(const vector<int> &input) {
pairauto transform = [](int a) {
return make_pair(a, numeric_limits<int>::min());
};
auto reduce = [](pair<int, int> pair1,
<int, int> pair2) {
pairauto arr = to_array({pair1.first, pair1.second,
.first, pair2.second});
pair2return max2_array(arr);
};
auto initial_value = make_pair(input[0], input[1]);
return transform_reduce(input.begin() + 2, input.end(),
,
initial_value, transform);
reduce}
```

In the above, `max2_array`

is the same function described in the reduce solution.

The experiment of using different algorithms to solve the same problem always teaches me something new; for example, I had no idea that `std::transform_reduce`

existed. Similarly, having the compiler solve the four-element instance of the problem is something that I only considered because I sat down and thought about the problem.

I hope you learned something too, or that it helps you with your next NVIDIA interview ;) Definitely checkout the ADSP podcast!