Suppose I have two std::bitset
s:
[0][1][1]
[1][1][0][1]
I want to concatenate them into [0][1][1][1][1][0][1]
, which equals 61 in decimal. How can I do that? 🤔 This article aims to show you how.
✏Solution
Let me unveil the answer first:
template<std::size_t N1, std::size_t N2>
auto concat(const std::bitset<N1>& lhs, const std::bitset<N2>& rhs) -> std::bitset<N1 + N2> {
return std::bitset<N1 + N2>(lhs.to_string() + rhs.to_string());
}
The idea is fairly simple. We leverage the conversion between std::bitset
and std::string
, making std::bitset
concatenation as intuitive as string concatenation.
⌨Code
Here’s a simple example:
int main()
{
const std::bitset<3> left("011");
const std::bitset<4> right("1101");
const std::bitset<7> result = concat(left, right);
std::cout << "lhs: " << left.to_string() << std::endl;
std::cout << "rhs: " << right.to_string() << std::endl;
std::cout << "result: " << result.to_string() << std::endl;
}
And the output would be:
lhs: 011
rhs: 1101
result: 0111101
🧀Application
Fair enough, you may wonder:
QuestionWhy on earth would we need to concatenate
std::bitset
? 🤷♂️
One application I can think of is Meta’s implementation of HyperLogLog called Presto. If you’re not familiar or interested in it, no worries - let me simplify it for you.
Imagine you live in a world with an exorbitantly high cost of living, even pricier than Switzerland. In this world, one bit costs $100!😲💔
- You design a 7-bit slot to store decimal values, costing you $700. 😒
- Then, you realize most of the numbers you store are no more than 15, e.g.,
3, 13, 8, 1, 2, 0, 5, ...
. - This means the extra 3 bits are redundant, costing you an unnecessary $300!\
- To save money, you decide to use a 4-bit slot by default. If it’s not enough, you allocate additional slots (spending $300 only when necessary).
This is the essence of HyperLogLog Presto’s dense implementation. When a value grows too large for the predefined “dense” size, it spills into an “overflow” segment. The concatenation of these segments enables a compact yet scalable representation.
To retrieve the original value, you can use this:
std::bitset<TOTAL_BUCKET_SIZE> combined{
overflow_bucket.to_string() + // 👈 Only if the data overflows the dense bucket
dense_bucket.to_string()
};
const unsigned long combined_value = combined.to_ulong();
🌾Resource
- HyperLogLog: Analysis of a Near-Optimal Cardinality Estimation Algorithm – Philippe Flajolet et al.
- Meta’s HyperLogLog Presto
- std::bitset - cppreference.com
By mastering this simple yet powerful technique, you’ll unlock a reusable tool for compact data representation, perfect for scenarios requiring efficient bit-level operations.