Logo Xingxin on Bug

How to concatenate C++ std::bitset?

December 3, 2024
3 min read

Suppose I have two std::bitsets:

  1. [0][1][1]
  2. [1][1][0][1]

I want to concatenate them into [0][1][1][1][1][0][1], which equals 61 in decimal. How can I do that? 🤔 This article aims to show you how.

✏Solution

Let me unveil the answer first:

template<std::size_t N1, std::size_t N2> 
auto concat(const std::bitset<N1>& lhs, const std::bitset<N2>& rhs) -> std::bitset<N1 + N2> { 
    return std::bitset<N1 + N2>(lhs.to_string() + rhs.to_string());
}

The idea is fairly simple. We leverage the conversion between std::bitset and std::string, making std::bitset concatenation as intuitive as string concatenation.

⌨Code

Here’s a simple example:

int main()
{
    const std::bitset<3> left("011");
    const std::bitset<4> right("1101");
 
    const std::bitset<7> result = concat(left, right);
 
    std::cout << "lhs: " << left.to_string() << std::endl;
    std::cout << "rhs: " << right.to_string() << std::endl;
    std::cout << "result: " << result.to_string() << std::endl;
}

And the output would be:

lhs: 011
rhs: 1101
result: 0111101

🧀Application

Fair enough, you may wonder:

Question

Why on earth would we need to concatenate std::bitset? 🤷‍♂️

One application I can think of is Meta’s implementation of HyperLogLog called Presto. If you’re not familiar or interested in it, no worries - let me simplify it for you.


Imagine you live in a world with an exorbitantly high cost of living, even pricier than Switzerland. In this world, one bit costs $100!😲💔

  1. You design a 7-bit slot to store decimal values, costing you $700. 😒
  2. Then, you realize most of the numbers you store are no more than 15, e.g., 3, 13, 8, 1, 2, 0, 5, ....
  3. This means the extra 3 bits are redundant, costing you an unnecessary $300!\
  4. To save money, you decide to use a 4-bit slot by default. If it’s not enough, you allocate additional slots (spending $300 only when necessary).

This is the essence of HyperLogLog Presto’s dense implementation. When a value grows too large for the predefined “dense” size, it spills into an “overflow” segment. The concatenation of these segments enables a compact yet scalable representation.

To retrieve the original value, you can use this:

std::bitset<TOTAL_BUCKET_SIZE> combined{
    overflow_bucket.to_string() + // 👈 Only if the data overflows the dense bucket
    dense_bucket.to_string()
};
const unsigned long combined_value = combined.to_ulong();

🌾Resource

  1. HyperLogLog: Analysis of a Near-Optimal Cardinality Estimation Algorithm – Philippe Flajolet et al.
  2. Meta’s HyperLogLog Presto
  3. std::bitset - cppreference.com

By mastering this simple yet powerful technique, you’ll unlock a reusable tool for compact data representation, perfect for scenarios requiring efficient bit-level operations.