Description
Abstract
From ByteDance Programming Language Team
We suggest using SwissTable in the runtime to replace the original implementation of the hashmap.
SwissTable is now used in abseil and rust by default. We borrowed ideas from the two implementations and combined some optimization ideas of Go's original hashmap to construct the current implementation.
See the following comment for performance comparison(the GitHub issue has a character limit).
Motivations
-
We need to improve the performance of the hashmap as much as possible. At ByteDance, our Go service consumes about 4% of the CPU on hashmap. It is worth noting that on many services, the CPU consumption of
mapassign
andmapaccess
are almost 1:1, so the improvement of insertion performance is equally important. -
Support dynamic adjustment of the capacity of the hashmap. Some Go services will make a map too large due to burst traffic, which will cause a lot of pressure on the memory, but the map does not shrink after elements removal. We found that there are also many related discussions in the community:
Implementations
-
Basic version: https://go-review.googlesource.com/c/go/+/426614
-
SIMD version: TODO
Tasks
- Basic version: without SIMD, follow the original memory layout.
- Add SIMD support for x86: WIP, in the following CL.
- Aggregates all tophash into an array: In the plan, it requires more performance comparison data.
- Support hashmap resizing: More discussion is needed.
- Set bucketCnt to 4 on 32-bit platforms: More discussion is needed.
Advantages compared to the original implementation
-
The overall performance is better, and we can also use SIMD to improve performance.
-
SwissTable's LoadFactor can be set higher to save some memory.
-
Open-addressing and making dynamic adjustments to the capacity is easier to implement.
-
After allocating a fixed-size hashmap, if the number of inserted elements does not exceed the capacity, no additional memory is required, and the performance will be significantly improved, which has a greater advantage over reused fixed-size hashmap. (The original hashmap may still allocate additional memory even when the limit is not exceeded)
Disadvantages compared to the original implementation
-
The rehash is done at once, so
- Some services that are sensitive to latency may experience performance degradation. (but we do not observe this for now)
- In some cases, when the hashmap needs to grow, it may consume more CPU than the original implementation. (The previous implementation uses incremental rehash)
-
Since SwissTable needs to ensure that each bucket has at least one empty slot, it will cause performance degradation in some cases. For example, if we insert an element when there are seven elements in the hashmap, the CPU consumption will significantly increase because the hashmap needs to grow.
Implementation details
Here is an overview of the basic version. For more detailed implementation, you can read the code. Before reading the code, it is recommended to check the video or the article in the references to get a general idea of how it works.
Differences with SwissTable
- Move
tophash
into the bucket, not as an array alone.
The classic SwissTable aggregates all tophash into an array (also called control bytes array, here we follow the convention, called tophash), while the original hashmap of Go puts tophash and items in the same bucket.
Our experiments outside the runtime found that if tophash is used as an array alone, we couldn’t find any performance wins. This is probably because we need two memory allocations when initializing the hashmap.
So we use the bucket memory layout almost the same as the original in the current version. In the future, we may consider aggregating all tophash into an array, using a data structure similar to sync.Pool
, so that all tophash of the hashmap can be reused to improve performance.
Due to this change, when we delete a key/value, we only need to determine whether the current bucket still has empty slots, because the location where we access the bucket is relatively fixed, unlike the classic implementation, which can be accessed from different addresses.
Differences with the original implementation
For hashmap header (hmap
) and hiter
- Removed fields about overflow and incremental rehash.
For bucket (bmap
)
-
The tophash is still 8-bit(uint8), but 7-bit for hash info and 1-bit as flags.
-
Removed overflow pointer.
For the way of querying
- The query process changes from querying the overflow pointer to finding the next bucket (using triangular probing) until the first empty slot is found. SwissTable theoretically guarantees that there is at least one empty slot.
For initializing bucket arrays
- The initial state of the tophash is
0xff
instead of0
;0xff
represents the empty slot now.
References
Metadata
Metadata
Assignees
Labels
Type
Projects
Status