High-performance cache Theine v0.5.0 released: zero allocation and much faster reads


This topic was created 387 days ago; the information in it may have since developed or changed.

https://github.com/Yiling-J/theine-go

Here are the latest 100% read and 100% write throughput benchmark results. Other read/write ratios and cache hit ratios are in the README:

100% read (cpu 8/16/32)

goos: linux
goarch: amd64
pkg: github.com/maypok86/benchmarks/throughput
cpu: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

BenchmarkCache/zipf_otter_reads=100%,writes=0%-8                88954334                14.78 ns/op       67648151 ops/s
BenchmarkCache/zipf_theine_reads=100%,writes=0%-8               51908306                21.87 ns/op       45729075 ops/s
BenchmarkCache/zipf_ristretto_reads=100%,writes=0%-8            27217994                42.36 ns/op       23606992 ops/s

BenchmarkCache/zipf_otter_reads=100%,writes=0%-16               132372591                8.397 ns/op     119086508 ops/s
BenchmarkCache/zipf_theine_reads=100%,writes=0%-16              85420364                13.78 ns/op       72549558 ops/s
BenchmarkCache/zipf_ristretto_reads=100%,writes=0%-16           47790158                25.17 ns/op       39734070 ops/s

BenchmarkCache/zipf_otter_reads=100%,writes=0%-32               174121321                7.078 ns/op     141273879 ops/s
BenchmarkCache/zipf_theine_reads=100%,writes=0%-32              118185849               10.45 ns/op       95703790 ops/s
BenchmarkCache/zipf_ristretto_reads=100%,writes=0%-32           66458452                18.85 ns/op       53055079 ops/s

100% write (cpu 8/16/32)

goos: linux
goarch: amd64
pkg: github.com/maypok86/benchmarks/throughput
cpu: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

BenchmarkCache/zipf_otter_reads=0%,writes=100%-8                 1567917               723.0 ns/op         1383080 ops/s
BenchmarkCache/zipf_theine_reads=0%,writes=100%-8                2194747               542.4 ns/op         1843615 ops/s
BenchmarkCache/zipf_ristretto_reads=0%,writes=100%-8             1839237               642.5 ns/op         1556503 ops/s

BenchmarkCache/zipf_otter_reads=0%,writes=100%-16                1384345               846.0 ns/op         1181980 ops/s
BenchmarkCache/zipf_theine_reads=0%,writes=100%-16               1915946               528.8 ns/op         1891008 ops/s
BenchmarkCache/zipf_ristretto_reads=0%,writes=100%-16            1765465               697.3 ns/op         1434089 ops/s

BenchmarkCache/zipf_otter_reads=0%,writes=100%-32                1265883               979.8 ns/op         1020607 ops/s
BenchmarkCache/zipf_theine_reads=0%,writes=100%-32               1953358               526.1 ns/op         1900935 ops/s
BenchmarkCache/zipf_ristretto_reads=0%,writes=100%-32            1618098               696.1 ns/op         1436625 ops/s

benchmem 100% write (cpu 32)

goos: linux
goarch: amd64
pkg: github.com/maypok86/benchmarks/throughput
cpu: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

BenchmarkCache/zipf_otter_reads=0%,writes=100%-32                80 B/op          1 allocs/op
BenchmarkCache/zipf_theine_reads=0%,writes=100%-32               0 B/op           0 allocs/op
BenchmarkCache/zipf_ristretto_reads=0%,writes=100%-32            112 B/op         3 allocs/op

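If you are still using Ristretto, I personally suggest switching to Theine or Otter. Ristretto may not behave the way you expect:

Ristretto's writes are asynchronous, so you cannot read a value immediately after writing it.
Ristretto's writes are somewhat nondeterministic. To make writes faster, Ristretto uses select/default to send data into a channel, so data may be lost. Removing the default makes writes much slower.
Ristretto's Cost parameter does not work by default, so Ristretto may actually store far less data than you intend. You have to manually set IgnoreInternalCost to true, but the README does not mention this parameter at all.
Under highly concurrent reads and writes Ristretto suffers from fairly severe contention. At 75% reads it runs at roughly 1/4 the speed of Theine/Otter, and that is while dropping some of the writes.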
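
To make the select/default point concrete, here is a minimal sketch of a non-blocking channel send in plain Go. It is not Ristretto's code; trySend and countAccepted are hypothetical names, and the buffer size is arbitrary. It only shows why a full buffer turns writes into silent drops when the default branch is present.

```go
package main

import "fmt"

// trySend attempts a non-blocking send. It returns false when the
// channel buffer is full and the item was dropped instead of queued.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// countAccepted sends n items into a buffer of the given capacity with
// no consumer running, and reports how many sends were accepted.
func countAccepted(capacity, n int) int {
	ch := make(chan int, capacity)
	accepted := 0
	for i := 0; i < n; i++ {
		if trySend(ch, i) {
			accepted++
		}
	}
	return accepted
}

func main() {
	accepted := countAccepted(4, 10)
	// prints "accepted=4 dropped=6"
	fmt.Printf("accepted=%d dropped=%d\n", accepted, 10-accepted)
}
```

Without the default branch the sender would block until a consumer makes room, which is the slowdown described above.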


5 replies

matrix1010

2024-10-11 10:20:02 +08:00

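One more thing: Ristretto does not store your key in the cache. Instead it computes two hashes and uses them as the key, so a collision is unlikely but theoretically possible. This approach also means Ristretto cannot add APIs such as Range or Callback, because the real key is never stored.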
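
A rough sketch of what storing only hashes implies. This is not Ristretto's implementation: hashedKey, hashKey and hashOnlyStore are hypothetical names, and the two FNV hashes are an assumption chosen only because they are in the standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashedKey stands in for the original key: two hashes, no key bytes.
type hashedKey struct {
	h1 uint64
	h2 uint32
}

// hashKey derives the two hashes from the caller's key.
func hashKey(key string) hashedKey {
	a := fnv.New64a()
	a.Write([]byte(key))
	b := fnv.New32a()
	b.Write([]byte(key))
	return hashedKey{h1: a.Sum64(), h2: b.Sum32()}
}

// hashOnlyStore keeps values indexed by hashes only.
type hashOnlyStore struct {
	items map[hashedKey]string
}

func newHashOnlyStore() *hashOnlyStore {
	return &hashOnlyStore{items: make(map[hashedKey]string)}
}

func (s *hashOnlyStore) Set(key, value string) {
	s.items[hashKey(key)] = value
}

func (s *hashOnlyStore) Get(key string) (string, bool) {
	v, ok := s.items[hashKey(key)]
	return v, ok
}

func main() {
	s := newHashOnlyStore()
	s.Set("user:42", "alice")
	v, ok := s.Get("user:42")
	fmt.Println(v, ok)
	// The map holds only hashedKey values, so a Range callback that
	// hands back the original string key cannot be built from it.
	fmt.Println(len(s.items))
}
```

Two different keys that happened to produce the same pair of hashes would overwrite each other here, which is the theoretical collision mentioned above.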

lesismal

2024-10-11 10:53:44 +08:00

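Ristretto's select/default is a bit hard to take seriously. It would be better to drop the async chan and execute directly. If you dislike the wide lock scope and the wasted synchronization contention and must stay concurrent, consider another approach: use a chan as a trigger, trigger the writer goroutine only when an item becomes the queue head, have each trigger process everything currently queued, and give pop and push their own fine-grained locks.
What is the point of using two hashes as the key? To avoid wasting space on large keys, or to use a fixed [N]byte and reduce the number of pointer-typed items the GC has to track?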

lesismal

2024-10-11 10:55:09 +08:00

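> What is the point of using two hashes as the key? To avoid wasting space on large keys, or to use a fixed [N]byte and reduce the number of pointer-typed items the GC has to track?

The key issue is that this makes anything other than an exact-key lookup hard to do, such as iterating over a list.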

matrix1010

2024-10-11 12:07:07 +08:00

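@lesismal To avoid large keys. Ristretto's first version used only one hash, though: https://github.com/dgraph-io/ristretto/issues/30 . Theine uses a similar chan-as-trigger approach: when the write chan receives an item, it tries to fetch up to N more items, and stops immediately if there are none. Under high write concurrency it processes writes in batches, and under low concurrency it stays timely: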

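// Block until one item arrives, then start a batch with it.
for first := range s.writeChan {
	s.writeBuffer = append(s.writeBuffer, first)
loop:
	// Collect up to WriteBufferSize-1 more items without blocking.
	for i := 0; i < WriteBufferSize-1; i++ {
		select {
		case item, ok := <-s.writeChan:
			if !ok {
				break loop // channel closed
			}
			s.writeBuffer = append(s.writeBuffer, item)
		default:
			break loop // nothing pending, stop collecting
		}
	}
	// (processing of the collected batch is omitted in this snippet)
}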
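
For anyone who wants to run the idea, here is a self-contained sketch of the same batching pattern on a plain int channel. It is not Theine's actual code: nextBatch, batchSizes and the writeBufferSize value are hypothetical and only illustrate "block for one item, then drain without blocking".

```go
package main

import "fmt"

const writeBufferSize = 4 // hypothetical batch limit for this sketch

// nextBatch blocks for the first item, then takes up to
// writeBufferSize-1 more without blocking. It returns false once the
// channel is closed and empty.
func nextBatch(ch <-chan int, buf []int) ([]int, bool) {
	first, ok := <-ch
	if !ok {
		return buf[:0], false
	}
	buf = append(buf[:0], first)
	for len(buf) < writeBufferSize {
		select {
		case item, ok := <-ch:
			if !ok {
				return buf, true // closed: hand back what we have
			}
			buf = append(buf, item)
		default:
			return buf, true // nothing pending right now
		}
	}
	return buf, true
}

// batchSizes queues n items up front and records the size of each batch.
func batchSizes(n int) []int {
	ch := make(chan int, n)
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	var sizes []int
	buf := make([]int, 0, writeBufferSize)
	for {
		batch, ok := nextBatch(ch, buf)
		if !ok {
			break
		}
		sizes = append(sizes, len(batch))
	}
	return sizes
}

func main() {
	fmt.Println(batchSizes(10)) // prints "[4 4 2]"
}
```

With a backlog the consumer takes full batches; with a single pending item it returns a batch of one right away, which matches the "batches under load, timely when idle" behavior described above.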

lesismal

2024-10-11 12:31:51 +08:00

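Some of my goroutine pools do something similar to reduce the number of resident goroutines, but without a chan: only the push that becomes the queue head starts a go func, which batch-processes everything and then exits; a push that is not the head just pushes.

chan is indeed quite expensive in performance-sensitive scenarios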
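
A minimal sketch of that channel-free pattern, under my own assumptions about the details: taskQueue, Push, drain and runTasks are hypothetical names, and the WaitGroup exists only so the demo can wait for completion. It is not taken from any particular library.

```go
package main

import (
	"fmt"
	"sync"
)

// taskQueue runs tasks on a short-lived goroutine that exists only
// while there is pending work.
type taskQueue struct {
	mu      sync.Mutex
	pending []func()
	running bool
	wg      sync.WaitGroup // demo only: lets callers wait for all tasks
}

// Push appends a task. Only a push that finds no drainer running
// starts one; every other push just enqueues and returns.
func (q *taskQueue) Push(task func()) {
	q.wg.Add(1)
	q.mu.Lock()
	q.pending = append(q.pending, task)
	if q.running {
		q.mu.Unlock()
		return
	}
	q.running = true
	q.mu.Unlock()
	go q.drain()
}

// drain takes everything queued so far, runs it, and repeats until the
// queue is empty, then exits so no goroutine stays resident.
func (q *taskQueue) drain() {
	for {
		q.mu.Lock()
		batch := q.pending
		q.pending = nil
		if len(batch) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		q.mu.Unlock()
		for _, task := range batch {
			task()
			q.wg.Done()
		}
	}
}

// runTasks pushes n tasks that add 1..n and returns the total.
func runTasks(n int) int {
	var q taskQueue
	total := 0 // only touched by drain goroutines, which never overlap
	for i := 1; i <= n; i++ {
		i := i
		q.Push(func() { total += i })
	}
	q.wg.Wait()
	return total
}

func main() {
	fmt.Println(runTasks(100)) // prints "5050"
}
```

The lock is held only while swapping the slice, so tasks run outside it, and at most one drainer is alive at any time.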