I Have 500 Million Keys but What's in My Redis DB??

Or: The Bigger They Are, the Harder They Fetch

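While you could call them "rich man's problems", large databases do pose big challenges. Scaling and performance are probably the best-known ones, but even trivial operations can become insurmountable as a database grows. I was reminded of this fact again recently, while working on a memory fragmentation issue in one of our users' databases.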

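To investigate the fragmentation issue, I needed information about the database's keys. Specifically, to understand what was causing the high fragmentation ratio, I wanted to estimate the sizes of the keys' values, their TTLs and whether they were being used. The only problem was that this particular database had more than 500,000,000 keys. Iterating over all of them needs several commands per key, which adds up to well over a billion requests, so collecting the information that way was impractical. Instead of brute force, I wrote a small Python script that quickly gave me a good estimate of the data. Here's an example of the script's output: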

Skipped 0 keys
Size range  Count   Volatile   Avg TTL  Avg idle
0-49        9346    9346       188522   26039
600-649     32      32         35055    48105
650-699     241     241        35690    47514
700-749     231     231        41808    41045
750-799     62      62         42681    40406
800-849     64      64         42840    39630
850-899     17      17         59546    24997
900-949     3       3          82829    3570
1050-1099   4       4          44159    39322

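Rather than processing the entire mountain of data, my script uses a small (configurable) number of random samples to produce the figures I needed: value size distribution, average TTL and average idle time. Only string keys whose values fall within the configured size range are counted; every other sampled key is reported as skipped. Both the TTL and idle-time columns are in seconds. The results aren't as accurate as fetching and processing all of the data, but they gave me the information I was looking for. The script's source code is immediately below, and I hope you find it useful.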

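import redis

# decode_responses=True makes redis-py return str instead of bytes,
# so the TYPE reply can be compared with 'string'
r = redis.Redis(decode_responses=True)
base = 0        # Smallest value size (bytes) included in the sample
jump = 50       # Width of each value-size bin (bytes)
top = 1300      # Largest value size (bytes) included in the sample
samples = 1000  # Number of random keys to sample

bins = []
for i in range(1 + (top - base) // jump):
    bins.append({'count': 0, 'ttl': 0, 'idle': 0, 'volatile': 0})

for i in range(samples):
    k = r.randomkey()
    if k is None:  # The database is empty
        break
    # Read the idle time first, before other commands access (and touch) the key
    idle = r.object("idletime", k)
    if idle is None:  # The key expired or was deleted in the meantime
        continue
    if r.type(k) != 'string':
        continue
    l = r.strlen(k)
    if l < base or l > top:
        continue
    ttl = r.ttl(k)
    b = bins[(l - base) // jump]
    b['count'] += 1
    # redis-py 3+ returns -1 for keys without a TTL and -2 for missing keys
    # (older versions returned None), so only count non-negative TTLs
    if ttl is not None and ttl >= 0:
        b['ttl'] += ttl
        b['volatile'] += 1
    b['idle'] += idle

start = base
print("Skipped %d keys" % (samples - sum(b['count'] for b in bins)))
print('%-13s %-10s %-10s %-10s %-10s' % ('Size range', 'Count', 'Volatile', 'Avg TTL', 'Avg idle'))
for b in bins:
    if b['count']:
        avg_ttl = b['ttl'] // b['volatile'] if b['volatile'] else 0
        print('%-13s %-10d %-10d %-10d %-10d' % ('%d-%d' % (start, start + jump - 1),
              b['count'], b['volatile'], avg_ttl, b['idle'] // b['count']))
    start += jump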
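The counts in each bin come from a random sample, so each bin's share of the sample can be scaled up to an estimate for the whole keyspace. The sketch below is not part of the original script. It assumes that RANDOMKEY draws keys roughly uniformly and independently, which is only an approximation: RANDOMKEY is not perfectly uniform, and the same key can be drawn twice. It uses a plain normal-approximation interval for a binomial proportion. The function name `extrapolate_bin` is hypothetical. The total key count would come from DBSIZE (`r.dbsize()` in redis-py). `sample_size` should be the total number of sampled keys, including skipped ones, because the share being estimated is relative to the entire keyspace.

```python
import math


def extrapolate_bin(bin_count, sample_size, total_keys, z=1.96):
    """Scale one bin's sample count to an estimate for the whole keyspace.

    Hypothetical helper. Assumes each sampled key is an independent, uniform
    draw, so the bin's share of the keyspace is a binomial proportion.
    Returns (estimate, low, high); low/high form a normal-approximation
    interval (z=1.96 gives roughly 95%).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = bin_count / sample_size                 # observed share of the sample
    se = math.sqrt(p * (1 - p) / sample_size)   # standard error of that share
    low_p = max(0.0, p - z * se)                # clamp to a valid proportion
    high_p = min(1.0, p + z * se)
    return p * total_keys, low_p * total_keys, high_p * total_keys


if __name__ == "__main__":
    # From the sample output: 9346 of 10000 sampled keys fell in the 0-49 byte
    # bin. 500,000,000 is an assumed round total (the database had "more than"
    # that); in practice pass r.dbsize().
    est, low, high = extrapolate_bin(9346, 10_000, 500_000_000)
    print(f"0-49 bytes: ~{est:,.0f} keys (interval {low:,.0f} to {high:,.0f})")
```

For the 0-49 byte bin, this gives about 467.3 million keys, with an interval of roughly 464.9 to 469.7 million. Bins with only a handful of hits, such as the 900-949 bin with 3, are a different case. The normal approximation is poor there, and a Wilson score interval would be a better choice.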