Top-K

Top-K 是一种概率数据结构，允许您查找数据流中最常见的项目。

Top K 是 Redis 开源版中的一种概率数据结构，用于估计数据流中排名最高的 K 个元素。

在此语境中，“排名最高”是指“附加有最高数字或分数的元素”，其中分数可以是该元素在流中出现的次数计数，因此该数据结构非常适合查找流中出现频率最高的元素。一个非常常见的应用是检测网络异常和 DDoS 攻击，Top K 可以回答以下问题：来自同一地址或同一 IP 的请求流量是否突然增加？

确实，它与 Count-Min Sketch 的功能存在一些重叠，但这两种数据结构有其差异，应应用于不同的用例。

Redis 开源版对 Top-K 的实现基于 Gong Junzhi 等人提出的 HeavyKeepers 算法。它摒弃了一些旧方法，如“全部计数”和“全部接受部分计数”，转而采用“指数衰减计数”策略，该策略对小流量（mouse flow）有偏向性，而对大流量（elephant flow）影响有限。此实现结合使用两种数据结构：一个哈希表，用于存储概率计数（很像 Count-Min Sketch）；一个最小堆，用于存储计数最高的 K 个项目。这确保了比以往的概率算法更高的准确性和更短的执行时间，同时将内存使用量保持在 Sorted Set 通常所需的一小部分。它还有一个额外的好处，即当元素添加到或从 Top K 列表中移除时，能够获得实时通知。

用例

热门话题标签（社交媒体平台、新闻分发网络）

此应用回答以下问题

在过去 X 小时内，人们提及最多的 K 个话题标签是什么？
今天阅读/查看次数最高的 K 条新闻是什么？

数据流是指传入的社交媒体帖子，您可以从中解析出不同的话题标签。

TOPK.LIST 命令的时间复杂度为 O(K*log(k))，因此如果 K 很小，则无需保留所有话题标签的单独集合或有序集合。您可以直接从 Top K 本身查询。

示例

本示例将向您展示如何在网上购物时跟踪使用的关键词“bike”；例如，“bike store”和“bike handlebars”。操作如下。

使用 TOPK.RESERVE 初始化一个带有特定参数的 Top K 草图。注意：width、depth 和 decay_constant 参数可以省略，如果不存在，它们将分别设置为默认值 7、8 和 0.9。

> TOPK.RESERVE key k width depth decay_constant

使用 TOPK.ADD 向草图中添加项目。如您所见，可以同时添加多个项目。如果在添加额外项目时返回某个项目，则表示该项目已从顶部项目的最小堆中降级，下方将表示返回的项目不再属于前 5 名，否则返回 nil。这允许动态检测进入或从 Top K 列表中驱逐的“重量级”项目（heavy-hitter）。在下面的示例中，“pedals”取代了“handlebars”，在添加“pedals”后，“handlebars”被返回。另请注意，“store”和“seat”第二次添加时没有返回任何内容，因为它们已经在 Top K 中了。
使用 TOPK.LIST 列出目前为止输入的项目。

使用 TOPK.QUERY 查看项目是否在 Top K 列表中。就像 TOPK.ADD 一样，可以同时查询多个项目。

>_ Redis CLI

> TOPK.RESERVE bikes:keywords 5 2000 7 0.925
OK
> TOPK.ADD bikes:keywords store seat handlebars handles pedals tires store seat
1) (nil)
2) (nil)
3) (nil)
4) (nil)
5) (nil)
6) handlebars
7) (nil)
8) (nil)
> TOPK.LIST bikes:keywords
1) store
2) seat
3) pedals
4) tires
5) handles
> TOPK.QUERY bikes:keywords store handlebars
1) (integer) 1
2) (integer) 0

Python

"""
Code samples for Top-K pages:
    https://redis.ac.cn/docs/latest/develop/data-types/probabilistic/top-k/
"""

import redis

r = redis.Redis(decode_responses=True)


res1 = r.topk().reserve("bikes:keywords", 5, 2000, 7, 0.925)
print(res1)  # >>> True

res2 = r.topk().add(
    "bikes:keywords",
    "store",
    "seat",
    "handlebars",
    "handles",
    "pedals",
    "tires",
    "store",
    "seat",
)
print(res2)  # >>> [None, None, None, None, None, 'handlebars', None, None]

res3 = r.topk().list("bikes:keywords")
print(res3)  # >>> ['store', 'seat', 'pedals', 'tires', 'handles']

res4 = r.topk().query("bikes:keywords", "store", "handlebars")
print(res4)  # >>> [1, 0]

Node.js

import assert from 'assert';
import { createClient } from 'redis';

const client = createClient();
await client.connect();


const res1 = await client.topK.reserve('bikes:keywords', 5, {
    width: 2000,
    depth: 7,
    decay: 0.925
});
console.log(res1);  // >>> OK

const res2 = await client.topK.add('bikes:keywords', [
  'store',
  'seat',
  'handlebars',
  'handles',
  'pedals',
  'tires',
  'store',
  'seat'
]);
console.log(res2);  // >>> [null, null, null, null, null, 'handlebars', null, null]

const res3 = await client.topK.list('bikes:keywords');
console.log(res3);  // >>> ['store', 'seat', 'pedals', 'tires', 'handles']

const res4 = await client.topK.query('bikes:keywords', ['store', 'handlebars']);
console.log(res4);  // >>> [1, 0]

Java-Sync

package io.redis.examples;


public class TopKExample {

    public void run() {
        UnifiedJedis jedis = new UnifiedJedis("redis://127.0.0.1:6379");


        String res1 = jedis.topkReserve("bikes:keywords", 5L, 2000L, 7L, 0.925D);
        System.out.println(res1); // >>> True

        List<String> res2 = jedis.topkAdd("bikes:keywords",
                "store",
                "seat",
                "handlebars",
                "handles",
                "pedals",
                "tires",
                "store",
                "seat");

        System.out.println(res2); // >>> [None, None, None, None, None, 'handlebars', None, None]

        List<String> res3 = jedis.topkList("bikes:keywords");
        System.out.println(res3); // >>> ['store', 'seat', 'pedals', 'tires', 'handles']

        List<Boolean> res4 = jedis.topkQuery("bikes:keywords", "store", "handlebars");
        System.out.println(res4); // >>> [1, 0]

        jedis.close();
    }
}

package example_commands_test

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)


func ExampleClient_topk() {
	ctx := context.Background()

	rdb := redis.NewClient(&redis.Options{
		Addr:     "localhost:6379",
		Password: "", // no password docs
		DB:       0,  // use default DB
	})


	res1, err := rdb.TopKReserve(ctx, "bikes:keywords", 5).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res1) // >>> OK

	res2, err := rdb.TopKAdd(ctx, "bikes:keywords",
		"store",
		"seat",
		"handlebars",
		"handles",
		"pedals",
		"tires",
		"store",
		"seat",
	).Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res2) // >>> [     handlebars  ]

	res3, err := rdb.TopKList(ctx, "bikes:keywords").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res3) // [store seat pedals tires handles]

	res4, err := rdb.TopKQuery(ctx, "bikes:keywords", "store", "handlebars").Result()

	if err != nil {
		panic(err)
	}

	fmt.Println(res4) // [true false]

}


using NRedisStack.RedisStackCommands;
using NRedisStack.Tests;
using StackExchange.Redis;



public class Topk_tutorial
{
    public void run()
    {
        var muxer = ConnectionMultiplexer.Connect("localhost:6379");
        var db = muxer.GetDatabase();


        bool res1 = db.TOPK().Reserve("bikes:keywords", 5, 2000, 7, 0.925);
        Console.WriteLine(res1);    // >>> True

        RedisResult[]? res2 = db.TOPK().Add("bikes:keywords",
                "store",
                "seat",
                "handlebars",
                "handles",
                "pedals",
                "tires",
                "store",
                "seat"
        );

        if (res2 is not null)
        {
            Console.WriteLine(string.Join(", ", string.Join(", ", res2.Select(r => $"{(r.IsNull ? "Null" : r)}"))));
            // >>> Null, Null, Null, Null, Null, handlebars, Null, Null
        }

        RedisResult[] res3 = db.TOPK().List("bikes:keywords");

        if (res3 is not null)
        {
            Console.WriteLine(string.Join(", ", string.Join(", ", res3.Select(r => $"{(r.IsNull ? "Null" : r)}"))));
            // >>> store, seat, pedals, tires, handles
        }

        bool[] res4 = db.TOPK().Query("bikes:keywords", "store", "handlebars");
        Console.WriteLine(string.Join(", ", res4)); // >>> True, False

        // Tests for 'topk' step.


    }
}

容量规划

为 Top K 草图选择大小相对容易，因为您只需要设置的两个参数与您希望在列表中保留的元素数量 (K) 直接相关。

如果您知道所需的 k，就可以轻松推导出 width 和 depth

width = k*log(k)
depth =  log(k)  # but a minimum of 5

对于 decay_constant，您可以使用值 0.9，该值在许多情况下被认为是最佳的，但您可以尝试不同的值，找到最适合您用例的值。

性能

Top-K 中的插入操作时间复杂度为 O(K + depth) ≈ O(K)，查找操作时间复杂度为 O(K)，其中 K 是列表中要保留的顶部元素数量，depth 是使用的哈希函数数量。

学术资源

HeavyKeeper: 一种用于查找 Top-k 大象流的精确算法。

参考资料

认识 Top-K：RedisBloom 的绝佳概率数据结构补充

产品

工具

获取 Redis

联系

学习

最新

了解工作原理

Top-K

用例

示例

容量规划

性能

学术资源

参考资料

本页内容