Rerankers

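In this notebook, we show how to use RedisVL to rerank search results (documents, chunks, or records) based on the input query. Today RedisVL supports reranking through:

  1. A reranker that uses pre-trained cross-encoder models from Hugging Face (HFCrossEncoderReranker).
  2. Cohere's reranking API (CohereReranker).
  3. VoyageAI's reranking API (VoyageAIReranker).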

Before running this notebook, be sure to:

  1. Have installed redisvl and have that environment active for this notebook.
  2. Have a running Redis Stack instance with RediSearch > 2.4 active.

For example, you can run Redis Stack locally with Docker:

docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

This runs Redis on port 6379 and RedisInsight at http://localhost:8001.
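Before continuing, you may want to confirm that the server is reachable. Below is a minimal sketch that uses only the Python standard library: it sends a PING in Redis's RESP wire format and expects a +PONG reply. redis_ping is a hypothetical helper, not part of RedisVL. It does not check that the search module is loaded, and a server that requires authentication answers with an error instead of +PONG, so the helper would report False in that case. redis-cli ping or redis-py's Redis.ping() are the more common ways to do the same check.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if the server at host:port answers a RESP PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # PING encoded as a RESP array containing one bulk string
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = b""
            # Read until the end of the first reply line (or until the peer closes)
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
            return reply.startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, unreachable host, and so on
        return False
```

Calling redis_ping() with no arguments checks the default localhost:6379 address used by the Docker command above.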

# import necessary modules
import os

Simple Reranking

Reranking provides a relevance boost to search results generated by traditional (lexical) or semantic search strategies.

As a simple demonstration, take the passages and user query below:

query = "What is the capital of the United States?"
docs = [
    "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
    "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
    "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
]

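The goal of reranking is to provide a more fine-grained quality improvement to the initial search results. With RedisVL, these would most likely be results from a search operation such as full-text or vector search.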
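The two-stage idea can be sketched without any external dependencies. This is a toy under stated assumptions: the first stage scores documents by bag-of-words cosine similarity (a stand-in for full-text or vector search) and keeps a small candidate pool, and only that pool is passed to a second, more expensive scorer. first_stage, two_stage_search, and the scorer used in the example are hypothetical names, not RedisVL APIs. In practice the second stage would be one of the rerankers shown below.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Toy tokenizer: lowercase and keep alphanumeric runs only
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_stage(query: str, docs: List[str], k: int) -> List[str]:
    # Hypothetical cheap retriever: rank every doc by bag-of-words cosine, keep the top k
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def two_stage_search(
    query: str,
    docs: List[str],
    scorer: Callable[[str, str], float],
    k: int = 10,
    limit: int = 3,
) -> List[Tuple[str, float]]:
    # Stage 1: narrow the corpus down to k candidates
    candidates = first_stage(query, docs, k)
    # Stage 2: rescore only those candidates, then sort and truncate
    scored = [(doc, scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

For example, with a placeholder scorer such as lambda q, d: 1.0 if "capital of the united states" in d.lower() else 0.0, lexical overlap alone cannot separate the Washington, D.C. passage from the capital-punishment one, but the second stage can.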

Using the Cross-Encoder Reranker

To use the cross-encoder reranker, we initialize an instance of HFCrossEncoderReranker and pass it a suitable model (if no model is provided, the cross-encoder/ms-marco-MiniLM-L-6-v2 model is used):

from redisvl.utils.rerank import HFCrossEncoderReranker

cross_encoder_reranker = HFCrossEncoderReranker("BAAI/bge-reranker-base")

Rerank documents with HFCrossEncoderReranker

With the reranker instance in hand, we can rerank and truncate the list of documents based on their relevance to the initial query.
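Conceptually, a cross-encoder reads each (query, document) pair together and produces one relevance score per pair. "Rerank and truncate" then means sorting by that score and keeping the top results. The sketch below shows only that data flow. It assumes a predict callable that maps a list of pairs to a list of floats, which is the same input shape that sentence-transformers' CrossEncoder.predict accepts. cross_encoder_rank and toy_predict are hypothetical helpers, not RedisVL's implementation, and toy_predict is a word-overlap placeholder rather than a trained model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def cross_encoder_rank(
    query: str,
    docs: Sequence[str],
    predict: Callable[[List[Pair]], List[float]],
    limit: int = 3,
) -> Tuple[List[str], List[float]]:
    # Build one (query, doc) pair per document; a cross-encoder scores both texts jointly
    pairs = [(query, doc) for doc in docs]
    scores = list(predict(pairs))
    # Document indices ordered by descending score, truncated to `limit`
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:limit]
    return [docs[i] for i in order], [scores[i] for i in order]


def toy_predict(pairs: List[Pair]) -> List[float]:
    # Hypothetical stand-in for a model: share of query words that also appear in the doc
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        scores.append(len(q_words & d_words) / len(q_words) if q_words else 0.0)
    return scores


results, scores = cross_encoder_rank(
    "capital of france",
    ["Paris is the capital of France", "Berlin is in Germany", "France borders Spain"],
    toy_predict,
    limit=2,
)
```

Returning a (results, scores) tuple mirrors the way the rank calls in this notebook are unpacked.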

results, scores = cross_encoder_reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.07461125403642654  --  {'content': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}
0.05220315232872963  --  {'content': 'Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.'}
0.3802368640899658  --  {'content': 'Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.'}

Using the Cohere Reranker

To initialize the Cohere reranker, you'll need to install the cohere library and provide a valid Cohere API key.

#!pip install cohere
import getpass

# setup the API Key
api_key = os.environ.get("COHERE_API_KEY") or getpass.getpass("Enter your Cohere API key: ")
from redisvl.utils.rerank import CohereReranker

cohere_reranker = CohereReranker(limit=3, api_config={"api_key": api_key})

Rerank documents with CohereReranker

Below we use the CohereReranker to rerank and truncate the list of documents above based on their relevance to the initial query.

results, scores = cohere_reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.9990564  --  Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.
0.7516481  --  Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.
0.08882029  --  The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.

Working with semi-structured documents

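Often the initial result set includes other metadata and components that can be used to steer the reranking relevance. To accomplish this, we can set the rank_by argument and provide documents that contain those additional fields.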
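How the selected fields are combined is up to each reranker backend. The sketch below assumes one simple strategy: render only the fields named in rank_by, in that order, as "field: value" lines, producing a single text per document for scoring while the original dictionaries remain what the caller returns. flatten_fields and prepare_for_ranking are hypothetical helpers and do not describe how RedisVL or Cohere actually handle rank_by.

```python
from typing import Dict, List, Sequence


def flatten_fields(doc: Dict[str, str], rank_by: Sequence[str]) -> str:
    # Fail loudly if a requested field is absent instead of silently scoring less text
    missing = [field for field in rank_by if field not in doc]
    if missing:
        raise KeyError(f"document is missing rank_by fields: {missing}")
    # Keep only the requested fields, in the requested order, as "field: value" lines
    return "\n".join(f"{field}: {doc[field]}" for field in rank_by)


def prepare_for_ranking(docs: List[Dict[str, str]], rank_by: Sequence[str]) -> List[str]:
    # One scoring text per document; the caller keeps the original dicts for the results
    return [flatten_fields(doc, rank_by) for doc in docs]
```

With rank_by=["passage", "source"], as in the example below, each document would be scored on its passage text followed by its source label.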

docs = [
    {
        "source": "wiki",
        "passage": "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274."
    },
    {
        "source": "encyclopedia",
        "passage": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan."
    },
    {
        "source": "textbook",
        "passage": "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas."
    },
    {
        "source": "textbook",
        "passage": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America."
    },
    {
        "source": "wiki",
        "passage": "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
    }
]
results, scores = cohere_reranker.rank(query=query, docs=docs, rank_by=["passage", "source"])
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.9988121  --  {'source': 'textbook', 'passage': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}
0.5974905  --  {'source': 'wiki', 'passage': 'Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.'}
0.059101548  --  {'source': 'encyclopedia', 'passage': 'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.'}

Using the VoyageAI Reranker

To initialize the VoyageAI reranker, you'll need to install the voyageai library and provide a valid VoyageAI API key.

#!pip install voyageai
import getpass

# setup the API Key
api_key = os.environ.get("VOYAGE_API_KEY") or getpass.getpass("Enter your VoyageAI API key: ")
from redisvl.utils.rerank import VoyageAIReranker

# Please check the available models at https://docs.voyageai.com/docs/reranker
reranker = VoyageAIReranker(model="rerank-lite-1", limit=3, api_config={"api_key": api_key})

Rerank documents with VoyageAIReranker

Below we use the VoyageAIReranker to rerank and truncate the list of documents above based on their relevance to the initial query.

results, scores = reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.796875  --  Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.
0.578125  --  Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.
0.5625  --  Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.