Semantic caching

Semantic caching with RedisVL

Note
This document is a converted form of this Jupyter notebook.

Before you begin, make sure of the following:

  1. You have installed RedisVL and have that environment activated.
  2. You have a running Redis instance with the search and query features enabled; a quick connectivity check is sketched below.
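
The sketch below is a minimal way to verify both prerequisites. It assumes a local Redis Stack (or another Redis instance with the Search module) on the default port; adjust the URL for your setup.

# Minimal environment check -- assumes a local Redis instance on the default port.
# Install RedisVL first, for example with `pip install redisvl`.
from redis import Redis

client = Redis.from_url("redis://127.0.0.1:6379")
print(client.ping())         # True if the instance is reachable
print(client.module_list())  # should include the 'search' module for query/search support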

Semantic caching for LLMs

RedisVL provides a SemanticCache interface that uses Redis's built-in caching capabilities and vector search to store responses to previously answered questions. This reduces the number of requests and tokens sent to LLM services, lowering costs and improving application throughput by cutting the time it takes to generate responses.

This document shows you how to use Redis as a semantic cache for your applications.

Begin by importing OpenAI so you can use its API to respond to user prompts. You will also create a simple ask_openai helper method to assist.

import os
import getpass
import time

from openai import OpenAI

import numpy as np

os.environ["TOKENIZERS_PARALLELISM"] = "False"

api_key = os.getenv("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")

client = OpenAI(api_key=api_key)

def ask_openai(question: str) -> str:
    response = client.completions.create(
      model="gpt-3.5-turbo-instruct",
      prompt=question,
      max_tokens=200
    )
    return response.choices[0].text.strip()
# Test
print(ask_openai("What is the capital of France?"))
The capital of France is Paris.

Initialize SemanticCache

Upon initialization, SemanticCache automatically creates an index within Redis for the semantic cache content.

from redisvl.extensions.llmcache import SemanticCache

llmcache = SemanticCache(
    name="llmcache",                     # underlying search index name
    prefix="llmcache",                   # redis key prefix for hash entries
    redis_url="redis://127.0.0.1:6379",  # redis connection url string
    distance_threshold=0.1               # semantic cache distance threshold
)
# look at the index specification created for the semantic cache lookup
$ rvl index info -i llmcache

    Index Information:
    ╭──────────────┬────────────────┬──────────────┬─────────────────┬────────────╮
    │ Index Name   │ Storage Type   │ Prefixes     │ Index Options   │ Indexing   │
    ├──────────────┼────────────────┼──────────────┼─────────────────┼────────────┤
    │ llmcache     │ HASH           │ ['llmcache'] │ []              │          0 │
    ╰──────────────┴────────────────┴──────────────┴─────────────────┴────────────╯
    Index Fields:
    ╭───────────────┬───────────────┬────────┬────────────────┬────────────────╮
    │ Name          │ Attribute     │ Type   │ Field Option   │ Option Value   │
    ├───────────────┼───────────────┼────────┼────────────────┼────────────────┤
    │ prompt        │ prompt        │ TEXT   │ WEIGHT         │              1 │
    │ response      │ response      │ TEXT   │ WEIGHT         │              1 │
    │ prompt_vector │ prompt_vector │ VECTOR │                │                │
    ╰───────────────┴───────────────┴────────┴────────────────┴────────────────╯
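
The index above was built with the cache's default embedding model. Depending on your installed RedisVL version, you can likely pass your own vectorizer when constructing the cache. The sketch below is a hedged example, not part of the original notebook; the vectorizer argument and model name are assumptions to verify against your version's documentation.

# Sketch: construct the cache with an explicit vectorizer (verify against your RedisVL version).
from redisvl.extensions.llmcache import SemanticCache
from redisvl.utils.vectorize import HFTextVectorizer

custom_cache = SemanticCache(
    name="llmcache_custom",               # separate index name for this example
    prefix="llmcache_custom",             # separate key prefix
    redis_url="redis://127.0.0.1:6379",
    distance_threshold=0.1,
    vectorizer=HFTextVectorizer(model="sentence-transformers/all-MiniLM-L6-v2"),
)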

Basic cache usage

question = "What is the capital of France?"
# Check the semantic cache -- should be empty
if response := llmcache.check(prompt=question):
    print(response)
else:
    print("Empty cache")

    Empty cache

Your initial cache check should be empty because you have not yet stored anything in the cache. Below, store the question, the proper response, and any arbitrary metadata (as a Python dictionary object) in the cache.

# Cache the question, answer, and arbitrary metadata
llmcache.store(
    prompt=question,
    response="Paris",
    metadata={"city": "Paris", "country": "france"}
)
# Check the cache again
if response := llmcache.check(prompt=question, return_fields=["prompt", "response", "metadata"]):
    print(response)
else:
    print("Empty cache")

[{'id': 'llmcache:115049a298532be2f181edb03f766770c0db84c22aff39003fec340deaec7545', 'vector_distance': '9.53674316406e-07', 'prompt': 'What is the capital of France?', 'response': 'Paris', 'metadata': {'city': 'Paris', 'country': 'france'}}]
# Check for a semantically similar result
question = "What actually is the capital of France?"
llmcache.check(prompt=question)[0]['response']

    'Paris'
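
Conversely, a prompt on an unrelated topic should fall outside the 0.1 distance threshold and return no cache hit. The question below is illustrative, not from the original notebook.

# An unrelated prompt should not match the cached entry within the current threshold
miss = llmcache.check(prompt="How do I bake sourdough bread?")
print(miss)  # likely [] -- nothing within the distance threshold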

Customize the distance threshold

For most use cases, the right semantic similarity threshold is not a fixed value. Depending on the embedding model you choose, the properties of the input query, and the business use case, the threshold may need to change.

Fortunately, you can seamlessly adjust the threshold at any time, as shown below.
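
One practical way to calibrate the threshold is to probe the distances your embedding model assigns to paraphrases of an already-cached prompt. Below is a rough sketch; the paraphrases are illustrative.

# Inspect vector distances for paraphrases of the cached prompt
paraphrases = [
    "What is the capital of France?",
    "What actually is the capital of France?",
    "Name the capital city of France.",
]

for p in paraphrases:
    hits = llmcache.check(prompt=p, return_fields=["response", "vector_distance"])
    if hits:
        print(p, "->", hits[0]["vector_distance"])
    else:
        print(p, "-> no hit under the current threshold")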

# Widen the semantic distance threshold
llmcache.set_threshold(0.3)
# Really try to trick it by asking around the point
# But it still slips just under our new threshold
question = "What is the capital city of the country in Europe that also has a city named Nice?"
llmcache.check(prompt=question)[0]['response']

    'Paris'
# Invalidate the cache completely by clearing it out
llmcache.clear()

# should be empty now
llmcache.check(prompt=question)

    []

Use TTL

Redis uses optional time-to-live (TTL) policies to expire individual keys at some point in the future. This lets you focus on your data flow and business logic without worrying about complex cleanup tasks.

A TTL policy set on the SemanticCache lets you retain cache entries only temporarily. Set the TTL policy to 5 seconds.

llmcache.set_ttl(5) # 5 seconds
llmcache.store("This is a TTL test", "This is a TTL test response")

time.sleep(5)
# confirm that the cache has cleared by now on its own
result = llmcache.check("This is a TTL test")

print(result)

[]
# Reset the TTL to null (long lived data)
llmcache.set_ttl()

Simple performance testing

Next, you will measure the speedup you get from using SemanticCache. Use the time module to measure the time taken to generate responses with and without SemanticCache.

def answer_question(question: str) -> str:
    """Helper function to answer a simple question using OpenAI with a wrapper
    check for the answer in the semantic cache first.

    Args:
        question (str): User input question.

    Returns:
        str: Response.
    """
    results = llmcache.check(prompt=question)
    if results:
        return results[0]["response"]
    else:
        answer = ask_openai(question)
        return answer
start = time.time()
# asking a question -- openai response time
question = "What was the name of the first US President?"
answer = answer_question(question)
end = time.time()

print(f"Without caching, a call to openAI to answer this simple question took {end-start} seconds.")

Without caching, a call to openAI to answer this simple question took 0.5017588138580322 seconds.
llmcache.store(prompt=question, response="George Washington")
# Calculate the avg latency for caching over LLM usage
times = []

for _ in range(10):
    cached_start = time.time()
    cached_answer = answer_question(question)
    cached_end = time.time()
    times.append(cached_end-cached_start)

avg_time_with_cache = np.mean(times)
print(f"Avg time taken with LLM cache enabled: {avg_time_with_cache}")
print(f"Percentage of time saved: {round(((end - start) - avg_time_with_cache) / (end - start) * 100, 2)}%")

Avg time taken with LLM cache enabled: 0.2560166358947754
Percentage of time saved: 82.47%
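
In a production application, answer_question would typically also write the fresh LLM response back into the cache on a miss, so that subsequent semantically similar questions are served directly from Redis. A minimal sketch of that cache-aside variant (not part of the original notebook):

def answer_question_with_writeback(question: str) -> str:
    """Cache-aside variant: on a miss, ask OpenAI and store the answer for future hits."""
    results = llmcache.check(prompt=question)
    if results:
        return results[0]["response"]
    answer = ask_openai(question)
    llmcache.store(prompt=question, response=answer)  # write back so later checks can hit
    return answer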


# check the stats of the index
$ rvl stats -i llmcache

    Statistics:
    ╭─────────────────────────────┬─────────────╮
    │ Stat Key                    │ Value       │
    ├─────────────────────────────┼─────────────┤
    │ num_docs                    │ 1           │
    │ num_terms                   │ 19          │
    │ max_doc_id                  │ 3           │
    │ num_records                 │ 23          │
    │ percent_indexed             │ 1           │
    │ hash_indexing_failures      │ 0           │
    │ number_of_uses              │ 19          │
    │ bytes_per_record_avg        │ 5.30435     │
    │ doc_table_size_mb           │ 0.000134468 │
    │ inverted_sz_mb              │ 0.000116348 │
    │ key_table_size_mb           │ 2.76566e-05 │
    │ offset_bits_per_record_avg  │ 8           │
    │ offset_vectors_sz_mb        │ 2.09808e-05 │
    │ offsets_per_term_avg        │ 0.956522    │
    │ records_per_doc_avg         │ 23          │
    │ sortable_values_size_mb     │ 0           │
    │ total_indexing_time         │ 1.211       │
    │ total_inverted_index_blocks │ 19          │
    │ vector_index_sz_mb          │ 3.0161      │
    ╰─────────────────────────────┴─────────────╯
# Clear the cache AND delete the underlying index
llmcache.delete()