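Redis as a vector database quick start guide
Understand how to use Redis as a vector database
This quick start guide helps you to:
- understand what a vector database is
- create a Redis vector database
- create vector embeddings and store vectors
- query data and perform a vector search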
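Understand vector databases
Data is often unstructured, which means that it isn't described by a well-defined schema. Examples of unstructured data include text passages, images, videos, and audio. One approach to storing and searching unstructured data is to use vector embeddings.
- **What are vectors?** In machine learning and AI, vectors are sequences of numbers that represent data. They are the inputs and outputs of models, encapsulating the underlying information in numerical form. Vectors transform unstructured data, such as text, images, videos, and audio, into a format that machine learning models can process.
- **Why are they important?** Vectors capture the complex patterns and semantic meaning inherent in data, which makes them powerful tools for many applications. They allow machine learning models to understand and manipulate unstructured data more effectively.
- **Enhancing traditional search.** Traditional keyword (lexical) search relies on exact matches of words or phrases, which can be limiting. Vector search, also called semantic search, instead uses the rich information captured in vector embeddings. When data is mapped into a vector space, items with similar meanings end up close to each other. Because the search considers the context and semantic content of the query rather than only the exact words used, the results are more accurate and more meaningful.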
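To make "similar items are close to each other" concrete, here is a toy sketch. Three made-up 3-dimensional vectors stand in for embeddings, and candidates are ranked by cosine similarity to a query vector. Real models produce much larger vectors: the model used later in this guide yields 768 dimensions. The labels and numbers here are invented for illustration only.

```python
import numpy as np

# Hypothetical toy "embeddings"; the values are invented, not model output.
items = {
    "kids bike": np.array([0.9, 0.1, 0.0], dtype=np.float32),
    "mountain bike": np.array([0.1, 0.9, 0.2], dtype=np.float32),
    "city bike": np.array([0.0, 0.2, 0.9], dtype=np.float32),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank(query_vector, candidates, k=2):
    """Return the k (name, similarity) pairs closest to query_vector."""
    scored = [(name, cosine_similarity(query_vector, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# A made-up vector standing in for a query such as "bike for small kids".
query_vector = np.array([0.8, 0.3, 0.0], dtype=np.float32)
print(rank(query_vector, items))
```

The query shares no keywords with the item labels. The ranking comes purely from vector geometry, which is the property semantic search relies on.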
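Create a Redis vector database
You can use Redis Stack as a vector database. It allows you to:
- store vectors and the associated metadata within hashes or JSON documents
- create and configure secondary indices for search
- perform vector searches
- update vectors and metadata
- delete and clean up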
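The two storage options encode a vector differently. In a hash, the vector field holds a binary string of packed float32 values. In a JSON document, the vector is an ordinary array of numbers, which is the approach this guide takes with $.description_embeddings. The sketch below shows both encodings for a made-up 3-dimensional vector using NumPy and the json module. It does not talk to Redis.

```python
import json

import numpy as np

embedding = [0.25, -0.5, 0.125]  # hypothetical 3-dimensional embedding

# Hash documents: the vector field holds the raw bytes of a float32 array,
# the same .tobytes() encoding the guide uses for query vectors.
blob = np.asarray(embedding, dtype=np.float32).tobytes()

# JSON documents: the vector is stored as a plain JSON array of numbers.
json_value = json.dumps(embedding)

# Decoding the blob recovers the original float32 values.
restored = np.frombuffer(blob, dtype=np.float32).tolist()
print(len(blob), json_value, restored)
```

Each float32 value takes 4 bytes, so the blob for this vector is 12 bytes long. For a hash you would pass blob as a field value, for example client.hset(key, mapping={"embedding": blob}), where "embedding" is a hypothetical field name. A client created with decode_responses=True will try to decode such binary values as text when reading them back, so binary fields are typically read with a client that does not decode responses.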
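The easiest way to get started is to use Redis Cloud:
- Create a free account.
- Follow the instructions to create a free database.
This free Redis Cloud database comes out of the box with all the Redis Stack features.
Alternatively, you can use the installation guides to install Redis Stack on your local machine.
You need to have the following features configured for your Redis server: JSON, and Search and query.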
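Install the required Python packages
Create a Python virtual environment and install the following dependencies using pip:
- redis: You can find further details about the redis-py client library in the clients section of this documentation site.
- pandas: Pandas is a data analysis library.
- sentence-transformers: You will use the SentenceTransformers framework to generate embeddings on full text.
- tabulate: pandas uses tabulate to render Markdown.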
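Before running the script, you can check that every package is importable. This is a small standard-library sketch, not part of the guide. The PyPI name sentence-transformers is imported as sentence_transformers. The script also imports numpy and requests, which are not in the list above, so the check includes them as well.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "redis": "redis",
    "pandas": "pandas",
    "sentence-transformers": "sentence_transformers",
    "tabulate": "tabulate",
    "numpy": "numpy",
    "requests": "requests",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [package for package, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```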
You will also need the following imports in your Python code:
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
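The index above uses "DISTANCE_METRIC": "COSINE", so the vector_score that each KNN or range query yields is a cosine distance: 1 minus the cosine similarity, where smaller means more similar. That is why the queries sort ascending by vector_score and why create_query_table reports 1 - vector_score as its score. The plain NumPy sketch below reproduces that arithmetic on two toy vectors. It does not call Redis.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, using float32 inputs
    to match the index's TYPE FLOAT32."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - similarity)

distance = cosine_distance([1.0, 0.0], [1.0, 1.0])
score = round(1 - distance, 2)  # same conversion as create_query_table
print(distance, score)
```

Cosine distance ranges from 0 (same direction) to 2 (opposite directions). Read this way, the range query's {"range": 0.55} keeps documents whose cosine distance is at most 0.55, which corresponds to a similarity of at least 0.45.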
Connect
Connect to Redis. By default, Redis returns binary responses. To decode them, set the decode_responses parameter to True:
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
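Here is an example connection string for a Cloud database in us-east-1 that listens on port 16379: redis-16379.c283.us-east-1-4.ec2.cloud.redislabs.com:16379. The connection string has the format host:port. You must also copy and paste the username and password of your Cloud database. The line of code for connecting with the default user then changes to client = redis.Redis(host="redis-16379.c283.us-east-1-4.ec2.cloud.redislabs.com", port=16379, password="your_password_here", decode_responses=True).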
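If you keep the host:port string as configuration, a small helper can split it into the host and port arguments that redis.Redis expects. This is a hypothetical standard-library sketch. The commented line shows how its result would be passed to the client, with username and password as placeholders.

```python
def parse_host_port(connection_string):
    """Split a 'host:port' connection string into (host, port)."""
    host, sep, port = connection_string.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {connection_string!r}")
    return host, int(port)

host, port = parse_host_port(
    "redis-16379.c283.us-east-1-4.ec2.cloud.redislabs.com:16379"
)
print(host, port)
# client = redis.Redis(host=host, port=port, username="default",
#                      password="your_password_here", decode_responses=True)
```

redis-py can also build a client directly from a URL, for example redis.from_url("redis://default:your_password_here@<host>:<port>", decode_responses=True), which keeps the credentials and address in a single setting.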
Prepare the demo dataset
This quick start guide also uses the bikes dataset. Here is an example document from it:
{
"model": "Jigger",
"brand": "Velorim",
"price": 270,
"type": "Kids bikes",
"specs": {
"material": "aluminium",
"weight": "10"
},
"description": "Small and powerful, the Jigger is the best ride for the smallest of tikes! ..."
}
The description field contains a free-form text description of a bike and will be used to create vector embeddings.
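Later, the script reads the descriptions back with client.json().mget(keys, "$.description"), flattens the result, and pairs keys with embeddings using zip(keys, embeddings). A JSONPath query returns a list of matches per key: as far as we understand redis-py's results, this is an empty list when the path is absent and None when the key does not exist. So a document without a description would silently shift every later pairing. The hypothetical helper below keeps each key next to its own description. The sample data is made up and only mimics the shape of an mget result.

```python
def pair_keys_with_descriptions(keys, mget_result):
    """Pair each key with its first description match, skipping keys with none.

    mget_result holds one entry per key: a list of JSONPath matches,
    an empty list if the path was not found, or None for a missing key.
    """
    pairs = []
    for key, matches in zip(keys, mget_result):
        if matches:  # skips both None and []
            pairs.append((key, matches[0]))
    return pairs

# Made-up sample shaped like client.json().mget(keys, "$.description")
keys = ["bikes:001", "bikes:002", "bikes:003"]
mget_result = [
    ["description of bike 1"],
    [],  # document without a description field
    ["description of bike 3"],
]
pairs = pair_keys_with_descriptions(keys, mget_result)
kept_keys = [key for key, _ in pairs]
descriptions = [text for _, text in pairs]
print(kept_keys, descriptions)
```

You would then encode descriptions and zip the embeddings with kept_keys rather than keys. With the complete bikes dataset both approaches give the same result; the helper only matters when some documents lack the field.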
1. Fetch the demo data
First, fetch the demo dataset as a JSON array:
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
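If you prefer not to depend on requests for this single download, the standard library can fetch the same file. This is a hypothetical alternative, not the guide's code. parse_bikes also checks that the payload really is a JSON array of objects before you store it in Redis.

```python
import json
import urllib.request

BIKES_URL = (
    "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
    "/main/data/bikes.json"
)

def parse_bikes(payload):
    """Decode a JSON payload (str or bytes) and check it is a list of objects."""
    bikes = json.loads(payload)
    if not isinstance(bikes, list) or not all(isinstance(b, dict) for b in bikes):
        raise ValueError("expected a JSON array of bike objects")
    return bikes

def fetch_bikes(url=BIKES_URL, timeout=10):
    """Download and validate the dataset using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_bikes(response.read())

# bikes = fetch_bikes()  # requires network access
```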
Inspect the structure of one of the bike JSON documents:
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
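The documents nest objects (for example specs), so it can help to list the JSONPath of every leaf value. These are the kind of paths you later pass to client.json().get(key, path) or use in the index schema, such as $.price. The helper below is a hypothetical plain-Python sketch, run against the sample document shown earlier.

```python
def leaf_paths(value, path="$"):
    """Yield (JSONPath, value) for every scalar in a parsed JSON document.

    Assumes object keys are plain identifiers, so dot notation is valid.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, f"{path}[{index}]")
    else:
        yield path, value

# The sample document from "Prepare the demo dataset".
sample_bike = {
    "model": "Jigger",
    "brand": "Velorim",
    "price": 270,
    "type": "Kids bikes",
    "specs": {"material": "aluminium", "weight": "10"},
    "description": "Small and powerful, the Jigger is the best ride for the smallest of tikes! ...",
}
for jsonpath, leaf in leaf_paths(sample_bike):
    print(f"{jsonpath} = {leaf!r}")
```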
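2. Store the demo data in Redis
Now iterate over the bikes array and store the data as JSON documents in Redis using the JSON.SET command. The code below uses a pipeline to minimize network round trips: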
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
Once the data is loaded, you can retrieve a specific attribute from one of the JSON documents in Redis with a JSONPath expression:
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
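A path that starts with `$` is JSONPath syntax, so RedisJSON returns a list of all matches rather than a bare value. That is why the result above is `['Summit']` and not `'Summit'`. The pure-Python sketch below imitates that behavior for simple dotted paths only; `resolve_path` and the sample document are hypothetical illustrations, not part of redis-py or RedisJSON.

```python
def resolve_path(doc, path):
    """Return a list of matches for a simple dotted JSONPath such as "$.specs.material".

    Hypothetical helper: supports only "$" and ".key" segments
    (no wildcards, filters, or array indexes).
    """
    if not path.startswith("$"):
        raise ValueError("JSONPath expressions start with '$'")
    matches = [doc]
    for key in filter(None, path[1:].split(".")):
        # Keep only current matches that are objects containing this key.
        matches = [m[key] for m in matches if isinstance(m, dict) and key in m]
    return matches


# Sample document shaped like bikes:010 (description omitted).
bike = {
    "model": "Summit",
    "brand": "nHill",
    "price": 1200,
    "type": "Mountain Bike",
    "specs": {"material": "alloy", "weight": "11.3"},
}

print(resolve_path(bike, "$.model"))           # ['Summit']
print(resolve_path(bike, "$.specs.material"))  # ['alloy']
print(resolve_path(bike, "$.color"))           # [] (no match in this sketch)
```

With redis-py, a single match can be unwrapped by indexing the result, for example `client.json().get("bikes:010", "$.model")[0]`.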
3. Select a text embedding model
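HuggingFace has a large catalog of text embedding models that can be served locally through the SentenceTransformers framework. Here we use the MS MARCO model, which is widely used in search engines, chatbots, and other AI applications.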
from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer('msmarco-distilbert-base-v4')
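The model maps each text to a fixed-length vector, and texts with similar meaning end up with vectors that point in similar directions. The vector index created later in this guide compares vectors with the `COSINE` distance metric, which Redis defines as 1 minus the cosine similarity. The NumPy sketch below shows that calculation on small made-up vectors; the vectors and the `cosine_distance` name are illustrative assumptions, and real MS MARCO embeddings have 768 dimensions.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - similarity)


# Toy 3-dimensional "embeddings"; real model outputs are 768-dimensional.
kids_bike = [0.9, 0.1, 0.0]
small_kids_query = [0.8, 0.2, 0.0]
road_bike = [0.0, 0.2, 0.9]

# The query about kids' bikes is much closer to the kids' bike vector.
print(round(cosine_distance(kids_bike, small_kids_query), 3))
print(round(cosine_distance(kids_bike, road_bike), 3))
```

Smaller distances mean closer meaning, so a nearest-neighbor query sorted by this distance returns the most similar descriptions first.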
4. Generate text embeddings
Iterate over all Redis keys that start with the `bikes:` prefix:
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
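`sorted()` orders the keys as strings. Because the keys were written with a zero-padded suffix (`bikes:{i:03}`), string order matches the numeric order in which the bikes were loaded, which keeps `keys` aligned with the embeddings computed in the next step. The sketch below uses the standard-library `fnmatch` module to imitate the `bikes:*` glob on an in-memory list of key names; the extra `users:001` key and the unpadded keys are hypothetical, added for contrast.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory keyspace: the quickstart's zero-padded keys plus an unrelated key.
padded = [f"bikes:{i:03}" for i in range(1, 12)]
keyspace = padded + ["users:001"]

# Imitate the glob pattern "bikes:*" (case-sensitive, like Redis key patterns).
matched = sorted(k for k in keyspace if fnmatchcase(k, "bikes:*"))
print(matched[:3], matched[-1])  # ['bikes:001', 'bikes:002', 'bikes:003'] bikes:011

# Without zero padding, string order no longer matches numeric order.
unpadded = sorted(f"bikes:{i}" for i in range(1, 12))
print(unpadded[:3])  # ['bikes:1', 'bikes:10', 'bikes:11']
```

On a large production database, redis-py's `client.scan_iter(match="bikes:*")` walks the keyspace incrementally instead of issuing a single blocking `KEYS` command.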
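Pass the keys, together with the `$.description` path, to the `JSON.MGET` command to collect the descriptions into a list. Then pass the list of descriptions to the `.encode()` method: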
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
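Because `$.description` is a JSONPath expression, `JSON.MGET` returns one list of matches per key, so the raw result is a list of single-element lists; the comprehension flattens it before encoding. The sketch below reproduces that shape with placeholder strings and uses a random NumPy array in place of `embedder.encode()` so it runs without downloading the model. The data, and the length check added before pairing descriptions with keys, are assumptions for illustration.

```python
import numpy as np

# Shape of client.json().mget(keys, "$.description"): one list of matches per key.
# Placeholder strings stand in for the real bike descriptions.
keys = ["bikes:001", "bikes:002", "bikes:003"]
raw = [["Kids bike ..."], ["Mountain bike ..."], ["Road bike ..."]]

# Flatten exactly as the quickstart does.
descriptions = [item for sublist in raw for item in sublist]

# A document without the field would contribute an empty list and silently
# disappear here, so check the lengths before zipping with the keys.
assert len(descriptions) == len(keys), "some documents have no description"

# Stand-in for embedder.encode(descriptions): an (n_texts, 768) float array.
rng = np.random.default_rng(0)
encoded = rng.standard_normal((len(descriptions), 768))

# Same conversion as the quickstart: float32, then plain Python lists.
embeddings = encoded.astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
print(VECTOR_DIMENSION)  # 768
```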
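Use the `JSON.SET` command to insert the vectorized descriptions into the bike documents in Redis. The following code adds a new field to each document under the JSONPath `$.description_embeddings`. Again, use a pipeline to avoid unnecessary network round trips: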
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
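By default, redis-py serializes the value passed to `json().set()` with Python's JSON encoder, which is why the embeddings were converted with `.tolist()`: a NumPy array is not JSON-serializable, while a list of Python floats is. The sketch below checks this with the standard `json` module alone; the three-element vector is a made-up example.

```python
import json

import numpy as np

vector = np.array([0.1, -0.5, 0.25], dtype=np.float32)

# A NumPy array cannot be encoded by the standard JSON encoder.
try:
    json.dumps(vector)
except TypeError as err:
    print("ndarray rejected:", err)

# .tolist() yields plain Python floats, which serialize as a JSON array.
as_list = vector.tolist()
payload = json.dumps(as_list)
print(payload)  # [0.10000000149011612, -0.5, 0.25]

# Parsing the JSON and casting back to float32 recovers the original values exactly.
restored = np.array(json.loads(payload), dtype=np.float32)
print(np.array_equal(restored, vector))  # True
```

This is also why stored embedding components such as `-0.538114607334137` carry more digits than float32 precision warrants: each component is written out as the equivalent double.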
Inspect one of the updated bike documents with the `JSON.GET` command:
res = client.json().get("bikes:010")
# >>>
# {
#   "model": "Summit",
#   "brand": "nHill",
#   "price": 1200,
#   "type": "Mountain Bike",
#   "specs": {
#     "material": "alloy",
#     "weight": "11.3"
#   },
#   "description": "This budget mountain bike from nHill performs well...",
#   "description_embeddings": [
#     -0.538114607334137,
#     -0.49465855956077576,
#     -0.025176964700222015,
#     ...
#   ]
# }
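Before building a vector index, it can help to confirm that every document carries an embedding of the expected length, because a vector field in the index schema fixes a single dimension (`DIM`), and documents that do not match it can end up reported as indexing failures instead of being indexed. The helper below is a hypothetical sketch that works on documents already fetched with `client.json().get()`; the sample documents are made up and use 4 dimensions instead of 768.

```python
import math


def find_bad_embeddings(docs, dim, field="description_embeddings"):
    """Return the keys of documents whose embedding is missing, has the wrong
    length, or contains non-finite numbers. Hypothetical helper, not part of redis-py."""
    bad = []
    for key, doc in docs.items():
        vector = doc.get(field)
        if (
            not isinstance(vector, list)
            or len(vector) != dim
            or not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vector)
        ):
            bad.append(key)
    return sorted(bad)


# Made-up documents with 4-dimensional vectors instead of the real 768.
docs = {
    "bikes:001": {"description_embeddings": [0.1, 0.2, 0.3, 0.4]},
    "bikes:002": {"description_embeddings": [0.1, 0.2]},  # wrong length
    "bikes:003": {},                                      # embedding missing
}
print(find_bad_embeddings(docs, dim=4))  # ['bikes:002', 'bikes:003']
```

With a live connection, the input could be built as `{k: client.json().get(k) for k in keys}` and checked with `find_bad_embeddings(docs, VECTOR_DIMENSION)`.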
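Create an index
1. Create an index with a vector field
You must create an index to query document metadata or to perform vector searches. Use the FT.CREATE command: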
```
FT.CREATE idx:bikes_vss ON JSON
  PREFIX 1 bikes: SCORE 1.0
  SCHEMA
    $.model AS model TEXT WEIGHT 1.0 NOSTEM
    $.brand AS brand TEXT WEIGHT 1.0 NOSTEM
    $.price AS price NUMERIC
    $.type AS type TAG SEPARATOR ","
    $.description AS description TEXT WEIGHT 1.0
    $.description_embeddings AS vector VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
```
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
```python
schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
```
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
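Here is a breakdown of the `VECTOR` field definition:
- `$.description_embeddings AS vector`: The vector field's JSON path and its field alias `vector`.
- `FLAT`: Specifies the indexing method, which is either a flat index or a hierarchical navigable small world graph (HNSW).
- `6`: The number of attribute arguments that follow (here, three name/value pairs).
- `TYPE FLOAT32`: Sets the floating-point precision of each vector component, in this case a 32-bit float.
- `DIM 768`: The length, or dimension, of the embeddings, determined by the chosen embedding model.
- `DISTANCE_METRIC COSINE`: The chosen distance function, cosine distance.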
You can find more details about all these options in the vector reference documentation.
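The following minimal NumPy sketch makes `DISTANCE_METRIC COSINE` concrete. It assumes the usual definition of cosine distance, 1 minus the cosine similarity. The function name `cosine_distance` is illustrative and not part of redis-py. A distance of 0 means the vectors point in the same direction, and larger values (up to 2) mean they are less similar.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (ranges from 0 to 2)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(1.0 - similarity)

# Same direction -> 0, orthogonal -> 1, opposite -> 2
print(round(cosine_distance([1, 0], [2, 0]), 6))   # 0.0
print(round(cosine_distance([1, 0], [0, 3]), 6))   # 1.0
print(round(cosine_distance([1, 0], [-1, 0]), 6))  # 2.0
```

Only the direction of the vectors matters, not their length, which is why `[1, 0]` and `[2, 0]` are at distance 0.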
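2. Check the state of the index
Once you execute the FT.CREATE command, the indexing process runs in the background. Shortly afterward, all JSON documents should be indexed and ready to be queried. To verify this, use the FT.INFO command, which provides details and statistics about the index. Pay particular attention to the number of documents that were indexed successfully and the number of indexing failures:
```
FT.INFO idx:bikes_vss
```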
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
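Indexing runs in the background, so a script may want to wait until it finishes before querying. The sketch below is not part of the original guide. It assumes that the `FT.INFO` reply contains a `percent_indexed` field reported as a fraction between 0 and 1. The helper name `wait_for_index` is hypothetical. It takes the info function as a parameter so it can be exercised without a running server.

```python
import time

def wait_for_index(get_info, timeout=10.0, interval=0.1):
    """Poll an FT.INFO-style callable until percent_indexed reaches 1.

    Assumption: get_info() returns a dict with a "percent_indexed" entry,
    for example client.ft("idx:bikes_vss").info in redis-py.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = get_info()
        # The value may arrive as a string such as "1", so normalize it
        if float(info.get("percent_indexed", 0)) >= 1.0:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not finish building in time")
        time.sleep(interval)

# Usage against a live server (assumes `client` from the script above):
# info = wait_for_index(client.ft("idx:bikes_vss").info)
```

Polling with a deadline keeps the script from hanging forever if indexing stalls. After it returns, `num_docs` and `hash_indexing_failures` can be read from the returned dict as shown above.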
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
"""
Creates a query table.
"""
results_list = []
for i, encoded_query in enumerate(encoded_queries):
result_docs = (
client.ft("idx:bikes_vss")
.search(
query,
{"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
| (extra_params if extra_params else {}),
)
.docs
)
for doc in result_docs:
vector_score = round(1 - float(doc.vector_score), 2)
results_list.append(
{
"query": queries[i],
"score": vector_score,
"id": doc.id,
"brand": doc.brand,
"model": doc.model,
"description": doc.description,
}
)
# Optional: convert the table to Markdown using Pandas
queries_table = pd.DataFrame(results_list)
queries_table.sort_values(
by=["query", "score"], ascending=[True, False], inplace=True
)
queries_table["query"] = queries_table.groupby("query")["query"].transform(
lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
)
queries_table["description"] = queries_table["description"].apply(
lambda x: (x[:497] + "...") if len(x) > 500 else x
)
return queries_table.to_markdown(index=False)
query = (
Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
hybrid_query = (
Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
range_query = (
Query(
"@vector:[VECTOR_RANGE $range $query_vector]=>"
"{$YIELD_DISTANCE_AS: vector_score}"
)
.sort_by("vector_score")
.return_fields("vector_score", "id", "brand", "model", "description")
.paging(0, 4)
.dialect(2)
)
table = create_query_table(
range_query, queries[:1],
encoded_queries[:1],
{"range": 0.55}
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
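Perform vector searches
This quick start guide focuses on vector search. You can learn more about querying based on document metadata in the document database quick start guide.
1. Embed your queries
The following code snippet shows the list of text queries that you will use to perform vector searches in Redis: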
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
```python
queries = [
    "Bike for small kids",
    "Best Mountain bikes for kids",
    "Cheap Mountain bike for kids",
    "Female specific mountain bike",
    "Road bike for beginners",
    "Commuter bike for people over 60",
    "Comfortable commuter bike",
    "Good bike for college students",
    "Mountain bike for beginners",
    "Vintage bike",
    "Comfortable city bike",
]
```
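First, encode each input query as a vector embedding. Use the same SentenceTransformers model that you used for the bike descriptions, so that the query vectors and the stored vectors share the same vector space and dimension (768):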
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
# "model": "Summit",
# "brand": "nHill",
# "price": 1200,
# "type": "Mountain Bike",
# "specs": {
# "material": "alloy",
# "weight": "11.3"
# },
# "description": "This budget mountain bike from nHill performs well..."
# "description_embeddings": [
# -0.538114607334137,
# -0.49465855956077576,
# -0.025176964700222015,
# ...
# ]
# }
schema = (
TextField("$.model", no_stem=True, as_name="model"),
TextField("$.brand", no_stem=True, as_name="brand"),
NumericField("$.price", as_name="price"),
TagField("$.type", as_name="type"),
TextField("$.description", as_name="description"),
VectorField(
"$.description_embeddings",
"FLAT",
{
"TYPE": "FLOAT32",
"DIM": VECTOR_DIMENSION,
"DISTANCE_METRIC": "COSINE",
},
as_name="vector",
),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:008',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Soothe Electric bike',
# 'price': '1950'
# },
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
"id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
# Document {
# 'id': 'bikes:009',
# 'payload': None,
# 'brand': 'Peaknetic',
# 'model': 'Secto',
# 'price': '430'
# }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
```python
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
```
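2. K-nearest neighbors (KNN) search
The KNN algorithm calculates the distance between the query vector and each vector in Redis, based on the chosen distance function. It then returns the top K items with the smallest distances to the query vector. These are the most semantically similar items.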
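The NumPy sketch below illustrates the idea behind an exact (brute-force) KNN search, like the one a `FLAT` index performs. It computes the cosine distance from the query to every stored vector and keeps the K smallest. It works under simplifying assumptions: all vectors sit in one in-memory array, and the function name `knn_cosine` is hypothetical. This is an illustration of the technique, not how Redis implements the search internally.

```python
import numpy as np

def knn_cosine(query, vectors, k=3):
    """Return (indices, distances) of the k vectors closest to `query`."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Cosine similarity between the query and every row, then convert to distance
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    distances = 1.0 - sims
    # Sort by ascending distance and keep the first k
    order = np.argsort(distances)[:k]
    return order.tolist(), distances[order].tolist()

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
idx, dist = knn_cosine([1.0, 0.1], docs, k=2)
print(idx)  # [0, 2]
```

Because every stored vector is compared with the query, the result is exact, but the cost grows linearly with the number of documents.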
Now, construct a query that does this:
```python
query = (
    Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]')
    .sort_by('vector_score')
    .return_fields('vector_score', 'id', 'brand', 'model', 'description')
    .dialect(2)
)
```
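Let's break down the query template above:
- The filter expression `(*)` means "all". In other words, no filtering is applied. You can replace it with an expression that filters by other metadata.
- The `KNN` part of the query searches for the top 3 nearest neighbors.
- The query vector must be passed in as the parameter `query_vector`.
- The distance to the query vector is returned as `vector_score`.
- The results are sorted by this `vector_score`.
- Finally, the query returns the `vector_score`, `id`, `brand`, `model`, and `description` fields for each result.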
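The filter expression is just the parenthesized part before `=>`, so one way to reuse the template is to build the query string from a filter. The helper below is a hypothetical sketch; its name and parameters are not part of redis-py. It only produces the string, which you would then pass to `Query(...)` with `.dialect(2)` as above. For example, `filter_expr="@brand:Peaknetic"` restricts the KNN search to Peaknetic bikes.

```python
def build_knn_query(k=3, filter_expr="*", vector_field="vector",
                    param_name="query_vector", score_alias="vector_score"):
    """Build a string of the form "(filter)=>[KNN k @field $param AS alias]"."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return f"({filter_expr})=>[KNN {k} @{vector_field} ${param_name} AS {score_alias}]"

print(build_knn_query())
# (*)=>[KNN 3 @vector $query_vector AS vector_score]
print(build_knn_query(filter_expr="@brand:Peaknetic @price:[0 1000]"))
# (@brand:Peaknetic @price:[0 1000])=>[KNN 3 @vector $query_vector AS vector_score]
```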
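To use a vector query with the `FT.SEARCH` command, you must specify `DIALECT 2` or greater (`.dialect(2)` in redis-py). You must pass the vectorized query as a byte array with the parameter name `query_vector`. The following code creates a NumPy array from the query vector and converts it into a compact, byte-level representation that can be passed to the query as a parameter: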
```python
client.ft('idx:bikes_vss').search(
    query,
    {
        'query_vector': np.array(encoded_query, dtype=np.float32).tobytes()
    }
).docs
```
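To see what `tobytes()` produces, the sketch below (an illustration, not from the original guide) converts a 768-dimensional vector, matching the index's `DIM`, to a FLOAT32 byte string and back. Each `float32` component takes 4 bytes, so the blob is 768 × 4 = 3072 bytes, and `np.frombuffer` recovers the values. The random vector is an assumed stand-in for a real query embedding.

```python
import numpy as np

DIM = 768  # must match the DIM declared in the index schema

# A stand-in for an encoded query (real code would use embedder.encode)
rng = np.random.default_rng(seed=0)
encoded_query = rng.standard_normal(DIM)

# Cast to float32 to match TYPE FLOAT32, then serialize to raw bytes
blob = np.array(encoded_query, dtype=np.float32).tobytes()
print(len(blob))  # 3072

# Round-trip: interpret the bytes as float32 again
restored = np.frombuffer(blob, dtype=np.float32)
print(np.allclose(restored, encoded_query.astype(np.float32)))  # True
```

The cast to `float32` matters. NumPy's default `float64` would produce 8 bytes per component, and the blob would no longer match the `TYPE FLOAT32` declared in the index.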
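With the query template in place, you can execute all queries in a loop. Notice that the script calculates the `vector_score` for each result as `1 - doc.vector_score`. Because cosine distance is the metric, items with the smallest distance are closer and therefore more similar to the query. Subtracting the distance from 1 turns it into a cosine similarity, so a higher score means a better match.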
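The sketch below illustrates that conversion. It takes hypothetical distance values, as Redis would return them in `vector_score`, and converts each one with `1 - distance`. It then orders the results so that the most similar item comes first. The document IDs and distances are invented for the example and are not real results.

```python
def distances_to_scores(results):
    """Convert (id, cosine_distance) pairs to (id, similarity), best match first."""
    scored = [(doc_id, round(1 - float(distance), 2)) for doc_id, distance in results]
    # Higher similarity means closer to the query, so sort in descending order
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical values; redis-py returns vector_score as a string
sample = [("doc:b", "0.61"), ("doc:a", "0.12"), ("doc:c", "0.35")]
print(distances_to_scores(sample))
# [('doc:a', 0.88), ('doc:c', 0.65), ('doc:b', 0.39)]
```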
Then, loop over the matched documents and build a list of results, which you can convert into a Pandas table to visualize the results:
"""
Code samples for vector database quickstart pages:
https://redis.ac.cn/docs/latest/develop/get-started/vector-database/
"""
import json
import time
import numpy as np
import pandas as pd
import requests
import redis
from redis.commands.search.field import (
NumericField,
TagField,
TextField,
VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
URL = ("https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started"
"/main/data/bikes.json"
)
response = requests.get(URL, timeout=10)
bikes = response.json()
json.dumps(bikes[0], indent=2)
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
res = client.ping()
# >>> True
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
redis_key = f"bikes:{i:03}"
pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
res = client.json().get("bikes:010")
# >>>
# {
#   "model": "Summit",
#   "brand": "nHill",
#   "price": 1200,
#   "type": "Mountain Bike",
#   "specs": {
#     "material": "alloy",
#     "weight": "11.3"
#   },
#   "description": "This budget mountain bike from nHill performs well...",
#   "description_embeddings": [
#     -0.538114607334137,
#     -0.49465855956077576,
#     -0.025176964700222015,
#     ...
#   ]
# }
schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
# >>> 'OK'
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
#   Document {
#     'id': 'bikes:008',
#     'payload': None,
#     'brand': 'Peaknetic',
#     'model': 'Soothe Electric bike',
#     'price': '1950', 'description_embeddings': ...
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
#   Document {
#     'id': 'bikes:008',
#     'payload': None,
#     'brand': 'Peaknetic',
#     'model': 'Soothe Electric bike',
#     'price': '1950'
#   },
#   Document {
#     'id': 'bikes:009',
#     'payload': None,
#     'brand': 'Peaknetic',
#     'model': 'Secto',
#     'price': '430'
#   }
# ]
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
    "id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [
#   Document {
#     'id': 'bikes:009',
#     'payload': None,
#     'brand': 'Peaknetic',
#     'model': 'Secto',
#     'price': '430'
#   }
# ]
queries = [
"Bike for small kids",
"Best Mountain bikes for kids",
"Cheap Mountain bike for kids",
"Female specific mountain bike",
"Road bike for beginners",
"Commuter bike for people over 60",
"Comfortable commuter bike",
"Good bike for college students",
"Mountain bike for beginners",
"Vintage bike",
"Comfortable city bike",
]
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
def create_query_table(query, queries, encoded_queries, extra_params=None):
    """
    Run `query` once for each encoded query vector and return the
    combined results as a Markdown table.
    """
    results_list = []
    for i, encoded_query in enumerate(encoded_queries):
        result_docs = (
            client.ft("idx:bikes_vss")
            .search(
                query,
                {"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()}
                | (extra_params if extra_params else {}),
            )
            .docs
        )
        for doc in result_docs:
            # vector_score is a cosine distance; 1 - distance gives the similarity score
            vector_score = round(1 - float(doc.vector_score), 2)
            results_list.append(
                {
                    "query": queries[i],
                    "score": vector_score,
                    "id": doc.id,
                    "brand": doc.brand,
                    "model": doc.model,
                    "description": doc.description,
                }
            )
    # Optional: convert the table to Markdown using Pandas
    queries_table = pd.DataFrame(results_list)
    queries_table.sort_values(
        by=["query", "score"], ascending=[True, False], inplace=True
    )
    # Show each query text only on its first row
    queries_table["query"] = queries_table.groupby("query")["query"].transform(
        lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
    )
    # Truncate long descriptions to 500 characters
    queries_table["description"] = queries_table["description"].apply(
        lambda x: (x[:497] + "...") if len(x) > 500 else x
    )
    return queries_table.to_markdown(index=False)
# KNN query over all documents (*): the 3 nearest neighbours of $query_vector
query = (
    Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)
table = create_query_table(query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003...
# Hybrid query: only documents matching @brand:Peaknetic are KNN candidates
hybrid_query = (
    Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)
table = create_query_table(hybrid_query, queries, encoded_queries)
print(table)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008...
# Range query: documents within cosine distance $range of $query_vector (at most 4 returned)
range_query = (
    Query(
        "@vector:[VECTOR_RANGE $range $query_vector]=>"
        "{$YIELD_DISTANCE_AS: vector_score}"
    )
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .paging(0, 4)
    .dialect(2)
)
table = create_query_table(
    range_query,
    queries[:1],
    encoded_queries[:1],
    {"range": 0.55},
)
print(table)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |...
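Each call in `create_query_table` sends the query embedding as raw bytes (`np.array(encoded_query, dtype=np.float32).tobytes()`). The documents themselves store their embeddings as JSON arrays. The blob should be exactly 4 * DIM bytes long to match the `"TYPE": "FLOAT32"` and `"DIM": VECTOR_DIMENSION` settings in the schema.

The following is a minimal sketch, assuming a little-endian host. On such a host, NumPy's `tobytes()` on a float32 array yields the same layout. It shows how the standard-library `struct` module can build and decode these blobs. `to_float32_blob` and `from_float32_blob` are hypothetical helper names and are not part of redis-py or NumPy.

```python
import struct

def to_float32_blob(vector):
    """Pack floats as little-endian FLOAT32 bytes, 4 bytes per component."""
    return struct.pack(f"<{len(vector)}f", *vector)

def from_float32_blob(blob):
    """Unpack little-endian FLOAT32 bytes back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

# Values chosen to be exactly representable in FLOAT32, so the round trip is lossless.
embedding = [0.25, -0.5, 1.0]
blob = to_float32_blob(embedding)
print(len(blob))                # 12 (3 components * 4 bytes)
print(from_float32_blob(blob))  # [0.25, -0.5, 1.0]

# A 768-dimensional embedding, like the ones produced above, needs 768 * 4 bytes.
print(len(to_float32_blob([0.0] * 768)))  # 3072
```

Values that are not exactly representable in FLOAT32, such as 0.1, come back slightly different after the round trip. That is expected, because the index stores 32-bit floats.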
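The query results show the top three matches for each query (our K parameter), along with the id, brand, and model of each matching bike.
For example, the query "Best Mountain bikes for kids" got its highest similarity score (0.54, computed as 1 minus the cosine distance returned by Redis) for the 'Nord' brand 'Chook air 5' bike model. That bike is therefore the closest match, and it is described as:
The Chook Air 5 gives kids aged six and older a durable, ultralight mountain bike for their first experience on tracks and for easy cruising through forests and fields. The low top tube makes it easy to get on and off in any situation, giving your kids greater safety on the trails. The Chook Air 5 is the perfect introduction to mountain biking.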
From the description, this bike is an excellent match for younger children, and the embeddings accurately captured the semantics of the description.
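The score column is computed in `create_query_table` as `1 - vector_score`. With `"DISTANCE_METRIC": "COSINE"`, Redis returns the cosine distance, which is 1 minus the cosine similarity. Subtracting the distance from 1 therefore recovers the similarity, where higher means closer.

Here is a minimal standard-library sketch of that arithmetic. It uses made-up two-dimensional vectors in place of the real 768-dimensional embeddings, and `cosine_distance` and `similarity_score` are illustrative names only.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity_score(distance):
    """Same conversion as create_query_table: 1 - distance, rounded to 2 places."""
    return round(1 - distance, 2)

query_vec = [1.0, 0.0]
doc_vec = [1.0, 1.0]  # 45 degrees away from query_vec
distance = cosine_distance(query_vec, doc_vec)
print(round(distance, 4))          # 0.2929
print(similarity_score(distance))  # 0.71

# A score of 0.54 corresponds to a cosine distance of about 0.46.
print(similarity_score(0.46))      # 0.54
```

Cosine distance depends only on the angle between vectors, not their length. This is why a short query and a long bike description can still score as close matches.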
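| query | score | id | brand | model | description |
|---|---|---|---|---|---|
| Best Mountain bikes for kids | 0.54 | bikes:003 | Nord | Chook air 5 | The Chook Air 5 gives kids aged six and older a durable, ultralight mountain bike for their first experience on tracks and for easy cruising through forests and fields. The low top tube makes it easy to get on and off in any situation, giving your kids greater safety on the trails. The Chook Air 5 is the perfect introduction to mountain biking. |
| | 0.51 | bikes:010 | nHill | Summit | This budget mountain bike from nHill performs well on both bike paths and mountain trails. The fork with 100mm of travel absorbs rough terrain. Fat Kenda Booster tires provide grip in corners and on wet trails. The Shimano Tourney drivetrain offers enough gears to find a comfortable pace for riding uphill, and the Tektro hydraulic disc brakes stop smoothly. Whether you want an affordable bike you can ride to work and also take out on weekends, or you simply want a stable bike,... |
| | 0.46 | bikes:001 | Velorim | Jigger | Small but powerful, the Jigger is the best ride for the smallest kids! The smallest kids' pedal bike on the market, with no coaster brake, the Jigger is the vehicle of choice for the rare, determined little rider who is eager to get going. We say rare because this cool little bike is not suited to nervous little riders on their first ride, but it is a real speed machine for true speed lovers. The Jigger is a 12-inch lightweight kids' bike that will satisfy your child's need for speed. It is a single-speed bike,... |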
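The index uses the `FLAT` algorithm, so a KNN query compares the query vector against every indexed vector and keeps the K closest. The three query types differ as follows:

- **KNN:** returns the K closest documents.
- **Hybrid:** the `(@brand:Peaknetic)` query semantically returns the K nearest neighbours among only the documents that match the filter.
- **Range:** `VECTOR_RANGE` keeps every document within a distance radius instead of a fixed count.

The sketch below reproduces those three behaviours in plain Python over a hypothetical three-document dataset. It illustrates the semantics only, not how Redis implements them internally. `knn`, `vector_range`, and `is_peaknetic` are made-up names.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, the distance returned by the COSINE metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn(query_vec, docs, k, predicate=None):
    """Exhaustive (FLAT-style) top-k search, optionally restricted by a metadata filter."""
    scored = [
        (cosine_distance(query_vec, vec), doc_id)
        for doc_id, (vec, meta) in docs.items()
        if predicate is None or predicate(meta)
    ]
    scored.sort()  # ascending distance, like .sort_by("vector_score")
    return scored[:k]

def vector_range(query_vec, docs, radius):
    """Every document whose distance is at most `radius`, closest first."""
    scored = [(cosine_distance(query_vec, vec), doc_id) for doc_id, (vec, _) in docs.items()]
    return sorted(hit for hit in scored if hit[0] <= radius)

def is_peaknetic(meta):
    """Stand-in for the @brand:Peaknetic filter."""
    return meta["brand"] == "Peaknetic"

# Hypothetical 2-D "embeddings" standing in for the 768-D description vectors.
docs = {
    "doc:a": ([1.0, 0.0], {"brand": "Nord"}),
    "doc:b": ([1.0, 1.0], {"brand": "Peaknetic"}),
    "doc:c": ([0.0, 1.0], {"brand": "Peaknetic"}),
}
query_vec = [1.0, 0.0]

print([doc_id for _, doc_id in knn(query_vec, docs, k=2)])
# ['doc:a', 'doc:b']
print([doc_id for _, doc_id in knn(query_vec, docs, k=2, predicate=is_peaknetic)])
# ['doc:b', 'doc:c']
print([doc_id for _, doc_id in vector_range(query_vec, docs, radius=0.5)])
# ['doc:a', 'doc:b']
```

In the guide's range query, `{"range": 0.55}` plays the role of `radius`. It is a cosine-distance threshold, so only bikes with a similarity score of at least 0.45 can be returned. This is consistent with the 0.52 score shown for `bikes:001`.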
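Next steps
- You can learn more about the query options, such as filters and vector range queries, by reading the vector reference documentation.
- The complete search and query documentation might also interest you.
- If you want to follow the code examples more interactively, you can use the Jupyter notebook that inspired this quickstart guide.
- If you want to see more advanced examples of Redis as a vector database in action, visit the Redis AI Resources page on GitHub.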