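JSON and hash storage

Store JSON and hashes with RedisVL

Out of the box, Redis provides a variety of data structures that can adapt to your domain-specific applications and use cases. In this document, you will learn how to use RedisVL with both hash and JSON data.

Note

This document is a converted form of this Jupyter notebook.

Before beginning, be sure of the following:

You have installed RedisVL and activated that environment.
You have a running Redis instance with the search and query capability enabled.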

# import necessary modules
import pickle

from redisvl.redis.utils import buffer_to_array
from jupyterutils import result_print, table_print
from redisvl.index import SearchIndex

# load the example data (the file handle is closed automatically)
with open("hybrid_example_data.pkl", "rb") as f:
    data = pickle.load(f)

table_print(data)

user	age	job	credit_score	office_location	user_embedding
john	18	engineer	high	-122.4194,37.7749	b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'
derrick	14	doctor	low	-122.4194,37.7749	b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'
nancy	94	doctor	high	-122.4194,37.7749	b'333?\xcd\xcc\xcc=\x00\x00\x00?'
tyler	100	engineer	high	-122.0839,37.3861	b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?'
tim	12	dermatologist	high	-122.0839,37.3861	b'\xcd\xcc\xcc>\xcd\xcc\xcc>\x00\x00\x00?'
taimur	15	CEO	low	-122.0839,37.3861	b'\x9a\x99\x19?\xcd\xcc\xcc=\x00\x00\x00?'
joe	35	dentist	medium	-122.0839,37.3861	b'fff?fff?\xcd\xcc\xcc='

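Hash or JSON: how to choose?

Both storage options offer a variety of features and tradeoffs. Below, you will work through a dummy dataset to learn when and how to use each data type.

Working with hashes

Hashes in Redis are simple collections of field-value pairs. Think of a hash as a mutable, single-level dictionary that holds multiple "rows".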

{
    "model": "Deimos",
    "brand": "Ergonom",
    "type": "Enduro bikes",
    "price": 4972,
}

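Hashes are best suited for use cases with the following characteristics:

Performance (speed) and storage space (memory consumption) are top concerns.
Data can be easily normalized and modeled as a single-level dictionary.

Hashes are typically the default recommendation.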
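To illustrate the second characteristic, nested application data can often be flattened into a single-level dictionary before it is stored as a hash. The sketch below is only an illustration: `flatten` is a hypothetical helper (not part of Redis or RedisVL), the underscore separator is an assumption, and the nested record is made-up example data.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level (hypothetical helper, not a RedisVL API)."""
    flat = {}
    for key, value in record.items():
        # join the parent and child names, e.g. "metadata" + "model" -> "metadata_model"
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# made-up nested record used only for this example
nested = {
    "name": "bike",
    "metadata": {"model": "Deimos", "brand": "Ergonom", "price": 4972},
}

flat = flatten(nested)
print(flat)
```

If the data cannot be flattened this way without losing meaning (for example, lists of nested objects), that is usually a hint that JSON storage is the better fit.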

# define the hash index schema
hash_schema = {
    "index": {
        "name": "user-hash",
        "prefix": "user-hash-docs",
        "storage_type": "hash", # default setting -- HASH
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "office_location", "type": "geo"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ],
}

# construct a search index from the hash schema
hindex = SearchIndex.from_dict(hash_schema)

# connect to local redis instance
hindex.connect("redis://localhost:6379")

# create the index (no data yet)
hindex.create(overwrite=True)

# show the underlying storage type
hindex.storage_type

    <StorageType.HASH: 'hash'>

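Vectors as byte strings

One nuance when working with hashes in Redis is that all vectorized data must be passed as a byte string (for efficient storage, indexing, and processing). An example is shown below.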

# show a single entry from the data that will be loaded
data[0]

    {'user': 'john',
     'age': 18,
     'job': 'engineer',
     'credit_score': 'high',
     'office_location': '-122.4194,37.7749',
     'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'}
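The byte string in the record above is the raw little-endian float32 encoding of the three vector components. A minimal sketch of the round trip with NumPy follows; using NumPy here is an assumption, and any method that produces the same 12 bytes would work equally well.

```python
import numpy as np

vector = [0.1, 0.1, 0.5]

# encode: three float32 values become 12 bytes (explicit little-endian dtype)
as_bytes = np.array(vector, dtype="<f4").tobytes()
print(as_bytes)

# decode: read the same 12 bytes back into a list of Python floats
decoded = np.frombuffer(as_bytes, dtype="<f4").tolist()
print(decoded)
```

The decoded values are float32 approximations, so 0.1 comes back as roughly 0.10000000149 rather than exactly 0.1.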

# load hash data
keys = hindex.load(data)

$ rvl stats -i user-hash

    Statistics:
    ╭─────────────────────────────┬─────────────╮
    │ Stat Key                    │ Value       │
    ├─────────────────────────────┼─────────────┤
    │ num_docs                    │ 7           │
    │ num_terms                   │ 6           │
    │ max_doc_id                  │ 7           │
    │ num_records                 │ 44          │
    │ percent_indexed             │ 1           │
    │ hash_indexing_failures      │ 0           │
    │ number_of_uses              │ 1           │
    │ bytes_per_record_avg        │ 3.40909     │
    │ doc_table_size_mb           │ 0.000767708 │
    │ inverted_sz_mb              │ 0.000143051 │
    │ key_table_size_mb           │ 0.000248909 │
    │ offset_bits_per_record_avg  │ 8           │
    │ offset_vectors_sz_mb        │ 8.58307e-06 │
    │ offsets_per_term_avg        │ 0.204545    │
    │ records_per_doc_avg         │ 6.28571     │
    │ sortable_values_size_mb     │ 0           │
    │ total_indexing_time         │ 0.587       │
    │ total_inverted_index_blocks │ 18          │
    │ vector_index_sz_mb          │ 0.0202332   │
    ╰─────────────────────────────┴─────────────╯

Performing queries

Once the index is created and the data is loaded in the correct format, you can run queries against the index.

from redisvl.query import VectorQuery
from redisvl.query.filter import Tag, Text, Num

t = (Tag("credit_score") == "high") & (Text("job") % "enginee*") & (Num("age") > 17)

v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job", "office_location"],
                filter_expression=t)


results = hindex.query(v)
result_print(results)

vector_distance	user	credit_score	age	job	office_location
0	john	high	18	engineer	-122.4194,37.7749
0.109129190445	tyler	high	100	engineer	-122.0839,37.3861
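To make this result easier to interpret, the sketch below reproduces it without Redis: a plain-Python predicate that approximates the filter expression, followed by a NumPy computation of cosine distance. The `rows` list is a hand-copied version of the sample data with the embeddings decoded to floats, and `matches_filter` and `cosine_distance` are hypothetical helpers for illustration, not RedisVL APIs. Real full-text prefix matching operates on tokens, so `startswith` is only equivalent here because every job is a single word.

```python
import numpy as np

# sample data with embeddings decoded from float32 bytes (hand-copied for this sketch)
rows = [
    {"user": "john", "age": 18, "job": "engineer", "credit_score": "high", "user_embedding": [0.1, 0.1, 0.5]},
    {"user": "derrick", "age": 14, "job": "doctor", "credit_score": "low", "user_embedding": [0.1, 0.1, 0.5]},
    {"user": "nancy", "age": 94, "job": "doctor", "credit_score": "high", "user_embedding": [0.7, 0.1, 0.5]},
    {"user": "tyler", "age": 100, "job": "engineer", "credit_score": "high", "user_embedding": [0.1, 0.4, 0.5]},
    {"user": "tim", "age": 12, "job": "dermatologist", "credit_score": "high", "user_embedding": [0.4, 0.4, 0.5]},
    {"user": "taimur", "age": 15, "job": "CEO", "credit_score": "low", "user_embedding": [0.6, 0.1, 0.5]},
    {"user": "joe", "age": 35, "job": "dentist", "credit_score": "medium", "user_embedding": [0.9, 0.9, 0.1]},
]


def matches_filter(row):
    """Approximate: credit_score == high AND job matches enginee* AND age > 17."""
    return (
        row["credit_score"] == "high"
        and row["job"].startswith("enginee")
        and row["age"] > 17
    )


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


query_vector = [0.1, 0.1, 0.5]

# keep only rows that pass the filter, then rank them by distance to the query vector
ranked = sorted(
    (cosine_distance(query_vector, row["user_embedding"]), row["user"])
    for row in rows
    if matches_filter(row)
)

for distance, user in ranked:
    print(user, distance)
```

Only john and tyler pass all three conditions, and john's embedding is identical to the query vector, which is why his distance is (up to floating-point rounding) zero.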

# clean up
hindex.delete()

Working with JSON

Redis also supports native JSON objects. These can be multi-level (nested) objects, with full JSONPath support for retrieving and updating sub-elements.

{
    "name": "bike",
    "metadata": {
        "model": "Deimos",
        "brand": "Ergonom",
        "type": "Enduro bikes",
        "price": 4972
    }
}

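JSON is best suited for use cases with the following characteristics:

Ease of use and data model flexibility are top concerns.
Application data is already native JSON.
You are replacing another document storage or database solution.

Full JSONPath support

Because Redis supports full JSONPath, an index schema for JSON must index and select elements by their path: each field specifies the desired name and a path that points to where the data is located within the object.

Note

By default, if no path is provided in a JSON field schema, RedisVL assumes the path is $.{name}.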
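As a sketch of that rule, the snippet below defines two JSON field entries, one relying on the default path and one with an explicit `path` into a nested object. It assumes field entries accept a `path` key as the text describes; the nested `$.metadata.credit_score` layout and the `effective_path` helper are hypothetical and only mirror the behavior stated in the note, they are not RedisVL code.

```python
# two field entries in the style of a JSON index schema
fields = [
    # no path given: the default described in the note applies
    {"name": "user", "type": "tag"},
    # explicit path into a nested object (hypothetical document layout)
    {"name": "credit_score", "type": "tag", "path": "$.metadata.credit_score"},
]


def effective_path(field):
    """Return the explicit path, or fall back to $.{name} (illustrative helper)."""
    return field.get("path") or f"$.{field['name']}"


for field in fields:
    print(field["name"], "->", effective_path(field))
```

The schema used in this document keeps every field at the top level of the JSON object, so none of its fields need an explicit path.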

# define the json index schema
json_schema = {
    "index": {
        "name": "user-json",
        "prefix": "user-json-docs",
        "storage_type": "json", # JSON storage type
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "office_location", "type": "geo"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ],
}

# construct a search index from the JSON schema
jindex = SearchIndex.from_dict(json_schema)

# connect to a local redis instance
jindex.connect("redis://localhost:6379")

# create the index (no data yet)
jindex.create(overwrite=True)

# list the indices in the database (the hash index was deleted above, so only user-json remains)
$ rvl index listall

    20:23:08 [RedisVL] INFO   Indices:
    20:23:08 [RedisVL] INFO   1. user-json

Vectors as float arrays

Vectorized data stored in JSON must be stored as a pure array (for example, a Python list) of floats. Modify the sample data to account for this below:

import numpy as np

# copy each record so the original hash-formatted data is not mutated
json_data = [dict(d) for d in data]

for d in json_data:
    d['user_embedding'] = buffer_to_array(d['user_embedding'], dtype=np.float32)

# inspect a single JSON record
json_data[0]

    {'user': 'john',
     'age': 18,
     'job': 'engineer',
     'credit_score': 'high',
     'office_location': '-122.4194,37.7749',
     'user_embedding': [0.10000000149011612, 0.10000000149011612, 0.5]}

keys = jindex.load(json_data)

# we can now run the exact same query as above
result_print(jindex.query(v))

vector_distance	user	credit_score	age	job	office_location
0	john	high	18	engineer	-122.4194,37.7749
0.109129190445	tyler	high	100	engineer	-122.0839,37.3861

Cleanup

jindex.delete()
