Create dictionaries in Aiven for ClickHouse®
Create dictionaries in Aiven for ClickHouse® to accelerate queries for better efficiency and performance.
Dictionaries in Aiven for ClickHouse
A dictionary is a key-attribute mapping useful for low latency lookup queries, when often looking up attributes for a particular key. Dictionary data resides fully in memory, which is why using a dictionary in JOINs is often much faster than using a MergeTree table. Dictionaries can be an efficient replacement for regular tables in your JOIN clauses.
Aiven for ClickHouse supports backup and restore for dictionaries. Also, dictionaries in Aiven for ClickHouse are automatically replicated to all service nodes.
Read more on dictionaries in the upstream ClickHouse documentation.
Prerequisites
- Aiven for ClickHouse service created
- SQL client installed
- Dictionary source available
Limitations
-
Only TLS connections supported
-
If no host is specified in a dictionary with a ClickHouse source, the local host is assumed, and the dictionary is filled with data from a query against the local ClickHouse, for example:
-- users table
CREATE TABLE default.users
(
id UInt64,
username String,
email String,
country String
)
ENGINE = MergeTree()
ORDER BY id;
CREATE DICTIONARY default.users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(100);In Aiven for ClickHouse, to fill the dictionary the table users are queried with the permissions of the
avnadmin
user even if another user creates the dictionary. In upstream ClickHouse, the same is true except thedefault
user is used.
Supported layouts
Aiven for ClickHouse supports the same
layouts that the upstream ClickHouse supports
with two exceptions,ssd_cache
and complex_key_ssd_cache
, which are not supported.
Supported sources
- HTTP(s)
- Upstream/remote ClickHouse
- Aiven for ClickHouse
Create a dictionary
To create a dictionary with specified structure (attributes), source, layout, and lifetime, use the following syntax:
CREATE [OR REPLACE] DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
key1 type1 [DEFAULT|EXPRESSION expr1] [IS_OBJECT_ID],
key2 type2 [DEFAULT|EXPRESSION expr2],
attr1 type2 [DEFAULT|EXPRESSION expr3] [HIERARCHICAL|INJECTIVE],
attr2 type2 [DEFAULT|EXPRESSION expr4] [HIERARCHICAL|INJECTIVE]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME({MIN min_val MAX max_val | max_val})
SETTINGS(setting_name = setting_value, setting_name = setting_value, ...)
COMMENT 'Comment'
Examples
Speeding up JOIN
s
-
Create tables in your ClickHouse database:
CREATE TABLE users
(
id UInt64,
username String,
email String,
country String
)
ENGINE = MergeTree()
ORDER BY id;CREATE TABLE transactions
(
id UInt64,
user_id UInt64,
product_id UInt64,
quantity Float64,
price Float64
)
ENGINE = MergeTree()
ORDER BY id; -
Create a dictionary for the
users
table:CREATE DICTIONARY users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(100);You can do the same using the
QUERY
parameter:CREATE DICTIONARY users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(QUERY 'SELECT id, username, email, country FROM default.users'))
LAYOUT(FLAT())
LIFETIME(100);
JOIN
s are much faster as the data is pre-indexed in memory.
SELECT
t.id,
u.username,
t.product_id,
t.quantity,
t.price
FROM transactions AS t
ANY LEFT JOIN users_dictionary AS u
ON t.user_id = u.id;
Caching data from an external database or URL
Create a dictionary with HTTP
as a source:
CREATE DICTIONARY currency_rates
(
currency_code String,
rate Float64 DEFAULT 1.0
)
PRIMARY KEY currency_code
SOURCE(HTTP(URL 'https://example.com/currency_rates.csv' FORMAT CSV))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(100);