Skip to main content

Create dictionaries in Aiven for ClickHouse®

Create dictionaries in Aiven for ClickHouse® to accelerate queries for better efficiency and performance.

Dictionaries in Aiven for ClickHouse

A dictionary is a key-attribute mapping useful for low latency lookup queries, when often looking up attributes for a particular key. Dictionary data resides fully in memory, which is why using a dictionary in JOINs is often much faster than using a MergeTree table. Dictionaries can be an efficient replacement for regular tables in your JOIN clauses.

Aiven for ClickHouse supports backup and restore for dictionaries. Also, dictionaries in Aiven for ClickHouse are automatically replicated to all service nodes.

Read more on dictionaries in the upstream ClickHouse documentation.

Prerequisites

  • Aiven for ClickHouse service created
  • SQL client installed
  • Dictionary source available

Limitations

  • Only TLS connections supported

  • If no host is specified in a dictionary with a ClickHouse source, the local host is assumed, and the dictionary is filled with data from a query against the local ClickHouse, for example:

    -- users table
    CREATE TABLE default.users
    (
    id UInt64,
    username String,
    email String,
    country String
    )
    ENGINE = MergeTree()
    ORDER BY id;
    CREATE DICTIONARY default.users_dictionary
    (
    id UInt64,
    username String,
    email String,
    country String
    )
    PRIMARY KEY id
    SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
    LAYOUT(FLAT())
    LIFETIME(100);

    In Aiven for ClickHouse, to fill the dictionary the table users are queried with the permissions of the avnadmin user even if another user creates the dictionary. In upstream ClickHouse, the same is true except the default user is used.

Supported layouts

Aiven for ClickHouse supports the same layouts that the upstream ClickHouse supports with two exceptions,ssd_cache and complex_key_ssd_cache, which are not supported.

Supported sources

  • HTTP(s)
  • Upstream/remote ClickHouse
  • Aiven for ClickHouse

Create a dictionary

To create a dictionary with specified structure (attributes), source, layout, and lifetime, use the following syntax:

CREATE [OR REPLACE] DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
key1 type1 [DEFAULT|EXPRESSION expr1] [IS_OBJECT_ID],
key2 type2 [DEFAULT|EXPRESSION expr2],
attr1 type2 [DEFAULT|EXPRESSION expr3] [HIERARCHICAL|INJECTIVE],
attr2 type2 [DEFAULT|EXPRESSION expr4] [HIERARCHICAL|INJECTIVE]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME({MIN min_val MAX max_val | max_val})
SETTINGS(setting_name = setting_value, setting_name = setting_value, ...)
COMMENT 'Comment'

Examples

Speeding up JOINs

  1. Create tables in your ClickHouse database:

    CREATE TABLE users
    (
    id UInt64,
    username String,
    email String,
    country String
    )
    ENGINE = MergeTree()
    ORDER BY id;
    CREATE TABLE transactions
    (
    id UInt64,
    user_id UInt64,
    product_id UInt64,
    quantity Float64,
    price Float64
    )
    ENGINE = MergeTree()
    ORDER BY id;
  2. Create a dictionary for the users table:

    CREATE DICTIONARY users_dictionary
    (
    id UInt64,
    username String,
    email String,
    country String
    )
    PRIMARY KEY id
    SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
    LAYOUT(FLAT())
    LIFETIME(100);

    You can do the same using the QUERY parameter:

    CREATE DICTIONARY users_dictionary
    (
    id UInt64,
    username String,
    email String,
    country String
    )
    PRIMARY KEY id
    SOURCE(CLICKHOUSE(QUERY 'SELECT id, username, email, country FROM default.users'))
    LAYOUT(FLAT())
    LIFETIME(100);

JOINs are much faster as the data is pre-indexed in memory.

SELECT
t.id,
u.username,
t.product_id,
t.quantity,
t.price
FROM transactions AS t
ANY LEFT JOIN users_dictionary AS u
ON t.user_id = u.id;

Caching data from an external database or URL

Create a dictionary with HTTP as a source:

CREATE DICTIONARY currency_rates
(
currency_code String,
rate Float64 DEFAULT 1.0
)
PRIMARY KEY currency_code
SOURCE(HTTP(URL 'https://example.com/currency_rates.csv' FORMAT CSV))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(100);