Practice of ClickHouse user resource isolation in GrowingIO

Keywords: Database clickhouse RBAC

Business scenario

Multi tenancy of ClickHouse is a very necessary scenario. If there are no restrictions, users may consume too much resources of database services and affect the use of other users. This paper attempts to show you another relatively cheap method besides hardware level isolation.

This article first describes the resource restriction mechanism provided by ClickHouse, and then describes the practice of GrowingIO. Since user resource usage restriction is a kind of user management, this paper will also add the content of user management.

Hello World

Generally, ClickHouse is managed by systemd, and the tool rendering template is used to obtain the configuration file. Since our purpose is only demonstration, the relevant steps and configuration are simplified.

Create two files in the / etc/clickhouse Directory:

── conf
   ├── config.xml
   └── users.xml

Then run in any directory:

clickhouse server --config=/etc/clickhouse/conf/config.xml

Let ClickHouse run according to the configuration file, and ClickHouse listens to the configuration file. If there is any change, it can run according to the new configuration without restarting.

The server tracks changes in config files... and reloads the settings for users and clusters on the fly (https://clickhouse.com/docs/e...)

This is the configuration demo of the simple version of config.xml:

<?xml version="1.0"?>
<yandex>
    <logger>
        <level>trace</level>
        <log>/tmp/log/clickhouse-server.log</log>
        <errorlog>/tmp/log/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
    </logger>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>1000</flush_interval_milliseconds>
    </query_log>
    <tcp_port>9000</tcp_port>
    <listen_host>127.0.0.1</listen_host>
    <access_control_path>/tmp/ledzeppelin/</access_control_path>
    <max_concurrent_queries>500</max_concurrent_queries>
    <mark_cache_size>5368709120</mark_cache_size>
    <path>./clickhouse/</path>
    <users_config>users.xml</users_config>
</yandex>

Note < users_ config>users.xml</users_ Config > binds users.xml of the current directory of config.xml

<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <readonly>1</readonly>
        </default>
        <pA>
            <max_memory_usage>10G</max_memory_usage>
            <max_memory_usage_for_user>10G</max_memory_usage_for_user>
            <max_memory_usage_for_all_queries>10G</max_memory_usage_for_all_queries>
            <max_query_size>1073741824</max_query_size>
            <readonly>1</readonly>
        </pA>
    </profiles>
    <users>
        <zeppelin>
            <!-- <password_sha256_hex></password_sha256_hex> -->
            <password>password</password>
            <networks>
                <ip>::/0</ip>
            </networks>
            <profile>pA</profile>
            <quota>qA</quota>
        </zeppelin>
    </users>
    <quotas>
        <qA>
            <interval>
                <!-- In seconds -->
                <duration>10</duration>
                <!-- 10 You can only query once in seconds -->
                <queries>2</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </qA>
    </quotas>
</yandex>

You can initially see the reference relationship between user, profile and quotas. These two configuration files stipulate that the user zeppelin can only query twice in 10 seconds

Set the Profile pA as read-only: < readonly > 1 < / readonly >, the user zeppelin references pA, and an error will be reported when zeppelin executes a write statement:

Quotes and profiles

Quotes and Settings Profiles are two ways ClickHouse uses to control user resources.

From section 2, we can get a preliminary impression of these two restriction mechanisms. Their definitions in ClickHouse documents are:

Quotas allow you to limit resource usage over a period of time or track the use of resources.
quote
A settings profile is a collection of settings grouped under the same name.

Settings is also a limitation. The definition of Profile does not highlight the meaning of "restriction", but we can use it as a restriction.

The Profile demo in the figure above specifies the "number of rows you can read", max_rows_to_read. The question is whether this is the maximum number of rows you can read at one time or the total number of rows you can read when the user is alive. The answer is the maximum number of rows that the database management tool can query when executing a query.

A maximum number of rows that can be read from a table when running a query. (https://clickhouse.com/docs/e...)

It can be seen that the essential difference between Profile and quotes is that the setting of Profile is independent of time. For each query, take readonly, which is the easiest to understand, as an example "It does not aim at the user behavior of 10 seconds or 1 hour. It aims at each query. For each query, the restriction of Profile is real. Unlike Quota, there are 10 Quotas in this second, and there may be no Quotas in the next second, resulting in the query being discarded.

In order to further understand why we associate Quota with Interval, we need to think of QPS(Queries per second):

As shown in the figure, when the traffic is very rapid, it will exhaust all the quotas in the early stage of each Interval, resulting in no quota available in the rest of the time, resulting in disconnection, such as Interval a and B. when the traffic is relatively mild, its distribution in the whole time Interval will be more uniform, such as Interval C.

The integral of QPS and TIME is equal to Quota.

It can be seen that ClickHouse's Quota and token bucket current limiting methods have many similarities. The token bucket generates a certain Quota in each interval. When the Quota is exhausted, the new query will be discarded.

It can be seen from the source code that every time you query, you check whether (checkExceeded()) exceeds the quota

Each interval has multiple resource_types. For example, < query > 1 < / query > is a type. Check the maximum inventory Max and the quota used. If used > max, an error will be reported.

SQL driven access control

We recommend using SQL-driven workflow. Both of the configuration methods work simultaneously, so if you use the server configuration files for managing accounts and access rights, you can smoothly switch to SQL-driven workflow. (https://clickhouse.com/docs/e...)

ClickHouse recommends using SQL driven access control mode. Section 2 shows the control mode of "Server configuration files users.xml and config.xml", which is sufficient for uncomplicated scenarios. The mode of configuration file involves security issues, and there are inflexible problems when adding users, including managing static users and dynamic (temporary) Now we will verify the feasibility of SQL driven mode through a series of practical operations.

4.0 preliminary solution

  1. Create the super administrator ledzeppelin and user zeppelin through the default user. (if zeppelin is defined in users.xml, it cannot be changed, and an error is reported: Cannot update user zeppelin in users.xml because this storage is readonly).
  2. Disallow default.
  3. As a super administrator, dynamically configure zeppelin resources through SQL

As shown in the figure, ordinary user zeppelin does not have any permissions. It is passive and cannot create "ClickHouse access entities" "As a super administrator, ledzeppelin is active. It assembles access entities such as profile, quota and role. Finally, it assigns the role to ordinary users zeppelin and redefines the attributes of zeppelin to achieve the purpose of user management. In this practice, we also start from resource constraints and finally expand to more general user management.

4.1 create zeppelin user

Create users as recommended by ClickHouse

4.1.1 give the default user access_management permission

<user>
    <default>
        <!-- Other configurations -->
        <access_management>1</access_management>
    </default>
</user>

4.1.2 create super administrator ledzeppelin and ordinary user zeppelin with default user

CREATE USER ledzeppelin;
GRANT ALL ON *.* TO ledzeppelin WITH GRANT OPTION;
CREATE USER zeppelin;

In practice, passwords must be set, such as

CREATE USER MJ IDENTIFIED WITH sha256_password BY 'qwerty';

Sign in:

clickhouse-client -u MJ --password qwerty

For privacy reasons, you can set the password as an environment variable

clickhouse-client -u MJ --password ${CLICKHOUSE_MJ_PASSWORD}

4.1.3 turn off SQL access control permission of default user

For security reasons, we need to turn off the default permission to avoid creating another super administrator and other hidden dangers

First, exit the default login. Note that the default user's properties cannot be changed through SQL, otherwise an error will be reported:

REVOKE ALL ON *.* FROM default

Query id: 7ab4a62a-62d1-4d0b-bbfe-c27d0e595f65


0 rows in set. Elapsed: 0.002 sec. 

Received exception from server (version 21.10.2):
Code: 495. DB::Exception: Received from localhost:9000. DB::Exception: Cannot update user `default` in users.xml because this storage is readonly: Couldn't update user `default`. Successfully updated: none. (ACCESS_STORAGE_READONLY)

We can only change the default configuration by modifying the XML

By ClickHouse:

  1. Make the default user read-only and set a read-only Profile in users.xml
<profiles>
        <!-- Profile that allows only read queries. -->
        <readonly_profile>
            <readonly>2</readonly>
            <!-- Other configurations -->
        </readonly_profile>
    </profiles>
  1. Change the Profile of the default user:
<users>
    <profile> readonly_profile</profile>
    <!-- Other configurations -->
</users>
  1. And delete this line:

<access_management>1</access_management>

ClickHouse will automatically load the configuration, and now there is only one super administrator: ledzeppelin

4.2 dynamically manage resources through SQL

4.2.1 create a Profile

Take readonly(Permissions for Queries | ClickHouse Documentation) as an example because this attribute is easy to test. The database can be created at 0, but not at 1 and 2

CREATE SETTINGS PROFILE IF NOT EXISTS z_profile
SETTINGS
readonly = 2
READONLY

4.2.2 create a Quota

You can only query once in 10 seconds

CREATE QUOTA IF NOT EXISTS z_quota
FOR INTERVAL 10 second
  MAX queries 1

4.2.3 create Role

CREATE ROLE IF NOT EXISTS z_role

4.2.3.1 give Role permission

GRANT SELECT ON db.* TO z_role;

Allow Z_ When the role accesses the db database, it is noted that we do not grant permissions to a user, but to a role, which can be granted to other users. In this way, we establish an abstraction layer between permissions and users.

4.2.4 create a garbage collection Role

CREATE ROLE IF NOT EXISTS gc_role

It is used to bind unused profiles and quotas

4.2.5 Role binding profile and quota

ALTER SETTINGS PROFILE z_profile TO z_role
ALTER QUOTA z_quota TO z_role;

4.2.6 application to user zeppelin

Check which role s are bound to zeppelin

SELECT * FROM system.role_grants WHERE user_name LIKE 'zeppelin'

If so, assume old_role, untie it first

REVOKE old_role FROM zeppelin

Bind new role:

GRANT z_role TO zeppelin

4.2.7 revoke (important)

Our assumption is that profile and quota are bound to roles, and roles are bound to users. If you need to change the user's configuration, there are two methods

  1. Change a role
  2. Change the profile/quota of the current role

There is a problem with the second method, because after changing the profile, the user's configuration will not be refreshed directly (it may be a ClickHouse bug, which we are still verifying). At this time, you need to revoke the user's role and hang it up

For the sake of insurance, the following paradigm should be followed:

  1. revoke z_role from zeppelin
  2. Modify Z_ profile or quota under role
  3. grant z_role from zeppelin

Or:

  1. Modify Z_ profile or quota under role
  2. revoke z_role from zeppelin
  3. grant z_role from zeppelin

4.2.8 bind a new Quota

Change to query every 5 seconds

CREATE QUOTA IF NOT EXISTS z_quota_5
FOR INTERVAL 5 second
  MAX queries 1

Change the original Z_ Bind quota to gc_role

ALTER QUOTA z_quota TO gc_role;

Tie a new quota:

ALTER QUOTA z_quota_5 TO z_role;

Refresh:

revoke z_role from zeppelin;
grant z_role to zeppelin;

Check:

SELECT
    name,
    apply_to_list
FROM system.quotas
WHERE name LIKE 'z_quota_5'

Query id: dc5b2ece-c792-4585-95d8-8b4275631664

┌─name──────┬─apply_to_list─┐
│ z_quota_5 │ ['z_role']    │
└───────────┴───────────────┘

1 rows in set. Elapsed: 0.004 sec. 

4.2.9 binding a new Profile

CREATE SETTINGS PROFILE IF NOT EXISTS z_profile_permission_type_0
SETTINGS
    readonly = 0
READONLY

To change the binding object of the profile, first set Z_ Bind profile to gc_role

ALTER SETTINGS PROFILE z_profile TO gc_role

Add new z_profile_permission_type_0 tied to z_role

ALTER SETTINGS PROFILE z_profile_permission_type_0 TO z_role

Refresh configuration:

revoke z_role from zeppelin;
grant z_role to zeppelin;
SELECT
    name,
    apply_to_list
FROM system.settings_profiles

Query id: 51d1c2de-ced7-4558-a164-020a76a53d97

┌─name────────────────────────┬─apply_to_list─┐
│ readonly                    │ []            │
│ default                     │ []            │
│ z_profile                   │ ['gc_role']   │
│ z_profile_permission_type_0 │ ['z_role']    │
└─────────────────────────────┴───────────────┘

5.SQL driven settings persistence

Local storage / local persistence is the same as the path of the data directory. According to Server Settings | ClickHouse Documentation, the properties created through SQL are stored in access_control_path, the default value is / var/lib/clickhouse/access/

Path to a folder where a ClickHouse server stores user and role configurations created by SQL commands.

Setting location: config.xml

<yandex>
...
    <access_control_path>/var/lib/clickhouse/access/ledzeppelin/</access_control_path>
...
</yandex

View Directory:

use/access/ledzeppelin  
╰─➤  ls
094e5b76-1b4d-97a5-43cd-bbb3ccceb688.sql  9c492851-bf60-7753-7c14-c05f4be46b8e.sql
2949a246-848f-fc07-f1b9-a3f7246b31a0.sql  f340165d-0ec4-f90e-fb55-74c8d0cfd7ae.sql
394e47fd-b701-d017-6857-0d8744d830b1.sql  quotas.list
3d44c8a1-01b3-6f68-eff7-1e20a9080296.sql  roles.list
3f447b54-b515-aa93-bacc-1a649aeac19c.sql  row_policies.list
674912da-5593-e497-c5d8-d058e916c7a5.sql  settings_profiles.list
6c44ad5c-4354-c154-048e-8e293ff427ab.sql  users.list

In addition, note that after restarting ClickHouse, ClickHouse starts from access_control_path recovery configuration:

If these files are deleted, the ClickHouse access entities previously created through SQL will disappear after ClickHouse is restarted.

conclusion

This paper starts with ClickHouse user resource isolation and extends it to ClickHouse user management. We realize the business of resource isolation through quotes and Profile; In user management, we use roles and create a new abstraction layer. In this way, we do not need to directly operate permissions when managing users, but give roles to users as a component. Roles can also encapsulate resources. We can deploy these access entities through configuration files or SQL, which is advocated by ClickHouse. Finally, this article only involves the tip of the iceberg of ClickHouse user management. I hope it will be helpful to you.

reference resources

Posted by regprick on Fri, 05 Nov 2021 16:33:55 -0700