Business scenario
Multi tenancy of ClickHouse is a very necessary scenario. If there are no restrictions, users may consume too much resources of database services and affect the use of other users. This paper attempts to show you another relatively cheap method besides hardware level isolation.
This article first describes the resource restriction mechanism provided by ClickHouse, and then describes the practice of GrowingIO. Since user resource usage restriction is a kind of user management, this paper will also add the content of user management.
Hello World
Generally, ClickHouse is managed by systemd, and the tool rendering template is used to obtain the configuration file. Since our purpose is only demonstration, the relevant steps and configuration are simplified.
Create two files in the / etc/clickhouse Directory:
── conf ├── config.xml └── users.xml
Then run in any directory:
clickhouse server --config=/etc/clickhouse/conf/config.xml
Let ClickHouse run according to the configuration file, and ClickHouse listens to the configuration file. If there is any change, it can run according to the new configuration without restarting.
The server tracks changes in config files... and reloads the settings for users and clusters on the fly (https://clickhouse.com/docs/e...)
This is the configuration demo of the simple version of config.xml:
<?xml version="1.0"?> <yandex> <logger> <level>trace</level> <log>/tmp/log/clickhouse-server.log</log> <errorlog>/tmp/log/clickhouse-server.err.log</errorlog> <size>1000M</size> <count>10</count> </logger> <query_log> <database>system</database> <table>query_log</table> <partition_by>toYYYYMM(event_date)</partition_by> <flush_interval_milliseconds>1000</flush_interval_milliseconds> </query_log> <tcp_port>9000</tcp_port> <listen_host>127.0.0.1</listen_host> <access_control_path>/tmp/ledzeppelin/</access_control_path> <max_concurrent_queries>500</max_concurrent_queries> <mark_cache_size>5368709120</mark_cache_size> <path>./clickhouse/</path> <users_config>users.xml</users_config> </yandex>
Note < users_ config>users.xml</users_ Config > binds users.xml of the current directory of config.xml
<?xml version="1.0"?> <yandex> <profiles> <default> <readonly>1</readonly> </default> <pA> <max_memory_usage>10G</max_memory_usage> <max_memory_usage_for_user>10G</max_memory_usage_for_user> <max_memory_usage_for_all_queries>10G</max_memory_usage_for_all_queries> <max_query_size>1073741824</max_query_size> <readonly>1</readonly> </pA> </profiles> <users> <zeppelin> <!-- <password_sha256_hex></password_sha256_hex> --> <password>password</password> <networks> <ip>::/0</ip> </networks> <profile>pA</profile> <quota>qA</quota> </zeppelin> </users> <quotas> <qA> <interval> <!-- In seconds --> <duration>10</duration> <!-- 10 You can only query once in seconds --> <queries>2</queries> <errors>0</errors> <result_rows>0</result_rows> <read_rows>0</read_rows> <execution_time>0</execution_time> </interval> </qA> </quotas> </yandex>
You can initially see the reference relationship between user, profile and quotas. These two configuration files stipulate that the user zeppelin can only query twice in 10 seconds
Set the Profile pA as read-only: < readonly > 1 < / readonly >, the user zeppelin references pA, and an error will be reported when zeppelin executes a write statement:
Quotes and profiles
Quotes and Settings Profiles are two ways ClickHouse uses to control user resources.
From section 2, we can get a preliminary impression of these two restriction mechanisms. Their definitions in ClickHouse documents are:
Quotas allow you to limit resource usage over a period of time or track the use of resources.
quote
A settings profile is a collection of settings grouped under the same name.
Settings is also a limitation. The definition of Profile does not highlight the meaning of "restriction", but we can use it as a restriction.
The Profile demo in the figure above specifies the "number of rows you can read", max_rows_to_read. The question is whether this is the maximum number of rows you can read at one time or the total number of rows you can read when the user is alive. The answer is the maximum number of rows that the database management tool can query when executing a query.
A maximum number of rows that can be read from a table when running a query. (https://clickhouse.com/docs/e...)
It can be seen that the essential difference between Profile and quotes is that the setting of Profile is independent of time. For each query, take readonly, which is the easiest to understand, as an example "It does not aim at the user behavior of 10 seconds or 1 hour. It aims at each query. For each query, the restriction of Profile is real. Unlike Quota, there are 10 Quotas in this second, and there may be no Quotas in the next second, resulting in the query being discarded.
In order to further understand why we associate Quota with Interval, we need to think of QPS(Queries per second):
As shown in the figure, when the traffic is very rapid, it will exhaust all the quotas in the early stage of each Interval, resulting in no quota available in the rest of the time, resulting in disconnection, such as Interval a and B. when the traffic is relatively mild, its distribution in the whole time Interval will be more uniform, such as Interval C.
The integral of QPS and TIME is equal to Quota.
It can be seen that ClickHouse's Quota and token bucket current limiting methods have many similarities. The token bucket generates a certain Quota in each interval. When the Quota is exhausted, the new query will be discarded.
It can be seen from the source code that every time you query, you check whether (checkExceeded()) exceeds the quota
Each interval has multiple resource_types. For example, < query > 1 < / query > is a type. Check the maximum inventory Max and the quota used. If used > max, an error will be reported.
SQL driven access control
We recommend using SQL-driven workflow. Both of the configuration methods work simultaneously, so if you use the server configuration files for managing accounts and access rights, you can smoothly switch to SQL-driven workflow. (https://clickhouse.com/docs/e...)
ClickHouse recommends using SQL driven access control mode. Section 2 shows the control mode of "Server configuration files users.xml and config.xml", which is sufficient for uncomplicated scenarios. The mode of configuration file involves security issues, and there are inflexible problems when adding users, including managing static users and dynamic (temporary) Now we will verify the feasibility of SQL driven mode through a series of practical operations.
4.0 preliminary solution
- Create the super administrator ledzeppelin and user zeppelin through the default user. (if zeppelin is defined in users.xml, it cannot be changed, and an error is reported: Cannot update user zeppelin in users.xml because this storage is readonly).
- Disallow default.
- As a super administrator, dynamically configure zeppelin resources through SQL
As shown in the figure, ordinary user zeppelin does not have any permissions. It is passive and cannot create "ClickHouse access entities" "As a super administrator, ledzeppelin is active. It assembles access entities such as profile, quota and role. Finally, it assigns the role to ordinary users zeppelin and redefines the attributes of zeppelin to achieve the purpose of user management. In this practice, we also start from resource constraints and finally expand to more general user management.
4.1 create zeppelin user
Create users as recommended by ClickHouse
4.1.1 give the default user access_management permission
<user> <default> <!-- Other configurations --> <access_management>1</access_management> </default> </user>
4.1.2 create super administrator ledzeppelin and ordinary user zeppelin with default user
CREATE USER ledzeppelin; GRANT ALL ON *.* TO ledzeppelin WITH GRANT OPTION; CREATE USER zeppelin;
In practice, passwords must be set, such as
CREATE USER MJ IDENTIFIED WITH sha256_password BY 'qwerty';
Sign in:
clickhouse-client -u MJ --password qwerty
For privacy reasons, you can set the password as an environment variable
clickhouse-client -u MJ --password ${CLICKHOUSE_MJ_PASSWORD}
4.1.3 turn off SQL access control permission of default user
For security reasons, we need to turn off the default permission to avoid creating another super administrator and other hidden dangers
First, exit the default login. Note that the default user's properties cannot be changed through SQL, otherwise an error will be reported:
REVOKE ALL ON *.* FROM default Query id: 7ab4a62a-62d1-4d0b-bbfe-c27d0e595f65 0 rows in set. Elapsed: 0.002 sec. Received exception from server (version 21.10.2): Code: 495. DB::Exception: Received from localhost:9000. DB::Exception: Cannot update user `default` in users.xml because this storage is readonly: Couldn't update user `default`. Successfully updated: none. (ACCESS_STORAGE_READONLY)
We can only change the default configuration by modifying the XML
By ClickHouse:
- Make the default user read-only and set a read-only Profile in users.xml
<profiles> <!-- Profile that allows only read queries. --> <readonly_profile> <readonly>2</readonly> <!-- Other configurations --> </readonly_profile> </profiles>
- Change the Profile of the default user:
<users> <profile> readonly_profile</profile> <!-- Other configurations --> </users>
- And delete this line:
<access_management>1</access_management>
ClickHouse will automatically load the configuration, and now there is only one super administrator: ledzeppelin
4.2 dynamically manage resources through SQL
4.2.1 create a Profile
Take readonly(Permissions for Queries | ClickHouse Documentation) as an example because this attribute is easy to test. The database can be created at 0, but not at 1 and 2
CREATE SETTINGS PROFILE IF NOT EXISTS z_profile SETTINGS readonly = 2 READONLY
4.2.2 create a Quota
You can only query once in 10 seconds
CREATE QUOTA IF NOT EXISTS z_quota FOR INTERVAL 10 second MAX queries 1
4.2.3 create Role
CREATE ROLE IF NOT EXISTS z_role
4.2.3.1 give Role permission
GRANT SELECT ON db.* TO z_role;
Allow Z_ When the role accesses the db database, it is noted that we do not grant permissions to a user, but to a role, which can be granted to other users. In this way, we establish an abstraction layer between permissions and users.
4.2.4 create a garbage collection Role
CREATE ROLE IF NOT EXISTS gc_role
It is used to bind unused profiles and quotas
4.2.5 Role binding profile and quota
ALTER SETTINGS PROFILE z_profile TO z_role ALTER QUOTA z_quota TO z_role;
4.2.6 application to user zeppelin
Check which role s are bound to zeppelin
SELECT * FROM system.role_grants WHERE user_name LIKE 'zeppelin'
If so, assume old_role, untie it first
REVOKE old_role FROM zeppelin
Bind new role:
GRANT z_role TO zeppelin
4.2.7 revoke (important)
Our assumption is that profile and quota are bound to roles, and roles are bound to users. If you need to change the user's configuration, there are two methods
- Change a role
- Change the profile/quota of the current role
There is a problem with the second method, because after changing the profile, the user's configuration will not be refreshed directly (it may be a ClickHouse bug, which we are still verifying). At this time, you need to revoke the user's role and hang it up
For the sake of insurance, the following paradigm should be followed:
- revoke z_role from zeppelin
- Modify Z_ profile or quota under role
- grant z_role from zeppelin
Or:
- Modify Z_ profile or quota under role
- revoke z_role from zeppelin
- grant z_role from zeppelin
4.2.8 bind a new Quota
Change to query every 5 seconds
CREATE QUOTA IF NOT EXISTS z_quota_5 FOR INTERVAL 5 second MAX queries 1
Change the original Z_ Bind quota to gc_role
ALTER QUOTA z_quota TO gc_role;
Tie a new quota:
ALTER QUOTA z_quota_5 TO z_role;
Refresh:
revoke z_role from zeppelin; grant z_role to zeppelin;
Check:
SELECT name, apply_to_list FROM system.quotas WHERE name LIKE 'z_quota_5' Query id: dc5b2ece-c792-4585-95d8-8b4275631664 ┌─name──────┬─apply_to_list─┐ │ z_quota_5 │ ['z_role'] │ └───────────┴───────────────┘ 1 rows in set. Elapsed: 0.004 sec.
4.2.9 binding a new Profile
CREATE SETTINGS PROFILE IF NOT EXISTS z_profile_permission_type_0 SETTINGS readonly = 0 READONLY
To change the binding object of the profile, first set Z_ Bind profile to gc_role
ALTER SETTINGS PROFILE z_profile TO gc_role
Add new z_profile_permission_type_0 tied to z_role
ALTER SETTINGS PROFILE z_profile_permission_type_0 TO z_role
Refresh configuration:
revoke z_role from zeppelin; grant z_role to zeppelin;
SELECT name, apply_to_list FROM system.settings_profiles Query id: 51d1c2de-ced7-4558-a164-020a76a53d97 ┌─name────────────────────────┬─apply_to_list─┐ │ readonly │ [] │ │ default │ [] │ │ z_profile │ ['gc_role'] │ │ z_profile_permission_type_0 │ ['z_role'] │ └─────────────────────────────┴───────────────┘
5.SQL driven settings persistence
Local storage / local persistence is the same as the path of the data directory. According to Server Settings | ClickHouse Documentation, the properties created through SQL are stored in access_control_path, the default value is / var/lib/clickhouse/access/
Path to a folder where a ClickHouse server stores user and role configurations created by SQL commands.
Setting location: config.xml
<yandex> ... <access_control_path>/var/lib/clickhouse/access/ledzeppelin/</access_control_path> ... </yandex
View Directory:
use/access/ledzeppelin ╰─➤ ls 094e5b76-1b4d-97a5-43cd-bbb3ccceb688.sql 9c492851-bf60-7753-7c14-c05f4be46b8e.sql 2949a246-848f-fc07-f1b9-a3f7246b31a0.sql f340165d-0ec4-f90e-fb55-74c8d0cfd7ae.sql 394e47fd-b701-d017-6857-0d8744d830b1.sql quotas.list 3d44c8a1-01b3-6f68-eff7-1e20a9080296.sql roles.list 3f447b54-b515-aa93-bacc-1a649aeac19c.sql row_policies.list 674912da-5593-e497-c5d8-d058e916c7a5.sql settings_profiles.list 6c44ad5c-4354-c154-048e-8e293ff427ab.sql users.list
In addition, note that after restarting ClickHouse, ClickHouse starts from access_control_path recovery configuration:
If these files are deleted, the ClickHouse access entities previously created through SQL will disappear after ClickHouse is restarted.
conclusion
This paper starts with ClickHouse user resource isolation and extends it to ClickHouse user management. We realize the business of resource isolation through quotes and Profile; In user management, we use roles and create a new abstraction layer. In this way, we do not need to directly operate permissions when managing users, but give roles to users as a component. Roles can also encapsulate resources. We can deploy these access entities through configuration files or SQL, which is advocated by ClickHouse. Finally, this article only involves the tip of the iceberg of ClickHouse user management. I hope it will be helpful to you.
reference resources
- https://altinity.com/blog/goo...
- https://clickhouse.com/docs/e...
- Quotas: https://clickhouse.com/docs/e...
- Settings Profiles: https://clickhouse.com/docs/e...
- ClickHouse recommendations: https://clickhouse.com/docs/e...