Data type of clickhouse

Keywords: Big Data clickhouse

data type

In order to improve performance, ClickHouse provides composite data types compared with traditional databases. ClickHouse's Upadate and Delete are implemented by the Alter variant.

1 integer

Integer includes signed integer and unsigned integer

1.1 signed integer

type Range byte
int8 [-128,127] 1
int16 [-32768 : 32767] 2
int32 [-2147483648 : 2147483647] 3
int64 [-9223372036854775808 : 9223372036854775807] 4

1.2 unsigned integer

type Range byte
UInt8 [0 : 255] 1
UInt6 [0 : 65535] 2
UInt32 [0 : 4294967295] 3
UInt64 [0 : 18446744073709551615] 4

2 floating point

type Effective precision (digits) byte
Float32 7 4
Float64 16 8

Note: floating point number calculation has errors, so use it with caution

gcw1 :) select 1-0.9 ;
SELECT 1 - 0.9
┌───────minus(1, 0.9)─┐
│ 0.09999999999999998 │
└─────────────────────┘
1 rows in set. Elapsed: 0.002 sec. 
gcw1 :) 

3 Decimal

If higher precision is required, you can choose the Decimal type

Format: Decimal (P,S)

P: Represents the precision, determines the total number of digits (positive part + decimal part), and the value range is 0-38

S: Represents the scale, determines the decimal places, and the value range is 0-P

ClickHouse provides three abbreviations for Decimal: Decimal32, Decimal64 and Decimal128

3.1 the accuracy of addition and subtraction is greater

If the two numbers are added and subtracted, the precision of the result is the higher

gcw1 :) SELECT toDecimal32(1, 3) + toDecimal32(2, 2) ;

SELECT toDecimal32(1, 3) + toDecimal32(2, 2)
┌─plus(toDecimal32(1, 3), toDecimal32(2, 2))─┐
│                                      3.000 │
└────────────────────────────────────────────┘

3.2 sum of multiplication accuracy

gcw1 :) SELECT toDecimal32(1, 3) * toDecimal32(2, 2) ;

SELECT toDecimal32(1, 3) * toDecimal32(2, 2)
┌─multiply(toDecimal32(1, 3), toDecimal32(2, 2))─┐
│                                        2.00000 │
└────────────────────────────────────────────────┘

3.3 division precision takes the divisor

gcw1 :) SELECT toDecimal32(1, 3) / toDecimal32(2, 2) ;

SELECT toDecimal32(1, 3) / toDecimal32(2, 2)
┌─divide(toDecimal32(1, 3), toDecimal32(2, 2))─┐
│                                        0.500 │
└──────────────────────────────────────────────┘

4 string

String string string can be any length

It can contain any byte set, including empty bytes.

FixedString(N) a string of fixed length N, n must be a strictly positive natural number.

When the server reads a string with a length less than N, the length of N bytes is reached by adding empty bytes at the end of the string.

When the server reads a string with a length greater than N, an error message will be returned.

gcw1 :) SELECT toFixedString('zzzz', 5),LENGTH(toFixedString('zzzz', 5)) AS LENGTH;

SELECT 
    toFixedString('zzzz', 5),
    LENGTH(toFixedString('zzzz', 5)) AS LENGTH
    
┌─toFixedString('zzzz', 5)─┬─LENGTH─┐
│ zzzz                     │      5 │
└──────────────────────────┴────────┘

5 UUID

ClickHouse directly makes UUID, which acts as the primary key in the traditional database, a data type

CREATE TABLE UUID_TEST
(
`s1` UUID,
`s2` String
)
ENGINE = TinyLog;
insert into UUID_TEST select generateUUIDv4(),'s1';
insert into UUID_TEST(s2)values('s2');

Query display, auto supplement 0

gcw1 :) select * from UUID_TEST;

SELECT *
FROM UUID_TEST

┌───────────────────────────────────s1─┬─s2─┐
│ 7da9a8b3-e655-4218-aac3-e66ed758e094 │ s1 │
│ 00000000-0000-0000-0000-000000000000 │ s2 │
└──────────────────────────────────────┴────┘

6 enumeration type

Include Enum8 and Enum16 types. Enum saves the correspondence of 'string'= integer.

Enum8 is described with 'String'= Int8 pairs. Enum16 is described with 'String'= Int16 pairs.

CREATE TABLE t_enum
(
x Enum8('pc' = 1, 'mobile' = 2, 'pad' = 3)
)
ENGINE = TinyLog

When inserting an enumeration table, you can only insert the values listed in the definition, such as PC, mobile and pad. Inserting unknown values will throw exceptions

gcw1 :) INSERT INTO t_enum VALUES ('pc'), ('pc'), ('pad');

INSERT INTO t_enum VALUES

Ok.

Insert unknown value

gcw1 :) INSERT INTO t_enum VALUES ('pc1');

INSERT INTO t_enum VALUES

Exception on client:
Code: 36. DB::Exception: Unknown element 'pc1' for type Enum8('pc' = 1, 'mobile' = 2, 'pad' = 3)

Connecting to database default at localhost:19000 as user default.
Connected to ClickHouse server version 20.5.4 revision 54435.

7 array

Array(T): an array of elements of type T.

T can be any type, including array type.

However, multi-dimensional arrays are not recommended. ClickHouse has limited support for multi-dimensional arrays.

For example, you cannot store multidimensional arrays in the MergeTree table.

You can use the array function to create an array:

  • array(T)
  • Square brackets [] can also be used

ClickHouse can automatically infer data types

gcw1 :) select [1,2,3] as arr, toTypeName(arr);

SELECT 
    [1, 2, 3] AS arr,
    toTypeName(arr)

┌─arr─────┬─toTypeName([1, 2, 3])─┐
│ [1,2,3] │ Array(UInt8)          │
└─────────┴───────────────────────┘

8 tuples

Tuple(T1, T2,...): a tuple in which each element has a separate type

gcw1 :) SELECT tuple(1,'a') AS x, (1,2,3) as y, toTypeName(x), toTypeName(y) ;

SELECT 
    (1, 'a') AS x,
    (1, 2, 3) AS y,
    toTypeName(x),
    toTypeName(y)

┌─x───────┬─y───────┬─toTypeName(tuple(1, 'a'))─┬─toTypeName(tuple(1, 2, 3))─┐
│ (1,'a') │ (1,2,3) │ Tuple(UInt8, String)      │ Tuple(UInt8, UInt8, UInt8) │
└─────────┴─────────┴───────────────────────────┴────────────────────────────┘

9 Date,DateTime

Time types are divided into DateTime, DateTime64 and Date. It should be noted that ClickHouse currently has no timestamp type, that is, the highest precision of time type is seconds. Therefore, if you need to process time with millisecond and microsecond precision, you can only use UInt type

Date type

It is stored in two bytes, indicating the date value from 1970-01-01 (unsigned) to the current date. There is no time zone information stored in the date

CREATE TABLE t_date (x date) ENGINE=TinyLog;
INSERT INTO t_date VALUES('2021-11-06');
SELECT x,toTypeName(x) FROM t_date;

DateTime

The Unix timestamp is stored in four bytes (unsigned). It is allowed to store values in the same range as the date type. The minimum value is 0000-00-00 00:00:00. The timestamp type value is accurate to seconds (no leap seconds). The time zone uses the system time zone when the client or server is started

CREATE TABLE t_datetime(`timestamp` DateTime) ENGINE = TinyLog;
INSERT INTO t_datetime Values('2020-11-06 00:00:00');
SELECT * FROM t_datetime;
┌───────────timestamp─┐
│ 2020-11-06 00:00:00 │
└─────────────────────┘
## DateTime type is time zone sensitive
SELECT 
    toDateTime(timestamp, 'Asia/Shanghai') AS column,
    toTypeName(column) AS x
FROM t_datetime

┌──────────────column─┬─x─────────────────────────┐
│ 2020-11-06 00:00:00 │ DateTime('Asia/Shanghai') │
└─────────────────────┴───────────────────────────┘

10 boolean type

There is no separate type to store Booleans. You can use the UInt8 type with values limited to 0 or 1

Posted by dmayo2 on Sat, 06 Nov 2021 12:52:23 -0700