Title: submit sql script file with flink sql client
Date: 2021-10-22 22:11:34
Tag: [Flink, SQL client]
Category: flink
We know that sql-client.sh provides an interactive SQL interface, where we can execute a SQL statement and immediately see the execution results, or query a table interactively, without packaging a Flink program. In fact, we can also submit SQL scripts through the SQL Client. Let's take a look.
Run ./bin/sql-client.sh -h to see the corresponding help options:
```
(base) [chenzuoli@chenzuolis-MacBook /Volumes/chenzuoli/Data/docker_img/flink-1.12.1]$ ./bin/sql-client.sh -h
./sql-client [MODE] [OPTIONS]

The following options are available:

Mode "embedded" submits Flink jobs from the local machine.

  Syntax: embedded [OPTIONS]
  "embedded" mode options:
     -d,--defaults <environment file>      The environment properties with which
                                           every new session is initialized.
                                           Properties might be overwritten by
                                           session properties.
     -e,--environment <environment file>   The environment properties to be
                                           imported into the session. It might
                                           overwrite default environment
                                           properties.
     -h,--help                             Show the help message with
                                           descriptions of all options.
     -hist,--history <History file path>   The file which you want to save the
                                           command history into. If not
                                           specified, we will auto-generate one
                                           under your user's home directory.
     -j,--jar <JAR file>                   A JAR file to be imported into the
                                           session. The file might contain
                                           user-defined classes needed for the
                                           execution of statements such as
                                           functions, table sources, or sinks.
                                           Can be used multiple times.
     -l,--library <JAR directory>          A JAR file directory with which every
                                           new session is initialized. The files
                                           might contain user-defined classes
                                           needed for the execution of
                                           statements such as functions, table
                                           sources, or sinks. Can be used
                                           multiple times.
     -pyarch,--pyArchives <arg>            Add python archive files for job. The
                                           archive files will be extracted to
                                           the working directory of python UDF
                                           worker. Currently only zip-format is
                                           supported. For each archive file, a
                                           target directory can be specified. If
                                           the target directory name is
                                           specified, the archive file will be
                                           extracted to a directory with the
                                           specified name. Otherwise, the
                                           archive file will be extracted to a
                                           directory with the same name of the
                                           archive file. The files uploaded via
                                           this option are accessible via
                                           relative path. '#' could be used as
                                           the separator of the archive file
                                           path and the target directory name.
                                           Comma (',') could be used as the
                                           separator to specify multiple archive
                                           files. This option can be used to
                                           upload the virtual environment, the
                                           data files used in Python UDF (e.g.:
                                           --pyArchives
                                           file:///tmp/py37.zip,file:///tmp/data.zip#data
                                           --pyExecutable
                                           py37.zip/py37/bin/python). The data
                                           files could be accessed in Python
                                           UDF, e.g.:
                                           f = open('data/data.txt', 'r').
     -pyexec,--pyExecutable <arg>          Specify the path of the python
                                           interpreter used to execute the
                                           python UDF worker (e.g.:
                                           --pyExecutable /usr/local/bin/python3).
                                           The python UDF worker depends on
                                           Python 3.5+, Apache Beam
                                           (version == 2.23.0), Pip
                                           (version >= 7.1.0) and SetupTools
                                           (version >= 37.0.0). Please ensure
                                           that the specified environment meets
                                           the above requirements.
     -pyfs,--pyFiles <pythonFiles>         Attach custom python files for job.
                                           These files will be added to the
                                           PYTHONPATH of both the local client
                                           and the remote python UDF worker. The
                                           standard python resource file
                                           suffixes such as .py/.egg/.zip or
                                           directory are all supported. Comma
                                           (',') could be used as the separator
                                           to specify multiple files (e.g.:
                                           --pyFiles
                                           file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
     -pyreq,--pyRequirements <arg>         Specify a requirements.txt file which
                                           defines the third-party dependencies.
                                           These dependencies will be installed
                                           and added to the PYTHONPATH of the
                                           python UDF worker. A directory which
                                           contains the installation packages of
                                           these dependencies could be specified
                                           optionally. Use '#' as the separator
                                           if the optional parameter exists
                                           (e.g.: --pyRequirements
                                           file:///tmp/requirements.txt#file:///tmp/cached_dir).
     -s,--session <session identifier>     The identifier for a session.
                                           'default' is the default identifier.
     -u,--update <SQL update statement>    Experimental (for testing only!):
                                           Instructs the SQL Client to
                                           immediately execute the given update
                                           statement after starting up. The
                                           process is shut down after the
                                           statement has been submitted to the
                                           cluster and returns an appropriate
                                           return code. Currently, this feature
                                           is only supported for INSERT INTO
                                           statements that declare the target
                                           sink table.
```
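Note the -u option: in this version it is the closest thing to submitting a SQL statement as a job non-interactively. A minimal sketch, where conf/my-env.yaml, my_source and my_sink are hypothetical placeholders:

```bash
# Script-style submission via -u: the client starts, submits the
# INSERT INTO job to the cluster, then shuts down while the job keeps
# running. All names below are placeholders for illustration.
./bin/sql-client.sh embedded \
  -d conf/my-env.yaml \
  -u "INSERT INTO my_sink SELECT * FROM my_source"
```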
The -d parameter lets us specify an environment file with parameter configurations.
Next, let's look at conf/sql-client-defaults.yaml, which is the default configuration file for this environment.
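For orientation, here is a heavily abridged sketch of what the shipped conf/sql-client-defaults.yaml looks like in Flink 1.12; check your distribution's copy for the full, commented version:

```yaml
# Abridged sketch of conf/sql-client-defaults.yaml (Flink 1.12);
# the shipped file carries many more comments and options.
tables: []     # tables pre-registered in every new session
functions: []  # user-defined functions available in every session
catalogs: []   # catalogs created on session startup
execution:
  planner: blink
  type: streaming
  result-mode: table
  parallelism: 1
deployment:
  response-timeout: 5000
```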
Create a data file for testing:

```bash
mkdir sql_test
vim sql_test/book-store.csv
```

with the following contents:

```
Guns Germs and Steel,18,Sociology
APP UI The way of design,20,Design
Pass economy,22,economics
Real business opportunities of blockchain,21,economics
```
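If you prefer not to open an editor, the same test file can be created non-interactively with a here-document (same contents as above):

```bash
# Create the test CSV without opening an editor.
mkdir -p sql_test
cat > sql_test/book-store.csv <<'EOF'
Guns Germs and Steel,18,Sociology
APP UI The way of design,20,Design
Pass economy,22,economics
Real business opportunities of blockchain,21,economics
EOF
```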
Now let's create our own configuration file, conf/book-store.yaml, which reads the csv file and runs a query over it:
```bash
vim conf/book-store.yaml
```

```yaml
tables:
  - name: BookStore
    type: source-table
    update-mode: append
    connector:
      type: filesystem
      path: "/Users/zhaoqin/temp/202004/26/book-store.csv"
    format:
      type: csv
      fields:
        - name: BookName
          type: VARCHAR
        - name: BookAmount
          type: INT
        - name: BookCatalog
          type: VARCHAR
      line-delimiter: "\n"
      comment-prefix: ","
    schema:
      - name: BookName
        type: VARCHAR
      - name: BookAmount
        type: INT
      - name: BookCatalog
        type: VARCHAR
  - name: MyBookView
    type: view
    query: "SELECT BookCatalog, SUM(BookAmount) AS Amount FROM BookStore GROUP BY BookCatalog"

execution:
  planner: blink                    # optional: either 'blink' (default) or 'old'
  type: streaming                   # required: execution mode either 'batch' or 'streaming'
  result-mode: table                # required: either 'table' or 'changelog'
  max-table-result-rows: 1000000    # optional: maximum number of maintained rows in
                                    #   'table' mode (1000000 by default, smaller 1 means unlimited)
  time-characteristic: event-time   # optional: 'processing-time' or 'event-time' (default)
  parallelism: 1                    # optional: Flink's parallelism (1 by default)
  periodic-watermarks-interval: 200 # optional: interval for periodic watermarks (200 ms by default)
  max-parallelism: 16               # optional: Flink's maximum parallelism (128 by default)
  min-idle-state-retention: 0       # optional: table program's minimum idle state time
  max-idle-state-retention: 0       # optional: table program's maximum idle state time
  restart-strategy:                 # optional: restart strategy
    type: fallback                  #   "fallback" to global restart strategy by default

# Configuration options for adjusting and tuning table programs.
# A full list of options and their default values can be found
# on the dedicated "Configuration" page.
configuration:
  table.optimizer.join-reorder-enabled: true
  table.exec.spill-compression.enabled: true
  table.exec.spill-compression.block-size: 128kb

# Properties that describe the cluster to which table programs are submitted to.
deployment:
  response-timeout: 5000
```
Start a session by specifying this configuration file, and the source table and view it defines are registered automatically.
There are several points to note about the book-store.yaml configuration file:
a. tables.type equals source-table, indicating that this entry is the configuration of a data source;
b. tables.connector describes the data source in detail. path is the full path of the book-store.csv file, and the connector type is filesystem, consistent with the connector option we specify when writing a DDL in SQL (a rough DDL equivalent is sketched after this list);
c. tables.format describes the format of the file contents, which is csv here;
d. tables.schema describes the table structure of the data source table;
e. tables.type of the MyBookView entry is view, which means MyBookView is a view (compare the view concept in databases).
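For comparison, the BookStore entry corresponds roughly to the DDL you would otherwise type in the SQL client. This is a sketch with Flink 1.12 filesystem-connector option names; the legacy yaml environment format and the DDL options are not an exact one-to-one mapping:

```sql
-- Rough DDL equivalent of the BookStore entry in book-store.yaml.
CREATE TABLE BookStore (
  BookName    STRING,
  BookAmount  INT,
  BookCatalog STRING
) WITH (
  'connector' = 'filesystem',
  'path'      = '/Users/zhaoqin/temp/202004/26/book-store.csv',
  'format'    = 'csv'
);

-- And MyBookView as a plain view on top of it.
CREATE VIEW MyBookView AS
SELECT BookCatalog, SUM(BookAmount) AS Amount
FROM BookStore
GROUP BY BookCatalog;
```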
Let's take a look at the test results:
```bash
./bin/start-cluster.sh
./bin/sql-client.sh embedded -d conf/book-store.yaml
```
After entering the SQL Client interactive interface, you can see that the environment has been configured:
```
Flink SQL> show tables;
BookStore
MyBookView

Flink SQL> desc BookStore;
+-------------+--------+------+-----+--------+-----------+
|        name |   type | null | key | extras | watermark |
+-------------+--------+------+-----+--------+-----------+
|    BookName | STRING | true |     |        |           |
|  BookAmount |    INT | true |     |        |           |
| BookCatalog | STRING | true |     |        |           |
+-------------+--------+------+-----+--------+-----------+
3 rows in set

Flink SQL> desc MyBookView;
+-------------+--------+------+-----+--------+-----------+
|        name |   type | null | key | extras | watermark |
+-------------+--------+------+-----+--------+-----------+
| BookCatalog | STRING | true |     |        |           |
|      Amount |    INT | true |     |        |           |
+-------------+--------+------+-----+--------+-----------+
2 rows in set
```
You can see that the two tables have been created. Let's take a look at the data:
```
Flink SQL> select * from MyBookView;

 BookCatalog  Amount
   Sociology      18
      Design      20
   economics      43
```
Nice, right? If you also define a sink table in the yaml file, a Flink job will be submitted directly to the Flink cluster. Doesn't that achieve the effect of submitting a Flink SQL script file?
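As a sketch of what that could look like, here is a hypothetical sink entry appended to the tables section of book-store.yaml. The path /tmp/book-sink.csv and the name BookSink are illustrative; the entry mirrors the source schema so that an append-only insert works in streaming mode:

```yaml
# Hypothetical addition to the tables section of book-store.yaml:
# a csv filesystem sink mirroring the BookStore schema.
- name: BookSink
  type: sink-table
  update-mode: append
  connector:
    type: filesystem
    path: "/tmp/book-sink.csv"
  format:
    type: csv
    fields:
      - name: BookName
        type: VARCHAR
      - name: BookAmount
        type: INT
      - name: BookCatalog
        type: VARCHAR
  schema:
    - name: BookName
      type: VARCHAR
    - name: BookAmount
      type: INT
    - name: BookCatalog
      type: VARCHAR
```

With such a sink registered, typing `insert into BookSink select * from BookStore;` in the client, or passing it via the -u option shown earlier, submits a regular Flink job to the cluster.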
Well, that's it for today. These days I was surprised to find that the streaming data platform developed by the company's data platform group submits jobs to k8s through the SQL Client. I had previously translated an official Flink article on submitting jobs, which mentioned four submission modes (rough launch commands are sketched after the list):
- local cluster
- application mode
- per job mode
- session mode
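For context, these modes map to different launch commands. A rough sketch using the YARN deployment target of Flink 1.12; the example jar path is a placeholder from the distribution's examples directory:

```bash
# local cluster: the standalone cluster used earlier in this article
./bin/start-cluster.sh

# application mode: the jar's main() runs on the cluster side
./bin/flink run-application -t yarn-application ./examples/streaming/WordCount.jar

# per-job mode: a dedicated cluster is spun up for this one job
./bin/flink run -t yarn-per-job ./examples/streaming/WordCount.jar

# session mode: jobs share a long-running session cluster
./bin/yarn-session.sh -d
./bin/flink run ./examples/streaming/WordCount.jar
```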
I used to think those four were the only ways. In fact, if you look carefully, the way the SQL Client submits SQL is similar to session mode: as long as the session is up you can keep executing SQL statements, and if the session is closed, its tasks are closed with it.
ok, see you next time.
flink, yyds.
There is a path up the mountain of books, and diligence is the way; the sea of learning has no end, and hard work is the boat.
Welcome to follow my WeChat official accounts. I love sharing knowledge, and I also love pets, so I run these two official accounts:
Friends who like Pets can pay attention to: [electric Barker Pets]
Study together and make progress together.