Title: submit sql script file with flink sql client
Date: 2021-10-22 22:11:34
Tag: [Flink, SQL client]
Category: flink
We know that sql-client.sh provides an interactive SQL interface, where we can execute a SQL statement and immediately see the execution results, or query a table interactively, without packaging a Flink program. In fact, we can also submit SQL scripts through the SQL Client. Let's take a look.
Run ./bin/sql-client.sh -h to see the corresponding help options:
```
(base) [chenzuoli@chenzuolis-MacBook /Volumes/chenzuoli/Data/docker_img/flink-1.12.1]$ ./bin/sql-client.sh -h
./sql-client [MODE] [OPTIONS]

The following options are available:

Mode "embedded" submits Flink jobs from the local machine.

  Syntax: embedded [OPTIONS]
  "embedded" mode options:
     -d,--defaults <environment file>      The environment properties with which
                                           every new session is initialized.
                                           Properties might be overwritten by
                                           session properties.
     -e,--environment <environment file>   The environment properties to be
                                           imported into the session. It might
                                           overwrite default environment
                                           properties.
     -h,--help                             Show the help message with
                                           descriptions of all options.
     -hist,--history <History file path>   The file which you want to save the
                                           command history into. If not
                                           specified, we will auto-generate one
                                           under your user's home directory.
     -j,--jar <JAR file>                   A JAR file to be imported into the
                                           session. The file might contain
                                           user-defined classes needed for the
                                           execution of statements such as
                                           functions, table sources, or sinks.
                                           Can be used multiple times.
     -l,--library <JAR directory>          A JAR file directory with which every
                                           new session is initialized. The files
                                           might contain user-defined classes
                                           needed for the execution of
                                           statements such as functions, table
                                           sources, or sinks. Can be used
                                           multiple times.
     -pyarch,--pyArchives <arg>            Add python archive files for job. The
                                           archive files will be extracted to
                                           the working directory of python UDF
                                           worker. Currently only zip-format is
                                           supported. For each archive file, a
                                           target directory can be specified. If
                                           the target directory name is
                                           specified, the archive file will be
                                           extracted to a directory with the
                                           specified name. Otherwise, the
                                           archive file will be extracted to a
                                           directory with the same name of the
                                           archive file. The files uploaded via
                                           this option are accessible via
                                           relative path. '#' could be used as
                                           the separator of the archive file
                                           path and the target directory name.
                                           Comma (',') could be used as the
                                           separator to specify multiple archive
                                           files. This option can be used to
                                           upload the virtual environment, the
                                           data files used in Python UDF (e.g.:
                                           --pyArchives
                                           file:///tmp/py37.zip,file:///tmp/data.zip#data
                                           --pyExecutable
                                           py37.zip/py37/bin/python). The data
                                           files could be accessed in Python
                                           UDF, e.g.:
                                           f = open('data/data.txt', 'r').
     -pyexec,--pyExecutable <arg>          Specify the path of the python
                                           interpreter used to execute the
                                           python UDF worker (e.g.:
                                           --pyExecutable /usr/local/bin/python3).
                                           The python UDF worker depends on
                                           Python 3.5+, Apache Beam
                                           (version == 2.23.0), Pip
                                           (version >= 7.1.0) and SetupTools
                                           (version >= 37.0.0). Please ensure
                                           that the specified environment meets
                                           the above requirements.
     -pyfs,--pyFiles <pythonFiles>         Attach custom python files for job.
                                           These files will be added to the
                                           PYTHONPATH of both the local client
                                           and the remote python UDF worker. The
                                           standard python resource file
                                           suffixes such as .py/.egg/.zip or
                                           directory are all supported. Comma
                                           (',') could be used as the separator
                                           to specify multiple files (e.g.:
                                           --pyFiles
                                           file:///tmp/myresource.zip,hdfs:///$namenode_address/myresource2.zip).
     -pyreq,--pyRequirements <arg>         Specify a requirements.txt file which
                                           defines the third-party dependencies.
                                           These dependencies will be installed
                                           and added to the PYTHONPATH of the
                                           python UDF worker. A directory which
                                           contains the installation packages of
                                           these dependencies could be specified
                                           optionally. Use '#' as the separator
                                           if the optional parameter exists
                                           (e.g.: --pyRequirements
                                           file:///tmp/requirements.txt#file:///tmp/cached_dir).
     -s,--session <session identifier>     The identifier for a session.
                                           'default' is the default identifier.
     -u,--update <SQL update statement>    Experimental (for testing only!):
                                           Instructs the SQL Client to
                                           immediately execute the given update
                                           statement after starting up. The
                                           process is shut down after the
                                           statement has been submitted to the
                                           cluster and returns an appropriate
                                           return code. Currently, this feature
                                           is only supported for INSERT INTO
                                           statements that declare the target
                                           sink table.
```
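Note the -u option: in this version it is the closest thing to submitting a SQL statement as a job non-interactively. A minimal sketch, where conf/my-env.yaml, my_source and my_sink are hypothetical placeholders:

```bash
# Script-style submission via -u: the client starts, submits the
# INSERT INTO job to the cluster, then shuts down while the job keeps
# running. All names below are placeholders for illustration.
./bin/sql-client.sh embedded \
  -d conf/my-env.yaml \
  -u "INSERT INTO my_sink SELECT * FROM my_source"
```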
The -d parameter lets us specify an environment file with parameter configurations.
Next, let's look at conf/sql-client-defaults.yaml, which is the default configuration file for this environment.
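For orientation, here is a heavily abridged sketch of what the shipped conf/sql-client-defaults.yaml looks like in Flink 1.12; check your distribution's copy for the full, commented version:

```yaml
# Abridged sketch of conf/sql-client-defaults.yaml (Flink 1.12);
# the shipped file carries many more comments and options.
tables: []     # tables pre-registered in every new session
functions: []  # user-defined functions available in every session
catalogs: []   # catalogs created on session startup
execution:
  planner: blink
  type: streaming
  result-mode: table
  parallelism: 1
deployment:
  response-timeout: 5000
```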
Create a data file for testing:

```bash
mkdir sql_test
vim sql_test/book-store.csv
```

with the following contents:

```
Guns Germs and Steel,18,Sociology
APP UI The way of design,20,Design
Pass economy,22,economics
Real business opportunities of blockchain,21,economics
```
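If you prefer not to open an editor, the same test file can be created non-interactively with a here-document (same contents as above):

```bash
# Create the test CSV without opening an editor.
mkdir -p sql_test
cat > sql_test/book-store.csv <<'EOF'
Guns Germs and Steel,18,Sociology
APP UI The way of design,20,Design
Pass economy,22,economics
Real business opportunities of blockchain,21,economics
EOF
```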
Now let's create our own configuration file, conf/book-store.yaml, which reads the csv file and runs a query over it:
```bash
vim conf/book-store.yaml
```

```yaml
tables:
  - name: BookStore
    type: source-table
    update-mode: append
    connector:
      type: filesystem
      path: "/Users/zhaoqin/temp/202004/26/book-store.csv"
    format:
      type: csv
      fields:
        - name: BookName
          type: VARCHAR
        - name: BookAmount
          type: INT
        - name: BookCatalog
          type: VARCHAR
      line-delimiter: "\n"
      comment-prefix: ","
    schema:
      - name: BookName
        type: VARCHAR
      - name: BookAmount
        type: INT
      - name: BookCatalog
        type: VARCHAR
  - name: MyBookView
    type: view
    query: "SELECT BookCatalog, SUM(BookAmount) AS Amount FROM BookStore GROUP BY BookCatalog"

execution:
  planner: blink                    # optional: either 'blink' (default) or 'old'
  type: streaming                   # required: execution mode either 'batch' or 'streaming'
  result-mode: table                # required: either 'table' or 'changelog'
  max-table-result-rows: 1000000    # optional: maximum number of maintained rows in
                                    #   'table' mode (1000000 by default, smaller 1 means unlimited)
  time-characteristic: event-time   # optional: 'processing-time' or 'event-time' (default)
  parallelism: 1                    # optional: Flink's parallelism (1 by default)
  periodic-watermarks-interval: 200 # optional: interval for periodic watermarks (200 ms by default)
  max-parallelism: 16               # optional: Flink's maximum parallelism (128 by default)
  min-idle-state-retention: 0       # optional: table program's minimum idle state time
  max-idle-state-retention: 0       # optional: table program's maximum idle state time
  restart-strategy:                 # optional: restart strategy
    type: fallback                  #   "fallback" to global restart strategy by default

# Configuration options for adjusting and tuning table programs.
# A full list of options and their default values can be found
# on the dedicated "Configuration" page.
configuration:
  table.optimizer.join-reorder-enabled: true
  table.exec.spill-compression.enabled: true
  table.exec.spill-compression.block-size: 128kb

# Properties that describe the cluster to which table programs are submitted to.
deployment:
  response-timeout: 5000
```
Start a session by specifying this configuration file, and the source table and view it defines are registered automatically.
There are several points to note about the book-store.yaml configuration file:
a. tables.type equals source-table, indicating that this entry is the configuration of a data source;
b. tables.connector describes the data source in detail. path is the full path of the book-store.csv file, and the connector type is filesystem, consistent with the connector option we specify when writing a DDL in SQL (a rough DDL equivalent is sketched after this list);
c. tables.format describes the format of the file contents, which is csv here;
d. tables.schema describes the table structure of the data source table;
e. tables.type of the MyBookView entry is view, which means MyBookView is a view (compare the view concept in databases).
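For comparison, the BookStore entry corresponds roughly to the DDL you would otherwise type in the SQL client. This is a sketch with Flink 1.12 filesystem-connector option names; the legacy yaml environment format and the DDL options are not an exact one-to-one mapping:

```sql
-- Rough DDL equivalent of the BookStore entry in book-store.yaml.
CREATE TABLE BookStore (
  BookName    STRING,
  BookAmount  INT,
  BookCatalog STRING
) WITH (
  'connector' = 'filesystem',
  'path'      = '/Users/zhaoqin/temp/202004/26/book-store.csv',
  'format'    = 'csv'
);

-- And MyBookView as a plain view on top of it.
CREATE VIEW MyBookView AS
SELECT BookCatalog, SUM(BookAmount) AS Amount
FROM BookStore
GROUP BY BookCatalog;
```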
Let's take a look at the test results:
```bash
./bin/start-cluster.sh
./bin/sql-client.sh embedded -d conf/book-store.yaml
```
After entering the SQL Client interactive interface, you can see that the environment has been configured:
```
Flink SQL> show tables;
BookStore
MyBookView

Flink SQL> desc BookStore;
+-------------+--------+------+-----+--------+-----------+
|        name |   type | null | key | extras | watermark |
+-------------+--------+------+-----+--------+-----------+
|    BookName | STRING | true |     |        |           |
|  BookAmount |    INT | true |     |        |           |
| BookCatalog | STRING | true |     |        |           |
+-------------+--------+------+-----+--------+-----------+
3 rows in set

Flink SQL> desc MyBookView;
+-------------+--------+------+-----+--------+-----------+
|        name |   type | null | key | extras | watermark |
+-------------+--------+------+-----+--------+-----------+
| BookCatalog | STRING | true |     |        |           |
|      Amount |    INT | true |     |        |           |
+-------------+--------+------+-----+--------+-----------+
2 rows in set
```
You can see that the two tables have been created. Let's take a look at the data:
```
Flink SQL> select * from MyBookView;

 BookCatalog  Amount
   Sociology      18
      Design      20
   economics      43
```
Nice, right? If you also define a sink table in the yaml file, a Flink job will be submitted directly to the Flink cluster. Doesn't that achieve the effect of submitting a Flink SQL script file?
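As a sketch of what that could look like, here is a hypothetical sink entry appended to the tables section of book-store.yaml. The path /tmp/book-sink.csv and the name BookSink are illustrative; the entry mirrors the source schema so that an append-only insert works in streaming mode:

```yaml
# Hypothetical addition to the tables section of book-store.yaml:
# a csv filesystem sink mirroring the BookStore schema.
- name: BookSink
  type: sink-table
  update-mode: append
  connector:
    type: filesystem
    path: "/tmp/book-sink.csv"
  format:
    type: csv
    fields:
      - name: BookName
        type: VARCHAR
      - name: BookAmount
        type: INT
      - name: BookCatalog
        type: VARCHAR
  schema:
    - name: BookName
      type: VARCHAR
    - name: BookAmount
      type: INT
    - name: BookCatalog
      type: VARCHAR
```

With such a sink registered, typing `insert into BookSink select * from BookStore;` in the client, or passing it via the -u option shown earlier, submits a regular Flink job to the cluster.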
Well, that's it for today. These days I was surprised to find that the streaming data platform developed by the company's data platform group submits jobs to k8s through the SQL Client. I had previously translated an official Flink article on submitting jobs, which mentioned four submission modes (rough launch commands are sketched after the list):
- local cluster
- application mode
- per job mode
- session mode
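For context, these modes map to different launch commands. A rough sketch using the YARN deployment target of Flink 1.12; the example jar path is a placeholder from the distribution's examples directory:

```bash
# local cluster: the standalone cluster used earlier in this article
./bin/start-cluster.sh

# application mode: the jar's main() runs on the cluster side
./bin/flink run-application -t yarn-application ./examples/streaming/WordCount.jar

# per-job mode: a dedicated cluster is spun up for this one job
./bin/flink run -t yarn-per-job ./examples/streaming/WordCount.jar

# session mode: jobs share a long-running session cluster
./bin/yarn-session.sh -d
./bin/flink run ./examples/streaming/WordCount.jar
```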
I used to think those four were the only ways. In fact, if you look carefully, the way the SQL Client submits SQL is similar to session mode: as long as the session is up you can keep executing SQL statements, and if the session is closed, its tasks are closed with it.
ok, see you next time.
flink, yyds.
There is a path up the mountain of books, and diligence is the way; the sea of learning has no end, and hard work is the boat.
Welcome to follow my WeChat official accounts. I love sharing knowledge, and I also love pets, so I run these two official accounts:
Friends who like Pets can pay attention to: [electric Barker Pets]
Study together and make progress together.