Key words: elastic search, crate.io, sql
Elastic Search has been introduced before. It has simple deployment, powerful search aggregation function and is integrated with other big data frameworks. It is inconvenient that queries need to be submitted through JSON as request Body. Request response is also JSON. As I am used to using SQL, I can't wait to try Crate (crate.io), which is in Elastic Search. It encapsulates the SQL interface. Users can complete the search and statistics through the SQL statements. There are many SQL grammars supported, and they want MySQL very much.
This article records Crate's installation configuration (two-node Crate cluster) and its simple use.
Download and install Crate
Can from https://cdn.crate.io/downloads/releases/nightly/ Download the latest version of crate.
After downloading, extract it to the specified directory.
Configure Crate
The configuration of Crate is very similar to that of Elastic Search, taking the Crate cluster of two nodes as an example.
cd $CRATE_HOME/conf
Edit crate.yml and modify the following parameters:
- cluster.name: lxw1234_crate
- node.name: crate_node_17
- index.number_of_replicas: 2
- path.conf: /home/liuxiaowen/crate-0.54.0/config
- path.data: /home/liuxiaowen/crate-0.54.0/data
- path.work: /home/liuxiaowen/crate-0.54.0/tmp
- path.logs: /home/liuxiaowen/crate-0.54.0/logs
- path.plugins: /home/liuxiaowen/crate-0.54.0/plugins
- network.bind_host: 172.16.212.17
- network.publish_host: 172.16.212.17
- network.host: 172.16.212.17
- gateway.recover_after_nodes: 2
- discovery.zen.minimum_master_nodes: 2
- gateway.expected_nodes: 2
- discovery.zen.ping.timeout: 10s
- discovery.zen.fd.ping_interval: 10s
Edit $CRATE_HOME/bin/crate.in.sh to configure the memory used by the node, depending on the memory of the machine itself, and the maximum memory should not exceed 50% of the physical memory.
CRATE_MIN_MEM=8g
CRATE_MAX_MEM=16g
Configuring JAVA_HOME, I use jdk1.8.0_65 here
Start Crate
On two nodes,
cd CRATEHOME/bin executes. / crate d to start Crate in the background, and then you can see the log name D {cluster. name}. log in the configured path.logs directory.
Use the Crate command line
Like other databases, Crate provides a command line for users to execute SQL queries.
cd $CRATE_HOME/bin
Execute. / crash to enter the command line;
Connect to Crate at c 172.16.212.17:4200 on the Crate command line;
Create table
Create tables from the Crate command line using the following SQL statements:
- CREATE TABLE sitelog (
- cookieid STRING,
- siteid STRING,
- visit_id STRING,
- pv LONG,
- is_return_cookie INTEGER,
- is_bounce_visit INTEGER,
- visit_stay_times INTEGER,
- visit_view_page_cnt INTEGER,
- region STRING,
- city STRING
- );
- cr> show tables;
- +------------+
- | table_name |
- +------------+
- | sitelog |
- +------------+
- SHOW 1 row in set (0.019 sec)
- cr>
Loading data in batches from outside
crate provides a COPY command to load data from external text files into tables, but only supports text in JSON format, such as:
- [liuxiaowen@dev sitelog]$ head sitelog_000005_0_9.json
- {"cookieid" : "DE9C68B401DBE5566A9676","siteid" : "633","visit_id" : "805cdab5-8361-4134-9bbe-7c54771d4dc8","pv" : 1,
- "is_return_cookie" : 0,"is_bounce_visit" : 1,"visit_stay_times" : 0,"visit_view_page_cnt" : 1,"region" : "Jiangsu","city" : "Xuzhou"}
- {"cookieid" : "DE9C68B40422A9566A68F2","siteid" : "633","visit_id" : "7f844323-e0c0-48b4-bc1b-69055ac3c308","pv" : 1,
- "is_return_cookie" : 0,"is_bounce_visit" : 1,"visit_stay_times" : 0,"visit_view_page_cnt" : 1,"region" : "Jiangsu","city" : "Xuzhou"}
- {"cookieid" : "DE9C68B4066B7F566A6F36","siteid" : "633","visit_id" : "045c3a13-41bf-45c4-93ce-7725a00ada5f","pv" : 1,
- "is_return_cookie" : 0,"is_bounce_visit" : 1,"visit_stay_times" : 0,"visit_view_page_cnt" : 1,"region" : "Jiangsu","city" : "Xuzhou"}
The k in the JSON object needs the same field name as the table.
On the Crate command line, load data using the COPY command:
Loading speed is still very fast.
SQL query
The supported SQL syntax can be viewed from Crate's official website: https://crate.io/docs/reference/sql/dql.html
It is worth noting that when Crate does COUNT DISTINCT query, it finds out the true number of the duplicated data without error, but the response time of the query is slower, which needs to be studied.
Crate's monitoring interface
Crate provides a more dazzling monitoring interface, which is very useful. After the Crate cluster starts, enter http://172.16.212.102:4200/admin/into the monitoring interface in the browser:
OverView page: Cluster overall health and load status.
Tables page: All Tables and Schema s in Crate.
Cluster page: The list of nodes in the Crate cluster and the health status of each node.
Crate's shortcomings
At present, it is only a simple installation and trial, and several shortcomings have been found:
- Subqueries are not supported;
- Logical judgment grammars such as CASE WHEN and IF ELSE are not supported, especially in aggregation functions.
- The built-in Elastic Search version is too low;
- There is no integration with other big data components.
But its query performance is still very good, the key is the convenience of SQL.
You can pay attention to it. lxw's Big Data Field Or Join the mailing list Receive notification emails of blog updates at any time.
If you think this blog is helpful to you, please Sponsor author .
Reprinted please indicate: lxw's Big Data Field » SQL for Elastic Search-Crate.io Installation Configuration and Experience
</article>