[software engineering practice] Hive research - Blog9

2021SC@SDUSC Research content introduction: I am responsible for the conversion of the query block (QB) into a logical query plan (OP Tree). The following code comes from apache-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan, which is the code I am analyzing. In the previous Hive research ...

Posted by lilRachie on Sun, 28 Nov 2021 04:10:18 -0800

Spark SQL key concepts and practice

This article first appeared on my personal blog: Spark SQL key concepts and practice. Spark SQL overview 1. What is Spark SQL? Spark SQL is the Spark module for structured data processing. Unlike the basi ...

Posted by lovasco on Fri, 26 Nov 2021 16:41:48 -0800

Experiment 6: Getting familiar with Hive's basic operations

1. Experimental Purpose (1) Understand Hive's role as a data warehouse in the Hadoop architecture. (2) Become proficient with commonly used HiveQL. 2. Experimental Platform Operating system: Ubuntu 18.04 (or Ubuntu 16.04); Hadoop version: 3.1.3; Hive version: 3.1.2; JDK version: 1.8. 3. Data Sets Preparation: Provided by Hive Programming Guide ...

Posted by abcd1234 on Thu, 25 Nov 2021 09:12:34 -0800

E-commerce offline data warehouse project in practice

E-commerce offline data warehouse project in practice (Part 2) E-commerce analysis - core transactions 1, Business requirements Selected indicators: order quantity, commodity quantity and payment amount, analyzed by sales region and commodity type. 2, Business database table structure 1. Relationship between databas ...

Posted by pengu on Sat, 20 Nov 2021 03:23:31 -0800

[Hive] Chapter 5 DML data operation

Data import 1. Load data into a table (load) 1) Syntax hive> load data [local] inpath 'data path' [overwrite] into table student [partition (partcol1=val1,...)]; (1) load data: indicates loading data (2) local: indicates loading data from the local file system into the Hive table; otherwise, data is loaded from HDFS into the Hive table (3) inpath: indicates the pa ...

Posted by PseudoEvolution on Sun, 31 Oct 2021 14:38:44 -0700

Setting up the data warehouse environment

Hive environment setup. Introduction to Hive engines: Hive's execution engines include the default MR, Tez, and Spark. Hive on Spark: Hive not only stores the metadata but is also responsible for SQL parsing and optimization. The syntax is HQL, the execution engine is Spark, and Spark is responsible for RDD execution. Spark on Hive: Hive is only ...

Posted by wha??? on Thu, 21 Oct 2021 19:46:17 -0700

Using Sqoop+Jenkins to transfer data between MySQL and Hive

1, Foreword Recently, we have been using Sqoop+Jenkins to transfer data between MySQL and Hive. It mainly uses Sqoop's import command to import MySQL data into Hive, and the export command to export Hive data to MySQL. Jenkins acts as the scheduler, running the sh scripts on a schedule and synchronizing once a day. Relevant ...

Posted by reethu on Mon, 18 Oct 2021 15:17:56 -0700

Solutions to Spark data skew

Data skew caused by Shuffle. When data skew occurs during a shuffle, we generally follow these troubleshooting steps: ① Check the web UI to see how the tasks in each Stage of each Job are executing, and whether any task has an obviously excessive execution time ② If a task reports an error, check the corresponding log excepti ...

Posted by dstantdog3 on Sat, 16 Oct 2021 10:15:13 -0700

Apache Hive installation and deployment

Hive installation and deployment 1, Single-node installation of Apache Hive 1. Environment dependencies: CentOS7, JDK8, a running Apache Hadoop, and a running MySQL 5.7 (with remote connection permissions configured) 2. Software download: Download Apache Hive version 2.3.6. Download addresses of various versions of Apache Hive http://archive.apache.org/dist ...

Posted by volleytotal.ch on Fri, 15 Oct 2021 19:30:14 -0700

Hive's architecture design and SQL statement review summary

1, Hive's architecture design and SQL statement review summary 1.1. Data warehouse The definition put forward by Bill Inmon, the father of data warehousing, in his 1991 book "Building the Data Warehouse" is widely accepted. A Data Warehouse (DW or DM for short) is a Subject Oriented, Integrated, Non-Volatile and Time V ...

Posted by Bastern on Sun, 10 Oct 2021 19:39:12 -0700