[software engineering practice] Hive research - Blog9
Research content introduction
I am responsible for converting the query block (QB) into a logical query plan (OP Tree). The following code comes from apaceh-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan, which is the code I analyze. In the previous Hive research ...
Posted by lilRachie on Sun, 28 Nov 2021 04:10:18 -0800
This article first appeared on my personal blog: Spark SQL knowledge points and practice. Spark SQL overview
1. What is Spark SQL
Spark SQL is a Spark module for structured data processing. Unlike the basi ...
Posted by lovasco on Fri, 26 Nov 2021 16:41:48 -0800
1. Experimental Purpose
(1) Understand Hive's role as a data warehouse in the Hadoop architecture. (2) Become proficient with commonly used HiveQL statements.
2. Experimental Platform
Operating system: Ubuntu 18.04 (or Ubuntu 16.04); Hadoop version: 3.1.3; Hive version: 3.1.2; JDK version: 1.8.
3. Data Sets
Provided by Hive Programming Guide ...
Posted by abcd1234 on Thu, 25 Nov 2021 09:12:34 -0800
E-commerce offline data warehouse project in practice (Part 2)
E-commerce analysis - core transactions
1, Business requirements
Select indicators: order quantity, commodity quantity and payment amount, and analyze these indicators by sales region and commodity type.
2, Business database table structure
1. Relationship between databas ...
Posted by pengu on Sat, 20 Nov 2021 03:23:31 -0800
1. Load data into the table (load)
hive> load data [local] inpath 'Data path' [overwrite] into table \
student [partition (partcol1=val1,...)];
(1) load data: loads data into a table. (2) local: loads the data from the local file system into the Hive table; otherwise the data is loaded from HDFS. (3) inpath: indicates the pa ...
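The optional clauses in the load syntax above can be made concrete with a small Python sketch that assembles the statement as a string. The helper name build_load_statement and the sample paths are hypothetical and not part of Hive. The sketch does no quoting or escaping, so it is only an illustration of how the clauses combine.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a HiveQL LOAD DATA statement (hypothetical helper, illustration only)."""
    parts = ["LOAD DATA"]
    if local:
        # LOCAL: the path refers to the local file system; without it, the path is on HDFS
        parts.append("LOCAL")
    parts.append(f"INPATH '{path}'")
    if overwrite:
        # OVERWRITE: replace the data already in the table instead of appending to it
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    if partition:
        # PARTITION (col='val', ...): load into one specific partition
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(build_load_statement("/tmp/student.txt", "student", local=True, overwrite=True))
    print(build_load_statement("/data/student.txt", "student", partition={"day": "2021-10-31"}))
```

The first call yields a local, overwriting load; the second yields a load from HDFS into a single partition.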
Posted by PseudoEvolution on Sun, 31 Oct 2021 14:38:44 -0700
Hive environment construction
Hive engine introduction
Hive supports several execution engines: MR (the default), Tez and Spark. Hive on Spark: Hive stores the metadata and is also responsible for SQL parsing and optimization, and the syntax is HQL. The execution engine is Spark, which is responsible for RDD execution. Spark on Hive: Hive is only ...
Posted by wha??? on Thu, 21 Oct 2021 19:46:17 -0700
Recently we have been using Sqoop and Jenkins to transfer data between MySQL and Hive.
Sqoop's import command brings MySQL data into Hive, and its export command writes Hive data back to MySQL.
Jenkins acts as the scheduler: it runs the sh scripts on a timer and synchronizes the data once a day.
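As a rough sketch of the two Sqoop calls such a script would contain, the Python below builds the argument lists for an import into Hive and an export back to MySQL. The function names and all sample values (JDBC URL, user, table names, paths) are assumptions made for the example. The flags themselves (--connect, --username, --password-file, --table, --hive-import, --hive-table, --export-dir) are standard Sqoop 1 options.

```python
def sqoop_import_args(jdbc_url, username, password_file, mysql_table, hive_table):
    """Arguments for importing one MySQL table into a Hive table (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", mysql_table,
        "--hive-import",                   # load the imported data into Hive
        "--hive-table", hive_table,
    ]


def sqoop_export_args(jdbc_url, username, password_file, mysql_table, export_dir):
    """Arguments for exporting the files of a Hive table back to MySQL (hypothetical helper)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", mysql_table,            # target table in MySQL
        "--export-dir", export_dir,        # HDFS directory holding the Hive table's files
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from the original post.
    args = sqoop_import_args(
        "jdbc:mysql://db-host:3306/shop", "etl", "/user/etl/.pw", "orders", "ods.orders"
    )
    print(" ".join(args))
```

A daily Jenkins job could run a shell script containing the printed command. Depending on how the Hive table is stored, the export may also need a field-delimiter option such as --input-fields-terminated-by.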
Posted by reethu on Mon, 18 Oct 2021 15:17:56 -0700
Data skew caused by Shuffle
When data skew occurs during Shuffle, we generally follow these troubleshooting steps:
① Check the web UI to see how the tasks in each Stage of each Job are executing, and whether any task obviously takes much longer than the others
② If the task reports an error, check the corresponding log excepti ...
Posted by dstantdog3 on Sat, 16 Oct 2021 10:15:13 -0700
1, Hive's architecture design and SQL statement review summary
1.1. Data warehouse
The definition of a data warehouse (DW for short) put forward by Bill Inmon, the father of data warehousing, in his 1991 book "Building the Data Warehouse" is widely accepted.
A data warehouse is a subject-oriented, integrated, non-volatile and Time V ...
Posted by Bastern on Sun, 10 Oct 2021 19:39:12 -0700