Hive finally gets Flink
When did Apache Spark start supporting Hive integration? I believe anyone who has used Spark will say it was a long time ago.
When did Apache Flink start supporting Hive integration? Readers may be puzzled: hasn't it been supported all along, and haven't they been using it already? Or is it only supported in the latest version, and the fu ...
Posted by elfeste on Fri, 27 Mar 2020 03:09:45 -0700
hive-1.2.1 Installation and Simple Use
Hive only needs to be installed on one node
1. Upload tar package
2. Decompression
tar -zxvf hive-1.2.1.tar.gz -C /apps/
3. Install the MySQL database (switch to the root user). It can be installed on any node, as long as that node can connect to the Hadoop cluster.
4. Configure hive
(a) Configure the HIVE_HOME environment variable vi conf/hive-e ...
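The excerpt cuts off here; as a rough sketch of where this setup is heading, a hive-site.xml that points the metastore at the MySQL instance from step 3 usually contains the following properties (host, database name, user, and password below are placeholders, not values from the post):
<configuration>
    <!-- placeholder values; adjust host, database, user, and password for your cluster -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://mysql-host:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive-password</value>
    </property>
</configuration>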
Posted by gyash on Thu, 26 Mar 2020 21:12:15 -0700
Flume installation, deployment, and examples
1. Installation address
1) Flume official website address
http://flume.apache.org/
2) Document view address
http://flume.apache.org/FlumeUserGuide.html
3) Download address
http://archive.apache.org/dist/flume/
2. Installation and deployment
1) Upload apache-flume-1.7.0-bin.tar.gz to the /opt/softwa ...
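Since the excerpt is truncated, here is a minimal single-agent configuration sketch in the style of the official user guide linked above (the agent name a1 and the netcat-to-logger wiring are illustrative assumptions, not taken from this post):
# a1: one netcat source, one memory channel, one logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source: listen for plain text on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# sink: write events to the log
a1.sinks.k1.type = logger

# wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Such an agent is typically started with something like: bin/flume-ng agent --conf conf --conf-file netcat-logger.conf --name a1 -Dflume.root.logger=INFO,console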
Posted by The14thGOD on Fri, 13 Mar 2020 19:40:57 -0700
hive installation (incomplete)
1. The three installation methods of Hive
The Hive official website introduces three installation methods, corresponding to different application scenarios. The essential difference between them is where the metadata is stored.
Embedded mode (metadata is saved in the embedded Derby database, al ...
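The excerpt breaks off here; to make the metadata-location difference concrete, the relevant hive-site.xml settings typically look like this (the host and port are placeholders):
<!-- Embedded mode: metadata lives in a local Derby database created next to the client -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>

<!-- Remote mode: clients talk to a standalone metastore service -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
</property>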
Posted by akreation on Thu, 27 Feb 2020 02:23:22 -0800
RDD common operations in pyspark
preparation:
import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] means run locally with 4 cores
sc = SparkContext.getOrCreate(conf)
1. parallelize and collect
The parallelize function converts the list obj ...
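As a minimal sketch of the two operations (using the sc created in the preparation step above; the sample list is illustrative):
data = [1, 2, 3, 4, 5]
rdd = sc.parallelize(data, 2)        # distribute the list across 2 partitions
squares = rdd.map(lambda x: x * x)   # transformations are lazy
print(squares.collect())             # collect() returns the results to the driver: [1, 4, 9, 16, 25]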
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
centos7 hadoop+hive installation
Prepare four virtual machines
Virtual Machine Installation
1.Create a new virtual machine
2.Click on Typical Installation (Recommended)
3.Select Chinese and choose to partition the disk yourself
# Partition Configuration (JD Usage)
/boot 200M
swap 512M # used as swap when physical memory is not enough
/ # root directory
4.Configu ...
Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800
The trap of Broadcast Join in Spark SQL 2.x (the hint does not work)
Problem description
A hint is used to specify the table to broadcast, but the specified broadcast join is not performed.
Preparation
hive> select * from test.tmp_demo_small;
OK
tmp_demo_small.pas_phone tmp_demo_small.age
156 20
157 22
158 15
hive> analyze table test.tmp_demo_small compute statis ...
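For comparison, the PySpark DataFrame API can force the broadcast explicitly instead of relying on a SQL hint; a rough sketch (the large table test.tmp_demo_big is hypothetical, only test.tmp_demo_small and its pas_phone column come from the excerpt):
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
small = spark.table("test.tmp_demo_small")
big = spark.table("test.tmp_demo_big")              # hypothetical large table
joined = big.join(broadcast(small), "pas_phone")    # mark the small side for broadcast
joined.explain()                                    # check for BroadcastHashJoin in the physical plan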
Posted by cbullock on Fri, 17 Jan 2020 06:02:22 -0800
Dynamic partition and static partition of hive
1.1 Static partition
If the partition value is known in advance, it is called a static partition.
The partition name is specified explicitly when adding a partition or loading partition data (see the sketch after the DDL below).
create table if not exists day_part1(
    uid int,
    uname string
)
pa ...
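A short sketch of both partition styles, assuming the truncated DDL above is partitioned by a dt column and that a staging table day_src exists (the partition column, staging table, and file path are illustrative assumptions, not from the excerpt):
-- static partition: the partition value is written out explicitly
alter table day_part1 add partition (dt='2020-01-14');
load data local inpath '/tmp/day1.txt' into table day_part1 partition (dt='2020-01-14');

-- dynamic partition: the partition value comes from the query result
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table day_part1 partition (dt)
select uid, uname, dt from day_src;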
Posted by noisyscanner on Tue, 14 Jan 2020 21:53:44 -0800
Hadoop Part 2: MapReduce
MapReduce (3)
Project address: https://github.com/KingBobTitan/hadoop.git
MR's Shuffle explanation and Join implementation
First, a review
1. MapReduce's history monitoring service: JobHistoryServer
Function: used to monitor the information of all MapReduce programs running on YARN
Configure log ...
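The configuration section is cut off; for reference, a typical mapred-site.xml sketch for the history server looks like this (the host name is a placeholder), after which the daemon is started with sbin/mr-jobhistory-daemon.sh start historyserver:
<!-- placeholder host; ports 10020 and 19888 are the defaults -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>history-host:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>history-host:19888</value>
</property>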
Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800
UDF Writing for Struct Complex Data Type and GenericUDF Writing
1. Background introduction: With the upgrade to MaxCompute version 2.0, the data types supported by Java UDFs have expanded from BIGINT, STRING, DOUBLE, and BOOLEAN to more basic data types, as well as complex types such as ARRAY, MAP, and STRUCT, and Writable parameters. For Java UDFs that use complex data types, STRUCT corresponds to com.aliyun.odps ...
Posted by mnewbegin on Mon, 23 Dec 2019 19:19:12 -0800