Hive finally gets Flink

When did Apache Spark start to support integration with Hive? I believe anyone who has used Spark will tell you it was a long time ago. When did Apache Flink start to support integration with Hive? Readers may be confused: hasn't it been supported all along? Haven't people been using it? Or is it only supported in the latest version, but the fu ...
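
A minimal PyFlink sketch of the integration the article is about, assuming Flink's Hive connector jars are on the classpath; the catalog name, the hive-site.xml directory, and the table name are all assumptions, not the article's:

    from pyflink.table import EnvironmentSettings, TableEnvironment
    from pyflink.table.catalog import HiveCatalog

    # Batch table environment
    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

    # Register an existing Hive metastore as a Flink catalog;
    # hive_conf_dir must contain hive-site.xml (path is an assumption)
    t_env.register_catalog("myhive", HiveCatalog("myhive", "default", "/opt/hive/conf"))
    t_env.use_catalog("myhive")

    # Hive tables are now visible to Flink SQL (table name is hypothetical)
    t_env.execute_sql("SELECT * FROM some_hive_table LIMIT 10").print()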

Posted by elfeste on Fri, 27 Mar 2020 03:09:45 -0700

hive-1.2.1 Installation and Simple Use

Hive only needs to be installed on one node. 1. Upload the tar package. 2. Decompress it: tar -zxvf hive-1.2.1.tar.gz -C /apps/ 3. Install the MySQL database (switch to the root user; it can be installed anywhere, as long as the node can connect to the Hadoop cluster). 4. Configure Hive: (a) configure the HIVE_HOME environment variable: vi conf/hive-e ...
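
With the steps above done, a quick smoke test from Python is one way to verify the simple-use part; this sketch assumes the third-party PyHive package (pip install pyhive) and a HiveServer2 running on its default port 10000, and the host name is made up:

    from pyhive import hive  # third-party client library, not part of Hive itself

    # Connect to HiveServer2 on the node where Hive was installed (host is an assumption)
    conn = hive.connect(host="hadoop-node1", port=10000, username="hive")
    cursor = conn.cursor()
    cursor.execute("SHOW DATABASES")
    print(cursor.fetchall())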

Posted by gyash on Thu, 26 Mar 2020 21:12:15 -0700

Flume installation, deployment, and cases

1. Installation addresses: 1) Flume official website: http://flume.apache.org/ 2) Documentation: http://flume.apache.org/FlumeUserGuide.html 3) Download address: http://archive.apache.org/dist/flume/ 2. Installation and deployment: 1) Upload apache-flume-1.7.0-bin.tar.gz to the /opt/softwa ...

Posted by The14thGOD on Fri, 13 Mar 2020 19:40:57 -0700

hive installation (incomplete)

1. Three ways to install Hive. The Hive official website introduces three installation methods, corresponding to different application scenarios; in the final analysis, they differ in where the metadata is stored. Embedded mode (metadata is saved in the embedded Derby database; al ...
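
Where the metadata lives is the visible difference between the modes; as one illustration, a PySpark client pointed at a remote metastore service might look like the sketch below, where the thrift URI is an assumption:

    from pyspark.sql import SparkSession

    # Remote-metastore mode: metadata sits behind a standalone metastore service,
    # so the client only needs its thrift address (URI below is hypothetical)
    spark = (SparkSession.builder
             .appName("remote-metastore-demo")
             .config("hive.metastore.uris", "thrift://metastore-host:9083")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("SHOW DATABASES").show()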

Posted by akreation on Thu, 27 Feb 2020 02:23:22 -0800

Common RDD operations in pyspark

Preparation: import pyspark from pyspark import SparkContext from pyspark import SparkConf conf=SparkConf().setAppName("lg").setMaster('local[4]') # local[4] means run on 4 cores locally sc=SparkContext.getOrCreate(conf) 1. parallelize and collect The parallelize function converts the list obj ...
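
Filling out the excerpt's setup into a runnable sketch of parallelize and collect (the app name and master come from the excerpt; the sample data is ours):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("lg").setMaster("local[4]")  # local[4]: 4 local cores
    sc = SparkContext.getOrCreate(conf)

    # parallelize distributes a local Python list into an RDD;
    # collect gathers the RDD's elements back to the driver as a list
    rdd = sc.parallelize([1, 2, 3, 4, 5])
    print(rdd.map(lambda x: x * 2).collect())  # [2, 4, 6, 8, 10]
    sc.stop()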

Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800

CentOS 7 Hadoop + Hive installation

Prepare four virtual machines. Virtual machine installation: 1. Create a new virtual machine. 2. Click Typical Installation (recommended). 3. Select Chinese and partition the disk yourself # partition configuration (JD usage) /boot 200M swap 512M # swap, since physical memory is not enough / # root directory 4. Configu ...

Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800

A Broadcast Join trap in Spark SQL 2.x (the hint does not work)

Problem description: a hint is used to specify the table to broadcast, but the specified broadcast is not performed. Preparation: hive> select * from test.tmp_demo_small; OK tmp_demo_small.pas_phone tmp_demo_small.age 156 20 157 22 158 15 hive> analyze table test.tmp_demo_small compute statis ...
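
For contrast with the failing SQL hint, the DataFrame API can mark the broadcast side explicitly; a sketch using the article's small table, where the large table name is hypothetical and the plan should be checked with explain():

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    small = spark.table("test.tmp_demo_small")
    big = spark.table("test.tmp_demo_big")  # hypothetical large table

    # broadcast() flags the small side independently of any SQL hint;
    # verify that BroadcastHashJoin actually shows up in the physical plan
    big.join(broadcast(small), "pas_phone").explain()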

Posted by cbullock on Fri, 17 Jan 2020 06:02:22 -0800

Dynamic and static partitions in Hive

Dynamic and static partitions in Hive. 1.1 Static partitions. If the partition value is known in advance, the partition is called a static partition: the partition name is specified explicitly when adding a partition or loading partition data. create table if not exists day_part1( uid int, uname string ) pa ...
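
A sketch of both flavors driven from PySpark's spark.sql, reusing the excerpt's day_part1 table; the partition column name dt and the sample values are assumptions, since the excerpt truncates before the partition clause:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("create table if not exists day_part1 (uid int, uname string) "
              "partitioned by (dt string)")  # partition column is an assumption

    # Static partition: the partition value is spelled out in the statement
    spark.sql("insert into day_part1 partition (dt='2020-01-14') values (1, 'a')")

    # Dynamic partition: dt is taken from the query result; nonstrict mode
    # allows every partition column to be dynamic
    spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("insert into day_part1 partition (dt) select uid, uname, dt from day_part1")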

Posted by noisyscanner on Tue, 14 Jan 2020 21:53:44 -0800

Hadoop Part 2: MapReduce

MapReduce (3) Project address: https://github.com/KingBobTitan/hadoop.git An explanation of MR's shuffle and a join implementation. First, a review: 1. MapReduce's job history service: JobHistoryServer. Function: used to monitor information about all MapReduce programs run on YARN. Configure log ...
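
To make the shuffle-then-join idea concrete without a cluster, here is a plain-Python sketch of a reduce-side join: the map phase tags each record with its source, the shuffle groups records by key, and the reduce phase pairs the two sides (the data and table roles are made up):

    from collections import defaultdict
    from itertools import product

    orders = [("u1", "order-1"), ("u2", "order-2"), ("u1", "order-3")]
    users = [("u1", "Alice"), ("u2", "Bob")]

    # Map: tag every record with the table it came from
    mapped = [(k, ("O", v)) for k, v in orders] + [(k, ("U", v)) for k, v in users]

    # Shuffle: group all tagged records by join key
    groups = defaultdict(list)
    for k, tagged in mapped:
        groups[k].append(tagged)

    # Reduce: inside each key, pair every order with every user record
    for k, records in groups.items():
        left = [v for tag, v in records if tag == "O"]
        right = [v for tag, v in records if tag == "U"]
        for o, u in product(left, right):
            print(k, o, u)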

Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800

Writing a UDF for the Struct complex data type, and writing a GenericUDF

1. Background: with the upgrade to MaxCompute version 2.0, the data types supported by Java UDFs have expanded from BIGINT, STRING, DOUBLE, and BOOLEAN to more basic data types, as well as complex types such as ARRAY, MAP, and STRUCT, and Writable parameters. When a Java UDF uses complex data types, STRUCT corresponds to com.aliyun.odps ...
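
The article's UDFs are Java and target MaxCompute; as a loose PySpark analogue of a UDF that returns a STRUCT value, one might write the following sketch (every name in it is ours, not the article's):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("156", 20), ("157", 22)], ["phone", "age"])

    # Declare the UDF's return type as STRUCT<phone: string, age_next_year: int>
    person_type = StructType([
        StructField("phone", StringType()),
        StructField("age_next_year", IntegerType()),
    ])

    @udf(returnType=person_type)
    def pack(phone, age):
        return (phone, age + 1)  # a tuple maps onto struct fields by position

    df.select(pack("phone", "age").alias("person")).show(truncate=False)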

Posted by mnewbegin on Mon, 23 Dec 2019 19:19:12 -0800