Hive installation (incomplete)

1. Three installation methods of Hive
The Hive official website introduces three installation methods, corresponding to different application scenarios. The essential difference between them is where the metadata is stored. Embedded mode (metadata is saved in the embedded Derby database, al ...
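The three modes differ mainly in which metastore the client talks to. As a hedged illustration (not from the article), a PySpark session can be pointed at a remote metastore; the thrift URI below is a placeholder:

    # Minimal sketch: connect Spark to a remote Hive metastore.
    # The metastore host and port are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("metastore-demo")
             .config("hive.metastore.uris", "thrift://metastore-host:9083")
             .enableHiveSupport()
             .getOrCreate())
    spark.sql("show databases").show()  # metadata comes from the remote metastore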

Posted by akreation on Thu, 27 Feb 2020 02:23:22 -0800

RDD common operations in PySpark

preparation:

    import pyspark
    from pyspark import SparkContext
    from pyspark import SparkConf

    conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] means run on 4 local cores
    sc = SparkContext.getOrCreate(conf)

1. parallelize and collect
The parallelize function converts the list obj ...
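To round out the truncated example, a minimal sketch of parallelize and collect, continuing from the sc created above (the sample list is my own):

    # Distribute a local list across 4 partitions, then gather it back.
    rdd = sc.parallelize([1, 2, 3, 4, 5], numSlices=4)
    print(rdd.collect())         # [1, 2, 3, 4, 5]
    print(rdd.glom().collect())  # shows how elements landed in each partition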

Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800

CentOS 7 Hadoop + Hive installation

Prepare four virtual machines.
Virtual Machine Installation
1. Create a new virtual machine
2. Click on Typical Installation (Recommended)
3. Select Chinese as the language and choose your own partitioning

    # Partition configuration (JD usage)
    /boot   200M
    swap    512M    # used when physical memory is not enough
    /               # root directory

4. Configu ...

Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800

The trap of Broadcast Join in Spark SQL 2.x (hint does not work)

Problem description: a hint is used to specify the broadcast table, but the specified broadcast is not performed.
Preparation in advance:

    hive> select * from test.tmp_demo_small;
    OK
    tmp_demo_small.pas_phone    tmp_demo_small.age
    156                         20
    157                         22
    158                         15
    hive> analyze table test.tmp_demo_small compute statis ...
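For comparison, a hedged sketch of forcing a broadcast join from the DataFrame API; the small table name is reused from the excerpt, while the large table is hypothetical:

    # Explicitly mark the small table for broadcast; if the plan still shows a
    # sort-merge join, check spark.sql.autoBroadcastJoinThreshold and table stats.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    small = spark.table("test.tmp_demo_small")
    large = spark.table("test.tmp_demo_large")   # hypothetical large table
    joined = large.join(broadcast(small), "pas_phone")
    joined.explain()  # look for BroadcastHashJoin in the physical plan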

Posted by cbullock on Fri, 17 Jan 2020 06:02:22 -0800

Dynamic partition and static partition of Hive

Dynamic partition and static partition of Hive
1.1 Static partition
If the value of the partition is known in advance, it is called a static partition: the partition name is specified explicitly when adding a partition or loading partition data.

    create table if not exists day_part1(
        uid int,
        uname string
    ) pa ...
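By contrast, a dynamic partition lets Hive derive the partition value from the data itself. A hedged sketch via Spark SQL, assuming day_part1 is partitioned by a dt string column and that a staging_users source table exists (both assumptions are mine, since the DDL is cut off):

    # Non-strict mode allows every partition value to be resolved dynamically.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("set hive.exec.dynamic.partition=true")
    spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
    # The dt partition value comes from the last SELECT column, not from the DDL.
    spark.sql("""
        insert into table day_part1 partition (dt)
        select uid, uname, dt from staging_users
    """)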

Posted by noisyscanner on Tue, 14 Jan 2020 21:53:44 -0800

Hadoop Part 2: MapReduce

MapReduce (3)
Project address: https://github.com/KingBobTitan/hadoop.git
MR's Shuffle explained and a Join implementation
First, a review:
1. MapReduce's history monitoring service: JobHistoryServer
Function: used to monitor the information of all MapReduce programs run on YARN
Configure log ...
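Since the excerpt cuts off before the Join implementation, here is a hedged, single-process toy sketch of the reduce-side join idea (tag records by source in the map phase, group by key in the shuffle, combine in the reduce phase); all data is made up:

    from collections import defaultdict

    orders = [("u1", "order#1"), ("u2", "order#2")]  # (key, value) from one input
    users = [("u1", "Alice"), ("u2", "Bob")]         # (key, value) from the other

    # Map: tag each record with its source table.
    mapped = [(k, ("O", v)) for k, v in orders] + [(k, ("U", v)) for k, v in users]

    # Shuffle: group values by key (what the framework does between phases).
    groups = defaultdict(list)
    for k, tagged in mapped:
        groups[k].append(tagged)

    # Reduce: pair up records from the two sources under each key.
    for k, tagged in groups.items():
        names = [v for tag, v in tagged if tag == "U"]
        order_vals = [v for tag, v in tagged if tag == "O"]
        for n in names:
            for o in order_vals:
                print(k, n, o)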

Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800

UDF Writing for Struct Complex Data Type and GenericUDF Writing

1. Background introduction: With the upgrade of MaxCompute to version 2.0, the data types supported by Java UDFs have expanded from BIGINT, STRING, DOUBLE, and BOOLEAN to more basic data types, as well as complex types such as ARRAY, MAP, and STRUCT, plus Writable parameters. When a Java UDF uses complex data types, STRUCT corresponds to com.aliyun.odps ...
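The article's examples are Java against the MaxCompute API. As a loose analogue only (this is PySpark, not the MaxCompute API), a UDF that consumes a struct column can be sketched like this; the field names are hypothetical:

    # Hedged illustration: a PySpark UDF receiving a struct column.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, struct, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("struct-udf-demo").getOrCreate()

    @udf(returnType=StringType())
    def describe(person):
        # A struct arrives in the UDF as a Row; fields are accessed by name.
        return f"{person.name} is {person.age}"

    df = spark.createDataFrame([("Ann", 30)], ["name", "age"])
    df.select(describe(struct(col("name"), col("age")))).show()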

Posted by mnewbegin on Mon, 23 Dec 2019 19:19:12 -0800

Spark reads Hive data (Java)

Requirement: read the data out of Hive and write it into ES.
Environment: Spark 2.0.2
1. enableHiveSupport() is set on the SparkSession

    SparkConf conf = new SparkConf().setAppName("appName").setMaster("local[*]");
    SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark SQL basic exam ...
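The article itself is in Java; as a compact sketch of the same Hive-to-ES flow in PySpark (using the standard elasticsearch-hadoop connector options; the table name, ES host, and index are placeholders):

    # Read a Hive table and write it to Elasticsearch via elasticsearch-hadoop.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-to-es")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.sql("select * from some_db.some_table")  # hypothetical table
    (df.write
       .format("org.elasticsearch.spark.sql")
       .option("es.nodes", "es-host:9200")      # placeholder ES node
       .option("es.resource", "my_index/_doc")  # placeholder index/type
       .mode("append")
       .save())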

Posted by NSW42 on Tue, 10 Dec 2019 14:30:32 -0800

AWS Athena analysis log

In AWS, Athena can be used to analyze logs saved in S3. It maps the logs onto database tables so that they can be queried with SQL statements. This function is similar to using LogParser to analyze Exchange or IIS logs on a Windows server. Let's do a demonstration: record the management log through CloudTrail, and ...
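Queries can also be driven programmatically. A hedged boto3 sketch (the database, query, table, and output bucket are placeholders of mine):

    # Submit an Athena query; results land as CSV in the given S3 location.
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")
    resp = athena.start_query_execution(
        QueryString="select eventname, count(*) from cloudtrail_logs group by eventname",
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(resp["QueryExecutionId"])  # poll get_query_execution() for status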

Posted by stubarny on Wed, 04 Dec 2019 09:10:56 -0800

Hive Installation, Configuration, and Use

Overview of Hive
Hive is a Hadoop-based data warehouse tool that maps structured data files to tables and provides SQL-like query capabilities. Hive essentially converts HQL into MapReduce programs. Data processed by Hive is stored in HDFS, and the underlying engine for analyzing the data can be MapReduce, Tez, or Spark, with its executo ...
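As a quick taste of the SQL-like interface, a hedged sketch using the PyHive client against HiveServer2 (host, port, and table are placeholders):

    # Connect to HiveServer2 and run an HQL query; Hive compiles it into
    # MapReduce/Tez/Spark jobs behind the scenes.
    from pyhive import hive

    conn = hive.Connection(host="hive-host", port=10000, database="default")
    cur = conn.cursor()
    cur.execute("select count(*) from some_table")  # hypothetical table
    print(cur.fetchall())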

Posted by Wayniac on Tue, 03 Dec 2019 17:47:05 -0800