1. Three installation methods of Hive
The official Hive website describes three installation methods, each suited to a different application scenario. Ultimately, they differ in where the metadata is stored.
Embedded mode (metadata is saved in the embedded Derby database, al ...
Posted by akreation on Thu, 27 Feb 2020 02:23:22 -0800
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local')  # 'local' runs Spark locally with a single worker thread; 'local[4]' would use four
1. parallelize and collect
The parallelize function converts the list obj ...
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
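A minimal sketch of the parallelize/collect pattern named in the excerpt above, assuming PySpark is installed locally. The helper names `square` and `collect_squares`, the app name, and the sample data are hypothetical and chosen only for illustration; they are not from the original post.

```python
def square(x):
    """Element-wise function applied to every item of the RDD."""
    return x * x


def collect_squares(values):
    """Distribute a Python list as an RDD, square each element, collect the results.

    pyspark is imported inside the function (assumption: it is installed)
    so that this module can be loaded even where Spark is unavailable.
    """
    from pyspark import SparkConf, SparkContext

    # 'local' runs Spark in-process with a single worker thread.
    conf = SparkConf().setAppName("parallelize-demo").setMaster("local")
    sc = SparkContext.getOrCreate(conf)
    try:
        rdd = sc.parallelize(values)      # Python list -> distributed RDD
        return rdd.map(square).collect()  # RDD -> Python list on the driver
    finally:
        sc.stop()                         # release the local Spark context
```

Calling `collect_squares([1, 2, 3, 4])` on a machine with PySpark returns the squared values as an ordinary Python list. Note that `collect` pulls the whole result to the driver, so it is only suitable for small datasets.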
Prepare four virtual machines
Virtual Machine Installation
1. Create a new virtual machine
2. Click on Typical Installation (Recommended)
3. Select Chinese and choose to partition the disk yourself
# Partition Configuration (JD Usage)
swap 512M # Swap space, used when physical memory is insufficient
/ # root directory
Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800
A hint is used to specify the broadcast table, but the specified broadcast is not performed.
Preparation
hive> select * from test.tmp_demo_small;
hive> analyze table test.tmp_demo_small compute statis ...
Posted by cbullock on Fri, 17 Jan 2020 06:02:22 -0800
Dynamic and static partitions in Hive
1.1 Static partition
If the partition value is known in advance, the partition is called a static partition.
The partition name is specified explicitly when adding a partition or loading data into it.
create table if not exists day_part1(
Posted by noisyscanner on Tue, 14 Jan 2020 21:53:44 -0800
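The static-partition description above can be sketched as the HiveQL statements a client would issue. This is an illustrative sketch only: the partition column `dt`, its value, and the input path are hypothetical, because the table definition in the excerpt is cut off, and the helper function names are not from the original post.

```python
def partition_spec(partitions):
    """Render {'dt': '2020-01-14'} as dt='2020-01-14' (comma-separated).

    Values are assumed to be trusted strings; no escaping is performed.
    """
    return ", ".join(f"{col}='{val}'" for col, val in partitions.items())


def add_partition_sql(table, partitions):
    """Static partition: the partition value is named explicitly in the DDL."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_spec(partitions)})"
    )


def load_partition_sql(table, local_path, partitions):
    """Load a local file into one explicitly named (static) partition."""
    return (
        f"LOAD DATA LOCAL INPATH '{local_path}' "
        f"INTO TABLE {table} PARTITION ({partition_spec(partitions)})"
    )


# Hypothetical partition column and value, for illustration only.
print(add_partition_sql("day_part1", {"dt": "2020-01-14"}))
print(load_partition_sql("day_part1", "/tmp/day.txt", {"dt": "2020-01-14"}))
```

In both statements the partition value appears literally in the `PARTITION (...)` clause, which is what makes the partition static.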
Project address: https://github.com/KingBobTitan/hadoop.git
MapReduce Shuffle explained and Join implementation
1. MapReduce's history monitoring service: JobHistoryServer
Function: monitors information about all MapReduce programs running on YARN
Configure log ...
Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800
1. Background: With the upgrade to MaxCompute 2.0, the data types supported by Java UDFs have expanded from BIGINT, STRING, DOUBLE, and BOOLEAN to more basic data types, as well as complex types such as ARRAY, MAP, and STRUCT, and Writable parameters. How Java UDFs use complex data types: STRUCT corresponds to com.aliyun.odps ...
Posted by mnewbegin on Mon, 23 Dec 2019 19:19:12 -0800
Requirement: read data from Hive and write it to Elasticsearch (ES).
Environment: Spark 2.0.2
1. Call enableHiveSupport() when building the SparkSession
SparkConf conf = new SparkConf().setAppName("appName").setMaster("local[*]");
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic exam ...
Posted by NSW42 on Tue, 10 Dec 2019 14:30:32 -0800
In AWS, Athena can be used to analyze logs saved in S3. It maps the logs to database tables so that they can be queried with SQL statements. This is similar to using Log Parser to analyze Exchange or IIS logs on a Windows Server.
Let's walk through a demonstration: record the management log through CloudTrail, and ...
Posted by stubarny on Wed, 04 Dec 2019 09:10:56 -0800
Overview of Hive
Hive is a Hadoop-based data warehouse tool that maps structured data files to tables and provides SQL-like query capabilities.
In essence, Hive converts HQL into MapReduce programs.
Data processed by Hive is stored in HDFS, and the underlying execution engine for data analysis can be MapReduce, Tez, or Spark, with its executo ...
Posted by Wayniac on Tue, 03 Dec 2019 17:47:05 -0800