Basic knowledge and use of Impala
Chapter 1 Basic concepts of Impala
1.1 What is Impala
Impala, developed by Cloudera, provides high-performance, low-latency interactive SQL queries over data stored in HDFS and HBase.
It builds on Hive, computes in memory, supports data warehouse workloads, and offers real-time queries, batch processing, and high concurrency.
It is the prefer ...
Posted by yaba on Sun, 19 Sep 2021 06:13:40 -0700
Production Optimization Practice of Hive 3.x on Spark 3.0
1 Data skew
Most tasks complete quickly, while one or a few tasks run slowly or even fail. This phenomenon is called data skew. It occurs in two cases: single-table queries with a GROUP BY field, and two-table (or multi-table) JOIN queries.
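As a rough illustration of the phenomenon described above, the following Python sketch measures how unevenly rows are spread across grouping keys and across hash partitions. It is not Hive or Spark code; the function names `skew_ratio` and `partition_loads`, the sample keys, and the partition count of 4 are all assumptions made for this example.

```python
import zlib
from collections import Counter


def skew_ratio(keys):
    """Return the hottest key and its row count divided by the mean rows per key."""
    counts = Counter(keys)
    hottest_key, hottest_count = counts.most_common(1)[0]
    mean_count = sum(counts.values()) / len(counts)
    return hottest_key, hottest_count / mean_count


def partition_loads(keys, num_partitions):
    """Count rows per partition when each key is hashed to a single partition."""
    loads = [0] * num_partitions
    for key in keys:
        # crc32 is used as a stable stand-in for a shuffle hash function (assumption).
        loads[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return loads


# Hypothetical input: one key holds 97 of 100 rows.
keys = ["a"] * 97 + ["b", "c", "d"]
hottest_key, ratio = skew_ratio(keys)
loads = partition_loads(keys, 4)
print(hottest_key, ratio, loads)
```

Because every row with the same key lands in the same partition, the task that receives key `a` processes at least 97 rows while the others process at most a few, which is the slow-task pattern that data skew produces.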
1.1 Single-table data skew optimization
1.1.1 the map side performs aggregation - GroupBy opera ...
Posted by ben2k8 on Thu, 16 Sep 2021 14:35:16 -0700
Configuring a Hadoop development environment in Eclipse
Hadoop pitfalls (3)
Configuring a Hadoop development environment in Eclipse
Environment
Windows 10
Java 1.8
Namenode (hadoop1-ali) Alibaba Cloud (CentOS 7.3) 120.26.173.104
Hadoop version 2.8.5
Eclipse installation
The Enterprise edition needs to be installed. Because of network conditions, the offline installation package is recommended.
https://www.e ...
Posted by Duodecillion on Sun, 28 Jun 2020 22:17:50 -0700
HBase client programming (Eclipse)
Hadoop pitfalls (4)
HBase client programming (Eclipse)
Environment
For the installation and configuration of HBase and the configuration of Eclipse, please refer to the previous two articles.
The HBase version used in this series is 1.4.13.
The Hadoop version used in this series is 2.8.5.
Please pay attention to the per ...
Posted by Quest on Sun, 28 Jun 2020 21:59:57 -0700
Deploying a Hadoop cluster on CentOS 7
Hadoop pitfalls (1)
Deploying a Hadoop cluster on CentOS 7
Environment
Machine 1 (hadoop1-ali) Alibaba Cloud (CentOS 7.3) 120.26.173.104
Machine 2 (hadoop2-hw) Huawei Cloud (CentOS 7.4) 114.116.233.156
The first server serves as the namenode and the second as the datanode.
Modify the hostname and hosts file
Execute on two machines ...
Posted by maxxx on Sun, 28 Jun 2020 17:21:11 -0700
Setting up ZooKeeper in standalone and cluster environments
Table of contents
1. Standalone environment setup
1.1 Download
1.2 Extract
1.3 Configure environment variables
1.4 Modify the configuration
1.5 Start
1.6 Verify
2. Cluster environment setup
2.1 Modify the configuration
2.2 Identify the nodes
2.3 Start the cluster
2.4 cluster ve ...
Posted by jwagner on Thu, 25 Jun 2020 04:42:05 -0700
Big data Hadoop cluster construction
Big data Hadoop cluster construction
1. Environment
Server configuration:
CPU model: Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz
CPU cores: 16
Memory: 64GB
Operating system:
Version: CentOS Linux release 7.5.1804 (Core)
Host list:
IP               Host name
192.168.1.101    node1
192.168.1.102    node2
1 ...
Posted by SL-Cowsrule on Sun, 21 Jun 2020 17:57:54 -0700
Configuring Hadoop pseudo-distributed mode and some common commands
The history of big data
4V: volume, velocity, variety (structured and unstructured data), value (low value density)
Technical challenges brought by big data
Increasing storage capacity
Difficulty in obtaining valuable information: search, advertisement, recommendation
Data processing scenario ...
Posted by jmansa on Sat, 20 Jun 2020 00:46:03 -0700
ZooKeeper installation and deployment configuration on CentOS (cluster mode)
Table of contents
Step 1: Prepare the files
(1) Upload the file
(2) Unzip the files
Step 2: Modify the configuration file
(1) Rename the file
(2) Create a tmp folder
(3) Create a myid file
(4) Modify the configuration file
Step 3: Configure environment variables
Step 4: Distribute the files
(1) Distribute the files
(2) Configura ...
Posted by darkfreaks on Mon, 15 Jun 2020 22:32:36 -0700
Installation, configuration and use of Sqoop
Table of contents
Introduction and characteristics
Overview
Working mechanism
Installation and configuration
Common command operations of Sqoop
Import data from the database into HDFS
Export data on HDFS to database
Introduction and characteristics
Sqoop is a Hadoop And data transfer tools ...
Posted by Bac on Sat, 06 Jun 2020 04:23:27 -0700