Basic knowledge and use of Impala

Chapter 1 Basic concepts of Impala. 1.1 What is Impala? Impala, from Cloudera, provides high-performance, low-latency interactive SQL queries over HDFS and HBase data. Built on Hive, it uses in-memory computing while retaining data-warehouse capabilities, and supports real-time queries, batch processing and high concurrency. It is the prefer ...

Posted by yaba on Sun, 19 Sep 2021 06:13:40 -0700

Production optimization practice of Hive 3.x on Spark 3.0

1 Data skew. Most tasks finish quickly, while one or a few run slowly or even fail; this phenomenon is data skew. Data skew falls into two cases: single-table queries with a GROUP BY field, and two-table (or multi-table) JOIN queries. 1.1 Single-table data skew optimization 1.1.1 Map-side aggregation: GroupBy opera ...

Posted by ben2k8 on Thu, 16 Sep 2021 14:35:16 -0700
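
The map-side aggregation idea in 1.1.1 can be illustrated with a small simulation. This is a simplified model under stated assumptions, not Hive's implementation: each inner list stands for the GROUP BY keys read by one map task, and the two hypothetical functions count how many records would be shuffled to the reducers with and without local pre-aggregation. In Hive this behavior is governed by the `hive.map.aggr` setting.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical model: each inner list is the GROUP BY keys read by one map task.
Partitions = List[List[str]]


def shuffle_without_map_aggr(partitions: Partitions) -> Counter:
    """Every input row is sent to the reducer as its own (key, 1) record."""
    shuffled = Counter()
    for part in partitions:
        for key in part:
            shuffled[key] += 1
    return shuffled


def shuffle_with_map_aggr(partitions: Partitions) -> Tuple[Counter, Dict[str, int]]:
    """Each map task pre-aggregates locally and emits one (key, partial_count) per key.

    Returns (records shuffled per key, final counts merged by the reducers).
    """
    shuffled = Counter()
    totals: Dict[str, int] = {}
    for part in partitions:
        local = Counter(part)  # map-side partial aggregation
        for key, partial in local.items():
            shuffled[key] += 1  # one record per distinct key per map task
            totals[key] = totals.get(key, 0) + partial  # reducer merges partials
    return shuffled, totals


if __name__ == "__main__":
    # The skewed key "a" dominates both map tasks.
    parts = [["a"] * 1000 + ["b"], ["a"] * 1000 + ["c"]]
    print(shuffle_without_map_aggr(parts)["a"])  # 2000 records for one reducer
    shuffled, totals = shuffle_with_map_aggr(parts)
    print(shuffled["a"], totals["a"])  # 2 records, same total of 2000
```

The hot key still ends up at a single reducer, but as one record per map task instead of one per row, which is why map-side aggregation relieves GROUP BY skew.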

Configuring a Hadoop development environment in Eclipse

Hadoop Pitfall Notes (3): Configuring a Hadoop development environment in Eclipse. Environment: Windows 10, Java 1.8; NameNode (hadoop1-ali) on Alibaba Cloud (CentOS 7.3), 120.26.173.104; Hadoop version 2.8.5. Eclipse installation: the Enterprise edition needs to be installed. For network reasons, the offline installation package is recommended https://www.e ...

Posted by Duodecillion on Sun, 28 Jun 2020 22:17:50 -0700

HBase client programming (Eclipse)

Hadoop Pitfall Notes (4): HBase client programming (Eclipse). Environment: for the installation and configuration of HBase and the configuration of Eclipse, please refer to the previous two articles. This series uses HBase 1.4.13 and Hadoop 2.8.5. Please pay attention to the per ...

Posted by Quest on Sun, 28 Jun 2020 21:59:57 -0700

Deploying a Hadoop cluster on CentOS 7

Hadoop Pitfall Notes (1): Deploying a Hadoop cluster on CentOS 7. Environment: Machine 1 (hadoop1-ali), Alibaba Cloud (CentOS 7.3), 120.26.173.104; Machine 2 (hadoop2-hw), Huawei Cloud (CentOS 7.4), 114.116.233.156. The first server serves as the NameNode and the second as a DataNode. Modify the hostname and hosts file: execute on both machines ...

Posted by maxxx on Sun, 28 Jun 2020 17:21:11 -0700

Setting up ZooKeeper in standalone and cluster environments

Contents: 1. Standalone environment setup 1.1 Download 1.2 Extract 1.3 Configure environment variables 1.4 Modify configuration 1.5 Start 1.6 Verify 2. Cluster environment setup 2.1 Modify configuration 2.2 Identify nodes 2.3 Start the cluster 2.4 Cluster ve ...

Posted by jwagner on Thu, 25 Jun 2020 04:42:05 -0700

Big data: Hadoop cluster setup

Big data Hadoop cluster setup 1. Environment. Server configuration: CPU model: Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz; CPU cores: 16; memory: 64 GB; operating system version: CentOS Linux release 7.5.1804 (Core). Host list (IP, hostname): 192.168.1.101 node1, 192.168.1.102 node2, 1 ...

Posted by SL-Cowsrule on Sun, 21 Jun 2020 17:57:54 -0700

Configuring Hadoop pseudo-distributed mode and some common commands

The history of big data. The 3Vs: volume, velocity, variety (structured and unstructured data), and value (low value density). Technical challenges brought by big data: ever-increasing storage capacity; difficulty in extracting valuable information (search, advertising, recommendation). Data processing scenario ...

Posted by jmansa on Sat, 20 Jun 2020 00:46:03 -0700

ZooKeeper installation, deployment and configuration on CentOS (cluster mode)

Contents: Step 1: Prepare the files (1) Upload the file (2) Unzip the file. Step 2: Modify the configuration file (1) Rename the file (2) Create a tmp folder (3) Create a myid file (4) Modify the configuration file. Step 3: Configure environment variables. Step 4: Distribute the files (1) Distribute the files (2) Configura ...

Posted by darkfreaks on Mon, 15 Jun 2020 22:32:36 -0700
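
The myid and configuration steps listed above hinge on one rule: every node shares the same `server.N` list in `zoo.cfg`, while each node's `myid` file in its `dataDir` holds only its own `N`. The Python sketch below renders both from a single host list. The hostnames `node1`, `node2` and `node3` and the helper functions are hypothetical; 2888 and 3888 are the peer and leader-election ports used in ZooKeeper's documentation examples.

```python
from typing import Dict, List

# Hypothetical cluster; replace with your real hostnames.
HOSTS = ["node1", "node2", "node3"]
PEER_PORT = 2888      # followers connect to the leader on this port
ELECTION_PORT = 3888  # used for leader election


def server_lines(hosts: List[str]) -> List[str]:
    """zoo.cfg entries, identical on every node: server.<id>=<host>:<peer>:<election>."""
    return [
        f"server.{i}={host}:{PEER_PORT}:{ELECTION_PORT}"
        for i, host in enumerate(hosts, start=1)
    ]


def myid_contents(hosts: List[str]) -> Dict[str, str]:
    """Contents of dataDir/myid per host: just the id matching its server.<id> line."""
    return {host: f"{i}\n" for i, host in enumerate(hosts, start=1)}


if __name__ == "__main__":
    print("\n".join(server_lines(HOSTS)))
    for host, content in myid_contents(HOSTS).items():
        print(host, "->", content.strip())
```

Because the `server.N` lines are the same everywhere, one `zoo.cfg` can be distributed to all nodes; only `myid` differs per node, and its number must match that node's `server.N` entry.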

Installation, configuration and use of Sqoop

Contents: Introduction and features; Summary; Working mechanism; Installation and configuration; Common Sqoop commands; Import data from a database into HDFS; Export data from HDFS to a database. Introduction and features: Sqoop is a data transfer tool between Hadoop and ...

Posted by Bac on Sat, 06 Jun 2020 04:23:27 -0700
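
For an import, Sqoop's working mechanism is to run parallel map tasks, each reading one slice of the source table bounded on a split column. The function below is a simplified sketch under that assumption, not Sqoop's own splitter code (which handles more column types and edge cases): it divides an integer column's inclusive `[lo, hi]` range among `num_mappers` tasks and builds one bounded query per task. The function names and the `orders`/`id` table and column are hypothetical.

```python
from typing import List, Tuple


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into at most num_mappers contiguous slices.

    Slice sizes differ by at most one; if there are fewer values than mappers,
    fewer slices are returned.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if hi < lo:
        return []  # empty table: nothing to import
    size, extra = divmod(hi - lo + 1, num_mappers)
    ranges: List[Tuple[int, int]] = []
    start = lo
    for i in range(num_mappers):
        length = size + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def slice_queries(table: str, column: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """One bounded SELECT per map task (hypothetical table and column names)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    for q in slice_queries("orders", "id", 1, 10, 4):
        print(q)
```

In Sqoop itself, the bounds come from a MIN/MAX boundary query on the `--split-by` column, and the number of slices comes from `-m`/`--num-mappers`.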