Implementing MapReduce in Eclipse

1. Prepare the environment: update Hadoop's mapred-site.xml and yarn-site.xml configuration files under Windows to match those in the virtual machine, copy both files into the project, and add the dependency packages. 2. Operation mode: run locally (start multiple threads in the local Eclipse instance to simul ...
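Below is a minimal, self-contained sketch of what such a local run can look like. The two configuration keys select the in-process local job runner rather than YARN; the WordCount mapper/reducer and the class name WordCountLocal are illustrative, not taken from the post.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountLocal {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line into tokens and emit (word, 1).
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts for each word.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run the job in-process with the local runner instead of submitting to YARN.
        conf.set("mapreduce.framework.name", "local");
        conf.set("fs.defaultFS", "file:///"); // read input from the local file system
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountLocal.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```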

Posted by Yanayaya on Tue, 29 Jan 2019 15:15:15 -0800

Flink Read and Write Series - Read and Write HBase

HBase can be read by inheriting RichSourceFunction and overriding its parent methods, and written by implementing the OutputFormat interface. The code is as follows. Way 1: inherit RichSourceFunction: package com.my.flink.utils.streaming.hbase; import com.my.flink.utils.config.ConfigKeys; import org.apache.flin ...
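As a rough illustration of Way 1, here is a hedged sketch of a RichSourceFunction that scans an HBase table and emits row keys. The post's own code is truncated, so the class name HBaseReader, the table name test_table, and the ZooKeeper address are assumptions.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReader extends RichSourceFunction<String> {
    private transient Connection connection;
    private transient Table table;
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Flink's Configuration is the parameter type; HBase needs Hadoop's, so qualify it.
        org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");              // assumed ZK host
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        connection = ConnectionFactory.createConnection(conf);
        table = connection.getTable(TableName.valueOf("test_table")); // assumed table
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Full-table scan; emit each row key as one record downstream.
        try (ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result r : scanner) {
                if (!running) break;
                ctx.collect(Bytes.toString(r.getRow()));
            }
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void close() throws Exception {
        if (table != null) table.close();
        if (connection != null) connection.close();
    }
}
```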

Posted by jallard on Sun, 27 Jan 2019 21:03:14 -0800

Big Data Hadoop Tool Python Tutorial 1 - HDFS (Hadoop Distributed File System)

Hadoop Distributed File System (HDFS) is a Java-based distributed file system. This distributed, scalable, and portable file system is designed to span large clusters of commodity servers. The design of HDFS is based on the Google File System (GFS, https://ai.google/research/pubs/pub51). Like many other distributed file systems, HDFS has a large amo ...

Posted by Nick Zaccardi on Sun, 27 Jan 2019 19:21:14 -0800

Linux Learning from scratch IV

Permission management: ls -l shows rwxrwxrwx. The left three bits define the user's (owner's) permissions; the middle three define the group's permissions; the right three define everyone else's (other). Process security context: the model for how a process's access rights to a file are decided: whether the ownership of ...
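For illustration, a short shell session showing how those three permission triplets appear in ls -l output and how chmod sets them; the file and user names are hypothetical.

```sh
$ ls -l notes.txt
-rwxr-x--- 1 alice dev 120 Jan 27 10:00 notes.txt
# owner (alice): rwx   group (dev): r-x   other: ---
$ chmod 754 notes.txt              # owner=rwx, group=r-x, other=r--
$ chmod u=rwx,g=rx,o=r notes.txt   # the same change in symbolic form
```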

Posted by ntbd on Sun, 27 Jan 2019 18:27:15 -0800

Spark Learning Notes (12) - SparkSQL

1. SparkSQL introduction: Spark SQL is the module Spark uses to process structured data. It provides a programming abstraction called DataFrame and serves as a distributed SQL query engine. We have already studied Hive, which converts Hive SQL into MapReduce and submits it to the cluster for execution; this greatly simplifies the complexity of pr ...
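To make the DataFrame abstraction concrete, here is a minimal Spark SQL sketch using the Java API; the file people.json and its columns are hypothetical, not from the post.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSQLDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SparkSQLDemo")
                .master("local[*]")   // local mode for testing
                .getOrCreate();

        // Read a JSON file into a DataFrame (Dataset<Row> in the Java API).
        Dataset<Row> people = spark.read().json("people.json");

        // Register the DataFrame as a temporary view and query it with SQL.
        people.createOrReplaceTempView("people");
        spark.sql("SELECT name, age FROM people WHERE age > 20").show();

        spark.stop();
    }
}
```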

Posted by Dorin85 on Sat, 26 Jan 2019 00:24:15 -0800

Hadoop recycle bin (HDFS trash)

In production, the HDFS recycle bin must be enabled, usually with a retention period of 7 days. fs.trash.interval is how long deleted files are kept in the trash; setting it to 0 disables the trash feature entirely. fs.trash.checkpoint.interval, the trash checkpoint interval, is generally set less than or equal to fs.trash.interval. If 0, t ...
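A minimal core-site.xml sketch matching the 7-day retention suggested above; fs.trash.interval is expressed in minutes, so 7 days = 10080.

```xml
<!-- core-site.xml: keep deleted files in the trash for 7 days. -->
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
<!-- Checkpoint at most as often as the retention interval. -->
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>10080</value>
</property>
```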

Posted by NJordan72 on Fri, 25 Jan 2019 14:51:13 -0800

Installing Hive 2.3.3 on Ubuntu 16.04 with Hadoop 2.9.1

Preface: http://hive.apache.org/downloads.html has compatibility instructions: Hadoop 3.x needs Hive 3.0.0, and Hadoop 2.x needs Hive 2.3.3. Since my Hadoop is 2.9, I chose to download Hive 2.3.3. Hive is a Hadoop tool, so it only needs to be installed on the NameNode; there is no need to install it on the DataN ...
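A hedged sketch of the download-and-unpack steps on the NameNode; the install path and environment setup are assumptions, not quoted from the post.

```sh
# Fetch Hive 2.3.3 from the Apache archive and unpack it (adjust mirror/path as needed).
wget https://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
tar -zxvf apache-hive-2.3.3-bin.tar.gz -C /usr/local
# Make hive available on the PATH (e.g. append these lines to ~/.bashrc).
export HIVE_HOME=/usr/local/apache-hive-2.3.3-bin
export PATH=$PATH:$HIVE_HOME/bin
```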

Posted by longtone on Fri, 25 Jan 2019 04:45:13 -0800

Writing a WordCount program in IDEA under Windows and submitting it to a Hadoop cluster as a jar package (beginner-friendly version)

Typically, programs are written in an IDE, then packaged as jars and submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies. 1. Generating the WordCount jar: 1. Open IDEA: File -> New -> Project -> Maven -> Next, then fill in the GroupId and Artif ...
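For the Maven setup, a minimal dependency sketch; the version should match the target cluster (hadoop-client 2.9.1 here is an assumption).

```xml
<!-- pom.xml: let Maven pull in the Hadoop/MapReduce client libraries. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.9.1</version>
</dependency>
```

After `mvn package`, the resulting jar can be submitted to the cluster with `hadoop jar wordcount.jar <main-class> <input-path> <output-path>`.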

Posted by kcgame on Thu, 24 Jan 2019 19:45:14 -0800

Implementing basic HBase operations in Java code

Overview: 1. importing jar packages; 2. testing; 3. exception handling. First build HBase, then start the ZooKeeper, Hadoop, and HBase clusters. 1. Importing jar packages. Prerequisites: 1. CentOS 7; 2. a ZooKeeper cluster; 3. a Hadoop 2.7.3 cluster; 4. an HBase 2.0.0 cluster; 5. Eclipse. Build a Java project in Eclipse where you create a new lib fol ...
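As a rough sketch of what the test code can look like once the jars are on the build path, here are create/put/get operations against the HBase 2.x client API; the table name, column family, and ZooKeeper hosts are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node1,node2,node3"); // illustrative hosts

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // Create a table with one column family (HBase 2.x builder API).
            TableName name = TableName.valueOf("student");
            if (!admin.tableExists(name)) {
                admin.createTable(TableDescriptorBuilder.newBuilder(name)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                        .build());
            }

            try (Table table = connection.getTable(name)) {
                // Put: write one cell.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Tom"));
                table.put(put);

                // Get: read it back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }
}
```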

Posted by torrentmasta on Wed, 23 Jan 2019 19:27:12 -0800

Build a High Availability (HA) cluster using the Quorum Journal Manager (QJM)

Manual HA: 1. If you are creating a brand-new HA cluster, first complete the first 18 steps of the linked blog post. 2. Modify core-site.xml: <property> <name>fs.defaultFS</name> <value>hdfs://mycluster</value> </property> <property> <name>hadoop.tmp.dir</name> &l ...
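Alongside the core-site.xml change shown above, a QJM-based HA setup also needs the nameservice defined in hdfs-site.xml. A minimal sketch with illustrative hostnames; the nameservice id mycluster matches the fs.defaultFS value in the excerpt.

```xml
<!-- hdfs-site.xml: one logical nameservice backed by two NameNodes and a JournalNode quorum. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
```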

Posted by dragongamer on Tue, 22 Jan 2019 19:24:12 -0800