Hadoop runtime environment setup tutorial

1. Environment preparation: a Linux machine, either a virtual machine installed locally with VMware or a physical Linux host. If it is a locally installed virtual machine, the following points need to be configured in advance: set a static IP for the machine (to prevent the IP from changing on restart), modify the hostname (to make later configuration easier), turn off the ...
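A minimal sketch of those preparation steps, assuming CentOS 7 with a NIC named ens33; the IPs and hostname below are placeholders, and on CentOS 6 the network/firewall commands differ:

```bash
# Static IP: set these keys in the interface config, then restart networking
vi /etc/sysconfig/network-scripts/ifcfg-ens33
#   BOOTPROTO=static
#   ONBOOT=yes
#   IPADDR=192.168.10.101
#   GATEWAY=192.168.10.2
#   DNS1=192.168.10.2
systemctl restart network

# Hostname, so later configuration files can refer to the node by name
hostnamectl set-hostname hadoop101
echo "192.168.10.101 hadoop101" >> /etc/hosts

# Turn off the firewall so Hadoop daemons on different nodes can reach each other
systemctl stop firewalld
systemctl disable firewalld
```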

Posted by crazylegseddie on Tue, 25 Feb 2020 19:02:32 -0800

Install Sqoop on Linux (and test the MySQL connection)

Table of contents: environment description; download and unzip; change the configuration (modify sqoop-env.sh after decompression); configure environment variables; copy the MySQL driver; start MySQL; view the Sqoop version; test with MySQL. Environment description — Software / Version: operating syste ...
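As a hedged sketch of those steps (the version numbers and the /opt/module install paths below are assumptions, not details taken from the article):

```bash
# Unpack Sqoop (the exact tarball name depends on the version you downloaded)
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/module/
cd /opt/module/sqoop-1.4.7.bin__hadoop-2.6.0

# sqoop-env.sh: point Sqoop at the local Hadoop (and optionally Hive) installs
cp conf/sqoop-env-template.sh conf/sqoop-env.sh
echo 'export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2' >> conf/sqoop-env.sh
echo 'export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2' >> conf/sqoop-env.sh
echo 'export HIVE_HOME=/opt/module/hive'                  >> conf/sqoop-env.sh

# Copy the MySQL JDBC driver into Sqoop's lib directory
cp /opt/software/mysql-connector-java-5.1.27-bin.jar lib/

# Verify the install, then check that Sqoop can reach MySQL
bin/sqoop version
bin/sqoop list-databases --connect jdbc:mysql://localhost:3306/ \
    --username root --password 123456
```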

Posted by possiblyB9 on Tue, 25 Feb 2020 07:36:16 -0800

Shell operations on HDFS

1. Basic syntax: bin/hadoop fs <specific command> or bin/hdfs dfs <specific command> 2. Full command list [andy@xiaoai01 hadoop-2.7.2]$ bin/hadoop fs [-appendToFile <localsrc> ... <dst>] [-cat [-ignoreCrc] <src> ...] [-checksum <src> ...] [-chgrp [-R] GROUP PATH...] [-chmod [-R ...
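For context, a few everyday examples of those commands (file names and HDFS paths are placeholders; hadoop fs and hdfs dfs behave the same against HDFS):

```bash
bin/hadoop fs -ls /                              # list the HDFS root directory
bin/hadoop fs -mkdir -p /user/andy/input         # create a directory tree
bin/hadoop fs -put wc.txt /user/andy/input       # upload a local file into HDFS
bin/hadoop fs -cat /user/andy/input/wc.txt       # print a file stored in HDFS
bin/hadoop fs -get /user/andy/input/wc.txt ./    # download it back to the local disk
bin/hadoop fs -rm -r /user/andy/input            # remove a directory recursively
```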

Posted by Abarak on Fri, 21 Feb 2020 03:07:18 -0800

Common RDD operations in PySpark

Preparation: import pyspark; from pyspark import SparkContext; from pyspark import SparkConf; conf = SparkConf().setAppName("lg").setMaster('local[4]') # local[4] means run on 4 local cores; sc = SparkContext.getOrCreate(conf) 1. parallelize and collect: the parallelize function converts the list obj ...
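If you just want the same local-mode context interactively, the PySpark shell can be launched with an equivalent configuration (assuming SPARK_HOME points at the Spark install; the shell then creates sc for you):

```bash
# Interactive PySpark shell on 4 local cores with the app name "lg";
# inside it, sc.parallelize([...]).collect() round-trips a Python list through an RDD.
$SPARK_HOME/bin/pyspark --master 'local[4]' --name lg
```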

Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800

Crushing the TopK problem (with code)

Problem: given a large file (1 TB? 10 TB?) in which each line stores a user ID (IP? IQ?), and a computer with only 2 GB of memory, find the ten IDs that appear most frequently. Introduction: in recent years the TopK problem has come up again and again in written tests and interviews. In fact, the answer is r ...
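The article's own solution is cut off above; as one hedged illustration of how the 2 GB limit can be respected (a generic external-sort approach, not necessarily the method the article goes on to describe), a single shell pipeline already does the job:

```bash
# sort spills to temporary files on disk (-S caps its in-memory buffer, -T picks the spill
# directory), uniq -c counts each run of identical IDs, and the final sort/head keep the
# ten most frequent ones.
sort -S 1G -T /data/tmp ids.txt | uniq -c | sort -k1,1nr | head -n 10
```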

Posted by hughmill on Mon, 17 Feb 2020 20:07:30 -0800

Flink 1.9 "Error: A JNI error has occurred"

Background: recently the company has been building a big data system, and the architect recommended building it with Flink. So these days I have been investigating Flink in my own virtual machine environment (Ubuntu 16.04). I learned the fundamentals of Flink from Ververica, because I had worked on Python data pr ...

Posted by aperales10 on Sun, 16 Feb 2020 18:33:05 -0800

Bug0: resolving a java.io.FileNotFoundException error when running with the Hadoop plug-in

Problem description: the following error occurred while running the WordCount program in Eclipse with the plug-in (instead of manually packaging and uploading to the server): DEBUG - LocalFetcher 1 going to fetch: attempt_local938878567_0001_m_000000_0 WARN - job_local938878567_0001 java.lang.Exception: org.apache. ...

Posted by jeffshead on Wed, 12 Feb 2020 06:32:57 -0800

How to quickly build a Spark distributed architecture for big data

Build our Spark platform from scratch. 1. Preparing the CentOS environment: in order to build a real cluster environment and achieve a highly available architecture, we should prepare at least three virtual machines as cluster nodes, so I bought three Alibaba Cloud servers to use as our cluster nodes. ...
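A rough sketch of what the three-node layout can look like once the machines exist (the hostnames, IPs, and a Spark 2.x standalone deployment are assumptions here, not details from the article):

```bash
# /etc/hosts on every node: give the three cluster machines fixed names
#   172.16.0.1 master
#   172.16.0.2 slave1
#   172.16.0.3 slave2

# On the master node: list the workers and tell Spark which host is the master
echo -e "slave1\nslave2" > $SPARK_HOME/conf/slaves
echo "export SPARK_MASTER_HOST=master" >> $SPARK_HOME/conf/spark-env.sh

# Bring up the whole standalone cluster from the master (requires passwordless SSH)
$SPARK_HOME/sbin/start-all.sh
```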

Posted by knelson on Tue, 04 Feb 2020 23:47:57 -0800

Troubleshooting NameNode failures in HDFS

After the NameNode fails, data can be recovered in the following two ways: 1. Copy the data in the SecondaryNameNode to the directory where the NameNode stores its data. (1) kill -9 the NameNode process: [test@hadoop151 ~]$ jps 3764 DataNode 4069 NodeManager 3654 NameNode 7738 Jps [test@hadoop151 ~]$ kill -9 3654 [test@h ...
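A hedged sketch of the rest of method 1 (the directory names below are placeholders; the real ones come from dfs.namenode.name.dir and the SecondaryNameNode checkpoint directory in the cluster's hdfs-site.xml):

```bash
# Clear whatever metadata is left in the NameNode's storage directory
rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

# Copy the SecondaryNameNode's checkpoint into the NameNode directory
cp -r /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* \
      /opt/module/hadoop-2.7.2/data/tmp/dfs/name/

# Restart only the NameNode daemon
sbin/hadoop-daemon.sh start namenode
```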

Posted by webtechatlantic on Mon, 03 Feb 2020 08:51:44 -0800

CentOS 7 Hadoop + Hive installation

Prepare four virtual machines. Virtual machine installation: 1. Create a new virtual machine. 2. Click Typical installation (recommended). 3. Select Chinese and partition the disk yourself # partition configuration (JD usage): /boot 200M; swap 512M # swap, since physical memory is limited; / # root directory. 4. Configu ...

Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800