Single-machine Hadoop HDFS installation and setup notes

Keywords: Java ssh Hadoop Linux

**System Configuration**

**Specification:** 1 vCPU | 2 GB | s6.medium.2

**Image:** Ubuntu 18.04 server 64bit

**User:** Create a halo user on Ubuntu

Required software: (1) a Hadoop installation package (the CDH build from the Cloudera site is recommended), (2) Java 1.8+, (3) ssh

1. Install Java

First download the Linux JDK package jdk-8u161-linux-x64.tar.gz

Extract the installation package

tar -zxvf jdk-8u161-linux-x64.tar.gz -C unzipPath

Configure environment variables in /etc/profile or ~/.bash_profile

#set java environment
JAVA_HOME=/usr/local/java/jdk1.8.0_161
JRE_HOME=$JAVA_HOME/jre
CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASS_PATH PATH

Apply the configuration: source /etc/profile
Verify: java -version
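
A quick sanity check after sourcing the profile can catch a wrong path early. The sketch below assumes only that a JDK home contains an executable bin/java; the helper name check_java_home is made up for illustration.

```shell
#!/bin/sh
# check_java_home DIR: succeed only if DIR contains an executable bin/java.
# (check_java_home is an illustrative helper, not part of the JDK.)
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "JAVA_HOME is unset or does not contain bin/java"
fi
```

If the second message appears, the JAVA_HOME line in the profile most likely points at the wrong extraction directory.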

2. Install ssh and configure passwordless login

A purchased cloud server normally already has ssh installed, but for a single-machine deployment you still need to check whether the ssh service is installed
```
ps -e | grep ssh # Look for ssh processes
systemctl status ssh  # Check the ssh service status
```

Install ssh

(1) To determine whether the ssh service is installed, run the following command:
ssh localhost
ssh: connect to host localhost port 22: Connection refused

(2) Output like the above means ssh is not installed yet; install it with apt using the following command:
apt-get install openssh-server

(3) Start the service:
sudo /etc/init.d/ssh start

ssh passwordless login

cd ~
ssh-keygen -t rsa
cd .ssh
# Append the generated RSA public key to the authorized_keys file
cat id_rsa.pub >> authorized_keys
# Restrict authorized_keys to owner read/write
chmod 600 authorized_keys
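
Running the cat >> step twice leaves duplicate entries in authorized_keys. A minimal sketch of an idempotent variant follows; the function name authorize_key is hypothetical, and the 700/600 permissions are the usual ones sshd accepts under its default strict mode.

```shell
#!/bin/sh
# authorize_key PUBKEY AUTH_FILE
# Append the public key to AUTH_FILE only if it is not already listed,
# then tighten permissions on the directory and the file.
# (authorize_key is an illustrative helper name.)
authorize_key() {
    pubkey="$1"
    auth="$2"
    mkdir -p "$(dirname "$auth")"
    chmod 700 "$(dirname "$auth")"
    touch "$auth"
    # -F: fixed strings, -x: whole-line match, -f: read patterns from a file
    if ! grep -qxF -f "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Example (not run here): authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Afterwards, ssh localhost should log in without asking for a password.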

.ssh folder structure

|--- id_rsa # RSA private key file generated by ssh-keygen

|--- id_rsa.pub # RSA public key file generated by ssh-keygen

|--- authorized_keys # Public keys allowed to log in without a password

|--- known_hosts # Host keys of remote machines logged in to before

When SSH connects to a machine it has not logged in to before, you usually have to type yes to confirm adding the host to the known_hosts file, which is inconvenient for some scripts. You can modify the /etc/ssh/ssh_config file so that hosts are added automatically. Note that this is ssh_config, not sshd_config

Find # StrictHostKeyChecking ask and change it to StrictHostKeyChecking no

This enables automatic additions to known_hosts
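
The edit can also be scripted. The following is a sketch only: the function name relax_host_key_checking is made up, and it assumes the file still contains the commented default line. Keep in mind that turning the check off also removes the warning you would get if a host key changed unexpectedly.

```shell
#!/bin/sh
# relax_host_key_checking FILE
# Turn the commented default "# StrictHostKeyChecking ask" into
# "StrictHostKeyChecking no" in an ssh client config file.
# (relax_host_key_checking is an illustrative helper name.)
relax_host_key_checking() {
    cfg="$1"
    tmp="$(mktemp)"
    # Write to a temp file first, then copy back so the original
    # file keeps its owner and permissions.
    sed 's/^#[[:space:]]*StrictHostKeyChecking[[:space:]]\{1,\}ask[[:space:]]*$/StrictHostKeyChecking no/' \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example (not run here, needs root): relax_host_key_checking /etc/ssh/ssh_config
```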

3. Hadoop Installation

Extract the Hadoop 3 package

Add environment variables to .profile

export HADOOP_HOME=/home/hadoop0/app/hadoop-3.1.3
export PATH=$HADOOP_HOME/bin:$PATH

Hadoop directory description
|--- bin # Hadoop client commands

|--- etc/hadoop # Directory holding the configuration files

|--- sbin # Scripts that start Hadoop-related processes (server side)

|--- share # Common examples (share/hadoop/mapreduce)

Modify the configuration files

(1) etc/hadoop/hadoop-env.sh
# Add (point JAVA_HOME at your JDK install directory)
export JAVA_HOME=/software/java/jdk1.8.0_161
    
(2) etc/hadoop/core-site.xml
# Add; hadoop0 is the hostname configured in the local hosts file
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop0:9000</value>
        </property>
</configuration>

# Remarks
With hadoop0 -> 127.0.0.1, a hadoop-client on another machine cannot connect, because the NameNode only listens on the local loopback address

After changing to hdfs://0.0.0.0:9000, hadoop-client can connect and create folders normally, but cannot read or write files
Exception: There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Stack Overflow suggests changing it to the local IP, but this did not resolve the problem, perhaps because of IP forwarding on Huawei's cloud server; to be studied when there is time.
(It has been confirmed that the exception above does not occur when deploying on a single local machine, so it is probably related to the cloud server's IP forwarding)

(3) etc/hadoop/hdfs-site.xml
# Add
<configuration>
    <property>
            <!-- Number of replicas -->
            <name>dfs.replication</name>
            <value>1</value>
    </property>

    <property>
            <!-- Location of file blocks; defaults to the Linux tmp folder, where data may be lost on restart -->
            <!-- So the storage location needs to be changed -->
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop0/data/tmp</value>
    </property>
</configuration>

(4) Modify the etc/hadoop/workers file
# Add the configured IP or its mapped hostname
hadoop0
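
Before moving on to startup, it can help to confirm that all four files edited above exist where Hadoop expects them. This is a rough pre-flight sketch; check_hadoop_conf is a made-up helper, and the file names are the ones shipped in etc/hadoop of a Hadoop 3 package (the environment script there is named hadoop-env.sh).

```shell
#!/bin/sh
# check_hadoop_conf HADOOP_HOME
# Print each expected config file that is missing; fail if any is missing.
# (check_hadoop_conf is an illustrative helper name.)
check_hadoop_conf() {
    conf_dir="$1/etc/hadoop"
    missing=0
    for f in hadoop-env.sh core-site.xml hdfs-site.xml workers; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

if [ -n "${HADOOP_HOME:-}" ]; then
    if check_hadoop_conf "$HADOOP_HOME"; then
        echo "all config files present"
    fi
fi
```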

4. Startup and Verification

Format the file system before starting Hadoop for the first time
```
# Format
hdfs namenode -format
```
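
Formatting is meant to happen only once; formatting again wipes the NameNode metadata. A small guard like the one below can make that explicit. It is a sketch under two assumptions: dfs.namenode.name.dir is left at its default of ${hadoop.tmp.dir}/dfs/name, and HADOOP_TMP_DIR is a made-up variable used here only to override the path from hdfs-site.xml.

```shell
#!/bin/sh
# name_dir_formatted HADOOP_TMP_DIR
# Succeed if the NameNode directory under hadoop.tmp.dir already holds metadata.
# Assumes dfs.namenode.name.dir keeps its default, ${hadoop.tmp.dir}/dfs/name.
# (name_dir_formatted is an illustrative helper name.)
name_dir_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

# HADOOP_TMP_DIR is a hypothetical override; the default mirrors hdfs-site.xml above.
TMP_DIR="${HADOOP_TMP_DIR:-/home/hadoop0/data/tmp}"
if name_dir_formatted "$TMP_DIR"; then
    echo "NameNode already formatted; skipping format"
else
    echo "NameNode not formatted yet; run: hdfs namenode -format"
fi
```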

Startup

# Start dfs
sbin/start-dfs.sh
# Log location after starting dfs: logs/hadoop-hadoop0-namenode-xxx.log
# Stop the cluster
sbin/stop-dfs.sh
# Start or stop a single component process
# (deprecated in Hadoop 3; on the local node the replacement is hdfs --daemon stop|start|status xxx)
sbin/hadoop-daemons.sh stop|start|status xxx
xxx can be:
    namenode
    secondarynamenode
    datanode
    
netstat -ntlp

Verification

(1) Run jps on the Linux command line; these processes should appear:
NameNode,DataNode,SecondaryNameNode

(2) Check the NameNode information web page
http://ip:9870 Watch out for the firewall: sudo ufw allow 9870 / systemctl stop firewalld

(3) Two ways to permanently disable or enable the firewall
systemctl disable firewalld
systemctl enable firewalld
chkconfig iptables off
chkconfig iptables on

Note: chkconfig and systemctl differ. chkconfig manages SysV init services on older systems, while systemctl manages systemd services
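
The jps check from step (1) can be scripted so that a missing daemon is obvious at a glance. The sketch below assumes jps prints one "<pid> <MainClass>" pair per line; missing_daemons is a made-up helper name.

```shell
#!/bin/sh
# missing_daemons: read jps output on stdin and print the name of every
# expected HDFS daemon that is not listed.
# (missing_daemons is an illustrative helper name.)
missing_daemons() {
    out="$(cat)"
    for name in NameNode DataNode SecondaryNameNode; do
        # Compare the whole second field, so "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$out" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

Empty output means all three processes are running; any printed name points at the log file to look at first.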

5. Common Hadoop command line operations

Common File System Operations: View, Store, Move, Delete

hadoop fs -ls /         # View the hadoop root folder
    -cp src dest  # copy
    -getmerge src1 src2 localdst # Merge HDFS files into one local file
    -get          # Download from HDFS to local
    -put          # Upload from local to HDFS
    ....

The difference between -cat and -text: -text decodes the file into text (for example compressed files) while -cat prints the raw bytes, so -cat output can look garbled
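
To see these commands working together, here is a small round trip: upload a file, list it, print it, and fetch it back. It is only a sketch; run, DRY_RUN and hdfs_roundtrip are made-up names, and with DRY_RUN set the commands are printed instead of executed, which is handy on a machine without Hadoop.

```shell
#!/bin/sh
# run CMD...: print the command, and execute it unless DRY_RUN is set.
# (run and DRY_RUN are illustrative names, not Hadoop features.)
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# hdfs_roundtrip LOCAL_FILE HDFS_DIR
# Upload a local file, list and print it, then download a copy.
hdfs_roundtrip() {
    name="$(basename "$1")"
    run hadoop fs -mkdir -p "$2"
    run hadoop fs -put "$1" "$2/"
    run hadoop fs -ls "$2"
    run hadoop fs -cat "$2/$name"
    run hadoop fs -get "$2/$name" "$1.copy"
}

# Example (prints the commands only):
#   DRY_RUN=1; hdfs_roundtrip ./notes.txt /demo
```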

Notes and Linux commands used during installation

Linux hosts file modification

# Open the file with vi /etc/hosts and add a mapping for the hostname hadoop0
127.0.0.1       localhost
127.0.0.1       hadoop0 
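
If the mapping is added from a setup script rather than by hand, it is worth making the step safe to repeat. A minimal sketch, where ensure_host_entry is a hypothetical helper that appends the line only when the hostname is not already mapped:

```shell
#!/bin/sh
# ensure_host_entry FILE IP NAME
# Append "IP<TAB>NAME" to FILE unless NAME already appears as a hostname
# on some non-comment line.
# (ensure_host_entry is an illustrative helper name.)
ensure_host_entry() {
    file="$1"
    ip="$2"
    name="$3"
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (not run here, needs root): ensure_host_entry /etc/hosts 127.0.0.1 hadoop0
```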

uname -a # Get system information

Some Linux commands

ls command

ls -a # View all files, including hidden ones
ls -la # Long listing of all files, including hidden ones
ll -h # Show file sizes in human-readable units (K, M, ...)
env  # View the system's current environment variables

tar command

tar -zxvf jdk*.tar.gz -C ~/app # Extract the archive into the specified directory
tar -czvf abc.tar.gz abc/   # Package and compress a directory

Options:
-c  # Create an archive
-x  # Extract an archive
-t  # List the files in an archive
-z  # Compress or decompress with gzip
-j  # Compress or decompress with bzip2
-v  # Show detailed progress
-f  # Archive file name
-p  # Keep original permissions and attributes
-P  # Keep absolute paths
-C  # Specify the directory to extract to
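
The options are easiest to remember after one full round trip. The sketch below works entirely inside a temporary directory, so it can be run anywhere; the directory and file names are arbitrary.

```shell
#!/bin/sh
# Build a small directory tree inside a throwaway working directory.
work="$(mktemp -d)"
mkdir -p "$work/abc" "$work/out"
echo "hello" > "$work/abc/a.txt"

# -c create, -z gzip, -f archive name; -C switches to $work before adding abc/
tar -czf "$work/abc.tar.gz" -C "$work" abc

# -t lists the archive contents without extracting anything
tar -tzf "$work/abc.tar.gz"

# -x extract; here -C chooses the destination directory
tar -xzf "$work/abc.tar.gz" -C "$work/out"
cat "$work/out/abc/a.txt"
```

Note that -C has two roles: when creating it sets where the input paths are taken from, and when extracting it sets where the files land.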

ssh Modify Port

(1) Modify the ssh port
The default port for SSH is 22, configured in /etc/ssh/sshd_config. Add a second Port line:
Port 22
Port 800
Edit the firewall configuration to open ports 22 and 800, then restart ssh:
sudo /etc/init.d/ssh restart 
This way ssh listens on ports 22 and 800 at the same time.

(2) Validation of results
a. Use ssh root@localhost -p 800

b. Or use systemctl status ssh
The output should include:
Server listening on 0.0.0.0 port 800.
Server listening on :: port 800.
Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.

If the connection succeeds, edit sshd_config again and delete the Port 22 line.
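
Since a mistake here can lock you out of a remote server, it is worth double-checking which ports the file actually enables before each restart. A small sketch, with list_sshd_ports as a made-up helper that prints the ports from uncommented Port lines:

```shell
#!/bin/sh
# list_sshd_ports FILE
# Print every port set by an active (uncommented) Port line.
# Keywords in sshd_config are case-insensitive, hence tolower().
# (list_sshd_ports is an illustrative helper name.)
list_sshd_ports() {
    awk 'tolower($1) == "port" { print $2 }' "$1"
}

# Example (not run here): list_sshd_ports /etc/ssh/sshd_config
# Running sshd -t as root additionally checks the file for syntax errors.
```

Keeping the existing session open while testing the new port from a second terminal gives a way back if the new port turns out to be blocked.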

Posted by cbn_noodles on Wed, 18 Mar 2020 19:38:36 -0700
