Single-machine Hadoop HDFS installation and setup record

Keywords: Java ssh Hadoop Linux

**System Configuration**

**Specification:** 1 vCPU | 2 GB | s6.medium.2

**Image:** Ubuntu 18.04 server 64-bit

**User:** Create a hadoop0 user on Ubuntu

Prerequisite software: (1) a Hadoop installation package (the CDH builds from the Cloudera site are recommended), (2) Java 1.8+, (3) ssh

1. Install Java

  • Download the Linux version of the JDK, jdk-8u161-linux-x64.tar.gz, first
  • Unzip the installation package

    tar -zxvf jdk-8u161-linux-x64.tar.gz -C unzipPath
  • Configure environment variables in /etc/profile or ~/.bash_profile

    # set java environment
    JAVA_HOME=/usr/local/java/jdk1.8.0_161
    JRE_HOME=$JAVA_HOME/jre
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
    export JAVA_HOME JRE_HOME CLASSPATH PATH
  • Apply the configuration: source /etc/profile (or source ~/.bash_profile)
  • Verify with java -version
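A quick sanity check (the exact build string will vary with the JDK build installed):

    $ java -version
    java version "1.8.0_161"   # should report the JDK unpacked above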

2. Install ssh and configure passwordless login

  • A purchased cloud server usually has ssh installed already; for a single-machine deployment, still check whether the ssh service is present

    ps -e | grep ssh      # check for running ssh processes
    systemctl status ssh  # check the ssh service status
  • Install ssh

    (1) Check whether the ssh service is installed by running:
    ssh localhost
    ssh: connect to host localhost port 22: Connection refused
    
    (2) Output like the above means ssh is not installed yet; install it with apt:
    sudo apt-get install openssh-server
    
    (3)Start the service:
    sudo /etc/init.d/ssh start  
  • Passwordless ssh login

    cd ~
    ssh-keygen -t rsa
    cd .ssh
    # Write generated rsa public key information to authorized_keys file
    cat id_rsa.pub >> authorized_keys
    # Modify read and write permissions for authorized_keys file
    chmod 600 authorized_keys
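    # Verify: logging in to the local machine should no longer prompt for a password
    # (the very first connection may still ask to confirm the host key; see the note below)
    ssh localhost
    exit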

.ssh folder structure

|--- id_rsa # SSH RSA generated private key file

|--- id_rsa.pub # SSH RSA generated public key file

|--- authorized_keys # public keys allowed to log in without a password

|--- known_hosts # record of hosts previously connected to over SSH

When you ssh to a machine for the first time, you normally have to type yes to confirm adding its key to the known_hosts file, which is inconvenient for scripts. You can modify /etc/ssh/ssh_config so that hosts are added automatically. Note that this is ssh_config, not sshd_config.

Find the line # StrictHostKeyChecking ask and change it to StrictHostKeyChecking no.

This enables automatic addition of entries to known_hosts.
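A minimal sketch of the change (the Host * stanza already exists in the stock Ubuntu ssh_config; only the one option needs to be uncommented and edited):

    # /etc/ssh/ssh_config  (client config -- not sshd_config)
    Host *
        StrictHostKeyChecking no   # auto-accept new host keys into known_hosts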

3. Hadoop Installation

  • Unzip the Hadoop 3 package
  • Add environment variables to ~/.profile

    export HADOOP_HOME=/home/hadoop0/app/hadoop-3.1.3
    export PATH=$HADOOP_HOME/bin:$PATH
  • hadoop directory description

    |--- bin # Hadoop client commands

    |--- etc/hadoop # configuration file directory

    |--- sbin # scripts for starting/stopping Hadoop processes (server side)

    |--- share # libraries and bundled examples (e.g. share/hadoop/mapreduce)

  • Modify the configuration files

    (1) etc/hadoop/hadoop-env.sh
    # Add:
    export JAVA_HOME=/software/java/jdk1.8.0_161
        
    (2) etc/hadoop/core-site.xml
    # hadoop0 is the hostname mapped in the local hosts file (see Notes below)
    <configuration>
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://hadoop0:9000</value>
            </property>
    </configuration>
    
    # Remarks
    If hadoop0 maps to 127.0.0.1, a remote hadoop-client cannot connect, because the NameNode only listens on the loopback address.
    
    Changing the value to hdfs://0.0.0.0:9000 lets hadoop-client connections create folders, but reads and writes still fail with:
    Exception: There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    
    Stack Overflow suggests using the machine's local IP instead, but on this Huawei Cloud server the problem remains, probably because of how the cloud forwards the public IP; to be investigated when there is time.
    (The exception above does not occur for a purely single-machine deployment, so it is likely related to the cloud server's IP forwarding.)
    
    (3) etc/hadoop/hdfs-site.xml
    # Add:
    <configuration>
        <property>
                <!-- number of block replicas -->
                <name>dfs.replication</name>
                <value>1</value>
        </property>

        <property>
                <!-- where HDFS stores its data; the default is under the system tmp folder,
                     which may be wiped on reboot, so point it at a persistent location -->
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop0/data/tmp</value>
        </property>
    </configuration>
    
    (4) Modify the workers file
    # Add the worker's IP or mapped hostname
    hadoop0
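After editing the configuration, a quick sanity check that Hadoop picks up the values (this assumes $HADOOP_HOME/bin is on the PATH as configured above):

    hadoop version                          # confirms the environment variables work
    hdfs getconf -confKey fs.defaultFS      # should print hdfs://hadoop0:9000
    hdfs getconf -confKey dfs.replication   # should print 1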

4. Startup and Verification

  • Format the file system before starting Hadoop for the first time

    # Format
    hdfs namenode -format
  • Start

    # Start dfs
    sbin/start-dfs.sh
    # startup logs are written to: logs/hadoop-hadoop0-namenode-xxx.log
    # Stop the cluster
    sbin/stop-dfs.sh
    # Start/stop a single daemon process
    sbin/hadoop-daemon.sh start|stop|status xxx
    xxx can be:
        namenode
        secondarynamenode
        datanode

    netstat -ntlp   # check which ports the daemons are listening on
  • Verification

    (1) On the Linux command line, jps should show
    NameNode, DataNode, SecondaryNameNode

    (2) Verify the NameNode web UI
    http://ip:9870  (watch for firewall issues: sudo ufw allow 9870, or systemctl stop firewalld)

    (3) Two ways to permanently disable/enable the firewall
    systemctl disable firewalld
    systemctl enable firewalld
    chkconfig iptables off
    chkconfig iptables on

Note: chkconfig manages SysV-init services (older CentOS/RHEL), while systemctl manages systemd services (Ubuntu 18.04 and other modern distributions).
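One more check worth running (assuming the daemons above are up): the dfsadmin report should list one live datanode.

    hdfs dfsadmin -report   # summary of capacity and live datanodes (expect 1)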

5. Common Hadoop command line operations

  • Common File System Operations: View, Store, Move, Delete

    hadoop fs -ls /         # list the HDFS root directory
        -cp src dest        # copy within HDFS
        -getmerge file1 file2 localdst   # merge HDFS files into one local file
        -get                # download from HDFS to local
        -put                # upload from local to HDFS
        ....
  • The difference between -cat and -text: -text detects and decodes compressed or binary formats before printing, while -cat dumps the raw bytes, so -cat output on such files looks scrambled (see the example below)
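A minimal illustration of the difference (the local file and the /demo HDFS path are made up for this example):

    echo "hello hadoop" > /tmp/words.txt
    gzip /tmp/words.txt                    # produces /tmp/words.txt.gz
    hadoop fs -mkdir -p /demo
    hadoop fs -put /tmp/words.txt.gz /demo/
    hadoop fs -cat /demo/words.txt.gz      # raw gzip bytes, looks scrambled
    hadoop fs -text /demo/words.txt.gz     # decodes the gzip codec: hello hadoop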




Notes and Linux commands used during installation

  1. Linux hosts file modification

    # Edit with vi /etc/hosts; add a mapping for the hostname hadoop0
    127.0.0.1       localhost
    127.0.0.1       hadoop0 
    
    uname -a # Get system information
  2. Some Linux commands

    • ls command

      ls -a   # list all files, including hidden ones
      ls -la  # long-format listing of all files, including hidden ones
      ll -h   # show sizes in human-readable units (K, M, ...); ll is a common alias for ls -l
      env  # View the system's current environment variables
    • tar command

      tar -zxvf jdk*.tar.gz -C ~/app # Unzip file to specified directory
      tar -czvf abc.tar.gz abc/   # create a gzip-compressed archive of the abc/ directory
      
      # Options:
      -c  # create an archive
      -x  # extract an archive
      -t  # list the files in an archive
      -z  # compress/decompress with gzip
      -j  # compress/decompress with bzip2
      -v  # verbose output
      -f  # specify the archive file name
      -P  # do not strip leading '/' (archive with absolute paths)
      -p  # preserve original permissions and attributes
      -C  # Specify the directory to unzip to
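      # For example, a quick look at an archive's contents without extracting it:
      tar -tzvf jdk-8u161-linux-x64.tar.gz | head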
  3. Modify the ssh port

    (1) Modify the ssh port
    The default SSH port is 22, configured in /etc/ssh/sshd_config. Add a second Port line so both are listed:
    Port 22
    Port 800
    Edit the firewall configuration to allow ports 22 and 800, then restart the service:
    sudo /etc/init.d/ssh restart
    This way sshd listens on ports 22 and 800 at the same time.
    
    (2) Verify the result
    a. Use ssh root@localhost -p 800

    b. or use systemctl status ssh
    which should show:
    Server listening on 0.0.0.0 port 800.
    Server listening on :: port 800.
    Server listening on 0.0.0.0 port 22.
    Server listening on :: port 22.

    Once the connection on the new port works, edit sshd_config again and remove the Port 22 line.
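For reference, opening the extra port with ufw (assuming ufw is the firewall in use, as in the verification section above) might look like:

    sudo ufw allow 800/tcp   # open the new ssh port
    sudo ufw allow 22/tcp    # keep the default port open until the change is verified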

Posted by cbn_noodles on Wed, 18 Mar 2020 19:38:36 -0700