**System Configuration**
**Specification:** 1 vCPU | 2 GB | s6.medium.2
**Image:** Ubuntu 18.04 server 64-bit
**User:** Create a hadoop0 user on Ubuntu
Preparatory software:
1. Hadoop installation package (the CDH distribution from the Cloudera site is recommended)
2. Java 1.8+
3. ssh
1. Install Java
- First download the Linux version of the JDK: jdk-8u161-linux-x64.tar.gz
- Unzip the installation package

  ```bash
  tar -zxvf jdk-8u161-linux-x64.tar.gz -C unzipPath   # unzipPath is the target directory
  ```
- Configure environment variables in /etc/profile or ~/.bash_profile

  ```bash
  # set java environment
  JAVA_HOME=/usr/local/java/jdk1.8.0_161
  JRE_HOME=$JAVA_HOME/jre
  CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib/rt.jar
  PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
  export JAVA_HOME JRE_HOME CLASS_PATH PATH
  ```
- Apply the configuration: `source /etc/profile`
- Verify with `java -version` (a quick sanity check follows below)
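  A minimal sanity check after sourcing the profile; the expected values in the comments assume the paths configured above.

  ```bash
  java -version        # should report version 1.8.0_161
  echo $JAVA_HOME      # should print /usr/local/java/jdk1.8.0_161
  which java           # normally resolves to a java binary on the PATH set above
  ```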
2. Install ssh and configure passwordless login
- On a purchased cloud server, ssh is usually already installed; for a single-machine deployment, first check whether the ssh service is present:

  ```bash
  ps -e | grep ssh        # look for ssh processes
  systemctl status ssh    # check the ssh service status
  ```
- Install ssh

  (1) Check whether the ssh service is installed:

  ```bash
  ssh localhost
  # ssh: connect to host localhost port 22: Connection refused
  ```

  (2) Output like the above means ssh is not installed yet; install it with apt:

  ```bash
  apt-get install openssh-server
  ```

  (3) Start the service:

  ```bash
  sudo /etc/init.d/ssh start
  ```
- Passwordless ssh login

  ```bash
  cd ~
  ssh-keygen -t rsa
  cd .ssh
  # Append the generated RSA public key to the authorized_keys file
  cat id_rsa.pub >> authorized_keys
  # Restrict read/write permissions on the authorized_keys file
  chmod 600 authorized_keys
  ```
  .ssh folder structure:

  ```
  |--- id_rsa          # private key generated by ssh-keygen (RSA)
  |--- id_rsa.pub      # public key generated by ssh-keygen (RSA)
  |--- authorized_keys # public keys permitted for passwordless login
  |--- known_hosts     # record of hosts previously connected to via ssh
  ```
  When ssh connects to a host for the first time, it asks you to type yes before adding the host to the known_hosts file, which is inconvenient for scripts. You can have hosts added automatically by editing /etc/ssh/ssh_config (note: ssh_config, not sshd_config): find `#   StrictHostKeyChecking ask` and change it to `StrictHostKeyChecking no`. New hosts are then added to known_hosts automatically. A quick check of the passwordless setup follows below.
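  A minimal check of the passwordless setup, assuming the ssh service is running and the key was appended to authorized_keys as shown above; the remote-host step is only a hypothetical extension for multi-node setups.

  ```bash
  ssh localhost        # should log in without prompting for a password
  exit
  # Optional, for a later multi-node setup: copy the public key to another machine
  # ssh-copy-id user@remote-host
  ```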
3. Hadoop Installation
- Unzip the Hadoop 3 package
- Add environment variables to ~/.profile

  ```bash
  export HADOOP_HOME=/home/hadoop0/app/hadoop-3.1.3
  export PATH=$HADOOP_HOME/bin:$PATH
  ```
- Hadoop directory layout

  ```
  |--- bin         # Hadoop client commands
  |--- etc/hadoop  # configuration file directory
  |--- sbin        # scripts that start the Hadoop daemons (server side)
  |--- share       # bundled examples (share/hadoop/mapreduce)
  ```
- Modify the configuration files

  (1) In etc/hadoop/hadoop-env.sh, add:

  ```bash
  export JAVA_HOME=/software/java/jdk1.8.0_161
  ```

  (2) In etc/hadoop/core-site.xml, add (hadoop0 is the hostname configured in the local hosts file):

  ```xml
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop0:9000</value>
    </property>
  </configuration>
  ```

  Remarks: if hadoop0 maps to 127.0.0.1, a remote hadoop-client cannot connect because the NameNode only listens locally. Changing the value to hdfs://0.0.0.0:9000 lets the hadoop-client create folders, but reads and writes fail with the exception "There are 1 datanode(s) running and 1 node(s) are excluded in this operation." Stack Overflow suggests using the local IP instead, but on a Huawei Cloud server this did not resolve the problem, possibly because of IP forwarding; it needs further investigation when there is time. (The exception above does not occur in a purely single-machine deployment, so it is probably related to the cloud server's IP forwarding.) A small check of the name resolution and listening address is sketched after this list.

  (3) In etc/hadoop/hdfs-site.xml, add:

  ```xml
  <configuration>
    <!-- Number of replicas -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <!-- File block storage location: the default is under the Linux tmp folder and may be
         lost on restart, so change the storage location -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop0/data/tmp</value>
    </property>
  </configuration>
  ```

  (4) Modify the workers file by adding the configured IP or hostname:

  ```
  hadoop0
  ```
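  A small sketch for checking what fs.defaultFS actually resolves and binds to, which is where the connection issues described in the remarks usually show up; hadoop0 and port 9000 follow the example configuration above.

  ```bash
  # What does the hostname used in fs.defaultFS resolve to?
  getent hosts hadoop0
  # After the NameNode has been started, confirm the address it is actually listening on
  netstat -ntlp | grep 9000
  ```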
4. Startup and Verification
- Format the file system before starting Hadoop for the first time

  ```bash
  # Format HDFS
  hdfs namenode -format
  ```
- Start-up

  ```bash
  # Start HDFS
  sbin/start-dfs.sh
  # Log location: logs/hadoop-hadoop0-namenode-xxx.log
  # Stop the cluster
  sbin/stop-dfs.sh
  # Start/stop/check a single daemon; xxx can be NameNode, SecondaryNameNode, or DataNode
  sbin/hadoop-daemons.sh stop|start|status xxx
  # Check listening ports
  netstat -ntlp
  ```
- Verification (a minimal sketch follows below)

  (1) Run `jps` on the Linux command line; NameNode, DataNode, and SecondaryNameNode should appear.

  (2) Verify the NameNode web UI at http://ip:9870. Watch out for the firewall: `sudo ufw allow 9870`, or `systemctl stop firewalld`.

  (3) Two ways to permanently disable/enable the firewall:

  ```bash
  systemctl disable firewalld
  systemctl enable firewalld
  chkconfig iptables off
  chkconfig iptables on
  ```
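  A minimal verification sketch on Ubuntu, assuming the ufw firewall is in use; the process IDs in the comments are illustrative.

  ```bash
  jps
  # Expected output (PIDs will differ):
  # 12345 NameNode
  # 12401 DataNode
  # 12587 SecondaryNameNode
  # 12834 Jps
  sudo ufw allow 9870   # open the web UI port, then browse to http://<server-ip>:9870
  ```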
Note the difference between chkconfig and systemctl: chkconfig manages SysV init services, while systemctl manages systemd units; on Ubuntu 18.04, systemctl (and ufw) is the relevant tool.
5. Common Hadoop command line operations
- Common file system operations: view, store, move, delete

  ```bash
  hadoop fs -ls /                            # list the HDFS root directory
  hadoop fs -cp src dest                     # copy
  hadoop fs -getmerge file1 file2 localdst   # merge files into a local destination
  hadoop fs -get                             # download from HDFS
  hadoop fs -put                             # upload (both local and HDFS paths work)
  ....
  ```
- The difference between -cat and -text: -text decodes/decompresses the file (for example compressed or sequence files) before printing, while -cat prints the raw bytes, so -cat output is garbled for such files. A short round-trip example follows below.
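  A round-trip sketch of the commands above; the file names are illustrative and it assumes HDFS is running as configured earlier.

  ```bash
  echo "hello hadoop" > hello.txt
  hadoop fs -put hello.txt /                   # upload
  hadoop fs -ls /                              # list the root directory
  hadoop fs -cat /hello.txt                    # plain text, so -cat is fine here
  gzip hello.txt
  hadoop fs -put hello.txt.gz /
  hadoop fs -text /hello.txt.gz                # -text decompresses before printing
  hadoop fs -cat /hello.txt.gz                 # -cat prints raw gzip bytes (garbled)
  hadoop fs -get /hello.txt ./hello_copy.txt   # download
  ```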
Notes and Linux commands used during installation
- Linux hosts file modification

  Open /etc/hosts with vi and add the hostname mapping for hadoop0:

  ```
  127.0.0.1 localhost
  127.0.0.1 hadoop0
  ```

  ```bash
  uname -a   # show system information
  ```
- Some Linux commands
- ls command

  ```bash
  ls -a    # list all files, including hidden ones
  ls -la   # long listing of all files, including hidden ones
  ll -h    # long listing with human-readable sizes (K, M, ...)
  env      # show the current environment variables
  ```
- tar command (a short round-trip example follows below)

  ```bash
  tar -zxvf jdk*.tar.gz -C ~/app   # unzip the file to the specified directory
  tar -czvf xxx.tar.gz abc/        # package and compress files
  ```

  Options:

  ```
  -c   create an archive
  -x   extract an archive
  -t   list the files in an archive
  -z   compress or decompress with gzip
  -j   compress or decompress with bzip2
  -v   show detailed progress
  -f   target archive file name
  -p   keep original permissions and attributes
  -P   use absolute paths (do not strip the leading /)
  -C   specify the directory to extract into
  ```
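  A short round-trip using the flags above; the archive name and paths are illustrative.

  ```bash
  tar -czvf app_backup.tar.gz ~/app    # create a gzip-compressed archive
  tar -tzvf app_backup.tar.gz          # list the contents without extracting
  tar -zxvf app_backup.tar.gz -C /tmp  # extract into /tmp
  ```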
- Modify the ssh port
  (1) Modify the ssh port. The default SSH port is 22, configured in /etc/ssh/sshd_config:

  ```
  Port 22
  Port 800
  ```

  Edit the firewall configuration to allow ports 22 and 800, then restart the service:

  ```bash
  sudo /etc/init.d/ssh restart
  ```

  ssh will then listen on ports 22 and 800 at the same time.

  (2) Verify the result:

  a. `ssh root@localhost -p 800`

  b. or check `systemctl status ssh`, which should show:

  ```
  Server listening on 0.0.0.0 port 800.
  Server listening on :: port 800.
  Server listening on 0.0.0.0 port 22.
  Server listening on :: port 22.
  ```

  Once the connection on the new port succeeds, edit sshd_config again and delete the Port 22 line. A firewall sketch for this change follows below.
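  A sketch of the accompanying firewall commands, assuming Ubuntu with ufw; port 800 follows the example above.

  ```bash
  sudo grep '^Port' /etc/ssh/sshd_config   # confirm both Port 22 and Port 800 are configured
  sudo ufw allow 22/tcp
  sudo ufw allow 800/tcp
  sudo /etc/init.d/ssh restart
  ssh root@localhost -p 800                # test the new port before removing Port 22
  ```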