Hadoop: Flink on YARN service configuration and settings

Keywords: Hadoop flink

I originally planned to install and configure the Flink service directly in Ambari to make it easier to manage. However, the Flink integration in Ambari turned out to have many problems and was inconvenient to manage (maybe I just didn't find the right method), so I decided to configure the service separately.

Download two files

Flink-1.10.1 file: https://archive.apache.org/dist/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz

Hadoop dependency package: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-10.0/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar

First, unpack the Flink package into the /opt directory (another directory works too)

cd /tmp
tar -zxvf flink-1.10.1-bin-scala_2.11.tgz -C /opt/

Put the Hadoop dependency package in the /opt/flink-1.10.1/lib directory

I moved the file over directly with xftp, so that step is omitted here
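
If the jar is already on the node (for example downloaded to /tmp), the copy can also be done on the command line. A minimal sketch, where install_jar is a hypothetical helper name and the /tmp location is an assumption:

```shell
# Hypothetical helper: copy a dependency jar into Flink's lib directory.
# Usage: install_jar <path-to-jar> <flink-home>
install_jar() {
  jar_path="$1"
  flink_home="$2"
  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path" >&2
    return 1
  fi
  if [ ! -d "$flink_home/lib" ]; then
    echo "no lib directory under: $flink_home" >&2
    return 1
  fi
  cp "$jar_path" "$flink_home/lib/"
}

# Example, assuming the jar was downloaded to /tmp:
# install_jar /tmp/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar /opt/flink-1.10.1
```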

Modify the maximum number of ApplicationMaster attempts in YARN

Method 1: modify it directly in Ambari

Under YARN -> Configs -> Advanced, find Advanced yarn-site and change the value of yarn.resourcemanager.am.max-attempts to 4

Method 2: modify the yarn-site.xml file and add or modify the following contents

<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>4</value>
</property>

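To check which value ended up in the file, the property can be read back with grep and sed. This is only a sketch: get_xml_property is a hypothetical helper, it assumes the value sits on the same line as the name or on the line right after it (as in the snippet above), and the /etc/hadoop/conf path in the example is an assumption about the installation.

```shell
# Print the <value> that follows <name>NAME</name> in a Hadoop XML file.
# Assumes the value is on the same line as the name or on the next line.
get_xml_property() {
  xml_file="$1"
  prop_name="$2"
  grep -A1 -F "<name>$prop_name</name>" "$xml_file" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Example (the path is an assumption, adjust it to your installation):
# get_xml_property /etc/hadoop/conf/yarn-site.xml yarn.resourcemanager.am.max-attempts
```
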
Modify the conf/flink-conf.yaml file

# Configure the host of the master node
jobmanager.rpc.address: node-00
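
The same change can be scripted instead of edited by hand. A minimal sketch, where set_conf_key is a hypothetical helper that removes any existing line for the key and appends the new setting at the end of the file:

```shell
# Hypothetical helper: set "key: value" in a flink-conf.yaml style file.
# Usage: set_conf_key <file> <key> <value>
set_conf_key() {
  conf_file="$1"
  conf_key="$2"
  conf_value="$3"
  conf_tmp="$(mktemp)"
  # Keep every line that does not start with "<key>:".
  awk -v k="$conf_key" 'index($0, k ":") != 1' "$conf_file" > "$conf_tmp"
  # Append the new setting.
  echo "$conf_key: $conf_value" >> "$conf_tmp"
  # Write back in place so the original file permissions are kept.
  cat "$conf_tmp" > "$conf_file"
  rm -f "$conf_tmp"
}

# Example:
# set_conf_key /opt/flink-1.10.1/conf/flink-conf.yaml jobmanager.rpc.address node-00
```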

Modify the conf/masters file

# Change the host to the host of the master node
node-00:8081

Modify the conf/slaves file

# Add the hosts of all compute nodes
node-00
node-01
node-02
node-03

Distribute the entire Flink folder to all the other nodes

scp -r /opt/flink-1.10.1 node-01:/opt
scp -r /opt/flink-1.10.1 node-02:/opt
scp -r /opt/flink-1.10.1 node-03:/opt
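
With more nodes, the scp commands can be generated from the conf/slaves file instead of typed one by one. A sketch under the assumption that the file holds one host per line; gen_scp_commands is a hypothetical helper that only prints the commands, so they can be reviewed before running them:

```shell
# Print one scp command per host in a slaves-style file, skipping the master.
# Usage: gen_scp_commands <slaves-file> <flink-dir> <master-host>
gen_scp_commands() {
  slaves_file="$1"
  flink_dir="$2"
  master_host="$3"
  # Drop comment lines and blank lines, then emit one command per host.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$slaves_file" |
    while read -r node; do
      [ "$node" = "$master_host" ] && continue
      echo "scp -r $flink_dir $node:/opt"
    done
}

# Example: review the output first, then pipe it to sh to run the copies.
# gen_scp_commands /opt/flink-1.10.1/conf/slaves /opt/flink-1.10.1 node-00
# gen_scp_commands /opt/flink-1.10.1/conf/slaves /opt/flink-1.10.1 node-00 | sh
```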

Start the cluster

cd /opt/flink-1.10.1/bin
bash start-cluster.sh

Enter the following address in a browser to view the web interface

node-00:8081

Test Flink standalone mode

Modify the HDFS permissions so that all users can operate on files: open the HDFS management page in Ambari and go to

Configs -> Advanced -> Advanced hdfs-site -> dfs.permissions.enabled

Set the value to false

After the change, the components on all nodes must be restarted for it to take effect. The HDFS management page shows the prompt "Restart Required: 10 Components on 4 Hosts"; select Restart -> Restart All Affected

Then wait for the restart to finish

Create the test directories

hdfs dfs -mkdir /test
hdfs dfs -mkdir /test/input
hdfs dfs -mkdir /test/output

Upload the text file for the word-frequency test to the input directory (sample files can be found on the Internet)

hdfs dfs -put /tmp/wordcount.txt /test/input
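
If no sample file is at hand, a tiny one can be generated locally before running the put command above. A minimal sketch; make_wordcount_sample is a hypothetical helper and the two lines of text are arbitrary:

```shell
# Hypothetical helper: write a small two-line text file for the word count test.
# Usage: make_wordcount_sample <path>
make_wordcount_sample() {
  sample_path="$1"
  printf '%s\n' \
    'to be or not to be' \
    'that is the question' > "$sample_path"
}

# Example:
# make_wordcount_sample /tmp/wordcount.txt
```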

Run the example job that ships with Flink

cd /opt/flink-1.10.1/bin
bash flink run ../examples/batch/WordCount.jar --input hdfs://node-00:8020/test/input/wordcount.txt --output hdfs://node-00:8020/test/output/result.txt

Pitfall

If the following or a similar error is reported

Protocol message end-group tag did not match expected tag.; Host Details : local host is: "node-00.hdp/192.168.10.100"; destination host is: "node-00":8020

the HDFS address in the command is probably wrong. Query the configured address with

hdfs getconf -confkey fs.default.name

The address printed is the real HDFS address. Use it in place of the address in the run command above
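
To avoid typing the address twice, the run command can be built from the queried value. A sketch only: build_wordcount_cmd is a hypothetical helper that prints the command rather than running it, and the example uses fs.defaultFS, the current name of the deprecated fs.default.name key.

```shell
# Print the WordCount run command for a given HDFS base URI.
# Usage: build_wordcount_cmd <hdfs-base-uri>
build_wordcount_cmd() {
  fs_uri="${1%/}"   # strip one trailing slash, if present
  echo "bash flink run ../examples/batch/WordCount.jar --input $fs_uri/test/input/wordcount.txt --output $fs_uri/test/output/result.txt"
}

# Example, run from /opt/flink-1.10.1/bin on a node with the hdfs client:
# build_wordcount_cmd "$(hdfs getconf -confKey fs.defaultFS)"
```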

View the run results

hdfs dfs -cat /test/output/result.txt
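
To sanity-check the result, the counts can be reproduced locally with standard tools. This is a rough sketch for plain ASCII text, assuming the example job lowercases the input, splits on non-word characters, and writes one "word count" pair per line; local_wordcount is a hypothetical helper, and its output is sorted by word, which the Flink output may not be.

```shell
# Count word frequencies in a local text file, one "word count" pair per line.
# Usage: local_wordcount <file>
local_wordcount() {
  # Turn every run of non-word characters into a single newline.
  tr -cs '[:alnum:]_' '\n' < "$1" |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort |
    uniq -c |
    # uniq -c prints "count word"; swap the columns and drop empty tokens.
    awk 'NF == 2 { print $2, $1 }'
}

# Example: compare with the sorted job output.
# local_wordcount /tmp/wordcount.txt
# hdfs dfs -cat /test/output/result.txt | LC_ALL=C sort
```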

You can also open the web interface at <master node IP address>:8081 and go to

Jobs -> Completed Jobs

to view the completed jobs

Test Flink on YARN mode

First delete the result file of the previous run

hdfs dfs -rm /test/output/result.txt

Everything else stays the same; change the run command to

bash flink run -m yarn-cluster ../examples/batch/WordCount.jar --input hdfs://node-00:8020/test/input/wordcount.txt --output hdfs://node-00:8020/test/output/result.txt

The job status is not visible in the Flink management interface, because the job is now managed by YARN. Open the YARN web interface to see it

http://<IP address of the node where the YARN service is installed>:8088

Pitfall

I searched online for a long time for a fix to this problem. Thanks to this post for the correct solution: https://blog.csdn.net/qq_41398614/article/details/107635391

If you encounter the following or a similar error

java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties

it is probably because the Jersey classes cannot be found. Install the missing dependencies by downloading these three packages

https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
https://repo1.maven.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
https://repo1.maven.org/maven2/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar

Put them in the /opt/flink-1.10.1/lib folder and distribute them to the other nodes as well. Restart Flink. If it still fails after restarting Flink, restart the whole cluster
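
The three download addresses follow the same pattern, so they can be generated and fed to wget. A minimal sketch; jersey_urls is a hypothetical helper, and the version defaults to 1.9 as in the links above:

```shell
# Print the Maven Central URLs of the three Jersey jars for one version.
# Usage: jersey_urls [version]
jersey_urls() {
  jersey_version="${1:-1.9}"
  for artifact in jersey-core jersey-server jersey-client; do
    echo "https://repo1.maven.org/maven2/com/sun/jersey/$artifact/$jersey_version/$artifact-$jersey_version.jar"
  done
}

# Example: download all three into Flink's lib directory.
# jersey_urls | while read -r url; do wget -P /opt/flink-1.10.1/lib "$url"; done
```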

Posted by sangamon on Tue, 26 Oct 2021 18:36:19 -0700