Previously, I planned to install and configure the Flink service directly through Ambari to make it easier to manage, but the Flink integration in Ambari turned out to have many problems and was awkward to administer (perhaps I just never found the correct approach), so I decided to set up the service separately.
Download two files
Flink 1.10.1 binary: https://archive.apache.org/dist/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz
Shaded Hadoop dependency jar: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-10.0/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar
First unzip the Flink package to the /opt directory (any other directory also works):
cd /tmp
tar -zxvf flink-1.10.1-bin-scala_2.11.tgz -C /opt/
Then put the Hadoop dependency jar in the /opt/flink-1.10.1/lib directory. Here I copied it over directly with Xftp, so that step is omitted.
Increase the maximum number of YARN application attempts
Method 1: modify it directly in Ambari
Find Advanced yarn-site under YARN -> Configs -> Advanced, and change the value of yarn.resourcemanager.am.max-attempts to 4.
Method 2: modify the yarn-site.xml file and add or modify the following contents
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>
Modify the conf/flink-conf.yaml file
# Configure the host of the master node
jobmanager.rpc.address: node-00
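Besides jobmanager.rpc.address, a few other flink-conf.yaml keys are commonly adjusted for a small standalone cluster. The values below are illustrative assumptions for a setup like this one, not settings taken from the original cluster; tune them to your hardware:

```yaml
# Host of the JobManager (master node)
jobmanager.rpc.address: node-00

# Illustrative values (assumptions, not from the original setup):
jobmanager.heap.size: 1024m              # JobManager JVM heap
taskmanager.memory.process.size: 1728m   # total memory per TaskManager process
taskmanager.numberOfTaskSlots: 2         # parallel slots per TaskManager
parallelism.default: 2                   # default job parallelism
```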
Modify the conf/masters file
# Change the host to the host of the master node
node-00:8081
Modify the conf/slaves file
# Add the hosts of all compute nodes
node-00
node-01
node-02
node-03
Distribute the entire flink folder to all child nodes
scp -r /opt/flink-1.10.1 node-01:/opt
scp -r /opt/flink-1.10.1 node-02:/opt
scp -r /opt/flink-1.10.1 node-03:/opt
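The same distribution step can be written as a loop, which scales better if more nodes are added later. This is a sketch assuming passwordless SSH to each node; the DRY_RUN=echo guard just prints each command so you can check it before running for real:

```shell
# Distribute the Flink directory to every child node.
# DRY_RUN=echo prints the commands instead of executing them;
# set DRY_RUN= (empty) on a real cluster to actually copy.
DRY_RUN=echo
for node in node-01 node-02 node-03; do
  ${DRY_RUN} scp -r /opt/flink-1.10.1 "${node}:/opt"
done
```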
Start the cluster
cd /opt/flink-1.10.1/bin
bash start-cluster.sh
Enter the corresponding web address in a browser to view the interface:
node-00:8081
Test Flink standalone mode
Modify the HDFS permissions so that all users can operate on files: open the HDFS management page in Ambari, go to Configs -> Advanced -> Advanced hdfs-site -> dfs.permissions.enabled, and change the value to false.
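For reference, the same setting expressed directly in hdfs-site.xml would look like the fragment below. This is a sketch for clusters not managed through the UI; note that on an Ambari-managed cluster, Ambari may overwrite manual edits, so changing it in the UI as described above is the safer route:

```xml
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
```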
After the modification is complete, the components on all nodes must be restarted for it to take effect. The HDFS management page pops up "Restart Required: 10 Components on 4 Hosts"; select Restart -> Restart All Affected, then wait for the restart to finish.
Create the test file directories
hdfs dfs -mkdir /test
hdfs dfs -mkdir /test/input
hdfs dfs -mkdir /test/output
Upload the test file for the word count to the input directory (any text file found online will do):
hdfs dfs -put /tmp/wordcount.txt /test/input
Run the example job that ships with Flink:
cd /opt/flink-1.10.1/bin
bash flink run ../examples/batch/WordCount.jar --input hdfs://node-00:8020/test/input/wordcount.txt --output hdfs://node-00:8020/test/output/result.txt
Pitfall
If the following or a similar error is reported:
Protocol message end-group tag did not match expected tag.; Host Details : local host is: "node-00.hdp/192.168.10.100"; destination host is: "node-00":8020
The HDFS address in the command is most likely wrong. Check the real one with:
hdfs getconf -confkey fs.default.name
The address it prints is the real HDFS address; substitute it into the run command above.
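One way to avoid hard-coding the address is to capture the getconf output in a shell variable and build both paths from it. This is a sketch; the hdfs://node-00.hdp:8020 value below is a made-up example standing in for whatever your cluster actually reports:

```shell
# On a real cluster you would capture the address with:
#   FS_DEFAULT=$(hdfs getconf -confkey fs.default.name)
# Here an example value is hard-coded so the path construction can be shown.
FS_DEFAULT="hdfs://node-00.hdp:8020"   # example only, not from the source
INPUT="${FS_DEFAULT}/test/input/wordcount.txt"
OUTPUT="${FS_DEFAULT}/test/output/result.txt"
echo "$INPUT"
echo "$OUTPUT"
# The job would then be submitted as:
#   bash flink run ../examples/batch/WordCount.jar --input "$INPUT" --output "$OUTPUT"
```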
View the run results:
hdfs dfs -cat /test/output/result.txt
You can also open the web interface at the master node's IP address on port 8081 and view the completed job under Jobs -> Completed Jobs.
Test Flink on YARN mode
First clear the result file from the previous run:
hdfs dfs -rm /test/output/result.txt
Everything else stays the same; the run command becomes:
bash flink run -m yarn-cluster ../examples/batch/WordCount.jar --input hdfs://node-00:8020/test/input/wordcount.txt --output hdfs://node-00:8020/test/output/result.txt
The job status is no longer visible on the Flink management interface, because the job is now managed by YARN; check it on the YARN web interface instead:
http://<IP of the node where the YARN service is installed>:8088
Pitfall
I searched online for this problem for a long time; thanks to this post for providing the correct solution: https://blog.csdn.net/qq_41398614/article/details/107635391
If you encounter the following or a similar error:
java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
it is probably because the Jersey classes cannot be found. To install the missing dependency, download the following three jars:
https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar
https://repo1.maven.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
https://repo1.maven.org/maven2/com/sun/jersey/jersey-client/1.9/jersey-client-1.9.jar
Put them in the /opt/flink-1.10.1/lib folder, distribute them to the child nodes as well, and restart Flink. If restarting Flink alone does not fix it, restart the whole cluster.
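The download and distribution of the three jars can be scripted as below. This is a sketch assuming wget is available and passwordless SSH to each node; as in the earlier distribution step, DRY_RUN=echo prints the commands so you can inspect them before running for real:

```shell
# Fetch the three Jersey 1.9 jars into Flink's lib directory, then copy
# them to the child nodes. DRY_RUN=echo prints each command instead of
# executing it; set DRY_RUN= (empty) on a real cluster.
DRY_RUN=echo
BASE=https://repo1.maven.org/maven2/com/sun/jersey
for jar in jersey-core jersey-server jersey-client; do
  ${DRY_RUN} wget "${BASE}/${jar}/1.9/${jar}-1.9.jar" -P /opt/flink-1.10.1/lib/
done
for node in node-01 node-02 node-03; do
  ${DRY_RUN} scp "/opt/flink-1.10.1/lib/jersey-*.jar" "${node}:/opt/flink-1.10.1/lib/"
done
```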