Experimental environment
JDK 1.8
IDE IntelliJ IDEA
Flink 1.8.1
Experimental content
Create a simple Flink demo that counts how often each word occurs in a data stream.
Experimental steps
First, create a Maven project. The relevant parts of the pom.xml file are as follows:
<properties>
    <flink.version>1.8.1</flink.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-wikiedits_2.11</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>8</source>
                <target>8</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <version>2.1.4.RELEASE</version>
            <configuration>
                <!-- must match the package and class created in the next step -->
                <mainClass>com.vincent.StreamingJob</mainClass>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>repackage</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <skip>true</skip>
            </configuration>
        </plugin>
    </plugins>
</build>
Create a package com.vincent and, inside it, a class StreamingJob (file StreamingJob.java):
package com.vincent;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
    }
}
The first step in a Flink program is to create a StreamExecutionEnvironment. The environment is used to set execution parameters and to create data sources that read from external systems.
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Next, create the external data source. Here the source is a socket: running nc -l 9000 on the server opens listening port 9000, and every line typed into nc is sent to the client that connects to it.
DataStream<String> text = env.socketTextStream("192.168.152.45", 9000);
This adds a text stream as the data source. Once the DataStream exists, its records can be transformed and analyzed:
DataStream<Tuple2<String, Integer>> dataStream = text
        .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] tokens = s.toLowerCase().split("\\W+");
                for (String token : tokens) {
                    if (token.length() > 0) {
                        collector.collect(new Tuple2<String, Integer>(token, 1));
                    }
                }
            }
        })
        .keyBy(0)
        .timeWindow(Time.seconds(5))
        .sum(1);
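flatMap turns each input element into zero or more output elements. Here the input is a line of text s; the function lowercases it, splits it on non-word characters, and emits one Tuple2<String, Integer> of the form (word, 1) per word through the Collector. keyBy(0) partitions the stream by field 0 of the tuple, that is, the word. timeWindow(Time.seconds(5)) groups the records of each key into tumbling windows of 5 seconds, and sum(1) adds up field 1, the count, within each window.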
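To make the flatMap, keyBy, window, and sum steps concrete, the following is a minimal plain-Java sketch of the same logic. It does not use any Flink classes: the class name WindowedWordCountSketch, the helper methods, and the hand-picked timestamps are all hypothetical, and the window arithmetic assumes non-negative millisecond timestamps with no window offset. It only illustrates what ends up in each 5-second window, not how Flink executes the job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of flatMap -> keyBy -> 5 s tumbling window -> sum. No Flink involved. */
public class WindowedWordCountSketch {

    static final long WINDOW_MILLIS = 5_000L;

    /** Same tokenization as the flatMap above: lowercase, split on non-word characters, drop empty tokens. */
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.length() > 0) {
                words.add(token);
            }
        }
        return words;
    }

    /** Start of the tumbling window containing the timestamp (assumes timestampMillis >= 0). */
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    /** Adds the words of one line, seen at the given time, to the counts of its window. */
    static void add(Map<Long, Map<String, Integer>> counts, long timestampMillis, String line) {
        Map<String, Integer> window =
                counts.computeIfAbsent(windowStart(timestampMillis), k -> new TreeMap<>());
        for (String word : tokenize(line)) {
            window.merge(word, 1, Integer::sum); // the "sum(1)" step for this key
        }
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        add(counts, 1_000L, "a b d");
        add(counts, 4_000L, "d e f");
        add(counts, 6_000L, "d"); // falls into the next window
        System.out.println(counts);
        // prints {0={a=1, b=1, d=2, e=1, f=1}, 5000={d=1}}
    }
}
```

The two occurrences of d that arrive within the first 5 seconds are summed to 2, while the d that arrives at 6 seconds is counted separately in the next window.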
So the whole code is as follows:
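package com.vincent;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // read lines of text from the socket opened by nc on the server
        DataStream<String> text = env.socketTextStream("192.168.152.45", 9000);

        // split lines into (word, 1) pairs, group by word, and sum the counts per 5-second window
        DataStream<Tuple2<String, Integer>> dataStream = text
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                        String[] tokens = s.toLowerCase().split("\\W+");
                        for (String token : tokens) {
                            if (token.length() > 0) {
                                collector.collect(new Tuple2<String, Integer>(token, 1));
                            }
                        }
                    }
                })
                .keyBy(0)
                .timeWindow(Time.seconds(5))
                .sum(1);

        dataStream.print();

        // execute program
        env.execute("Java WordCount from SocketTextStream Example");
    }
}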
Running the program
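Start nc -l 9000 on the server first; if nothing is listening on the port, the socket source cannot connect and the job fails. Then run the main method and type some text into the nc session: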
iie4bu@swarm-manager:~$ nc -l 9000
a b d d e f
The IntelliJ console then prints:
1> (b,1)
3> (a,1)
1> (f,1)
3> (d,2)
1> (e,1)
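The output shows how many times each word occurred within the 5-second window. The number before > (1>, 3>) is the index of the parallel subtask that printed the record.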
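If nc is not available on the machine that should feed the job, a small Java program can play the same role. The sketch below is a hypothetical stand-in, not part of the original setup: the class name LineServer and its serve method are made up for illustration, and it assumes a single client and the default port 9000 unless another port is passed as the first argument. Lines are terminated with an explicit \n because socketTextStream(host, port) splits its input on newline characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for "nc -l 9000": forwards each input line to the first client that connects. */
public class LineServer {

    /** Waits for one client, then copies lines from the reader to it until the reader is exhausted. */
    static void serve(ServerSocket server, BufferedReader lines) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = lines.readLine()) != null) {
                out.print(line);
                out.print('\n'); // explicit newline, independent of the platform line separator
                out.flush();     // send every line immediately
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9000;
        try (ServerSocket server = new ServerSocket(port);
             BufferedReader stdin = new BufferedReader(
                     new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            System.out.println("Listening on port " + port + "; start the Flink job, then type lines.");
            serve(server, stdin);
        }
    }
}
```

As with nc, start this program before the Flink job so that the socket source finds an open port, and point socketTextStream at the host on which it runs.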