Getting Started with Apache Flink from Scratch: Writing the Simplest Hello World

Keywords: Programming Maven Apache Java JDK

Experimental environment

JDK 1.8

IDE Intellij idea

Flink 1.8.1

Experimental content

Create a simple Flink demo that counts the number of words in a data stream.

Experimental steps

First, create a Maven project with the following pom.xml:

    <properties>
        <flink.version>1.8.1</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-wikiedits_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>2.1.4.RELEASE</version>
                <configuration>
                    <mainClass>com.vincent.StreamingJob</mainClass>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>
        </plugins>
    </build>

Create a package com.vincent and, inside it, a class StreamingJob.java:

public class StreamingJob {
    public static void main(String[] args) throws Exception {

    }
}

The first step in a Flink program is to create a StreamExecutionEnvironment. The execution environment is used to set job parameters and to create data sources that read from external systems.

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
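
The environment also exposes a few configuration knobs. For example, the default parallelism can be lowered so the console output is easier to follow; a minimal sketch (not required for this demo, and the value 1 is just an illustration):

// Optional tuning on the environment created above (illustrative only)
env.setParallelism(1); // run all operators with a single parallel instance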

Next, an external data source is created. The data comes from a socket: running nc -l 9000 on the server opens listening port 9000, and any text typed there is sent into the stream.

DataStream<String> text = env.socketTextStream("192.168.152.45", 9000);
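
If no machine with nc is at hand, the socket source can be swapped for a fixed in-memory source while testing locally; a minimal sketch (the sample line is made up):

// Alternative for local testing only: a bounded in-memory source
DataStream<String> text = env.fromElements("a b d d e f");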

Either way, the result is a text DataStream from which the data can be retrieved and analyzed.

        DataStream<Tuple2<String, Integer>> dataStream = text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                String[] tokens = s.toLowerCase().split("\\W+");

                for (String token : tokens) {
                    if (token.length() > 0) {
                        collector.collect(new Tuple2<String, Integer>(token, 1));
                    }
                }
            }
        }).keyBy(0).timeWindow(Time.seconds(5)).sum(1);

flatMap takes each input string s, splits it into words, and emits zero or more elements through the Collector<Tuple2<String, Integer>>, so one input line is flattened into many (word, 1) pairs. keyBy(0) then groups the stream by tuple field 0 (the word), timeWindow(Time.seconds(5)) gathers each key's elements into 5-second tumbling windows, and sum(1) adds up the counts held in field 1 within each window.
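
The positional keyBy(0) can also be written with an explicit KeySelector, which some readers find clearer. A sketch, assuming tokens stands for the DataStream<Tuple2<String, Integer>> produced by the flatMap above (this is an illustration, not part of the demo):

import org.apache.flink.api.java.functions.KeySelector;

// Same pipeline, but keyed with an explicit KeySelector instead of keyBy(0)
DataStream<Tuple2<String, Integer>> counts = tokens
        .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> value) {
                return value.f0; // the word is the key
            }
        })
        .timeWindow(Time.seconds(5))
        .sum(1); // sum the counts held in field 1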

So the whole code is as follows:

package com.vincent;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> text = env.socketTextStream("192.168.152.45", 9000);
        DataStream<Tuple2<String, Integer>> dataStream = text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
                // split each line into lowercase words on non-word characters
                String[] tokens = s.toLowerCase().split("\\W+");

                for (String token : tokens) {
                    if (token.length() > 0) {
                        collector.collect(new Tuple2<String, Integer>(token, 1));
                    }
                }
            }
        }).keyBy(0).timeWindow(Time.seconds(5)).sum(1);

        dataStream.print();
        // execute program
        env.execute("Java WordCount from SocketTextStream Example");
    }
}
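
As a small refinement, the hostname and port could be read from program arguments instead of being hard-coded; a minimal sketch using Flink's ParameterTool (the --host/--port names and defaults are assumptions, not part of the original demo):

import org.apache.flink.api.java.utils.ParameterTool;

// Inside main(), before creating the socket source
final ParameterTool params = ParameterTool.fromArgs(args);
final String host = params.get("host", "192.168.152.45"); // falls back to the IP used above
final int port = params.getInt("port", 9000);
DataStream<String> text = env.socketTextStream(host, port);

The job would then be started with program arguments such as --host 192.168.152.45 --port 9000.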

Result

Run the main method, then execute nc -l 9000 on the server side and enter some text:

iie4bu@swarm-manager:~$ nc -l 9000
a b d d e f

The IntelliJ IDEA console then prints:

1> (b,1)
3> (a,1)
1> (f,1)
3> (d,2)
1> (e,1)

Each word's count within the 5-second window is printed; the prefix before each tuple (for example 1> or 3>) is the index of the parallel subtask that produced it.
