Flink Asynchronous IO Learning Records

As is customary, let's start with the introduction from the official documentation.

Whenever a Flink job joins its records with data from a database, it has to query the database for each record. Database reads depend on disk IO, which is slow, so these lookups can become the performance bottleneck of the stream processing job.

Asynchronous IO turns these synchronous requests into asynchronous ones, so the waiting times of many requests overlap instead of adding up one after another.
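To make the time argument concrete, here is a minimal plain-Java sketch (no Flink involved) that simulates a slow lookup with `Thread.sleep` and compares issuing the lookups one after another with issuing them all at once through `CompletableFuture`. The class name, the `slowLookup` helper and the 10 × 50 ms numbers are illustrative assumptions, not measurements of a real database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SyncVsAsyncLookup {

    // Hypothetical stand-in for a database query: blocks the calling thread for latencyMs.
    static String slowLookup(int key, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-" + key;
    }

    // Synchronous: each request waits for the previous one, so the latencies add up.
    static long runSync(int requests, long latencyMs) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            slowLookup(i, latencyMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Asynchronous: all requests are in flight at the same time, so their waiting times overlap.
    static long runAsync(int requests, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int key = i;
                futures.add(CompletableFuture.supplyAsync(() -> slowLookup(key, latencyMs), pool));
            }
            // Wait for every response before stopping the clock.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync:  " + runSync(10, 50) + " ms");  // roughly 10 x 50 ms
        System.out.println("async: " + runAsync(10, 50) + " ms"); // roughly 1 x 50 ms
    }
}
```

The synchronous loop pays the latency once per request, while the asynchronous version pays it roughly once in total, as long as enough requests can be in flight at the same time.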

Asynchronous interaction with the database means that a single parallel function instance can handle many requests concurrently and receive the responses concurrently.

In other words, with asynchronous IO a function with a parallelism of 1 can issue multiple requests concurrently and receive multiple responses concurrently.
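As a rough illustration of "one instance, many requests in flight", the plain-Java sketch below uses a single caller thread that fires 100 simulated requests without waiting for each answer; the responses arrive through callbacks. A `Semaphore` caps the number of outstanding requests, loosely mirroring the capacity argument (the `20`) passed to `AsyncDataStream` in the job code later in this post. The class name, the simulated latency and the numbers are assumptions made up for the sketch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleCallerManyRequests {

    // Outcome of one run: every response plus the peak number of requests in flight.
    static final class Result {
        final ConcurrentLinkedQueue<String> responses = new ConcurrentLinkedQueue<>();
        final AtomicInteger maxInFlight = new AtomicInteger();
    }

    static Result run(int requests, int capacity, long latencyMs) {
        ExecutorService io = Executors.newFixedThreadPool(capacity); // simulated IO threads
        Semaphore slots = new Semaphore(capacity);                   // at most `capacity` outstanding requests
        AtomicInteger inFlight = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        Result result = new Result();
        try {
            for (int i = 0; i < requests; i++) {
                slots.acquireUninterruptibly(); // the single caller blocks only when all slots are taken
                int now = inFlight.incrementAndGet();
                result.maxInFlight.accumulateAndGet(now, Math::max);
                final int key = i;
                io.submit(() -> {
                    try {
                        Thread.sleep(latencyMs); // simulated remote call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    // "Callback": record the response, then free the slot for the next request.
                    result.responses.add("response-" + key);
                    inFlight.decrementAndGet();
                    slots.release();
                    done.countDown();
                });
            }
            done.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            io.shutdown();
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = run(100, 20, 10);
        System.out.println("responses: " + r.responses.size() + ", peak in flight: " + r.maxInFlight.get());
    }
}
```

Only one thread ever issues requests, yet up to `capacity` of them overlap; once the cap is reached the caller waits for a slot, which is a simple form of backpressure.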

Then you might ask, why not just increase the parallelism of the function?

Increasing the parallelism is in some cases possible as well, but usually comes at a very high resource cost: having many more parallel MapFunction instances means more tasks, threads, Flink-internal network connections, network connections to the database, buffers, and general internal bookkeeping overhead.

In other words, the higher the parallelism, the more tasks there are, the more threads are started and the more network connections are opened, all of which adds overhead.

Talk is cheap, show me the code

If you want to make asynchronous IO requests to the database, you need to implement three parts:

An implementation of AsyncFunction that dispatches the requests
A callback that receives the result of each request and passes it to the ResultFuture
Applying the async IO operation to a DataStream as a transformation

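// Imports for both snippets in this post (same source file)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.ExecutorUtils;
import java.util.ArrayList;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Inside the job class. SimpleSource (not shown in this post) emits Integers up to maxCount.
private static final String ORDERED = "ordered";

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Illustrative parameter values (assumptions, adjust as needed)
    final int maxCount = 100_000;        // number of records emitted by the source
    final long sleepFactor = 100L;       // upper bound of the simulated IO latency, in ms
    final float failRatio = 0.0f;        // fraction of simulated failures; a failure fails the task
    final long shutdownWaitTS = 20_000L; // grace period for the thread pool in close(), in ms
    final long timeout = 10_000L;        // timeout of a single async request, in ms
    final int taskNum = 1;               // parallelism of the async operator
    final String mode = ORDERED;         // "ordered" or "unordered"

    // Source stream
    DataStream<Integer> inputStream = env.addSource(new SimpleSource(maxCount));

    // Create the asynchronous function
    AsyncFunction<Integer, String> function =
            new SampleAsyncFunction(sleepFactor, failRatio, shutdownWaitTS);

    // Apply async IO as an operator; the last argument (20) is the capacity,
    // i.e. the maximum number of requests in flight per parallel instance
    DataStream<String> result;
    if (ORDERED.equals(mode)) {
        result = AsyncDataStream.orderedWait(inputStream, function, timeout, TimeUnit.MILLISECONDS, 20)
                .setParallelism(taskNum);
    } else {
        result = AsyncDataStream.unorderedWait(inputStream, function, timeout, TimeUnit.MILLISECONDS, 20)
                .setParallelism(taskNum);
    }

    // Count how often each key occurs
    result.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {

        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception {
            collector.collect(new Tuple2<>(s, 1));
        }
    })
            .keyBy(value -> value.f0)
            .sum(1)
            .print();

    env.execute();
}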
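The only difference between the two branches is the order of the output. Per the Flink documentation, `orderedWait` emits results in the same order as the input records, while `unorderedWait` emits each result as soon as its request completes. The plain-Java sketch below imitates that behaviour with `CompletableFuture`; it does not use Flink, and the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OrderedVsUnordered {

    // Simulated async request: later inputs answer faster, so completion order is roughly reversed.
    static CompletableFuture<String> request(int input, int total, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep((long) (total - input) * 50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "key-" + input;
        }, pool);
    }

    // Like orderedWait: results leave in input order, even if a later request finished first.
    static List<String> ordered(int total) {
        ExecutorService pool = Executors.newFixedThreadPool(total);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                futures.add(request(i, total, pool));
            }
            List<String> out = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                out.add(f.join()); // wait for the oldest pending request before emitting the next one
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    // Like unorderedWait: each result is emitted as soon as its own request completes.
    static List<String> unordered(int total) {
        ExecutorService pool = Executors.newFixedThreadPool(total);
        try {
            ConcurrentLinkedQueue<String> emitted = new ConcurrentLinkedQueue<>();
            List<CompletableFuture<Void>> done = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                done.add(request(i, total, pool).thenAccept(emitted::add));
            }
            CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
            return new ArrayList<>(emitted);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ordered:   " + ordered(4)); // prints [key-0, key-1, key-2, key-3]
        System.out.println("unordered: " + unordered(4));
    }
}
```

The ordered variant has to hold back fast results until every earlier request is done, so it usually has higher latency; the unordered variant avoids that wait when downstream operators do not care about record order.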

Next, let's look at how the asynchronous function and its callback are implemented.

The key is to override asyncInvoke so that it hands the result of each IO request to the ResultFuture.

    // Static nested class in the same job class; simulates async IO with a thread pool
    private static class SampleAsyncFunction extends RichAsyncFunction<Integer, String> {

        private transient ExecutorService executorService;

        // Upper bound (ms) of the random sleep that simulates a time-consuming async operation
        private final long sleepFactor;

        // Probability that a simulated IO request fails
        private final float failRatio;

        // How long (ms) close() waits for the thread pool to shut down gracefully
        private final long shutdownWaitTS;

        public SampleAsyncFunction(long sleepFactor, float failRatio, long shutdownWaitTS) {
            this.sleepFactor = sleepFactor;
            this.failRatio = failRatio;
            this.shutdownWaitTS = shutdownWaitTS;
        }

        // Create the thread pool that runs the simulated requests
        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            executorService = Executors.newFixedThreadPool(30);
        }

        // Release the thread pool when the operator shuts down
        @Override
        public void close() throws Exception {
            super.close();
            ExecutorUtils.gracefulShutdown(shutdownWaitTS, TimeUnit.MILLISECONDS, executorService);
        }

        // Called once per input record: dispatch the request and return immediately;
        // the submitted task acts as the callback that completes the ResultFuture
        @Override
        public void asyncInvoke(Integer integer, ResultFuture<String> resultFuture) throws Exception {
            executorService.submit(() -> {
                long sleep = (long) (ThreadLocalRandom.current().nextFloat() * sleepFactor);
                try {
                    Thread.sleep(sleep);
                    if (ThreadLocalRandom.current().nextFloat() < failRatio) {
                        resultFuture.completeExceptionally(new Exception("failed"));
                    } else {
                        resultFuture.complete(Collections.singletonList("key-" + (integer % 10)));
                    }
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and emit nothing for this record
                    Thread.currentThread().interrupt();
                    resultFuture.complete(new ArrayList<>(0));
                }
            });
        }
    }
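The sample above simulates IO with a thread pool and `Thread.sleep`. When a client already returns a `CompletableFuture`, a common pattern is to complete the `ResultFuture` in a callback attached to that future instead of blocking a pool thread. The stdlib-only sketch below shows this pattern; to keep it runnable without Flink, `SimpleResultFuture` is a hypothetical stand-in that only mimics the two `ResultFuture` methods used above (`complete` and `completeExceptionally`), and `queryDatabase` is a made-up async client.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;

class CallbackSketch {

    // Hypothetical stand-in for Flink's ResultFuture<OUT>, reduced to the two methods used in this post.
    interface SimpleResultFuture<OUT> {
        void complete(Collection<OUT> result);
        void completeExceptionally(Throwable error);
    }

    // Hypothetical async client: negative keys fail, other keys map to "key-" + (key % 10).
    static CompletableFuture<String> queryDatabase(int key) {
        return CompletableFuture.supplyAsync(() -> {
            if (key < 0) {
                throw new IllegalArgumentException("bad key " + key);
            }
            return "key-" + (key % 10);
        });
    }

    // Same shape as asyncInvoke: start the request, return at once, finish the result in a callback.
    static void asyncInvoke(int input, SimpleResultFuture<String> resultFuture) {
        queryDatabase(input).whenComplete((value, error) -> {
            if (error != null) {
                resultFuture.completeExceptionally(error);
            } else {
                resultFuture.complete(Collections.singletonList(value));
            }
        });
    }

    // Usage helper: bridges SimpleResultFuture to a CompletableFuture so the caller can wait for it.
    static CompletableFuture<Collection<String>> invokeAndCollect(int input) {
        CompletableFuture<Collection<String>> bridge = new CompletableFuture<>();
        asyncInvoke(input, new SimpleResultFuture<String>() {
            @Override
            public void complete(Collection<String> result) {
                bridge.complete(result);
            }

            @Override
            public void completeExceptionally(Throwable error) {
                bridge.completeExceptionally(error);
            }
        });
        return bridge;
    }

    public static void main(String[] args) {
        System.out.println(invokeAndCollect(13).join()); // prints [key-3]
    }
}
```

Both the success path and the failure path end in exactly one call on the result future, which is the same contract the Flink sample follows with `complete` and `completeExceptionally`.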

Xiao Kai Alex


Posted by songwind on Fri, 10 Jan 2020 19:56:00 -0800

Programmer Group
