Flink notes: Saving Flink data to Redis (custom Redis Sink)

Keywords: Redis, Kafka, Jedis, Apache Flink

This article walks through how Flink reads data from Kafka and sinks it to Redis in real time. From the following link: Flink official documents, we know that the fault tolerance guarantee when saving data to Redis is only at-least-once. So we rely on idempotent writes, where new data overwrites old data for the same key, to achieve an effectively exactly-once result.
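The core idea, in a minimal sketch (host, password, and the value are illustrative): Redis HSET simply overwrites the field on every call, so replaying the same write after a failure leaves the state unchanged.

import redis.clients.jedis.Jedis;

public class IdempotentWriteDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("192.168.204.210", 6379);
        jedis.auth("123456");
        //First write sets the field
        jedis.hset("WordCount", "hello", "5");
        //A replayed write after a failure overwrites the field with the
        //same value, so the end state is identical: effectively exactly once
        jedis.hset("WordCount", "hello", "5");
        jedis.close();
    }
}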

1. config.properties configuration file

bootstrap.server=192.168.204.210:9092,192.168.204.211:9092,192.168.204.212:9092
group.id=testGroup
auto.offset.reset=earliest
enable.auto.commit=false
topics=words

redis.host=192.168.204.210
redis.port=6379
redis.password=123456
redis.timeout=5000
redis.db=0

2. FlinkUtils tool class

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * FlinkUtils tool class (updated continuously)
 *
 * @author liuzebiao
 * @Date 2020-2-18 9:11
 */
public class FlinkUtils {

    private static StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    /**
     * Returns the Flink streaming execution environment
     * @return
     */
    public static StreamExecutionEnvironment getEnv(){
        return env;
    }

    /**
     * Flink reads data from Kafka (satisfying exactly-once)
     * @param parameters
     * @param clazz
     * @param <T>
     * @return
     * @throws IllegalAccessException
     * @throws InstantiationException
     */
    public static <T> DataStream<T> createKafkaStream(ParameterTool parameters, Class<? extends DeserializationSchema> clazz) throws IllegalAccessException, InstantiationException {

        //Set global parameters
        env.getConfig().setGlobalJobParameters(parameters);
        //1. A restart strategy only takes effect when checkpointing is enabled
        //Set the Checkpoint mode (when integrating with Kafka, be sure to set the Checkpoint mode to EXACTLY_ONCE)
        env.enableCheckpointing(parameters.getLong("checkpoint.interval",5000L),CheckpointingMode.EXACTLY_ONCE);
        //2. The default restart strategy is: fixed-delay infinite restart
        //The restart strategy set here is: on failure, restart up to 10 times, once every 20 seconds
        //(it can also be configured in flink-conf.yaml; the setting here overrides that file)
        env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(10, Time.seconds(20)));
        //If the job fails or is cancelled manually, the externalized checkpoint data is retained rather than deleted
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        /** Source: read messages from Kafka **/
        //Kafka props
        Properties properties = new Properties();
        //Specify the Broker address of Kafka
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, parameters.getRequired("bootstrap.server"));
        //Specified group ID
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, parameters.getRequired("group.id"));
        //If no offset has been recorded yet, start consuming from the earliest offset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, parameters.get("auto.offset.reset","earliest"));
        //Do not let the Kafka consumer auto-commit offsets (Flink commits them as part of checkpoints)
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, parameters.get("enable.auto.commit","false"));

        String topics = parameters.getRequired("topics");

        List<String> topicList = Arrays.asList(topics.split(","));

        FlinkKafkaConsumer<T> kafkaConsumer = new FlinkKafkaConsumer<T>(topicList, clazz.newInstance(), properties);

        return env.addSource(kafkaConsumer);
    }
}
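Since the checkpoints are externalized and retained on cancellation, it helps to keep them on durable storage. A minimal sketch, assuming an HDFS setup (the URI is hypothetical, and the matching filesystem dependency must be on the classpath); this line would go inside createKafkaStream, right after enableCheckpointing:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;

//Hypothetical HDFS URI; adjust to your cluster
env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"));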

3. Flink integrates Kafka

How Flink integrates Kafka to achieve exactly-once is not covered in this article. You can refer to: Flink integrates Kafka (to realize exactly once)

4. Custom Redis Sink

Through Apache Bahir, Flink provides additional streaming connectors (including ActiveMQ, Flume, Redis, Akka, Netty). The official link is as follows: Flink official Apache Bahir sink link

The official Redis Sink is not very convenient for our use case (see the sketch below), so here we define our own Redis Sink.
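For context, this comparison is my addition, and the API sketch should be checked against the Bahir connector version you use: with the Bahir RedisSink, the connection settings and the hash name are fixed in code via a builder and a RedisMapper, rather than read from the job's global parameters.

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;

//Connection settings are baked into a builder, not taken from config.properties
FlinkJedisPoolConfig conf = new FlinkJedisPoolConfig.Builder()
        .setHost("192.168.204.210")
        .setPort(6379)
        .setPassword("123456")
        .setDatabase(0)
        .build();

//The mapper can only extract one key and one value per record
RedisMapper<Tuple3<String, String, Integer>> mapper = new RedisMapper<Tuple3<String, String, Integer>>() {
    @Override
    public RedisCommandDescription getCommandDescription() {
        //For HSET, the hash name is fixed here as the "additional key"
        return new RedisCommandDescription(RedisCommand.HSET, "WordCount");
    }

    @Override
    public String getKeyFromData(Tuple3<String, String, Integer> data) {
        return data.f1; //the hash field (the word)
    }

    @Override
    public String getValueFromData(Tuple3<String, String, Integer> data) {
        return String.valueOf(data.f2); //the count
    }
};

//stream.addSink(new RedisSink<>(conf, mapper));

Our custom sink below instead pulls host, port, password, and database from the ParameterTool global job parameters at runtime.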

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;

/**
 * Custom Redis Sink
 *
 * @author liuzebiao
 * @Date 2020-2-18 10:26
 */
public class MyRedisSink extends RichSinkFunction<Tuple3<String, String, Integer>> {

    private transient Jedis jedis;

    @Override
    public void open(Configuration config) {

        ParameterTool parameters = (ParameterTool)getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        String host = parameters.getRequired("redis.host");
        String password = parameters.get("redis.password", "");
        Integer port = parameters.getInt("redis.port", 6379);
        Integer timeout = parameters.getInt("redis.timeout", 5000);
        Integer db = parameters.getInt("redis.db", 0);
        jedis = new Jedis(host, port, timeout);
        //Only authenticate when a password is configured; auth("") would fail on an open Redis
        if (!password.isEmpty()) {
            jedis.auth(password);
        }
        jedis.select(db);
    }


    @Override
    public void invoke(Tuple3<String, String, Integer> value, Context context) throws Exception {
        if (!jedis.isConnected()) {
            jedis.connect();
        }
        //Save: hash name = f0, field = f1, value = f2 (an idempotent overwrite)
        jedis.hset(value.f0, value.f1, String.valueOf(value.f2));
    }

    @Override
    public void close() throws Exception {
        jedis.close();
    }
}
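A single Jedis connection per sink subtask is fine for a demo. For anything heavier, a connection pool is safer; below is a sketch of the same sink built on JedisPool (this variant is my addition, not part of the original post):

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class PooledRedisSink extends RichSinkFunction<Tuple3<String, String, Integer>> {

    private transient JedisPool pool;

    @Override
    public void open(Configuration config) {
        ParameterTool parameters = (ParameterTool) getRuntimeContext().getExecutionConfig().getGlobalJobParameters();
        //The pool hands out a connection per invoke and reconnects on failure
        pool = new JedisPool(new JedisPoolConfig(),
                parameters.getRequired("redis.host"),
                parameters.getInt("redis.port", 6379),
                parameters.getInt("redis.timeout", 5000),
                parameters.get("redis.password", null),
                parameters.getInt("redis.db", 0));
    }

    @Override
    public void invoke(Tuple3<String, String, Integer> value, Context context) {
        //try-with-resources returns the connection to the pool
        try (Jedis jedis = pool.getResource()) {
            jedis.hset(value.f0, value.f1, String.valueOf(value.f2));
        }
    }

    @Override
    public void close() {
        if (pool != null) {
            pool.close();
        }
    }
}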

5. Main class

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.Arrays;

/**
 * Read words from Kafka in real time, compute word counts, and save the results to Redis
 *
 * @author liuzebiao
 * @Date 2020-2-18 9:50
 */
public class Test {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = FlinkUtils.getEnv();

        ParameterTool parameters = ParameterTool.fromPropertiesFile("path to the config.properties file");

        DataStream<String> kafkaStream = FlinkUtils.createKafkaStream(parameters, SimpleStringSchema.class);

        SingleOutputStreamOperator<Tuple3<String, String, Integer>> tuple3Operator = kafkaStream.flatMap((String lines, Collector<Tuple3<String, String, Integer>> out) -> {
            Arrays.stream(lines.split(" ")).forEach(word -> out.collect(Tuple3.of("WordCount",word, 1)));
        }).returns(Types.TUPLE(Types.STRING, Types.STRING,Types.INT));

        //Key by the word (field 1) and keep a running sum of the count (field 2)
        SingleOutputStreamOperator<Tuple3<String, String, Integer>> sum = tuple3Operator.keyBy(1).sum(2);

        sum.addSink(new MyRedisSink());

        env.execute("Test");
    }
}
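One small usage note (my suggestion, not part of the original post): instead of hard-coding the properties path, it can be passed as a program argument:

//Submit with the path as the first program argument, e.g.
//flink run -c Test job.jar /path/to/config.properties
ParameterTool parameters = ParameterTool.fromPropertiesFile(args[0]);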

6. Test results

  1. Read data from Kafka and save the calculation results to Redis in real time.
  2. If the Redis service goes down while Kafka keeps receiving messages, the data from the outage period is recalculated once the Redis service is back up, so no data is lost.
  3. Because results are saved to Redis by letting the later write overwrite the earlier one, exactly-once is guaranteed to the greatest possible extent.

The data saved to Redis is shown as a screenshot of the WordCount hash in the original post (not reproduced here).
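As a substitute, here is a small sketch of inspecting the result with Jedis (the words and counts in the comments are hypothetical; redis-cli's HGETALL WordCount would show the same data):

import java.util.Map;

import redis.clients.jedis.Jedis;

public class VerifyResult {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("192.168.204.210", 6379);
        jedis.auth("123456");
        //All words and their counts live in a single hash named "WordCount"
        Map<String, String> counts = jedis.hgetAll("WordCount");
        //Hypothetical output for the input "hello flink hello": hello=2, flink=1
        counts.forEach((word, count) -> System.out.println(word + "=" + count));
        jedis.close();
    }
}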

With that, Flink reading data from Kafka and saving the results to Redis is complete.

Every article is written with the blogger's full heart. If this one helped you, please give it a like ^

End
