[Plain-language analysis] Flink's Watermark mechanism

Keywords: flink

Contents

0x00 summary

0x01 problem

Q1. What are the common requirements / solutions in Flink stream processing applications

Q2. Watermark should be translated as "water level mark"

Q3. What is the essence of Watermark

Q4. How does Watermark solve the problem

0x02 background concept

Stream processing

Out of order

0x03 window concept in Flink

window

Window lifecycle

Keyed vs Non-Keyed Windows

Window classification

0x04 time concept in Flink

processing time

Ingestion time

Event time

Set time characteristics

0x05 Watermark

1. Window trigger conditions

2. WaterMark setting method

3. Late events

4. Examples

0x06 Flink source code

Data structure definition

How does Flink generate & process watermarks

How does Flink handle late data

0x06 reference

0x00 summary

Watermark is one of the harder concepts in Flink. This article takes a big-picture view and relies on intuition to help you sort out what a Watermark is.

0x01 problem

Several questions come up easily with Watermark:

  • What are the common processing requirements / solutions in Flink stream processing applications?
  • Should Watermark be translated as "watermark" (the stamp on an image) or as "water level mark"?
  • What is the essence of Watermark?
  • How does Watermark solve the problem?

Let's answer these questions briefly to give you the general idea; each point is described in detail later.

Q1. What are the common requirements / solutions in Flink stream processing applications

Aggregation: Flink can process every message individually, but sometimes we need to aggregate, for example to count how many users clicked on our web page in the past minute. For this, Flink introduced the concept of the window.

Window: a window collects data periodically. It cuts the incoming raw data stream into multiple buckets, and every calculation is performed within a single bucket. A window is the bridge from streaming to batch.

Problems: aggregation brings new problems, such as out-of-order and delayed data. The solution is the combination of Watermark / allowedLateness / side output.

Watermark: guards against out-of-order data and against not having received all the data within the specified time.

allowedLateness: delays the window's closing time by a further period.

Side output: the last resort. Once the specified window has been completely closed, all data that arrives too late is put into the side output stream, and the user decides how to deal with it.

To sum up:

Window -----> Watermark -----> allowedLateness -----> side output

Use a Window to cut the stream into blocks, use a Watermark to decide when to stop waiting for earlier data and trigger the window calculation, use allowedLateness to keep the window open for a further period, and finally use the side output to route the remaining late data elsewhere.

Q2. Watermark should be translated as "water level mark"

In the first article I read, Watermark was translated as "watermark" in the sense of the stamp on an image. That confused me: a name should reflect the essence of the thing, and I could not see what this kind of "watermark" had to do with it.

As I read on, I became more and more convinced that it should be translated as "water level". I looked it up and found the English term "high water mark" (the highest level reached by the sea or by a flood).

Later I saw that other articles also translated it as "water level mark". I was relieved that this would not become a second bewildering translation like the one for "socket".

Q3. What is the essence of Watermark

A Watermark estimates, based on the messages collected so far, whether messages are still outstanding. In essence, it is a timestamp. The timestamp reflects when events occurred, not when they are processed.

Flink's source code shows that its only meaningful member variable is timestamp.

public final class Watermark extends StreamElement {
  /*The watermark that signifies end-of-event-time. */
  public static final Watermark MAX_WATERMARK = new Watermark(Long.MAX_VALUE);
  /* The timestamp of the watermark in milliseconds. */
  private final long timestamp;
  /* Creates a new watermark with the given timestamp in milliseconds.*/
  public Watermark(long timestamp) {
    this.timestamp = timestamp;
  }
  /*Returns the timestamp associated with this {@link Watermark} in milliseconds.**/
  public long getTimestamp() {
    return timestamp;
  }
}

Q4. How does Watermark solve the problem

A Watermark is a way of telling Flink how late messages can be. It defines when to stop waiting for earlier data.

A Watermark can be understood as a water level mark that keeps changing. Watermarks actually travel with the data as part of the stream.

When an operator in Flink receives a Watermark, it understands that all messages earlier than this time have arrived at the computing engine; that is, it assumes that no event with a time less than the watermark will arrive any more.

This assumption is the basis for triggering window calculation. A window is closed and calculated only once the water level passes the window's end time.

0x02 background concept

Stream processing

The essence of stream processing is that each record is processed as soon as it is received.

Batch processing accumulates data up to a certain amount before processing it. That is the essential difference between the two.

By design, Flink treats all data as streams, and batch processing is only a special case of stream processing. Data is divided into bounded data and unbounded data.

  • Bounded data corresponds to batch processing, and the API is DataSet.
  • Unbounded data corresponds to stream processing, and the API is DataStream.

Out of order

What is out of order? The order in which data arrives differs from the order in which it was actually generated. There are many causes, such as delays, message backlog and retries.

An event takes time to travel from where it is generated, through the source, to the operator. In most cases the data reaches the operator in the chronological order of event generation, but network problems, back pressure and other causes can still produce out-of-order or late elements.

For example:

Some data in a data source is delayed by five seconds for some reason (for example, network or external storage problems).
That is, data generated in the first second of real time may arrive (for example, at the window processing node) after data generated in the fifth second.

There are events 1~10.
The out-of-order arrival sequence is: 2,3,4,5,1,6,8,9,10,7

0x03 window concept in Flink

window

Flink can compute on every single message, but computing that often consumes a lot of resources, and per-message processing alone cannot produce statistics. That is why both Spark and Flink provide window computation.

For example, to see the access data for the past minute or the past half hour, we need a window.

Window: windows are the key to handling unbounded streams. A window divides the stream into buckets of finite size, and computation is applied to each bucket.

start_time, end_time: a time window has a start time and an end time. The interval is closed at the start and open at the end, i.e. [start_time, end_time), and both bounds are timestamps on the window's time axis.

Window lifecycle

In short, a window is created as soon as the first element belonging to it arrives. When the time (event time or processing time) passes its end timestamp plus the user-specified allowed lateness, the window is completely removed.

For example:

With an event-time window strategy, create a non-overlapping (tumbling) window every 5 minutes and allow a lateness of 1 minute.

Suppose it is 12:00.

When the first element whose timestamp falls into the interval arrives, Flink creates a new window for the interval from 12:00 to 12:05, and removes it once the watermark passes the 12:06 timestamp.

A window has the following components:

Window assigner: determines which window or windows an element is assigned to.

Trigger: determines when a window can be calculated or purged. A trigger policy might be "when the number of elements in the window exceeds 4" or "when the watermark passes the end of the window".

Evictor: can remove elements from the window after the trigger fires, before and/or after the window function is applied.

A window also has a function, such as ProcessWindowFunction, ReduceFunction, AggregateFunction or FoldFunction. The function contains the calculation to apply to the window's contents, while the trigger specifies the condition under which the window is considered ready for the function to be applied.

Keyed vs Non-Keyed Windows

Before defining a window, the first thing to decide is whether the stream should be keyed. keyBy(...) splits the unbounded stream into logical keyed streams. If keyBy(...) is not called, the stream is not keyed.

  • For keyed streams, any attribute of the incoming event can be used as the key. A keyed stream allows the window calculation to be performed by multiple tasks in parallel, because each logical keyed stream can be processed independently of the others. All elements with the same key are sent to the same task.

  • For non-keyed streams, the original stream is not split into multiple logical streams, and all window logic is executed by a single task, i.e. with a parallelism of 1.

Window classification

Windows can be classified into Tumbling Windows (no overlap), Sliding Windows (overlapping) and Session Windows (separated by an activity gap).

Tumbling window

The tumbling window assigner assigns each element to a window of a fixed size. Tumbling windows have a fixed size and do not overlap. For example, if you specify a tumbling window with a size of 5 minutes, the current window is evaluated and a new window is started every 5 minutes.
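To make the fixed-size, non-overlapping assignment concrete, here is a minimal sketch in plain Java (no Flink dependency). It assumes windows are aligned to the epoch with no offset; the class and method names are hypothetical and are not part of Flink's API.

```java
final class TumblingWindowSketch {
    /** Start of the fixed-size window that contains the timestamp (milliseconds). */
    static long windowStart(long timestamp, long size) {
        // floorMod keeps the result correct for negative timestamps as well
        return timestamp - Math.floorMod(timestamp, size);
    }

    /** Exclusive end of the window that contains the timestamp. */
    static long windowEnd(long timestamp, long size) {
        return windowStart(timestamp, size) + size;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        long ts = 7 * 60 * 1000L + 30_000L; // 7 min 30 s after the epoch
        System.out.println(windowStart(ts, fiveMinutes)); // 300000
        System.out.println(windowEnd(ts, fiveMinutes));   // 600000
    }
}
```

Every timestamp maps to exactly one window, which is why windows of this kind never overlap.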

Sliding window

The difference between a sliding window and a tumbling window is that sliding windows can overlap, so part of the data is calculated more than once.

The sliding window assigner assigns each element to windows of a fixed size. As with the tumbling window assigner, the window size is configured by the window size parameter. A second parameter, the window slide, controls how often a new sliding window is started. Therefore, if the slide is smaller than the window size, sliding windows overlap, and an element is assigned to multiple windows.

For example, you can use windows with a size of 10 minutes and a slide of 5 minutes. A window is then produced every 5 minutes, containing the events that arrived in the last 10 minutes.
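The 10-minute / 5-minute example can be sketched in plain Java as follows. This is an illustration under the assumption that window starts are multiples of the slide (no offset); the names are hypothetical, and the time unit is whatever the caller uses (minutes in the usage below).

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {
    /** Returns the [start, end) bounds of every sliding window that contains the timestamp. */
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is not after the timestamp
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // An event at minute 7 with size 10 and slide 5 falls into [5, 15) and [0, 10)
        for (long[] w : assignWindows(7, 10, 5)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

With a slide equal to the size, the loop runs once and the result degenerates to the non-overlapping case.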

Session window

The session window assigner groups elements by sessions of activity. Unlike tumbling and sliding windows, session windows do not overlap and have no fixed start or end time. Instead, a session window closes when it has received no elements for a certain period of time.

For example, a session window assigner is configured with a session gap, which defines how long the period of inactivity must be. When this period expires, the current session closes and subsequent elements are assigned to a new session window.
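The gap rule can be illustrated with a simplified plain-Java sketch. It assumes the timestamps are already sorted in ascending order and starts a new session whenever the distance to the previous element exceeds the gap; the exact handling of the boundary case and the names used here are assumptions made for illustration, not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class SessionWindowSketch {
    /** Groups ascending timestamps into sessions separated by more than the given gap. */
    static List<List<Long>> sessions(long[] sortedTimestamps, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = null;
        long last = 0;
        for (long ts : sortedTimestamps) {
            // First element, or inactivity longer than the gap: open a new session
            if (current == null || ts - last > gap) {
                current = new ArrayList<>();
                result.add(current);
            }
            current.add(ts);
            last = ts;
        }
        return result;
    }

    public static void main(String[] args) {
        // With a gap of 10, this prints [[1, 3, 4], [20, 22], [50]]
        System.out.println(sessions(new long[] {1, 3, 4, 20, 22, 50}, 10));
    }
}
```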

0x04 time concept in Flink

Flink supports different notions of time in streaming programs: Event Time, Ingestion Time and Processing Time.

In chronological order, they occur as follows:

Event Time ----> Ingestion Time ----> Processing Time

  • Event Time is the time when an event occurs in the real world. It is usually described by the timestamp in the event.
  • Ingestion Time is the time when the data enters the Apache Flink stream processing system, that is, the time when Flink reads the data source.
  • Processing Time is the system time at which the data flows into a specific operator (when the message is computed on), that is, the current system time when the Flink program processes the event.

However, we will explain them from back to front, leaving the most important one, Event Time, for last.

processing time

Processing time is the system time at the moment data flows into a specific operator.

This system time is that of the machine executing the operation. When a streaming program runs on processing time, all time-based operations (such as time windows) use the system clock of the physical machine running the respective operator.

Processing time has the best performance and the lowest latency. However, in a distributed or asynchronous environment, processing time is not deterministic, and running the same data stream several times may produce different results. This is because it is affected by the speed at which records arrive in the system (for example, from a message queue), by the speed at which records flow between operators inside the system, and by outages (scheduled or otherwise).

Ingestion time

Ingestion time is the time at which data enters the Apache Flink framework; it is set in the source operator. Each record takes the source's current time as its timestamp, and subsequent time-based operations (such as time windows) refer to that timestamp.

Conceptually, ingestion time lies between event time and processing time. It is earlier than processing time. Ingestion time gives more predictable results than processing time because its timestamps are stable (recorded only once, at the source). The same record therefore carries the same timestamp through different window operations, whereas with processing time the same record gets a different timestamp at each window operator.

Compared with event time, an ingestion-time program cannot handle out-of-order events or late data, but it does not have to specify how watermarks are generated.

Internally, ingestion time is treated much like event time, but with automatic timestamp assignment and automatic watermark generation.

Event time

Event time is the time when an event occurs in the real world, that is, the time (local time) at which each event occurs on the device that produces it. For example, the time of a click event is the moment the user clicks on the phone or computer being used.

Event time is usually embedded in the record before it enters the Apache Flink framework, and it can be extracted from the record. In real business scenarios such as online shopping orders, event time is what most calculations are based on.

The strength of event-time processing is that it produces correct results even with out-of-order events, delayed events, historical data, and duplicate data replayed from backups or persistent logs. With event time, the progress of time depends on the data, not on any clock.

An event-time program must specify how to generate event-time Watermarks, which are the mechanism that signals the progress of event time.

Now suppose we want to produce a sorted data stream. That is, the application processes events that arrive out of order in the stream and produces a new stream with the same events, sorted by timestamp (event time).

For example:

There are events 1~10.
The out-of-order arrival sequence is: 1,2,4,5,6,3,8,9,10,7
The sequence after processing by event time is: 1,2,3,4,5,6,7,8,9,10
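One way to picture how such a sorted stream could be produced is a buffer that holds events until a watermark says nothing earlier is expected. The plain-Java sketch below is only an illustration: the class, its methods and the out-of-orderness bound of 3 are assumptions chosen for this example, and the watermark is simply the largest timestamp seen so far minus that bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class EventTimeSorterSketch {
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private final List<Long> output = new ArrayList<>();

    void onEvent(long timestamp) {
        buffer.add(timestamp);
    }

    /** Releases, in timestamp order, every buffered event that is not later than the watermark. */
    void onWatermark(long watermark) {
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            output.add(buffer.poll());
        }
    }

    /** Feeds the arrivals through the buffer, advancing the watermark after each event. */
    static List<Long> sortWithBound(long[] arrivals, long maxOutOfOrderness) {
        EventTimeSorterSketch sorter = new EventTimeSorterSketch();
        long maxSeen = Long.MIN_VALUE;
        for (long ts : arrivals) {
            sorter.onEvent(ts);
            maxSeen = Math.max(maxSeen, ts);
            sorter.onWatermark(maxSeen - maxOutOfOrderness);
        }
        sorter.onWatermark(Long.MAX_VALUE); // end of input: flush what is left
        return sorter.output;
    }

    public static void main(String[] args) {
        long[] arrivals = {1, 2, 4, 5, 6, 3, 8, 9, 10, 7};
        // Prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(sortWithBound(arrivals, 3));
    }
}
```

The output is only guaranteed to be sorted when no event is older than the bound allows; an event that arrives after the watermark has already passed it would be emitted out of order, which is exactly the late-data problem discussed later.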

To process by event time, Flink needs to know each event's timestamp, which means every record in the stream must be assigned its event timestamp. This is usually done by extracting a fixed field from each record.

Set time characteristics

The first part of a Flink DataStream program usually sets the basic time characteristic. This setting defines how the data stream sources behave (for example, whether they assign timestamps) and which of the above notions of time is used by window operations such as KeyedStream.timeWindow(Time.seconds(30)).

For example:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

0x05 Watermark

We discussed event time above. This real-world time is what our business cares about most in a real-time processing program. Ideally, event-time processing produces completely consistent and deterministic results, no matter when events arrive or in what order. In reality, however, messages do not arrive in the order they were sent, so the stream is out of order. What should we do then?

Watermark is the mechanism Apache Flink provides for handling event-time window calculation. In essence it is also a timestamp. Watermarks are used to deal with out-of-order events and delayed data, usually by combining the watermark mechanism with windows (watermarks are used to trigger window calculation).

For example, we cannot wait indefinitely for late elements. There must be a mechanism that guarantees the window is triggered for calculation after a specific time. That mechanism is the watermark. A watermark can be seen as a way of telling Flink how much messages may be delayed; it defines when to stop waiting for earlier data.

1. Window trigger conditions

As mentioned above, out-of-order data is handled with watermark + window. So when is the window triggered?

For processing based on Event Time, Flink's default trigger conditions are:

For out-of-order and normal data

  • Timestamp of watermark >= window_end_time
  • Data exists in [window_start_time, window_end_time).

For streams with too many late elements

  • Event time > timestamp of watermark

A watermark acts like a finish line. Once the watermark reaches or passes a window's window_end_time, that window starts its calculation.

In other words, we compute watermarks according to some rule that builds in a delay, giving late data a chance. Normally, that is: I will wait for late data for a while, but if it still has not arrived by then, it gets no further chance.

The watermark time can be based on the real time of the Flink system or on the Event Time carried by the data being processed.

When the Flink system time is used, there is little to watch out for under parallelism and multithreading, because everything is measured against real time.

If the Event Time carried by the data is used as the watermark time, two points need attention:

  • Because data does not arrive in order, keep track of the current maximum timestamp and use it as the watermark time
  • The parallel synchronization problem
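For the parallel synchronization point, Flink's approach is that an operator with several input channels advances its own event time to the minimum of the latest watermarks received on each channel, so a slow channel holds the others back. The plain-Java sketch below illustrates that rule only; the class and method names are hypothetical and this is not Flink's implementation.

```java
import java.util.Arrays;

final class MinWatermarkSketch {
    private final long[] channelWatermarks;
    private long current = Long.MIN_VALUE;

    MinWatermarkSketch(int channels) {
        channelWatermarks = new long[channels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /** Records a watermark from one input channel and returns the operator's current watermark. */
    long onWatermark(int channel, long watermark) {
        // A channel's watermark never moves backwards
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        // The operator can only be as far along as its slowest channel
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        current = Math.max(current, min);
        return current;
    }

    public static void main(String[] args) {
        MinWatermarkSketch op = new MinWatermarkSketch(2);
        System.out.println(op.onWatermark(0, 10)); // channel 1 has sent nothing yet: Long.MIN_VALUE
        System.out.println(op.onWatermark(1, 7));  // 7
        System.out.println(op.onWatermark(1, 12)); // 10
    }
}
```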

2. WaterMark setting method

Punctuated watermark

A punctuated watermark generates a new watermark when certain special marker events appear in the data stream. With this approach, window triggering does not depend on time but on when the marker events are received.

In production, punctuated mode generates a large number of watermarks in high-TPS scenarios, which puts some pressure on downstream operators. It should therefore be chosen only in scenarios with very strict real-time requirements.

Periodic watermark

A watermark is generated periodically (at a fixed time interval or after a certain number of records). The interval at which the water level rises is set by the user. Between two rises of the water level, some messages flow in, and the user can compute the new water level from this data.

In production, the periodic method must keep generating watermarks based on both dimensions, elapsed time and accumulated record count; otherwise there can be a large delay in extreme cases.

For example, the simplest watermark algorithm takes the largest event time seen so far. This approach is rather crude, however: it tolerates little disorder and easily produces a large number of late events.

3. Late events

Although a watermark indicates that events earlier than it should no longer occur, as mentioned above it is inevitable that some messages with timestamps before the watermark are still received. These are the so-called late events. Late events are in fact a special case of out-of-order events: unlike ordinary out-of-order events, their disorder exceeds what the watermark anticipates, so the window has already been closed before they arrive.

When a late event arrives, the window has been closed and the calculation result has already been output. There are three ways to deal with it:

  • Reactivate the closed window and recalculate to correct the result.
  • Collect late events and handle them separately.
  • Treat the late event as an error message and discard it.

Flink's default is to discard late events. The other two approaches use Allowed Lateness and Side Output respectively.

The Side Output mechanism puts late events into a separate branch of the data stream, emitted as a by-product of the window calculation result so that users can retrieve and handle them specially.

The Allowed Lateness mechanism lets users set a maximum allowed lateness. After a window is closed, Flink keeps the window's state until the allowed lateness has elapsed. Late events arriving during this period are not discarded; by default they trigger a recalculation of the window. Keeping window state requires extra memory, and if the window calculation uses the ProcessWindowFunction API, every late event may trigger a full recalculation of the window, which is expensive. The allowed lateness should therefore not be set too long, and there should not be too many late events. Otherwise, consider making the watermark advance more slowly or adjusting the algorithm.

To summarize the mechanisms:

  • The window collects data periodically.

  • The watermark is a safeguard against (frequent) out-of-order data and against not having received all the expected data within the event-time window.

  • allowedLateness delays the window's closing time by a further period.

  • The side output is the last resort. Once the specified window has been completely closed, all data that arrives too late is put into the side output stream.
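The summary above can be read as a three-way decision for an element that belongs to a given window. The plain-Java sketch below follows this article's convention that a window fires once the watermark reaches its end time; Flink's own boundary checks may differ by a millisecond, and the names here are hypothetical.

```java
final class LateDataSketch {
    enum Outcome { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

    /**
     * Classifies an element of the window ending at windowEnd,
     * given the watermark at the moment the element arrives (all values in ms).
     */
    static Outcome classify(long windowEnd, long allowedLateness, long watermark) {
        if (watermark < windowEnd) {
            return Outcome.ON_TIME;            // window has not fired yet
        }
        if (watermark < windowEnd + allowedLateness) {
            return Outcome.LATE_BUT_ACCEPTED;  // window fired, state still kept
        }
        return Outcome.SIDE_OUTPUT;            // window state already removed
    }

    public static void main(String[] args) {
        long windowEnd = 5 * 60_000L; // window [12:00, 12:05), times relative to 12:00
        long lateness = 60_000L;      // 1 minute of allowed lateness
        System.out.println(classify(windowEnd, lateness, 4 * 60_000L)); // ON_TIME
        System.out.println(classify(windowEnd, lateness, 330_000L));    // LATE_BUT_ACCEPTED
        System.out.println(classify(windowEnd, lateness, 6 * 60_000L)); // SIDE_OUTPUT
    }
}
```

Without a side output configured, the third case corresponds to the default behaviour of discarding the element.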

4. Examples

Using system time as Watermark

We set the watermark to the current system time minus 5 seconds.

override def getCurrentWatermark(): Watermark = {       
	new Watermark(System.currentTimeMillis - 5000) 
}

It is generally better to track the maximum timestamp received and create the watermark by subtracting the maximum expected delay from it, rather than subtracting from the current system time.

Use Event Time as watermark

For example, Event Time data itself contains a timestamp field. Suppose it is called rowtime, with a value such as 1543903383000 in milliseconds (2018-12-04 14:03:03). Define a watermark on the rowtime column with an offset of 3s. The watermark timestamp for this record is:

1543903383000 - 3000 = 1543903380000 (2018-12-04 14:03:00)

The meaning of this record's watermark time: all data with a timestamp less than 1543903380000 (2018-12-04 14:03:00) has arrived.

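import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// MyEvent is the user's event type; getCreationTime() returns its event time in milliseconds.
class BoundedOutOfOrdernessGenerator extends AssignerWithPeriodicWatermarks[MyEvent] {
    val maxOutOfOrderness = 3000L // 3 seconds
    // largest event timestamp seen so far (default-initialized to 0)
    var currentMaxTimestamp: Long = _
    override def extractTimestamp(element: MyEvent, previousElementTimestamp: Long): Long = {
        val timestamp = element.getCreationTime()
        currentMaxTimestamp = math.max(timestamp, currentMaxTimestamp)
        timestamp
    }
    override def getCurrentWatermark(): Watermark = {
        // return the watermark as current highest timestamp minus the out-of-orderness bound
        new Watermark(currentMaxTimestamp - maxOutOfOrderness)
    }
}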

See how the window is triggered

We now understand the window's trigger mechanism. What happens once a watermark is added? Let's look at the following.

Suppose we set a 10s time window, so 0~10s and 10~20s are both windows. Take 0~10s as an example: 0 is the start time and 10 is the end time. Suppose four records have event times 8 (A), 12.5 (B), 9 (C) and 13.5 (D), and we set the watermark to the maximum event time of all data that has arrived so far minus a delay of 3.5 seconds.

When A arrives, the watermark is max{8} - 3.5 = 8 - 3.5 = 4.5 < 10, which does not trigger the calculation
When B arrives, the watermark is max(12.5, 8) - 3.5 = 12.5 - 3.5 = 9 < 10, which does not trigger the calculation
When C arrives, the watermark is max(12.5, 8, 9) - 3.5 = 12.5 - 3.5 = 9 < 10, which does not trigger the calculation
When D arrives, the watermark is max(13.5, 12.5, 8, 9) - 3.5 = 13.5 - 3.5 = 10 = 10, which triggers the calculation
When the calculation is triggered, A and C (whose event times are less than 10) are included; of these, C arrived out of order.

max is the key: it is the maximum event time among all events that have arrived so far.

The 3.5s delay expresses our assumption that when a record arrives, all data more than 3.5s older than it has already arrived. The value has to be estimated from experience. Suppose an event E with event time = 6 arrives after D. E is lost, because the 0~10 time window has already been calculated.

The loss of E shows that the watermark is not a cure-all, but combining it with values tuned from production experience, side output and other measures can keep data from being lost.
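The walkthrough with A, B, C, D and E can be replayed with a small plain-Java simulation. It tracks only the 0~10s window, uses milliseconds, and applies the rule from this article (fire when the watermark is >= the window end); the class and field names are hypothetical, and this is not how Flink's window operator is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class WatermarkTriggerSketch {
    static final long WINDOW_END = 10_000L; // window [0s, 10s)
    static final long MAX_DELAY = 3_500L;   // 3.5 s delay

    final List<String> windowContents = new ArrayList<>();
    final List<String> dropped = new ArrayList<>();
    long maxTimestamp = Long.MIN_VALUE;
    boolean fired = false;

    /** Processes one event and returns the watermark after it. */
    long onEvent(String name, long eventTime) {
        if (eventTime < WINDOW_END) {
            // The event belongs to the 0~10s window: keep it unless the window already fired
            if (fired) {
                dropped.add(name);
            } else {
                windowContents.add(name);
            }
        }
        maxTimestamp = Math.max(maxTimestamp, eventTime);
        long watermark = maxTimestamp - MAX_DELAY;
        if (!fired && watermark >= WINDOW_END) {
            fired = true; // the window is calculated with its current contents
        }
        return watermark;
    }

    public static void main(String[] args) {
        WatermarkTriggerSketch sim = new WatermarkTriggerSketch();
        String[] names = {"A", "B", "C", "D", "E"};
        long[] times = {8_000L, 12_500L, 9_000L, 13_500L, 6_000L};
        for (int i = 0; i < names.length; i++) {
            long wm = sim.onEvent(names[i], times[i]);
            System.out.println(names[i] + ": watermark=" + wm + ", fired=" + sim.fired);
        }
        System.out.println("window contents: " + sim.windowContents); // [A, C]
        System.out.println("dropped: " + sim.dropped);                // [E]
    }
}
```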

0x06 Flink source code

Data structure definition

Different elements flow through a Flink DataStream, collectively referred to as StreamElement. A StreamElement can be any one of StreamRecord, Watermark, StreamStatus and LatencyMarker.

StreamElement

StreamElement is an abstract class (the base class of the messages Flink carries); the other four types extend StreamElement.

public abstract class StreamElement {
  //Determine whether it is Watermark
  public final boolean isWatermark() {
    return getClass() == Watermark.class;
  }
  //Judge whether it is StreamStatus
  public final boolean isStreamStatus() {
    return getClass() == StreamStatus.class;
  }
  //Determine whether it is StreamRecord
  public final boolean isRecord() {
    return getClass() == StreamRecord.class;
  }
  //Judge whether it is LatencyMarker
  public final boolean isLatencyMarker() {
    return getClass() == LatencyMarker.class;
  }
  //Convert to StreamRecord
  public final <E> StreamRecord<E> asRecord() {
    return (StreamRecord<E>) this;
  }
  //Convert to Watermark
  public final Watermark asWatermark() {
    return (Watermark) this;
  }
  //Convert to StreamStatus
  public final StreamStatus asStreamStatus() {
    return (StreamStatus) this;
  }
  //Convert to LatencyMarker
  public final LatencyMarker asLatencyMarker() {
    return (LatencyMarker) this;
  }
}

Watermark

Watermark extends StreamElement. A Watermark is an abstraction at the same level as an event. It contains a member variable, timestamp, which marks the time progress of the current data. Watermarks actually travel with the data as part of the stream.

@PublicEvolving
public final class Watermark extends StreamElement {
  /*The watermark that signifies end-of-event-time. */
  public static final Watermark MAX_WATERMARK = new Watermark(Long.MAX_VALUE);
  /* The timestamp of the watermark in milliseconds. */
  private final long timestamp;
  /* Creates a new watermark with the given timestamp in milliseconds.*/
  public Watermark(long timestamp) {
    this.timestamp = timestamp;
  }
  /*Returns the timestamp associated with this {@link Watermark} in milliseconds.**/
  public long getTimestamp() {
    return timestamp;
  }
}

How does Flink generate & process watermarks

In practice, the periodic generation method, AssignerWithPeriodicWatermarks, is chosen in most cases.

// Set the time semantics to EventTime
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
// Interval at which watermarks are generated
env.getConfig.setAutoWatermarkInterval(watermarkInterval)
// Specify the timestamp and watermark assigner
dataStream.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[Element](Time.seconds(allowDelay)) {
   override def extractTimestamp(element: Element): Long = element.dT
  })

BoundedOutOfOrdernessTimestampExtractor is a watermark generator provided by Flink that allows a bounded maximum out-of-order delay. You only need to override its extractTimestamp method.

assignTimestampsAndWatermarks can be understood as an operator transformation, just like map or window. You can set its parallelism and name, and it is itself a transformation/operator.

public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
		AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {

	// match parallelism to input, otherwise dop=1 sources could lead to some strange
	// behaviour: the watermark will creep along very slowly because the elements
	// from the source go to each extraction operator round robin.
	final int inputParallelism = getTransformation().getParallelism();
	final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner);

	TimestampsAndPeriodicWatermarksOperator<T> operator =
			new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner);

	return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
			.setParallelism(inputParallelism);
}

The StreamOperator used is TimestampsAndPeriodicWatermarksOperator, which extends AbstractUdfStreamOperator and implements the OneInputStreamOperator and ProcessingTimeCallback interfaces.

TimestampsAndPeriodicWatermarksOperator.

/**
 * A stream operator that extracts timestamps from stream elements and
 * generates periodic watermarks.
 *
 * @param <T> The type of the input elements
 */
public class TimestampsAndPeriodicWatermarksOperator<T>
		extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
		implements OneInputStreamOperator<T, T>, ProcessingTimeCallback {

	private static final long serialVersionUID = 1L;
	private transient long watermarkInterval;
	private transient long currentWatermark;

	public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
		super(assigner);
		this.chainingStrategy = ChainingStrategy.ALWAYS;
	}

	@Override
	public void open() throws Exception {
		super.open();
        //Initializes the default current watermark
		currentWatermark = Long.MIN_VALUE;
        //Read the configured watermark generation interval
		watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();
        //Register the timer only when the interval is positive
		if (watermarkInterval > 0) {
			long now = getProcessingTimeService().getCurrentProcessingTime();
            //Register a timer triggered after watermarkInterval. The callback parameter passed in is this, that is, the onProcessingTime method of the current object will be called
			getProcessingTimeService().registerTimer(now + watermarkInterval, this);
		}
	}

	@Override
	public void processElement(StreamRecord<T> element) throws Exception {
        //Extract current event time
		final long newTimestamp = userFunction.extractTimestamp(element.getValue(),
				element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);
        //Emit the record downstream with its newly extracted timestamp
		output.collect(element.replace(element.getValue(), newTimestamp));
	}

	@Override
	public void onProcessingTime(long timestamp) throws Exception {
        //Timer callback: emits the watermark if it has advanced, then registers the next timer.
		// register next timer
		Watermark newWatermark = userFunction.getCurrentWatermark();
        //When the new watermark is greater than the current watermark
		if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
			currentWatermark = newWatermark.getTimestamp();
            //Send qualified watermark
			// emit watermark
			output.emitWatermark(newWatermark);
		}
        //Register next trigger time
		long now = getProcessingTimeService().getCurrentProcessingTime();
		getProcessingTimeService().registerTimer(now + watermarkInterval, this);
	}

	/**
	 * Override the base implementation to completely ignore watermarks propagated from
	 * upstream (we rely only on the {@link AssignerWithPeriodicWatermarks} to emit
	 * watermarks from here).
	 */
	@Override
	public void processWatermark(Watermark mark) throws Exception {
		// Watermarks arriving from upstream are ignored: what this operator sends downstream comes only from its own assigner, with the single exception below.
		// if we receive a Long.MAX_VALUE watermark we forward it since it is used
		// to signal the end of input and to not block watermark progress downstream
		if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
			currentWatermark = Long.MAX_VALUE;
			output.emitWatermark(mark);
		}
	}

	@Override
	public void close() throws Exception {
		super.close();

		// emit a final watermark
		Watermark newWatermark = userFunction.getCurrentWatermark();
		if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
			currentWatermark = newWatermark.getTimestamp();
			// emit watermark
			output.emitWatermark(newWatermark);
		}
	}
}
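
The timer-driven flow above can be mimicked without any Flink classes. The following is a minimal sketch, not Flink's implementation: the class name, the `onTimer` method and the "maximum timestamp minus a fixed out-of-orderness" rule are assumptions made for illustration. It shows that elements only update state, while watermarks are emitted from the timer callback and only when they advance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the periodic path; no Flink classes are used.
class PeriodicWatermarkSketch {
    // Assumed assigner rule: watermark = max event time seen - maxOutOfOrderness.
    private final long maxOutOfOrderness;
    private long maxTimestamp = Long.MIN_VALUE;
    // Plays the role of the operator's currentWatermark field.
    private long currentWatermark = Long.MIN_VALUE;
    private final List<Long> emitted = new ArrayList<>();

    PeriodicWatermarkSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Like processElement: track the timestamp, emit no watermark here.
    void processElement(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Like onProcessingTime: emit only if the candidate watermark has advanced.
    void onTimer() {
        if (maxTimestamp == Long.MIN_VALUE) {
            return; // no element seen yet, nothing to emit
        }
        long candidate = maxTimestamp - maxOutOfOrderness;
        if (candidate > currentWatermark) {
            currentWatermark = candidate;
            emitted.add(candidate);
        }
    }

    List<Long> emittedWatermarks() {
        return emitted;
    }

    static List<Long> demo() {
        PeriodicWatermarkSketch op = new PeriodicWatermarkSketch(3);
        op.processElement(10);
        op.processElement(7);  // out of order, the maximum stays at 10
        op.onTimer();          // candidate 7 is emitted
        op.onTimer();          // candidate is still 7, so nothing is emitted
        op.processElement(15);
        op.onTimer();          // candidate 12 is emitted
        return op.emittedWatermarks();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [7, 12]
    }
}
```

In the real operator the timer interval comes from getAutoWatermarkInterval(), as shown in open() above; here the timer is simply called by hand.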

How does Flink handle late data

We use the Side Output mechanism as the example. Side Output routes late events into a separate branch of the data stream. That branch is a by-product of the window computation, which users can retrieve and handle separately.

Generate a new Watermark

Flink replaces the timestamp in the StreamRecord object. If the Watermark derived from the current event's timestamp is greater than the previous Watermark, a new Watermark is emitted.

The specific code is in TimestampsAndPunctuatedWatermarksOperator.processElement.

@Override
public void processElement(StreamRecord<T> element) throws Exception {
	final T value = element.getValue();
    // Call the extractTimestamp implemented by the user to obtain a new Timestamp
	final long newTimestamp = userFunction.extractTimestamp(value,
			element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);
    // Replace the old Timestamp in StreamRecord with the new Timestamp
	output.collect(element.replace(element.getValue(), newTimestamp));
    // Call the checkAndGetNextWatermark method implemented by the user to obtain the next Watermark
	final Watermark nextWatermark = userFunction.checkAndGetNextWatermark(value, newTimestamp);
    // If the next Watermark is greater than the current Watermark, a new Watermark is issued
	if (nextWatermark != null && nextWatermark.getTimestamp() > currentWatermark) {
		currentWatermark = nextWatermark.getTimestamp();
		output.emitWatermark(nextWatermark);
	}
}
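
For comparison with the periodic path, here is a minimal sketch of the per-element (punctuated) flow, again without Flink classes. The rule that only "marker" events propose a watermark is a hypothetical choice for illustration, as are the class and method signatures. As in the listing above, the record is forwarded first and the watermark follows only if it advances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the punctuated path; no Flink classes are used.
class PunctuatedWatermarkSketch {
    private long currentWatermark = Long.MIN_VALUE;
    private final List<String> output = new ArrayList<>();

    // Assumed user rule: only marker events propose a watermark, others return null.
    static Long checkAndGetNextWatermark(long timestamp, boolean isMarker) {
        return isMarker ? Long.valueOf(timestamp) : null;
    }

    void processElement(long timestamp, boolean isMarker) {
        // The record itself is always forwarded first.
        output.add("record@" + timestamp);
        Long next = checkAndGetNextWatermark(timestamp, isMarker);
        // A watermark is emitted only if one was proposed and it advances.
        if (next != null && next > currentWatermark) {
            currentWatermark = next;
            output.add("watermark@" + next);
        }
    }

    static List<String> demo() {
        PunctuatedWatermarkSketch op = new PunctuatedWatermarkSketch();
        op.processElement(5, false); // no marker, no watermark
        op.processElement(8, true);  // watermark 8
        op.processElement(6, true);  // marker, but 6 does not advance past 8
        op.processElement(9, true);  // watermark 9
        return op.output;
    }

    public static void main(String[] args) {
        // prints [record@5, record@8, watermark@8, record@6, record@9, watermark@9]
        System.out.println(demo());
    }
}
```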

Processing late data

First, determine whether the data is late.

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
	for (W window: elementWindows) {
		// drop if the window is already late:
		// skip it and continue with the next window assigned to this element
		if (isWindowLate(window)) {
			continue;
		}
	}
    ......
}

/**
 * Returns {@code true} if the watermark is after the end timestamp plus the allowed lateness of the given window.
 */
protected boolean isWindowLate(W window) {
	// True when the assigner uses event time and window.maxTimestamp() + allowedLateness <= current watermark, i.e. the window has expired
	return (windowAssigner.isEventTime() && (cleanupTime(window) <= internalTimerService.currentWatermark()));
}

/**
 * Returns the cleanup time for a window, which is
 * {@code window.maxTimestamp + allowedLateness}. In
 * case this leads to a value greater than {@link Long#MAX_VALUE}
 * then a cleanup time of {@link Long#MAX_VALUE} is
 * returned.
 *
 * @param window the window whose cleanup time we are computing.
 */
private long cleanupTime(W window) {
	if (windowAssigner.isEventTime()) {
		long cleanupTime = window.maxTimestamp() + allowedLateness;
		// Cleanup time = window.maxTimestamp() + allowedLateness; if the sum overflowed, fall back to Long.MAX_VALUE
		return cleanupTime >= window.maxTimestamp() ? cleanupTime : Long.MAX_VALUE;
	} else {
		return window.maxTimestamp();
	}
}
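
The two checks above are plain arithmetic on long values, so they can be tried in isolation. The sketch below copies only that arithmetic into static methods; the class name and the flattened parameters are assumptions, and the event-time check is left out. It assumes a window covering [0, 10), whose maximum timestamp is 9.

```java
// Hypothetical standalone copy of the lateness arithmetic; no Flink classes are used.
class WindowLatenessSketch {

    // window.maxTimestamp + allowedLateness, clamped to Long.MAX_VALUE on overflow.
    static long cleanupTime(long windowMaxTimestamp, long allowedLateness) {
        long cleanup = windowMaxTimestamp + allowedLateness;
        // A sum that wrapped around is smaller than the original operand.
        return cleanup >= windowMaxTimestamp ? cleanup : Long.MAX_VALUE;
    }

    // A window is late once the watermark has reached its cleanup time.
    static boolean isWindowLate(long windowMaxTimestamp, long allowedLateness, long currentWatermark) {
        return cleanupTime(windowMaxTimestamp, allowedLateness) <= currentWatermark;
    }

    public static void main(String[] args) {
        // Window [0, 10) has maxTimestamp 9; with allowedLateness 5 it is cleaned up at 14.
        System.out.println(cleanupTime(9, 5));                   // prints 14
        System.out.println(isWindowLate(9, 5, 13));              // prints false
        System.out.println(isWindowLate(9, 5, 14));              // prints true
        // The addition overflows, so the cleanup time is clamped.
        System.out.println(cleanupTime(Long.MAX_VALUE - 1, 5) == Long.MAX_VALUE); // prints true
    }
}
```

With allowedLateness set to 0, the window becomes late as soon as the watermark reaches its maximum timestamp.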

Second, the code that handles late data sits at the end of the WindowOperator.processElement method. This is the side output.

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    
    ......
    // Other operations
    ......
    
    // side output input event if
    // element not handled by any window
    // late arriving tag has been set
    // windowAssigner is event time and current timestamp + allowed lateness no less than element timestamp
    // isSkippedElement starts as true and is set to false as soon as any window accepts the element,
    // so it is still true here only if every window the element belongs to was already late.
    if (isSkippedElement && isElementLate(element)) {
      if (lateDataOutputTag != null){
          // Side output: the late element goes to a separate stream branch for the user to handle
        sideOutput(element);
      } else {
        this.numLateRecordsDropped.inc();
      }
    }
}

/**
 * Decide if a record is currently late, based on current watermark and allowed lateness.
 * A record is late when the assigner uses event time and (element timestamp + allowed lateness) <= current watermark.
 * @param element The element to check
 * @return The element for which should be considered when sideoutputs
 */
protected boolean isElementLate(StreamRecord<IN> element){
	return (windowAssigner.isEventTime()) &&
		(element.getTimestamp() + allowedLateness <= internalTimerService.currentWatermark());
}

/**
 * Write skipped late arriving element to SideOutput.
 * The element is sent to the side output so the user can decide how to handle it.
 * @param element skipped late arriving element to side output
 */
protected void sideOutput(StreamRecord<IN> element){
    output.collect(lateDataOutputTag, element);
}
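
Putting the pieces together, the routing decision can be sketched as a small simulation. This is an illustration under stated assumptions, not the WindowOperator itself: it assumes tumbling windows of a hypothetical size 10, one window per element, a watermark set by hand, and plain lists in place of the real outputs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the late-data routing; no Flink classes are used.
class LateDataRoutingSketch {
    static final long WINDOW_SIZE = 10;               // assumed tumbling window size
    final long allowedLateness;
    final boolean lateTagSet;                         // stands in for lateDataOutputTag != null
    long currentWatermark = Long.MIN_VALUE;
    final List<Long> accepted = new ArrayList<>();    // elements added to a live window
    final List<Long> sideOutput = new ArrayList<>();  // late elements routed aside
    long numLateRecordsDropped = 0;

    LateDataRoutingSketch(long allowedLateness, boolean lateTagSet) {
        this.allowedLateness = allowedLateness;
        this.lateTagSet = lateTagSet;
    }

    void processElement(long timestamp) {
        // Each element belongs to exactly one tumbling window in this sketch.
        long windowStart = timestamp - Math.floorMod(timestamp, WINDOW_SIZE);
        long windowMaxTimestamp = windowStart + WINDOW_SIZE - 1;

        boolean isSkippedElement = true;
        boolean windowLate = windowMaxTimestamp + allowedLateness <= currentWatermark;
        if (!windowLate) {
            isSkippedElement = false;   // some window accepted the element
            accepted.add(timestamp);
        }

        boolean elementLate = timestamp + allowedLateness <= currentWatermark;
        if (isSkippedElement && elementLate) {
            if (lateTagSet) {
                sideOutput.add(timestamp);
            } else {
                numLateRecordsDropped++;
            }
        }
    }

    static LateDataRoutingSketch demo(boolean lateTagSet) {
        LateDataRoutingSketch op = new LateDataRoutingSketch(2, lateTagSet);
        op.processElement(3);      // window [0, 10), watermark not advanced yet: accepted
        op.currentWatermark = 10;
        op.processElement(8);      // cleanup time 9 + 2 = 11 > 10: still accepted
        op.currentWatermark = 11;
        op.processElement(5);      // cleanup time 11 <= 11: the window is late
        op.processElement(12);     // window [10, 20) is still open: accepted
        return op;
    }

    public static void main(String[] args) {
        LateDataRoutingSketch op = demo(true);
        System.out.println(op.accepted);    // prints [3, 8, 12]
        System.out.println(op.sideOutput);  // prints [5]
    }
}
```

Element 8 arrives after the watermark has passed the end of its window but within the allowed lateness, so it is still accepted. Element 5 arrives after the cleanup time and is routed aside, or counted as dropped when no tag is set. In the DataStream API the tag is supplied with sideOutputLateData on the windowed stream, and the late branch is read with getSideOutput.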


Posted by schilly on Tue, 28 Sep 2021 20:06:11 -0700