Flink development issues summary

Keywords: Java Apache Scala Hadoop

When Flink is installed via Homebrew on macOS, the default installation path is /usr/local/Cellar/apache-flink/1.5.1.

I. Can graph algorithms be called in Flink?

Graph algorithms can be called from the DataSet API; for the DataStream API you have to write the logic yourself, which can be done in Scala by referring to the source code.

Official website link
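
A minimal sketch of calling a graph operation from the DataSet API in Scala, assuming the flink-gelly-scala dependency is on the classpath (the edge data and the degree computation are made up for illustration):

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.Edge
    import org.apache.flink.graph.scala.Graph

    val env = ExecutionEnvironment.getExecutionEnvironment
    // a toy edge DataSet; real jobs would read edges from a source
    val edges: DataSet[Edge[Long, Double]] = env.fromElements(
      new Edge(1L, 2L, 1.0),
      new Edge(2L, 3L, 1.0))
    val graph = Graph.fromDataSet(edges, env)
    // e.g. compute the degree of every vertex; print() triggers execution
    graph.getDegrees.print()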

II. Cannot instantiate user function

org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.
	at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:235)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:355)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:282)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:346)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:282)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:126)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: cannot assign instance of com.meituan.flink.demo.local.testTopic$$anonfun$2 to field org.apache.flink.streaming.api.scala.DataStream$$anon$4.cleanFun$3 of type scala.Function1 in instance of org.apache.flink.streaming.api.scala.DataStream$$anon$4
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290)
	at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:248)
	at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:220)

This is a commons-collections package conflict; exclude the conflicting packages in the pom file.
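
A sketch of what the exclusion could look like in the pom; the dependency that drags in the conflicting commons-collections version is an assumption here and depends on your own dependency tree (check it with mvn dependency:tree):

    <dependency>
        <groupId>some.group</groupId>            <!-- placeholder: the dependency pulling in the conflict -->
        <artifactId>some-artifact</artifactId>   <!-- placeholder -->
        <version>x.y.z</version>
        <exclusions>
            <exclusion>
                <groupId>commons-collections</groupId>
                <artifactId>commons-collections</artifactId>
            </exclusion>
        </exclusions>
    </dependency>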

III. Reading HDFS files when developing a Flink program in IDEA and running it locally

Add the following dependencies:

Reference blog

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.5</version>
        </dependency>

Code to read and write HDFS data:

Reference blog

DataSet<String> hdfslines = env.readTextFile("your hdfs path");
hdfslines.writeAsText("your hdfs path");
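
For reference, a minimal Scala sketch of the same flow when run locally from the IDE; the namenode address and paths are placeholders:

    import org.apache.flink.api.scala._

    object HdfsReadWriteDemo {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val lines = env.readTextFile("hdfs://namenode:8020/path/to/input")
        lines.writeAsText("hdfs://namenode:8020/path/to/output")
        env.execute("hdfs read/write demo")   // writeAsText is lazy; execute() actually runs the job
      }
    }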

Flink's HDFS Connector
Reference blog
This connector provides a sink that writes partitioned files to any file system supported by the Hadoop FileSystem abstraction. To use it, add the following dependency to your project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-filesystem_2.10</artifactId>
  <version>1.3.0</version>
</dependency>
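
A minimal sketch of using the sink from a Scala streaming job; the base path, bucket format, and batch size below are assumptions:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.fs.bucketing.{BucketingSink, DateTimeBucketer}

    // `stream` stands for an existing DataStream[String]
    val sink = new BucketingSink[String]("hdfs://namenode:8020/flink/output")
    sink.setBucketer(new DateTimeBucketer[String]("yyyy-MM-dd--HHmm"))   // one bucket per minute
    sink.setBatchSize(1024 * 1024 * 128)                                 // roll part files at ~128 MB
    stream.addSink(sink)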

IV. The assigned slot container was removed

Judging from the logs, this exception usually means a Flink app with high memory consumption has caused the TaskManager (the container on YARN) to be killed. If the code itself is fine, the resources are simply insufficient; a 1 GB slot running several tasks (slot group sharing) hits this quite easily. There are two options; pick one according to your situation:

  1. Schedule the Flink app on a cluster whose slots have more memory.
  2. Reduce the number of tasks sharing a slot via slotSharingGroup("xxx") (see the sketch below).

Reference blog 1
Reference blog 2
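
A minimal sketch of option 2; the operator and group name are placeholders:

    import org.apache.flink.streaming.api.scala._

    // give the memory-hungry operator its own slot sharing group so it no longer
    // shares a slot with the rest of the pipeline
    val heavy = stream
      .map(record => expensiveTransform(record))   // `expensiveTransform` is a placeholder
      .slotSharingGroup("heavy")                   // downstream operators inherit this group unless set otherwise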

V. java.util.concurrent.TimeoutException: Heartbeat of TaskManager with ID container

The TaskManager's heartbeat timed out; increasing the JobManager's memory a little resolved it.

VI. Could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[Int]

Solution: import org.apache.flink.api.scala._
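
A minimal sketch showing where the import matters; the data is arbitrary:

    import org.apache.flink.api.scala._   // brings the implicit TypeInformation values into scope

    val env = ExecutionEnvironment.getExecutionEnvironment
    // without the import above, the next line fails to compile with
    // "could not find implicit value for evidence parameter of type TypeInformation[Int]"
    val numbers: DataSet[Int] = env.fromElements(1, 2, 3)

For DataStream programs the equivalent import is org.apache.flink.streaming.api.scala._.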

Reference blog 1
Reference blog 2

VII. Using event time and process functions to handle state data in Flink

For details, please refer to: http://wuchong.me/blog/2018/11/07/use-flink-calculate-hot-items/

Note that real-world data arrives out of order, so the following setup is needed.

//1. When using a window function, make sure to import the Scala API version
import org.apache.flink.streaming.api.scala.function.WindowFunction


//2. Set the time characteristic to event time (the time at which the event occurred).
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
//3. After attaching a timestamp to the data, assign the following out-of-orderness watermark; the timestamp must be in milliseconds.
.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[originalData](Time.seconds(10)) {
        override def extractTimestamp(t: originalData): Long = {
          t.timestamp * 1000
        }
      })

//4. When overriding processElement and onTimer in a KeyedProcessFunction, the Scala parameters are written as follows (the #Context / #OnTimerContext syntax looks odd but is required).
    override def processElement(input: IpDomainUrlCount,
                                context: KeyedProcessFunction[Tuple, IpDomainUrlCount, String]#Context, 
                                collector: Collector[String]): Unit = {
    }
    override def onTimer(timestamp: Long,
                         ctx: KeyedProcessFunction[Tuple, IpDomainUrlCount, String]#OnTimerContext,
                         out: Collector[String]): Unit = {
      //If the previous step keyed by windowEnd, the corresponding key can be read directly here. Note the timestamp passed in (e.g. 156378942001)
      //is 1 ms past the window end, so remember to subtract 1 if you use it.
            val curKeyWindowEnd = ctx.getCurrentKey.getField(0).toString.toLong  //156378942000
    }

// Stores the ipDomain data as state. Once all data of the same window has been collected, the information-entropy calculation is triggered. Note how configured parameters are read inside the function via parameters.getLong.
    private var itemState: ListState[IpDomainUrlCount] = null
    override def open(parameters: Configuration): Unit = {
      super.open(parameters)
      curWindowSizeMinute = parameters.getLong("window.size.minute", 0L)
      //Register the state
      val itemsStateDesc = new ListStateDescriptor("itemState-state", classOf[IpDomainUrlCount])
      itemState = getRuntimeContext.getListState(itemsStateDesc)
    }

Code format reference:
Code format 1
Code format 2

VIII. Flink reading multiple (>= 1) authorized topics

Simply create a consumer instance for each topic and then union the resulting streams to merge the traffic, as sketched below.
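
A minimal sketch, assuming the Kafka 0.10 connector is on the classpath; the broker address, group id, and topic names are placeholders:

    import java.util.Properties
    import org.apache.flink.api.common.serialization.SimpleStringSchema
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val props = new Properties()
    props.setProperty("bootstrap.servers", "broker:9092")
    props.setProperty("group.id", "demo-group")

    // one consumer instance per topic
    val streamA = env.addSource(new FlinkKafkaConsumer010[String]("topicA", new SimpleStringSchema(), props))
    val streamB = env.addSource(new FlinkKafkaConsumer010[String]("topicB", new SimpleStringSchema(), props))

    // merge the per-topic streams into a single stream for downstream processing
    val merged = streamA.union(streamB)

Note that the Kafka consumer also accepts a java.util.List of topic names in a single instance, which avoids the union when all topics share the same deserialization schema.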

Posted by Rommeo on Sun, 27 Oct 2019 03:13:17 -0700