KafkaSpout custom scheme

Keywords: Java, Kafka, Apache, JSON

1. Mapper and Scheme

scheme: transforms the data Kafka passes to the spout (record -> tuple).

mapper: transforms the data Storm writes back to Kafka (tuple -> record).

2. Why customize the message format

In many cases the data coming from Kafka is not a simple string; it can be any object. The default new Fields("bytes") is not appropriate when we need to group tuples by an attribute of that object, yet messages are still transmitted as strings. So instead of writing the entity object into Kafka directly, we can use fastjson to convert it to a JSON string first.

The scheme then converts that JSON string back into an entity class object.
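
In code, that round trip looks roughly like this (a minimal sketch with fastjson; it assumes a SensorData instance named sensorData and the com.alibaba.fastjson.JSON class on the classpath, and the entity itself is defined in section 4 below):

// producer side: entity object -> JSON string (this is what is actually written to Kafka)
String json = JSON.toJSONString(sensorData);
// spout side: the scheme turns the JSON string back into an entity object
SensorData parsed = JSON.parseObject(json, SensorData.class);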

3. How to change the scheme

There are many parameters to configure when building a KafkaSpout, so let's first take a look at the KafkaConfig code.

public final BrokerHosts hosts; // used to get Kafka broker and partition information
public final String topic; // the topic to read messages from
public final String clientId; // client id used by the SimpleConsumer
public int fetchSizeBytes = 1024 * 1024; // total message size, in bytes, requested in each FetchRequest sent to Kafka
public int socketTimeoutMs = 10000; // socket timeout for the connection to the Kafka broker
public int fetchMaxWait = 10000; // how long the consumer waits when the server has no new messages
public int bufferSizeBytes = 1024 * 1024; // read buffer size of the SocketChannel used by the SimpleConsumer
public MultiScheme scheme = new RawMultiScheme(); // how to deserialize the byte[] fetched from Kafka
public boolean forceFromStart = false; // whether to force reading from the smallest offset in Kafka
public long startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); // the offset time to start reading from, defaults to the oldest offset
public long maxOffsetBehind = Long.MAX_VALUE; // if KafkaSpout falls more than this many offsets behind the target progress, the messages in between are discarded
public boolean useStartOffsetTimeIfOffsetOutOfRange = true; // if the requested offset does not exist in Kafka, fall back to the offset given by startOffsetTime

 

As you can see, all the configuration items are public, so once we have instantiated a SpoutConfig we can change the property values by assigning to them directly.

Now let's look at the code for building the KafkaSpout:

// ZooKeeper hosts, used to read Kafka broker and partition information
ZkHosts zkHosts = new ZkHosts(zkHost);
// ZooKeeper root path for storing offsets, plus a unique id for this consumer
String zkRoot = "/" + topic;
String id = UUID.randomUUID().toString();
// build the SpoutConfig
SpoutConfig spoutConf = new SpoutConfig(zkHosts, topic, zkRoot, id);
spoutConf.scheme = new SchemeAsMultiScheme(new SensorDataScheme());
spoutConf.startOffsetTime = OffsetRequest.LatestTime();
KafkaSpout kafkaSpout = new KafkaSpout(spoutConf);
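
Because the scheme declares named output fields (see section 4), the downstream bolt can be wired with fieldsGrouping on deviceId. Here is a minimal sketch continuing from the snippet above; SensorDataBolt is a hypothetical bolt (sketched at the end of this post), and TopologyBuilder and Fields come from org.apache.storm.topology and org.apache.storm.tuple:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sensor-spout", kafkaSpout, 1);
// fieldsGrouping on "deviceId" sends all readings of the same device to the same bolt task
builder.setBolt("sensor-bolt", new SensorDataBolt(), 2)
       .fieldsGrouping("sensor-spout", new Fields("deviceId"));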

4. How to customize the scheme

Suppose we need an entity class like the following:

public class SensorData implements Serializable {
    // device id
    private String deviceId;
    // model id
    private String dmPropertiesId;
    // channel name
    private String channelName;
    // collected temperature value
    private double deviceTemp;
    // collection time
    private Date date;

    public String getDeviceId() {
        return deviceId;
    }
    // the remaining getters and setters are omitted here;
    // fastjson needs them to serialize/deserialize the object
}

When the data is consumed for storage, it is grouped by deviceId. Of course, on the write side we do not put the entity object directly into Kafka; we JSON-ize it with fastjson and send the resulting string. On the Storm side, the field names that carry this data are ultimately declared through the scheme.
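
On the producer side, that serialization step looks roughly like this (a minimal sketch; the broker address localhost:9092, the topic name sensor-topic, the SensorData setters, and the org.apache.kafka.clients.producer imports are assumptions for illustration):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

SensorData data = new SensorData();
data.setDeviceId("device-001");
data.setDeviceTemp(36.5);

// entity -> JSON string; the string, not the object, is what goes into Kafka
producer.send(new ProducerRecord<>("sensor-topic", data.getDeviceId(), JSON.toJSONString(data)));
producer.close();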

Scheme interface:

public interface Scheme extends Serializable {
    List<Object> deserialize(ByteBuffer ser);
    public Fields getOutputFields();
}

You can see that there are two methods to implement: one deserializes the incoming byte data, and the other declares the field names the next bolt can group on. Tracing the KafkaSpout source, we can see that its declareOutputFields method eventually calls the scheme's getOutputFields to determine the field names.
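
Roughly paraphrased (a simplification of the storm-kafka source, not a verbatim quote), the spout simply forwards the field declaration to the scheme:

// simplified sketch of how KafkaSpout declares its output fields
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(_spoutConfig.scheme.getOutputFields());
}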

Take a look at the overall code for the scheme:

package dm.scheme;

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.storm.kafka.StringScheme;
import org.apache.storm.spout.Scheme;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

import com.alibaba.fastjson.JSON;

import dm.entity.SensorData;
/**
 * Scheme that maps a Kafka record to a Storm tuple.
 *
 * @author chenwen
 */
public class SensorDataScheme implements Scheme {
    private static final long serialVersionUID = 1L;
    private static final Charset UTF8_CHARSET = StandardCharsets.UTF_8;

    /**
     * Deserialize the raw Kafka message bytes into tuple values.
     */
    @Override
    public List<Object> deserialize(ByteBuffer byteBuffer) {
        // convert the Kafka message bytes into a JSON string
        String sensorDataJson = StringScheme.deserializeString(byteBuffer);
        // JSON string -> entity object
        SensorData sensorData = JSON.parseObject(sensorDataJson, SensorData.class);
        String id = sensorData.getDeviceId();
        // emit (deviceId, sensorData) so the next bolt can group on deviceId
        return new Values(id, sensorData);
    }
    // helper equivalent to StringScheme.deserializeString: ByteBuffer -> UTF-8 String
    public static String deserializeString(ByteBuffer byteBuffer) {
        if (byteBuffer.hasArray()) {
            int base = byteBuffer.arrayOffset();
            return new String(byteBuffer.array(), base + byteBuffer.position(), byteBuffer.remaining(), UTF8_CHARSET);
        } else {
            return new String(Utils.toByteArray(byteBuffer), UTF8_CHARSET);
        }
    }
    @Override
    public Fields getOutputFields() {
        return new Fields("deviceId", "sensorData"); // declare the names of the output fields
    }
}
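
Finally, a downstream bolt can read both fields by name. Here is a minimal sketch of such a bolt; the package dm.bolt, the class name SensorDataBolt, and the getDeviceTemp getter are assumptions for illustration:

package dm.bolt;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import dm.entity.SensorData;

/**
 * Hypothetical bolt that consumes the (deviceId, sensorData) tuples
 * emitted through SensorDataScheme.
 */
public class SensorDataBolt extends BaseRichBolt {
    private static final long serialVersionUID = 1L;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // read the fields declared by SensorDataScheme.getOutputFields()
        String deviceId = tuple.getStringByField("deviceId");
        SensorData data = (SensorData) tuple.getValueByField("sensorData");
        System.out.println(deviceId + " -> " + data.getDeviceTemp());
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to declare
    }
}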
