1. Mapper and Scheme
scheme: transforms the data Kafka passes to the spout (record -> tuple).
mapper: transforms the data Storm passes to Kafka (tuple -> record).
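As an illustration of the mapper direction (tuple -> record), here is a minimal sketch, assuming storm-kafka's TupleToKafkaMapper interface and the "deviceId" / "sensorData" field names used later in this post; it is not code from the original project:

import org.apache.storm.kafka.bolt.mapper.TupleToKafkaMapper;
import org.apache.storm.tuple.Tuple;

import com.alibaba.fastjson.JSON;

public class SensorDataMapper implements TupleToKafkaMapper<String, String> {

    private static final long serialVersionUID = 1L;

    @Override
    public String getKeyFromTuple(Tuple tuple) {
        // use the device id as the Kafka record key
        return tuple.getStringByField("deviceId");
    }

    @Override
    public String getMessageFromTuple(Tuple tuple) {
        // jsonize the entity so the record value is a plain string
        return JSON.toJSONString(tuple.getValueByField("sensorData"));
    }
}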
2. Why customize the message format
In many cases the data passed through Kafka is not a simple string; it can be any object. The default new Fields("bytes") is not appropriate when we need to group tuples by an attribute of that object, but Kafka messages still travel as strings. So instead of writing the entity object to Kafka directly, we use fastjson to convert it to a JSON string first.
The scheme then converts that JSON string back into an entity class object.
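For illustration, here is a minimal producer-side sketch (not from the original code) of that conversion: the entity is jsonized with fastjson and sent to Kafka as a plain string. The broker address, topic name, and the SensorData setters are assumptions; SensorData is the entity class defined later in this post.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import com.alibaba.fastjson.JSON;

import dm.entity.SensorData;

public class SensorDataProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        SensorData data = new SensorData();
        data.setDeviceId("device-001");   // setters assumed to exist on the entity
        data.setDeviceTemp(36.5);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // entity -> jsonString; Kafka only ever carries the string
            producer.send(new ProducerRecord<>("sensor-data", data.getDeviceId(), JSON.toJSONString(data)));
        }
    }
}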
3. How to change the scheme
There are many parameters to configure when building a KafkaSpout, so let's first take a look at the KafkaConfig code.
public final BrokerHosts hosts;       // used to obtain Kafka broker and partition information
public final String topic;            // the topic to read messages from
public final String clientId;         // client id used by the SimpleConsumer
public int fetchSizeBytes = 1024 * 1024;    // total message size requested in each FetchRequest issued to Kafka
public int socketTimeoutMs = 10000;         // timeout of the socket connected to the Kafka broker
public int fetchMaxWait = 10000;            // how long the consumer waits when the server has no new messages
public int bufferSizeBytes = 1024 * 1024;   // read buffer size of the SocketChannel used by the SimpleConsumer
public MultiScheme scheme = new RawMultiScheme();   // how to deserialize the byte[] fetched from Kafka
public boolean forceFromStart = false;              // whether to force reading from the smallest offset in Kafka
public long startOffsetTime = kafka.api.OffsetRequest.EarliestTime();   // the offset time to start reading from, defaults to the oldest offset
public long maxOffsetBehind = Long.MAX_VALUE;       // how far the KafkaSpout may lag behind the target progress; messages further behind are discarded
public boolean useStartOffsetTimeIfOffsetOutOfRange = true;   // if the requested offset does not exist in Kafka, fall back to startOffsetTime
As you can see, all of the configuration items are public, so once we have instantiated a SpoutConfig we can change any of these properties by assigning to them directly.
Now let's look at the code for building the KafkaSpout:
ZkHosts zkHosts = new ZkHosts(zkHost);      // zk address
String zkRoot = "/" + topic;
String id = UUID.randomUUID().toString();   // unique identification
// build the spoutConfig
SpoutConfig spoutConf = new SpoutConfig(zkHosts, topic, zkRoot, id);
spoutConf.scheme = new SchemeAsMultiScheme(new SensorDataScheme());
spoutConf.startOffsetTime = OffsetRequest.LatestTime();
KafkaSpout kafkaSpout = new KafkaSpout(spoutConf);
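For context, here is a minimal sketch (not from the original post) of how this spout might be wired into a topology, grouping on the "deviceId" field emitted by the scheme. The component names are assumptions, and SensorDataBolt is a hypothetical downstream bolt like the one sketched at the end of this post.

import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sensor-spout", kafkaSpout, 1);
// fieldsGrouping on "deviceId" sends all data for the same device to the same bolt task
builder.setBolt("sensor-bolt", new SensorDataBolt(), 2)
       .fieldsGrouping("sensor-spout", new Fields("deviceId"));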
4. How to customize the scheme
Suppose we have an entity class as follows:
public class SensorData implements Serializable {

    // Device id
    private String deviceId;
    // Model id
    private String dmPropertiesId;
    // Channel name
    private String channelName;
    // Collected temperature value
    private double deviceTemp;
    // Collection time
    private Date date;
}
When the data is consumed for storage it is grouped by deviceId. Of course, when writing we do not put the entity object into Kafka directly; we jsonize it with fastjson and write the resulting string instead. The data is finally turned back into an object in the scheme's deserialize method.
Scheme interface:
public interface Scheme extends Serializable {

    List<Object> deserialize(ByteBuffer ser);

    public Fields getOutputFields();
}
As you can see, there are two methods to implement: one transforms the incoming byte data, and the other declares the output fields that the next bolt can group on. Tracing the storm-kafka source, the spout's declareOutputFields method eventually calls the scheme's getOutputFields to determine the field names.
Take a look at the overall code for the scheme:
package dm.scheme;

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.storm.kafka.StringScheme;
import org.apache.storm.spout.Scheme;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

import com.alibaba.fastjson.JSON;

import dm.entity.SensorData;

/**
 * Converts a Kafka record into a tuple.
 *
 * @author chenwen
 */
public class SensorDataScheme implements Scheme {

    private static final long serialVersionUID = 1L;

    private static final Charset UTF8_CHARSET = StandardCharsets.UTF_8;

    /**
     * Deserialize.
     */
    @Override
    public List<Object> deserialize(ByteBuffer byteBuffer) {
        // convert the Kafka message into a jsonString
        String sensorDataJson = StringScheme.deserializeString(byteBuffer);
        SensorData sensorData = JSON.parseObject(sensorDataJson, SensorData.class);
        String id = sensorData.getDeviceId();
        return new Values(id, sensorData);
    }

    public static String deserializeString(ByteBuffer byteBuffer) {
        if (byteBuffer.hasArray()) {
            int base = byteBuffer.arrayOffset();
            return new String(byteBuffer.array(), base + byteBuffer.position(), byteBuffer.remaining());
        } else {
            return new String(Utils.toByteArray(byteBuffer), UTF8_CHARSET);
        }
    }

    @Override
    public Fields getOutputFields() {
        // return the fields and their names
        return new Fields("deviceId", "sensorData");
    }
}
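A downstream bolt can then read the two fields by name. Here is a minimal sketch of such a bolt, assuming the field names above; the class name and the omitted storage logic are placeholders, not code from the original project.

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import dm.entity.SensorData;

public class SensorDataBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;

    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // read the fields emitted by SensorDataScheme by name
        String deviceId = tuple.getStringByField("deviceId");
        SensorData sensorData = (SensorData) tuple.getValueByField("sensorData");
        // ... store or process the data here (omitted) ...
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt does not emit anything downstream
    }
}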