Real-time calculation of the total order amount under multi-level distribution (Flume + Kafka + Storm + MySQL)

Keywords: Big Data, Apache Kafka, JDBC, Database

1. Flume configuration

Instead of writing a Kafka producer, we create a log file for Flume to tail; Flume then pushes its contents to Kafka for the consumers. The configuration is as follows:

############### Name the source, sink, and channel ###############
kafka_agent.sources = kafka_source
kafka_agent.sinks = kafka_sink
kafka_agent.channels = kafka_channel
 
############## Configure the source #####################
# Tail the log file with an exec source
kafka_agent.sources.kafka_source.type = exec
kafka_agent.sources.kafka_source.command = tail -F /usr/local/flume1.8/test/MultilevelComputing.log
kafka_agent.sources.kafka_source.shell = /bin/bash -c
 
 
############### Configure the sink ######################
# Forward events to Kafka
kafka_agent.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
# Topic to write to
kafka_agent.sinks.kafka_sink.kafka.topic = Multilevel
# Broker address
kafka_agent.sinks.kafka_sink.kafka.bootstrap.servers = htkj101:9092
# Batch size: send one message at a time
kafka_agent.sinks.kafka_sink.kafka.flumeBatchSize = 1
# Producer ack strategy: -1 waits for all in-sync replicas to acknowledge
kafka_agent.sinks.kafka_sink.kafka.producer.acks = -1
kafka_agent.sinks.kafka_sink.kafka.producer.linger.ms = 1
kafka_agent.sinks.kafka_sink.kafka.producer.compression.type = snappy
############################ Configure the channel ###################
# A file channel buffers events on disk, which is safer than an in-memory channel
kafka_agent.channels.kafka_channel.type = file
kafka_agent.channels.kafka_channel.checkpointDir = /home/uplooking/data/flume/checkpoint
kafka_agent.channels.kafka_channel.dataDirs = /home/uplooking/data/flume/data
########################### Wire the three components together #########################
kafka_agent.sources.kafka_source.channels = kafka_channel
kafka_agent.sinks.kafka_sink.channel = kafka_channel
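
With the configuration saved to a file (assumed here to be conf/kafka_agent.conf; use your actual path), the agent can be started with the standard flume-ng command:

flume-ng agent --conf conf --conf-file conf/kafka_agent.conf --name kafka_agent -Dflume.root.logger=INFO,console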
 

2. Storm consumes the Kafka data and computes in real time

2.1 pom.xml

  <dependencies>
        <!-- Storm core -->
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.1.0</version>
            <!--<scope>provided</scope>-->
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>log4j-over-slf4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- Kafka clients -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.0.1</version>
        </dependency>
        <!-- Kafka (broker libraries) -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>0.10.0.1</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>log4j-over-slf4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.logging.log4j</groupId>
                    <artifactId>log4j-slf4j-impl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-kafka-client</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-jdbc</artifactId>
            <version>1.1.1</version>
        </dependency>

        <!-- MySQL driver -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.31</version>
        </dependency>
    </dependencies>

2.2 KafkaSpout

This class builds and submits the topology: a KafkaSpout pulls the data from Kafka, and the bolts below process it.

package com.htkj.multilevelcomputing.Storm;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.apache.storm.utils.Utils;

public class KafkaTopo {
    public static void main(String[] args) {

        //Create TopologyBuilder
        TopologyBuilder topologyBuilder = new TopologyBuilder();

        KafkaSpoutConfig.Builder<String, String> kafkaSpoutConfigBuilder;
        //kafka connection information
        String bootstrapServers="htkj101:9092,htkj102:9093,htkj103:9094";
        //topic name
        String topic = "Multilevel";
        /**
         * Construct the KafkaSpoutConfig builder
         *
         * bootstrapServers:    Kafka broker address list (ip:port)
         * StringDeserializer:  deserializer for the topic key
         * StringDeserializer:  deserializer for the topic value
         * topic:               topic name
         */
        kafkaSpoutConfigBuilder = new KafkaSpoutConfig.Builder<>(
                bootstrapServers,
                StringDeserializer.class,
                StringDeserializer.class,
                topic);
        //Use the kafkaSpoutConfigBuilder constructor to construct kafkaSpoutConfig and configure the corresponding properties
        KafkaSpoutConfig<String, String> kafkaSpoutConfig = kafkaSpoutConfigBuilder
                /**
                 * Set the consumer group id
                 */
                .setProp(ConsumerConfig.GROUP_ID_CONFIG, topic.toLowerCase() + "_storm_group")

                /**
                 * Set the session timeout; it must lie within
                 * [group.min.session.timeout.ms, group.max.session.timeout.ms] = [6000, 300000]
                 * Default: 10000
                 */
                .setProp(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "100000")

                /**
                 * Set the maximum fetch size per partition (bytes)
                 */
                .setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576")

                /**
                 * Set the maximum time the client waits for a request response
                 * Default: 30000
                 */
                .setProp(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "300000")

                /**
                 * Set the expected interval between heartbeats to the consumer coordinator.
                 * Heartbeats keep the consumer session alive and trigger a rebalance when consumers join or leave the group.
                 * Default: 3000 (generally at most one third of session.timeout.ms)
                 */
                .setProp(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "30000")

                /**
                 * Set the offset commit period to 15 s (the default is 30 s)
                 */
                .setOffsetCommitPeriodMs(15000)

                /**
                 * Set the maximum number of records per poll; keep it small enough to process within the session timeout
                 */
                .setMaxPollRecords(20)

                /**
                 * Set the offset strategy for the first poll
                 */
                .setFirstPollOffsetStrategy(KafkaSpoutConfig.FirstPollOffsetStrategy.LATEST)

                /**
                 * Build the KafkaSpoutConfig
                 */
                .build();

        //setSpout receives data from kafka
        topologyBuilder.setSpout("kafkaSpout",new KafkaSpout(kafkaSpoutConfig));
        //setbolt processes data received from kafka
        topologyBuilder.setBolt("KafkaSpoutBolt", new KafkaBolt()).localOrShuffleGrouping("kafkaSpout");
        //setbolt processes the data it receives from KafkaSpoutBolt
        topologyBuilder.setBolt("MultiBolt",new MultiBolt()).fieldsGrouping("KafkaSpoutBolt",new Fields("orderSn","cateId","goodsAmount","parentId","CEOId"));
        //setbolt processes the data it receives from MultiBolt
        topologyBuilder.setBolt("ComputingBolt",new ComputingBolt()).fieldsGrouping("MultiBolt",new Fields("CEOId","parentId","goodsAmount"));
        Config config = new Config();
        /**
         * Set the supervisor-worker communication timeout (in seconds).
         * If it is exceeded, the supervisor restarts the worker.
         */
        config.put("supervisor.worker.timeout.secs",600000);
        /**
         * Set the session timeout between Storm and ZooKeeper (in milliseconds).
         */
        config.put("storm.zookeeper.session.timeout",1200000000);
        /**
         * Enable debug mode for more complete log output.
         * Only enable this in local LocalCluster mode.
         */
        config.setDebug(true);
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("KafKaTopo", config, topologyBuilder.createTopology());
        Utils.sleep(Long.MAX_VALUE);
        localCluster.shutdown();

    }


}
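
Before submitting the topology, the Multilevel topic must exist and should be receiving data. With Kafka 0.10 the standard scripts can be used to create the topic and watch it (assuming ZooKeeper runs on htkj101:2181; adjust hosts and partition count to your cluster):

kafka-topics.sh --create --zookeeper htkj101:2181 --replication-factor 1 --partitions 3 --topic Multilevel
kafka-console-consumer.sh --bootstrap-server htkj101:9092 --topic Multilevel

The console consumer is a quick way to confirm that Flume is actually delivering log lines before involving Storm.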

2.3 KafkaBolt

Here we take the raw message from Kafka and split it into its fields. With storm-kafka-client's default record translator, the spout emits tuples of (topic, partition, offset, key, value), so the message payload is field 4.

package com.htkj.multilevelcomputing.Storm;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;


public class KafkaBolt extends BaseBasicBolt  {



    @Override
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        //The payload of the KafkaSpout tuple (topic, partition, offset, key, value) is field 4
        String s = tuple.getString(4);
        System.out.println("kafkabolt-----" + s);
        //Split the line on spaces
        String[] split = s.split(" ");
        Integer orderSn=Integer.valueOf(split[0]);
        Integer cateId=Integer.valueOf(split[1]);
        Integer goodsAmount=Integer.valueOf(split[2]);
        Integer parentId=Integer.valueOf(split[3]);
        Integer CEOId=Integer.valueOf(split[4]);
        //Emit the parsed fields
        basicOutputCollector.emit(new Values(orderSn,cateId,goodsAmount,parentId,CEOId));
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("orderSn","cateId","goodsAmount","parentId","CEOId"));
    }


}
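
For example, a log line such as "8423 12 150 66 6" (the format produced by the test script in section 3) is emitted as orderSn=8423, cateId=12, goodsAmount=150, parentId=66, CEOId=6.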

2.4 MultiBolt

This bolt applies the business rule: orders in certain product categories must not enter the performance calculation.

package com.htkj.multilevelcomputing.Storm;


import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;

import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;



public class MultiBolt  extends BaseBasicBolt {

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("CEOId", "parentId", "goodsAmount"));
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        //get data
        Integer orderSn = tuple.getIntegerByField("orderSn");
        Integer cateId = tuple.getIntegerByField("cateId");
        Integer goodsAmount = tuple.getIntegerByField("goodsAmount");
        Integer parentId = tuple.getIntegerByField("parentId");
        Integer CEOId = tuple.getIntegerByField("CEOId");
        System.out.println("orderSn:" + orderSn + "cateId:" + cateId + "goodsAmount:" + goodsAmount + "parentId:" + parentId + "CEOId:" + CEOId);
        //Exclude accessories (cateId 9) and facial masks (cateId 65)
        if (cateId != 9 && cateId != 65) {
            System.out.println("Success");
            basicOutputCollector.emit(new Values(CEOId, parentId, goodsAmount));
        }
    }
}

2.5 ComputingBolt

This bolt connects to the database and writes the accumulated order amounts into it.

package com.htkj.multilevelcomputing.Storm;

import com.google.common.collect.Maps;
import org.apache.storm.jdbc.common.Column;
import org.apache.storm.jdbc.common.ConnectionProvider;
import org.apache.storm.jdbc.common.HikariCPConnectionProvider;
import org.apache.storm.jdbc.common.JdbcClient;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ComputingBolt  extends BaseRichBolt {
    private OutputCollector collector;
    //Database connection provider
    private ConnectionProvider connectionProvider;
    //Database access helper
    private JdbcClient jdbcClient;
    //Amount currently stored for the first-level agent
    private Integer parentAmount;
    //New total for the first-level agent
    private Integer parentAllAmount;
    //Amount currently stored for the CEO
    private Integer CEOAmount;
    //New total for the CEO
    private Integer CEOAllAmount;
    @Override
    public void prepare(Map stormConf, TopologyContext topologyContext, OutputCollector collector) {
        this.collector = collector;
        //Create a map collection to store connection properties
        Map map = Maps.newHashMap();
        //driver
        map.put("dataSourceClassName","com.mysql.jdbc.jdbc2.optional.MysqlDataSource");
        //url
        map.put("dataSource.url", "jdbc:mysql:///yazan");
        //User name
        map.put("dataSource.user","root");
        //Password
        map.put("dataSource.password","123456");
        //Creating Connection Objects
        connectionProvider = new HikariCPConnectionProvider(map);
        //Initialization of database connections
        connectionProvider.prepare();
        //Create the database helper; the second parameter is the query timeout in seconds
        jdbcClient = new JdbcClient(connectionProvider,30);
    }

    @Override
    public void execute(Tuple tuple) {
        Integer goodsAmount=tuple.getIntegerByField("goodsAmount");
        Integer parentId=tuple.getIntegerByField("parentId");
        Integer CEOId=tuple.getIntegerByField("CEOId");

        System.out.println("goodsAmount-->"+goodsAmount+"parentId-->"+parentId+"CEOId-->"+CEOId);
        //Build the list of query parameters
        List<Column> parent = new ArrayList<Column>();
        //Add the lookup condition
        parent.add(new Column("user_id", parentId, Types.INTEGER));
        //Look up the row whose user_id equals parentId
        List<List<Column>> selectParentId = jdbcClient.select("SELECT user_id ,amount from test WHERE user_id = ?", parent);
        //If no row exists for parentId, insert one
        if (selectParentId.size()==0){
            System.out.println("no data");
            jdbcClient.executeSql("INSERT INTO test (user_id,amount,lid,lid_name) VALUES("+parentId+","+goodsAmount+",10,'General Agent')");
        }
        //Otherwise read the current amount and update it
        else {
            for (List<Column> columns : selectParentId) {
                for (Column column : columns) {
                    String columnName = column.getColumnName();
                    if ("amount".equalsIgnoreCase(columnName)) {
                        //Read the current amount
                        parentAmount = (Integer) column.getVal();
                        System.out.println("Current amount " + parentAmount);
                    }
                }
            }
            parentAllAmount = parentAmount + goodsAmount;
            System.out.println("Total amount " + parentAllAmount);
            jdbcClient.executeSql("UPDATE test SET amount = " + parentAllAmount + " WHERE user_id = '" + parentId + "'");
        }
        List<Column> CEO = new ArrayList<Column>();
        CEO.add(new Column("user_id", CEOId, Types.INTEGER));
        List<List<Column>> selectCEOId = jdbcClient.select("SELECT user_id ,amount from test WHERE user_id = ?", CEO);
        //If no row exists for CEOId, insert one
        if (selectCEOId.size()==0){
            System.out.println("no data");
            jdbcClient.executeSql("INSERT INTO test (user_id,amount,lid,lid_name) VALUES("+CEOId+","+goodsAmount+",9,'CEO')");
        }
        //Otherwise read the current amount and update it
        else {
            for (List<Column> columns : selectCEOId) {
                for (Column column : columns) {
                    String columnName= column.getColumnName();
                    if("amount".equalsIgnoreCase(columnName)){
                        //Get the current amount
                        CEOAmount= (Integer) column.getVal();
                        System.out.println("Current amount"+CEOAmount);
                    }
                }
            }
            CEOAllAmount=CEOAmount+goodsAmount;
            System.out.println("Total amount"+CEOAllAmount);
            jdbcClient.executeSql("UPDATE test SET amount = "+CEOAllAmount+" WHERE user_id = '"+CEOId+"'");
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        //Terminal bolt: nothing to declare
    }
}
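
For reference, a test table matching the columns used in the INSERT and UPDATE statements above would look roughly like this (a sketch inferred from the code; the actual schema may differ):

CREATE TABLE test (
    user_id  INT PRIMARY KEY,
    amount   INT,
    lid      INT,
    lid_name VARCHAR(32)
);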

3. Shell script

Write a shell script to generate test data:

#!/bin/bash

# file watched by the Flume agent
file='/usr/local/flume1.8/test/MultilevelComputing.log'

for ((i = 0; i < 1000000; i++))
do
    orderSn=$RANDOM                   # random order number
    cateId=$(($RANDOM % 100 + 1))     # category id in 1..100
    goodsAmount=$RANDOM               # order amount
    parentId=$(($RANDOM % 50 + 51))   # first-level agent id in 51..100
    CEOId=$(($RANDOM % 50 + 1))       # CEO id in 1..50
    echo $orderSn $cateId $goodsAmount $parentId $CEOId >> $file
    sleep 0.001
done
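
Assuming the script is saved as gen.sh (a name chosen here for illustration), make it executable and run it while the Flume agent and the topology are up:

chmod +x gen.sh
./gen.sh

Note that sleep 0.001 relies on a sleep implementation that accepts fractional seconds (GNU coreutils does).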

4. Test results

5. Some notes

  • Repeated consumption and recomputation with Kafka + Storm

At first I suspected Kafka, but after digging in I found the real cause:

ComputingBolt extends BaseRichBolt but never anchored or acked its tuples, so they timed out and were replayed.

Adding

collector.ack(tuple);

at the end of execute() solved the problem.

The point here is that if you extend BaseBasicBolt you do not need to ack manually, because the framework already does it for you.
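
The reason is visible in Storm's BasicBoltExecutor, the wrapper that runs every BaseBasicBolt. A sketch abridged from the storm-core 1.x source (error-reporting details omitted):

public void execute(Tuple input) {
    _collector.setContext(input);
    try {
        _bolt.execute(input, _collector);       //your BaseBasicBolt.execute()
        _collector.getOutputter().ack(input);   //auto-ack on success
    } catch (FailedException e) {
        _collector.getOutputter().fail(input);  //auto-fail when a FailedException is thrown
    }
}

This is also why KafkaBolt and MultiBolt above, which extend BaseBasicBolt, never call ack or fail themselves.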

  • An error with storm-jdbc

An error of this kind came up while writing the code.

The cause: the SQL statement was written simply as select * from xxx, but storm-jdbc resolves query results against the Column objects in the list we build. Since only user_id had been added to that list, none of the other columns could be found.
