Research on Storm source code analysis

Keywords: storm

2021SC@SDUSC

Spout node analysis of Trident

Trident is another set of interfaces that Storm provides to users. It offers basic stream processing and reliable message processing. Operations on streams are the core of Trident.

Trident mainly supports two types of spout nodes: ITridentSpout and the DRPC spout. Other basic spout types in Storm, such as IRichSpout and IBatchSpout, are adapted to the ITridentSpout interface so that they can be executed in a Trident topology.

This article briefly analyzes these spout nodes and the executors that adapt them.

TridentSpoutCoordinator.java

The TridentSpoutCoordinator class is the executor of the BatchCoordinator. It implements the IBasicBolt interface, so it is actually a bolt node. Its main job is to call the initializeTransaction method of the spout's BatchCoordinator interface.

public class TridentSpoutCoordinator implements IBasicBolt {
    public static final Logger LOG = LoggerFactory.getLogger(TridentSpoutCoordinator.class);
    private static final String META_DIR = "meta";

    ITridentSpout<Object> spout;
    ITridentSpout.BatchCoordinator<Object> coord;
    RotatingTransactionalState state;
    TransactionalState underlyingState;
    String id;


    public TridentSpoutCoordinator(String id, ITridentSpout<Object> spout) {
        this.spout = spout;
        this.id = id;
    }

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context) {
        coord = spout.getCoordinator(id, conf, context);
        underlyingState = TransactionalState.newCoordinatorState(conf, id);
        state = new RotatingTransactionalState(underlyingState, META_DIR);
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        TransactionAttempt attempt = (TransactionAttempt) tuple.getValue(0);

        if (tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
            state.cleanupBefore(attempt.getTransactionId());
            coord.success(attempt.getTransactionId());
        } else {
            long txid = attempt.getTransactionId();
            Object prevMeta = state.getPreviousState(txid);
            Object meta = coord.initializeTransaction(txid, prevMeta, state.getState(txid));
            state.overrideState(txid, meta);
            collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
        }

    }

    @Override
    public void cleanup() {
        coord.close();
        underlyingState.close();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream(MasterBatchCoordinator.BATCH_STREAM_ID, new Fields("tx", "metadata"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config ret = new Config();
        ret.setMaxTaskParallelism(1);
        return ret;
    }
}

In the execute method, TridentSpoutCoordinator receives tuples from the success and batch streams. Each tuple carries a single field, the TransactionAttempt, from which the transaction ID (txid) is read.

When a tuple arrives on the success stream, the transaction has finished, so execute calls coord.success and, at the same time, cleans up the metadata of earlier transactions stored in ZooKeeper.

When a tuple arrives on the batch stream, the execute method initializes a transaction and emits a tuple to the $batch stream in the format <tx, metadata>. The incoming tuple may be a retransmission, for example one caused by a spout timeout, so metadata for the transaction may already exist. This is why initializeTransaction receives not only prevMeta (the metadata of the preceding transaction) but also the metadata currently stored for this txid (state.getState(txid)).
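
The bookkeeping described above can be sketched without Storm or ZooKeeper. The following is only an illustration under stated assumptions: CoordinatorReplaySketch, BATCH_SIZE, and the offset-style metadata are hypothetical, and a TreeMap stands in for RotatingTransactionalState. It shows why a replayed attempt can reuse the metadata already stored for its txid, and why a success for a txid only removes the metadata of earlier transactions.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory stand-in (hypothetical) for the metadata handling in
 * TridentSpoutCoordinator.execute. It has no Storm or ZooKeeper dependency.
 */
class CoordinatorReplaySketch {
    /** Hypothetical batch size used to derive the next starting offset. */
    static final long BATCH_SIZE = 10;

    /** txid -> metadata; plays the role of RotatingTransactionalState. */
    private final TreeMap<Long, Long> state = new TreeMap<>();

    /** Mirrors the batch branch of execute: look up, initialize, store. */
    long onBatch(long txid) {
        Map.Entry<Long, Long> lower = state.lowerEntry(txid);
        Long prevMeta = lower == null ? null : lower.getValue();
        Long currMeta = state.get(txid);
        long meta = initializeTransaction(txid, prevMeta, currMeta);
        state.put(txid, meta);
        return meta;
    }

    /** Mirrors the success branch: drop metadata of transactions older than txid. */
    void onSuccess(long txid) {
        state.headMap(txid).clear();
    }

    /** Hypothetical coordinator logic: a replayed attempt keeps its stored metadata. */
    static long initializeTransaction(long txid, Long prevMeta, Long currMeta) {
        if (currMeta != null) {
            return currMeta; // replay: keep the same batch boundaries
        }
        return prevMeta == null ? 0L : prevMeta + BATCH_SIZE;
    }

    int storedTransactions() {
        return state.size();
    }

    public static void main(String[] args) {
        CoordinatorReplaySketch c = new CoordinatorReplaySketch();
        System.out.println(c.onBatch(1)); // first transaction starts at offset 0
        System.out.println(c.onBatch(2)); // next transaction starts after the previous one
        System.out.println(c.onBatch(2)); // replay of txid 2 yields the same metadata
        c.onSuccess(2);                   // metadata of txid 1 is removed
        System.out.println(c.storedTransactions());
    }
}
```

Keeping the entry for the successful txid itself matters in this sketch, because the next transaction needs it as prevMeta.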

Trident initializes transactions in a bolt node. This differs from a transactional topology, where transactions are initialized in the spout node.

MasterBatchCoordinator.java

The MasterBatchCoordinator class is the real spout node in Trident. A Trident topology can contain multiple spout nodes of type MasterBatchCoordinator, and each of them corresponds to a node group containing spout nodes.
Each MasterBatchCoordinator node can correspond to multiple ITridentSpout nodes, which belong to the same node group.
The MasterBatchCoordinator class is used to generate new transactions and to determine whether a transaction has been processed successfully.

Partitioned spout types in Trident

Partitioned spout interface
IPartitionedTridentSpout.java

public interface IPartitionedTridentSpout<PartitionsT, PartitionT extends ISpoutPartition, T> extends ITridentDataSource {
    Coordinator<PartitionsT> getCoordinator(Map<String, Object> conf, TopologyContext context);

    Emitter<PartitionsT, PartitionT, T> getEmitter(Map<String, Object> conf, TopologyContext context);

    Map<String, Object> getComponentConfiguration();

    Fields getOutputFields();

    interface Coordinator<PartitionsT> {

        PartitionsT getPartitionsForBatch();

        boolean isReady(long txid);

        void close();
    }

    interface Emitter<PartitionsT, PartitionT extends ISpoutPartition, X> {

        List<PartitionT> getOrderedPartitions(PartitionsT allPartitionInfo);
        
        X emitPartitionBatchNew(TransactionAttempt tx, TridentCollector collector, PartitionT partition, X lastPartitionMeta);

        void refreshPartitions(List<PartitionT> partitionResponsibilities);

        void emitPartitionBatch(TransactionAttempt tx, TridentCollector collector, PartitionT partition, X partitionMeta);

        default List<PartitionT> getPartitionsForTask(int taskId, int numTasks, List<PartitionT> allPartitionInfoSorted) {
            List<PartitionT> taskPartitions = new ArrayList<>(allPartitionInfoSorted == null ? 0 : allPartitionInfoSorted.size());
            if (allPartitionInfoSorted != null) {
                for (int i = taskId; i < allPartitionInfoSorted.size(); i += numTasks) {
                    taskPartitions.add(allPartitionInfoSorted.get(i));
                }
            }
            return taskPartitions;
        }

        void close();
    }
}

IPartitionedTridentSpout has three generic type parameters: PartitionsT, PartitionT, and T (named X in the Emitter interface). The Coordinator works with data of type PartitionsT, a generic type that represents the metadata describing the partitions.

In the IPartitionedTridentSpout interface, the getOrderedPartitions method takes a PartitionsT value and returns the ordered list of partitions. PartitionT is also a generic type, but it must extend the ISpoutPartition interface, which means it provides a getId method that returns the ID of the partition.

Type X represents the metadata type for one transaction of one partition.

The isReady method of the Coordinator is used to determine whether the given transaction can start.
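
The default getPartitionsForTask method in the Emitter interface hands out the ordered partitions to tasks by striding through the list. The standalone sketch below repeats that loop on plain strings so it can be run without Storm; the class name PartitionAssignmentSketch and the partition names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Standalone illustration of the striding loop used by getPartitionsForTask. */
class PartitionAssignmentSketch {

    /** Task taskId gets the partitions at indexes taskId, taskId + numTasks, and so on. */
    static List<String> partitionsForTask(int taskId, int numTasks, List<String> sorted) {
        List<String> result = new ArrayList<>();
        if (sorted != null) {
            for (int i = taskId; i < sorted.size(); i += numTasks) {
                result.add(sorted.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical partition names, already in the order getOrderedPartitions would return.
        List<String> partitions = Arrays.asList("p0", "p1", "p2", "p3", "p4");
        for (int task = 0; task < 2; task++) {
            // task 0 -> [p0, p2, p4], task 1 -> [p1, p3]
            System.out.println("task " + task + " -> " + partitionsForTask(task, 2, partitions));
        }
    }
}
```

Because every task applies the same rule to the same ordered list, each partition ends up with exactly one task, which is why the ordering returned by getOrderedPartitions needs to be consistent across tasks.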

Executor for the partitioned spout
PartitionedTridentSpoutExecutor.java
This class implements the ITridentSpout interface and adapts an IPartitionedTridentSpout so that Trident can execute it.
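
The source of PartitionedTridentSpoutExecutor is not reproduced here, so the following is only a rough sketch of how the two emit methods of the Emitter interface could be used together. It assumes that the executor stores the metadata returned for each transaction of a partition, calls the "new" variant when nothing is stored yet, and replays from the stored metadata otherwise. All names in it (PartitionedEmitSketch, Range, BATCH_SIZE) are hypothetical, and a single in-memory list stands in for one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-partition model of "emit new batch" versus "replay batch". */
class PartitionedEmitSketch {

    /** Hypothetical per-transaction metadata: the offset range [start, end). */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static final int BATCH_SIZE = 2;

    private final List<String> log;                                  // the partition's records
    private final TreeMap<Long, Range> metaByTxid = new TreeMap<>(); // stored metadata

    PartitionedEmitSketch(List<String> log) {
        this.log = log;
    }

    /** Plays the role of emitPartitionBatchNew: choose a fresh range and return it. */
    private Range emitNew(Range lastMeta, List<String> out) {
        int start = lastMeta == null ? 0 : lastMeta.end;
        int end = Math.min(start + BATCH_SIZE, log.size());
        out.addAll(log.subList(start, end));
        return new Range(start, end);
    }

    /** Plays the role of emitPartitionBatch: re-emit exactly the recorded range. */
    private void replay(Range meta, List<String> out) {
        out.addAll(log.subList(meta.start, meta.end));
    }

    /** Assumed executor logic: replay if metadata exists, otherwise emit a new batch. */
    List<String> emitBatch(long txid) {
        List<String> out = new ArrayList<>();
        Range stored = metaByTxid.get(txid);
        if (stored != null) {
            replay(stored, out);
        } else {
            Map.Entry<Long, Range> last = metaByTxid.lowerEntry(txid);
            metaByTxid.put(txid, emitNew(last == null ? null : last.getValue(), out));
        }
        return out;
    }

    public static void main(String[] args) {
        PartitionedEmitSketch p = new PartitionedEmitSketch(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(p.emitBatch(1)); // new batch
        System.out.println(p.emitBatch(2)); // new batch, continues after txid 1
        System.out.println(p.emitBatch(1)); // replay of txid 1 emits the same records
    }
}
```

In this model a replayed transaction always emits the same records as its first attempt, which is the property the per-partition metadata of type X is meant to make possible.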

Posted by dancingbear on Thu, 02 Dec 2021 14:35:54 -0800