CAT LogView: "Sorry, the message is not there. It can be missing or archived."

Keywords: Java github Maven

When you click a message in CAT to open its LogView, this error appears: "Sorry, the message is not there. It can be missing or archived."

The error is confusing at first, and a search on GitHub turned up no clear answer.

So here we track the problem down by forming a hypothesis and checking it against the source code.

First, I consulted the CAT maintainers and learned one very important fact: the hour segment of the message ID differs between messages, as in these two IDs:

Project name - ac13bd78-430207-91
Project name - ac13bd78-430208-91

The first segment is the project name, the second is the IP address, the third is the hour, and the fourth is an auto-incrementing index within that hour.
For example:
In the LogView message list, a message opens normally if its hour segment is 430207, the current hour. If the hour segment is anything other than 430207, the message is reported as missing.
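
To make the ID layout concrete, here is a minimal sketch that splits an ID into its four segments. Two things in it are my assumptions rather than facts from the CAT source quoted in this post: that the IP segment is an IPv4 address written as 8 hex digits, and that the hour segment counts hours since the Unix epoch. The class `MessageIdDemo`, its helpers, and the domain `myapp` are hypothetical names used only for illustration.

```java
import java.time.Instant;

public class MessageIdDemo {
    /** Hypothetical holder for the four segments of a message ID. */
    static final class ParsedId {
        final String domain;
        final String ipHex;
        final int hour;
        final int index;

        ParsedId(String domain, String ipHex, int hour, int index) {
            this.domain = domain;
            this.ipHex = ipHex;
            this.hour = hour;
            this.index = index;
        }
    }

    /** Splits from the right, because the domain itself may contain '-'. */
    static ParsedId parse(String messageId) {
        int p3 = messageId.lastIndexOf('-');
        int p2 = messageId.lastIndexOf('-', p3 - 1);
        int p1 = messageId.lastIndexOf('-', p2 - 1);
        if (p1 < 0) {
            throw new IllegalArgumentException("Not a 4-segment message ID: " + messageId);
        }
        return new ParsedId(
                messageId.substring(0, p1),
                messageId.substring(p1 + 1, p2),
                Integer.parseInt(messageId.substring(p2 + 1, p3)),
                Integer.parseInt(messageId.substring(p3 + 1)));
    }

    /** Assumption: the IP segment is an IPv4 address encoded as 8 hex digits. */
    static String ipFromHex(String hex) {
        long value = Long.parseLong(hex, 16);
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Assumption: the hour segment counts hours since the Unix epoch. */
    static Instant hourStart(int hour) {
        return Instant.ofEpochSecond(hour * 3600L);
    }

    public static void main(String[] args) {
        ParsedId id = parse("myapp-ac13bd78-430207-91");
        System.out.println("domain = " + id.domain);
        System.out.println("ip     = " + ipFromHex(id.ipHex));
        System.out.println("hour   = " + id.hour + " (starts " + hourStart(id.hour) + ")");
        System.out.println("index  = " + id.index);
    }
}
```

Under those assumptions, hour 430207 starts on 29 January 2019, which at least fits the time this problem was investigated.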

Then I found a column for lost message data in CAT's State report:

Message storage loss caused by inaccurate clocks on two machines. This scenario occurs with Pigeon, where the server's ID is generated by the client and a clock difference of 2 hours between client and server results in storage loss.

I wonder if this column counts the data I just lost.

With these clues, we begin to hypothesize and verify!

Start with how the client generates message IDs.

My client version is 2.0.

Source View

  1. When a message is sent, isAtomicMessage checks whether it is an EVENT message, and if so, the message tree is placed in the m_atomicTrees queue.

TcpSocketSender.java

public void send(MessageTree tree) {
   if (isAtomicMessage(tree)) {
      boolean result = m_atomicTrees.offer(tree, m_manager.getSample());

      if (!result) {
         logQueueFullInfo(tree);
      }
   } else {
      boolean result = m_queue.offer(tree, m_manager.getSample());

      if (!result) {
         logQueueFullInfo(tree);
      }
   }
}

  2. Once a tree has been placed in m_atomicTrees, a dedicated background thread (MergeAtomicTask) picks it up.

public class MergeAtomicTask implements Task {

   @Override
   public String getName() {
      return "merge-atomic-task";
   }

   @Override
   public void run() {
      while (true) {
         // Poll the queue continuously; as soon as shouldMerge() is satisfied by the queued messages, merge them.
         if (shouldMerge(m_atomicTrees)) {
            MessageTree tree = mergeTree(m_atomicTrees);
            boolean result = m_queue.offer(tree);

            if (!result) {
               logQueueFullInfo(tree);
            }
         } else {
            try {
               Thread.sleep(5);
            } catch (InterruptedException e) {
               break;
            }
         }
      }
   }

   @Override
   public void shutdown() {
   }
}

mergeTree is the key method that merges the queued messages. Why merge them at all?

My guess: each queued message tree has already been assigned its own message ID, but the merged tree only needs one ID, and the ID of the first message is enough to locate it. The IDs of the second and later messages are then useless, but rather than waste them, the client puts them back into a queue of IDs, to be handed out as the IDs of later message trees.

private MessageTree mergeTree(MessageQueue trees) {
   int max = MAX_CHILD_NUMBER;
   DefaultTransaction t = new DefaultTransaction("_CatMergeTree", "_CatMergeTree", null);
   // Take the first message tree from the queue
   MessageTree first = trees.poll();

   t.setStatus(Transaction.SUCCESS);
   t.setCompleted(true);
   t.addChild(first.getMessage());
   t.setTimestamp(first.getMessage().getTimestamp());
   long lastTimestamp = 0;
   long lastDuration = 0;

   while (max >= 0) {
      // Note: this loop starts from the second tree.
      MessageTree tree = trees.poll();

      if (tree == null) {
         t.setDurationInMillis(lastTimestamp - t.getTimestamp() + lastDuration);
         break;
      }
      lastTimestamp = tree.getMessage().getTimestamp();
      if (tree.getMessage() instanceof DefaultTransaction) {
         lastDuration = ((DefaultTransaction) tree.getMessage()).getDurationInMillis();
      } else {
         lastDuration = 0;
      }
      t.addChild(tree.getMessage());
      // Critical: the ID that was already generated for this tree is put back into the reuse queue.
      m_factory.reuse(tree.getMessageId());
      max--;
   }

   ((DefaultMessageTree) first).setMessage(t);
   return first;
}

// D:\lib\maven\com\dianping\cat\cat-client\2.0.0\cat-client-2.0.0-sources.jar!\com\dianping\cat\message\internal\MessageIdFactory.java
// Implementation behind m_factory.reuse(tree.getMessageId())
public void reuse(String id) {
  m_reusedIds.offer(id);
}

Next, we just need to see how an ID is handed out.

MessageIdFactory.java

public String getNextId() {
   // First try the reuse queue filled by mergeTree above; if it holds an ID, return it directly.
   String id = m_reusedIds.poll();

   if (id != null) {
      return id;
   } else {
      long timestamp = getTimestamp();

      if (timestamp != m_timestamp) {
         m_index = new AtomicInteger(0);
         m_timestamp = timestamp;
      }

      int index = m_index.getAndIncrement();

      StringBuilder sb = new StringBuilder(m_domain.length() + 32);

      sb.append(m_domain);
      sb.append('-');
      sb.append(m_ipAddress);
      sb.append('-');
      sb.append(timestamp);
      sb.append('-');
      sb.append(index);

      return sb.toString();
   }
}

This is where the problem arises. If the reused IDs from the current hour are not used up before the hour rolls over, then in the next hour getNextId() still takes an ID from the queue. The new message tree is sent to the server under that ID, but the ID is left over from the previous hour.
The server stores messages keyed by the hour in the ID. When it goes to store this message, it finds that the ID belongs to the previous hour and discards it directly. The State report then counts it in this column:

Message storage loss caused by inaccurate clocks on two machines | This scenario occurs with Pigeon, where the server's ID is generated by the client and a clock difference of 2 hours between client and server results in storage loss.
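
The hour rollover described above can be reproduced with a small simulation. This is a simplified stand-in for the 2.0 ID factory, not the real class: `StaleIdDemo`, `ReusingIdFactory`, the injectable hour clock, and the domain `myapp` are all hypothetical, added only so that the hour can be advanced by hand.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

public class StaleIdDemo {
    /** Simplified model of an ID factory that hands reused IDs out first. */
    static final class ReusingIdFactory {
        private final Queue<String> reusedIds = new ConcurrentLinkedQueue<>();
        private final String domain;
        private final String ipHex;
        private final LongSupplier hourClock; // assumption: injectable so the demo controls the hour
        private long lastHour = -1;
        private AtomicInteger index = new AtomicInteger(0);

        ReusingIdFactory(String domain, String ipHex, LongSupplier hourClock) {
            this.domain = domain;
            this.ipHex = ipHex;
            this.hourClock = hourClock;
        }

        String getNextId() {
            String id = reusedIds.poll();
            if (id != null) {
                return id; // may carry an hour that has already passed
            }
            long hour = hourClock.getAsLong();
            if (hour != lastHour) {
                index = new AtomicInteger(0);
                lastHour = hour;
            }
            return domain + "-" + ipHex + "-" + hour + "-" + index.getAndIncrement();
        }

        void reuse(String id) {
            reusedIds.offer(id);
        }
    }

    public static void main(String[] args) {
        long[] now = {430207};
        ReusingIdFactory factory = new ReusingIdFactory("myapp", "ac13bd78", () -> now[0]);

        String first = factory.getNextId();
        String second = factory.getNextId();
        factory.reuse(second); // the second tree was merged into the first, so its ID goes back

        now[0] = 430208; // the hour rolls over before the queue is drained

        System.out.println(first);               // myapp-ac13bd78-430207-0
        System.out.println(factory.getNextId()); // myapp-ac13bd78-430207-1 (stale hour)
        System.out.println(factory.getNextId()); // myapp-ac13bd78-430208-0
    }
}
```

The second line of output is the problem case: a message created in hour 430208 is sent under an ID that still says 430207.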
On the server side, this path starts in TcpSocketReceiver.MessageDecoder.decode(), and the method that finally handles the message is RealtimeConsumer.consume.

public void consume(MessageTree tree) {      
    long timestamp = tree.getMessage().getTimestamp();
    Period period = m_periodManager.findPeriod(timestamp);

    if (period != null) {
        // The message tree is handed over to the bucket, then queued here and handed over to another PeriodTask thread for processing.
        period.distribute(tree);
    } else {
        m_serverStateManager.addNetworkTimeError(1);
    }
}

Call chain: PeriodTask.run -> AbstractMessageAnalyzer.analyze -> DumpAnalyzer.process -> processWithStorage

DumpAnalyzer.java - Key Code

public void process(MessageTree tree) {
    try {
        // Parse the message ID into its segments
        MessageId messageId = MessageId.parse(tree.getMessageId());

        if (!shouldDiscard(messageId)) {
            // The third segment of the message ID is passed in as messageId.getHour()
            processWithStorage(tree, messageId, messageId.getHour());
        }
    } catch (Exception ignored) {
    }
}

private void processWithStorage(MessageTree tree, MessageId messageId, int hour) {
    // Look up the dumper (bucket) for the hour taken from the third segment of the message ID
    MessageDumper dumper = m_dumperManager.find(hour);
    tree.setFormatMessageId(messageId);
    // For a leftover ID, no dumper matches the hour in the message ID, so dumper is null
    if (dumper != null) {
        dumper.process(tree);
    } else {
        // ...and the State report counter goes up by one
        m_serverStateManager.addPigeonTimeError(1);
    }
}

ServerStatistic.Statistic.m_pigeonTimeError

But the column description misleads users in this case. The real cause is that message IDs generated in the previous hour were left over in the client's reuse queue.
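
To illustrate why the counter goes up without any clock problem, here is a rough model of the server-side lookup. It is only a sketch of the behaviour shown in processWithStorage: `DumperLookupDemo` and `HourlyStore` are hypothetical, and the idea that only the current hour has an open dumper is my assumption for the demo, since the source quoted above does not show when dumpers are opened or closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DumperLookupDemo {
    /** Hypothetical stand-in for the dumper manager plus the State counter. */
    static final class HourlyStore {
        private final Map<Integer, List<String>> dumpers = new HashMap<>();
        private long pigeonTimeError;

        void openHour(int hour) {
            dumpers.put(hour, new ArrayList<>());
        }

        /** Stores the message if a dumper exists for its hour, otherwise counts an error. */
        void process(String messageId) {
            List<String> dumper = dumpers.get(hourOf(messageId));
            if (dumper != null) {
                dumper.add(messageId);
            } else {
                pigeonTimeError++; // counted as a "clock" error, whatever the real cause
            }
        }

        int storedCount(int hour) {
            List<String> dumper = dumpers.get(hour);
            return dumper == null ? 0 : dumper.size();
        }

        long getPigeonTimeError() {
            return pigeonTimeError;
        }
    }

    /** Reads the third segment of the ID, counting from the right. */
    static int hourOf(String messageId) {
        int p3 = messageId.lastIndexOf('-');
        int p2 = messageId.lastIndexOf('-', p3 - 1);
        return Integer.parseInt(messageId.substring(p2 + 1, p3));
    }

    public static void main(String[] args) {
        HourlyStore store = new HourlyStore();
        store.openHour(430208); // assumption: only the current hour has a dumper

        store.process("myapp-ac13bd78-430208-0"); // fresh ID, stored
        store.process("myapp-ac13bd78-430207-1"); // leftover ID, dropped

        System.out.println("stored in 430208: " + store.storedCount(430208)); // 1
        System.out.println("pigeonTimeError:  " + store.getPigeonTimeError()); // 1
    }
}
```

Both machines have correct clocks in this model, yet the leftover ID still lands in the error counter.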

Solution:

Upgrade the client to 3.0. It removes the reuse queue and generates every ID from the current timestamp.
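
The idea behind the fix can be sketched as follows. This is not the actual 3.0 code, only a minimal illustration of an ID factory with no reuse queue: `FreshIdDemo`, `FreshIdFactory`, the injectable hour clock, and the domain `myapp` are hypothetical.

```java
import java.util.function.LongSupplier;

public class FreshIdDemo {
    /** Sketch of an ID factory that always builds the ID from the current hour. */
    static final class FreshIdFactory {
        private final String domain;
        private final String ipHex;
        private final LongSupplier hourClock; // assumption: injectable so the demo controls the hour
        private long lastHour = -1;
        private int index;

        FreshIdFactory(String domain, String ipHex, LongSupplier hourClock) {
            this.domain = domain;
            this.ipHex = ipHex;
            this.hourClock = hourClock;
        }

        synchronized String getNextId() {
            long hour = hourClock.getAsLong();
            if (hour != lastHour) {
                lastHour = hour;
                index = 0; // the index restarts with every new hour
            }
            return domain + "-" + ipHex + "-" + hour + "-" + index++;
        }
    }

    public static void main(String[] args) {
        long[] now = {430207};
        FreshIdFactory factory = new FreshIdFactory("myapp", "ac13bd78", () -> now[0]);

        System.out.println(factory.getNextId()); // myapp-ac13bd78-430207-0
        System.out.println(factory.getNextId()); // myapp-ac13bd78-430207-1

        now[0] = 430208;
        System.out.println(factory.getNextId()); // myapp-ac13bd78-430208-0
    }
}
```

With no queue of old IDs, an ID can never carry an hour earlier than the one in which it was requested.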

Posted by HIV on Wed, 30 Jan 2019 09:36:15 -0800