Google Protocol Buffer (protoc, protobuf, pb) Learning Notes

Keywords: C++ Google JSON Linux xml

I used to work with C, JSON, XML and so on. Now that I have started on C++, I find that the world I knew was too small: it turns out that C++ and Google Protocol Buffers are such good things. Developing on a PC really is comfortable. When the size of the executable is not a concern, you can use C++ freely.

Reference

Protocol Buffer Basics: C++
Google Protocol Buffers
Use and Principle of Google Protocol Buffer - IBM
This is a sincere Protocol Buffer grammar explanation

Brief introduction

Protocol Buffers, also abbreviated as Protobuf or PB, is a data exchange format released by Google. Note that it is a binary format, not a text format.

Protobuf has its own compiler, called protoc on Linux, which parses .proto files and generates source files for the target language. At the time of the references above, Google provided three languages: Java, C++ and Python. The rest of these notes use C++ for illustration; the other languages work in a similar way.

To sum up, what we call Protobuf actually includes the following parts:

  • A data exchange format that serializes the contents of data classes (here, classes defined in C++) into binary byte strings. It is mainly used for data transmission or storage.

  • A source file format with the extension .proto, in which the contents of those data classes are defined.

  • A compiler provided by Google, protoc, which compiles .proto files into .cc and .h files, producing classes that can be used directly in C++ projects. The generated classes are feature-rich; they are explained in detail later.

Simple grammar of .proto files

Let's define an example that covers most common usage: information about a high school class and its students.

// --------------------------------------
// File: School.HighSchool.proto
//
syntax = "proto2";             // optional / required are proto2 field rules
package School.HighSchool;     // Pay attention to semicolons

message Person
{
    optional int32   id         = 1;   // field numbers start at 1 and must be unique within a message
    optional int32   age        = 2;
    optional string  first_name = 3;
    optional string  last_name  = 4;
    optional bool    is_female  = 5;
}

message Class
{
    optional int32   grade_num    = 1;
    optional int32   class_num    = 2;
    optional Person  head_teacher = 3;
    repeated Person  students     = 4;
}

The recommended file name is "packagename.messagename.proto". In C++ terms, that is "namespace.classname.proto".

The meaning of the definition above can be explained directly in C++ terms:

  1. The package statement defines the namespace: HighSchool is a sub-namespace of School.

  2. Person is a data class that contains a person's ID, age, first name, last name, and gender.

  3. Class is a data class for a school class, containing the grade number, the class number, the head teacher's information, and the information of all students.

It is necessary to explain optional and repeated here. The former indicates that the field is optional, that is, the data may be absent. The latter means that the field can hold multiple values of the same type. It can be thought of as a list or collection, similar to a vector in C++, and it corresponds to an array in JSON. A repeated field may be empty (zero elements).

The counterpart of optional is required, which means that the field must be present. However, most references recommend against using it.
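As a rough C++ analogy for optional and repeated, here is a hand-written sketch. It is not the code that protoc generates, and the names OptionalInt32, PersonSketch, ClassSketch, student_at and make_example_class are made up for illustration. An optional field behaves like a value paired with a "present" flag, and a repeated field behaves like a std::vector that may be empty:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an optional int32 field: a value plus a presence flag.
struct OptionalInt32 {
    bool         present = false;
    std::int32_t value   = 0;

    void set(std::int32_t v) { value = v; present = true; }
    void clear()             { value = 0; present = false; }
};

// Illustration only; the classes generated by protoc are much richer.
struct PersonSketch {
    OptionalInt32 id;          // optional: may be absent
    std::string   first_name;
};

struct ClassSketch {
    OptionalInt32             grade_num;  // optional
    std::vector<PersonSketch> students;   // repeated: zero or more elements
};

// Indexed read access to a repeated field, with a debug-only bounds check.
inline const PersonSketch &student_at(const ClassSketch &c, std::size_t i) {
    assert(i < c.students.size());
    return c.students[i];
}

// Builds a class with two students and no grade number set.
inline ClassSketch make_example_class() {
    ClassSketch c;
    PersonSketch a;
    a.id.set(1);
    a.first_name = "Ann";
    PersonSketch b;
    b.first_name = "Bob";  // id deliberately left unset
    c.students.push_back(a);
    c.students.push_back(b);
    return c;
}
```

In this sketch, the second student has no id, and a default-constructed ClassSketch has an empty students list; both are valid states, just as an unset optional field and an empty repeated field are valid in Protobuf.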

Generating C++ Classes

The source file above can be compiled with the following command:

protoc -I=$SRC_DIR --cpp_out=$DST_DIR School.HighSchool.proto

After compiling, two files are generated: School.HighSchool.pb.cc and School.HighSchool.pb.h.

Within the generated header, the general structure is as follows:

namespace School {
namespace HighSchool {

class Person : public ::google::protobuf::Message {
    ...
};    // end of class Person

class Class : public ::google::protobuf::Message {
    ...
};    // end of class Class

}}    // end of namespaces

Methods of the generated classes

Object-level methods

The generated classes provide methods that operate on the object as a whole. The commonly used ones are:

operator=(...)
CopyFrom(...)
MergeFrom(...)
ByteSize() const
Swap(...)

The two most important methods are:

bool SerializeToString(string *output) const;
bool ParseFromString(const string &data);

These perform serialization and deserialization respectively:

  1. Serialize (binarize) the content of PB into the specified string object.

  2. Parse (deserialize) a PB object from a string.
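To make "binarize" concrete, the sketch below hand-encodes a single int32 field the way the Protobuf wire format does: a key built from the field number and the wire type, followed by the value as a base-128 varint. This is not how the library is used in practice, since SerializeToString does all of this for you. The helper names append_varint and encode_int32_field are made up for illustration, and the example assumes that the id field has field number 1.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends v to out as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
inline void append_varint(std::string &out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<char>(v));
}

// Hypothetical helper: encodes one int32 field (wire type 0 = varint).
inline std::string encode_int32_field(std::uint32_t field_number, std::int32_t value) {
    assert(field_number >= 1);  // field numbers start at 1
    std::string out;
    // Key: field number in the upper bits, wire type in the lowest 3 bits.
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 0);
    // Negative int32 values are sign-extended to 64 bits before encoding.
    append_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(value)));
    return out;
}
```

Under these assumptions, encode_int32_field(1, 150) yields the three bytes 0x08 0x96 0x01, which is what a Person containing only id = 150 would serialize to.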

get / set methods

For each data member, dedicated get / set methods are generated. For example, for the id member of the Person class, the C++ class provides the following methods:

inline bool has_id() const;
inline void clear_id();
static const int kIdFieldNumber = 1;

inline ::google::protobuf::int32 id() const;
inline void set_id(::google::protobuf::int32 value);

The method names are self-explanatory.

For repeated fields, such as students, the interface is more complex and resembles an STL container:

inline int students_size() const;
inline void clear_students();

inline const ::School::HighSchool::Person &students(int index) const;
inline ::School::HighSchool::Person *mutable_students(int index);
inline ::School::HighSchool::Person *add_students();
inline const ::google::protobuf::RepeatedPtrField< ::School::HighSchool::Person > &students() const;
inline ::google::protobuf::RepeatedPtrField< ::School::HighSchool::Person > *mutable_students();

It looks rather complicated, but it is actually easy to follow if you are familiar with containers and iterators in C++.

Basic data types supported

The basic data types commonly used in Protobuf and the necessary explanations are as follows:

  • double

  • float

  • int32, int64: Negative numbers are encoded less efficiently than with the sintXX series. Use these when negative values are possible but infrequent.

  • uint32, uint64

  • sint32, sint64: When negative values are relatively frequent, these are more efficient than int32 / int64.

  • fixed32, fixed64: Note that "fixed" does not mean "fixed point"; these are fixed-length integers (always 4 bytes for fixed32 and 8 bytes for fixed64). If values are often greater than 2^28, fixed32 is more efficient than int32 / uint32.

  • sfixed32, sfixed64

  • bool

  • string: ASCII / UTF-8 string

  • bytes: binary sequence
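The size differences between the integer types can be checked with a small calculation. The sketch below is hand-written and does not use the Protobuf library; the function names are made up for illustration. It computes how many payload bytes a value occupies as a varint (the encoding behind int32, int64, uint32 and uint64) and applies the ZigZag mapping that the sintXX types use before varint encoding. The sizes exclude the field key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to store v as a base-128 varint (7 payload bits per byte).
inline std::size_t varint_size(std::uint64_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// int32 payload size: negative values are sign-extended to 64 bits first.
inline std::size_t int32_size(std::int32_t v) {
    return varint_size(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}

// ZigZag mapping used by sint32: 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
inline std::uint32_t zigzag32(std::int32_t v) {
    return (static_cast<std::uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

// sint32 payload size: ZigZag first, then varint.
inline std::size_t sint32_size(std::int32_t v) {
    return varint_size(zigzag32(v));
}

// fixed32 and sfixed32 always occupy exactly 4 bytes.
inline std::size_t fixed32_size() { return 4; }

// Spot checks of the notes in the list above (payload bytes only).
inline void check_examples() {
    assert(int32_size(-1) == 10);        // a negative int32 always takes 10 bytes
    assert(sint32_size(-1) == 1);        // the same value as sint32 takes 1 byte
    assert(varint_size(1u << 28) == 5);  // 2^28 as a varint: 5 bytes, fixed32 needs 4
}
```

This is why sint32 pays off when negative values are common, and why fixed32 becomes the smaller choice once values regularly reach 2^28.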

Posted by ud2008 on Tue, 01 Jan 2019 10:03:08 -0800