Google protocol buffers

Protocol Buffers (protobuf) is a language-neutral format for serializing and parsing structured data efficiently.

Basics

// addressbook.proto
syntax = "proto2";  // required and [default = ...] exist only in proto2

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
  • message

    Like a class: it contains fields and may nest other message and enum types.

  • field modifiers:

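    • required: must be provided, otherwise the message is considered uninitialized. (proto2 only)
    • optional: may be omitted. An unset field reads as its declared [default = ...] value, or, if none is declared, as the type's default (zero, empty string, false, or the first enum value).
    • repeated: the field may appear zero or more times, like a dynamic array. Order is preserved.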
  • compile

    protoc -I=$src_dir --python_out=$src_dir $src_dir/addressbook.proto

    -I is short for --proto_path

    protoc --python_out=./ *.proto

    This generates *_pb2.py files in the python_out directory.

  • API

    import addressbook_pb2
    person = addressbook_pb2.Person()
    person.name = "John Doe"  # required
    person.id = 1234  # required
    person.email = "jdoe@example.com"  # optional
    
    phone = person.phones.add() # add a repeated field
    phone.number = "555-4321"
    phone.type = addressbook_pb2.Person.HOME # or simply `phone.type = 1`
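Calling `person.SerializeToString()` on a message like the one above returns its binary wire-format bytes, and `ParseFromString()` reads them back. To show why the format is compact, here is a minimal sketch that hand-encodes only the `name` and `id` fields with the standard library. The helper names (`encode_varint`, `encode_tag`, `encode_person`) are made up for illustration and are not part of the protobuf API; real code should use the generated class.

```python
WIRE_VARINT = 0  # int32, int64, bool, enum, ...
WIRE_LEN = 2     # string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number shifted left by 3, OR-ed with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_person(name: str, person_id: int) -> bytes:
    """Hand-encode Person.name (field 1) and Person.id (field 2)."""
    raw_name = name.encode("utf-8")
    return (
        encode_tag(1, WIRE_LEN) + encode_varint(len(raw_name)) + raw_name
        + encode_tag(2, WIRE_VARINT) + encode_varint(person_id)
    )


payload = encode_person("John Doe", 1234)
print(payload.hex())  # 0a084a6f686e20446f6510d209
print(len(payload))   # 13
```

The two fields take 13 bytes in total: one tag byte each, one length byte, eight bytes of text, and two bytes for the number 1234. Field names never appear on the wire, only field numbers.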
    

Proto3

Message

A message can be viewed as a class in Python.

Field
  • field type

    • bool, string, double, float, int32, int64, uint32, uint64
    • sint32, sint64 (signed integers with ZigZag encoding; negative values encode far more compactly than with int32 and int64)
    • enumeration
    • other self-defined message type
  • field number

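    Each field has a unique number within its message.

    Prefer numbers 1 to 15 for frequently used fields: their tag (field number plus wire type) fits in one byte, while 16 to 2047 take two bytes. 0 is not a valid field number.

    The valid range is \([1, 2^{29}-1]\), excluding \([19000, 19999]\), which is reserved for the protobuf implementation.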

  • field modifier

    • singular (by default)

      proto3 removed required, so every singular field is effectively optional.

    • repeated
  • enum types

    The first enum value must be 0; it is also the default value.

    Values must fit in a 32-bit integer. Negative values are allowed but encode inefficiently.

    enum Corpus {
        UNIVERSAL = 0; // which is the default value too.
        WEB = 1;
    }
    
    enum EnumAllowingAlias {
      option allow_alias = true; // allow several names to share one number.
      UNKNOWN = 0;
      STARTED = 1;
      RUNNING = 1;
    }
    enum EnumNotAllowingAlias {
      UNKNOWN = 0;
      STARTED = 1;
      // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
    }
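The notes above on field numbers and on sint32/sint64 both come down to varint sizes. The following sketch (standard library only; all function names are illustrative and not part of any protobuf API) computes how many bytes a tag or an integer value occupies on the wire.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    # The low 3 bits of a tag hold the wire type, the rest the field number.
    return varint_size((field_number << 3) | wire_type)


def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ..."""
    return (n << 1) ^ (n >> 63)


def int32_size(n: int) -> int:
    # Negative int32/int64 values are sign-extended to 64 bits before encoding.
    return varint_size(n & 0xFFFFFFFFFFFFFFFF)


def sint32_size(n: int) -> int:
    return varint_size(zigzag(n))


print(tag_size(15), tag_size(16), tag_size(2047), tag_size(2048))  # 1 2 2 3
print(int32_size(-1), sint32_size(-1))  # 10 1
```

A field declared as int32 spends ten bytes on -1, while the same value in a sint32 field takes one byte, which is why sint32 is the better choice when negative values are common.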
    
comments

Same as C/C++: // for line comments and /* ... */ for block comments.

reserved field
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

When you delete a field, reserve its number and name so that a later edit cannot reuse them. Reusing a field number makes previously serialized data decode into the wrong field.
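As a rough illustration of the rule, the sketch below checks whether a candidate field number may still be used, given a reserved list written in the same style as the `reserved` statement above. It is not how protoc works internally; `parse_reserved` and `is_usable` are hypothetical helpers, and the parser handles only plain numbers and `a to b` ranges (not the `max` keyword or reserved names).

```python
from typing import Set

MAX_FIELD_NUMBER = 2**29 - 1
IMPLEMENTATION_RESERVED = range(19000, 20000)  # 19000 to 19999 inclusive


def parse_reserved(spec: str) -> Set[int]:
    """Expand a spec such as '2, 15, 9 to 11' into a set of numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if " to " in part:
            low, high = (int(x) for x in part.split(" to "))
            numbers.update(range(low, high + 1))  # 'to' is inclusive
        else:
            numbers.add(int(part))
    return numbers


def is_usable(field_number: int, reserved: Set[int]) -> bool:
    """True if the number is in range and not reserved in any way."""
    return (
        1 <= field_number <= MAX_FIELD_NUMBER
        and field_number not in IMPLEMENTATION_RESERVED
        and field_number not in reserved
    )


reserved = parse_reserved("2, 15, 9 to 11")
print(sorted(reserved))  # [2, 9, 10, 11, 15]
print([n for n in (0, 1, 2, 10, 12, 19500) if is_usable(n, reserved)])  # [1, 12]
```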

import
// old.proto
import public "new.proto";
import "other.proto";

When another file, say client.proto, imports old.proto, it can also use the definitions in new.proto, because the public import is passed on to importers. It cannot use the definitions in other.proto.
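The visibility rule can be modeled as a small graph walk: direct imports are always visible, and beyond them only `import public` edges are followed. The sketch below is a toy model of that rule, not protoc's actual resolver; the `IMPORTS` table and `visible_files` function are assumptions made for illustration.

```python
# file -> list of (imported file, is_public), mirroring the example above.
IMPORTS = {
    "client.proto": [("old.proto", False)],
    "old.proto": [("new.proto", True), ("other.proto", False)],
    "new.proto": [],
    "other.proto": [],
}


def visible_files(importer: str) -> set:
    """Files whose definitions `importer` can use."""
    visible = set()
    # Start from the direct imports, then follow only public imports.
    stack = [name for name, _ in IMPORTS[importer]]
    while stack:
        name = stack.pop()
        if name in visible:
            continue
        visible.add(name)
        stack.extend(dep for dep, is_public in IMPORTS[name] if is_public)
    return visible


print(sorted(visible_files("client.proto")))  # ['new.proto', 'old.proto']
print(sorted(visible_files("old.proto")))     # ['new.proto', 'other.proto']
```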

package

The package declaration has no effect on the generated Python code, because Python modules are organized by their location in the file system.

oneof

A oneof groups several fields of which at most one can be set at a time. Setting one member automatically clears the others.

message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}
// C++ usage of the generated class
SampleMessage message;
message.set_name("name");
CHECK(message.has_name());
message.mutable_sub_message();   // Will clear name field.
CHECK(!message.has_name());
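In generated Python code the same check is done with `message.WhichOneof("test_oneof")` and `message.HasField("name")`. Since that needs the compiled module, the sketch below instead emulates the clearing behavior in plain Python. `OneofSketch` is a hypothetical stand-in written for illustration and is not a protobuf class.

```python
class OneofSketch:
    """Plain-Python stand-in for a message with a single oneof group."""

    _MEMBERS = ("name", "sub_message")

    def __init__(self):
        self._which = None  # name of the member currently set, if any
        self._value = None

    def set(self, member, value):
        if member not in self._MEMBERS:
            raise AttributeError(member)
        # One shared slot: storing a new member discards the previous one.
        self._which, self._value = member, value

    def has(self, member):
        return self._which == member

    def which(self):
        return self._which


msg = OneofSketch()
msg.set("name", "name")
print(msg.has("name"))  # True
msg.set("sub_message", {})
print(msg.has("name"))  # False
print(msg.which())      # sub_message
```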