# Serialization examples

## Protobuf
In this section, we will explore an example using Protobuf. However, this approach is also applicable to other serialization methods.
Protobuf is an alternative message serialization format commonly used in gRPC. Its main advantage is that it produces much smaller messages[^1] than JSON, but it requires a message schema (`.proto` files) on both the client and server sides.
To begin, install the necessary dependencies:
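```bash
# grpcio-tools ships the protoc compiler plus the protobuf runtime
pip install grpcio-tools
```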
Next, let's define the schema for our message:
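```proto
// message.proto -- a minimal illustrative schema; the message and field
// names here are assumptions matching the examples below
syntax = "proto3";

message Person {
    string name = 1;
    int32 age = 2;
}
```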
Now, generate a Python class to work with messages in Protobuf format:
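```bash
# --pyi_out additionally emits type stubs (requires a recent grpcio-tools)
python -m grpc_tools.protoc --python_out=. --pyi_out=. --proto_path=. message.proto
```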
This generates two files: `message_pb2.py` and `message_pb2.pyi`. We can use the generated class to serialize our messages:
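```python
# A minimal sketch, assuming a RabbitBroker and the Person message above;
# any other FastStream broker works the same way, and the "test" queue
# name is illustrative.
from faststream import FastStream, Logger, NoCast
from faststream.rabbit import RabbitBroker, RabbitMessage

from message_pb2 import Person  # generated by protoc

broker = RabbitBroker()
app = FastStream(broker)


async def decode_message(msg: RabbitMessage) -> Person:
    # Parse the raw message body with the generated Protobuf class
    decoded = Person()
    decoded.ParseFromString(msg.body)
    return decoded


@broker.subscriber("test", decoder=decode_message)
async def consume(body: NoCast[Person], logger: Logger):
    logger.info(body)
```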
Note that we used the `NoCast` annotation to exclude the message from the `pydantic` representation of our handler.
## Msgpack
Msgpack is another binary data format. It produces smaller messages[^2] than JSON, though slightly larger than Protobuf, and its key advantage is that it doesn't require a message schema, making it easy to use in most cases.
To get started, install the necessary dependencies:
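```bash
pip install msgpack
```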
Since there is no need for a schema, you can easily write a Msgpack decoder:
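```python
# A sketch, again assuming a RabbitBroker; the "test" queue name is illustrative.
import msgpack

from faststream import FastStream, Logger
from faststream.rabbit import RabbitBroker, RabbitMessage

broker = RabbitBroker()
app = FastStream(broker)


async def decode_message(msg: RabbitMessage):
    # No schema required: just unpack the raw bytes
    return msgpack.loads(msg.body)


@broker.subscriber("test", decoder=decode_message)
async def consume(body: dict, logger: Logger):
    logger.info(body)


@app.after_startup
async def publish():
    # Encode outgoing messages symmetrically with msgpack.dumps
    await broker.publish(msgpack.dumps({"name": "John", "age": 25}), "test")
```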
Using Msgpack is much simpler than maintaining Protobuf schemas, so if you don't have strict message size limitations, Msgpack serialization is a good default choice.
## Avro
In this section, let's explore how to encode and decode our messages with Avro as part of a FastStream app.
Apache Avro uses JSON to define data types and protocols and serializes data in a compact binary format. Avro utilizes a schema to structure the data that is being encoded. Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).
To get started, install the necessary dependencies:
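```bash
pip install fastavro
```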
Next, let's define the schema for our message. You can either define it in the Python file itself as:
```python
person_schema = {
    "type": "record",
    "namespace": "Person",
    "name": "Person",
    "fields": [
        {"doc": "Name", "type": "string", "name": "name"},
        {"doc": "Age", "type": "int", "name": "age"},
    ],
}
```
Or you can load the schema from an `.avsc` file, for example with fastavro's `load_schema` helper:
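```python
from fastavro.schema import load_schema

person_schema = load_schema("person.avsc")
```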
The contents of the `person.avsc` file are:
```json
{
    "type": "record",
    "namespace": "Person",
    "name": "Person",
    "fields": [
        {"doc": "Name", "type": "string", "name": "name"},
        {"doc": "Age", "type": "int", "name": "age"}
    ]
}
```
Finally, let's use Avro's `schemaless_reader` and `schemaless_writer` to decode and encode messages in the FastStream app.
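A sketch putting it together, assuming a `RabbitBroker` and the `person_schema` defined above (the `"test"` queue name is illustrative):

```python
import io

from fastavro import schemaless_reader, schemaless_writer
from faststream import FastStream, Logger
from faststream.rabbit import RabbitBroker, RabbitMessage

broker = RabbitBroker()
app = FastStream(broker)

person_schema = {
    "type": "record",
    "namespace": "Person",
    "name": "Person",
    "fields": [
        {"doc": "Name", "type": "string", "name": "name"},
        {"doc": "Age", "type": "int", "name": "age"},
    ],
}


async def decode_message(msg: RabbitMessage):
    # schemaless_reader needs the same schema the writer used
    return schemaless_reader(io.BytesIO(msg.body), person_schema)


@broker.subscriber("test", decoder=decode_message)
async def consume(body: dict, logger: Logger):
    logger.info(body)


@app.after_startup
async def publish():
    buf = io.BytesIO()
    schemaless_writer(buf, person_schema, {"name": "John", "age": 25})
    await broker.publish(buf.getvalue(), "test")
```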
## Tips

### Data Compression
If you are dealing with very large messages, consider compressing them as well. You can explore libraries such as lz4 or zstd for compression algorithms.
Compression can significantly reduce message size, especially if there are repeated blocks. However, in the case of small message bodies, data compression may increase the message size. Therefore, you should assess the compression impact based on your specific application requirements.
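As a rough sketch with the `zstandard` package (the `lz4` package exposes a similar `compress`/`decompress` API):

```python
import zstandard

compressor = zstandard.ZstdCompressor()
decompressor = zstandard.ZstdDecompressor()

# Repetitive payloads compress well; tiny ones may actually grow
payload = b'{"name": "John", "age": 25}' * 100
compressed = compressor.compress(payload)
assert decompressor.decompress(compressed) == payload
```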
### Broker-Level Serialization
You can still set a custom `decoder` at the Broker or Router level. However, if you want outgoing messages to be encoded automatically as well, you should look into Middleware for implementing serialization.
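For instance, reusing the Msgpack decoder from above so that every subscriber inherits it (a sketch, assuming the same `RabbitBroker`):

```python
import msgpack

from faststream.rabbit import RabbitBroker, RabbitMessage


async def decode_message(msg: RabbitMessage):
    return msgpack.loads(msg.body)


# Every subscriber registered on this broker now uses the decoder by default
broker = RabbitBroker(decoder=decode_message)
```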
[^1]: For example, a message like `{ "name": "John", "age": 25 }` takes 27 bytes in JSON but only 11 bytes in Protobuf. With lists and more complex structures, the savings can be even more significant (up to 20x).

[^2]: A message with Msgpack serialization, such as `{ "name": "John", "age": 25 }`, takes 16 bytes.