The Fastest Way of Serializing Java Fields

Learn how to apply C++'s trivially copyable scheme in Java and get very fast serialization by using Unsafe and memcpy to copy the fields directly, in a single sweep, to memory or to a memory-mapped file.

A previous article on the open source Chronicle Queue included benchmarking and method profiling which indicated that the speed of serialization had a significant impact on execution performance. This is only to be expected, because Chronicle Queue (and other persisted queue libraries) must convert Java objects on the heap into binary data that is subsequently stored in files. Even for the most internally efficient libraries, this unavoidable serialization step will largely dictate performance.

Data Transfer Object

In this article, we will use a Data Transfer Object (hereafter DTO) named MarketData, which contains financial information with a relatively large number of fields. The same principles apply to other DTOs in any other business area.

abstract class MarketData extends SelfDescribingMarshallable {

    long securityId;
    long time;

    // bid and ask quantities
    double bidQty0, bidQty1, bidQty2, bidQty3;
    double askQty0, askQty1, askQty2, askQty3;

    // bid and ask prices
    double bidPrice0, bidPrice1, bidPrice2, bidPrice3;
    double askPrice0, askPrice1, askPrice2, askPrice3;

    // Getters and setters not shown for clarity

}

Default Serialization

The Java Serializable marker interface provides a default way to serialize Java objects to/from a binary format, usually via the ObjectOutputStream and ObjectInputStream classes. The default way (where the magic writeObject() and readObject() methods are not explicitly declared) entails reflecting over an object's non-transient fields and reading/writing them one by one, which can be a relatively costly operation.
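As a point of reference, here is a minimal sketch of that default JDK mechanism, using only the standard library. The Quote class and the helper method names are hypothetical and exist only for this illustration; they are not part of Chronicle Queue or of the MarketData example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class DefaultJavaSerializationDemo {

    // Hypothetical, cut-down DTO used only for this illustration
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        long securityId;
        double bidPrice;
        transient long cachedHash; // transient fields are skipped by default serialization
    }

    // Default serialization: no writeObject()/readObject() declared on Quote
    static byte[] serialize(Quote quote) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(quote);
        }
        return baos.toByteArray();
    }

    static Quote deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Quote) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Quote quote = new Quote();
        quote.securityId = 42L;
        quote.bidPrice = 101.25;
        quote.cachedHash = 7L;

        byte[] bytes = serialize(quote);
        Quote copy = deserialize(bytes);

        // The stream also carries class metadata, so it is larger than the raw field data
        System.out.println("serialized size: " + bytes.length + " bytes");
        System.out.println(copy.securityId + " " + copy.bidPrice + " " + copy.cachedHash);
    }
}
```

The transient field comes back as zero after the round trip, which shows that only non-transient fields take part.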

Chronicle Queue can work with Serializable objects but also provides a similar, yet faster and more space-efficient, way to serialize data via the abstract class SelfDescribingMarshallable. Like Serializable objects, this relies on reflection but comes with substantially less overhead in terms of payload, CPU cycles, and garbage.

Default serialization often comprises the following steps:

  • Identifying the non-transient fields using reflection
  • Reading/writing the identified non-transient field values using reflection
  • Writing/reading the field values to a target format (e.g. binary format)

The identification of non-transient fields can be cached, eliminating this step to improve performance.
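The steps above can be sketched with plain JDK reflection. This is only an illustration of the general idea, not how SelfDescribingMarshallable or ObjectOutputStream are implemented; the class names, the FIELD_CACHE map and the restriction to long and double fields are all assumptions made to keep the example short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class ReflectiveSerializerSketch {

    // Hypothetical DTO used only for this illustration
    static final class Quote {
        long securityId;
        long time;
        double bidPrice;
        transient long scratch; // skipped because it is transient
    }

    // Step 1 is cached per class, so the reflective lookup happens only once
    private static final Map<Class<?>, Field[]> FIELD_CACHE = new ConcurrentHashMap<>();

    static Field[] fieldsOf(Class<?> type) {
        return FIELD_CACHE.computeIfAbsent(type, t -> {
            List<Field> fields = new ArrayList<>();
            for (Field f : t.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods) && !f.isSynthetic()) {
                    f.setAccessible(true);
                    fields.add(f);
                }
            }
            return fields.toArray(new Field[0]);
        });
    }

    // Steps 2 and 3 (write side): read each value reflectively and emit it in binary form
    static void write(Object source, ByteBuffer out) throws IllegalAccessException {
        for (Field f : fieldsOf(source.getClass())) {
            if (f.getType() == long.class) {
                out.putLong(f.getLong(source));
            } else if (f.getType() == double.class) {
                out.putDouble(f.getDouble(source));
            } else {
                throw new UnsupportedOperationException("sketch handles long and double only: " + f);
            }
        }
    }

    // Steps 2 and 3 (read side): same cached field order, so the values line up
    static void read(Object target, ByteBuffer in) throws IllegalAccessException {
        for (Field f : fieldsOf(target.getClass())) {
            if (f.getType() == long.class) {
                f.setLong(target, in.getLong());
            } else if (f.getType() == double.class) {
                f.setDouble(target, in.getDouble());
            } else {
                throw new UnsupportedOperationException("sketch handles long and double only: " + f);
            }
        }
    }

    public static void main(String[] args) throws IllegalAccessException {
        Quote original = new Quote();
        original.securityId = 42L;
        original.time = 1_000L;
        original.bidPrice = 101.25;

        ByteBuffer buffer = ByteBuffer.allocate(64);
        write(original, buffer);
        buffer.flip();

        Quote copy = new Quote();
        read(copy, buffer);
        System.out.println(copy.securityId + " " + copy.time + " " + copy.bidPrice);
    }
}
```

Even with the field list cached, every value still passes through a reflective getter or setter, which is the per-field cost the following sections set out to remove.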

Here is an example of a class that uses default serialization:

public final class DefaultMarketData extends MarketData {}

As can be seen, the class does not add anything to its base class, so it will use default serialization as transitively provided by SelfDescribingMarshallable.

Explicit Serialization

Classes implementing Serializable can elect to implement two magic private (sic!) methods, writeObject() and readObject(), which are then invoked instead of resorting to default serialization.

This provides full control of the serialization process and allows fields to be read using custom code rather than via reflection, which will improve performance. A drawback of this method is that if a field is added to the class, the corresponding logic must be added to the two magic methods above, or else the new field will not take part in serialization. Another problem is that private methods are invoked by external classes. This is a fundamental violation of encapsulation.
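The following sketch shows what such magic methods can look like with the plain JDK. The Quote class is hypothetical; the fields are declared transient so that the default mechanism skips them and the hand-written code alone decides what is written, which is one common way of arranging this and not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class MagicMethodsDemo {

    // Hypothetical DTO used only for this illustration
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: excluded from default serialization, handled explicitly below
        transient long securityId;
        transient double bidPrice;

        // Magic method: private, yet located and invoked by ObjectOutputStream
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // nothing to write by default, but expected by the spec
            out.writeLong(securityId);
            out.writeDouble(bidPrice);
        }

        // Magic method: must read the fields in exactly the order they were written
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            securityId = in.readLong();
            bidPrice = in.readDouble();
        }
    }

    static Quote roundTrip(Quote quote) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(quote);
        }
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            return (Quote) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Quote quote = new Quote();
        quote.securityId = 42L;
        quote.bidPrice = 101.25;
        Quote copy = roundTrip(quote);
        System.out.println(copy.securityId + " " + copy.bidPrice);
    }
}
```

Adding a third field to Quote without touching both methods would silently drop it from the stream, which is the maintenance drawback described above.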

Classes extending SelfDescribingMarshallable work in a similar fashion but thankfully do not rely on magic methods or on private methods being invoked externally. A SelfDescribingMarshallable class provides two fundamentally different concepts of serialization: one via the intermediary open source Chronicle Wire (which can be binary, text, YAML, JSON, etc.) that provides flexibility, and one that is implicitly binary and provides high performance. We will take a closer look at the latter in the sections below.

Here is an example of a class that uses explicit serialization, whereby public methods declared in the implemented interfaces are explicitly overridden:

public final class ExplicitMarketData extends MarketData {

    @Override
    public void readMarshallable(BytesIn bytes) {
        securityId = bytes.readLong();
        time = bytes.readLong();
        bidQty0 = bytes.readDouble();
        bidQty1 = bytes.readDouble();
        bidQty2 = bytes.readDouble();
        bidQty3 = bytes.readDouble();
        askQty0 = bytes.readDouble();
        askQty1 = bytes.readDouble();
        askQty2 = bytes.readDouble();
        askQty3 = bytes.readDouble();
        bidPrice0 = bytes.readDouble();
        bidPrice1 = bytes.readDouble();
        bidPrice2 = bytes.readDouble();
        bidPrice3 = bytes.readDouble();
        askPrice0 = bytes.readDouble();
        askPrice1 = bytes.readDouble();
        askPrice2 = bytes.readDouble();
        askPrice3 = bytes.readDouble();
    }

    @Override
    public void writeMarshallable(BytesOut bytes) {
        bytes.writeLong(securityId);
        bytes.writeLong(time);
        bytes.writeDouble(bidQty0);
        bytes.writeDouble(bidQty1);
        bytes.writeDouble(bidQty2);
        bytes.writeDouble(bidQty3);
        bytes.writeDouble(askQty0);
        bytes.writeDouble(askQty1);
        bytes.writeDouble(askQty2);
        bytes.writeDouble(askQty3);
        bytes.writeDouble(bidPrice0);
        bytes.writeDouble(bidPrice1);
        bytes.writeDouble(bidPrice2);
        bytes.writeDouble(bidPrice3);
        bytes.writeDouble(askPrice0);
        bytes.writeDouble(askPrice1);
        bytes.writeDouble(askPrice2);
        bytes.writeDouble(askPrice3);
    }

}

As can be seen, this scheme relies on reading or writing each field explicitly and directly, eliminating the need to resort to slower reflection. Care must be taken to ensure that fields are referenced in a consistent order, and any new class fields must also be added to the methods above.

Trivially Copyable Serialization

The concept of trivially copyable Java objects is derived from and inspired by C++.

As can be seen, the MarketData class above contains only primitive fields. In other words, there are no reference fields like String, List or the like. This means that when the JVM lays out the fields in memory, the field values can be placed adjacent to one another. The way fields are laid out is not specified by the Java standard, which leaves room for individual JVM implementation optimizations.

Many JVMs will sort primitive class fields in descending order of field size and lay them out in succession. This has the advantage that read and write operations can be performed on even primitive type boundaries. Applying this scheme to ExplicitMarketData, for example, results in the 64-bit fields, such as long time, being laid out first and, assuming the initial field space is 64-bit aligned, allows each of them to be accessed on an even 64-bit boundary. Any 32-bit field, such as an int, would be laid out after that, allowing it and all other 32-bit fields to be accessed on an even 32-bit boundary.

Imagine instead that a primitive byte field were laid out first: subsequent larger fields would then have to be accessed on uneven field boundaries. This would complicate some operations and would actually prevent a small set of operations from being performed at all (for example, unaligned CAS operations on the ARM architecture).
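To see how a particular JVM actually lays out fields, the offsets can be inspected with sun.misc.Unsafe. The sketch below is an assumption-laden illustration: the Sample class is hypothetical, Unsafe is an unsupported internal API obtained reflectively, and the printed offsets differ between JVMs, versions and flags, so no specific output is implied.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import sun.misc.Unsafe;

public final class FieldLayoutSketch {

    // Hypothetical class mixing field sizes, declared deliberately "out of order"
    static final class Sample {
        byte flag;
        long time;
        int count;
        double price;
    }

    // Unsupported API: obtained reflectively, may warn or change in future JDKs
    static Unsafe unsafe() throws ReflectiveOperationException {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Maps each instance field name to its byte offset within the object
    static Map<String, Long> offsets(Class<?> type) throws ReflectiveOperationException {
        Unsafe unsafe = unsafe();
        Map<String, Long> result = new LinkedHashMap<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                result.put(f.getName(), unsafe.objectFieldOffset(f));
            }
        }
        return result;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Declaration order is byte, long, int, double; the JVM is free to reorder
        offsets(Sample.class).forEach((name, offset) ->
                System.out.println(name + " @ " + offset));
    }
}
```

On JVMs that sort by size, the long and double fields would be expected to show the lowest offsets among these four, with the byte field last, regardless of declaration order.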

How is this relevant to high-performance serialization? Well, as it turns out, it is possible to access an object's field memory region directly via Unsafe and use memcpy to copy the fields directly, in a single sweep, to memory or to a memory-mapped file. This effectively bypasses individual field access and replaces, in the example above, the many individual field accesses with a single bulk operation.

How this can be done in a correct, convenient, reasonably portable and safe way is outside the scope of this article. Luckily, this feature is readily available out of the box in Chronicle Queue, the open source Chronicle Bytes and other similar products.

Here is an example of a class that uses trivially copyable serialization:
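The idea of treating the fields as one contiguous region can be sketched with the JDK alone. This is not Chronicle's implementation and it is not a real memcpy: as a stand-in, it sweeps the region in 8-byte words with Unsafe.getLong and Unsafe.putLong. It assumes a hypothetical Quote class whose fields are all 8 bytes wide and 8-byte aligned (as is typical on HotSpot), and it relies on the unsupported sun.misc.Unsafe API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public final class RegionCopySketch {

    // Hypothetical DTO: primitives only, every field 8 bytes wide
    static final class Quote {
        long securityId;
        long time;
        double bidPrice;
        double askPrice;
    }

    private static final Unsafe UNSAFE = loadUnsafe();
    private static final long START;  // offset of the first field in the object
    private static final int LENGTH;  // size in bytes of the whole field region

    static {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (Field f : Quote.class.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            long offset = UNSAFE.objectFieldOffset(f);
            min = Math.min(min, offset);
            max = Math.max(max, offset);
        }
        START = min;
        LENGTH = (int) (max - min) + Long.BYTES; // last field is 8 bytes wide
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Copies the raw field region out of the object, with no knowledge of field names
    static void write(Quote source, ByteBuffer out) {
        for (int i = 0; i < LENGTH; i += Long.BYTES) {
            out.putLong(UNSAFE.getLong(source, START + i));
        }
    }

    // Copies the raw bytes straight back into the field region of an existing object
    static void read(Quote target, ByteBuffer in) {
        for (int i = 0; i < LENGTH; i += Long.BYTES) {
            UNSAFE.putLong(target, START + i, in.getLong());
        }
    }

    public static void main(String[] args) {
        Quote original = new Quote();
        original.securityId = 42L;
        original.time = 1_000L;
        original.bidPrice = 101.25;
        original.askPrice = 101.5;

        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        write(original, buffer);
        buffer.flip();

        Quote copy = new Quote();
        read(copy, buffer);
        System.out.println(copy.securityId + " " + copy.time + " "
                + copy.bidPrice + " " + copy.askPrice);
    }
}
```

The binary form produced this way mirrors the in-memory layout, so it is only meaningful to a reader that uses the same class on a JVM with the same layout; the copy code itself never needs to change when a primitive field is added.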

import static net.openhft.chronicle.bytes.BytesUtil.*;

public final class TriviallyCopyableMarketData extends MarketData {

    static final int START =
            triviallyCopyableStart(TriviallyCopyableMarketData.class);

    static final int LENGTH =
            triviallyCopyableLength(TriviallyCopyableMarketData.class);

    @Override
    public void readMarshallable(BytesIn bytes) {
        bytes.unsafeReadObject(this, START, LENGTH);
    }

    @Override
    public void writeMarshallable(BytesOut bytes) {
        bytes.unsafeWriteObject(this, START, LENGTH);
    }

}

This pattern lends itself well to scenarios where the DTO is reused. Fundamentally, it relies on invoking Unsafe under the covers for improved performance.

Benchmarks

Using JMH, serialization performance was assessed for the various serialization alternatives above using this class:

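import net.openhft.chronicle.bytes.Bytes;
import org.openjdk.jmh.annotations.*;

import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.NANOSECONDS;

@State(Scope.Benchmark)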
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(NANOSECONDS)
@Fork(value = 1, warmups = 1)
@Warmup(iterations = 5, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 5, time = 500, timeUnit = MILLISECONDS)
public class BenchmarkRunner {

    private final MarketData defaultMarketData = new DefaultMarketData();
    private final MarketData explicitMarketData = new ExplicitMarketData();
    private final MarketData triviallyCopyableMarketData = new TriviallyCopyableMarketData();
    private final Bytes<Void> toBytes = Bytes.allocateElasticDirect();
    private final Bytes<Void> fromBytesDefault = Bytes.allocateElasticDirect();
    private final Bytes<Void> fromBytesExplicit = Bytes.allocateElasticDirect();
    private final Bytes<Void> fromBytesTriviallyCopyable = Bytes.allocateElasticDirect();

    public BenchmarkRunner() {
        defaultMarketData.writeMarshallable(fromBytesDefault);
        explicitMarketData.writeMarshallable(fromBytesExplicit);
        triviallyCopyableMarketData.writeMarshallable(fromBytesTriviallyCopyable);
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }

    @Benchmark
    public void defaultWrite() {
        toBytes.writePosition(0);
        defaultMarketData.writeMarshallable(toBytes);
    }

    @Benchmark
    public void defaultRead() {
        fromBytesDefault.readPosition(0);
        defaultMarketData.readMarshallable(fromBytesDefault);
    }

    @Benchmark
    public void explicitWrite() {
        toBytes.writePosition(0);
        explicitMarketData.writeMarshallable(toBytes);
    }

    @Benchmark
    public void explicitRead() {
        fromBytesExplicit.readPosition(0);
        explicitMarketData.readMarshallable(fromBytesExplicit);
    }

    @Benchmark
    public void trivialWrite() {
        toBytes.writePosition(0);
        triviallyCopyableMarketData.writeMarshallable(toBytes);
    }

    @Benchmark
    public void trivialRead() {
        fromBytesTriviallyCopyable.readPosition(0);
        triviallyCopyableMarketData.readMarshallable(fromBytesTriviallyCopyable);
    }

}

It produced the following output on a MacBook Pro (16-inch, 2019) with an Intel Core i9 8-Core CPU at 2.3GHz under JDK 1.8.0_312, OpenJDK 64-Bit Server VM, 25.312-b07:

Benchmark                      Mode  Cnt   Score   Error  Units
BenchmarkRunner.defaultRead    avgt    5  88.772 ± 1.766  ns/op
BenchmarkRunner.defaultWrite   avgt    5  90.679 ± 2.923  ns/op
BenchmarkRunner.explicitRead   avgt    5  32.419 ± 2.673  ns/op
BenchmarkRunner.explicitWrite  avgt    5  38.048 ± 0.778  ns/op
BenchmarkRunner.trivialRead    avgt    5   7.437 ± 0.339  ns/op
BenchmarkRunner.trivialWrite   avgt    5   7.911 ± 0.431  ns/op

Using the various MarketData variants, explicit serialization is more than twice as fast as default serialization. Trivially copyable serialization is more than four times faster than explicit serialization and more than ten times faster than default serialization, as illustrated in the graph below (lower is better):

[Graph: latency in ns/op for trivially copyable, explicit and default serialization]
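The ratios quoted above follow directly from the JMH scores in the table. The short calculation below reproduces them from the read and write figures; the class and method names are made up for this illustration.

```java
public final class SpeedupArithmetic {

    // Scores in ns/op, copied from the JMH output above
    static final double DEFAULT_READ = 88.772;
    static final double DEFAULT_WRITE = 90.679;
    static final double EXPLICIT_READ = 32.419;
    static final double EXPLICIT_WRITE = 38.048;
    static final double TRIVIAL_READ = 7.437;
    static final double TRIVIAL_WRITE = 7.911;

    // How many times faster the second variant is than the first
    static double speedup(double slowerNs, double fasterNs) {
        return slowerNs / fasterNs;
    }

    public static void main(String[] args) {
        System.out.printf("explicit vs default: read %.1fx, write %.1fx%n",
                speedup(DEFAULT_READ, EXPLICIT_READ), speedup(DEFAULT_WRITE, EXPLICIT_WRITE));
        System.out.printf("trivial vs explicit: read %.1fx, write %.1fx%n",
                speedup(EXPLICIT_READ, TRIVIAL_READ), speedup(EXPLICIT_WRITE, TRIVIAL_WRITE));
        System.out.printf("trivial vs default:  read %.1fx, write %.1fx%n",
                speedup(DEFAULT_READ, TRIVIAL_READ), speedup(DEFAULT_WRITE, TRIVIAL_WRITE));
    }
}
```

For reads this works out to roughly 2.7, 4.4 and 11.9 respectively, with the write figures in the same range.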

More fields generally favor trivially copyable serialization over explicit serialization. Experience shows that break-even is reached at around six fields in many cases.

Interestingly, the concept of trivially copyable can be extended to hold data normally stored in reference fields, such as a String or an array field. This will provide an even greater relative performance increase for such classes. Contact the Chronicle team if you want to learn more, as this is, again, outside the scope of this article.

Why Does It Matter?

Serialization is a fundamental part of externalizing DTOs to persistent queues, sending them over the wire, putting them in an off-heap Map, and otherwise handling DTOs outside the Java heap. Such data-intensive applications will almost always gain performance and experience reduced latency when the underlying serialization performance is improved.

Resources

Chronicle Queue (open source)

GitHub Chronicle Bytes (open source)
