SeqProto: fast binary serialization in JavaScript

Tommaso Allevi

Algorithms | 4.5 min read | Dec 15, 2023

Data sits at the core of almost every application, so handling it efficiently matters. Serialization and deserialization can become a bottleneck, particularly in JavaScript. JSON is friendly to humans, but it is a generic format that carries no knowledge of the data's structure, and that costs performance.

SeqProto is a library designed to address these problems. It offers a fast and efficient way to serialize and deserialize JavaScript objects.

The reason behind SeqProto

At Orama, we process large amounts of data in a highly constrained environment: Cloudflare Workers. For example, a worker can use only 128 MB of memory in total. This might seem substantial, but once strings and objects are taken into account, it is actually quite limited.

Distributed systems are generally slower than monoliths because every network interaction pays for serialization, deserialization, and transport time (latency). Since the network stack is unavoidable, we focused on reducing the time spent on serialization/deserialization and the payload length. Minimizing both increases the number of requests per second a service can handle.

In our context, we looked for serialization formats that are both faster and produce smaller payloads. Initially, we considered Avro, Protobuf, CBOR, and MsgPack. Compared with JSON, the results were significant: Avro and Protobuf notably outperformed it. This can be attributed to the fact that Avro and Protobuf know the structure of the input in advance and use that knowledge to improve performance.
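To illustrate why knowing the structure in advance helps, here is a minimal sketch that compares the size of a JSON payload with a hand-rolled binary layout for the same record. The `encodeTodo` function and its byte layout are hypothetical and written only for this comparison; they are not how Avro, Protobuf, or SeqProto encode data.

```typescript
interface Todo {
  id: number
  userId: number
  text: string
  completed: boolean
}

// Hypothetical encoder: writer and reader agree on the field order,
// so no property names are written to the payload.
function encodeTodo(todo: Todo): Uint8Array {
  const textBytes = new TextEncoder().encode(todo.text)
  // 2 x uint32 (id, userId) + 1 byte (completed) + uint32 (text length) + text bytes
  const out = new Uint8Array(4 + 4 + 1 + 4 + textBytes.length)
  const view = new DataView(out.buffer)
  view.setUint32(0, todo.id, true)
  view.setUint32(4, todo.userId, true)
  view.setUint8(8, todo.completed ? 1 : 0)
  view.setUint32(9, textBytes.length, true)
  out.set(textBytes, 13)
  return out
}

const todo: Todo = { id: 44, userId: 33, text: 'foo bar', completed: false }

// JSON repeats every property name and encodes numbers as text
const jsonSize = new TextEncoder().encode(JSON.stringify(todo)).length
const binarySize = encodeTodo(todo).length

console.log({ jsonSize, binarySize }) // { jsonSize: 56, binarySize: 20 }
```

Most of the saving comes from not repeating the property names in every record, which adds up quickly for large arrays.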

With our performance-oriented goals in mind, we wondered if there was a way to outperform them. The answer was YES, and this led to the creation of SeqProto.

Orama implemented SeqProto to function effectively in constrained environments with limited memory and critical time constraints. It's particularly useful for internal APIs crucial in distributed systems, especially when handling large data volumes. Orama uses SeqProto for APIs that could return over 10,000 elements.

Talk is cheap, show me the code

As a concrete example, consider a Todo structure like the following one:

interface Todo {
  id: number
  userId: number
  text: string
  completed: boolean
}

We want to serialize an array of Todo, so the structure of the data is known in advance. Consider the following todos:

const todo: Todo = {
  id: 44,
  userId: 33,
  text: "foo bar",
  completed: false
}

const todos: Todo[] = [todo]

We can serialize it in the following way:

[Figure: SeqProto serialization]

In the figure, the first row labels what each group of 4 bytes contains. The second row shows the array buffer read 4 bytes at a time, as unsigned integers. The third row shows the same data as raw bytes.

So, considering the second row:

  • 1: the array length. We are serializing just one element in the array

  • 44: the id

  • 33: the userId

  • 0: the boolean completed is converted to integer (1 means true, 0 means false)

  • 7: the length of the text

  • 544173926: is 102 × 256^0 + 111 × 256^1 + 111 × 256^2 + 32 × 256^3, where 102 is the UTF-8 representation of "f", 111 is "o", and 32 is the space " ".

  • 7496034: is 98 × 256^0 + 97 × 256^1 + 114 × 256^2 + 0 × 256^3, where 98 is "b", 97 is "a", and 114 is "r".
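The numbers in the list above can be reproduced with a plain `ArrayBuffer` and a `DataView`. This is only a sketch of the layout as described here, assuming little-endian 4-byte words and one zero byte of padding after the text; it does not use SeqProto itself.

```typescript
// 7 words of 4 bytes each, as listed above (assumed layout)
const buffer = new ArrayBuffer(28)
const view = new DataView(buffer)
const littleEndian = true

view.setUint32(0, 1, littleEndian)  // array length
view.setUint32(4, 44, littleEndian) // id
view.setUint32(8, 33, littleEndian) // userId
view.setUint32(12, 0, littleEndian) // completed (false)
view.setUint32(16, 7, littleEndian) // text length in bytes

// "foo bar" as UTF-8; the 8th byte stays 0 and acts as padding
const textBytes = new TextEncoder().encode('foo bar')
new Uint8Array(buffer, 20).set(textBytes)

// Read the buffer back 4 bytes at a time
const words: number[] = []
for (let offset = 0; offset < buffer.byteLength; offset += 4) {
  words.push(view.getUint32(offset, littleEndian))
}
console.log(words) // [1, 44, 33, 0, 7, 544173926, 7496034]

// The same word computed by hand from the bytes of "f", "o", "o", " "
const fooWord = 102 * 256 ** 0 + 111 * 256 ** 1 + 111 * 256 ** 2 + 32 * 256 ** 3
console.log(fooWord === words[5]) // true
```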

In code:

import type { Ser, Des } from 'seqproto'
import { createSer, createDes } from 'seqproto'

interface Todo {
  id: number
  userId: number
  text: string
  completed: boolean
}

// Serialization
const ser: Ser = createSer()
export function serializeTodos(todos: Todo[]) {
  ser.reset()
  ser.serializeArray(todos, (ser, todo) => {
    ser.serializeUInt32(todo.id)
    ser.serializeUInt32(todo.userId)
    // Field order must match the deserializer below: completed, then text
    ser.serializeBoolean(todo.completed)
    ser.serializeString(todo.text)
  })
  const arrBuffer = ser.getBuffer()
  return new Uint8Array(arrBuffer)
}

// Deserialization
const des: Des = createDes(new ArrayBuffer(0))
export function deserialize(arrBuffer: ArrayBuffer) {
  // Deserialize
  des.setBuffer(arrBuffer)
  return des.deserializeArray((des) => {
    const id = des.deserializeUInt32()
    const userId = des.deserializeUInt32()
    const completed = des.deserializeBoolean()
    const text = des.deserializeString()
    return { id, userId, text, completed }
  })
}

// Run
const todo: Todo = {
  id: 44,
  userId: 33,
  text: "foo bar",
  completed: false
}
const todos = [todo]

const buff = serializeTodos(todos)
// deserialize expects an ArrayBuffer, so pass the view's underlying buffer
const deserialized = deserialize(buff.buffer)
console.log(deserialized)

I put the above code into a Fastify server to test the performance. Below is the result of the test:

Here, you can find more analysis on that.

Advantages of Using SeqProto

SeqProto excels in speed and efficiency, outperforming the other libraries we compared it with. It builds on native JavaScript structures, namely ArrayBuffer and its views, which makes serialization quicker than with other formats.

Serialization and deserialization are also procedural: you write the code that emits and reads each field. This allows special-case handling, such as nullable properties, enumerations, and custom structures.
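As a rough sketch of what procedural handling of such special cases can look like, the example below writes an enumeration as an index and a nullable property as a presence flag followed by the value. The `Writer` and `Reader` classes, the `Task` type, and the byte layout are hypothetical stand-ins invented for this illustration; they are not SeqProto's API, although the same pattern could be expressed with any serializer that exposes per-field write and read calls.

```typescript
// Hypothetical minimal writer/reader pair (NOT SeqProto's classes)
class Writer {
  private bytes: number[] = []
  writeUInt32(n: number): void {
    // 4 bytes, little-endian
    this.bytes.push(n & 0xff, (n >>> 8) & 0xff, (n >>> 16) & 0xff, (n >>> 24) & 0xff)
  }
  writeBoolean(b: boolean): void {
    this.bytes.push(b ? 1 : 0)
  }
  finish(): Uint8Array {
    return Uint8Array.from(this.bytes)
  }
}

class Reader {
  private offset = 0
  private view: DataView
  constructor(bytes: Uint8Array) {
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  }
  readUInt32(): number {
    const n = this.view.getUint32(this.offset, true)
    this.offset += 4
    return n
  }
  readBoolean(): boolean {
    return this.view.getUint8(this.offset++) === 1
  }
}

const PRIORITIES = ['low', 'medium', 'high'] as const
type Priority = (typeof PRIORITIES)[number]

interface Task {
  priority: Priority
  assigneeId: number | null
}

function writeTask(w: Writer, task: Task): void {
  // Enumeration: write the index of the value instead of the string
  w.writeUInt32(PRIORITIES.indexOf(task.priority))
  // Nullable property: presence flag first, value only when present
  w.writeBoolean(task.assigneeId !== null)
  if (task.assigneeId !== null) w.writeUInt32(task.assigneeId)
}

function readTask(r: Reader): Task {
  // Reads must happen in exactly the order used by writeTask
  const priority = PRIORITIES[r.readUInt32()]
  const assigneeId = r.readBoolean() ? r.readUInt32() : null
  return { priority, assigneeId }
}

const w = new Writer()
writeTask(w, { priority: 'high', assigneeId: null })
writeTask(w, { priority: 'low', assigneeId: 33 })
const encoded = w.finish()

const r = new Reader(encoded)
const first = readTask(r)
const second = readTask(r)
console.log(first, second)
```

Because the reader mirrors the writer step by step, a missing value costs a single byte rather than a full field.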

Additionally, its straightforward APIs ensure easy implementation.

Give it a star!

If you enjoyed this article on SeqProto, make sure to check out the source code and give it a star on GitHub at https://github.com/oramasearch/seqproto!

Conclusion

JSON is a good format for serialization and deserialization. It is human-readable, which counts for a lot early in a project, so it is a good starting point. JSON does limit performance, though, so when performance counts, SeqProto is a powerful alternative.


© OramaSearch Inc. All rights reserved.