Serialize

Serialization is the process of converting a data object into bytes; deserialization is the reverse. The serialize.py module provides classes and functions for both operations.

from everysk.core.serialize import dumps, loads

data = {'data that I want to ': 'serialize'}

serialized_data = dumps(data)
serialized_data
'{"data that I want to ": "serialize"}'

loads(serialized_data)
{'data that I want to ': 'serialize'}

Imagine that we have some data represented in graph form. Once we serialize this data, we are left with a linear sequence of bytes, in a format that different programming languages and operating systems can understand.

The two main entry points of the serialize.py module are dumps() for serializing and loads() for deserializing data.

Serializing Data

The dumps() function serializes a given obj to either a JSON or a Pickle formatted string. It accepts several arguments that customize the serialization process. Let's look at them individually.

The allow_nan flag defaults to True and controls how out-of-range float values are serialized. If False, float values such as NaN, -Infinity, and Infinity raise a ValueError.
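As a minimal sketch of what this flag does, the snippet below uses the standard library's json.dumps rather than everysk's dumps(). It assumes that everysk handles allow_nan the same way as the standard library, which this page does not confirm.

```python
import json

values = {"ratio": float("nan"), "limit": float("inf")}

# allow_nan=True: the JavaScript-style tokens NaN and Infinity are emitted.
# These tokens are not part of the strict JSON specification.
lenient = json.dumps(values, allow_nan=True)
# lenient == '{"ratio": NaN, "limit": Infinity}'

# allow_nan=False: out-of-range floats are rejected with a ValueError.
try:
    json.dumps(values, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Setting the flag to False is a reasonable choice when the output is consumed by a strict JSON parser that would reject NaN or Infinity.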

The check_circular boolean defaults to True and enables checking for circular references in container types such as lists and dictionaries. Setting this flag to False skips those checks, so a circular reference may lead to a RecursionError.
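The check can be illustrated with the standard library's json.dumps, used here as a stand-in on the assumption that everysk's dumps() treats check_circular the same way. The sketch only exercises the default (checked) path.

```python
import json

node = {"name": "root"}
node["self"] = node  # the dictionary now contains a reference to itself

# With check_circular=True the cycle is detected up front and reported
# as a ValueError instead of recursing until the interpreter gives up.
try:
    json.dumps(node, check_circular=True)
    error = None
except ValueError as exc:
    error = str(exc)
```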

The cls argument accepts a custom serialization class with pre-defined rules on how specific objects should be serialized.
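A rough idea of what such a class looks like, written against the standard library's json.JSONEncoder. DecimalEncoder is a hypothetical name for illustration, and whether everysk expects a subclass of json.JSONEncoder or of its own encoder class is an assumption here.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes Decimal values as strings."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize natively.
        if isinstance(obj, Decimal):
            return str(obj)
        # Fall back to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"price": Decimal("19.90")}, cls=DecimalEncoder)
# encoded == '{"price": "19.90"}'
```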

Date and datetime objects have no native JSON representation, so serializing them can cause errors. The date_format and datetime_format parameters control how they are written as strings. The default format used is ISO.
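To show the idea behind format parameters like these, here is a sketch built only on the standard library. The helper make_default is hypothetical and is not part of everysk; it simply applies strftime with the given formats through the default hook of json.dumps.

```python
import json
from datetime import date, datetime


def make_default(date_format, datetime_format):
    """Hypothetical helper: build a json 'default' hook from two strftime formats."""

    def default(obj):
        # datetime is a subclass of date, so it must be checked first.
        if isinstance(obj, datetime):
            return obj.strftime(datetime_format)
        if isinstance(obj, date):
            return obj.strftime(date_format)
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

    return default


payload = {"day": date(2024, 10, 29), "moment": datetime(2024, 10, 29, 12, 0, 0)}
text = json.dumps(payload, default=make_default("%Y-%m-%d", "%Y-%m-%d %H:%M:%S"))
# text == '{"day": "2024-10-29", "moment": "2024-10-29 12:00:00"}'
```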

The ensure_ascii flag controls how non-ASCII characters are represented. If False, the return value can contain non-ASCII characters; otherwise such characters are escaped in JSON strings.
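The difference is easiest to see side by side. This sketch uses the standard library's json.dumps and passes the flag explicitly in both calls, since this page does not state the default used by everysk.

```python
import json

city = {"city": "São Paulo"}

# ensure_ascii=True: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(city, ensure_ascii=True)
# escaped == '{"city": "S\\u00e3o Paulo"}'

# ensure_ascii=False: characters are written as-is.
readable = json.dumps(city, ensure_ascii=False)
# readable == '{"city": "São Paulo"}'
```

Both strings decode to the same dictionary; the flag only changes how the text looks on the wire.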

The indent option sets the indentation level of the JSON representation. If indent is 0, only newlines are inserted. The default value is None, which gives the most compact representation.

The protocol option determines which protocol to use for serialization. The default value is json. At the time of writing, the only protocols allowed are json and pickle.

separators defines how items and key-value pairs are delimited in the JSON output. The argument is a tuple of the form (item_separator, key_separator). The default is (', ', ': ') when indent is None. For any other value of indent, (',', ': ') is used.
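The interaction between indent and separators can be sketched with the standard library's json.dumps, assuming everysk's dumps() follows the same rules described above.

```python
import json

data = {"a": 1, "b": [1, 2]}

# indent=None with default separators (', ', ': ').
default_text = json.dumps(data)
# default_text == '{"a": 1, "b": [1, 2]}'

# Custom separators without spaces give the smallest output.
compact = json.dumps(data, separators=(",", ":"))
# compact == '{"a":1,"b":[1,2]}'

# indent=0 inserts newlines but no leading spaces.
newlines_only = json.dumps(data, indent=0)

# indent=2 pretty-prints with two spaces per nesting level.
pretty = json.dumps(data, indent=2)
```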

skipkeys defaults to False. If True, any dictionary key that is not of a basic type (str, int, float, bool, or None) is skipped instead of raising a TypeError.

sort_keys, as the name suggests, sorts dictionaries by their keys in the output. Defaults to False.
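A short illustration of skipkeys and sort_keys, again using the standard library's json.dumps as a stand-in under the assumption that everysk's dumps() behaves the same for these two flags.

```python
import json

mixed = {"a": 1, ("x", "y"): 3}  # a tuple is not a valid JSON key

# Without skipkeys the tuple key raises a TypeError.
try:
    json.dumps(mixed)
    raised = False
except TypeError:
    raised = True

# With skipkeys=True the offending entry is silently dropped.
skipped = json.dumps(mixed, skipkeys=True)
# skipped == '{"a": 1}'

# sort_keys=True gives a stable key order regardless of insertion order.
sorted_text = json.dumps({"b": 2, "a": 1}, sort_keys=True)
# sorted_text == '{"a": 1, "b": 2}'
```

Sorted keys are handy when the serialized text is compared, hashed, or stored under version control, because equal dictionaries then always produce equal strings.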

add_class_path defaults to True and controls whether the class path of the serialized object is included in the output.

Finally, the use_undefined boolean parameter controls how Undefined objects are serialized. When set to True, Undefined objects are serialized using the null keyword in the following format: {"__undefined__": null}. When set to False, Undefined objects are serialized as None.

Below is an example of the dumps() function in use:

from everysk.core.serialize import dumps
from everysk.core.datetime import DateTime

data = {
    "id": 'aj83h70fb209',
    "name": "Complex Object",
    "timestamp": DateTime.now(),
}
dumps(data, indent=4, date_format='%Y-%m-%d', datetime_format='%Y-%m-%d %H:%M:%S')
'{\n    "id": "aj83h70fb209",\n    "name": "Complex Object",\n    "timestamp": "2024-10-29 12:00:00"\n}'


Deserializing Data

The loads() function, on the other hand, deserializes serialized data back into the corresponding Python object. The arguments it accepts are explained below.

The first one is the data to deserialize, which can be a string, bytes, or bytearray containing JSON or Pickle-encoded content.

Next is cls, a decoder subclass used for the deserialization, which allows the default decoding behavior to be overridden. The default class used is JSONDecoder.

The date_format and datetime_format arguments specify the format for deserializing date and datetime objects. If not provided, the default behavior is to parse dates in ISO format.

The object_hook argument is an optional function that will be called with the result of every decoded object literal (a dict). The return value of that call is used in place of the dict.

The object_pairs_hook argument is also an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The object_pairs_hook takes priority over object_hook in the scenario where both arguments are provided.
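The two hooks and their priority can be sketched with the standard library's json.loads, assuming everysk's loads() passes them through unchanged. The function as_point is a hypothetical example hook.

```python
import json

text = '{"x": 1, "y": 2}'


def as_point(obj):
    """Hypothetical hook: turn {"x": ..., "y": ...} dicts into (x, y) tuples."""
    if {"x", "y"} <= obj.keys():
        return (obj["x"], obj["y"])
    return obj


# object_hook receives each decoded dict.
point = json.loads(text, object_hook=as_point)
# point == (1, 2)

# object_pairs_hook receives an ordered list of (key, value) pairs instead.
pairs = json.loads(text, object_pairs_hook=list)
# pairs == [('x', 1), ('y', 2)]

# When both are given, only object_pairs_hook is used.
both = json.loads(text, object_hook=as_point, object_pairs_hook=list)
# both == [('x', 1), ('y', 2)]
```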

parse_constant is a function that will be called with any string that represents a constant, such as -Infinity, Infinity, or NaN. This can be used to raise exceptions or handle these constants differently.

If specified, parse_float will be called with every JSON float string to be decoded. By default, this is equivalent to float(num_str). This allows using a different data type (e.g., decimal.Decimal) for floats.

parse_int is a callable that will be called with the string of every JSON integer to be decoded. By default, this is equivalent to int(num_str). This allows using a different data type or parser (e.g., float) for JSON integers.
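The three parse_* callables can be tried out with the standard library's json.loads. This is a sketch under the assumption that everysk's loads() forwards them as-is; reject_constant is a hypothetical helper.

```python
import json
from decimal import Decimal


def reject_constant(name):
    """Hypothetical helper: refuse NaN, Infinity, and -Infinity."""
    raise ValueError(f"constant not allowed: {name}")


# parse_float=Decimal keeps the exact decimal digits of the JSON text;
# parse_int=float turns every JSON integer into a float.
exact = json.loads(
    '{"price": 19.90, "count": 3}',
    parse_float=Decimal,
    parse_int=float,
)
# exact == {'price': Decimal('19.90'), 'count': 3.0}

# parse_constant is called with the constant's name as a string.
try:
    json.loads('{"ratio": NaN}', parse_constant=reject_constant)
    constant_error = None
except ValueError as exc:
    constant_error = str(exc)
# constant_error == 'constant not allowed: NaN'
```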

The protocol argument specifies the deserialization protocol to use. It defaults to json, and at the time of writing the only protocols supported are json and pickle. Note: the pickle module is not safe to use with untrusted data. Never unpickle data received from an untrusted or unauthenticated source.
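To make the trade-off between the two protocols concrete, the sketch below compares the standard library's pickle and json modules directly. It does not call everysk, and how everysk wraps either module internally is not covered here.

```python
import json
import pickle

original = {"ids": (1, 2, 3), "tags": {"a", "b"}}

# pickle round-trips Python-specific types such as tuple and set.
# Unpickling is safe here only because the bytes were produced locally.
blob = pickle.dumps(original)
restored = pickle.loads(blob)

# JSON has no tuple type, so a tuple comes back as a list.
json_round_trip = json.loads(json.dumps({"ids": (1, 2, 3)}))
# json_round_trip == {'ids': [1, 2, 3]}

# JSON cannot represent a set at all without a custom encoder.
try:
    json.dumps({"tags": {"a", "b"}})
    set_supported = True
except TypeError:
    set_supported = False
```

JSON output is portable across languages, while pickle output is specific to Python and should only be exchanged between trusted parties.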

When the use_undefined boolean is set to True, deserialized objects that are Undefined will be kept as Undefined. Otherwise, when the argument is set to False, Undefined objects will be set to None.

The instantiate_object boolean, when set to True, keeps the class_path keyword in the dictionary for future instantiation of the objects. When set to False, the key is removed.

Let's see an implementation example in practice:

from everysk.core.serialize import loads

serialized_data = '{\n    "id": "aj83h70fb209",\n    "name": "complex object",\n    "timestamp": "2024-10-29"\n}'
loads(serialized_data, date_format='%Y-%m-%d', datetime_format='%Y-%m-%d %H:%M:%S', parse_float=float, parse_int=int)
{'id': 'aj83h70fb209', 'name': 'complex object', 'timestamp': '2024-10-29'}