How to Make JSON and Python Talk to Each Other | by Yong Cui | Mar, 2022

JavaScript Object Notation (JSON) is a popular data format that is commonly used for data interchange between different systems. For instance, many APIs return their results as JSON. Given JSON’s remarkable readability and its object-like structure, it’s useful to know how Python handles JSON data. In this article, we’ll see what JSON is and how to process it with the built-in json module in Python.

JSON data are structured as JSON objects, which hold data in the form of key-value pairs, just like Python dictionaries. The following code snippet shows you what a typical JSON object looks like.

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 35,
    "city": "San Francisco"
}

In essence, a JSON object is delimited by a pair of curly braces, in which key-value pairs are stored. JSON objects require their keys to be strings, and this requirement keeps the format consistent across different systems. The values shown here are strings and integers, but JSON supports other data types too, including booleans, arrays, and objects.

  • String: string literals enclosed with double quotes
  • Number: number literals, including integers and decimals
  • Boolean: Boolean values, true or false
  • Array: a list of supported data types
  • Object: key-value pairs enclosed by curly braces
  • Null: an empty value (null) for any valid data type

One point deserves special attention: unlike Python strings, which can use either single or double quotes, JSON strings must be enclosed in double quotes. Using single quotes makes the data invalid JSON, and a standard JSON parser will reject it.
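To make the quoting rule concrete, here is a minimal sketch that uses json.loads (covered later in this article) on two hypothetical strings, one with double quotes and one with single quotes. The variable names are illustrative only.

```python
import json

valid_text = '{"city": "San Francisco"}'     # double quotes: valid JSON
invalid_text = "{'city': 'San Francisco'}"   # single quotes: not valid JSON

# The valid text parses into a Python dict.
parsed = json.loads(valid_text)
print(parsed)

# The single-quoted text is rejected by the parser.
try:
    json.loads(invalid_text)
    rejected = False
except json.JSONDecodeError as error:
    rejected = True
    print("Could not parse:", error.msg)
```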

Besides these supported data types, it’s important to know that JSON supports nested data structures. For instance, you can embed a JSON object inside another object. Likewise, an array can contain any of the supported data types, including objects. Some examples are shown below:

An object nested inside another object:
{
    "one": 1,
    "two": {"one": 1}
}

An array consisting of multiple objects:
[
    {"one": 1},
    {"two": 2},
    {"three": 3}
]

The flexibility of mixing different data types allows us to construct very complicated data with clear structural information as all the data are saved in the form of key-value pairs.

Because JSON is a common data interchange format, each JSON data type has a corresponding native Python data structure. The mapping works in both directions: the rules that convert JSON data to Python data also apply, with a few exceptions, when you convert Python data to JSON data.

+-----------+----------------+
| JSON      | Python         |
+-----------+----------------+
| String    | str            |
| Number    | int or float   |
| Boolean   | bool           |
| Array     | list           |
| Object    | dict           |
| Null      | NoneType       |
+-----------+----------------+

These conversions are straightforward, except that Python doesn’t have a single native data type that matches JSON numbers. Instead, a JSON number becomes an int when it’s an integer and a float when it’s a real number. You may also notice that the Python column of the table doesn’t include tuple or set. A tuple is converted to a JSON array, while a set can’t be serialized by the json module by default.
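The following sketch illustrates these conversion details with the standard json module. Converting the set to a sorted list first is just one possible workaround, not something the article prescribes, and the variable names are made up for this example.

```python
import json

# JSON numbers become int or float depending on how they are written.
whole = json.loads("35")
real = json.loads("35.5")
print(type(whole), type(real))

# A tuple is serialized as a JSON array.
tuple_json = json.dumps((1, 2, 3))
print(tuple_json)  # [1, 2, 3]

# A set is not serializable by default and raises TypeError.
try:
    json.dumps({1, 2, 3})
    set_rejected = False
except TypeError as error:
    set_rejected = True
    print("Cannot serialize:", error)

# One possible workaround: convert the set to a list first.
set_json = json.dumps(sorted({3, 1, 2}))
print(set_json)  # [1, 2, 3]
```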

When we read and decode JSON data into data structures of other programming languages, such as Python, for further processing, we say that we deserialize JSON data. In other words, the reading and decoding process is termed deserialization. In Python’s standard library, we have the json module that is specialized in deserializing JSON data.

We know that it’s common for web services to use JSON objects as API responses. Suppose that you receive the following response. To facilitate discussion, let’s express it as a Python string object.

employee_json_data = """{
    "employee0": {
        "firstName": "John",
        "lastName": "Smith",
        "age": 35,
        "city": "San Francisco"
    },
    "employee1": {
        "firstName": "Zoe",
        "lastName": "Thompson",
        "age": 32,
        "city": "Los Angeles"
    }
}"""

To read this JSON string, we simply use the loads method. As shown below, we’re able to obtain a dict object after reading the string containing the above JSON object.
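A minimal sketch of that call might look like the following. The string is repeated here in compact form so that the snippet runs on its own, and the name employee_dict is only illustrative.

```python
import json

employee_json_data = """{
    "employee0": {"firstName": "John", "lastName": "Smith", "age": 35, "city": "San Francisco"},
    "employee1": {"firstName": "Zoe", "lastName": "Thompson", "age": 32, "city": "Los Angeles"}
}"""

# loads ("load string") parses the JSON text into Python objects.
employee_dict = json.loads(employee_json_data)

print(type(employee_dict))
# <class 'dict'>
print(employee_dict["employee1"]["city"])
# Los Angeles
```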

The loads method is flexible. When you have a string representing a list of JSON objects, this method is smart enough to know how to parse the data accordingly. Consider the following example.

import json

employee_json_array = '[{"employee2": "data"}, {"employee3": "data"}]'
employee_list = json.loads(employee_json_array)
print(employee_list)
# [{'employee2': 'data'}, {'employee3': 'data'}]

In addition to these structured JSON objects, the loads method can also parse any JSON data types other than objects. Some examples are below.

>>> json.loads("2.2")
2.2
>>> json.loads('"A string"')
'A string'
>>> json.loads('false')
False
>>> json.loads('null') is None
True

The previous section discussed various aspects regarding the deserialization of JSON strings. However, you don’t always directly deal with strings. Sometimes, you’ll have opportunities to work with JSON files. Suppose that you run the following code to create a file that holds JSON strings.

# the JSON data to save
json_to_write = '{"name": "John", "age": 35}'
# write the JSON data to a file
with open("json_test.txt", "w") as file:
    file.write(json_to_write)

Certainly, you can read the file directly to create the string, which can be sent to the loads method.

with open("json_test.txt") as file:
    json_string = file.read()
parsed_json0 = json.loads(json_string)
print(parsed_json0)
# output: {'name': 'John', 'age': 35}

Notably, the json module provides the load method that allows us to work with a file directly to parse JSON data:

with open("json_test.txt") as file:
    parsed_json1 = json.load(file)
print(parsed_json1)
# output: {'name': 'John', 'age': 35}

This is cleaner than the previous implementation because it removes the need to create an intermediate string object.

Here, we learned about the most basic scenarios for the load and loads methods. Under the hood, parsing is performed by the JSONDecoder class. Although this base class is powerful enough to handle most situations, it’s possible to define more customized behaviors by creating a subclass of the JSONDecoder class. However, if you don’t want to subclass, the load and loads methods provide other parameters through which you can define customized parsing behaviors. Curious readers can refer to the official documentation for further instructions.
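As a rough illustration of those extra parameters, the sketch below uses two of them, object_hook and parse_float, which are part of the standard json API. The Person class and the as_person function are hypothetical helpers invented for this example.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Person:
    name: str
    age: int


def as_person(obj):
    # object_hook receives every decoded JSON object as a dict.
    # Only dicts with exactly these keys are turned into Person instances.
    if set(obj) == {"name", "age"}:
        return Person(**obj)
    return obj


person = json.loads('{"name": "John", "age": 35}', object_hook=as_person)
print(person)

# parse_float receives the raw text of each JSON float.
price = json.loads("2.2", parse_float=Decimal)
print(repr(price))
```

Whether a hook or a JSONDecoder subclass is the better fit depends on how much custom behavior you need; for a one-off conversion, a hook is usually less code.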

Like reading JSON data, writing Python data into the JSON format involves two counterpart methods, namely dump and dumps. As the opposite of deserializing JSON data, creating JSON data is termed serialization. Thus, when we convert Python data to JSON data, we say that we serialize Python objects to JSON data.

Just like the load and loads methods, the dump and dumps methods have almost identical calling signatures. The most important difference is that the dump method writes the data to a file, while the dumps method returns a JSON-formatted string. For simplicity, we’ll focus on the dumps method. Consider the following example.

import json
different_data = ['text', False, {"0": None, 1: [1.0, 2.0]}]
json.dumps(different_data)
# output: '["text", false, {"0": null, "1": [1.0, 2.0]}]'

In this example, we notice that the dumps method creates a JSON array that holds different kinds of JSON data. The most significant observation is that although the original list object uses native Python data structures, the generated JSON string has the converted JSON data structures. Consistent with the conversion table that was shown previously, note the following conversions.

  • The string enclosed with single quotes, 'text', now uses double quotes: "text".
  • The Python bool object False becomes false.
  • The object None becomes null.
  • Because only strings can be JSON keys, the number 1 is automatically converted to its string counterpart "1".
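One consequence of the key conversion is worth a quick check: the round trip does not restore the original key type. The sketch below uses a made-up dictionary to show this.

```python
import json

original = {1: "one"}

# Serializing turns the integer key into the string "1" ...
as_json = json.dumps(original)
print(as_json)  # {"1": "one"}

# ... and deserializing keeps it as a string.
restored = json.loads(as_json)
print(restored)              # {'1': 'one'}
print(restored == original)  # False
```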

Besides these automatic conversions, there are two notable features that we often use. The first one is to create JSON objects in a more readable format by using proper indentations. To do that, we need to set the indent parameter in the dumps method.
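A minimal sketch of the indent parameter, using a small made-up dictionary and an indentation of four spaces:

```python
import json

employees = {"employee0": {"firstName": "John", "age": 35}}

# indent=4 puts each key-value pair on its own line,
# indented by four spaces per nesting level.
pretty = json.dumps(employees, indent=4)
print(pretty)
# {
#     "employee0": {
#         "firstName": "John",
#         "age": 35
#     }
# }
```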

As shown above, every level is nicely indented to indicate the relative structure of JSON objects and their key-value pairs.

The other useful feature is the sort_keys parameter. By setting it to True, the created JSON strings have their keys sorted alphabetically, which makes it easier for us to look up information, particularly when there are multiple items. Observe this feature below.
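A short sketch of sort_keys, using a made-up dictionary whose keys are deliberately out of order:

```python
import json

employee = {"lastName": "Smith", "firstName": "John", "city": "San Francisco", "age": 35}

# Without sort_keys, the keys keep the dictionary's insertion order.
unsorted_json = json.dumps(employee)
print(unsorted_json)
# {"lastName": "Smith", "firstName": "John", "city": "San Francisco", "age": 35}

# With sort_keys=True, the keys are sorted alphabetically.
sorted_json = json.dumps(employee, sort_keys=True)
print(sorted_json)
# {"age": 35, "city": "San Francisco", "firstName": "John", "lastName": "Smith"}
```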

We’ve learned that the load and loads methods are for deserialization and the dump and dumps methods are for serialization. These method names may sound confusing to some people. Here are some tips that may help you distinguish them.

  • JSON data are external to Python. When we need to access them, we “load” them into Python. Therefore, loading refers to reading JSON data.
  • By contrast, to export Python data to JSON data, we “dump” the data. Therefore, dumping refers to writing JSON data.
  • The trailing “s” stands for “string”: use loads when the JSON input is a string, and dumps when you want the JSON output as a string. Without the “s”, load and dump work with files.
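To see the four functions side by side, here is a small sketch that round-trips the same data through a string and through a file. The temporary directory and the file name employee.json are arbitrary choices for this example.

```python
import json
import os
import tempfile

data = {"name": "John", "age": 35}

# String versions: dumps returns a str, loads reads from a str.
text = json.dumps(data)
from_text = json.loads(text)

# File versions: dump writes to a file object, load reads from one.
path = os.path.join(tempfile.mkdtemp(), "employee.json")
with open(path, "w") as file:
    json.dump(data, file)
with open(path) as file:
    from_file = json.load(file)

print(from_text == from_file == data)  # True
```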

So far, we’ve focused on built-in Python data structures. In many applications, however, you’ll define your own custom classes, and at some point you’ll need to serialize instances of these classes to JSON data. Let’s consider the following class, from which we create an instance:

class Employee:
    def __init__(self, name, employee_id):
        self.name = name
        self.employee_id = employee_id

employee = Employee("John Smith", 40)

What do you expect to happen if we try to call dumps on employee? Can it succeed? Let’s see:

json.dumps(employee)
# TypeError: Object of type Employee is not JSON serializable

No, it doesn’t work. The reason for this failure is that the dumps method is trying to create a valid JSON string. However, for a custom class’s instance, it doesn’t know what data should be encoded. Although you can create your own JSONEncoder class, a quick solution is to provide encoding instructions to the dumps method by setting the default argument.

>>> json.dumps(employee, default=lambda x: x.__dict__)
'{"name": "John Smith", "employee_id": 40}'

Here, we specify a lambda function, which retrieves the instance’s dict representation through accessing the __dict__ special attribute. We know that the built-in dict object is JSON serializable such that dumps knows to “dump” the dict object.
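For comparison, the JSONEncoder route mentioned earlier could look roughly like the sketch below. EmployeeEncoder is a hypothetical name; overriding default and passing the subclass through the cls parameter are the standard json extension points.

```python
import json


class Employee:
    def __init__(self, name, employee_id):
        self.name = name
        self.employee_id = employee_id


class EmployeeEncoder(json.JSONEncoder):
    def default(self, o):
        # default is called only for objects the encoder can't handle itself.
        if isinstance(o, Employee):
            return {"name": o.name, "employee_id": o.employee_id}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


employee = Employee("John Smith", 40)
encoded = json.dumps(employee, cls=EmployeeEncoder)
print(encoded)
# {"name": "John Smith", "employee_id": 40}
```

A subclass is more code than the lambda, but it lets you choose exactly which attributes are exported and reuse the same rules across many dumps calls.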

In this post, we reviewed the key techniques for processing JSON data in Python. Here are the key takeaways:

  1. JSON is a standard data interchange format. When you create APIs for others to use, consider JSON as a possible format for your response data.
  2. Python has separate methods for dealing with JSON strings and files. These methods have similar calling signatures.
  3. Use proper indentations to improve the readability of JSON data. It’s especially relevant if you’re creating a JSON string. Simply specify the indent parameter when you serialize Python objects.
  4. When you have multiple key-value pairs for JSON objects, it’s often a good idea to sort the keys such that it’ll be easier to look up information.
  5. Remember that JSON keys must be strings and they require double quotes.
  6. To serialize a custom instance, you need to provide specific instructions on the serialization.
