Basic Python Memory Management and Diagnostics: A Real-World Example

By design, Python tries to make memory management simple for its users. While Python itself is usually implemented in C, it does away with the need for the manual memory allocation and deallocation that C is famous (or infamous) for.

However, just because Python makes memory management simple doesn’t mean that best practices can be completely ignored. In this article we will examine a simple case where one of our Python servers was crashing due to careless use of memory.

An Introduction to Python Memory Management

First we will go over the basics of memory management in Python. The important concepts that we should be aware of are stack memory, heap memory, and garbage collection. Note that this information pertains specifically to CPython, which is the reference implementation of Python and the most widely used version. Other implementations are likely to be similar, but it is not guaranteed.

Stack Memory

Stack memory is how variables in a given context are handled. Whenever a new context is created, such as a function call or a loop iteration, a corresponding stack frame is created. This stack frame contains information about the context, such as its local variables and function return addresses. It is then placed “on top” of the previous stack frame, which gives the stack its name. When the context completes, such as when a function returns, the corresponding frame is removed from the stack, removing its variables and leaving the previous stack frame on top. As an example, if a function example_function() is called, a frame is allocated on the stack. If example_function() calls another function example_function2(), then a new frame is created for the second function. When example_function2() completes, its frame is removed from the stack, leaving the frame for example_function() on top.
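
To make this concrete, the call stack can be inspected at runtime with the standard inspect module. Below is a minimal sketch that mirrors the hypothetical example_function()/example_function2() scenario above:

Python
>>> import inspect
>>> def example_function2():
...     # print the function name of every frame currently on the stack
...     print([frame.function for frame in inspect.stack()])
...
>>> def example_function():
...     example_function2()
...
>>> example_function()
['example_function2', 'example_function', '<module>']
Inspecting the call stack with the inspect module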

Side note: the infamous stack overflow error is caused by a program exceeding the memory limits of its stack, usually due to infinite recursion in a function. Python guards against this with a default recursion limit of 1000 frames; exceeding it raises a RecursionError. The limit can be configured with sys.setrecursionlimit().
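
The current limit can be checked and adjusted through the sys module:

Python
>>> import sys
>>> sys.getrecursionlimit() # the default limit
1000
>>> sys.setrecursionlimit(2000) # raise the limit (use with care, as deeper stacks use more memory)
>>> sys.getrecursionlimit()
2000
Checking and changing the recursion limit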

For a given context, its corresponding stack frame will have a constant size. Even if it is a stack of a function with parameters, its size on the stack will always remain the same each time that specific function is called, regardless of the values of the parameters passed to it. This predictability in size allows for faster memory allocation and deallocation, which is suitable for the stack’s purpose as a quick memory store.

If function calls use stack frames which have a constant size, how then is it possible for the same function to be called with parameters of varying sizes? After all it is very common to find functions that accept lists or dictionaries which have variable lengths. Python accomplishes this by the use of heap memory.

In order to store dynamically-sized variables, the stack only contains references to this data. Their actual values are stored in heap memory.

Heap Memory

While stack memory is smaller and quicker to allocate and deallocate, heap memory is larger, with slower, more expensive allocation and deallocation. As their names imply, stack memory can be seen as neat stacks, while heap memory is more like a large, disorganized heap. In order to access the data in this heap, pointers or references are needed. These references are then stored in stack memory so they can be accessed by the program. This means that when a Python function is called, only references are passed as parameters to the function, while the actual values are found on the heap. This behavior is often described as call by object reference (or call by sharing).

The heap can be imagined as a sort of warehouse that stores data, with the reference being a note that contains the exact location of the item in the warehouse. Without this reference, the program would be unable to access the data in the heap.
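
A small illustration of this behavior: when a mutable object such as a list is passed to a function, the function receives a reference to the same object on the heap, so changes made inside the function are visible outside it. (The add_item() function below is a hypothetical example.)

Python
>>> def add_item(items): # items is a reference to the same list object
...     items.append("new")
...
>>> data = ["old"]
>>> add_item(data) # only the reference is passed, not a copy of the list
>>> data # the change made inside the function is visible here
['old', 'new']
Passing a reference to a list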

In CPython, which is the most widely used implementation of Python, the memory address of the object a variable refers to can be found with the id() function.

Python
>>> test_var = "hello" # assign "hello" to test_var
>>> test_var # the value of test_var is "hello"
'hello'
>>> id(test_var) # get the address of test_var
140637474196080
>>> test_var2 = test_var # assign the value of test_var to test_var2
>>> test_var2 # the value of test_var2 is the same as test_var
'hello'
>>> id(test_var2) # the address of test_var2 is the same as test_var
140637474196080
test_var and test_var2 are two variables referring to the same object on the heap

If no variable holds a reference to an object on the heap anymore, then that object is no longer in use, so it can be deleted and its memory freed up for reuse. This is the job of the garbage collector.

The Garbage Collector

Those familiar with C may have experience with manually allocating and deallocating memory. This is very error-prone and often leads to memory leaks, where memory is allocated but never deallocated once it is no longer in use. Luckily for Python users, even though CPython is implemented in C, the Python runtime conveniently handles memory management, with a garbage collector taking care of deleting unused objects.

The main mechanism for determining whether or not an object is still in use is reference counting. Each object keeps track of the number of references to it: each time the object is assigned to a new variable the count increases, and each time a reference to the object is deleted the count decreases. Deletion of a reference usually happens when the variable goes out of scope, but it can also be manually triggered by using the del keyword.

The current number of references to an object can be seen by using the sys.getrefcount() function.

Python
>>> import sys
>>> foo = "an object" # create a string object and assign to foo
>>> sys.getrefcount(foo) # sys.getrefcount() itself has an extra reference to the object
2
>>> bar = foo # assign the same object to another variable
>>> sys.getrefcount(foo) # the reference count increases
3
>>> del bar # delete the second variable
>>> sys.getrefcount(foo) # the reference count decreases
2
>>> bar # bar no longer exists
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bar' is not defined
An example of reference counting

Simple reference counting doesn’t cover all cases for object deletion, however. There are cases known as reference cycles where one or more objects hold references to each other, but there are no references from the outside, forming a closed “loop.”

Python
>>> import sys
>>> foo = [] # create an empty list
>>> sys.getrefcount(foo)
2
>>> foo.append(foo) # add list to itself
>>> sys.getrefcount(foo) # the list now contains a reference to itself
3
>>> del foo # the "external" reference is deleted but the object still refers to itself
>>> foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
Simple example of a reference cycle

In the scenario above, an object holds a reference to itself. When the external reference is deleted, there would still be at least one reference to the object, which would make it seem like it is still in use. The garbage collector has another algorithm for detecting these “unreachable” objects, the specifics of which are too advanced for this article but can be found in the Python documentation.
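
For illustration, the cycle detector can be triggered manually with gc.collect(), which returns the number of unreachable objects it found (the exact number may vary depending on what other garbage exists at the time):

Python
>>> import gc
>>> foo = [] # create an empty list
>>> foo.append(foo) # the list now refers to itself, forming a cycle
>>> del foo # the external reference is gone, but the reference count is still nonzero
>>> gc.collect() # the cycle detector finds and frees the unreachable list
1
Manually running the cycle detector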

A Note on CPython Variables

Those who have experience in other languages may be aware that in those languages, value data types (as they are known in C/C++) or primitive data types (as they are known in Java), such as integers or floating point numbers, are stored as-is directly on the stack, while reference types such as arrays only have references on the stack, with their values stored on the heap. CPython, on the other hand, makes no distinction between the two: all variables are objects stored on the heap, with only references stored on the stack. This also means that all function arguments are passed as object references.

Python
>>> isinstance(1, object) # a number is an object
True
>>> id(1) # the number has an address in the heap
10848872
>>> foo = 1 # assign the number to a variable
>>> id(foo) # the variable has the same address
10848872
Numbers are objects

Example Memory Management Issue

While Python’s automatic memory management makes things simple for the developer and helps prevent bugs caused by manual allocation and deallocation, it is still possible to run into issues due to careless use of memory. This article will tackle one such instance that we ran into during development. (Note: The code examples given will be heavily reduced and simplified as the project uses proprietary code)

Project Description

The project is a Python backend server that performs some computations on large datasets. It was implemented with Flask, served by Gunicorn, running on Kubernetes behind an NGINX ingress. In order to serve client requests more quickly, the server precomputes data and stores the resulting model so that it does not have to be recomputed on every request. This precomputation step is meant to be run periodically as new data is collected.

The Problem

Upon first deploying the server, an issue became evident: attempting to run the precomputation would only result in a 502 Bad Gateway response from NGINX. A response like this usually means that the server behind the NGINX gateway has failed in such a manner that it does not return a proper response. Checking the logs of the Gunicorn server then showed messages similar to [WARNING] Worker with pid 23 was terminated due to signal 9. Looking up this message reveals that worker processes are being killed (signal 9 is SIGKILL) because they are running out of memory.

Diagnostics: tracemalloc

Luckily, Python has a built-in library for helping diagnose memory usage: tracemalloc. It has many tools to measure and trace memory usage, but for this basic example we will be focusing on three of its functions: start(), get_traced_memory(), and stop().

As its name suggests, start() begins tracing memory allocations. More specifically, it adds hooks onto memory allocation and deallocation in order to keep track of memory usage. The get_traced_memory() function returns a two-item tuple containing the current memory usage and the peak memory usage in bytes. Finally, stop() does the opposite of start(): it removes the hooks from memory allocation and deallocation and clears the memory usage data.

Python
>>> import tracemalloc
>>> tracemalloc.get_traced_memory() # tracemalloc has not started yet, no data
(0, 0)
>>> tracemalloc.start() # begin tracing
>>> tracemalloc.get_traced_memory()
(8916, 19841)
>>> foo = [n for n in range(1000000)] # initialize large list
>>> tracemalloc.get_traced_memory() # current and peak have increased
(40449480, 40460405)
>>> del foo # free memory of large list
>>> tracemalloc.get_traced_memory() # current has dropped. peak remains
(8980, 40460405)
>>> tracemalloc.stop() # end tracing
>>> tracemalloc.get_traced_memory() # data cleared after stop()
(0, 0)
tracemalloc demonstration


Memory tracing can also be started automatically as soon as the program starts by setting the PYTHONTRACEMALLOC environment variable.
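
For example, assuming the application’s entry point is app.py (a hypothetical name), tracing for the whole program could be enabled at startup like this; the -X tracemalloc interpreter option works as well:

Plaintext
# both commands start the interpreter with tracing already enabled
PYTHONTRACEMALLOC=1 python app.py
python -X tracemalloc app.py
Enabling tracing at startup (app.py is a hypothetical entry point)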

Practical Use

Our precomputation function involves reading large data files and performing operations on them. The functions that process these data files were a good place to start looking for optimizations. A simplified version looks something like the following:

Python
def precompute_data():
    file_a = read_file_a()
    file_b = read_file_b()
    file_c = read_file_c()
    file_d = read_file_d()

    processed_data = process_data(file_a, file_b, file_c, file_d)

    # more operations omitted
    # ...
    
    return computed_data
Sample function

In order to help analyze memory issues, a simple function was created to print usage data:

Python
def print_mem_usage():
    current, peak = tracemalloc.get_traced_memory()
    print(f"current={current}, peak={peak}")
Convenience function for printing memory usage

This was then added into the precompute function at certain points:

Python
def precompute_data():
    tracemalloc.start()
    print("start:")
    print_mem_usage()

    file_a = read_file_a()
    print("with file a:")
    print_mem_usage()

    file_b = read_file_b()
    print("with file b:")
    print_mem_usage()

    file_c = read_file_c()
    print("with file c:")
    print_mem_usage()

    file_d = read_file_d()
    print("with file d:")
    print_mem_usage()

    processed_data = process_data(file_a, file_b, file_c, file_d)
    print("with processed data:")
    print_mem_usage()

    # more operations omitted
    # ...

    print("before return:")
    print_mem_usage()
    
    tracemalloc.stop()
    return computed_data
Logging memory usage

In this case, start() and stop() were called inside the function since we were only concerned with memory usage per call. Other use cases may trace across multiple functions or throughout the program’s entire lifecycle.
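
For such cases, tracemalloc can do more than report totals: take_snapshot() records where allocations happened, which helps attribute memory usage to specific lines of code. A minimal sketch:

Python
import tracemalloc

tracemalloc.start()

data = [n for n in range(1_000_000)] # stand-in for the code being investigated

# take a snapshot and group the recorded allocations by source line
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]: # top 5 allocation sites
    print(stat)

tracemalloc.stop()
Attributing memory usage to source lines with snapshots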

Running the function then printed the following:

Plaintext
start:
current=0, peak=384
with file a:
current=65537594, peak=262606174
with file b:
current=65693972, peak=262606174
with file c:
current=74243717, peak=262606174
with file d:
current=77835403, peak=262606174
with processed data:
current=78425966, peak=262606174
additional logs:
current=83174450, peak=262606174
current=83705728, peak=340133801
current=83706863, peak=340133801
before return:
current=90340254, peak=340133801
Memory usage logs

From these logs, it can be seen that the function peaks above 340MB. As expected, reading the data from the files took a lot of space, with peak usage already above 260MB during those calls, and keeping the file data in memory used up more than 75MB.

Now that we have confirmed that the variables file_a through file_d take up a lot of space, we can use this as a starting point for optimizing the program’s memory usage. Looking through the rest of the function, it turns out that after they are used as parameters for process_data(), these variables are no longer needed, yet they keep taking up space that could be reused instead. Deleting these variables with del allows the memory to be freed up and reused:

Python
processed_data = process_data(file_a, file_b, file_c, file_d)
print("with processed data:")
print_mem_usage()

del file_a
del file_b
del file_c
del file_d
print("after deleting:")
print_mem_usage()
A snippet of precompute_data()

After deleting these variables, running the program again shows a change in memory usage:

Plaintext
start:
current=0, peak=384
with file a:
current=65529105, peak=262596992
with file b:
current=65682918, peak=262596992
with file c:
current=74232761, peak=262596992
with file d:
current=77812019, peak=262596992
with processed data:
current=78270099, peak=262596992
after deleting:
current=282717, peak=262596992
additional logs:
current=3698072, peak=262596992
current=4227324, peak=262596992
current=4227324, peak=262596992
before return:
current=10858100, peak=262596992
Logs after deleting variables

As seen from the logs, deleting the file variables once they are no longer needed instantly freed up around 75MB. Freeing up this memory earlier on prevented the function from peaking above 300MB, unlike the earlier run: peak usage dropped by more than 20% just by deleting unused variables. This demonstrates how simply keeping track of variables can make big improvements in performance.

Manually deleting the variables in this manner, while effective, is a clunky solution. Since the initial data processing step can be self-contained, a better solution would be to refactor it out into its own function, like so:

Python
def get_processed_data():
    file_a = read_file_a()
    file_b = read_file_b()
    file_c = read_file_c()
    file_d = read_file_d()

    return process_data(file_a, file_b, file_c, file_d)

def precompute_data():
    processed_data = get_processed_data()

    # more operations omitted
    # ...

    return computed_data
Refactored precompute_data()

By factoring these lines out into their own function, the file variables go out of scope as soon as the function returns, so their reference counts drop to zero and the garbage collector can reclaim them. This is much easier than keeping track of and manually deleting variables. In general, splitting large functions into smaller ones like this is already best practice for code readability; this also demonstrates how it can improve memory usage.

There is of course additional optimization to be done, hinted at by how read_file_a() causes a large increase in peak memory usage, but those improvements are specific to the data and algorithm, so we will not delve into them in this article.

Conclusion

This case, though very simple, is a real example of how neglecting the basics of memory management can lead to inefficient programs. The solution here was only the first step in reducing memory usage. Additional reductions were made by stream-processing the files (i.e., filtering them as they are read rather than reading each whole file before filtering) and by refining the algorithm used.
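
As a rough illustration of the stream-processing idea, a generator can yield only the rows that pass a filter, so the full file contents never need to be held in memory at once (read_filtered_rows() and the file name below are hypothetical):

Python
def read_filtered_rows(path, keep):
    # hypothetical helper: yield matching rows one at a time instead of
    # reading the whole file into memory before filtering
    with open(path) as f:
        for line in f:
            if keep(line):
                yield line.rstrip("\n")

# only the rows that pass the filter are ever kept in memory
rows = list(read_filtered_rows("data_a.csv", lambda row: "2024" in row))
Sketch of stream-processing a file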

We offer Python development and other software consultancy services. Have a project in mind? Send us a message!