When working with Python classes, you have two kinds of variables you can work with: class variables and instance variables. We won’t get deep into the similarities and differences here, nor various use cases for each kind. For that, I’ll refer you to this article. For our purposes here, just know that class variables and instance variables both have their own unique uses.

In this post, we’ll be looking at one particular use case: using Python class and instance variables in multithreaded applications. Why does this matter? As with any programming language, choice of variable types can be critical to the success or failure of your application.

With that in mind, let’s explore some of the options you have when working with Python variables in a multithreaded context. Knowing which option to choose can make all the difference.

Class variables are declared in the class body, outside of any method, and can be accessed by every object that belongs to that class. There is a single copy, stored on the class itself, and every instance shares it. Instance variables, on the other hand, are assigned through “self” (usually in __init__) and are tied to a specific instance. Each instance has its own set of instance variables, but all instances share the class variables.

Class variables look like this:

class obj:
    # Class variable: defined in the class body, shared by all instances
    data = [0, 0, 0]

Instance variables look like this:

class obj:
    def __init__(self):
        # Instance variable: a new list is created for every instance
        self.data = [0, 0, 0]
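To make the difference concrete, here is a minimal sketch that mutates the list through one instance and then reads it through another. The class names SharedMessage and OwnMessage are hypothetical and exist only for this illustration.

```python
class SharedMessage:  # hypothetical name, for illustration only
    # Class variable: one list object stored on the class itself
    data = [0, 0, 0]


class OwnMessage:  # hypothetical name, for illustration only
    def __init__(self):
        # Instance variable: a fresh list for every instance
        self.data = [0, 0, 0]


a, b = SharedMessage(), SharedMessage()
a.data[0] = 99             # mutates the single list held by the class
shared_seen_by_b = b.data  # b sees the change made through a

c, d = OwnMessage(), OwnMessage()
c.data[0] = 99             # mutates only c's list
own_seen_by_d = d.data     # d is unaffected

print(shared_seen_by_b)    # [99, 0, 0]
print(own_seen_by_d)       # [0, 0, 0]
print(a.data is b.data, c.data is d.data)  # True False
```

The `is` check on the last line shows the underlying cause: with a class variable both instances look up the same list object, while each `__init__` call builds a separate one.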

I recently ran into an issue with class vs. instance variables in a multithreaded application. A colleague of mine was debugging a UI application that was communicating on an interface and shipping the received data off to a UI for display. He found it would crash randomly, so we switched up the architecture a bit to receive data on one thread and then pass it through a queue to the UI thread to consume. This seemed to resolve the crash, but the data in the UI was wrong.

Digging into the problem, we found that the data was changing as it passed through the queue. After some more digging, my colleague realized that the class that was implemented to push the data through the queue was utilizing class variables instead of instance variables.

This simple program illustrates the issue:

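import queue
import threading
import time

q1 = queue.Queue()

class obj:
    # Class variables: stored once on the class, shared by every reference to it
    id = 0
    data = [0, 0, 0]

def thread_fn(label):
    d = q1.get()
    preprocessing = list(d.data)  # snapshot of the data as first seen
    time.sleep(3)  # simulate slow processing

    # Compare the snapshot with the data after the other thread has updated it
    if preprocessing != d.data:
        print(f"{label}: Before data: {preprocessing} != After data: {d.data}")
    else:
        print(f"{label}: Before data: {preprocessing} == After data: {d.data}")

if __name__ == "__main__":
    x = threading.Thread(target=thread_fn, args=("ClassVars",))
    obj.id = 1
    obj.data = [1, 2, 3]
    q1.put(obj)  # enqueues the class object itself, not a copy

    x.start()
    time.sleep(1)  # let the worker take its snapshot before the update

    # Update the data: this rebinds the attributes on the same class object
    obj.id = 2
    obj.data = [4, 5, 6]
    q1.put(obj)

    x.join()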

Essentially, data would be received on the interface (in this case, the main function) and put into the queue. While the UI was still getting around to processing that data, new data would be received on the interface and put into the queue as well. Because class variables were used originally (and the class object itself was put on the queue), both queue entries referred to the same object. The old data was overwritten with the new data in the class, so the UI read the wrong data and generated errors during processing.
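The key detail is that a queue stores references, not snapshots. The following single-threaded sketch shows this without any timing involved; the class name Message is a hypothetical stand-in for the message class described above.

```python
import queue


class Message:  # hypothetical stand-in for the class-variable message type
    id = 0
    data = [0, 0, 0]


q = queue.Queue()

Message.id, Message.data = 1, [1, 2, 3]
q.put(Message)  # enqueues a reference to the class, not a snapshot

Message.id, Message.data = 2, [4, 5, 6]
q.put(Message)  # the very same object again

first = q.get()
second = q.get()

print(first is second)       # True: both entries are the Message class
print(first.id, first.data)  # 2 [4, 5, 6]: the first message's values are gone
```

Because both entries are the same object, whichever values were written last are the only ones a consumer can ever see, no matter how many times the class was put on the queue.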

Once the underlying message class was changed to use instance variables, the “bad data” issue went away and the original problem of the crashing application was also resolved with the architecture change. Take a look at the difference in this program:

import queue
import threading
import time

q1 = queue.Queue()

class obj:
    def __init__(self):
        # Instance variables: every obj() gets its own id and data
        self.id = 0
        self.data = ['x', 'y', 'z']

def thread_fn(label):
    d = q1.get()
    preprocessing = list(d.data)  # snapshot of the data as first seen
    time.sleep(3)  # simulate slow processing

    # Compare the snapshot with the data after the other thread has queued new data
    if preprocessing != d.data:
        print(f"{label}: Before data: {preprocessing} != After data: {d.data}")
    else:
        print(f"{label}: Before data: {preprocessing} == After data: {d.data}")

if __name__ == "__main__":
    x = threading.Thread(target=thread_fn, args=("InstanceVars",))
    obj1 = obj()
    obj1.id = 1
    obj1.data = [1, 2, 3]
    q1.put(obj1)

    x.start()

    # New data goes into a separate instance, so obj1 is left untouched
    obj2 = obj()
    obj2.id = 2
    obj2.data = [4, 5, 6]
    q1.put(obj2)

    x.join()

As you can see, using instance variables requires that we create a new instance for each message to begin with. This ensures that each object created has its own data members that are independent of the other instances, which is exactly what we required in this scenario. This single change alone would likely have cleaned up the bad data we were seeing, but it would not have fixed the root of the original problem, the crashes, which the queue-based architecture change addressed.

As each instance passes through the queue, the thread gets it and uses the correct data for processing. Nothing in the thread function had to change, only how the data feeding it was set up.
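One way to make the "one instance per message" rule harder to break is to build the message type with the standard library's dataclasses module. This is only a sketch of an alternative, not what the original application did; the names Message and consumer, and the None sentinel used to stop the worker, are assumptions made for the example.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Message:  # hypothetical name, not the class from the original application
    id: int = 0
    # default_factory builds a new list per instance, so nothing is shared
    data: list = field(default_factory=lambda: [0, 0, 0])


def consumer(q, results):
    # Drain the queue until the None sentinel arrives
    while True:
        msg = q.get()
        if msg is None:
            break
        results.append((msg.id, list(msg.data)))


q = queue.Queue()
results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()

for msg_id, payload in [(1, [1, 2, 3]), (2, [4, 5, 6])]:
    q.put(Message(id=msg_id, data=payload))  # a fresh instance per message
q.put(None)  # sentinel: tell the consumer to stop

worker.join()
print(results)  # [(1, [1, 2, 3]), (2, [4, 5, 6])]
```

A dataclass also refuses a bare mutable default such as `data: list = [0, 0, 0]` and raises a ValueError at class definition time, which guards against accidentally reintroducing a shared list.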

Python is a great language, but it definitely has its quirks. The next time you hit a snag while trying to parallelize your application, take a step back and understand the features of your programming language. Understanding the peculiarities of your chosen language is the mark of a mindful programmer! With a bit of space to gather your wits and some careful, conscious coding, you can avoid these pesky pitfalls and create fast, reliable threaded applications. What other Python nuances have bitten you in the past? Let us know in the comments below!

Last modified: January 30, 2023
