Understanding Python Reflection: Build a Custom Dataclass from Scratch

A simple search on YouTube will show you many videos titled "Data classes are amazing." However, this article is not about how amazing data classes are. Instead, it focuses on the power and elegance of reflection in Python and on how simple type annotations can help you create your own dataclasses.

Reflection

Reflection is a powerful feature in programming that allows a program to inspect and interact with its own structure and behavior at runtime. In simpler terms, reflection allows your code to examine itself. It can look at the types, attributes, and methods of objects and classes dynamically, and even modify them.

Some common examples of reflection used in Python are setattr, getattr, callable, issubclass, and inspect.getsource.
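
As a quick illustration of the functions listed above, the sketch below inspects and modifies an object at runtime. The Machine and Robot classes and their attributes are hypothetical names used only for this example.

```python
class Machine:
    pass


class Robot(Machine):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, I am {self.name}"


robot = Robot("R2")

# getattr reads an attribute whose name is only known as a string
name = getattr(robot, "name")

# setattr creates or updates an attribute dynamically
setattr(robot, "battery", 80)

# callable tells a method apart from a plain data attribute
greet_is_callable = callable(getattr(robot, "greet"))
name_is_callable = callable(getattr(robot, "name"))

# issubclass inspects the class hierarchy
robot_is_machine = issubclass(Robot, Machine)

print(name, robot.battery, greet_is_callable, name_is_callable, robot_is_machine)
# Output: R2 80 True False True
```

Note that the battery attribute never appears in the class definition; it exists only because setattr added it to the instance at runtime.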

Type annotations

Type annotations have been part of Python for some time now. However, since Python is dynamically typed, these annotations aren't enforced by the interpreter; they are used by static type checkers like mypy and by editor tooling such as language servers. I believe that combining these type annotations with reflection can be a powerful way to create your own data validators. In fact, Pydantic is built on these concepts.

# Named add rather than sum to avoid shadowing the built-in sum()
def add(no1: int, no2: int) -> int:
    return no1 + no2

add.__annotations__
# Output - {'no1': <class 'int'>, 'no2': <class 'int'>, 'return': <class 'int'>}

class Vehicle:
    mileage: float
    top_speed: float
    odometer_reading: int

Vehicle.__annotations__
# Output - {'mileage': <class 'float'>, 'top_speed': <class 'float'>, 'odometer_reading': <class 'int'>}

The examples above show that type annotations are stored in the __annotations__ dunder attribute of a function or class, as a dictionary that maps each name to its annotated type.
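
Because the annotations are an ordinary dictionary, a few lines of reflection are enough to check values against them. The following is only a minimal sketch of the idea, not a description of how Pydantic works internally. The helper validate_types is a hypothetical name, and the sketch assumes every annotation is a plain class such as int or float (it would not handle generics like list[int]).

```python
class Vehicle:
    mileage: float
    top_speed: float
    odometer_reading: int


def validate_types(obj):
    """Return error messages for attributes that don't match their annotation."""
    errors = []
    for name, expected in type(obj).__annotations__.items():
        # A missing attribute is treated as None and reported as a mismatch
        value = getattr(obj, name, None)
        if not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


vehicle = Vehicle()
vehicle.mileage = 42.0
vehicle.top_speed = "fast"  # wrong type on purpose
vehicle.odometer_reading = 45000

print(validate_types(vehicle))
# Output: ['top_speed: expected float, got str']
```

The check is strict: an int such as 42 would be rejected for a float field, because isinstance(42, float) is False. A real validator would have to decide how to treat such cases.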

Dataclass

from dataclasses import dataclass

@dataclass
class Vehicle:
    mileage: float
    top_speed: float
    odometer_reading: int

vehicle = Vehicle(42, 120, 45000)
print(vehicle)
# Output: Vehicle(mileage=42, top_speed=120, odometer_reading=45000)
vehicle.__dict__
# Output: {'mileage': 42, 'top_speed': 120, 'odometer_reading': 45000}

The code above shows how a dataclass works. The decorator generates several special methods, including a constructor that takes the annotated fields as parameters and stores them as instance variables.
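
To see what the decorator generated, we can turn reflection on the dataclass itself. The sketch below uses the standard inspect.signature and dataclasses.fields functions. The list of method names it checks is just a sample chosen for this example, not the full set the decorator can produce.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    mileage: float
    top_speed: float
    odometer_reading: int


# Special methods that the decorator added directly to the class
candidates = ("__init__", "__repr__", "__eq__")
generated = [name for name in candidates if name in Vehicle.__dict__]
print(generated)
# Output: ['__init__', '__repr__', '__eq__']

# The constructor parameters mirror the annotated fields, in order
param_names = list(inspect.signature(Vehicle).parameters)
print(param_names)
# Output: ['mileage', 'top_speed', 'odometer_reading']

# dataclasses.fields exposes the same information as Field objects
field_names = [f.name for f in fields(Vehicle)]
print(field_names)
# Output: ['mileage', 'top_speed', 'odometer_reading']

# The generated __eq__ compares instances field by field
same = Vehicle(42, 120, 45000) == Vehicle(42, 120, 45000)
print(same)
# Output: True
```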

This behavior is not built into Python's class syntax; all of the work is done by the dataclass decorator from the standard library. Next, we will create our own minimal dataclass decorator that performs the same basic tasks.

Implementation

Now let's see what happens when we don't use any dataclass decorator.

class Vehicle:
    mileage: float
    top_speed: float
    odometer_reading: int

vehicle = Vehicle(42, 120, 45000)
# Output(error): TypeError: Vehicle() takes no arguments

Without the dataclass decorator, no __init__ that accepts these fields is generated, so the class falls back to the default constructor, which takes no arguments. Using reflection and annotations, we will create these special methods ourselves.

For this, we will first create a decorator named datakilas that, for now, only prints the class annotations.

def datakilas(cls):
    annotations = cls.__annotations__
    print(f"{annotations=}")
    # logic to dynamically assign __init__ method into cls
    return cls

@datakilas
class Vehicle:
    mileage: float
    top_speed: float

# Output: annotations={'mileage': <class 'float'>, 'top_speed': <class 'float'>}

Now that we can read the annotations inside the decorator, the next step is to dynamically create the constructor (__init__) and string representation (__str__) methods for the decorated class.


def datakilas(cls):
    annotations = cls.__annotations__

    # Define the __init__ method for the class
    def init_func(self, **kwargs):
        for key in annotations:
            setattr(self, key, kwargs.get(key, None))

    # Define the __str__ method
    def str_func(self):
        # Create a string representation of the class with its attributes
        attr_str = ', '.join(f'{key}={getattr(self, key)}' for key in annotations)
        return f'{cls.__name__}({attr_str})'

    # Add the __init__ and __str__ method to the class attributes
    cls.__init__ = init_func
    cls.__str__ = str_func

    return cls

In the program above, we dynamically create the __init__ and __str__ dunder methods from the class annotations. With the new datakilas decorator, we can now easily create a simple dataclass.

@datakilas
class Vehicle:
    mileage: float
    top_speed: float

vehicle = Vehicle(mileage=15, top_speed=200)

print(vehicle)
# Output: Vehicle(mileage=15, top_speed=200)

print(vehicle.__dict__)
# Output: {'mileage': 15, 'top_speed': 200}

The current implementation only accepts keyword arguments in the class constructor. As an exercise to understand it better, you can update the implementation to accept positional arguments as well.
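
One possible way to support positional arguments is sketched below. It assumes that positional values should map to fields in annotation order, as the standard dataclass does. The decorator name datakilas_positional is hypothetical, and the sketch skips default values and type checks.

```python
def datakilas_positional(cls):
    # Field names in declaration order (dicts preserve insertion order)
    field_names = list(cls.__annotations__)

    def init_func(self, *args, **kwargs):
        if len(args) > len(field_names):
            raise TypeError(
                f"{cls.__name__}() takes at most {len(field_names)} arguments "
                f"({len(args)} given)"
            )
        # Pair positional values with field names in order
        values = dict(zip(field_names, args))
        for key, value in kwargs.items():
            if key not in field_names:
                raise TypeError(
                    f"{cls.__name__}() got an unexpected keyword argument '{key}'"
                )
            if key in values:
                raise TypeError(
                    f"{cls.__name__}() got multiple values for argument '{key}'"
                )
            values[key] = value
        # Fields that were not supplied fall back to None,
        # matching the keyword-only version
        for key in field_names:
            setattr(self, key, values.get(key))

    def str_func(self):
        attr_str = ", ".join(f"{key}={getattr(self, key)}" for key in field_names)
        return f"{cls.__name__}({attr_str})"

    cls.__init__ = init_func
    cls.__str__ = str_func
    return cls


@datakilas_positional
class Vehicle:
    mileage: float
    top_speed: float


print(Vehicle(15, top_speed=200))
# Output: Vehicle(mileage=15, top_speed=200)
```

Rejecting unknown or duplicated arguments is a design choice made here to mimic ordinary function calls; the keyword-only version above silently ignores unknown keys.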

Conclusion

This way, we can create a basic version of a sophisticated feature like dataclass using Python's simple reflection abilities. We did not need specialized tools such as the inspect module; the built-in setattr and getattr functions and the __annotations__ attribute were enough to change a class's behavior at runtime and dynamically add a constructor and other methods to it. I think this serves as a good example of the power of reflection in Python, showing how little code it takes to express fairly sophisticated logic.