2026-01-26
I'm a long-time Pydantic user - it's the idiomatic approach to "parse, don't validate" for Python - and I advocate running it over all data that doesn't come from your codebase (<form> input, JSON from the db, queue data, CSV data, etc.).
Pydantic has been around for a while and there are many ways of using it; the following are my recommendations as of now. Jump straight to the definitive example.
The first one is the biggest and most contentious:
Don't use pydantic.BaseModel
Instead, use stdlib dataclasses in conjunction with pydantic.TypeAdapter.
There are a number of reasons why you should do this:
Validation should be an explicit step at the boundaries of your application, eg:
x = pydantic.TypeAdapter(MyDataclass).validate_python(request.post_data)
The conventional approach is also fine at the system boundaries:
x = MyBaseModel(**request.post_data)
But you end up performing validation where you don't need to, eg. when calling MyBaseModel(a=a, b=b) deep within your application.
Validation incurs some runtime cost (admittedly fairly small as of Pydantic v2) - the bigger issue is the increased API surface area of the data being passed round your application. As a developer, I'd like to look at MyBaseModel(a=a, b=b) and think "an object is being initialized", not "an object is being initialized, it calls any number of validators, it may raise a pydantic.ValidationError, it may trigger a costly model rebuild". This is a similar problem to pervasive use of ORM instances.
Once you've validated data that enters the application, lean on mypy to make sure everything lines up, not runtime validation.
Say you're parsing configuration from a CSV - you should use Pydantic to gracefully handle date parsing etc.:
for row in csv:
    typed_row = pydantic.TypeAdapter(tuple[dt.date, int]).validate_python(row)
Using TypeAdapter, validation is explicit and it looks the same everywhere.
In a modern typed Python codebase, dataclasses should be your bread-and-butter struct type (the value-add of attrs isn't worth deviating from the stdlib for). Having your datatypes be composed only of dataclasses, lists, dicts etc. and no other custom types makes it far easier to write generic code in the form:
from dataclasses import fields, is_dataclass
from typing import TypeVar

T = TypeVar("T")

def transform(v: T) -> T:
    if isinstance(v, list):
        return [transform(x) for x in v]
    if isinstance(v, dict):
        ...
    if is_dataclass(v):
        return v.__class__(
            **{field.name: transform(getattr(v, field.name)) for field in fields(v)}
        )
    raise TypeError(f"Unknown type {v.__class__}")
Using BaseModel means another type of object that you (and other library maintainers) have to consider when writing generic functions.
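To make that concrete - reusing transform from above with the hypothetical MyBaseModel from earlier, a model instance matches none of the branches:

# `is_dataclass()` is False for `BaseModel` instances, so we fall through:
transform(MyBaseModel(a=1, b=2))  # TypeError: Unknown type <class 'MyBaseModel'>

You'd have to add (and maintain) an explicit BaseModel branch in every such function.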
When you have loads of nested models, it can be costly to initialize the classes themselves and you can end up with strange import-order problems. This can be somewhat surmounted on a BaseModel with defer_build=True - then the build is triggered at the first call to MyBaseModel(a=a, b=b).
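For reference, deferring the build is a one-line configuration change (a sketch - MyBaseModel stands in for your own model):

class MyBaseModel(pydantic.BaseModel):
    # The core schema is built at first validation, not at class definition
    model_config = pydantic.ConfigDict(defer_build=True)

    a: int
    b: int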
By explicitly using pydantic.TypeAdapter, you know where the costly initialization is going to take place - it's not going to happen in some random test where you didn't even require validation.
Consider this bug:
def my_route_handler(request):
    try:
        _process_data(request)
    except pydantic.ValidationError as e:
        return Response422(e)
    ...

def _process_data(request):
    # This might raise a `ValidationError`, but we didn't want to catch it
    x = MyOtherBaseModel(a=a, b=b)
    # We do want to catch `ValidationError`s raised here
    y = MyBaseModel(**request.post_data)
    ...
By making validation explicit, you massively reduce the chance of this kind of bug.
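For contrast, a sketch of the explicit version of the handler above (keeping the hypothetical Response422):

def my_route_handler(request):
    try:
        # The only line that can raise a `ValidationError`
        data = pydantic.TypeAdapter(MyDataclass).validate_python(request.post_data)
    except pydantic.ValidationError as e:
        return Response422(e)
    # Plain dataclasses from here on - no hidden validation
    return _process_data(data)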
Without any configuration, mypy understands that my_tuple = pydantic.TypeAdapter(tuple[int, str]).validate_python(v) gives my_tuple the type tuple[int, str].
For pydantic.BaseModels to typecheck correctly on __init__, we need to add the following to our pyproject.toml:
[tool.mypy]
plugins = ["pydantic.mypy"]
[tool.pydantic-mypy]
init_forbid_extra = true
init_typed = true
warn_required_dynamic_aliases = true
There are two problems here: it's extra configuration you have to remember to set up, and it only helps mypy - other type checkers (eg. pyright) don't support plugins at all.
FastAPI is compatible with dataclasses.
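A minimal sketch of what that looks like (Item and the route are hypothetical):

from dataclasses import dataclass

from fastapi import FastAPI

app = FastAPI()

@dataclass(kw_only=True)
class Item:
    name: str
    price: float

@app.post("/items")
def create_item(item: Item) -> Item:
    # FastAPI validates the request body into the dataclass for us
    return item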
Always use Annotated
Use the Annotated pattern for the reasons outlined in the docs.
I've had various issues when composing BeforeValidator|AfterValidator|PlainSerializer with deeply nested Annotated types. Always use WrapValidator|WrapSerializer:
def _date_short_validator(v: Any, handler: Callable[[Any], Any]) -> Any:
    if isinstance(v, str):
        v = dt.datetime.strptime(v, "%y%m%d")
    return handler(v)

def _date_short_serializer(v: Any, handler: Any, info: Any) -> Any:
    if isinstance(v, dt.date):
        return v.strftime("%y%m%d")
    return handler(v)
DateShort = Annotated[
    dt.date,
    pydantic.WrapValidator(_date_short_validator),
    pydantic.WrapSerializer(_date_short_serializer),
]
@dataclass(kw_only=True)
class Foo:
    date: DateShort
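Round-tripping through Foo then behaves as you'd expect (a sketch of the intended behaviour):

foo = pydantic.TypeAdapter(Foo).validate_python({"date": "251231"})
assert foo.date == dt.date(2025, 12, 31)
assert pydantic.TypeAdapter(Foo).dump_python(foo, mode="json") == {"date": "251231"}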
As of Python 3.14 (where the token returned by ContextVar.set can be used as a context manager), rather than use Pydantic's own context gubbins, use the stdlib:
date_format: contextvars.ContextVar[str] = contextvars.ContextVar(
    "date_format", default="%y%m%d"
)

def _date_short_validator(v: Any, handler: Callable[[Any], Any]) -> Any:
    if isinstance(v, str):
        v = dt.datetime.strptime(v, date_format.get())
    return handler(v)
with date_format.set("%Y-%m-%d"):
    pydantic.TypeAdapter(Foo).validate_python({"date": "2025-12-31"})
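On older Pythons, where the token isn't usable as a context manager, the equivalent is a manual reset:

token = date_format.set("%Y-%m-%d")
try:
    pydantic.TypeAdapter(Foo).validate_python({"date": "2025-12-31"})
finally:
    date_format.reset(token)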
When doing class-level validation or setting derived default values, do:
@dataclass(kw_only=True)
class Foo:
    a: int | None = None
    b: int | None = None

    @pydantic.model_validator(mode="after")
    def check_for_a_or_b(self) -> Self:
        if self.a is None and self.b is None:
            raise ValueError("Expected a or b")
        return self
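The same hook covers derived defaults - a sketch (Bar is hypothetical) filling in b from a after validation:

@dataclass(kw_only=True)
class Bar:
    a: int
    b: int | None = None

    @pydantic.model_validator(mode="after")
    def derive_b(self) -> Self:
        # Derived default: `b` falls back to a value computed from `a`
        if self.b is None:
            self.b = self.a * 2
        return self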
Where possible, bother to explicitly discriminate unions - it makes for far nicer error messages.
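For example (a sketch with hypothetical Cat/Dog types): tag each member with a Literal field and point the discriminator at it:

from typing import Literal

@dataclass(kw_only=True)
class Cat:
    kind: Literal["cat"]
    meows: bool

@dataclass(kw_only=True)
class Dog:
    kind: Literal["dog"]
    barks: bool

Pet = Annotated[Cat | Dog, pydantic.Field(discriminator="kind")]

# Errors now point at the `Dog` schema specifically, rather than listing
# every union member's failures
pydantic.TypeAdapter(Pet).validate_python({"kind": "dog", "barks": True})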
The definitive example

import contextvars
import datetime as dt
import functools
from dataclasses import dataclass, field
from typing import Annotated, Any, Callable, Self, TypeVar

import pydantic

T = TypeVar("T")

# mypy often struggles with `functools.cache`
cache: Callable[[T], T] = functools.cache  # type: ignore
# Do different validations in different contexts
date_format: contextvars.ContextVar[str] = contextvars.ContextVar(
    "date_format", default="%y%m%d"
)

# Always use WrapValidator|WrapSerializer
def _date_short_validator(v: Any, handler: Callable[[Any], Any]) -> Any:
    if isinstance(v, str):
        v = dt.datetime.strptime(v, date_format.get())
    return handler(v)

def _date_short_serializer(v: Any, handler: Any, info: Any) -> Any:
    if isinstance(v, dt.date):
        return v.strftime(date_format.get())
    return handler(v)

DateShort = Annotated[
    dt.date,
    pydantic.WrapValidator(_date_short_validator),
    pydantic.WrapSerializer(_date_short_serializer),
]

# Don't use `BaseModel`
@dataclass(kw_only=True)
class MyDataclass:
    date: DateShort
    a: int | None = None
    b: int | None = None

    # Always use `Annotated`
    x: Annotated[
        list[str],
        # Adding to the JSONSchema
        pydantic.Field(json_schema_extra={"x-foo": 1}),
    ] = field(default_factory=list)

    # Model level checks
    @pydantic.model_validator(mode="after")
    def check_for_a_or_b(self) -> Self:
        if self.a is None and self.b is None:
            raise ValueError("Expected a or b")
        return self

    # Add configuration
    __pydantic_config__ = pydantic.ConfigDict(
        str_to_upper=True,
    )

@cache  # constructing `TypeAdapter`s is slow
def type_adapter(cls: type[T]) -> pydantic.TypeAdapter[T]:
    return pydantic.TypeAdapter(cls)

# Validate any type
my_tuple = type_adapter(tuple[int, str]).validate_python([1, "two"])
my_dataclass = type_adapter(MyDataclass).validate_python(
    {
        "x": ["a", "b", "c"],
        "date": "251231",
        "a": 1,
        # We ignore extra fields on `.validate_python()` but not on `__init__`
        "extra": 0,
    },
)

# Serialize data
jsonable = type_adapter(MyDataclass).dump_python(my_dataclass, mode="json")

# Construct JSONSchema
json_schema = type_adapter(MyDataclass).json_schema()

# Use the context
with date_format.set("%Y-%m-%d"):
    my_dataclass = type_adapter(MyDataclass).validate_python(...)