2024-07-23

Weeding out flakey database tests

Flakey tests are caused by non-determinism - using the current time, calls to random number generators, relying on hashmap/set ordering, mutable globals leaking between tests, relying on implicit database ordering. Let's focus on the latter.

In Postgres, doing SELECT * FROM t without an ORDER BY returns rows in no guaranteed order. However, the order is somewhat predictable, meaning that a test relying on the row order may pass most of the time, but eg. the introduction of a preceding test that writes to the database might change said order and cause the test to fail.

Doing the following yields consistent results for me at the time of writing (rows returned somewhat in order of insert/update time), but again, Postgres provides no guarantees to that effect.

CREATE TABLE t (x INT);
INSERT INTO t VALUES (1), (2), (3);
SELECT * FROM t;  -- 1, 2, 3
DELETE FROM t WHERE x = 1;
INSERT INTO t VALUES (1);
SELECT * FROM t;  -- 2, 3, 1

It would be nice if Postgres had a SET row_order_without_explicit_clause = 'random' setting - then we could run all of our tests a few times with it turned on and weed out any tests that fail. It doesn't, but with some hackery, we can acheive similar results.


Consider a classic test run:

Let's add some extra steps:

Some notes on this implementation:

Python + Pytest + Postgres + Django implementation

Following is an implementation for Python + Pytest + Postgres + Django, place the fixture in your conftest.py, then call by running:

pytest ... --unorder-tables

Conftest code:

def pytest_addoption(parser: Any) -> None:
    parser.addoption(
        "--unorder-tables",
        action="store_true",
        help="Proxy the db tables through unordered VIEWs",
    )
@pytest.fixture(autouse=True, scope="session")
def unorder_tables(request: Any) -> Iterator[None]:
    if not request.config.getoption("--unorder-tables"):
        yield None
        return

    request.getfixturevalue("django_db_setup")
    conn = psycopg2.connect(
        dbname=settings.DATABASES["default"]["NAME"],
        host=settings.DATABASES["default"]["HOST"],
        port=settings.DATABASES["default"]["PORT"],
        user=settings.DATABASES["default"]["USER"],
        password=settings.DATABASES["default"]["PASSWORD"],
    )
    cur = conn.cursor()

    # Get all the table names
    qry = """
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema='public'
        AND table_type='BASE TABLE'
        AND table_name != 'django_migrations'
    """
    cur.execute(qry)
    table_names: list[str] = [table_name for (table_name,) in cur.fetchall()]

    # Make a VIEW proxy for each table, ORDERed by the internal Postgres id, DESC
    prefix = "_original_"
    for table_name in table_names:
        cur.execute(f"ALTER TABLE {table_name} RENAME TO {prefix}{table_name}")
    for table_name in table_names:
        cur.execute(
            f"CREATE VIEW {table_name} AS SELECT * FROM {prefix}{table_name} ORDER BY ctid DESC"
        )
    conn.commit()

    yield None

    # Reinstate the original tables
    for table_name in table_names:
        cur.execute(f"DROP VIEW {table_name}")
    for table_name in table_names:
        cur.execute(f"ALTER TABLE {prefix}{table_name} RENAME TO {table_name}")
    conn.commit()

There's probably edge cases that won't work - so no guarantees - but on my machine it weeded out 3 tests in our suite that were reliant on implicit ordering - success!

Future