Credit risk sounds intimidating until you realize it’s mostly one question asked a thousand different ways: based on what we know right now, how likely is it that this borrower pays us back?
The tricky part is that “what we know right now” is where a lot of projects quietly fall apart. People build beautiful features, run a model, get great accuracy… and then realize the features accidentally used information from the future. That’s like predicting tomorrow’s weather using tomorrow’s forecast. It works, but it’s not exactly honest.
This post walks through how I’d turn raw transaction data into a credit-risk-ready feature set in a way that’s practical, explainable, and actually usable.
What makes transaction data valuable for risk
Transaction data is useful because it shows behavior, not promises.
Income consistency shows up in deposit patterns.
Spending habits show up in merchant categories and timing.
Cash flow stress shows up in overdrafts, fee events, and volatility.
Debt pressure shows up in recurring payments and utilization-like patterns.
The goal isn’t to profile someone’s life. It’s to summarize financial stability in a way that’s measurable and fair.
Start with the table you wish you had
Raw transactions are messy. Before building features, I like to create a clean base table with a few standardized columns:
account_id
transaction_id
posted_at (a real timestamp)
amount (with clear sign convention)
merchant_name
merchant_category
transaction_type (debit/credit)
description
balance_after (if available)
If you don’t have balance_after, you can still do a lot, but you’ll lean more heavily on inflows/outflows.
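The cleanup itself doesn't need to be fancy. As a sketch in Python (the raw field names are assumptions; real provider feeds differ), normalization is mostly about enforcing one sign convention and one timestamp format:

```python
from datetime import datetime

def normalize(raw: dict) -> dict:
    """Map a hypothetical raw transaction row into the standard base-table
    schema. Sign convention: debits negative, credits positive."""
    amount = abs(float(raw["amount"]))
    if raw["transaction_type"] == "debit":
        amount = -amount
    return {
        "account_id": raw["account_id"],
        "transaction_id": raw["transaction_id"],
        "posted_at": datetime.fromisoformat(raw["posted_at"]),
        "amount": amount,
        "merchant_name": raw.get("merchant_name", "").strip().lower(),
        "merchant_category": raw.get("merchant_category", "unknown"),
        "transaction_type": raw["transaction_type"],
        "description": raw.get("description", ""),
        "balance_after": raw.get("balance_after"),  # may be None
    }
```

Whatever the mapping looks like in your stack, picking the sign convention once, here, saves you from re-deciding it inside every feature query.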
Define the moment in time you’re predicting from
This is the part that keeps you honest.
You need an “as of” date for each borrower. That is the date when the credit decision would have been made. All features must use only data on or before that date.
If you’re building this from historical data, you’ll also need an outcome window, like “did they default within 90 days after origination?” The features use the past; the label uses the future.
So the structure is:
as_of_date (decision date)
lookback window (past 30/60/90 days)
outcome window (next 60/90/180 days)
If you get this wrong, the model can look amazing and still be useless.
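The three windows are simple enough to pin down in a few lines. A minimal sketch (dates illustrative):

```python
from datetime import date, timedelta

def build_windows(as_of_date: date, lookback_days: int, outcome_days: int):
    """Features may only use [lookback_start, as_of_date];
    the label may only use (as_of_date, outcome_end]."""
    lookback_start = as_of_date - timedelta(days=lookback_days)
    outcome_end = as_of_date + timedelta(days=outcome_days)
    return lookback_start, outcome_end

def in_feature_window(posted_at: date, as_of_date: date, lookback_days: int) -> bool:
    """True only for transactions a feature is allowed to see."""
    lookback_start, _ = build_windows(as_of_date, lookback_days, 0)
    return lookback_start <= posted_at <= as_of_date
```

Every feature query in the rest of this post is just `in_feature_window` expressed in SQL.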
Pick features that are explainable and durable
I like features that a non-technical person can understand in one sentence. That doesn’t mean they’re simplistic. It means they’re defensible.
Here are categories that usually work well.
Income consistency features
You’re not trying to “detect salary” perfectly. You’re trying to measure stability.
Pay frequency estimate (weekly/biweekly/monthly)
Median deposit amount over a lookback window
Deposit volatility (how much deposits vary)
Longest gap between deposits
Share of income from top 1 source (concentration risk)
Example: deposit regularity over the lookback window
WITH deposits AS (
  SELECT
    account_id,
    DATE(posted_at) AS d,
    SUM(amount) AS daily_deposits
  FROM clean_transactions
  WHERE amount > 0
    AND posted_at <= @as_of_date
    AND posted_at >= DATE_SUB(@as_of_date, INTERVAL 90 DAY)
  GROUP BY 1, 2
),
gaps AS (
  SELECT
    account_id,
    daily_deposits,
    DATEDIFF(d, LAG(d) OVER (PARTITION BY account_id ORDER BY d)) AS gap_days
  FROM deposits
)
SELECT
  account_id,
  COUNT(*) AS deposit_days,
  AVG(daily_deposits) AS avg_daily_deposit,
  STDDEV_SAMP(daily_deposits) AS std_daily_deposit,
  MAX(gap_days) AS longest_gap_days
FROM gaps
GROUP BY account_id;
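The pay-frequency estimate from the list above is easier to express outside SQL. A sketch, assuming the deposit dates for one account have already been pulled, with bucket thresholds that are illustrative rather than canonical:

```python
from datetime import date
from statistics import median

def estimate_pay_frequency(deposit_dates: list) -> str:
    """Bucket the median gap between deposit days into a cadence label."""
    days = sorted(set(deposit_dates))
    if len(days) < 3:
        return "unknown"  # too few deposits to call a cadence
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    m = median(gaps)
    if m <= 9:
        return "weekly"
    if m <= 20:
        return "biweekly"
    if m <= 45:
        return "monthly"
    return "irregular"
```

The median (rather than the mean) keeps one odd transfer from flipping the label.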
Cash flow stress features
These capture “are they constantly close to the edge?”
Number of days with negative balance (if balance exists)
Count of overdraft/NSF fees (category-based)
Low-cash days (balance below a threshold)
Outflow-to-inflow ratio
Biggest single-day spend compared to typical days
Example: outflow/inflow ratio
SELECT
account_id,
SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END) /
NULLIF(SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END), 0) AS outflow_inflow_ratio
FROM clean_transactions
WHERE posted_at <= @as_of_date
AND posted_at >= DATE_SUB(@as_of_date, INTERVAL 60 DAY)
GROUP BY account_id;
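When balance_after exists, the balance-based stress features are simple counts. A sketch over end-of-day balances, with a cushion threshold that is an assumption to tune per portfolio:

```python
from datetime import date

def cash_stress_features(daily_end_balances: dict, threshold: float = 100.0) -> dict:
    """Count days in negative territory and days below a low-cash cushion."""
    balances = list(daily_end_balances.values())
    return {
        "negative_balance_days": sum(1 for b in balances if b < 0),
        "low_cash_days": sum(1 for b in balances if b < threshold),
    }
```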
Spending stability features
These answer: do they have predictable spending patterns or big swings?
Daily spend volatility
Weekend vs weekday spend ratio
Share of spend in essentials categories
Number of distinct merchants (behavior complexity)
Largest category concentration
Example: daily spend volatility
WITH daily AS (
SELECT
account_id,
DATE(posted_at) AS d,
SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END) AS daily_spend
FROM clean_transactions
WHERE posted_at <= @as_of_date
AND posted_at >= DATE_SUB(@as_of_date, INTERVAL 90 DAY)
GROUP BY 1, 2
)
SELECT
account_id,
AVG(daily_spend) AS avg_daily_spend,
STDDEV_SAMP(daily_spend) AS std_daily_spend
FROM daily
GROUP BY account_id;
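The weekend-vs-weekday ratio from the list above, sketched in Python under the assumption that daily spend totals have already been computed (debits filtered, signs flipped positive):

```python
from datetime import date

def weekend_weekday_ratio(daily_spend: dict):
    """Average weekend-day spend divided by average weekday spend.
    Returns None when one side of the ratio is empty or zero."""
    weekend = [v for d, v in daily_spend.items() if d.weekday() >= 5]
    weekday = [v for d, v in daily_spend.items() if d.weekday() < 5]
    if not weekend or not weekday or sum(weekday) == 0:
        return None
    return (sum(weekend) / len(weekend)) / (sum(weekday) / len(weekday))
```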
Debt pressure features
If you can identify recurring payments (rent, utilities, loans), you can estimate obligations.
Count of recurring bills
Total recurring amount as a share of deposits
Presence of debt collections indicators (if categories exist)
Late fee patterns (fees often correlate with stress)
This can be done with merchant matching plus cadence checks, but even a light approach can help.
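One light version of that cadence check: group debits by merchant and flag merchants whose payment gaps cluster near a monthly interval. A sketch, with the cadence and tolerance as illustrative defaults:

```python
from datetime import date
from statistics import median

def looks_recurring(payment_dates: list, expected_gap: int = 30, tolerance: int = 5) -> bool:
    """Flag a merchant as a recurring bill when gaps between payments
    cluster near the expected cadence (monthly by default)."""
    days = sorted(set(payment_dates))
    if len(days) < 3:
        return False  # need at least two gaps to see a cadence
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return abs(median(gaps) - expected_gap) <= tolerance
```

Run this per (account, merchant) pair and you get the recurring-bill count and total almost for free.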
The biggest mistake: leaking future information
Here’s what “leakage” looks like in the wild:
Using transactions that happened after the as_of_date
Using “30-day balance change” when the change includes future days
Joining to a table that was updated later and not point-in-time safe
Using “days since last payment” calculated from a future payment record
The fix is to build features with the as_of_date baked into every query and to ensure any joins only pull records available at that date.
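One cheap safeguard is to make the cutoff impossible to forget: put a guard in front of every feature builder that fails loudly if future rows slip through. A sketch:

```python
from datetime import date

def guard_point_in_time(transactions: list, as_of_date: date) -> list:
    """Refuse, rather than silently drop, any transaction past the cutoff."""
    future = [t for t in transactions if t["posted_at"] > as_of_date]
    if future:
        raise ValueError(f"{len(future)} transaction(s) posted after {as_of_date}: leakage risk")
    return transactions
```

Failing loudly is the point: a silent filter hides the bug; an exception tells you an upstream query forgot its date predicate.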
Make it usable: one feature table per as_of_date
Instead of generating features once, I like to generate a table like:
account_id
as_of_date
feature_1
feature_2
…
feature_n
This makes training and scoring consistent. It also forces you to treat the as_of_date seriously.
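A sketch of that shape: one row per (account_id, as_of_date), with every feature computed by a function that receives the cutoff explicitly. The feature names here are illustrative:

```python
from datetime import date, timedelta

def build_feature_rows(transactions_by_account: dict, as_of_dates: list, lookback_days: int = 90) -> list:
    """One row per (account_id, as_of_date); features see only data
    on or before the cutoff."""
    rows = []
    for account_id, txns in transactions_by_account.items():
        for as_of in as_of_dates:
            start = as_of - timedelta(days=lookback_days)
            window = [t for t in txns if start <= t["posted_at"] <= as_of]
            inflow = sum(t["amount"] for t in window if t["amount"] > 0)
            outflow = sum(-t["amount"] for t in window if t["amount"] < 0)
            rows.append({
                "account_id": account_id,
                "as_of_date": as_of,
                "inflow_90d": inflow,
                "outflow_90d": outflow,
            })
    return rows
```

Training uses many historical as_of_dates; scoring uses today's. Same function, same table shape, so the two can't drift apart.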
Quality checks before you trust the features
Before modeling, I run checks like:
Null rate by feature
Outliers (impossible values, negative ratios, division issues)
Stability over time (features drifting unexpectedly)
Correlation sanity (features that are suspiciously predictive may be leaking)
The model can be impressive later; the data has to be believable first.
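The first two checks take only a few lines each. A sketch of a null-rate and impossible-value screen over the feature rows (the feature name is illustrative):

```python
def null_rate(rows: list, feature: str) -> float:
    """Share of rows where the feature is missing."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(feature) is None) / len(rows)

def impossible_values(rows: list, feature: str) -> int:
    """Count values that should never occur, e.g. negative ratios."""
    return sum(1 for r in rows if r.get(feature) is not None and r[feature] < 0)
```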
Closing thoughts
The coolest part of credit risk isn’t the model. It’s turning messy transaction streams into clean signals that stay honest over time.
When you build features that are explainable, point-in-time correct, and easy to reproduce, you end up with something employers love: a real data product that could actually be used in production, not just a one-off notebook that looked good once.


