Sentinel

Real-time account-takeover risk engine

Model card

The selected model is xgboost, which had the highest PR-AUC of the models compared. It was scored on a temporal hold-out: we train on the earliest events and test on the latest, never on a random shuffle that would let the model peek at the future. The test set is 58,031 events at 3.95% ATO. Scores are isotonic-calibrated, and every number below comes from that calibrated test set.
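A minimal sketch of what a temporal hold-out looks like in code, assuming each event is a dict with a timestamp. The function name `temporal_split`, the `ts` key, and the 20% test fraction are illustrative assumptions, not details of Sentinel's pipeline.

```python
from datetime import datetime, timedelta


def temporal_split(events, test_fraction=0.2, time_key="ts"):
    """Return (train, test): the earliest events train, the latest events test.

    Hypothetical helper for illustration; the real split boundary is not
    stated in this model card.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(events, key=lambda e: e[time_key])
    cut = int(len(ordered) * (1.0 - test_fraction))
    return ordered[:cut], ordered[cut:]


# Toy data: ten login events, one per hour, deliberately out of order.
start = datetime(2024, 1, 1)
events = [
    {"id": i, "ts": start + timedelta(hours=i)}
    for i in (3, 0, 9, 5, 1, 7, 2, 8, 4, 6)
]
train, test = temporal_split(events, test_fraction=0.2)

# Every training event precedes every test event, so nothing leaks backwards.
latest_train = max(e["ts"] for e in train)
earliest_test = min(e["ts"] for e in test)
```

A random shuffle would break the `latest_train < earliest_test` property. If several events share the boundary timestamp, a production split would more likely cut on a fixed date than on a row count.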

Metric            Value    Note
PR-AUC            0.979    vs prevalence baseline
ROC-AUC           0.999
Recall @2% FPR    99.8%    alert budget
Precision         78.1%    at operating threshold
Recall            99.8%    at operating threshold
Brier             0.0040   calibration error

Precision–Recall

At 3.95% prevalence this is the curve that matters, because a no-skill classifier's precision equals the prevalence. The curve sits far above that baseline.

Calibration

Isotonic-calibrated scores track how often ATO actually happens, so a score of 0.8 means that about 80% of events with that score are ATO.
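To make "a 0.8 means about 80%" concrete, here is a small sketch of the two usual calibration checks: the Brier score and a binned reliability table. The helper names and the toy data are assumptions for illustration; they are not Sentinel's evaluation code or data.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def reliability_table(probs, labels, n_bins=10):
    """Per non-empty bin: (mean predicted score, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows


# Toy data: five events scored 0.8 (four are ATO) and five scored 0.0 (none are).
probs = [0.8] * 5 + [0.0] * 5
labels = [1, 1, 1, 1, 0] + [0] * 5

score = brier_score(probs, labels)
rows = reliability_table(probs, labels)
```

In a well-calibrated model the first two columns of each row agree, as they do in this toy case where the 0.8 bin has an observed ATO rate of 0.8. A lower Brier score is better; 0 is a perfect forecast.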

Feature importance

Velocity, device and network novelty, and failed-attempt counts do most of the work.

Score separation

Legitimate vs ATO score distributions on a log scale. The region where they overlap is error that no choice of threshold can fix.

Detection by attack type

Campaign              Events  Detected  Recall
credential stuffing     2194      2192   99.9%
impossible travel         44        43   97.7%
new device takeover       53        51   96.2%

Noisy credential stuffing is easy and gets caught almost every time. The hard cases are the quiet session hijacks that log in from the victim's own device, and some of those leave nothing a behavioral model can catch.
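The per-campaign recall figures follow directly from the event and detection counts in the table. The short sketch below recomputes them and checks that the rows sum to the overall test-set totals; the variable names are illustrative.

```python
# (events, detected) per campaign, copied from the table above.
campaigns = {
    "credential stuffing": (2194, 2192),
    "impossible travel": (44, 43),
    "new device takeover": (53, 51),
}


def recall_pct(events, detected):
    """Recall as a percentage, rounded to one decimal place."""
    return round(100.0 * detected / events, 1)


per_campaign = {
    name: recall_pct(events, detected)
    for name, (events, detected) in campaigns.items()
}

total_events = sum(events for events, _ in campaigns.values())
total_detected = sum(detected for _, detected in campaigns.values())
overall = recall_pct(total_events, total_detected)
```

The totals come to 2,291 ATO events and 2,286 detections, which matches the 2,286 true positives and 5 misses reported at the operating threshold. Credential stuffing accounts for most of the volume, so it dominates the overall recall.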

Model comparison & operating point

Model     PR-AUC (test)
logreg    0.9710
xgboost   0.9839
lightgbm  0.9832

The operating threshold is 0.027, the highest-recall point that keeps FPR within 2% (the SOC's alert budget). At that threshold the test set gives 2,286 true positives, 641 false positives, and 5 misses (false negatives).
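The sketch below shows one way to pick a threshold under an FPR budget and then recomputes the headline metrics from the reported counts. It assumes an event is flagged when its score is at or above the threshold, which this card does not state; `confusion_at` and `pick_threshold` are hypothetical helpers, and the toy scores are made up.

```python
def confusion_at(scores, labels, threshold):
    """Return (tp, fp, fn, tn), flagging events with score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(scores, labels, max_fpr=0.02):
    """Lowest observed score usable as a threshold without exceeding max_fpr.

    Lowering the threshold can only raise recall and FPR, so the lowest
    threshold within budget is also the highest-recall one.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        _, fp, _, tn = confusion_at(scores, labels, t)
        if fp / (fp + tn) > max_fpr:
            break
        best = t
    return best


# Toy example with a loose budget so the effect is visible on six events.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_threshold = pick_threshold(toy_scores, toy_labels, max_fpr=0.34)

# Headline metrics recomputed from the counts reported on the test set.
tp, fp, fn, total = 2286, 641, 5, 58031
tn = total - tp - fp - fn
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fpr = fp / (fp + tn)
prevalence = (tp + fn) / total
```

With the reported counts, precision is about 78.1%, recall about 99.8%, and prevalence about 3.95%, in line with the figures above. The realized FPR comes out near 1.15%, inside the 2% budget.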