Sentinel

Real-time account-takeover risk engine

Model card

xgboost won on PR-AUC. It was scored on a temporal hold-out, meaning we train on the earliest events and test on the latest, never a random shuffle that would let the model peek at the future. That test set is 58,031 events at 3.95% ATO. Scores are isotonic-calibrated, and every number below comes from that calibrated test set.
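A minimal sketch of the temporal hold-out described above, assuming each event carries a timestamp. The field name `ts`, the helper `temporal_split`, and the 20% test fraction are illustrative assumptions, not Sentinel's actual pipeline.

```python
import random
from datetime import datetime, timedelta

def temporal_split(events, test_fraction=0.2, key="ts"):
    """Sort by timestamp; the earliest events train, the latest events test."""
    ordered = sorted(events, key=lambda e: e[key])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Synthetic events, one per hour, deliberately shuffled so that the
# split has to restore time order itself.
start = datetime(2024, 1, 1)
events = [{"ts": start + timedelta(hours=i), "label": int(i % 25 == 0)}
          for i in range(100)]
random.Random(0).shuffle(events)

train, test = temporal_split(events)

# Every training event precedes every test event, so nothing from the
# future leaks into training.
latest_train = max(e["ts"] for e in train)
earliest_test = min(e["ts"] for e in test)
```

A random shuffle split would mix late events into training, which is exactly the leak the temporal split avoids.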

Metric	Value	Note
PR-AUC	0.979	vs prevalence baseline
ROC-AUC	0.999	
Recall @2% FPR	99.8%	alert budget
Precision	78.1%	at operating threshold
Recall	99.8%	at operating threshold
Brier	0.0040	calibration error

Precision–Recall

At 3.95% prevalence this is the curve that matters, because ROC-AUC looks flattering when legitimate events dominate. It sits far above the prevalence baseline of about 0.04.
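The baseline for a precision-recall curve is the positive rate: a scorer with no signal has expected precision equal to prevalence at every recall level. A short sketch of that arithmetic using the counts reported in this card; the variable names are illustrative.

```python
# Counts taken from this card's test set.
total_events = 58031
true_positives = 2286
misses = 5

# Every ATO event is either caught or missed at the operating threshold.
ato_events = true_positives + misses

# A no-skill scorer has expected precision equal to this rate,
# so it is the reference level for PR-AUC.
prevalence = ato_events / total_events

# How far the reported PR-AUC tile (0.979) sits above that level.
pr_auc = 0.979
lift_over_baseline = pr_auc / prevalence

print(f"prevalence = {prevalence:.2%}")
print(f"PR-AUC lift over baseline = {lift_over_baseline:.1f}x")
```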

Calibration

Isotonic-calibrated scores track the observed ATO rate, so among events scored 0.8, about 80% are ATO.
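One way to check a claim like this is a reliability table (mean score vs observed positive rate per score bin) together with the Brier score, the mean squared gap between score and outcome. The sketch below runs on synthetic, perfectly calibrated scores rather than Sentinel data; the helper names and the ten-bin layout are assumptions.

```python
import random

def brier_score(scores, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def reliability_table(scores, labels, n_bins=10):
    """Per non-empty bin: (mean score, observed positive rate, count)."""
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, score sum, positives
    for s, y in zip(scores, labels):
        i = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 joins the last bin
        bins[i][0] += 1
        bins[i][1] += s
        bins[i][2] += y
    return [(total / n, hits / n, n) for n, total, hits in bins if n]

# Synthetic data: each label is drawn with probability equal to its score,
# so the scores are calibrated by construction.
rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]
labels = [1 if rng.random() < s else 0 for s in scores]

table = reliability_table(scores, labels)
brier = brier_score(scores, labels)

for mean_score, observed_rate, count in table:
    print(f"{mean_score:.2f}  {observed_rate:.2f}  n={count}")
```

For calibrated scores the first two columns stay close in every bin; a systematic gap in one direction signals over- or under-confidence.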

Feature importance

Velocity, device and network novelty, and failed-attempt counts do most of the work.

Score separation

Legit vs ATO score distributions on a log scale. Where the two distributions overlap, no choice of threshold can tell the classes apart.

Detection by attack type

Campaign	Events	Detected	Recall
credential stuffing	2194	2192	99.9%
impossible travel	44	43	97.7%
new device takeover	53	51	96.2%

Noisy credential stuffing is easy and gets caught almost every time. The hard ones are the quiet session hijacks that log in from the victim's own device, and some of those no behavioral signal can catch.
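The recall column in the table above is simply detected divided by events. A small sketch that recomputes it from the table's counts and checks that the campaign totals add up; the dictionary layout is an illustrative choice.

```python
# (events, detected) per campaign, copied from the table above.
campaigns = {
    "credential stuffing": (2194, 2192),
    "impossible travel": (44, 43),
    "new device takeover": (53, 51),
}

# Recall per campaign, as a percentage rounded to one decimal place.
recall_pct = {
    name: round(100 * detected / events, 1)
    for name, (events, detected) in campaigns.items()
}

total_events = sum(events for events, _ in campaigns.values())
total_detected = sum(detected for _, detected in campaigns.values())

for name, pct in recall_pct.items():
    print(f"{name}: {pct}%")
print(f"overall: {total_detected}/{total_events}")
```

The small campaigns have only a few dozen events each, so one extra miss moves their recall by about two points; the credential-stuffing figure is far more stable.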

Model comparison & operating point

Model	PR-AUC (test)
logreg	0.9710
xgboost ★	0.9839
lightgbm	0.9832

The operating threshold is 0.027, the highest-recall point that keeps FPR within 2% (the SOC's alert budget). At that threshold the test set gives 2286 true positives, 641 false positives, and 5 misses.
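A hedged sketch of how such an operating point could be chosen and checked. `pick_threshold` is a hypothetical helper, not Sentinel's code: it walks candidate thresholds from high to low and keeps the lowest one whose false-positive rate stays within budget, which is also the highest-recall one. It assumes at least one negative label. The second half recomputes precision, recall, and FPR from the counts quoted above.

```python
def pick_threshold(scores, labels, max_fpr=0.02):
    """Lowest threshold (flagging score >= threshold) with FPR <= max_fpr.

    Returns None if even the highest score breaks the budget.
    """
    negatives = sum(1 for y in labels if y == 0)
    pairs = sorted(zip(scores, labels), reverse=True)
    best, fp, i = None, 0, 0
    while i < len(pairs):
        t = pairs[i][0]
        # Events tied at the same score are flagged together.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1] == 0:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break
        best = t
    return best

# Toy example with a loose budget so the effect is visible on six events.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_threshold = pick_threshold(toy_scores, toy_labels, max_fpr=0.34)

# Metrics at the reported operating point, from the counts in this card.
total, tp, fp, fn = 58031, 2286, 641, 5
tn = total - tp - fp - fn
precision = tp / (tp + fp)
recall = tp / (tp + fn)
fpr = fp / (fp + tn)

print(f"precision={precision:.1%} recall={recall:.1%} fpr={fpr:.2%}")
```

On the reported counts the FPR comes out well under the 2% budget, consistent with the threshold being chosen against that limit.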