Case Study — B2B SaaS agentic lead-scoring deployment | GTMify Cortex

The situation

A B2B SaaS team had spent eight months debating their lead scoring model: signals, weights, fit vs intent, predictive vs explainable, vendor vs in-house. Pipeline had not lifted. Reps were making manual judgments at lower precision than even a mediocre agent would have delivered. The CRO knew the debate was the wrong fight but couldn't articulate why.

The actual problem wasn't the model. It was the orchestration around the model.

What I built

GTMify Cortex — a production scoring agent, not a scoring model:

Continuous signal ingestion. Re-scores on every event (Clay enrichment, intent signal, CRM activity, LinkedIn change, response received), not on a nightly batch schedule.
ICP tier logic. Three tiers based on fit, with explicit disqualifier reasoning surfaced per lead so reps know why the score is what it is.
Next-best-action layer. For every score band, a default action — enrichment, routing, drafted outbound, AE handoff, suppression — that fires automatically. Reps approve or override; they don't initiate.
Closed feedback loop. Outcomes — replied, booked, no-showed, closed-won, closed-lost — feed back into the agent's eval harness. Confidence calibration improves over time.
Eval harness. Runs weekly against a held-out cohort. Tracks precision, recall, calibration, and time-to-first-action.
Semantic layer. Lead and account data exposed to the agent in something close to natural language, not just feature vectors.
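The shape of the scoring loop described above can be sketched in a few lines. This is a minimal illustration, not Cortex's code: it swaps the agent's judgment for a fixed weight table, and every name, weight, tier threshold, score band, and action label here (`EVENT_WEIGHTS`, `fit_tier`, `next_best_action`, `on_event`) is hypothetical. What it tries to show is the structure: each event triggers an immediate re-score, fit tier and disqualifier reasons travel with the score, and each score band maps to a default action that a rep approves or overrides.

```python
from dataclasses import dataclass, field

# Hypothetical intent weights per event type (a stand-in for the agent's judgment).
EVENT_WEIGHTS = {
    "enrichment": 5,
    "intent_signal": 15,
    "crm_activity": 10,
    "linkedin_change": 5,
    "response_received": 25,
}


@dataclass
class Lead:
    lead_id: str
    employees: int
    region: str
    score: int = 0
    history: list = field(default_factory=list)


def fit_tier(lead: Lead) -> tuple[str, list[str]]:
    """Return a hypothetical ICP tier plus readable disqualifier reasons."""
    reasons = []
    if lead.employees < 20:
        reasons.append("company below 20 employees")
    if lead.region == "unsupported":
        reasons.append("region not served")
    if reasons:
        return "disqualified", reasons
    if lead.employees >= 500:
        return "tier_1", reasons
    if lead.employees >= 100:
        return "tier_2", reasons
    return "tier_3", reasons


def next_best_action(tier: str, score: int) -> str:
    """Map tier and score band to a default action; a rep approves or overrides it."""
    if tier == "disqualified":
        return "suppress"
    if score >= 40:
        return "ae_handoff" if tier == "tier_1" else "draft_outbound"
    if score >= 20:
        return "route"
    return "enrich"


def on_event(lead: Lead, event_type: str) -> dict:
    """Re-score immediately on each event instead of waiting for a nightly batch."""
    lead.score += EVENT_WEIGHTS.get(event_type, 0)
    lead.history.append(event_type)
    tier, reasons = fit_tier(lead)
    return {
        "lead_id": lead.lead_id,
        "score": lead.score,
        "tier": tier,
        "reasons": reasons,
        "proposed_action": next_best_action(tier, lead.score),
    }


lead = Lead("acme-1", employees=800, region="emea")
print(on_event(lead, "intent_signal")["proposed_action"])      # score 15 -> enrich
print(on_event(lead, "response_received")["proposed_action"])  # score 40 -> ae_handoff
```

Returning the reasons alongside the proposed action, rather than a bare number, is what lets a rep see why a lead landed where it did before approving or overriding.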

What changed

Closed-loop scoring replaced the nightly batch. Lead state at 9am Tuesday reflected what was true at 9am Tuesday, not what had been true at the prior nightly refresh.
Pipeline lift was visible at 90 days in the cohort exposed to the agent vs. the control cohort still on batch scoring.
The "which model" debate stopped. The CRO had a system that could detect when the model was wrong, react to it, and improve. The model itself stopped being the bottleneck.
Time-to-first-action on hot leads compressed materially — auto-routing + auto-drafting eliminated the queue gap between scoring and outreach.
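A toy comparison makes the 9am Tuesday point concrete. It assumes, hypothetically, that the batch job refreshes at midnight and that events change the score by fixed additive deltas; neither detail comes from the deployment. A Tuesday 8:30am reply is invisible to the batch view until the next night, while the event-driven view reflects it immediately.

```python
from datetime import datetime

# Hypothetical events for one lead: (timestamp, score delta).
events = [
    (datetime(2024, 1, 8, 15, 0), 10),  # Monday afternoon CRM activity
    (datetime(2024, 1, 9, 8, 30), 25),  # Tuesday 8:30am reply received
]


def batch_score(events, as_of):
    # Nightly batch (assumed midnight refresh): only events before the last refresh count.
    last_refresh = datetime.combine(as_of.date(), datetime.min.time())
    return sum(delta for ts, delta in events if ts < last_refresh)


def live_score(events, as_of):
    # Event-driven: every event up to the query time counts.
    return sum(delta for ts, delta in events if ts <= as_of)


query = datetime(2024, 1, 9, 9, 0)  # 9am Tuesday
print(batch_score(events, query), live_score(events, query))  # 10 35
```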

What was hardest

Convincing the team that "mediocre model in real-time orchestration beats brilliant model in batch" is true, not a slogan. They had to see it in production cohort data before they trusted it: both systems ran in parallel for three months before the batch system was retired.

Stack used

Claude (API + MCP) + Clay (enrichment + research) + Supabase (lead state + eval harness data + semantic layer). A custom MCP server exposes the CRM, intent platform, and outbound execution layer to the scoring agent. The eval harness is a Supabase Edge Function that runs on a schedule and delivers a weekly report to Slack.
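The weekly metrics the eval harness tracks can be sketched as below. This is written in Python for readability and is not the production Edge Function. The `Scored` record, the choice of "booked" and "closed_won" as positive outcomes, the 0.5 decision threshold, the Brier score and binned reliability table as the calibration measures, and the median as the time-to-first-action summary are all assumptions for illustration. Posting to Slack is left out; `format_for_slack` only builds the text.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Scored:
    predicted_prob: float           # agent's confidence that the lead converts
    outcome: str                    # e.g. replied, booked, no_show, closed_won, closed_lost
    minutes_to_first_action: float  # scoring-to-outreach gap


# Hypothetical definition of a positive outcome for this sketch.
POSITIVE = {"booked", "closed_won"}


def weekly_report(cohort: list[Scored], threshold: float = 0.5, bins: int = 5) -> dict:
    """Compute precision, recall, calibration, and time-to-first-action on a held-out cohort."""
    y = [1 if s.outcome in POSITIVE else 0 for s in cohort]
    p = [s.predicted_prob for s in cohort]
    pred = [1 if prob >= threshold else 0 for prob in p]

    tp = sum(1 for yi, pi in zip(y, pred) if yi and pi)
    fp = sum(1 for yi, pi in zip(y, pred) if not yi and pi)
    fn = sum(1 for yi, pi in zip(y, pred) if yi and not pi)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # Brier score: mean squared gap between confidence and outcome (lower is better).
    brier = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(cohort)

    # Reliability table: per confidence bin, (bin lower edge, mean predicted, observed rate).
    reliability = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, prob in enumerate(p)
               if lo <= prob < hi or (b == bins - 1 and prob == 1.0)]
        if idx:
            reliability.append((
                round(lo, 2),
                sum(p[i] for i in idx) / len(idx),
                sum(y[i] for i in idx) / len(idx),
            ))

    return {
        "precision": precision,
        "recall": recall,
        "brier": brier,
        "reliability": reliability,
        "median_minutes_to_first_action": median(s.minutes_to_first_action for s in cohort),
    }


def format_for_slack(report: dict) -> str:
    """Build a plain-text summary; delivery to Slack is not part of this sketch."""
    return (
        f"Precision {report['precision']:.2f} | Recall {report['recall']:.2f} | "
        f"Brier {report['brier']:.3f} | "
        f"Median time-to-first-action {report['median_minutes_to_first_action']:.0f} min"
    )


cohort = [
    Scored(0.9, "closed_won", 4),
    Scored(0.7, "no_show", 6),
    Scored(0.6, "booked", 10),
    Scored(0.2, "booked", 30),
    Scored(0.1, "closed_lost", 45),
]
print(format_for_slack(weekly_report(cohort)))
```

Tracking the reliability table week over week is one way to check the claim that calibration improves as outcomes feed back in: the mean predicted and observed rates in each bin should converge.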

Where this engagement shape fits

This was a transformation sprint: a defined deliverable (production scoring agent + eval harness), 6–12 week scope, outcome-priced. Best fit when a GTM team is deep in a "which model" debate and needs someone to redirect that energy toward orchestration. See engagement shapes →

Methodology

Run using the CODN framework. Engagements delivered through Stravonvale.

Want similar outcomes?

Book a 30-minute discovery call →

