Case Study 06

Real-Estate Prospecting Agent

Industry: Real Estate (Agencies & Brokerages)
Platform: Google Cloud Run (Python)
Build time: ~1 week
Service: AI Agent Implementation

Portfolio build — a complete, working system built by Altus Initiatives to demonstrate this capability, and the same agent Altus runs on its own pipeline. Capability figures below are measured from the build; agency-impact figures are projected for a representative brokerage.

The problem

Before you can win a listing, you have to find the right broker to talk to — and that sourcing work is slow, manual, and easy to do inconsistently. Building a usable prospect list means searching multiple places (LinkedIn, Google Maps, Realtor.com, Zillow), opening each brokerage's site, cross-checking volume and reachability, judging whether the business is actually a fit, and confirming the person isn't already in the pipeline. Done by hand, every prospect is several windows and several judgment calls.

The deeper problem is consistency. Qualification criteria that live in someone's head get applied differently on a busy day than a quiet one. A genuinely good prospect gets skipped; a poor-fit one gets added. And the time spent sourcing is time not spent on the revenue activity — outreach and conversations.

The solution

A single agent now runs the entire sourcing-to-CRM pipeline on demand. Given a set of cities and a cap, it sweeps four sources in order, qualifies every candidate against a fixed ruleset, removes duplicates, and writes only qualified, new prospects into the CRM — with a full reasoning log for every decision.

For each candidate it runs a cost-ordered pipeline, so a duplicate is dropped at the cheapest step and a reject never reaches the paid enrichment or note-writing steps:

  1. De-duplicate against the CRM first (the cheapest possible skip) — matched on email, LinkedIn URL, or name + company.
  2. Read the page — the brokerage site, or the directory profile when there's no site, for full context.
  3. Extract & assess — an AI step pulls the facts: volume signals, qualifying and disqualifying signals, and whether a named decision-maker is reachable.
  4. Qualify deterministically — code (not the AI) applies every threshold: disqualifiers → volume floor → green-flag count → decision-maker confirmed. Uncertainty on a required gate is resolved as a conservative skip, never a guess.
  5. Enrich & note — for qualified prospects only, a direct email is found if missing and one specific, grounded outreach note is written.
  6. Write to the CRM — append-only, with a write-time duplicate safety net, plus a follow-up date.
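As an illustration of step 1, here is a minimal sketch of matching a candidate against the CRM on email, LinkedIn URL, or name + company. The `Prospect` type, the helper names, and the normalization rules are assumptions made for this example, not the production code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prospect:
    # Hypothetical record shape; the real CRM columns are not documented here.
    name: str
    company: str
    email: Optional[str] = None
    linkedin_url: Optional[str] = None


def _norm(text: Optional[str]) -> str:
    """Lowercase and collapse whitespace so cosmetic differences don't hide a match."""
    return " ".join((text or "").lower().split())


def _norm_linkedin(url: Optional[str]) -> str:
    """Drop the query string, scheme, 'www.' and trailing slash from a profile URL."""
    u = _norm(url).split("?", 1)[0]
    for prefix in ("https://", "http://", "www."):
        if u.startswith(prefix):
            u = u[len(prefix):]
    return u.rstrip("/")


def dedup_keys(p: Prospect) -> set[tuple[str, str]]:
    """Return every key this prospect can be matched on; empty fields yield no key."""
    keys: set[tuple[str, str]] = set()
    if _norm(p.email):
        keys.add(("email", _norm(p.email)))
    if _norm_linkedin(p.linkedin_url):
        keys.add(("linkedin", _norm_linkedin(p.linkedin_url)))
    if _norm(p.name) and _norm(p.company):
        keys.add(("name+company", f"{_norm(p.name)}|{_norm(p.company)}"))
    return keys


def is_duplicate(candidate: Prospect, crm_rows: list[Prospect]) -> bool:
    """A candidate is a duplicate if it shares any one key with any CRM row."""
    existing: set[tuple[str, str]] = set()
    for row in crm_rows:
        existing |= dedup_keys(row)
    return not dedup_keys(candidate).isdisjoint(existing)
```

Because the check is a set lookup on data already in hand, it can run before any page read or AI call, which is what makes it the cheapest skip.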

Every run produces a report that puts skipped and uncertain leads first, each with its reasoning and a look-up link — so a human can spot a good lead the agent was too cautious about and re-add it with a single command.

Results

55 candidates investigated for about $0.67

The agent is deployed and running in production as a batch job that can be triggered on demand or on a schedule, writing real prospects to the live CRM.

  • End-to-end qualification at trivial cost. A representative production sweep investigated 55 candidates and added 3 qualified prospects for about $0.67 in AI cost.
  • Built to production standard, not demo standard. 193 automated tests cover the deterministic core, both AI steps, every integration, and the deliberate-failure matrix (timeouts, bad data, duplicates, provider outages) — the system degrades gracefully and never crashes a run.
  • Consistent judgment, every time. The qualification rules are applied identically to every candidate by code, not by whoever is doing the sourcing that day. The AI is limited to extracting facts and writing the outreach note, the one genuinely subjective step, and that note is grounded strictly in extracted facts.
  • Time returned to revenue work. By removing manual multi-source sourcing and vetting, the owner's time shifts from research to outreach — the activity that actually closes business. (Projected.)
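To put the sweep figures above in per-unit terms, the arithmetic can be worked through as follows. The three inputs are the reported numbers; the derived values cover AI cost only and are approximate, since the $0.67 is itself a rounded figure.

```python
ai_cost_usd = 0.67            # reported AI cost of the representative sweep
candidates_investigated = 55  # reported
prospects_added = 3           # reported

cost_per_candidate = ai_cost_usd / candidates_investigated
cost_per_prospect = ai_cost_usd / prospects_added
qualification_rate = prospects_added / candidates_investigated

print(f"AI cost per candidate investigated: ${cost_per_candidate:.4f}")  # $0.0122
print(f"AI cost per qualified prospect:     ${cost_per_prospect:.2f}")    # $0.22
print(f"Share of candidates that qualified: {qualification_rate:.1%}")    # 5.5%
```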

Architecture

A deterministic Python controller drives the whole sweep; the AI is invoked only as two narrow, structured-output steps. Every threshold, count, date, and the cap are plain code.

Search request (cities · sources · max leads)
        │
        ▼
  Sweep controller ──────────────► for each city × source (fixed order)
        │                                   │
        │                                   ▼
        │                          Discover candidates
        │                                   │
        ▼                                   ▼
  Per-candidate pipeline (cost-ordered, cheapest skip first):
        De-dup ─► Read page ─► [AI] Extract & assess ─► Qualifier gates
                                                              │
                              ┌──── not qualified ──► skip + log reasoning
                              │
                              └──── qualified ──► Enrich email ─► [AI] Write note
                                                       │
                                                       ▼
                                            Append to CRM (append-only)
        │
        ▼
  Run report (skips surfaced first) ──► durable storage

Full architecture documentation available upon engagement.

Tech stack

Component                                     Tool
Agent runtime                                 Python on Google Cloud Run (batch Job)
AI extraction, assessment, and note-writing   Anthropic Claude Haiku
Source discovery                              LinkedIn, Google Maps, Realtor.com, Zillow (via licensed data providers)
Website / profile extraction                  Firecrawl
Email enrichment                              Apollo + Hunter
CRM + reporting                               Google Sheets + Google Drive
Secrets & config                              Secret Manager (runtime injection)

Key design decisions

The AI extracts; the code decides. Thresholds, counts, dedup, dates, geography, and the spend cap are all deterministic Python. The model's only jobs are fuzzy extraction (is this the owner? is this signal present?) and writing one grounded note. This is what makes the agent testable, cheap, and predictable — and it is why qualification is consistent rather than vibe-based.
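A minimal sketch of what "the code decides" could look like: the AI step fills in a structured record, and a plain function applies the gates in the documented order. The field names and both threshold values (`VOLUME_FLOOR`, `MIN_GREEN_FLAGS`) are hypothetical placeholders, since the actual ruleset is not published here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical thresholds, for illustration only.
VOLUME_FLOOR = 20      # assumed minimum annual transactions
MIN_GREEN_FLAGS = 2    # assumed minimum number of qualifying signals


@dataclass
class Extraction:
    """Facts pulled out by the AI step. None means 'could not be assessed'."""
    disqualifiers: list[str] = field(default_factory=list)
    annual_volume: Optional[int] = None
    green_flags: list[str] = field(default_factory=list)
    decision_maker_confirmed: Optional[bool] = None


def qualify(x: Extraction) -> tuple[bool, str]:
    """Apply the gates in fixed order and return (qualified, reasoning)."""
    if x.disqualifiers:
        return False, f"disqualified: {', '.join(x.disqualifiers)}"
    if x.annual_volume is None:
        # Uncertainty on a required gate is a conservative skip, never a guess.
        return False, "skip: volume unknown"
    if x.annual_volume < VOLUME_FLOOR:
        return False, f"volume {x.annual_volume} below floor {VOLUME_FLOOR}"
    if len(x.green_flags) < MIN_GREEN_FLAGS:
        return False, f"{len(x.green_flags)} green flag(s), need {MIN_GREEN_FLAGS}"
    if x.decision_maker_confirmed is not True:
        return False, "skip: decision-maker not confirmed"
    return True, "passed all gates"
```

Since `qualify` is a pure function of its input, each gate can be unit-tested without calling a model, and the returned reasoning string can go straight into the run log.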

Cheapest skip first. The pipeline is ordered so a duplicate or an obvious reject is dropped before any paid website read, email lookup, or note generation. Spend is bounded by a hard cap on prospects added and a per-source discovery limit, regardless of how large a city is.
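The bounding described above might be sketched like this. The `sweep` function, its callback parameters, the source identifiers, and the order of `SOURCES` are all assumptions for illustration; the source text says only that the order is fixed, not what it is.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Assumed fixed order and identifiers.
SOURCES = ("linkedin", "google_maps", "realtor", "zillow")


def sweep(
    cities: Iterable[str],
    max_leads: int,
    per_source_limit: int,
    discover: Callable[[str, str], list[str]],
    is_duplicate: Callable[[str], bool],
    investigate: Callable[[str], bool],
) -> dict:
    """Walk city x source in fixed order and stop once max_leads are added."""
    report = {"added": [], "duplicates_skipped": 0, "paid_investigations": 0}
    for city in cities:
        for source in SOURCES:
            # The per-source limit bounds discovery regardless of city size.
            for candidate in discover(city, source)[:per_source_limit]:
                if len(report["added"]) >= max_leads:
                    return report  # hard cap reached: no further spend
                if is_duplicate(candidate):
                    report["duplicates_skipped"] += 1  # free check runs first
                    continue
                report["paid_investigations"] += 1  # page read + AI extraction
                if investigate(candidate):
                    report["added"].append(candidate)
    return report
```

In this sketch the worst-case number of paid investigations is the number of cities times the number of sources times `per_source_limit`, so the spend ceiling is known before the run starts.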

Conservative by default, with a human re-add path. When a required signal can't be confidently assessed, the agent skips rather than adds a wrong prospect — and every skip is logged with its reasoning. The run report surfaces those skips first, and a one-command re-add lets a human override a too-cautious call. The safety mechanism is the ruleset plus the audit trail, not a person watching every decision.
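One way to surface skips first is a stable sort on the decision outcome before rendering. The `Decision` type, the outcome labels, their relative order, and the line format below are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed outcome labels; a lower rank is printed earlier in the report.
REPORT_ORDER = {"uncertain": 0, "skipped": 1, "added": 2}


@dataclass
class Decision:
    name: str
    outcome: str      # "uncertain", "skipped", or "added"
    reasoning: str
    lookup_url: str   # link a reviewer can open to check the call


def render_report(decisions: list[Decision]) -> str:
    """List uncertain and skipped leads before added ones, each with its reasoning."""
    # sorted() is stable, so leads keep their sweep order within each outcome.
    ordered = sorted(decisions, key=lambda d: REPORT_ORDER[d.outcome])
    return "\n".join(
        f"[{d.outcome.upper()}] {d.name}: {d.reasoning} ({d.lookup_url})"
        for d in ordered
    )


if __name__ == "__main__":
    print(render_report([
        Decision("Broker A", "added", "passed all gates", "https://example.com/a"),
        Decision("Broker B", "skipped", "volume below floor", "https://example.com/b"),
        Decision("Broker C", "uncertain", "volume unknown", "https://example.com/c"),
    ]))
```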

Right-sized AI. Structured extraction and short note-writing don't need a frontier model. A fast, lean model handles the whole pipeline, which is why a full sweep costs cents — efficiency designed in, not bolted on.

Want a system like this in your agency?

This is the same architecture we build for clients. The first step is a 30-minute discovery call — no pitch, no commitment.
