Deep Dive

The Backend: From Scrape to Sent Message

Every layer of the intelligence and outbound system — what connects to what, what runs on what schedule, what broke and how we fixed it.

1. The Pipeline — End to End

Everything flows through a single pipeline:

Data Collection → Classification → Enrichment → Scoring → Persona Assignment → Message Generation → Channel Delivery → Reply Ingestion → CRM Sync

Each step is a separate system with its own code, its own database tables, and its own API connections. The steps are connected by cron jobs and Edge Functions rather than monolithic application code, so when something breaks, it breaks in isolation.

| Layer | What It Does | Skills Involved |
| --- | --- | --- |
| Research Intelligence | Collect and enrich data from 8+ public APIs | News, Social, Clinical Trials, NIH, Patents, SEC, PubMed, Academic |
| Account Intelligence | Score, tier, and segment the target market | AI Classification, ICP Matching, List Builder, Event Intelligence |
| Activation | Execute and track multi-channel outreach | HeyReach (LinkedIn), EmailBison (Email), Campaign Tracker, Content Studio |

2. Data Ingestion Layer

Data comes from 8+ public and commercial APIs. Each source has its own Python script or Colab notebook with its own error handling, pagination, rate limiting, and deduplication.
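
A minimal sketch of the pattern each ingestion script follows (pagination, rate limiting, retry, deduplication). This is an illustration, not the production code: `collect_all`, the `fetch_page` callable, and the token-based paging shape are all assumptions, and each real source has its own paging scheme.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# (records, next_page_token); a token of None means there are no more pages.
Page = Tuple[List[Dict], Optional[str]]


def collect_all(
    fetch_page: Callable[[Optional[str]], Page],  # hypothetical per-source fetcher
    key: str,                   # field that identifies a record, e.g. a trial ID
    min_interval: float = 0.5,  # seconds to wait between requests (rate limit)
    max_retries: int = 3,       # attempts per page before giving up
) -> List[Dict]:
    """Walk every page, retry transient failures, and drop duplicate records."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    seen = set()
    unique: List[Dict] = []
    token: Optional[str] = None
    while True:
        for attempt in range(max_retries):
            try:
                records, next_token = fetch_page(token)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # out of retries: surface the error to the caller
                time.sleep(min_interval * 2 ** attempt)  # exponential backoff
        for record in records:
            if record[key] not in seen:  # deduplicate on the identifying field
                seen.add(record[key])
                unique.append(record)
        if next_token is None:
            return unique
        token = next_token
        time.sleep(min_interval)  # stay under the source's rate limit
```

Passing the fetcher in as a callable keeps the paging loop identical across sources while each script supplies its own request logic.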

Public APIs (Free)

| Source | What We Collect | Scale |
| --- | --- | --- |
| ClinicalTrials.gov | Cell and gene therapy trials by modality, geography, phase | 7,840 trials |
| NIH RePORTER | CGT-related grants across 187 R1 universities, 6 geo waves | 2,946 grants ($757M+) |
| PatentsView | Patents by assignee (45 orgs), CPC codes (5 CGT classifications) | 2,919 patents |
| PubMed / Entrez | Publications from 18 search queries (10 topic + 8 institution) | Authors, journals, DOIs |
| OpenAlex | Additional publication coverage with citation networks | Supplements PubMed |
| SEC EDGAR | Filings for 21 public CGT companies (10-K, 10-Q, S-1, 8-K) | Financial signals |
| RSS + NewsAPI | 18 RSS feeds + 5 signal queries, Claude AI relevance scoring | 5,300+ articles, 2× daily |

Commercial / Enrichment APIs

| Source | What We Collect | Usage |
| --- | --- | --- |
| Clay | Person enrichment: headline, summary, experience | 6,546 contacts enriched |
| FullEnrich | Email discovery and validation | 60%+ coverage on conference contacts |
| Trigify | LinkedIn engagement webhooks | Social listening signals |
| Brave Search | Market context enrichment + PI lab discovery | ~$0.03/PI |
| GlobalData | Drug candidate types matched to companies | CGT filtering (94 accounts) |

3. Classification Layer

Raw data is useless without classification. Here's how we organize 17,300+ companies and 265,000+ contacts.

Company Classification

| Dimension | Levels |
| --- | --- |
| ICP Tier | 5 tiers (Tier 1 highest → Tier 5 lowest) via T-M-C-R model |
| Development Stage | Discovery → Preclinical → IND Filed → Phase I → Phase II → Phase III → Approved |
| Modality | Cell therapy, gene therapy, gene editing, CAR-T, CAR-NK, allogeneic, autologous… |
| Platform Type | Autologous, allogeneic, viral vector, non-viral, mRNA… |
| CGT Gate | Binary: is this company in the cell & gene therapy universe? |

ICP Scoring Model (T-M-C-R) — 0 to 40 Scale

| Weight | Dimension | Inputs |
| --- | --- | --- |
| 30% | Technical | therapeutic + modality + technology |
| 25% | Market | funding, stage, size |
| 25% | Competitive | subcategory fit, confidence |
| 20% | Right-to-Win | geography, relationships |

Each client gets a client_profiles entry in the database. Change the profile, change every downstream score.
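
To make the weighting concrete, here is a minimal sketch of how the four T-M-C-R weights could combine into a 0 to 40 score. It assumes each dimension is first reduced to a sub-score between 0 and 1; that normalization, the function name `tmcr_score`, and the dictionary keys are illustrative assumptions, not the production schema.

```python
from typing import Dict

# Weights from the T-M-C-R model above; the key names are hypothetical.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "technical": 0.30,
    "market": 0.25,
    "competitive": 0.25,
    "right_to_win": 0.20,
}
MAX_SCORE = 40  # top of the 0 to 40 scale


def tmcr_score(
    subscores: Dict[str, float],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
    max_score: float = MAX_SCORE,
) -> float:
    """Weighted sum of per-dimension sub-scores (each assumed to be in [0, 1])."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for dimension, weight in weights.items():
        value = subscores[dimension]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{dimension} sub-score must be between 0 and 1")
        total += weight * value
    return round(total * max_score, 2)
```

Under these assumptions the Technical dimension contributes at most 12 of the 40 points, Market and Competitive 10 each, and Right-to-Win 8. Passing a different `weights` dictionary stands in for editing a client profile: the same sub-scores produce a different score downstream.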

4. The Messaging Engine

Where intelligence becomes action. The messaging engine turns classified accounts and enriched contacts into personalized outreach at scale.

Components

| Component | What It Does |
| --- | --- |
| SSO Matrix | 38 rows: persona × ICP tier × context. Defines angle, tone, pain point, value prop, CTA. |
| Positioning JSON | ~8KB per company: trials, grants, patents, publications, news, social activity. All enrichment data. |
| generate-outreach-message v93 | Edge Function: contact profile + SSO row + positioning JSON + sender voice → personalized message |
| Clay Passthrough | clayHeadline, claySummary, clayExperienceSummary, so the AI references the contact's actual professional background |
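
The data flow into the generator can be sketched as follows. This is a Python illustration of how the four inputs fit together, not the Edge Function itself (which runs on Deno/TypeScript). The function names and the SSO row keys (`persona`, `icp_tier`, `context`, `angle`, `tone`, `pain_point`, `value_prop`, `cta`) are assumptions; only the three Clay field names come from the table above.

```python
from typing import Dict, List

CLAY_FIELDS = ("clayHeadline", "claySummary", "clayExperienceSummary")
STRATEGY_FIELDS = ("angle", "tone", "pain_point", "value_prop", "cta")  # assumed keys


def select_sso_row(matrix: List[Dict], persona: str, icp_tier: int, context: str) -> Dict:
    """Find the SSO matrix row for a persona × ICP tier × context combination."""
    for row in matrix:
        if (row["persona"], row["icp_tier"], row["context"]) == (persona, icp_tier, context):
            return row
    raise LookupError(f"no SSO row for {persona!r}, tier {icp_tier}, {context!r}")


def build_generation_input(
    contact: Dict, sso_row: Dict, positioning: Dict, sender_voice: str
) -> Dict:
    """Assemble the four inputs the message generator combines."""
    return {
        # Clay passthrough: missing fields become empty strings, not errors.
        "contact": {field: contact.get(field, "") for field in CLAY_FIELDS},
        "strategy": {field: sso_row[field] for field in STRATEGY_FIELDS},
        "positioning": positioning,  # the per-company enrichment JSON
        "sender_voice": sender_voice,
    }
```

Raising on a missing matrix row, rather than falling back to a default, keeps a gap in the matrix visible instead of silently sending a generic message.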

Honest Assessment

5. Channel Infrastructure

HeyReach (LinkedIn)

| Dimension | Current State |
| --- | --- |
| Sender accounts | 5 active (Adam, Chathuranga, Joe, Karen, Simona) |
| Campaigns | 28 total |
| Pacing | 9 connection requests/sender/day |
| Sync | Every 6h + 15-min accepts |
| Webhook | Schema bug, fix queued |

EmailBison (Email)

| Dimension | Current State |
| --- | --- |
| Warm domains | 3 |
| Sender accounts | 6 |
| Pacing | 10 emails/day per sender (warming phase) |
| Sync | Every 6h + push every 2h |
| Webhook | Partial promote bug, fix queued |
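
The pacing figures translate into daily ceilings with simple arithmetic. The sketch below assumes every sender hits its cap every day, which is an upper bound rather than observed throughput; the 1,000-contact list is a hypothetical example.

```python
import math


def daily_capacity(senders: int, per_sender_per_day: int) -> int:
    """Upper bound on sends per day if every sender reaches its cap."""
    return senders * per_sender_per_day


def days_to_cover(contacts: int, capacity_per_day: int) -> int:
    """Whole days needed to send one first touch to every contact."""
    return math.ceil(contacts / capacity_per_day)


linkedin_per_day = daily_capacity(senders=5, per_sender_per_day=9)   # 45
email_per_day = daily_capacity(senders=6, per_sender_per_day=10)     # 60

# Hypothetical list size, for illustration only.
linkedin_days = days_to_cover(1000, linkedin_per_day)  # 23
email_days = days_to_cover(1000, email_per_day)        # 17
```

At current pacing the ceilings are 45 connection requests and 60 emails per day across all senders.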

Domain Warming Pipeline

Purchase via ZapMail → Transfer to EmailBison → Warmup (2–4 weeks, 2/day → 10+/day) → Isolated-reputation sending → Master inbox routing. Made's primary domain reputation stays clean.
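
One way to picture the warmup ramp is as a per-day sending cap that climbs from 2/day to 10/day. The sketch below assumes a linear ramp over 21 days, which falls inside the 2 to 4 week window; the actual warmup curve is not specified here, so `warmup_limit` and its defaults are illustrative.

```python
def warmup_limit(day: int, warmup_days: int = 21, start: int = 2, target: int = 10) -> int:
    """Daily send cap for a domain on a given warmup day (day 1 = first day).

    Assumes a linear ramp from `start` to `target`; the cap stays at
    `target` once the warmup period is over.
    """
    if day < 1:
        raise ValueError("day must be 1 or greater")
    if warmup_days < 2:
        raise ValueError("warmup_days must be at least 2")
    if day >= warmup_days:
        return target
    progress = (day - 1) / (warmup_days - 1)  # 0.0 on day 1, 1.0 on the last day
    return start + int(progress * (target - start))
```

With the defaults, day 1 allows 2 sends, day 11 allows 6, and day 21 onward allows 10.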

6. Cron & Edge Function Inventory

Everything that runs automatically. Frozen snapshot as of May 12, 2026.

| Job | Schedule | Status |
| --- | --- | --- |
| sync-heyreach-every-6h | Every 6h | Live |
| sync-heyreach-accepted | Every 15 min | Live |
| sync-emailbison | Every 6h | Live |
| push-to-emailbison | Every 2h | Live |
| send-linkedin-messages | Every 2h | Live |
| sync_all_from_heyreach() | Every 6h | Live |
| News Pipeline (made_sci scope) | 2× daily | Live |
| generate-outreach-message v93 | On-demand + batch | Live |
| heyreach-webhook | Live events | Schema bug |
| emailbison-webhook | Live events | Promote bug |
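
For reference, the fixed-interval schedules above map onto standard five-field cron expressions as sketched below. The job names and intervals come from the table; the exact expressions used in production (for example, the minute offset) are not recorded here, so the output of `cron_expression` is an assumption.

```python
from typing import Dict

# Interval in minutes for each fixed-interval job in the inventory.
INTERVAL_MINUTES: Dict[str, int] = {
    "sync-heyreach-every-6h": 360,
    "sync-heyreach-accepted": 15,
    "sync-emailbison": 360,
    "push-to-emailbison": 120,
    "send-linkedin-messages": 120,
}


def runs_per_day(interval_minutes: int) -> int:
    """Number of runs in 24 hours for a fixed interval."""
    return (24 * 60) // interval_minutes


def cron_expression(interval_minutes: int) -> str:
    """Five-field cron expression for intervals that divide evenly into an hour or a day."""
    if 0 < interval_minutes < 60 and 60 % interval_minutes == 0:
        return f"*/{interval_minutes} * * * *"
    hours, remainder = divmod(interval_minutes, 60)
    if remainder == 0 and 1 <= hours < 24 and 24 % hours == 0:
        return f"0 */{hours} * * *"  # assumes runs start on the hour
    raise ValueError("interval does not map to a simple cron step")
```

Under these assumptions the 15-minute accept sync runs 96 times a day, the 2-hour jobs 12 times, and the 6-hour syncs 4 times.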

7. Architecture Summary

Technology Stack

| Layer | Technology |
| --- | --- |
| Database | Supabase (PostgreSQL): Transfer DB + Hub DB + RAG Content DB |
| Backend logic | Python scripts + Supabase Edge Functions (Deno/TypeScript) |
| AI models | Claude (classification, scoring) + GPT-4 (enrichment) + DALL-E 3 (images) + Gemini (embeddings) |
| UX | React + Lovable (minimal-science-hub repo) |
| LinkedIn | HeyReach API (workspace-scoped) |
| Email | EmailBison + ZapMail domain procurement |
| Enrichment | Clay + FullEnrich |
| Social | Trigify + PhantomBuster |
| Deployment | Supabase cloud + Ops VPS (168.231.69.81) |

Hub-Spoke Architecture

Hub (BioCreative Life Science Master DB) — collects and analyzes everything: 17,300 companies, 265K contacts, 7,840 trials, 14K PIs, 29,333 embedded knowledge chunks.

Transfer DB (Made's spoke) — receives only data relevant to Made's ICP, filtered, scored, CGT-gated. Sync functions run on cron.
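
The hub-to-spoke filter can be sketched as a simple predicate over hub records. The field names (`cgt_gate`, `icp_score`, `icp_tier`) and the `max_tier` cutoff are hypothetical; the real sync functions and their thresholds are not shown here.

```python
from typing import Dict, List


def select_for_spoke(companies: List[Dict], max_tier: int = 3) -> List[Dict]:
    """Keep only hub records that pass the CGT gate, are scored, and fit the ICP.

    `max_tier` is an illustrative cutoff (Tier 1 is the best fit).
    """
    selected = []
    for company in companies:
        if company.get("cgt_gate") is not True:
            continue  # outside the cell & gene therapy universe
        if company.get("icp_score") is None or company.get("icp_tier") is None:
            continue  # not scored yet, so nothing to sync
        if company["icp_tier"] <= max_tier:
            selected.append(company)
    return selected
```

Filtering on the hub side means the spoke never holds records that would need to be excluded later.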

Joe's RAG Content DB — Made's own knowledge base. Separate system, separate embeddings, separate architecture.

What Makes This Different