The Backend — Made × BioCreative

1. The Pipeline — End to End

Everything flows through a single pipeline:

Data Collection → Classification → Enrichment → Scoring → Persona Assignment → Message Generation → Channel Delivery → Reply Ingestion → CRM Sync

Each step is a separate system with its own code, its own database tables, and its own API connections. The steps are connected by cron jobs and Edge Functions rather than by monolithic application code, so when something breaks, it breaks in isolation.
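
A minimal sketch of the isolation idea, assuming a single Python runner purely for illustration (in the real system each step is scheduled separately by cron or an Edge Function). The job names and the `run_isolated` helper are hypothetical:

```python
from typing import Callable, Dict


def run_isolated(jobs: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run each job on its own; a failure is recorded, never propagated."""
    results: Dict[str, str] = {}
    for name, job in jobs.items():
        try:
            job()
            results[name] = "ok"
        except Exception as exc:  # one broken step must not stop the others
            results[name] = f"failed: {exc}"
    return results


def enrichment_job() -> None:
    # Simulated failure in one step (hypothetical).
    raise RuntimeError("upstream API timeout")


status = run_isolated({
    "classification": lambda: None,
    "enrichment": enrichment_job,
    "scoring": lambda: None,
})
print(status)
```

The enrichment failure shows up in `status`, while classification and scoring still complete.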

Layer	What It Does	Skills Involved
Research Intelligence	Collect and enrich data from 8+ public APIs	News, Social, Clinical Trials, NIH, Patents, SEC, PubMed, Academic
Account Intelligence	Score, tier, and segment the target market	AI Classification, ICP Matching, List Builder, Event Intelligence
Activation	Execute and track multi-channel outreach	HeyReach (LinkedIn), EmailBison (Email), Campaign Tracker, Content Studio

2. Data Ingestion Layer

This layer draws on 8+ public and commercial APIs. Each one is a separate Python script or Colab notebook with its own error handling, pagination, rate limiting, and deduplication.
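
The four concerns named above can be sketched in one generic collector. This is an illustration under stated assumptions, not the actual ingestion code: `collect`, the token-based paging contract, and the fake source with placeholder IDs are all hypothetical, and no real API is called.

```python
import time
from typing import Callable, List, Optional, Tuple

# A page is (records, next_page_token); a token of None means "last page".
Page = Tuple[List[dict], Optional[str]]


def collect(fetch_page: Callable[[Optional[str]], Page],
            key: str = "id",
            min_interval: float = 0.0,
            max_retries: int = 3) -> List[dict]:
    """Walk all pages, retry transient errors, pace requests, drop duplicates."""
    seen = set()
    out: List[dict] = []
    token: Optional[str] = None
    while True:
        for attempt in range(max_retries):
            try:
                records, next_token = fetch_page(token)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # error handling: give up after max_retries
                time.sleep(min_interval * (2 ** attempt))  # simple backoff
        for rec in records:
            if rec[key] not in seen:  # deduplication on a stable key
                seen.add(rec[key])
                out.append(rec)
        if next_token is None:
            return out
        token = next_token
        time.sleep(min_interval)  # rate limiting between pages


# Fake two-page source with one duplicate and one transient failure.
PAGES = {
    None: ([{"id": "A1"}, {"id": "A2"}], "p2"),
    "p2": ([{"id": "A2"}, {"id": "A3"}], None),
}
calls = {"n": 0}


def fake_fetch(token: Optional[str]) -> Page:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient")
    return PAGES[token]


records = collect(fake_fetch)
print([r["id"] for r in records])
```

Each real source would supply its own `fetch_page` and choose a `min_interval` that respects that API's published limits.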

Public APIs (Free)

Source	What We Collect	Scale
ClinicalTrials.gov	Cell and gene therapy trials by modality, geography, phase	7,840 trials
NIH RePORTER	CGT-related grants across 187 R1 universities, 6 geo waves	2,946 grants ($757M+)
PatentsView	Patents by assignee (45 orgs), CPC codes (5 CGT classifications)	2,919 patents
PubMed / Entrez	Publications — 18 search queries (10 topic + 8 institution)	Authors, journals, DOIs
OpenAlex	Additional publication coverage with citation networks	Supplements PubMed
SEC EDGAR	Filings for 21 public CGT companies — 10-K, 10-Q, S-1, 8-K	Financial signals
RSS + NewsAPI	18 RSS feeds + 5 signal queries, Claude AI relevance scoring	5,300+ articles, 2× daily

Commercial / Enrichment APIs

Source	What We Collect	Usage
Clay	Person enrichment — headline, summary, experience	6,546 contacts enriched
FullEnrich	Email discovery and validation	60%+ coverage on conference contacts
Trigify	LinkedIn engagement webhooks	Social listening signals
Brave Search	Market context enrichment + PI lab discovery	~$0.03/PI
GlobalData	Drug candidate types matched to companies	CGT filtering (94 accounts)

3. Classification Layer

Raw data is useless without classification. Here's how we organize 17,300+ companies and 265,000+ contacts.

Company Classification

Dimension	Levels
ICP Tier	5 tiers (Tier 1 highest → Tier 5 lowest) via T-M-C-R model
Development Stage	Discovery → Preclinical → IND Filed → Phase I → Phase II → Phase III → Approved
Modality	Cell therapy, gene therapy, gene editing, CAR-T, CAR-NK, allogeneic, autologous…
Platform Type	Autologous, allogeneic, viral vector, non-viral, mRNA…
CGT Gate	Binary: is this company in the cell & gene therapy universe?

ICP Scoring Model (T-M-C-R) — 0 to 40 Scale

Weight	Dimension	Inputs
30%	Technical	Therapeutic + modality + technology
25%	Market	Funding, stage, size
25%	Competitive	Subcategory fit, confidence
20%	Right-to-Win	Geography, relationships

Each client gets a `client_profiles` entry in the database. Change the profile, change every downstream score.
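
One way to read the T-M-C-R weights against the 0 to 40 scale is as a weighted sum. The sketch below assumes each dimension is first rated from 0.0 to 1.0 and that the weights live in the client profile; both are assumptions, and `icp_score` is a hypothetical name rather than the production function.

```python
from typing import Dict

MAX_SCORE = 40

# Weights from the T-M-C-R model; storing them per client is an assumption.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "technical": 0.30,
    "market": 0.25,
    "competitive": 0.25,
    "right_to_win": 0.20,
}


def icp_score(ratings: Dict[str, float],
              weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of 0.0-1.0 dimension ratings, scaled to 0-40."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for dim in weights:
        if not 0.0 <= ratings[dim] <= 1.0:
            raise ValueError(f"rating for {dim} must be between 0 and 1")
    return round(MAX_SCORE * sum(weights[d] * ratings[d] for d in weights), 2)


# Maximum points each dimension can contribute: 12, 10, 10, 8.
max_points = {d: round(MAX_SCORE * w, 2) for d, w in DEFAULT_WEIGHTS.items()}

example = icp_score({
    "technical": 1.0,
    "market": 0.5,
    "competitive": 0.5,
    "right_to_win": 0.0,
})
print(max_points, example)
```

Under these assumptions, a company with a perfect technical fit, middling market and competitive ratings, and no right-to-win advantage scores 22 out of 40. Passing a different `weights` dict mirrors the effect of changing the client profile.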

4. The Messaging Engine

Where intelligence becomes action. The messaging engine turns classified accounts and enriched contacts into personalized outreach at scale.

Components

Component	What It Does
SSO Matrix	38 rows — persona × ICP tier × context. Defines angle, tone, pain point, value prop, CTA.
Positioning JSON	~8KB per company — trials, grants, patents, publications, news, social activity. All enrichment data.
`generate-outreach-message` v93	Edge Function: contact profile + SSO row + positioning JSON + sender voice → personalized message
Clay Passthrough	`clayHeadline`, `claySummary`, `clayExperienceSummary` — the AI references the contact's actual professional background
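
To illustrate how these components might combine, the sketch below looks up an SSO row by persona, ICP tier, and context, then assembles the four inputs into one request. The request shape, the `build_request` helper, the persona key, and every placeholder value are hypothetical; only the `clay*` field names and the five SSO fields come from the table above.

```python
import json
from typing import Dict, Tuple

# Key: (persona, ICP tier, context). Placeholder row, not taken from the
# real 38-row matrix.
SsoKey = Tuple[str, int, str]
SSO_MATRIX: Dict[SsoKey, Dict[str, str]] = {
    ("example_persona", 1, "conference"): {
        "angle": "placeholder angle",
        "tone": "placeholder tone",
        "pain_point": "placeholder pain point",
        "value_prop": "placeholder value prop",
        "cta": "placeholder CTA",
    },
}


def build_request(contact: dict, context: str,
                  positioning: dict, sender_voice: str) -> dict:
    """Combine contact profile, SSO row, positioning JSON, and sender voice."""
    key = (contact["persona"], contact["icp_tier"], context)
    if key not in SSO_MATRIX:
        raise KeyError(f"no SSO row for {key}")
    return {
        "contact": {
            "name": contact["name"],
            # Clay passthrough fields, forwarded unchanged.
            "clayHeadline": contact.get("clayHeadline"),
            "claySummary": contact.get("claySummary"),
            "clayExperienceSummary": contact.get("clayExperienceSummary"),
        },
        "sso_row": SSO_MATRIX[key],
        "positioning": positioning,
        "sender_voice": sender_voice,
    }


request = build_request(
    contact={
        "name": "Example Contact",
        "persona": "example_persona",
        "icp_tier": 1,
        "clayHeadline": "placeholder headline",
    },
    context="conference",
    positioning={"trials": [], "grants": [], "patents": []},
    sender_voice="placeholder voice",
)
print(json.dumps(request, indent=2))
```

Raising on a missing key makes untested matrix intersections visible instead of silently falling back to a generic message.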

Honest Assessment

Good at: First-touch personalization, company-context references, modality-specific language
Less good at: Follow-up sequences that feel natural over time (tends toward sameness on messages 3+)
Known issue: Not all 38 SSO matrix intersections have been tested in live campaigns

5. Channel Infrastructure

HeyReach (LinkedIn)

Dimension	Current State
Sender accounts	5 active (Adam, Chathuranga, Joe, Karen, Simona)
Campaigns	28 total
Pacing	9 connection requests/sender/day
Sync	Full sync every 6h; accepted connections every 15 min
Webhook	Schema bug — fix queued

EmailBison (Email)

Dimension	Current State
Warm domains	3
Sender accounts	6
Pacing	10 emails/day per sender (warming phase)
Sync	Sync every 6h; push every 2h
Webhook	Partial promote bug — fix queued
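
The pacing figures in the two tables above imply a daily ceiling per channel. A short worked calculation, assuming every sender account is active and sends at its full daily limit (`daily_capacity` is a hypothetical helper):

```python
def daily_capacity(senders: int, per_sender_per_day: int) -> int:
    """Upper bound on sends per day if every sender uses its full limit."""
    return senders * per_sender_per_day


linkedin_requests = daily_capacity(senders=5, per_sender_per_day=9)
warming_emails = daily_capacity(senders=6, per_sender_per_day=10)
print(linkedin_requests, warming_emails)
```

That gives at most 45 LinkedIn connection requests and 60 emails per day at current pacing.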

Domain Warming Pipeline

Purchase via ZapMail → Transfer to EmailBison → Warmup (2–4 weeks, 2/day → 10+/day) → Isolated-reputation sending → Master inbox routing. Because outreach goes out from these separate domains, Made's primary domain reputation stays clean.
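
The warmup ramp can be pictured as a per-day send limit. The sketch below assumes a linear ramp over 21 days, which falls inside the stated 2 to 4 week window; the actual ramp shape is not specified in this document, and `warmup_limit` is a hypothetical helper.

```python
def warmup_limit(day: int, total_days: int = 21,
                 start: int = 2, target: int = 10) -> int:
    """Daily send limit on a given warmup day (day 1 is the first day)."""
    if day < 1:
        raise ValueError("day must be >= 1")
    if day >= total_days:
        return target  # warmup finished: hold at the target rate
    fraction = (day - 1) / (total_days - 1)
    return start + int((target - start) * fraction)


schedule = {day: warmup_limit(day) for day in (1, 11, 21, 30)}
print(schedule)
```

Under these assumptions a domain sends 2 emails on day 1, 6 on day 11, and holds at 10 from day 21 onward.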

6. Cron & Edge Function Inventory

Everything that runs automatically. Frozen snapshot as of May 12, 2026.

Job	Schedule	Status
`sync-heyreach-every-6h`	Every 6h	Live
`sync-heyreach-accepted`	Every 15 min	Live
`sync-emailbison`	Every 6h	Live
`push-to-emailbison`	Every 2h	Live
`send-linkedin-messages`	Every 2h	Live
`sync_all_from_heyreach()`	Every 6h	Live
News Pipeline (made_sci scope)	2× daily	Live
`generate-outreach-message` v93	On-demand + batch	Live
`heyreach-webhook`	Live events	Schema bug
`emailbison-webhook`	Live events	Promote bug
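
For reference, the schedules above map onto standard five-field cron expressions. The expressions below are assumptions (the inventory gives only intervals, so minute offsets and the two daily hours are illustrative), and `runs_per_day` is a small hypothetical helper that handles only `*`, `*/n`, and comma lists in the minute and hour fields.

```python
from typing import Dict, Set

# Assumed cron expressions; only the intervals come from the inventory.
SCHEDULES: Dict[str, str] = {
    "sync-heyreach-every-6h": "0 */6 * * *",
    "sync-heyreach-accepted": "*/15 * * * *",
    "sync-emailbison": "0 */6 * * *",
    "push-to-emailbison": "0 */2 * * *",
    "send-linkedin-messages": "0 */2 * * *",
    "news-pipeline": "0 6,18 * * *",  # "2x daily"; the hours are illustrative
}


def _expand(field: str, lo: int, hi: int) -> Set[int]:
    """Expand one cron field into the set of matching values."""
    values: Set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif part.startswith("*/"):
            values.update(range(lo, hi + 1, int(part[2:])))
        else:
            values.add(int(part))
    return values


def runs_per_day(expr: str) -> int:
    """Count daily runs, assuming the day, month, and weekday fields are '*'."""
    minute, hour, *_ = expr.split()
    return len(_expand(minute, 0, 59)) * len(_expand(hour, 0, 23))


daily_runs = {job: runs_per_day(expr) for job, expr in SCHEDULES.items()}
print(daily_runs)
```

Counting runs this way gives a quick sanity check on load: the 15-minute accept sync runs 96 times a day, against 4 for each 6-hour sync.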

7. Architecture Summary

Technology Stack

Layer	Technology
Database	Supabase (PostgreSQL) — Transfer DB + Hub DB + RAG Content DB
Backend logic	Python scripts + Supabase Edge Functions (Deno/TypeScript)
AI models	Claude (classification, scoring) + GPT-4 (enrichment) + DALL-E 3 (images) + Gemini (embeddings)
UX	React + Lovable (minimal-science-hub repo)
LinkedIn	HeyReach API (workspace-scoped)
Email	EmailBison + ZapMail domain procurement
Enrichment	Clay + FullEnrich
Social	Trigify + PhantomBuster
Deployment	Supabase cloud + Ops VPS (168.231.69.81)

Hub-Spoke Architecture

Hub (BioCreative Life Science Master DB) — collects and analyzes everything: 17,300 companies, 265K contacts, 7,840 trials, 14K PIs, 29,333 embedded knowledge chunks.

Transfer DB (Made's spoke) — receives only data relevant to Made's ICP, filtered, scored, CGT-gated. Sync functions run on cron.

Joe's RAG Content DB — Made's own knowledge base. Separate system, separate embeddings, separate architecture.
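
The hub-to-spoke filtering described for the Transfer DB can be sketched as a simple selection step. The field names `cgt_gate` and `icp_tier`, the tier cutoff of 3, and the sample rows are hypothetical; the real sync functions and their criteria are not shown in this document.

```python
from typing import Iterable, List


def select_for_spoke(companies: Iterable[dict], max_tier: int = 3) -> List[dict]:
    """Keep only CGT-gated companies at or above the tier cutoff.

    Tier 1 is the highest tier, so "at or above" means icp_tier <= max_tier.
    """
    return [
        c for c in companies
        if c.get("cgt_gate") is True
        and c.get("icp_tier") is not None
        and c["icp_tier"] <= max_tier
    ]


hub_rows = [
    {"name": "Company A", "cgt_gate": True, "icp_tier": 1},
    {"name": "Company B", "cgt_gate": False, "icp_tier": 1},  # fails CGT gate
    {"name": "Company C", "cgt_gate": True, "icp_tier": 5},   # tier too low
    {"name": "Company D", "cgt_gate": True, "icp_tier": None},  # not scored yet
]

spoke_rows = select_for_spoke(hub_rows)
print([c["name"] for c in spoke_rows])
```

Keeping the filter in one function means the hub can keep collecting everything while each spoke applies its own cutoff.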

What Makes This Different

Skills, not features. Each capability is a coded workflow with API connections — not a checkbox in a SaaS tool.
Switchable per client. Every skill has client variables that change the output.
Code you own. Everything lives in GitHub. Version-controlled. No vendor lock-in.
Compounding intelligence. Skills feed each other. More data collected → smarter scoring → better targeting.
