8 AI Agents · LangGraph · FastAPI · MongoDB

    Not a pipeline. A graph of autonomous agents.

    NEXUS deploys eight specialized agents equipped with real tools — RFM clustering, spam scoring, chi-square tests — that cycle on failure, run in parallel, and accumulate cross-campaign memory.

    8 · Specialized agents
    LangGraph · Orchestration
    MongoDB · State persistence
    FastAPI · Backend

    Built by Abhay Agarwal · MNNIT Allahabad · FrostHack

    campaign-graph · nexus

    $ langgraph run --brief "Winter sale · urban shoppers · ₹1500+ AOV"

    brief-parser-agent    done
      Product: Winter Collection · Goal: Sales Conversion · CTA extracted
    segmentation-agent    done
      RFM scored · k=4 clusters · silhouette=0.61 · 4 segments validated
    strategy-agent        done
      Quality gate passed · Send: Tue 10AM · Budget allocated
    content-gen-agent     running
      Parallel fan-out · 4 segments · spam score: 0.8 · CTR pred: 3.1%
    approval-agent        queued
      interrupt() · awaiting human review
    execution-agent       queued
      Pending approval
    monitoring-agent      queued
      Event-driven · Redis Streams subscriber
    optimization-agent    queued
      Chi-square test · feature delta · memory retrieval

    Conditional Graph Topology

    Every edge has a condition.
    Every failure has a cycle.

    LangGraph routes based on what agents compute — not a fixed sequence. The Quality Gate cycles back to Strategy on failure. Human rejection routes back to Content Gen with feedback injected into state.

    01 Brief Parser
    02 Segmentation
    03 Strategy
    04 Quality Gate · deterministic
    05 Content Gen ×N · Send API ×N
    06 Approval ⏸ · interrupt()
    07 Execution
    08 Monitoring
    09 Perf Gate · deterministic
    10 Optimization
    ↺ Quality Gate → Strategy · retry on failure (max 3)
    ↺ Approval → Content Gen · rejected/edits with feedback injected
    ↺ Optimization → Content Gen · evidence-backed brief from memory
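
    The two human-facing cycles above can be sketched as plain routing functions. This is a minimal illustration, not the NEXUS source: the state keys, the node names, and the "abort" fallback once retries run out are all assumptions. In LangGraph, functions of this shape are typically registered with `StateGraph.add_conditional_edges`.

```python
from typing import TypedDict

MAX_STRATEGY_RETRIES = 3  # "max 3" from the topology above


class CampaignState(TypedDict, total=False):
    # Hypothetical state keys; the real schema is not shown on this page.
    quality_passed: bool
    strategy_retries: int
    approval_status: str      # "approved" or "rejected"
    reviewer_feedback: str


def route_after_quality_gate(state: CampaignState) -> str:
    """Return the name of the node that should run after the Quality Gate."""
    if state.get("quality_passed", False):
        return "content_gen"
    if state.get("strategy_retries", 0) < MAX_STRATEGY_RETRIES:
        return "strategy"
    # Assumption: the page does not say what happens once retries are exhausted.
    return "abort"


def route_after_approval(state: CampaignState) -> str:
    """Approved campaigns execute; anything else loops back to Content Gen."""
    if state.get("approval_status") == "approved":
        return "execution"
    return "content_gen"
```

    Keeping the routing logic in small pure functions means each edge condition can be unit-tested without running an LLM.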

    Agent Roster

    Eight agents. Each with a real tool belt.

    Tools are deterministic Python functions — not LLM calls. The LLM decides which tool to invoke; Python executes it and returns real data.
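
    To make the tool-versus-LLM split concrete, here is a rough sketch of one deterministic tool and a name-based dispatcher. The `rfm_table` function, the `TOOLS` registry, and `run_tool` are hypothetical names chosen for illustration; the page only states that RFM scoring feeds sklearn KMeans, so the clustering step is left out here.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Order = Tuple[str, date, float]  # (customer_id, order_date, amount)


def rfm_table(orders: List[Order], today: date) -> Dict[str, dict]:
    """Compute raw recency (days), frequency (orders), monetary (total spend)."""
    table: Dict[str, dict] = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id, {"recency_days": None, "frequency": 0, "monetary": 0.0}
        )
        days = (today - order_date).days
        # Recency is the gap to the most recent order, so keep the minimum.
        if row["recency_days"] is None or days < row["recency_days"]:
            row["recency_days"] = days
        row["frequency"] += 1
        row["monetary"] += amount
    return table


# Hypothetical registry: the LLM picks a tool name, Python runs the function.
TOOLS: Dict[str, Callable] = {"rfm_table": rfm_table}


def run_tool(name: str, **kwargs):
    """Dispatch a tool call by name; unknown names fail loudly."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

    The output of a call like `run_tool("rfm_table", orders=..., today=...)` is ordinary data computed from the order history, which is the property the roster below relies on.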

    01

    Brief Parser Agent

    Extracts product, audience, goals and CTA from natural language. GPT-4 with schema validation.

    02

    Segmentation Agent

    Real RFM scoring + sklearn KMeans on actual customer data. Validates segment sizes for A/B significance.

    03

    Strategy Agent

    Plans send timing, budget allocation, A/B test design. Replans when the Quality Gate rejects the plan (max 3 retries).

    04

    Quality Gate Agent

    Pure deterministic checks — no LLM. Validates sizes, budget math, spacing. Routes to Strategy on failure.

    05

    Content Gen Agent

    Fan-out via LangGraph Send API. Parallel variants per segment, grounded in spam scores and CTR predictions.

    06

    Approval Agent

    LangGraph interrupt() pauses the graph, serializes state to MongoDB, resumes from checkpoint on approval.

    07

    Execution Agent

    Interfaces with Campaign & Email APIs. Logs recipient counts, timestamps, and delivery status into shared state.

    08

    Monitoring & Optimization Agent

    Event-driven via Redis Streams. Chi-square significance tests before routing underperformers to Content Gen.
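
    The chi-square check mentioned above could look roughly like the following for a two-variant click comparison. This is a sketch under assumptions: the function name and the click/send inputs are illustrative, no continuity correction is applied, and the 0.05 threshold in the usage note is a common convention rather than a documented NEXUS setting. With one degree of freedom the p-value can be computed from `math.erfc`, so no third-party library is needed.

```python
import math


def chi_square_2x2(clicks_a: int, sends_a: int, clicks_b: int, sends_b: int):
    """Pearson chi-square test on a 2x2 click / no-click table.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    observed = [
        [clicks_a, sends_a - clicks_a],
        [clicks_b, sends_b - clicks_b],
    ]
    total = sends_a + sends_b
    row_totals = [sends_a, sends_b]
    col_totals = [clicks_a + clicks_b, total - clicks_a - clicks_b]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each row and column needs at least one observation")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

    A variant would only be routed back to Content Gen when the returned p-value falls below the chosen threshold (for example 0.05), which avoids reacting to noise in small samples.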

    Architecture Highlights

    What makes it genuinely agentic.

    Three architectural patterns that separate NEXUS from a sequential LLM pipeline — implemented with real code, not described in a prompt.

    LangGraph Send API

    Parallel Fan-Out / Fan-In

    Instead of generating content for each segment sequentially, NEXUS uses LangGraph's Send API to fan out one Content Gen task per segment concurrently: four segments in 15 seconds instead of 60. Running them in parallel is safe because the segments are independent, so no task reads another task's output.

    Fan-out: Strategy → Send([seg-A, seg-B, seg-C, seg-D])
    ├─ seg-A: Content Gen ✓ spam: 0.8 CTR: 3.2%
    ├─ seg-B: Content Gen ✓ spam: 1.1 CTR: 2.9%
    ├─ seg-C: Content Gen ✓ spam: 0.6 CTR: 3.5%
    └─ seg-D: Content Gen ✓ spam: 0.9 CTR: 3.1%
    Fan-in: merge_variants → Human Approval
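
    The fan-out / fan-in shape in the diagram can be illustrated with plain `asyncio`. To be clear, this is a stand-in and not LangGraph's Send API: `generate_variant` is a hypothetical placeholder for the per-segment Content Gen call, and the merge step simply keys results by segment.

```python
import asyncio
from typing import Dict, List


async def generate_variant(segment: str) -> dict:
    """Placeholder for one Content Gen task; the sleep stands in for an LLM call."""
    await asyncio.sleep(0.01)
    return {"segment": segment, "subject": f"Winter sale picks for {segment}"}


async def fan_out_fan_in(segments: List[str]) -> Dict[str, dict]:
    # Fan-out: start one task per segment and run them concurrently.
    variants = await asyncio.gather(*(generate_variant(s) for s in segments))
    # Fan-in: merge the independent results into one mapping.
    return {variant["segment"]: variant for variant in variants}


def run(segments: List[str]) -> Dict[str, dict]:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(fan_out_fan_in(segments))
```

    Because `asyncio.gather` returns results in the order the tasks were passed in, the merged mapping preserves segment order even when tasks finish at different times.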

    MongoDB + pgvector

    Two-Layer Persistent Memory

    Run-level state is checkpointed to MongoDB after every node, so the graph survives server restarts. A cross-campaign pgvector store captures what worked and why, and is queried by semantic similarity when a new campaign starts.

    # Layer 1 — run-level (MongoDB checkpoint)
    state.checkpoint(node='strategy', data={...})
    # survives server restart at any point
     
    # Layer 2 — cross-campaign (pgvector)
    memory.query(brief, top_k=5) # semantic similarity
    # → learnings from 50 past campaigns
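
    As a rough picture of what the Layer 2 query does, the sketch below ranks stored learnings by cosine similarity in plain Python. It is an in-memory stand-in, not the pgvector integration: `query_memory`, the record layout, and the toy two-dimensional embeddings are assumptions. In pgvector itself the equivalent ranking is usually expressed in SQL with the `<=>` cosine-distance operator.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_memory(
    query_vec: Sequence[float], memories: List[dict], top_k: int = 5
) -> List[Tuple[float, str]]:
    """Return the top_k (similarity, learning) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vec, m["embedding"]), m["learning"])
        for m in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

    In practice the query vector would come from embedding the new brief, and the stored vectors from embedding summaries of past campaigns.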

    Quality Gate + interrupt()

    Conditional Graph with Cycles

    Every edge has a condition. Quality Gate checks segment sizes, budget math, send-time spacing — all deterministic, no LLM. Failure routes back to Strategy (max 3 retries). Human rejection routes back to Content Gen with the feedback injected into state.

    Quality Gate → Strategy # fail: segment too small
    Strategy → Quality Gate # retry with merged seg
    Quality Gate → Content Gen # pass: all checks ✓
     
    Content Gen → interrupt() # pause graph, save state
    Human Approval → Execution # approved
    Human Approval → Content Gen # rejected + feedback
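
    The three deterministic checks named above (segment sizes, budget math, send-time spacing) might be implemented along these lines. Treat it as a sketch: the plan layout, the function name, and both thresholds are assumptions, since the page does not publish the actual limits.

```python
from datetime import datetime, timedelta
from typing import List

MIN_SEGMENT_SIZE = 200                   # assumed minimum for a usable A/B test
MIN_SEND_SPACING = timedelta(hours=24)   # assumed gap between sends


def quality_gate(plan: dict) -> List[str]:
    """Return a list of failure reasons; an empty list means the plan passes."""
    failures: List[str] = []

    # Check 1: every segment is large enough.
    for name, size in plan["segment_sizes"].items():
        if size < MIN_SEGMENT_SIZE:
            failures.append(f"segment too small: {name} ({size})")

    # Check 2: allocations do not exceed the total budget.
    allocated = sum(plan["budget_allocation"].values())
    if allocated > plan["total_budget"]:
        failures.append(f"budget exceeded: {allocated} > {plan['total_budget']}")

    # Check 3: consecutive sends are spaced far enough apart.
    times = sorted(plan["send_times"])
    for earlier, later in zip(times, times[1:]):
        if later - earlier < MIN_SEND_SPACING:
            failures.append(f"sends too close: {earlier} and {later}")

    return failures
```

    Returning the reasons rather than a bare boolean gives the Strategy retry something concrete to act on, such as merging the undersized segment.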

    Full pipeline tracing

    LangSmith Observability

    Every node, every LLM call, every tool call traced with latencies and token counts. Detects prompt injection, measures per-tenant costs, and surfaces pipeline bottlenecks. Makes "how do you know it's working?" answerable with data.

    run: campaign-47 · duration: 94s · $0.38
    ├─ segmentation-agent 12s ✓ silhouette=0.61
    ├─ strategy-agent 8s ✓ quality gate pass
    ├─ content-gen ×4 18s ✓ parallel fan-out
    ├─ approval 14h ⏸ human pending
    └─ hallucination check 0 unverified claims
    cost breakdown: content-gen 61% · strategy 18%

    Phase 2 — Enterprise Platform

    Six additions that make it a platform.

    Multi-tenancy, federated learning, fine-tuned models, adversarial debate, autonomous monitoring — problems that only exist at scale. Each one a genuine architectural addition, not a feature flag.

    Multi-Tenancy + Federated Memory

    Complete data isolation per tenant. Abstracted patterns — normalized metrics, structural content patterns, anonymized audience tier signals — are aggregated nightly into a shared knowledge base. New tenants benefit from platform-wide learnings without accessing any existing customer's data.

    Event-Driven Autonomous Monitoring

    Continuous Redis Streams subscriber — not a polling job. Reacts to unsubscribe spikes, out-of-stock events, competitor campaign detections, and viral moments in real time. Each signal type has a response playbook with ordered actions by urgency and reversibility.

    Outcome-Based Fine-Tuning (CCIM)

    Llama 3.1 8B fine-tuned with LoRA adapters on actual campaign open rates and CTRs — not human preference labels. Deployed after 20+ campaigns, versioned in MLflow. Training labels are real behavioral outcomes, not a human rater's opinion on what sounds good.

    Multi-Agent Debate Protocol

    High-stakes campaigns (large budget, large audience, new category) route through a four-phase deliberation: Strategist proposes → Devil's Advocate critiques with evidence → Strategist revises → Risk Assessment Agent scores across brand safety, audience risk, financial risk, and compliance.

    Natural Language Supervisor Agent

    Top-level agent that accepts plain-language instructions and orchestrates any combination of agents, database operations, and scheduled tasks. Uses confidence thresholds for ambiguity resolution — proceeds autonomously above 0.85, asks one clarifying question below 0.60.

    AgentEval Framework

    Continuously measures four dimensions: task completion rate, decision accuracy (predicted vs actual CTR), hallucination rate (unverified factual claims from memory), and cost efficiency per node. Generates weekly reports. Answers "how do you know it's working?" with data.

    System is live

    Submit a brief.
    Watch eight agents work.

    The full pipeline — RFM clustering, quality gates, parallel content generation, human approval, statistical optimization — runs end to end from a plain-language brief.

    Real sklearn clustering — not GPT guessing segments
    Conditional graph with cycles and retry loops
    LangGraph interrupt() for stateful human approval
    Chi-square significance tests before any optimization