Joshua Dazas

Project	Domain	Scale / Outcome
Gandalf — KYB Compliance Agent	AI · Compliance · FinTech	Production LLM system; 133 jobs evaluated; ICAIL 2026 paper
AI Triage & Sentiment Pipeline	Data Engineering · ML Ops · CX	1,205 clients scored weekly; full serverless AWS pipeline
HireDesk	Full-Stack · AI · People Ops	End-to-end hiring platform shipped and used in production
Synthetic Data Generator	Data Engineering · ML Tooling	Reusable OSS-grade dataset generator for AI experiments

Project 01

AI · Compliance · FinTech · Production · ICAIL 2026

Gandalf — KYB Compliance Multi-Agent System

A production-grade multi-agent LLM system that automates Know Your Business (KYB) due diligence at a global fintech and payroll platform, accompanied by an academic paper submitted for peer review to ICAIL 2026.

133

Production KYB Jobs

43.6%

Auto-Accepted

56.4%

Routed to Analyst

4

Specialist Agents

6

Reliability Patterns

Business Problem

Global payroll and employer-of-record platforms must verify the legal entities they onboard before processing cross-border payroll. This Know Your Business process — checking corporate registry data, beneficial ownership chains, sanctions exposure, and document authenticity — was performed entirely by manual compliance analysts. The process took days to weeks per entity, couldn't scale with growth, introduced inconsistency across analysts, and produced decisions without traceable evidence trails.

Regulatory frameworks (FATF, FinCEN AML/CTF guidelines) required that any automation maintain full decision traceability and remain defensible under audit. This made naive LLM automation dangerous: a hallucinated sanctions check or fabricated ownership detail would create legal liability. The challenge was to automate aggressively while maintaining regulatory defensibility.

Solution & Architecture

Gandalf is a reliability-oriented multi-agent system that decomposes KYB into discrete stages, each handled by the most appropriate mechanism: deterministic logic for clear-cut cases, specialist LLM agents for domains requiring judgment, and external RegTech data sources for objective ground truth. No single "do everything" prompt — every component has a narrow, well-defined responsibility.

Layer 1 — Early Exit Gates

topLevel.py — Deterministic country risk check + industry risk check before any LLM invocation. Handles ~15–20% of cases with zero token cost. Output: rejected or manual_review with 1.0 confidence.

↓

Layer 2 — Specialist Agent Ensemble (parallel)

Company Research Agent — Web research via Firecrawl; corporate registry lookups; business legitimacy signals.
Representative Research Agent — PEP screening, global sanctions checks, fraud scoring via RegTech API providers.
Shareholder Analysis Agent — Beneficial ownership chain traversal; UBO identification; circular ownership detection.
Document Reviewer Agent — OCR + R1–R5 rule evaluation on uploaded certificates, IDs, and ownership documents.

↓

Layer 3 — Orchestrator

topLevel.py orchestration — Aggregates specialist outputs, resolves conflicts, applies confidence weighting. Each agent output is schema-validated before aggregation; validation failure is treated as a hard error, not silently ignored.

↓

Layer 4 — Decision Gate & Audit Output

Schema-constrained final decision — accept | reject | manual_review with confidence, evidence list, rule violations, and structured justification. Analyst override pathway (R3) logs the override rationale for audit trail.
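As an illustration of Layers 3 and 4, the sketch below shows one way an orchestrator could fold specialist outputs into a single gated decision. The policy (any disagreement or low mean confidence routes to an analyst), the 0.8 threshold, and every name in the block are assumptions made for illustration; the source does not describe Gandalf's actual weighting or conflict-resolution rules.

```python
from dataclasses import dataclass, field


@dataclass
class SpecialistResult:
    agent: str
    decision: str        # 'accept' | 'reject' | 'manual_review'
    confidence: float    # 0.0 to 1.0
    evidence: list = field(default_factory=list)


def aggregate(results, min_confidence=0.8):
    """Combine specialist outputs conservatively (hypothetical policy)."""
    if not results:
        # Nothing to aggregate: never auto-decide without evidence.
        return {"decision": "manual_review", "confidence": 0.0, "evidence": []}

    evidence = [item for r in results for item in r.evidence]
    mean_conf = sum(r.confidence for r in results) / len(results)
    decisions = {r.decision for r in results}

    # Conflicting specialists or weak confidence both go to a human analyst.
    if len(decisions) > 1 or mean_conf < min_confidence:
        final = "manual_review"
    else:
        final = next(iter(decisions))

    return {"decision": final, "confidence": round(mean_conf, 3), "evidence": evidence}
```

Under this policy a unanimous, high-confidence ensemble is the only path to an automated accept or reject; everything else produces an evidence list for the analyst.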

Reliability-Oriented Design Patterns

These six patterns form the core contribution of the ICAIL 2026 paper. Each addresses a specific failure mode common to production LLM compliance systems:

Early Exit Gates. Deterministic checks before LLM invocation. Country and industry risk rules are coded explicitly — they require no LLM judgment and should never be delegated to one.
Specialist Ensemble. Four narrow agents instead of one general agent. Each specialist has a well-scoped task, reducing context confusion and enabling independent validation.
Schema-Constrained Outputs. Every LLM response is validated against a Pydantic schema. A malformed response, or one containing values outside the allowed set, fails validation and routes to manual review; it never passes through silently.
RegTech Integration. Objective ground truth from external providers (global sanctions databases, state secretary corporate registries, fraud scoring APIs) replaces LLM knowledge for facts that must be current and auditable.
Idempotency & Audit Trail. Every agent invocation is logged with inputs, outputs, timestamps, and decision rationale. The system can be re-run on the same entity and will produce the same decision (deterministic for gates; traceable for agents).
Human Override Pathway. Analysts can override any system decision via a structured form that requires a justification code. Overrides are logged and reported separately from system decisions, preserving decision quality metrics.
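The Idempotency & Audit Trail pattern can be sketched as follows. This is a minimal illustration, not the system's logging code: the function names, the record layout, and the choice of a SHA-256 hash over canonical JSON as the idempotency key are all assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def idempotency_key(agent: str, inputs: dict) -> str:
    """Stable key: the same agent and inputs always hash to the same value."""
    # sort_keys makes the JSON canonical, so dict ordering does not matter.
    canonical = json.dumps({"agent": agent, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(agent: str, inputs: dict, output: dict, rationale: str) -> dict:
    """Build one append-only audit entry for a single agent invocation."""
    return {
        "key": idempotency_key(agent, inputs),
        "agent": agent,
        "inputs": inputs,
        "output": output,
        "rationale": rationale,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A re-run on the same entity produces the same key, which lets an auditor line up repeated invocations and compare their outputs.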

Key Code — Early Exit Gate & Schema-Constrained Output

topLevel.py

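from typing import Literal

from pydantic import BaseModel, Field

# KYBDecision, PROHIBITED_COUNTRIES, HIGH_RISK_INDUSTRIES and run_agent_ensemble
# are assumed to be defined elsewhere in the module (not shown in this excerpt).

# Deterministic early-exit gates: no LLM cost for clear cases
def evaluate_entity(entity_data: dict) -> KYBDecision: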

    # Gate 1: Jurisdiction risk — no LLM needed
    if entity_data['country'] in PROHIBITED_COUNTRIES:
        return KYBDecision(
            status='rejected',
            reason='Prohibited jurisdiction under AML policy',
            confidence=1.0,
            llm_used=False,
            audit_code='GATE_COUNTRY'
        )

    # Gate 2: Industry risk
    if entity_data['industry'] in HIGH_RISK_INDUSTRIES:
        return KYBDecision(
            status='manual_review',
            reason='High-risk industry classification',
            confidence=1.0,
            llm_used=False,
            audit_code='GATE_INDUSTRY'
        )

    return run_agent_ensemble(entity_data)


# Schema-constrained agent output: malformed responses fail validation instead of passing silently
class AgentOutput(BaseModel):
    decision:        Literal['accept', 'reject', 'manual_review']
    confidence:      float = Field(ge=0.0, le=1.0)
    evidence:        list[str] = Field(min_length=1)  # non-empty (Pydantic v2; `min_items` in v1)
    rule_violations: list[str]         # R1-R5 violations found
    regtech_flags:   list[str]         # sanctions / fraud flags from external APIs
    requires_analyst_review: bool

# Any LLM response that fails this schema → manual_review (never silently passes)
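The fallback described in the final comment can be sketched without Pydantic. The block below is a standard-library stand-in that checks the same fields by hand; in the real system the AgentOutput model would do this work. The function name and the shape of the fallback payload are hypothetical.

```python
import json

ALLOWED_DECISIONS = {"accept", "reject", "manual_review"}
LIST_FIELDS = ("evidence", "rule_violations", "regtech_flags")


def parse_agent_output(raw: str) -> dict:
    """Return the parsed output, or a manual_review fallback if it is invalid."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("top-level JSON value must be an object")
        if data["decision"] not in ALLOWED_DECISIONS:
            raise ValueError("unknown decision")
        conf = data["confidence"]
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be a number in [0, 1]")
        for name in LIST_FIELDS:
            if not isinstance(data[name], list) or not all(isinstance(x, str) for x in data[name]):
                raise ValueError(f"{name} must be a list of strings")
        if not data["evidence"]:
            raise ValueError("evidence must be non-empty")
        if not isinstance(data["requires_analyst_review"], bool):
            raise ValueError("requires_analyst_review must be a boolean")
        return data
    except (ValueError, KeyError, TypeError) as exc:
        # Invalid output is never trusted: route the case to an analyst instead.
        return {
            "decision": "manual_review",
            "confidence": 0.0,
            "evidence": [f"agent output failed validation: {exc!r}"],
            "rule_violations": [],
            "regtech_flags": [],
            "requires_analyst_review": True,
        }
```

Note that a check like this catches structural problems only. A well-formed response with a wrong fact still passes, which is why the design pairs schema validation with external RegTech data.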

Evaluation Results — Production Data (Aug–Nov 2025)

Metric	Value	Notes
Total cases processed	133 KYB jobs	61 corporate clients + 72 business contractors
Auto-accepted	58 (43.6%)	Fully automated — no analyst touch required
Routed to manual review	75 (56.4%)	Agent provided evidence summary to analyst
Avg. risk score	2.76 / 5.0	Confidence-weighted scoring across all agents
Top rejection signal	Document rules not satisfied	R1–R5 rule evaluation failures (Document Reviewer Agent)
Model	GPT-4 (OpenAI)	All agents used same model; specialist prompts differ

Academic Publication

"Gandalf: Architecting Multi-Agent Systems for Know Your Business Compliance in Global Financial Services"
Joshua Dazas & Felipe García — Submitted to ICAIL 2026 (International Conference on Artificial Intelligence and Law).
The paper introduces six reliability-oriented design patterns for production LLM compliance systems, grounded entirely in real implementation and production evaluation data. Theory derived from working code, not hypothetical architectures.

Product Document

Discovery → Design → Execution → Deployment

Phase 1 — Discovery

Problem Identification & Validation

The KYB problem was identified by observing the compliance analyst workflow directly. Key signals: average entity review took 3–5 business days; analysts repeatedly checked the same sources (corporate registry, sanctions list, Google for reputation) in the same sequence; rejections were almost always explainable by a small set of rule violations; the reasoning was formulaic but the volume was not.

Regulatory constraint discovery: Mapped FATF/FinCEN requirements to understand what could be automated. Finding: automation is permitted if decisions are traceable and evidence-backed. Full black-box LLM decisions are not compliant; schema-constrained outputs with source citations are.
Failure mode mapping: Identified LLM-specific risks before writing a line of code — hallucinated sanctions hits, fabricated corporate registry data, inconsistent beneficial ownership analysis. Each became a named design constraint.
Data source inventory: Catalogued every external data source analysts used. Categorized as: deterministic (sanctions lists, corporate registries — should never be LLM-generated) vs. judgment-requiring (reputation signals, document interpretation).

Phase 2 — Design

Architecture & Reliability Strategy

Agent decomposition: Broke the monolithic "review this entity" task into four specialist agents, each owning one domain. Key insight: on constrained compliance tasks, narrowly scoped specialist prompts were less prone to context confusion and easier to validate independently than a single generalist prompt.
Schema-first design: Defined the output schema before writing prompts. Every agent was built to produce a validated Pydantic model. This inverts the usual LLM approach (prompt first, output second) and eliminates a class of production bugs.
External data integration: Designed RegTech API integration as a non-negotiable. LLMs should never be the source of truth for sanctions lists or corporate registry data — these require current, auditable, API-verified data.
Early exit strategy: Identified all deterministic cases (prohibited countries, banned industries) and moved them before LLM invocation. This improves cost, latency, and consistency simultaneously.
Human-in-the-loop design: Built the override pathway into the initial design, not as an afterthought. Analysts needed to trust the system before it could replace their workflow — visible overrides with tracked rationale built that trust.

Phase 3 — Execution

Implementation Details

Stack: Python, OpenAI GPT-4, Firecrawl (web research), RegTech provider APIs (sanctions, corporate registries, fraud scoring — anonymized per NDA), Pydantic for schema validation, AWS for infrastructure.
Agent implementation pattern: Each agent = LLM + tool access + goal-directed system prompt + output schema. Agents are invoked explicitly by the orchestrator rather than acting autonomously. Agents with no dependency on each other (company research, representative research, and document review) can run concurrently.
OCR pipeline: Document Reviewer Agent uses OCR to extract text from uploaded PDFs/images before LLM evaluation. R1–R5 rules are evaluated by the LLM against extracted text, not raw images.
Evaluation methodology: 133-case production dataset evaluated across Aug–Nov 2025. Justification type analysis (10 unique rejection reason categories). Timeline and volume distribution analysis for ICAIL paper metrics.
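The concurrent invocation of independent agents mentioned in the implementation pattern can be sketched with a thread pool. The three agent functions are hypothetical stand-ins (the real agents would call an LLM and external tools), and the use of ThreadPoolExecutor is an assumption about one reasonable way to do it, not a description of the production orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the real specialist agents.
def company_research(entity: dict) -> dict:
    return {"agent": "company_research", "entity": entity["name"]}


def representative_research(entity: dict) -> dict:
    return {"agent": "representative_research", "entity": entity["name"]}


def document_review(entity: dict) -> dict:
    return {"agent": "document_review", "entity": entity["name"]}


INDEPENDENT_AGENTS = [company_research, representative_research, document_review]


def run_independent_agents(entity: dict) -> list:
    """Run the agents concurrently; results keep the order of INDEPENDENT_AGENTS."""
    with ThreadPoolExecutor(max_workers=len(INDEPENDENT_AGENTS)) as pool:
        futures = [pool.submit(agent, entity) for agent in INDEPENDENT_AGENTS]
        # .result() re-raises any exception from the agent, so failures surface
        # as hard errors instead of being dropped.
        return [future.result() for future in futures]
```

Threads suit this workload because each agent spends most of its time waiting on network calls.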

Phase 4 — Deployment & Ops

Production Operations

Deployment: Python services on AWS. Integrated with existing Ontop compliance workflow via API.
Monitoring: Every agent invocation logged with inputs, outputs, and decision rationale. Decision quality tracked via manual review override rate — a rising override rate signals prompt drift or data quality issues.
Ongoing calibration: R1–R5 rules updated as regulatory guidance evolves. External API schemas versioned to prevent silent breaking changes from providers.
Academic documentation: Production metrics used directly in ICAIL 2026 paper. The system is the paper's empirical basis — no synthetic evaluation benchmarks.

Project 02

Data Engineering · ML Ops · AWS Serverless · Production

AI Triage & Sentiment Analysis Pipeline

An end-to-end, fully serverless ML pipeline that turns raw customer support tickets into ranked churn-risk recommendations, delivered weekly to account managers via Slack.

1,205

Active Clients Scored

938

Above Risk Threshold

Lambda Functions

Step Functions

Weekly

Pipeline Cadence

Business Problem

Ontop serves 1,200+ active corporate clients across Latin America, each generating support tickets in Zendesk and an internal messaging platform (DIIO). With this volume, identifying which clients are at genuine risk of churn or operational escalation before a situation becomes a crisis required either a large account management team or an automated system. No existing tooling could surface the right clients at the right time.

The specific failure mode the business experienced: account managers would only become aware of a deteriorating client relationship when the client threatened to churn or escalated to senior leadership — at which point it was often too late for meaningful intervention. A weekly automated digest of the highest-risk, newest cases would give account managers actionable intelligence when there was still time to act.

Pipeline Architecture

Stage 1 — Glue ETL

AWS Glue Python jobs — Extract Zendesk tickets and DIIO conversations via API. Filter to external tickets only (is_external = 'true'). Land raw data into Redshift: external.zendesk__tickets_sentiment_analysis and external.diio__sentiment_analysis.

↓

Stage 2 — Sentiment Scoring Lambda

XLM-RoBERTa via SageMaker — Three Lambdas: warm (keep endpoint alive), fetch (query Redshift for unscored tickets), score (invoke SageMaker endpoint per batch, parallel via Step Functions MaxConcurrency=10). Outputs to process_data.zendesk_sentiment and process_data.diio_sentiment.

↓

Stage 3 — Issue Extractor Lambda

GPT-4o Mini + Bedrock Embeddings — Two Lambdas (batch-query + extract) via Step Functions. Extracts structured issue types from ticket text. Outputs to process_data.extracted_issues. Parallel execution across ticket batches (MaxConcurrency=10).

↓

Stage 4 — Context Signals ETL

Salesforce + Redshift + Aura API — Aggregates 1,205 active client records with transaction health metrics, conversation volume (Aura, 4-week window), and L1 signals (sentiment + issues, 30-day window). Outputs one row per client to process_data.client_context_rules. Uses execute_values(page_size=200) for bulk upsert within Lambda timeout.
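To show why page_size matters for the bulk upsert, the sketch below reproduces only the paging arithmetic: psycopg2's execute_values groups rows into multi-row statements of at most page_size rows each. The helper name and the dummy rows are illustrative, and no database is involved.

```python
from typing import Iterator, Sequence


def pages(rows: Sequence[tuple], page_size: int = 200) -> Iterator[Sequence[tuple]]:
    """Yield consecutive slices of at most page_size rows."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


# One dummy row per active client.
rows = [(client_id, "ok") for client_id in range(1205)]
batches = list(pages(rows))
print(len(batches), len(batches[-1]))  # 7 5
```

With 1,205 rows and a page size of 200, the upsert needs 7 statements rather than one round trip per row.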

↓

Stage 5 — Triage Agent Lambda

AWS Bedrock — Claude 3.5 Haiku (cross-region inference profile) — Reads top 70 clients by risk score from Redshift. For each client, invokes Claude with a structured prompt containing sentiment trends, issue categories, transaction health, and churn signals. Outputs schema-constrained JSON: urgency, reason_summary, recommended_action, confidence. Writes to process_data.triage_recommendations.

↓

Stage 6 — Slack Digest Lambda

n8n webhook → Slack — Queries top 10 new clients from triage_recommendations (7-day dedup via slack_digest_log table). POSTs structured JSON payload to n8n webhook. n8n formats and delivers to account management Slack channel. Decoupled from core pipeline — Slack formatting changes don't require Lambda redeployment.
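The digest selection (top 10, 7-day dedup) can be sketched in plain Python. In production this logic would be a SQL query against triage_recommendations and slack_digest_log; the in-memory data shapes, the function name, and the boundary rule (a client last sent exactly 7 days ago is eligible again) are assumptions.

```python
from datetime import date, timedelta


def select_digest(recommendations, digest_log, today, window_days=7, limit=10):
    """Pick the highest-risk clients that were not sent within the dedup window.

    recommendations: list of dicts with 'client_id' and 'risk_score'
    digest_log:      dict mapping client_id -> date of last digest inclusion
    """
    cutoff = today - timedelta(days=window_days)
    fresh = [
        rec for rec in recommendations
        if rec["client_id"] not in digest_log or digest_log[rec["client_id"]] <= cutoff
    ]
    fresh.sort(key=lambda rec: rec["risk_score"], reverse=True)
    return fresh[:limit]
```

Capping the list after filtering, rather than before, keeps the digest full even when several of the highest-risk clients were already reported the previous week.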

Deployed AWS Infrastructure

CloudFormation Stack	Key Functions	Schedule (EventBridge)
sentiment-classifier-v2	sentiment-warm-v2, sentiment-fetch-v2, sentiment-score-v2	Mondays 01:00 UTC
issue-extractor-v1	issue-batch-query-v1, issue-extract-v1	Mondays 01:00 UTC
context-signals-etl	context-signals-etl	Mondays 03:00 UTC
triage-agent	triage-agent-v1	Mondays 04:30 UTC
slack-digest	slack-digest-lambda	Mondays 06:00 UTC

Risk Score Distribution (Feb 2026)

Tier	Score Range	Client Count	% of Base	Recommended Response
Critical	50+	6	0.5%	Immediate escalation to senior AM
High	30–49	337	28%	Proactive outreach within 48 hours
Medium	20–29	595	49%	Include in weekly digest, monitor
Low	< 20	267	22%	Routine check, no action needed
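The tier boundaries in the table translate directly into a small classifier, and the client counts can be cross-checked against the headline figures. The function name is illustrative, and treating fractional scores such as 49.5 as High is an assumption, since the table lists integer ranges only.

```python
def risk_tier(score: float) -> str:
    """Map a risk score to the tier names used in the distribution table."""
    if score >= 50:
        return "Critical"
    if score >= 30:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"


# Client counts per tier, copied from the table (Feb 2026).
counts = {"Critical": 6, "High": 337, "Medium": 595, "Low": 267}
total = sum(counts.values())
above_threshold = total - counts["Low"]   # clients scoring 20 or more
print(total, above_threshold)  # 1205 938
```

The totals match the 1,205 clients scored and the 938 above the risk threshold of 20.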

Key Code — Triage Agent & Bedrock Inference Profile

triage-agent-lambda/lambda_function.py

# Cross-region inference profile required for all newer Claude models
# Direct model IDs are blocked by Bedrock; inference profiles are mandatory
BEDROCK_MODEL_ID = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
RISK_THRESHOLD = 20
MAX_CLIENTS   = 70   # Top 70 clients by risk score, weekly

def build_triage_prompt(client: dict) -> str:
    return f"""You are a customer success triage agent for a payroll platform.
Analyze this client and respond ONLY with valid JSON matching the schema below.

Client: {client['company_name']}
Risk Score: {client['risk_score']} / 100
Sentiment Trend (30d): {client['sentiment_trend']}
Top Issues: {client['top_issues']}
Transaction Volume: {client['transaction_count']} transactions (4-week window)
Active Conversations: {client['conversation_count']} (Aura, 4 weeks)
Churn Signals: {client['churn_signals']}

Required JSON schema:
{{
  "urgency": "critical | high | medium | low",
  "reason_summary": "1-2 sentences explaining root cause",
  "recommended_action": "specific next step for account manager",
  "confidence": 0.0 - 1.0
}}"""

# Fetch top at-risk clients, ordered by risk score (the 7-day dedup happens later, in the Slack digest Lambda)
FETCH_QUERY = """
    SELECT c.client_id, c.company_name, c.top_issues,
           t.risk_score, t.sentiment_trend, t.transaction_count,
           t.conversation_count, t.churn_signals
    FROM   process_data.client_context_rules c
    JOIN   process_data.triage_recommendations t USING (client_id)
    WHERE  t.risk_score >= :threshold
    ORDER  BY t.risk_score DESC
    LIMIT  :max_clients
"""

Product Document

Discovery → Design → Execution → Deployment

Phase 1 — Discovery

Problem Identification & Scoping

The original problem statement was "we need to know which clients are unhappy." The discovery process refined this considerably over several conversations with the account management team.

V1 (Supabase) → V2 (Redshift) evolution: The first version landed data in Supabase. The discovery that all other company analytics ran on Redshift meant V1 created an isolated data silo. V2 was redesigned ground-up on Redshift as the single analytics layer — a key architectural correction discovered early enough to avoid technical debt.
Multilingual signal discovery: A critical early finding was that Ontop's support tickets mixed Spanish and English within the same conversation thread. Standard English-only sentiment models produced unreliable results. This drove the XLM-RoBERTa model selection — a deliberate technical decision from a business insight, not a generic default.
Context signals gap: Ticket sentiment alone proved insufficient. A client with negative tickets but healthy transaction volume needed different treatment than one with negative tickets and declining transactions. This gap drove the Salesforce + Aura context signals ETL as a separate stage.
Alert fatigue concern: Early stakeholder feedback on prototypes surfaced a concern about over-alerting. The 7-day dedup window and TOP 10 cap were direct responses to stated user preferences from account managers who feared a system they'd start ignoring.

Phase 2 — Design

Pipeline Design & Architecture Decisions

Redshift as single source of truth: All pipeline stages read/write to Redshift. Enables SQL-based auditing, BI tool connectivity, and pipeline observability without additional data movement or sync complexity.
Step Functions for orchestration: CloudFormation-managed state machines with DefinitionSubstitutions — no hardcoded ARNs. MaxConcurrency=10 for ticket processing. Design decision: Step Functions over Airflow or cron because it integrates natively with Lambda and provides built-in retry/error handling.
Decoupled notification layer: Lambda POSTs structured JSON to an n8n webhook; n8n handles Slack formatting. The decoupling is deliberate: the Slack message format can change without a Lambda redeployment, and no Slack API credentials have to be managed in AWS.
Weekly cadence: Daily was considered and rejected. Account managers can't meaningfully act on daily updates for 938 clients. Weekly digest at the start of the work week gives a planning horizon for proactive outreach.

Phase 3 — Execution

Implementation & Critical Bug Fixes

Key bug: is_external filter (lowercase string). Glue ETL was filtering on is_external = 'True' (Python boolean string) instead of 'true' (database string). Result: zero Zendesk tickets were processed. Fix: explicit lowercase string comparison. Lesson: verify filter values against actual database column values before declaring a pipeline working.
Key bug: context-signals timeout. Original executemany() upsert for 1,205 rows was timing out at the 900s Lambda limit. Fix: replaced with psycopg2.extras.execute_values(page_size=200). Lesson: batch insert patterns matter at scale; test with production-volume data.
Key bug: Bedrock model ID. Triage agent initially used direct model ID. Bedrock now requires cross-region inference profiles (us.* prefix) for all newer Claude models. IAM policy required two separate resource ARNs: one for the inference profile (with account ID) and one for the foundation model (without). Lesson: AWS Bedrock model access patterns change; always verify against current documentation.
Schema discovery: Workflow documentation initially used wrong column names for triage_recommendations. Fixed by querying the actual table schema and updating the Lambda JOIN logic to derive missing fields (risk_category via CASE statement; client_name via JOIN).
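The is_external bug is easy to reproduce in isolation: converting a Python boolean to a string yields 'True', which never equals the lowercase 'true' stored in the column. The sample tickets and the as_db_bool helper are made up for illustration.

```python
def as_db_bool(value) -> str:
    """Normalise a Python boolean or boolean-like string to 'true' / 'false'."""
    return "true" if str(value).strip().lower() == "true" else "false"


# Sample rows shaped like the database column, which stores lowercase strings.
tickets = [
    {"id": 1, "is_external": "true"},
    {"id": 2, "is_external": "false"},
    {"id": 3, "is_external": "true"},
]

buggy = [t for t in tickets if t["is_external"] == str(True)]        # compares to 'True'
fixed = [t for t in tickets if t["is_external"] == as_db_bool(True)]  # compares to 'true'
print(len(buggy), len(fixed))  # 0 2
```

The buggy filter fails silently by returning an empty result, which is why a row-count check after each ETL stage is a cheap safeguard.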

Phase 4 — Deployment & Operations

Go-Live Strategy & Ongoing Ops

Deployment tooling: AWS SAM (sam build && sam deploy) for all five stacks. All schedules deployed in DISABLED state, then enabled post-integration test. This allows safe incremental rollout without accidental cron execution during deployment.
Secrets management: All credentials stored in AWS Secrets Manager (Redshift, Salesforce, Aura, OpenAI, n8n webhook URL). No secrets in environment variables or code.
Observability: CloudWatch Logs for every Lambda invocation. Step Functions execution history for pipeline-level tracing. Redshift query history for data audit.
Known next step: Raise MAX_CLIENTS from 70 to 343 (threshold ≥ 30) to cover the full Critical and High tiers (6 + 337 clients). The current limit was a conservative go-live decision; with production stability confirmed, the pipeline is ready to scale.

Project 03

Full-Stack · AI · People Ops · Production

HireDesk — AI-Powered Internal Hiring Platform

A purpose-built hiring platform for Ontop's People Ops team, with AI candidate ranking, automated email workflows, video screening, and bulk candidate management — shipped and used in production.

Next.js 16

App Router

GPT-4o

Candidate Ranking

Email Automations

SendGrid

Email Provider

Vercel

Deployed On

Business Problem

Ontop's People Ops team managed hiring across multiple open roles using a combination of spreadsheets, email threads, and manual Calendly coordination. The specific pain points:

No consistent evaluation rubric. Each hiring manager ranked candidates differently, making cross-role comparisons impossible and introducing bias.
Communication inconsistency. Candidates received rejection emails at irregular intervals, or not at all. Interview scheduling was a manual back-and-forth. No systematic video screening.
End-of-cycle complexity. When a role closed, there was no efficient way to mass-reject the remaining candidates, update their records, and send communications simultaneously.
No pipeline visibility. Stakeholders had no real-time view of where candidates were in the process without asking a People Ops admin directly.

Core Feature Set

Job Requisitions

Structured hiring briefs with AI-generated application form schemas. Required fields include job title, description, salary band, and a mandatory Calendly booking link (validated at form submission — no requisition can be published without one, ensuring interview emails always have a valid scheduling link). Requisition lifecycle: pending → form_generated → published → closed.

AI Candidate Ranking

On each application submission, GPT-4o evaluates the applicant's responses against the job description and outputs a structured ranking: Very High Fit / High Fit / Average / Low Fit, with a justification paragraph. Auto-rejection logic: only Low Fit candidates are auto-rejected (not Average — a deliberate product decision to give borderline candidates a chance at video screening).
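The auto-rejection rule is small enough to state as code. This is a sketch of the rule as described, not the platform's implementation (which is TypeScript); the enum and function names are hypothetical, while the tier labels and status values come from the text.

```python
from enum import Enum


class Fit(str, Enum):
    VERY_HIGH = "Very High Fit"
    HIGH = "High Fit"
    AVERAGE = "Average"
    LOW = "Low Fit"


def status_after_ranking(fit: Fit) -> str:
    """Only Low Fit is auto-rejected; every other tier stays in the pipeline."""
    return "rejected" if fit is Fit.LOW else "application_received"
```

Keeping Average candidates at application_received means they still receive the video request that the cron job sends 24 hours after the application arrives.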

Automated Email Workflows

Status Change	Email Triggered	Requirement / Condition
→ hm_interview	Interview scheduling email with Calendly link	Requires `booking_link` on the requisition
→ chro_interview	CHRO interview invitation	Requires `CHRO_BOOKING_LINK` environment variable
→ rejected	Branded rejection email (Ontop copy, warm tone)	Fires on every status change to rejected, including bulk
video_requested	Video submission request with token URL	Via Vercel cron, 24 hours after application received
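The status-driven rows of the table can be read as a single lookup from new status to email, with the stated precondition checked first. The sketch below is one way to express that; the template names and the function itself are hypothetical, and the cron-driven video request is left out because it is not triggered by a status change.

```python
from typing import Optional


def email_for_transition(new_status: str, requisition: dict, env: dict) -> Optional[str]:
    """Return the email template to send for a status change, or None."""
    if new_status == "hm_interview":
        if not requisition.get("booking_link"):
            raise ValueError("requisition has no booking_link")
        return "interview_scheduling"
    if new_status == "chro_interview":
        if not env.get("CHRO_BOOKING_LINK"):
            raise ValueError("CHRO_BOOKING_LINK is not set")
        return "chro_invitation"
    if new_status == "rejected":
        return "rejection"
    # Any other transition is silent.
    return None
```

Raising on a missing link rather than sending an email without one mirrors the document's choice to make the booking link mandatory.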

Video Screening

A token-based video upload URL (/video/[token]) is generated on application creation and dispatched 24 hours later via cron. Upload-only (no browser recording) — MP4, WebM, MOV, AVI; max 50MB; 2–3 minutes; English only. Videos stored in Supabase Storage with signed URLs. The cron job checks idempotency before dispatch: skips the video request if the application status has changed from application_received.
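The token and the dispatch check can be sketched as below. The 24-hour delay and the status check come from the text; the use of UUID version 4, the field names, and the video_request_sent flag are assumptions added for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone


def new_video_token() -> str:
    """Random, hard-to-guess token to embed in the /video/[token] URL."""
    return str(uuid.uuid4())


def should_send_video_request(application: dict, now: datetime) -> bool:
    """Send only if the request is due, not yet sent, and the status is unchanged."""
    due_at = application["created_at"] + timedelta(hours=24)
    return (
        application["status"] == "application_received"
        and not application.get("video_request_sent", False)
        and now >= due_at
    )
```

Checking the status at dispatch time, rather than when the job is scheduled, is what makes it safe for the cron to fire more than once.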

Bulk Candidate Management

Multi-select checkboxes on the applications dashboard allow bulk status changes across any combination of candidates. Bulk rejection sends an individual rejection email to each selected candidate via Promise.allSettled(), so one failed send does not block the others and failures are tracked per email. Requisition close flow: a "Close Requisition" button triggers a two-step confirmation modal that auto-rejects all non-hired candidates and sets requisition status to closed.

Google Sheets Sync

All status changes are synced to a connected Google Sheet in real time, giving hiring stakeholders a read-only pipeline view without requiring platform access.

Tech Stack

Layer	Technology	Purpose
Frontend	Next.js 16 App Router + React 19 + TypeScript	Server components, file-based routing, type safety
UI Library	shadcn/ui + Tailwind CSS + Radix	Accessible, composable component system
Database	Supabase (PostgreSQL)	Applications, requisitions, form_schemas, scheduled_jobs tables
Auth	Supabase Auth	Session-based auth for People Ops admins
AI	OpenAI GPT-4o	Candidate ranking + structured JSON output
Email	SendGrid	Interview scheduling, rejection, video request emails
File Storage	Supabase Storage + signed URLs	Video upload with expiring token-based access
Deployment	Vercel (serverless functions + cron)	Edge deployment, automatic scaling, cron job execution
Integrations	Google Sheets API	Real-time status sync for stakeholder reporting

Key Code — Bulk Status Change with Parallel Email Dispatch

src/app/api/applications/bulk-status/route.ts

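import { NextResponse } from 'next/server'
// `supabase` (server-side client) and `sendRejectionEmail` (src/lib/email/service.ts)
// are imported elsewhere in the project; their import lines are not shown here.

export async function PATCH(request: Request) {
  const { ids, status } = await request.json()

  // Update all selected applications in a single statement
  const { data: apps, error } = await supabase
    .from('applications')
    .update({ status })
    .in('id', ids)
    .select('id, applicant_name, applicant_email, form_schema_id')

  if (error || !apps) {
    return NextResponse.json({ error: error?.message ?? 'Update failed' }, { status: 500 })
  }

  // Only a change to 'rejected' triggers an email in this handler
  const toEmail = status === 'rejected' ? apps : []

  // Send emails in parallel; allSettled records each failure without throwing
  const results = await Promise.allSettled(
    toEmail.map(async (app) => {
      const { data: schema } = await supabase
        .from('form_schemas')
        .select('job_title')
        .eq('id', app.form_schema_id)
        .single()

      if (!schema) throw new Error(`No form schema for application ${app.id}`)

      return sendRejectionEmail(
        { applicant_name: app.applicant_name,
          applicant_email: app.applicant_email },
        schema.job_title
      )
    })
  )

  // 'rejected' below is the promise state (a failed send), not the application status
  const failed = results.filter(r => r.status === 'rejected').length
  return NextResponse.json({
    updated:       apps.length,
    emails_sent:   results.length - failed,
    emails_failed: failed
  })
}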

Key Code — Rejection Email (Ontop Brand Copy)

src/lib/email/service.ts

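import sgMail from '@sendgrid/mail'
// sgMail.setApiKey(...) is assumed to be called once at module load (not shown).

// Escape user-supplied text before interpolating it into the HTML body
const escapeHtml = (s: string) =>
  s.replace(/[&<>"']/g, (c) => `&#${c.charCodeAt(0)};`)

export async function sendRejectionEmail(
  applicant: { applicant_name: string; applicant_email: string },
  jobTitle: string
) {
  const msg = {
    to:      applicant.applicant_email,
    from:    process.env.SENDGRID_FROM_EMAIL || 'noreply@hiredesk.com',
    subject: `Update on your application for ${jobTitle}`,
    html: `
      <p>Hi ${escapeHtml(applicant.applicant_name)},</p>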
      <p>Thank you for taking the time to go through our process and for the
      energy you put into your application. We truly appreciate the effort
      and the thoughtfulness you showed along the way.</p>
      <p>It was great getting to know you, and we're grateful you considered
      being part of Ontop.</p>
      <p>Wishing you the best in what's ahead.</p>
      <p><strong>The Ontop Team</strong></p>
      <hr/>
      <p style="color:#888;font-size:12px;">
        Questions? Contact us at hr@getontop.com
      </p>
    `,
  }
  await sgMail.send(msg)
}

Product Document

Discovery → Design → Execution → Deployment

Phase 1 — Discovery

User Research & Problem Framing

Discovery started from a simple request: "we need a way to handle applications." Several conversations with the People Ops team and hiring managers surfaced a more complete picture.

The actual workflow: Applications came in via a Google Form. Admins copy-pasted responses into a spreadsheet, manually emailed Calendly links to candidates they liked, and sent rejection emails on an ad-hoc basis. The "system" was entirely manual and person-dependent.
The hidden problem: Without a standardized ranking mechanism, different hiring managers had different mental models for "good candidate." This made it impossible to compare candidates across requisitions or track conversion rates by fit tier.
The CHRO bottleneck: A second interview stage involving the CHRO required a separate Calendly link (Vivian Forero's calendar), separate email copy, and was often forgotten in the manual process. This became a discrete automation target.
The "closing a role" pain: When a role filled, there was no way to cleanly close it — no mass rejection mechanism, no status indicator, and residual candidates still thought they were in the process. Discovered late in requirements gathering, became a specific feature (Requisition Close flow).
Auto-rejection scope decision: Early design assumed auto-rejecting Average candidates. Stakeholder review identified this was too aggressive — Average candidates should get video screening. Changed to: only auto-reject Low Fit. This is a business logic decision with meaningful hiring impact.

Phase 2 — Design

Architecture & Key Design Decisions

Status-driven email architecture: Rather than building individual email triggers per feature, designed a single status change model where each status transition can optionally fire an email. This makes adding future email triggers trivial and keeps email logic centralized.
Mandatory booking link at requisition creation: Initially the booking link was optional. The first email automation bug — sending interview emails to candidates without a Calendly link — drove this to a required field validated at form submission. Prevention over error handling.
Token-based video URL: Video submissions needed a public-facing URL (candidates aren't platform users) but secure storage. Token pattern: a UUID stored in the applications table, embedded in the URL. No auth required to upload; token expiry not implemented (deliberate simplicity for MVP).
Cron idempotency: The scheduled job processor checks application status before firing video requests. Without this, manually advancing a candidate to hm_interview then back to application_received would retrigger the video request. Idempotency prevents this.
Google Sheets sync over in-app reporting: Building a reporting dashboard was deprioritized. Google Sheets sync gave stakeholders immediate read-only visibility with zero new UI work — and they already knew how to use it.

Phase 3 — Execution

Implementation Details & Iterations

Stack rationale: Next.js 16 App Router chosen for server components (reduces client-side JS for data-heavy tables), Supabase for rapid iteration (built-in auth, storage, real-time), Vercel for frictionless deployment and native cron support.
Schema cascade design: Database uses full CASCADE deletes: job_requisitions → form_schemas → applications → scheduled_jobs. Deleting a requisition automatically cleans up the entire hierarchy. No orphaned records, no manual cleanup.
Rejection email, manual trigger pattern: To bulk-send rejection emails to already-rejected candidates (e.g., after closing a requisition), chose a curl-accessible API endpoint rather than a UI button. Simpler to build, less surface area for accidental triggering, and restricted to admins via session cookie.
CHRO interview integration: The CHRO booking link is an environment variable (CHRO_BOOKING_LINK) rather than a database field. Rationale: it changes infrequently, applies globally (not per-requisition), and doesn't need user-editable UI.
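
The cascade chain in the schema note above can be illustrated with a small in-memory model. This is only a sketch of the behaviour: in the real database the cleanup would come from foreign keys declared with ON DELETE CASCADE. The table names are from the write-up. The Row shape, the parentId column, and the cascadeDelete helper are hypothetical.

```typescript
// Table names come from the write-up; everything else here is illustrative.
type Table = "job_requisitions" | "form_schemas" | "applications" | "scheduled_jobs";

interface Row {
  table: Table;
  id: string;
  parentId: string | null; // id of the owning row one level up the chain
}

// Parent of each table in the chain
// job_requisitions -> form_schemas -> applications -> scheduled_jobs.
const PARENT_TABLE: Record<Table, Table | null> = {
  job_requisitions: null,
  form_schemas: "job_requisitions",
  applications: "form_schemas",
  scheduled_jobs: "applications",
};

// Return the rows that survive deleting one row and all of its descendants.
function cascadeDelete(rows: Row[], table: Table, id: string): Row[] {
  const doomed = new Set<string>([`${table}:${id}`]);
  let grew = true;
  // Repeat until a full pass marks nothing new, so row order does not matter.
  while (grew) {
    grew = false;
    for (const row of rows) {
      const parent = PARENT_TABLE[row.table];
      const key = `${row.table}:${row.id}`;
      if (
        parent !== null &&
        row.parentId !== null &&
        doomed.has(`${parent}:${row.parentId}`) &&
        !doomed.has(key)
      ) {
        doomed.add(key);
        grew = true;
      }
    }
  }
  return rows.filter((row) => !doomed.has(`${row.table}:${row.id}`));
}
```

Deleting a requisition in this model removes its form schema, applications, and scheduled jobs in one call, and leaves other requisitions untouched.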

Phase 4 — Deployment & Ops

Production Operations

SendGrid configuration: SENDGRID_SANDBOX_MODE=true for staging (disables actual delivery without changing code). From address configured via SENDGRID_FROM_EMAIL env var. Sender verification required in SendGrid dashboard before production use.
Cron monitoring: Vercel cron executes /api/cron/process-scheduled-jobs. Cron logs available in Vercel dashboard. Idempotency checks ensure safe re-execution if cron fires unexpectedly.
Google Sheets sync: Requires service account credentials. Connection verified at startup; sync failures are logged but don't block status updates (non-critical path).
Iterative feature shipping: Platform was shipped incrementally. Core requisition + application flow first, then AI ranking, then email automation, then video screening, then bulk management. Each sprint was functional and used by the team before the next began.
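
As a sketch of the SendGrid configuration above, the function below builds a request body in the shape of SendGrid's v3 Mail Send API, where mail_settings.sandbox_mode.enable is the sandbox switch. The two environment variable names are from this write-up. The buildMailBody helper, the Env type, and passing the environment in as a parameter are assumptions. How HireDesk wires the flag into its actual send call is not shown here.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// Subset of the SendGrid v3 Mail Send request body used in this sketch.
interface MailBody {
  personalizations: { to: { email: string }[] }[];
  from: { email: string };
  subject: string;
  content: { type: string; value: string }[];
  mail_settings: { sandbox_mode: { enable: boolean } };
}

// Hypothetical helper: same code path for staging and production,
// with delivery controlled only by SENDGRID_SANDBOX_MODE.
function buildMailBody(env: Env, to: string, subject: string, text: string): MailBody {
  const from = env["SENDGRID_FROM_EMAIL"];
  if (!from) {
    // Fail fast rather than send from an unverified or empty address.
    throw new Error("SENDGRID_FROM_EMAIL is not set");
  }
  return {
    personalizations: [{ to: [{ email: to }] }],
    from: { email: from },
    subject,
    content: [{ type: "text/plain", value: text }],
    // In sandbox mode SendGrid validates the request but does not deliver it.
    mail_settings: { sandbox_mode: { enable: env["SENDGRID_SANDBOX_MODE"] === "true" } },
  };
}
```

Treating any value other than the literal string "true" as off means a missing variable never silently enables the sandbox in production.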

Capability	How Demonstrated	Projects
Production AI/LLM Systems	Multi-agent KYB pipeline, client triage agent, GPT-4o candidate ranking — all running in production, not demos	Gandalf, Triage, HireDesk
Compliance & RegTech	FATF/AML KYB automation with schema-constrained outputs, audit trails, and RegTech API integration	Gandalf
AWS Serverless Architecture	Lambda, Step Functions, EventBridge, Bedrock, SageMaker, Secrets Manager, Glue ETL — all in production CloudFormation stacks	Triage Pipeline
Full-Stack Product Engineering	Next.js 16 App Router + Supabase + OpenAI + SendGrid, from DB schema to deployed UI, iteratively shipped	HireDesk
Data Engineering & ML Ops	End-to-end ML pipeline (ETL → model inference → LLM → notification), multilingual NLP, production data quality fixes	Triage Pipeline
Business × Technology Translation	Converted ambiguous business problems (KYB manual review, client churn, hiring chaos) into buildable, measurable systems	All projects
Product Thinking	Identified failure modes (alert fatigue, auto-rejection scope, mandatory booking link) through stakeholder conversations before writing code	HireDesk, Triage
Academic Research	Extracted publishable design patterns from a production system; ICAIL 2026 paper grounded entirely in real code and production data	Gandalf

AI & LLM	OpenAI GPT-4o · GPT-4o Mini · Claude 3.5 Haiku (AWS Bedrock) · XLM-RoBERTa (SageMaker) · Bedrock Embeddings
Cloud & Infra	AWS Lambda · AWS SAM · CloudFormation · AWS Glue · Amazon Redshift · Step Functions · EventBridge · Secrets Manager · Bedrock · Vercel
Backend	Python · TypeScript · Next.js 16 API Routes · psycopg2 · Supabase · PostgreSQL · SendGrid · n8n webhooks
Frontend	Next.js 16 App Router · React 19 · TypeScript · shadcn/ui · Tailwind CSS · Radix UI
Data & ML	pandas · AWS Glue ETL · Amazon Redshift DWH · Faker · Synthetic dataset generation
Compliance & Integrations	RegTech APIs (sanctions lists, corporate registries, fraud scoring) · Salesforce API · Google Sheets API · Aura API · Slack · n8n
Research & Writing	ICAIL 2026 academic paper · Business case / ROI documentation · Product requirements documentation
