Technical Portfolio — 2024–2026

Joshua Dazas

Product Engineering · AI/ML Systems · Compliance · Business & Technology

This portfolio documents original systems I designed, built, and shipped at Ontop — a LATAM payroll and employer-of-record platform. The work spans production AI systems, data engineering pipelines, full-stack product development, and academic research. Each project began with a real business problem and ended with software running in production.

| Project | Domain | Scale / Outcome |
|---|---|---|
| Gandalf: KYB Compliance Agent | AI · Compliance · FinTech | Production LLM system; 133 jobs evaluated; ICAIL 2026 paper |
| AI Triage & Sentiment Pipeline | Data Engineering · ML Ops · CX | 1,205 clients scored weekly; full serverless AWS pipeline |
| HireDesk | Full-Stack · AI · People Ops | End-to-end hiring platform shipped and used in production |
| Synthetic Data Generator | Data Engineering · ML Tooling | Reusable OSS-grade dataset generator for AI experiments |
Project 01
AI · Compliance · FinTech · Production · ICAIL 2026

Gandalf — KYB Compliance Multi-Agent System

A production-grade multi-agent LLM system for automating Know Your Business (KYB) due diligence at a global fintech and payroll platform, with a peer-reviewed academic paper submitted to ICAIL 2026.

133 Production KYB Jobs · 43.6% Auto-Accepted · 56.4% Routed to Analyst · 4 Specialist Agents · 6 Reliability Patterns

Global payroll and employer-of-record platforms must verify the legal entities they onboard before processing cross-border payroll. This Know Your Business process — checking corporate registry data, beneficial ownership chains, sanctions exposure, and document authenticity — was performed entirely by manual compliance analysts. The process took days to weeks per entity, couldn't scale with growth, introduced inconsistency across analysts, and produced decisions without traceable evidence trails.

Regulatory frameworks (FATF, FinCEN AML/CTF guidelines) required that any automation maintain full decision traceability and remain defensible under audit. This made naive LLM automation dangerous: a hallucinated sanctions check or fabricated ownership detail would create legal liability. The challenge was to automate aggressively while maintaining regulatory defensibility.

Gandalf is a reliability-oriented multi-agent system that decomposes KYB into discrete stages, each handled by the most appropriate mechanism: deterministic logic for clear-cut cases, specialist LLM agents for domains requiring judgment, and external RegTech data sources for objective ground truth. No single "do everything" prompt — every component has a narrow, well-defined responsibility.

Layer 1 — Early Exit Gates

topLevel.py — Deterministic country risk check + industry risk check before any LLM invocation. Handles ~15–20% of cases with zero token cost. Output: rejected or manual_review with 1.0 confidence.

Layer 2 — Specialist Agent Ensemble (parallel)

Company Research Agent — Web research via Firecrawl; corporate registry lookups; business legitimacy signals.
Representative Research Agent — PEP screening, global sanctions checks, fraud scoring via RegTech API providers.
Shareholder Analysis Agent — Beneficial ownership chain traversal; UBO identification; circular ownership detection.
Document Reviewer Agent — OCR + R1–R5 rule evaluation on uploaded certificates, IDs, and ownership documents.

Layer 3 — Orchestrator

topLevel.py orchestration — Aggregates specialist outputs, resolves conflicts, applies confidence weighting. Each agent output is schema-validated before aggregation; validation failure is treated as a hard error, not silently ignored.

Layer 4 — Decision Gate & Audit Output

Schema-constrained final decision: accept | reject | manual_review, with confidence, evidence list, rule violations, and structured justification. Analyst override pathway (R3) logs the override rationale for the audit trail.
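
A minimal sketch of how the Layer 3 aggregation could feed the Layer 4 gate. The merge policy (unanimous, high-confidence accepts auto-accept; a confident reject rejects; everything else goes to an analyst), the 0.8 threshold, and the names `SpecialistResult` and `aggregate` are illustrative assumptions, not the rules used in Gandalf.

```python
from dataclasses import dataclass, field

VALID_DECISIONS = {"accept", "reject", "manual_review"}
ACCEPT_THRESHOLD = 0.8  # assumed cut-off, not a value from the production system


@dataclass
class SpecialistResult:
    agent: str
    decision: str
    confidence: float
    evidence: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Validation failure is a hard error, never silently ignored.
        if self.decision not in VALID_DECISIONS:
            raise ValueError(f"{self.agent}: unknown decision {self.decision!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"{self.agent}: confidence out of range")
        if not self.evidence:
            raise ValueError(f"{self.agent}: evidence list is empty")


def aggregate(results: list[SpecialistResult]) -> dict:
    """Merge specialist outputs into one decision with a combined evidence list."""
    if not results:
        raise ValueError("no specialist results to aggregate")
    for r in results:
        r.validate()

    evidence = [f"{r.agent}: {e}" for r in results for e in r.evidence]
    lowest = min(r.confidence for r in results)

    if all(r.decision == "accept" for r in results) and lowest >= ACCEPT_THRESHOLD:
        decision = "accept"
    elif any(r.decision == "reject" and r.confidence >= ACCEPT_THRESHOLD for r in results):
        decision = "reject"
    else:
        decision = "manual_review"  # disagreement or low confidence goes to an analyst

    return {"decision": decision, "confidence": lowest, "evidence": evidence}
```

Reporting the lowest specialist confidence is a deliberately conservative choice for this sketch; a weighted average would be an equally plausible reading of "confidence weighting".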

These six patterns form the core contribution of the ICAIL 2026 paper. Each addresses a specific failure mode common to production LLM compliance systems:

  1. Early Exit Gates. Deterministic checks before LLM invocation. Country and industry risk rules are coded explicitly — they require no LLM judgment and should never be delegated to one.
  2. Specialist Ensemble. Four narrow agents instead of one general agent. Each specialist has a well-scoped task, reducing context confusion and enabling independent validation.
  3. Schema-Constrained Outputs. Every LLM response is validated against a Pydantic schema. A hallucinated or malformed response fails validation and routes to manual review — it never silently passes through.
  4. RegTech Integration. Objective ground truth from external providers (global sanctions databases, state secretary corporate registries, fraud scoring APIs) replaces LLM knowledge for facts that must be current and auditable.
  5. Idempotency & Audit Trail. Every agent invocation is logged with inputs, outputs, timestamps, and decision rationale. Re-running the system on the same entity reproduces gate decisions exactly (the gates are deterministic), and every agent decision remains traceable through its logged inputs and outputs.
  6. Human Override Pathway. Analysts can override any system decision via a structured form that requires a justification code. Overrides are logged and reported separately from system decisions, preserving decision quality metrics.
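
To make pattern 5 concrete, here is a small sketch of an audit record keyed by a stable hash of the agent's inputs, so repeated runs on the same entity can be matched up in the log. The field names and the helpers `input_fingerprint` and `audit_record` are hypothetical, not Gandalf's actual log schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def input_fingerprint(inputs: dict) -> str:
    """SHA-256 over canonical JSON, so identical inputs always share one key."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(agent: str, inputs: dict, output: dict, rationale: str) -> dict:
    """Build one log entry for a single agent invocation."""
    return {
        "agent": agent,
        "input_hash": input_fingerprint(inputs),
        "inputs": inputs,
        "output": output,
        "rationale": rationale,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Sorting keys before hashing means two payloads that differ only in key order produce the same fingerprint.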
topLevel.py
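from typing import Literal

from pydantic import BaseModel, Field

# Deterministic early-exit gates: no LLM cost for clear cases.
# KYBDecision, PROHIBITED_COUNTRIES, HIGH_RISK_INDUSTRIES and run_agent_ensemble
# are defined elsewhere and omitted from this excerpt.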
def evaluate_entity(entity_data: dict) -> KYBDecision:

    # Gate 1: Jurisdiction risk — no LLM needed
    if entity_data['country'] in PROHIBITED_COUNTRIES:
        return KYBDecision(
            status='rejected',
            reason='Prohibited jurisdiction under AML policy',
            confidence=1.0,
            llm_used=False,
            audit_code='GATE_COUNTRY'
        )

    # Gate 2: Industry risk
    if entity_data['industry'] in HIGH_RISK_INDUSTRIES:
        return KYBDecision(
            status='manual_review',
            reason='High-risk industry classification',
            confidence=1.0,
            llm_used=False,
            audit_code='GATE_INDUSTRY'
        )

    return run_agent_ensemble(entity_data)


# Schema-constrained agent output: hallucinated or malformed responses fail validation
class AgentOutput(BaseModel):
    decision: Literal['accept', 'reject', 'manual_review']
    confidence: float = Field(ge=0.0, le=1.0)
    evidence: list[str] = Field(min_length=1)  # must be non-empty (Pydantic v2; v1 uses min_items)
    rule_violations: list[str]                 # R1-R5 violations found
    regtech_flags: list[str]                   # sanctions / fraud flags from external APIs
    requires_analyst_review: bool

# Any LLM response that fails this schema → manual_review (never silently passes)
| Metric | Value | Notes |
|---|---|---|
| Total cases processed | 133 KYB jobs | 61 corporate clients + 72 business contractors |
| Auto-accepted | 58 (43.6%) | Fully automated; no analyst touch required |
| Routed to manual review | 75 (56.4%) | Agent provided evidence summary to analyst |
| Avg. risk score | 2.76 / 5.0 | Confidence-weighted scoring across all agents |
| Top rejection signal | Document rules not satisfied | R1–R5 rule evaluation failures (Document Reviewer Agent) |
| Model | GPT-4 (OpenAI) | All agents used same model; specialist prompts differ |

"Gandalf: Architecting Multi-Agent Systems for Know Your Business Compliance in Global Financial Services"
Joshua Dazas & Felipe García — Submitted to ICAIL 2026 (International Conference on Artificial Intelligence and Law).
The paper introduces six reliability-oriented design patterns for production LLM compliance systems, grounded entirely in real implementation and production evaluation data. Theory derived from working code, not hypothetical architectures.

Product Document

Discovery → Design → Execution → Deployment
Phase 1 — Discovery

Problem Identification & Validation

The KYB problem was identified by observing the compliance analyst workflow directly. Key signals: average entity review took 3–5 business days; analysts repeatedly checked the same sources (corporate registry, sanctions list, Google for reputation) in the same sequence; rejections were almost always explainable by a small set of rule violations; the reasoning was formulaic but the volume was not.

  • Regulatory constraint discovery: Mapped FATF/FinCEN requirements to understand what could be automated. Finding: automation is permitted if decisions are traceable and evidence-backed. Full black-box LLM decisions are not compliant; schema-constrained outputs with source citations are.
  • Failure mode mapping: Identified LLM-specific risks before writing a line of code — hallucinated sanctions hits, fabricated corporate registry data, inconsistent beneficial ownership analysis. Each became a named design constraint.
  • Data source inventory: Catalogued every external data source analysts used. Categorized as: deterministic (sanctions lists, corporate registries — should never be LLM-generated) vs. judgment-requiring (reputation signals, document interpretation).
Phase 2 — Design

Architecture & Reliability Strategy

  • Agent decomposition: Broke the monolithic "review this entity" task into four specialist agents, each owning one domain. Key insight: specialist prompts dramatically outperform generalist ones on constrained compliance tasks.
  • Schema-first design: Defined the output schema before writing prompts. Every agent was built to produce a validated Pydantic model. This inverts the usual LLM approach (prompt first, output second) and eliminates a class of production bugs.
  • External data integration: Designed RegTech API integration as a non-negotiable. LLMs should never be the source of truth for sanctions lists or corporate registry data — these require current, auditable, API-verified data.
  • Early exit strategy: Identified all deterministic cases (prohibited countries, banned industries) and moved them before LLM invocation. This improves cost, latency, and consistency simultaneously.
  • Human-in-the-loop design: Built the override pathway into the initial design, not as an afterthought. Analysts needed to trust the system before it could replace their workflow — visible overrides with tracked rationale built that trust.
Phase 3 — Execution

Implementation Details

  • Stack: Python, OpenAI GPT-4, Firecrawl (web research), RegTech provider APIs (sanctions, corporate registries, fraud scoring — anonymized per NDA), Pydantic for schema validation, AWS for infrastructure.
  • Agent implementation pattern: Each agent = LLM + tool access + goal-directed system prompt + output schema. Agents are invoked by the orchestrator and never act autonomously. Independent agents (company research, representative research, document review) can run concurrently; steps that depend on earlier output run in sequence.
  • OCR pipeline: Document Reviewer Agent uses OCR to extract text from uploaded PDFs/images before LLM evaluation. R1–R5 rules are evaluated by the LLM against extracted text, not raw images.
  • Evaluation methodology: 133-case production dataset evaluated across Aug–Nov 2025. Justification type analysis (10 unique rejection reason categories). Timeline and volume distribution analysis for ICAIL paper metrics.
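
The orchestrator-driven concurrency described above can be sketched with a thread pool. The three agent functions are placeholders that stand in for the real LLM-backed agents, and `run_independent_agents` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


# Placeholder agents: each real agent would call an LLM with its own tools.
def company_research(entity: dict) -> dict:
    return {"agent": "company_research", "entity": entity["name"]}


def representative_research(entity: dict) -> dict:
    return {"agent": "representative_research", "entity": entity["name"]}


def document_review(entity: dict) -> dict:
    return {"agent": "document_review", "entity": entity["name"]}


INDEPENDENT_AGENTS = [company_research, representative_research, document_review]


def run_independent_agents(entity: dict) -> list[dict]:
    """Run agents with no mutual dependencies concurrently, in a fixed output order."""
    with ThreadPoolExecutor(max_workers=len(INDEPENDENT_AGENTS)) as pool:
        futures = [pool.submit(agent, entity) for agent in INDEPENDENT_AGENTS]
        # result() re-raises any exception from the agent, so failures stay visible.
        return [future.result() for future in futures]
```

Threads fit here because agent calls are dominated by network I/O; the orchestrator still decides what runs and when.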
Phase 4 — Deployment & Ops

Production Operations

  • Deployment: Python services on AWS. Integrated with existing Ontop compliance workflow via API.
  • Monitoring: Every agent invocation logged with inputs, outputs, and decision rationale. Decision quality tracked via manual review override rate — a rising override rate signals prompt drift or data quality issues.
  • Ongoing calibration: R1–R5 rules updated as regulatory guidance evolves. External API schemas versioned to prevent silent breaking changes from providers.
  • Academic documentation: Production metrics used directly in ICAIL 2026 paper. The system is the paper's empirical basis — no synthetic evaluation benchmarks.
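
A rough sketch of the override-rate signal used for monitoring. The record shape (`week`, `overridden`) and the 5-point tolerance are assumptions made for illustration.

```python
from collections import defaultdict


def weekly_override_rate(decisions: list[dict]) -> dict[str, float]:
    """Share of system decisions that an analyst overrode, per week label."""
    totals: dict[str, int] = defaultdict(int)
    overrides: dict[str, int] = defaultdict(int)
    for d in decisions:
        totals[d["week"]] += 1
        if d["overridden"]:
            overrides[d["week"]] += 1
    return {week: overrides[week] / totals[week] for week in sorted(totals)}


def is_rising(rates: dict[str, float], tolerance: float = 0.05) -> bool:
    """True when the latest week exceeds the previous one by more than the tolerance."""
    values = list(rates.values())
    return len(values) >= 2 and values[-1] - values[-2] > tolerance
```

A sustained rise would be the cue to inspect prompts and upstream data before trusting further auto-accepts.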
Project 02
Data Engineering · ML Ops · AWS Serverless · Production

AI Triage & Sentiment Analysis Pipeline

An end-to-end, fully serverless ML pipeline that turns raw customer support tickets into ranked churn-risk recommendations, delivered weekly to account managers via Slack.

1,205 Active Clients Scored · 938 Above Risk Threshold · 5 Lambda Functions · 2 Step Functions · Weekly Pipeline Cadence

Ontop serves 1,200+ active corporate clients across Latin America, each generating support tickets in Zendesk and an internal messaging platform (DIIO). With this volume, identifying which clients are at genuine risk of churn or operational escalation before a situation becomes a crisis required either a large account management team or an automated system. No existing tooling could surface the right clients at the right time.

The specific failure mode the business experienced: account managers would only become aware of a deteriorating client relationship when the client threatened to churn or escalated to senior leadership — at which point it was often too late for meaningful intervention. A weekly automated digest of the highest-risk, newest cases would give account managers actionable intelligence when there was still time to act.

Stage 1 — Glue ETL

AWS Glue Python jobs — Extract Zendesk tickets and DIIO conversations via API. Filter to external tickets only (is_external = 'true'). Land raw data into Redshift: external.zendesk__tickets_sentiment_analysis and external.diio__sentiment_analysis.

Stage 2 — Sentiment Scoring Lambda

XLM-RoBERTa via SageMaker — Three Lambdas: warm (keep endpoint alive), fetch (query Redshift for unscored tickets), score (invoke SageMaker endpoint per batch, parallel via Step Functions MaxConcurrency=10). Outputs to process_data.zendesk_sentiment and process_data.diio_sentiment.

Stage 3 — Issue Extractor Lambda

GPT-4o Mini + Bedrock Embeddings — Two Lambdas (batch-query + extract) via Step Functions. Extracts structured issue types from ticket text. Outputs to process_data.extracted_issues. Parallel execution across ticket batches (MaxConcurrency=10).

Stage 4 — Context Signals ETL

Salesforce + Redshift + Aura API — Aggregates 1,205 active client records with transaction health metrics, conversation volume (Aura, 4-week window), and L1 signals (sentiment + issues, 30-day window). Outputs one row per client to process_data.client_context_rules. Uses execute_values(page_size=200) for bulk upsert within Lambda timeout.

Stage 5 — Triage Agent Lambda

AWS Bedrock — Claude 3.5 Haiku (cross-region inference profile) — Reads top 70 clients by risk score from Redshift. For each client, invokes Claude with a structured prompt containing sentiment trends, issue categories, transaction health, and churn signals. Outputs schema-constrained JSON: urgency, reason_summary, recommended_action, confidence. Writes to process_data.triage_recommendations.

Stage 6 — Slack Digest Lambda

n8n webhook → Slack — Queries top 10 new clients from triage_recommendations (7-day dedup via slack_digest_log table). POSTs structured JSON payload to n8n webhook. n8n formats and delivers to account management Slack channel. Decoupled from core pipeline — Slack formatting changes don't require Lambda redeployment.
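
The digest selection in Stage 6 can be sketched as a filter plus a cap. Here `last_sent` stands in for the slack_digest_log table, and treating a client as eligible again once exactly seven days have passed is an assumption.

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(days=7)
DIGEST_SIZE = 10


def select_digest(recommendations: list[dict],
                  last_sent: dict[int, datetime],
                  now: datetime) -> list[dict]:
    """Pick the highest-risk clients that were not in a digest within the window."""
    fresh = [
        r for r in recommendations
        if r["client_id"] not in last_sent
        or now - last_sent[r["client_id"]] >= DEDUP_WINDOW
    ]
    fresh.sort(key=lambda r: r["risk_score"], reverse=True)
    return fresh[:DIGEST_SIZE]
```

In the pipeline itself this logic would more naturally live in the SQL that reads triage_recommendations; the Python form just makes the rule explicit.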

| CloudFormation Stack | Key Functions | Schedule (EventBridge) |
|---|---|---|
| sentiment-classifier-v2 | sentiment-warm-v2, sentiment-fetch-v2, sentiment-score-v2 | Mondays 01:00 UTC |
| issue-extractor-v1 | issue-batch-query-v1, issue-extract-v1 | Mondays 01:00 UTC |
| context-signals-etl | context-signals-etl | Mondays 03:00 UTC |
| triage-agent | triage-agent-v1 | Mondays 04:30 UTC |
| slack-digest | slack-digest-lambda | Mondays 06:00 UTC |
| Tier | Score Range | Client Count | % of Base | Recommended Response |
|---|---|---|---|---|
| Critical | 50+ | 6 | 0.5% | Immediate escalation to senior AM |
| High | 30–49 | 337 | 28% | Proactive outreach within 48 hours |
| Medium | 20–29 | 595 | 49% | Include in weekly digest, monitor |
| Low | < 20 | 267 | 22% | Routine check, no action needed |
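
The tier boundaries and headline counts can be checked with a few lines. `risk_tier` is an illustrative helper, and its handling of fractional scores is an assumption because the ranges above are stated for whole numbers.

```python
def risk_tier(score: float) -> str:
    """Map a risk score to its tier using the published lower bounds."""
    if score >= 50:
        return "Critical"
    if score >= 30:
        return "High"
    if score >= 20:
        return "Medium"
    return "Low"


# Client counts per tier, taken from the tier breakdown.
TIER_COUNTS = {"Critical": 6, "High": 337, "Medium": 595, "Low": 267}

total_clients = sum(TIER_COUNTS.values())                # 1,205 active clients
above_threshold = total_clients - TIER_COUNTS["Low"]     # 938 clients at score >= 20
high_and_up = TIER_COUNTS["Critical"] + TIER_COUNTS["High"]  # 343 clients at score >= 30
```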
triage-agent-lambda/lambda_function.py
# Cross-region inference profile required for all newer Claude models
# Direct model IDs are blocked by Bedrock; inference profiles are mandatory
BEDROCK_MODEL_ID = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
RISK_THRESHOLD = 20
MAX_CLIENTS   = 70   # Top 70 clients by risk score, weekly

def build_triage_prompt(client: dict) -> str:
    return f"""You are a customer success triage agent for a payroll platform.
Analyze this client and respond ONLY with valid JSON matching the schema below.

Client: {client['company_name']}
Risk Score: {client['risk_score']} / 100
Sentiment Trend (30d): {client['sentiment_trend']}
Top Issues: {client['top_issues']}
Transaction Volume: {client['transaction_count']} transactions (4-week window)
Active Conversations: {client['conversation_count']} (Aura, 4 weeks)
Churn Signals: {client['churn_signals']}

Required JSON schema:
{{
  "urgency": "critical | high | medium | low",
  "reason_summary": "1-2 sentences explaining root cause",
  "recommended_action": "specific next step for account manager",
  "confidence": 0.0 - 1.0
}}"""

# Fetch top at-risk clients ordered by risk score; the 7-day dedup for new
# entries is applied later, in the Slack digest stage
FETCH_QUERY = """
    SELECT c.client_id, c.company_name, c.top_issues,
           t.risk_score, t.sentiment_trend, t.transaction_count,
           t.conversation_count, t.churn_signals
    FROM   process_data.client_context_rules c
    JOIN   process_data.triage_recommendations t USING (client_id)
    WHERE  t.risk_score >= :threshold
    ORDER  BY t.risk_score DESC
    LIMIT  :max_clients
"""

Product Document

Discovery → Design → Execution → Deployment
Phase 1 — Discovery

Problem Identification & Scoping

The original problem statement was "we need to know which clients are unhappy." The discovery process refined this considerably over several conversations with the account management team.

  • V1 (Supabase) → V2 (Redshift) evolution: The first version landed data in Supabase. The discovery that all other company analytics ran on Redshift meant V1 created an isolated data silo. V2 was redesigned ground-up on Redshift as the single analytics layer — a key architectural correction discovered early enough to avoid technical debt.
  • Multilingual signal discovery: A critical early finding was that Ontop's support tickets mixed Spanish and English within the same conversation thread. Standard English-only sentiment models produced unreliable results. This drove the XLM-RoBERTa model selection — a deliberate technical decision from a business insight, not a generic default.
  • Context signals gap: Ticket sentiment alone proved insufficient. A client with negative tickets but healthy transaction volume needed different treatment than one with negative tickets and declining transactions. This gap drove the Salesforce + Aura context signals ETL as a separate stage.
  • Alert fatigue concern: Early stakeholder feedback on prototypes surfaced a concern about over-alerting. The 7-day dedup window and TOP 10 cap were direct responses to stated user preferences from account managers who feared a system they'd start ignoring.
Phase 2 — Design

Pipeline Design & Architecture Decisions

  • Redshift as single source of truth: All pipeline stages read/write to Redshift. Enables SQL-based auditing, BI tool connectivity, and pipeline observability without additional data movement or sync complexity.
  • Step Functions for orchestration: CloudFormation-managed state machines with DefinitionSubstitutions — no hardcoded ARNs. MaxConcurrency=10 for ticket processing. Design decision: Step Functions over Airflow or cron because it integrates natively with Lambda and provides built-in retry/error handling.
  • Decoupled notification layer: Lambda POSTs structured JSON to n8n webhook; n8n handles Slack formatting. Deliberate decoupling: the Slack message format can change without requiring Lambda redeployment or AWS-side credential management for the Slack API.
  • Weekly cadence: Daily was considered and rejected. Account managers can't meaningfully act on daily updates for 938 clients. Weekly digest at the start of the work week gives a planning horizon for proactive outreach.
Phase 3 — Execution

Implementation & Critical Bug Fixes

  • Key bug: is_external filter (lowercase string). Glue ETL was filtering on is_external = 'True' (Python boolean string) instead of 'true' (database string). Result: zero Zendesk tickets were processed. Fix: explicit lowercase string comparison. Lesson: verify filter values against actual database column values before declaring a pipeline working.
  • Key bug: context-signals timeout. Original executemany() upsert for 1,205 rows was timing out at the 900s Lambda limit. Fix: replaced with psycopg2.extras.execute_values(page_size=200). Lesson: batch insert patterns matter at scale; test with production-volume data.
  • Key bug: Bedrock model ID. Triage agent initially used direct model ID. Bedrock now requires cross-region inference profiles (us.* prefix) for all newer Claude models. IAM policy required two separate resource ARNs: one for the inference profile (with account ID) and one for the foundation model (without). Lesson: AWS Bedrock model access patterns change; always verify against current documentation.
  • Schema discovery: Workflow documentation initially used wrong column names for triage_recommendations. Fixed by querying the actual table schema and updating the Lambda JOIN logic to derive missing fields (risk_category via CASE statement; client_name via JOIN).
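
The context-signals timeout fix comes down to how many statements reach the database. psycopg2's `executemany()` issues one statement per row, while `execute_values()` packs up to `page_size` rows into each statement. The sketch below shows the paged form; the staging table and column names are hypothetical, and the upsert step itself is not shown.

```python
import math

# Hypothetical staging table and columns, used only for illustration.
INSERT_SQL = "INSERT INTO staging_client_context (client_id, risk_score) VALUES %s"


def to_rows(clients: list[dict]) -> list[tuple]:
    """Flatten client dicts into tuples in the column order of INSERT_SQL."""
    return [(c["client_id"], c["risk_score"]) for c in clients]


def statement_count(n_rows: int, page_size: int = 200) -> int:
    """Number of INSERT statements execute_values sends for n_rows."""
    return math.ceil(n_rows / page_size)


def bulk_insert(conn, clients: list[dict], page_size: int = 200) -> int:
    """Insert all rows in pages; `conn` is an open psycopg2 connection."""
    # Imported here so the module still loads where psycopg2 is not installed.
    from psycopg2.extras import execute_values

    rows = to_rows(clients)
    with conn.cursor() as cur:
        execute_values(cur, INSERT_SQL, rows, page_size=page_size)
    conn.commit()
    return len(rows)
```

For 1,205 rows at `page_size=200` that is 7 statements instead of 1,205 round trips.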
Phase 4 — Deployment & Operations

Go-Live Strategy & Ongoing Ops

  • Deployment tooling: AWS SAM (sam build && sam deploy) for all five stacks. All schedules deployed in DISABLED state, then enabled post-integration test. This allows safe incremental rollout without accidental cron execution during deployment.
  • Secrets management: All credentials stored in AWS Secrets Manager (Redshift, Salesforce, Aura, OpenAI, n8n webhook URL). No secrets in environment variables or code.
  • Observability: CloudWatch Logs for every Lambda invocation. Step Functions execution history for pipeline-level tracing. Redshift query history for data audit.
  • Known next step: Raise MAX_CLIENTS from 70 to 343 (threshold ≥ 30) to cover the full High tier. Current limit was a conservative go-live decision; production stability confirmed, ready to scale.
Project 03
Full-Stack · AI · People Ops · Production

HireDesk — AI-Powered Internal Hiring Platform

A purpose-built hiring platform for Ontop's People Ops team, with AI candidate ranking, automated email workflows, video screening, and bulk candidate management — shipped and used in production.

Next.js 16 (App Router) · GPT-4o (Candidate Ranking) · 4 Email Automations · SendGrid (Email Provider) · Vercel (Deployed On)

Ontop's People Ops team managed hiring across multiple open roles using a combination of spreadsheets, email threads, and manual Calendly coordination. The process was entirely manual and person-dependent.

Job Requisitions

Structured hiring briefs with AI-generated application form schemas. Required fields include job title, description, salary band, and a mandatory Calendly booking link (validated at form submission — no requisition can be published without one, ensuring interview emails always have a valid scheduling link). Requisition lifecycle: pending → form_generated → published → closed.
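
A small sketch of the requisition lifecycle as an explicit state machine with the booking-link check at creation. The strictly linear transitions, the https-only check, and the function names are assumptions; HireDesk itself is written in TypeScript.

```python
from urllib.parse import urlparse

# Assumed: each status can only move to the next one in the lifecycle.
TRANSITIONS = {
    "pending": {"form_generated"},
    "form_generated": {"published"},
    "published": {"closed"},
    "closed": set(),
}


def has_valid_booking_link(link) -> bool:
    """Accept only absolute https URLs (a simplification for this sketch)."""
    if not link:
        return False
    parsed = urlparse(link)
    return parsed.scheme == "https" and bool(parsed.netloc)


def create_requisition(title: str, booking_link: str) -> dict:
    """Reject the requisition at submission time if the booking link is unusable."""
    if not has_valid_booking_link(booking_link):
        raise ValueError("a valid booking link is required")
    return {"title": title, "booking_link": booking_link, "status": "pending"}


def advance(requisition: dict, new_status: str) -> dict:
    """Return a copy in the new status, or raise if the transition is not allowed."""
    current = requisition["status"]
    if new_status not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {new_status}")
    return {**requisition, "status": new_status}
```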

AI Candidate Ranking

On each application submission, GPT-4o evaluates the applicant's responses against the job description and outputs a structured ranking: Very High Fit / High Fit / Average / Low Fit, with a justification paragraph. Auto-rejection logic: only Low Fit candidates are auto-rejected (not Average — a deliberate product decision to give borderline candidates a chance at video screening).
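
The auto-rejection rule is narrow enough to state in code. This is an illustrative Python rendering with a hypothetical function name; the status strings match those used elsewhere in the platform.

```python
FIT_TIERS = ("Very High Fit", "High Fit", "Average", "Low Fit")


def status_after_ranking(fit: str) -> str:
    """Only Low Fit is auto-rejected; Average stays in the pipeline."""
    if fit not in FIT_TIERS:
        raise ValueError(f"unknown fit tier: {fit!r}")
    return "rejected" if fit == "Low Fit" else "application_received"
```

Raising on an unknown tier keeps an unexpected model output from being treated as either outcome.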

Automated Email Workflows

| Status Change | Email Triggered | Requirement / Condition |
|---|---|---|
| → hm_interview | Interview scheduling email with Calendly link | Requires booking_link on the requisition |
| → chro_interview | CHRO interview invitation | Requires CHRO_BOOKING_LINK environment variable |
| → rejected | Branded rejection email (Ontop copy, warm tone) | Fires on every status change to rejected, including bulk |
| video_requested | Video submission request with token URL | Via Vercel cron, 24 hours after application received |

Video Screening

A token-based video upload URL (/video/[token]) is generated on application creation and dispatched 24 hours later via cron. Upload-only (no browser recording) — MP4, WebM, MOV, AVI; max 50MB; 2–3 minutes; English only. Videos stored in Supabase Storage with signed URLs. The cron job checks idempotency before dispatch: skips the video request if the application status has changed from application_received.
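
The dispatch check the cron job performs can be sketched as a pure function. The field names `created_at` and `video_requested_at` are hypothetical, and the second guard (skip if a request was already sent) is an assumption added for illustration.

```python
from datetime import datetime, timedelta

VIDEO_DELAY = timedelta(hours=24)


def should_send_video_request(application: dict, now: datetime) -> bool:
    """Send only if the candidate is still untouched and the delay has elapsed."""
    if application["status"] != "application_received":
        return False  # status moved on (e.g. interview or rejection): skip
    if application.get("video_requested_at") is not None:
        return False  # assumed guard: a request already went out
    return now - application["created_at"] >= VIDEO_DELAY
```

Keeping the decision in a function with no side effects makes it easy to test and safe to call on every cron tick.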

Bulk Candidate Management

Multi-select checkboxes on the applications dashboard allow bulk status changes across any combination of candidates. Bulk rejection fires individual rejection emails for each selected candidate via Promise.allSettled() — non-blocking, with per-email error tracking. Requisition close flow: a "Close Requisition" button triggers a two-step confirmation modal that auto-rejects all non-hired candidates and sets requisition status to closed.

Google Sheets Sync

All status changes are synced to a connected Google Sheet in real time, giving hiring stakeholders a read-only pipeline view without requiring platform access.

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 16 App Router + React 19 + TypeScript | Server components, file-based routing, type safety |
| UI Library | shadcn/ui + Tailwind CSS + Radix | Accessible, composable component system |
| Database | Supabase (PostgreSQL) | Applications, requisitions, form_schemas, scheduled_jobs tables |
| Auth | Supabase Auth | Session-based auth for People Ops admins |
| AI | OpenAI GPT-4o | Candidate ranking + structured JSON output |
| Email | SendGrid | Interview scheduling, rejection, video request emails |
| File Storage | Supabase Storage + signed URLs | Video upload with expiring token-based access |
| Deployment | Vercel (serverless functions + cron) | Edge deployment, automatic scaling, cron job execution |
| Integrations | Google Sheets API | Real-time status sync for stakeholder reporting |
src/app/api/applications/bulk-status/route.ts
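import { NextResponse } from 'next/server'

// `supabase` and `sendRejectionEmail` come from project modules (imports omitted here).
export async function PATCH(request: Request) {
  const { ids, status } = await request.json()

  // Update all selected applications in a single statement
  const { data: apps, error } = await supabase
    .from('applications')
    .update({ status })
    .in('id', ids)
    .select('id, applicant_name, applicant_email, form_schema_id')

  // Without this guard, a failed update leaves `apps` null and apps.map throws
  if (error || !apps) {
    return NextResponse.json(
      { error: error?.message ?? 'Bulk update failed' },
      { status: 500 }
    )
  }

  // Only a change to 'rejected' sends email from this route
  const recipients = status === 'rejected' ? apps : []

  // Fire emails in parallel; allSettled records failures without throwing
  const results = await Promise.allSettled(
    recipients.map(async (app) => {
      const { data: schema } = await supabase
        .from('form_schemas')
        .select('job_title')
        .eq('id', app.form_schema_id)
        .single()

      // A missing schema is reported as a failed email with a clear message
      if (!schema) throw new Error(`No form schema for application ${app.id}`)

      return sendRejectionEmail(
        { applicant_name: app.applicant_name,
          applicant_email: app.applicant_email },
        schema.job_title
      )
    })
  )

  // Count against recipients so non-rejection updates report zero emails sent
  const failed = results.filter(r => r.status === 'rejected').length
  return NextResponse.json({
    updated:       apps.length,
    emails_sent:   recipients.length - failed,
    emails_failed: failed
  })
}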
src/lib/email/service.ts
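import sgMail from '@sendgrid/mail'

// Assumes sgMail.setApiKey(...) has been called once during startup (not shown).
export async function sendRejectionEmail(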
  applicant: { applicant_name: string; applicant_email: string },
  jobTitle: string
) {
  const msg = {
    to:      applicant.applicant_email,
    from:    process.env.SENDGRID_FROM_EMAIL || 'noreply@hiredesk.com',
    subject: `Update on your application for ${jobTitle}`,
    html: `
      <p>Hi ${applicant.applicant_name},</p>
      <p>Thank you for taking the time to go through our process and for the
      energy you put into your application. We truly appreciate the effort
      and the thoughtfulness you showed along the way.</p>
      <p>It was great getting to know you, and we're grateful you considered
      being part of Ontop.</p>
      <p>Wishing you the best in what's ahead.</p>
      <p><strong>The Ontop Team</strong></p>
      <hr/>
      <p style="color:#888;font-size:12px;">
        Questions? Contact us at hr@getontop.com
      </p>
    `,
  }
  await sgMail.send(msg)
}

Product Document

Discovery → Design → Execution → Deployment
Phase 1 — Discovery

User Research & Problem Framing

Discovery started from a simple request: "we need a way to handle applications." Several conversations with the People Ops team and hiring managers surfaced a more complete picture.

  • The actual workflow: Applications came in via a Google Form. Admins copy-pasted responses into a spreadsheet, manually emailed Calendly links to candidates they liked, and sent rejection emails on an ad-hoc basis. The "system" was entirely manual and person-dependent.
  • The hidden problem: Without a standardized ranking mechanism, different hiring managers had different mental models for "good candidate." This made it impossible to compare candidates across requisitions or track conversion rates by fit tier.
  • The CHRO bottleneck: A second interview stage involving the CHRO required a separate Calendly link (Vivian Forero's calendar), separate email copy, and was often forgotten in the manual process. This became a discrete automation target.
  • The "closing a role" pain: When a role filled, there was no way to cleanly close it — no mass rejection mechanism, no status indicator, and residual candidates still thought they were in the process. Discovered late in requirements gathering, became a specific feature (Requisition Close flow).
  • Auto-rejection scope decision: Early design assumed auto-rejecting Average candidates. Stakeholder review identified this was too aggressive — Average candidates should get video screening. Changed to: only auto-reject Low Fit. This is a business logic decision with meaningful hiring impact.
Phase 2 — Design

Architecture & Key Design Decisions

  • Status-driven email architecture: Rather than building individual email triggers per feature, designed a single status change model where each status transition can optionally fire an email. This makes adding future email triggers trivial and keeps email logic centralized.
  • Mandatory booking link at requisition creation: Initially the booking link was optional. The first email automation bug — sending interview emails to candidates without a Calendly link — drove this to a required field validated at form submission. Prevention over error handling.
  • Token-based video URL: Video submissions needed a public-facing URL (candidates aren't platform users) but secure storage. Token pattern: a UUID stored in the applications table, embedded in the URL. No auth required to upload; token expiry not implemented (deliberate simplicity for MVP).
  • Cron idempotency: The scheduled job processor checks application status before firing video requests. Without this, manually advancing a candidate to hm_interview then back to application_received would retrigger the video request. Idempotency prevents this.
  • Google Sheets sync over in-app reporting: Building a reporting dashboard was deprioritized. Google Sheets sync gave stakeholders immediate read-only visibility with zero new UI work — and they already knew how to use it.
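
The status-driven email model can be sketched as a lookup from new status to an email builder. The builder functions, the returned dict shape, and passing `env` explicitly (instead of reading the process environment) are illustrative choices, not HireDesk's TypeScript implementation.

```python
from typing import Callable, Mapping, Optional


def _hm_interview(app: dict, req: dict, env: Mapping[str, str]) -> dict:
    if not req.get("booking_link"):
        raise ValueError("requisition has no booking_link")
    return {"template": "hm_interview", "to": app["applicant_email"],
            "link": req["booking_link"]}


def _chro_interview(app: dict, req: dict, env: Mapping[str, str]) -> dict:
    link = env.get("CHRO_BOOKING_LINK")
    if not link:
        raise ValueError("CHRO_BOOKING_LINK is not set")
    return {"template": "chro_interview", "to": app["applicant_email"], "link": link}


def _rejected(app: dict, req: dict, env: Mapping[str, str]) -> dict:
    return {"template": "rejected", "to": app["applicant_email"]}


# One central table: adding an email trigger means adding one entry here.
EMAIL_BUILDERS: dict[str, Callable[[dict, dict, Mapping[str, str]], dict]] = {
    "hm_interview": _hm_interview,
    "chro_interview": _chro_interview,
    "rejected": _rejected,
}


def email_for_status(new_status: str, app: dict, req: dict,
                     env: Mapping[str, str]) -> Optional[dict]:
    """Return the email to send for this transition, or None if it is silent."""
    builder = EMAIL_BUILDERS.get(new_status)
    return builder(app, req, env) if builder else None
```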
Phase 3 — Execution

Implementation Details & Iterations

  • Stack rationale: Next.js 16 App Router chosen for server components (reduces client-side JS for data-heavy tables), Supabase for rapid iteration (built-in auth, storage, real-time), Vercel for frictionless deployment and native cron support.
  • Schema cascade design: Database uses full CASCADE deletes: job_requisitions → form_schemas → applications → scheduled_jobs. Deleting a requisition automatically cleans up the entire hierarchy. No orphaned records, no manual cleanup.
  • Rejection email — manual trigger pattern: For bulk sending rejection emails to already-rejected candidates (e.g., after closing a requisition), chose a curl-accessible API endpoint rather than a UI button. Simpler to build, less surface area for accidental trigger, admin-only access via session cookie.
  • CHRO interview integration: The CHRO booking link is an environment variable (CHRO_BOOKING_LINK) rather than a database field. Rationale: it changes infrequently, applies globally (not per-requisition), and doesn't need user-editable UI.
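
The cascade behaviour can be demonstrated with an in-memory SQLite database standing in for Supabase's PostgreSQL. The foreign-key column names `requisition_id` and `application_id` are assumptions; `form_schema_id` appears in the platform's own queries.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job_requisitions (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE form_schemas (
    id INTEGER PRIMARY KEY,
    requisition_id INTEGER NOT NULL REFERENCES job_requisitions(id) ON DELETE CASCADE
);
CREATE TABLE applications (
    id INTEGER PRIMARY KEY,
    form_schema_id INTEGER NOT NULL REFERENCES form_schemas(id) ON DELETE CASCADE
);
CREATE TABLE scheduled_jobs (
    id INTEGER PRIMARY KEY,
    application_id INTEGER NOT NULL REFERENCES applications(id) ON DELETE CASCADE
);
"""

TABLES = ("job_requisitions", "form_schemas", "applications", "scheduled_jobs")


def build_demo_db() -> sqlite3.Connection:
    """Create the four-level hierarchy with one row at each level."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO job_requisitions VALUES (1, 'Backend Engineer')")
    conn.execute("INSERT INTO form_schemas VALUES (1, 1)")
    conn.execute("INSERT INTO applications VALUES (1, 1)")
    conn.execute("INSERT INTO scheduled_jobs VALUES (1, 1)")
    return conn


def row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count rows per table (table names come from the fixed TABLES tuple)."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}
```

Deleting the single requisition row removes its form schema, application, and scheduled job in one statement.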
Phase 4 — Deployment & Ops

Production Operations

  • SendGrid configuration: SENDGRID_SANDBOX_MODE=true for staging (disables actual delivery without changing code). From address configured via SENDGRID_FROM_EMAIL env var. Sender verification required in SendGrid dashboard before production use.
  • Cron monitoring: Vercel cron executes /api/cron/process-scheduled-jobs. Cron logs available in Vercel dashboard. Idempotency checks ensure safe re-execution if cron fires unexpectedly.
  • Google Sheets sync: Requires service account credentials. Connection verified at startup; sync failures are logged but don't block status updates (non-critical path).
  • Iterative feature shipping: Platform was shipped incrementally. Core requisition + application flow first, then AI ranking, then email automation, then video screening, then bulk management. Each sprint was functional and used by the team before the next began.
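Two of the operational notes above, the SENDGRID_SANDBOX_MODE flag and the non-blocking Google Sheets sync, follow a pattern that is easy to sketch. The Python below is illustrative only: `sandbox_enabled`, `update_status`, and the `sync_to_sheet` callback are hypothetical names, and the real platform may read the flag and call the Sheets API differently.

```python
import logging
import os

logger = logging.getLogger("hiredesk.sketch")


def sandbox_enabled(env=os.environ) -> bool:
    """Read the staging flag; anything other than 'true' means real delivery."""
    return env.get("SENDGRID_SANDBOX_MODE", "").strip().lower() == "true"


def update_status(application: dict, new_status: str, sync_to_sheet) -> dict:
    """Apply a status change, then attempt the Sheets sync without blocking on it."""
    application["status"] = new_status  # critical path: always applied
    try:
        sync_to_sheet(application)  # non-critical path
    except Exception:
        # Log with traceback and carry on; the status change stays in place.
        logger.exception(
            "Sheets sync failed for application %s", application.get("id")
        )
    return application
```

The design choice is that a reporting outage should never prevent a recruiter from moving a candidate forward, so the sync error is recorded but not re-raised.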
Project 04
Tooling Data Engineering

Supporting Work & Tooling

Supplementary work that enabled or validated the larger projects above.

Synthetic Support Ticket Dataset Generator

Built to generate realistic synthetic customer support ticket datasets for training, evaluation, and baseline benchmarking of the triage pipeline. Fully configurable — schema, volume, category distributions, and sentiment patterns are all parameterized.

Output files: clients.csv (50 clients with Faker-generated profiles), tickets.csv (200 tickets with realistic category/severity distributions), conversations.csv (3–8 messages per ticket, LLM-augmented content), ticket_history.csv (~300 historical tickets for recurrence pattern simulation).
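A minimal sketch of a parameterized generator with the shapes listed above (50 clients, 200 tickets, 3 to 8 messages per ticket). To stay self-contained it uses only the Python standard library in place of Faker and LLM-augmented text, and it omits ticket_history.csv. The category names, weights, and column names are hypothetical placeholders, not the real schema.

```python
import csv
import io
import random


def generate_dataset(n_clients=50, n_tickets=200, category_weights=None, seed=0):
    """Generate clients, tickets, and conversations as lists of dict rows."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Hypothetical category distribution; the real one is configurable.
    weights = category_weights or {"billing": 0.5, "payments": 0.3, "onboarding": 0.2}
    categories = list(weights)

    clients = [
        {"client_id": i, "name": f"Client {i:03d}"} for i in range(1, n_clients + 1)
    ]
    tickets, conversations = [], []
    for ticket_id in range(1, n_tickets + 1):
        tickets.append({
            "ticket_id": ticket_id,
            "client_id": rng.choice(clients)["client_id"],
            "category": rng.choices(categories, weights=list(weights.values()))[0],
        })
        # 3 to 8 messages per ticket, inclusive on both ends.
        for message_no in range(1, rng.randint(3, 8) + 1):
            conversations.append({
                "ticket_id": ticket_id,
                "message_no": message_no,
                "text": f"placeholder message {message_no}",
            })
    return {"clients": clients, "tickets": tickets, "conversations": conversations}


def to_csv(rows) -> str:
    """Serialize a non-empty list of dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Writing `to_csv(data["tickets"])` to a file named tickets.csv would give a file of the same general shape as the one described, though without realistic names or message content.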

Why it matters: Without realistic synthetic data, testing the triage pipeline required waiting for real production data cycles. The generator enabled rapid iteration on the scoring model and issue extractor without touching production systems. It also served as a reusable artifact for the AI triage business case presentation.

AI Triage — Internal Business Case

A structured business case document quantifying the ROI of deploying the triage automation vs. expanding the account management headcount. Covered: cost modeling (Lambda + Bedrock + SageMaker vs. additional FTE salary), time-savings analysis (manual weekly review estimated at 8–12 hours per AM), risk assessment, and phased implementation roadmap. Designed for executive review and used as the basis for the build decision.
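The time-savings side of that comparison reduces to simple arithmetic. The sketch below is not the model used in the business case: it assumes 52 review weeks per year and that automation removes the whole manual review, and every cost input is a caller-supplied parameter because the document gives no figures for them.

```python
def annual_review_hours(weekly_hours: float, weeks_per_year: int = 52) -> float:
    """Hours one account manager spends on manual review per year."""
    return weekly_hours * weeks_per_year


def net_annual_saving(
    weekly_hours: float,
    n_account_managers: int,
    hourly_cost: float,
    annual_infra_cost: float,
    weeks_per_year: int = 52,
) -> float:
    """Labour cost of manual review minus the yearly cost of running the pipeline.

    All four inputs are hypothetical parameters supplied by the caller.
    """
    labour = (
        annual_review_hours(weekly_hours, weeks_per_year)
        * n_account_managers
        * hourly_cost
    )
    return labour - annual_infra_cost
```

With the 8 to 12 hour weekly estimate above, `annual_review_hours` gives 416 to 624 hours per account manager per year under the 52-week assumption.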

Business insight demonstrated: Building both the business case and the system reflects a pattern that runs through all four projects: identifying business value, designing the technical solution, and shipping it. None of these arrived as fully specified engineering tickets. Each began as an ambiguous problem and was shaped into a buildable, measurable system.

Capabilities & What This Work Demonstrates

These projects represent the full product engineering lifecycle — discovery, design, implementation, and deployment — across AI/ML systems, regulatory technology, full-stack development, and data engineering. All shipped to production, not prototypes.

Capability | How Demonstrated | Projects
Production AI/LLM Systems | Multi-agent KYB pipeline, client triage agent, GPT-4o candidate ranking; all running in production, not demos | Gandalf, Triage, HireDesk
Compliance & RegTech | FATF/AML KYB automation with schema-constrained outputs, audit trails, and RegTech API integration | Gandalf
AWS Serverless Architecture | Lambda, Step Functions, EventBridge, Bedrock, SageMaker, Secrets Manager, Glue ETL; all in production CloudFormation stacks | Triage Pipeline
Full-Stack Product Engineering | Next.js 16 App Router + Supabase + OpenAI + SendGrid, from DB schema to deployed UI, iteratively shipped | HireDesk
Data Engineering & ML Ops | End-to-end ML pipeline (ETL → model inference → LLM → notification), multilingual NLP, production data quality fixes | Triage Pipeline
Business × Technology Translation | Converted ambiguous business problems (KYB manual review, client churn, hiring chaos) into buildable, measurable systems | All projects
Product Thinking | Identified failure modes (alert fatigue, auto-rejection scope, mandatory booking link) through stakeholder conversations before writing code | HireDesk, Triage
Academic Research | Extracted publishable design patterns from a production system; ICAIL 2026 paper grounded entirely in real code and production data | Gandalf
AI & LLM: OpenAI GPT-4o · GPT-4o Mini · Claude 3.5 Haiku (AWS Bedrock) · XLM-RoBERTa (SageMaker) · Bedrock Embeddings
Cloud & Infra: AWS Lambda · AWS SAM · CloudFormation · AWS Glue · Amazon Redshift · Step Functions · EventBridge · Secrets Manager · Bedrock · Vercel
Backend: Python · TypeScript · Next.js 16 API Routes · psycopg2 · Supabase · PostgreSQL · SendGrid · n8n webhooks
Frontend: Next.js 16 App Router · React 19 · TypeScript · shadcn/ui · Tailwind CSS · Radix UI
Data & ML: pandas · AWS Glue ETL · Amazon Redshift DWH · Faker · Synthetic dataset generation
Compliance & Integrations: RegTech APIs (sanctions lists, corporate registries, fraud scoring) · Salesforce API · Google Sheets API · Aura API · Slack · n8n
Research & Writing: ICAIL 2026 academic paper · Business case / ROI documentation · Product requirements documentation