How I Built
AIHRPilot
in a Weekend
A Portfolio Executive's walkthrough of building an HR policy engine without being a developer. The full stack, the exact prompts, the code, and the lessons.
Yuri Kruman
3x CHRO · AI Trainer (OpenAI, Meta, Microsoft) · Jun 2026
% of tickets auto-resolved
/month to run at scale
days to working prototype
saved per HRBP per week
The 30-Second Version
AIHRPilot is an HR policy intelligence engine that classifies inbound HR tickets, answers ~80% of them automatically from a company's own policy corpus, and routes the rest to the right human with a pre-drafted reply.
The Problem
HR teams burning 30-40 hours a week answering the same 8-12 recurring questions.
The Stack
Flask + scikit-learn (TF-IDF) + Claude API + a Lattice-inspired UI.
What It Doesn't Need
No vector database. No fine-tuning. No MLOps team.
Build Time
One weekend for v1. Three weeks to production-grade.
If you are a non-developer executive who has ever thought "I wish something like this existed for my team," this walkthrough is for you. Read it like a build log, not a tutorial. The point is not that you should clone AIHRPilot; the point is that you should build the equivalent for your recurring pain.
Why This Tool, and Why in a Weekend
I had been doing fractional CHRO work for a Fortune 500 client with a ~60-person HR team on Lattice. Every quarter we ran the same post-mortem on their ticketing metrics. Every quarter the same pattern emerged:
~200
inbound policy tickets per week across regions
~80%
were variations on the same 8-12 questions
10-15 min
per ticket: read, verify, draft reply, follow up
30-50 hrs
of senior HRBP time burned weekly on recurrence
The company's first instinct was classic: "Let's buy an AI HR copilot." They had demos lined up with four vendors, each quoting $60K-$180K ARR for a black-box tool that would still require custom policy ingestion and still would not integrate with Lattice the way they wanted.
My instinct was different. I'd seen enough vendor demos to know two things:
- 1 Nothing the vendors were offering was meaningfully harder than what Claude + a retrieval layer could already do.
- 2 The real work was not the model — it was the policy corpus, the classification taxonomy, and the UX that HR actually wanted to use. Vendors solve the first; the client has to solve the second two regardless.
So I proposed a two-week spike: let me build a prototype over a weekend, test it against the last 500 tickets, and if the accuracy was acceptable we'd deploy a v1 inside their existing HR workflow. If it failed, they could go back to vendor shopping two weeks later with better requirements in hand.
Augment, Don't Replace.
I was not replacing Lattice. I was building a layer on top of Lattice. Every ticket still lived in the system of record; AIHRPilot was the intelligence layer that read, classified, drafted, and routed. Replacing a system of record is an 18-month change-management project. Augmenting one is a weekend build.
The Stack (and Why Each Piece)
Click each layer to see why I chose it. If you're non-technical, the "why" matters more than the "what."
Language
Python
Largest ecosystem for NLP. What every AI lab uses. If you're building anything with machine learning or LLM APIs, Python is the default and the right one. Every library, every tutorial, every StackOverflow answer assumes Python.
Web Framework
Flask
Simplest possible Python web app. Avoid Django overhead. For an internal tool that serves 60 users, Flask gives you routing, templating and nothing else. That's the point. You don't need an ORM, admin panel, or migration system at v1.
Retrieval
scikit-learn TF-IDF
Works at this corpus size. No vector DB needed. Everybody reaches for Pinecone, Weaviate, or Chroma at the start. Don't.
Rule of thumb: Under ~500 pages? TF-IDF is fine. Over 500? Move to embeddings + vector DB. Over ~50,000? Hire a real ML engineer.
Reasoning
Claude API (Sonnet)
Best reasoning/writing quality at this price point. RAG gives you 90%+ of the benefit of fine-tuning with 10% of the engineering overhead. The policy corpus lives in a folder; Claude reads it at query time.
Why not fine-tuning? You lose general reasoning quality, gain marginal domain accuracy, and introduce a training/evaluation/retraining loop you do not want to maintain.
UI
HTML + Tailwind + HTMX
No React, no framework war. Lattice-inspired look. HTMX lets you build interactive UIs with server-rendered HTML and ~20 lines of JavaScript. A non-developer can actually read and modify HTMX code; React code requires you to know React. For this class of tool, always choose the simpler stack.
Hosting
Render + Cloudflare
~$20/month combined. Render handles the Python backend on a $7/month hobby plan. Cloudflare provides CDN, DDoS protection and SSL for free. Total cost to keep the whole system running at a mid-sized company's ticket volume: ~$40/month including Claude API calls.
The Six-Phase Build Sequence
Each phase is 2-10 hours. Click through the timeline below. Sequence them in order; don't try to parallelize until Phase 4.
Corpus Ingestion
Get every piece of HR policy content into a single, searchable format. I asked the client for everything they considered "authoritative policy": the PDF handbook, regional addenda, benefits summary plan descriptions, code of conduct, equity plan doc. About 180 pages total across 40 files.
The ingestion pipeline:
Extract text from PDFs (pypdf or pdfplumber)
Chunk each document into ~500-word passages with 50-word overlap
Tag each chunk with source document, section heading and effective date
Store as a JSON file for Phase 2
"Write a Python script that reads all PDF files in a folder, extracts the text, chunks each document into 500-word passages with 50-word overlap, and outputs a JSONL file with fields: id, source_doc, section_heading, effective_date, text. Use pdfplumber. Preserve section headings by detecting lines that are all caps or bold in the source PDF."
Don't skip this: The single biggest mistake non-developers make is skipping section-heading preservation. You need it for Phase 5 (citations).
What I'd Do Differently Today
Start with Claude Projects for the reasoning layer
Only move to direct API calls once I knew the prompt was stable. Projects lets you iterate on the system prompt and corpus in the same interface. Only move to API when you're ready to embed in the workflow.
Use Cursor or Claude Code from Phase 1
The ability to iterate on code, tests and deployment in one environment is a 3x speedup over the workflow I actually used (copy-pasting prompts into chat).
Instrument the accuracy log from day one
Not Phase 6. The data from the first two weeks of real usage is the single most valuable thing you get out of the build. Don't lose it.
Adapt This for YOUR Recurring Task
The architecture (corpus → retrieval → reasoning → routing → UI) is the template for any recurring cognitive task. Here are five adaptations built off this same pattern:
| Recurring Task | Corpus | Classification | Routing |
|---|---|---|---|
| Candidate screening | Job descriptions + resume rubric | Fit tier (1-4) | Auto-reject / auto-advance / human review |
| Vendor RFP scoring | Past RFPs + rubric | Vendor tier | Shortlist / review / reject |
| Board packet pre-read | Prior board packets + company context | Question clusters | Pre-drafted responses for CEO |
| Deal flow triage (VC/PE) | IC memo template + thesis doc | Fit score | Pass / diligence / follow-up |
| Policy/compliance Q&A | Policy corpus | Question categories | Auto-reply / human review / escalate |
Starter Prompts for Claude / Cursor
If you want to start today, these are the four prompts that got me from zero to a working prototype. Copy them directly. Substitute the bracketed placeholders for your domain.
"I have a folder of PDFs containing [TYPE OF DOCUMENTS, e.g., HR policies]. Write a Python script that extracts text from every PDF, chunks each document into 500-word passages with 50-word overlap, preserves section headings, and outputs a JSONL file with fields: id, source_doc, section_heading, effective_date, text. Use pdfplumber."
"Using the JSONL corpus from the previous script, write a retrieval function retrieve(question, k=5) that returns the top k most relevant chunks using TfidfVectorizer with (1,2) n-grams, English stop words, max_df=0.8, min_df=2. Persist the fitted vectorizer and vectors with joblib."
"I'm building a [YOUR TOOL TYPE] that will classify inbound [TICKETS / CANDIDATES / RFPs / DEALS] into one of these categories: [YOUR LIST]. Write a system prompt for Claude that: (a) classifies into exactly one category, (b) assigns a 0-1 confidence score, (c) generates a draft response, (d) cites the specific source passage, and (e) refuses to answer anything involving [YOUR REFUSAL TRIGGERS]. Output must be a valid JSON object with fields: category, confidence, draft_reply, citation."
"Build a Flask + HTMX web app with three pages: an inbox showing [TICKETS / CANDIDATES / DEALS] color-coded by confidence tier, a detail page with draft reply and source citation, and an admin page to upload source documents and adjust thresholds. Use Tailwind for styling. Feel: clean, white, [BRAND-ADJACENT] accents. No React."
What AIHRPilot Is Not
It is not a replacement for Lattice, Workday, SAP, BambooHR, or any HRIS. It is not a replacement for a general counsel, an employment attorney, or a compliance officer. It is not a "chatbot." It does not handle PII, PHI, or anything that would require SOC 2 Type II compliance without additional hardening.
What it is: a thin intelligence layer that sits on top of your real systems of record and removes the recurring cognitive tax of answering the same questions repeatedly. The value is narrow, deep and immediate. That narrowness is the point. The tools that actually ship and stick are the ones that solve one problem for one team. The tools that die in demo are the ones that try to be platforms.
The question is not
"Can I build this?"
The question is:
"What is the one repeating task in my week that, if I removed it, would free up 10+ hours for higher-leverage work?"
If you can answer that in one sentence, you have a build. If you can't, the first hour of your weekend is answering that question. The next 40-120 are building the thing.
This walkthrough is part of the Portfolio Leverage Co. Build Bench series. For the weekly operating brief, subscribe above. For the cohort where we build these tools together, apply here.