The hard part of benefits is not the chatbot. It is reading contradictory documents correctly, proving it, and doing it cheaply enough to offer for free. This is what the day actually looks like.
Async by default, deep work protected, meetings to decide and not to update. We hire for judgment, then get out of the way.
Every answer has a cost. We treat tokens like a budget: prompt caching, context pruning, and a hard eye on cost per resolved question, because the answer has to be free to the member.
No single model for everything. We route each task to the cheapest model that can answer it and escalate to the frontier only when the question earns it. The router is its own research problem.
We fine-tune and post-train on real plan documents and real questions, graded by licensed experts. A general model does not know what a qualifying life event is, or which clause governs.
We push inference to the cheapest place it can run correctly and watch p50 and p95 like vital signs. A right answer that arrives too late is a wrong answer to the person waiting.
Every release clears the eval suite first. We would rather hold a launch than ship a confident wrong answer on someone's coverage. The bar is the whole job.
Each role owns one. Earlier-career or senior, if you can do the work, we want to talk.
No trivia. Real benefits, the way they arrive: priced to confuse, written to contradict, easy to get wrong. Work each one. You only see the next after you clear the last.
You priced a plan that can only lose, caught an answer the document never made, refused to guess when the sources fought each other, and then put your own reasoning on the line in writing. The last part is the one we cannot teach. We are strict about it for a reason: Show your work, or do not ship. Your answer is in the email draft we just opened. Hit send.
If you can do the work on this page, write us. Tell us what you would build.