Open enrollment hands an ordinary person a genuinely hard problem: dozens of plan combinations, a year of their own health they cannot yet see, and a vocabulary most insured adults cannot define. Then it gives them a deadline.
There is an objection worth granting up front, because it sounds right: people barely try. In Voya's 2024 survey, 91 percent of working Americans said they typically re-select last year's plan, and 49 percent spend under twenty minutes on the whole review, less time than many people give a restaurant booking. So maybe the fix is effort, or a sterner reminder from HR. I believed a version of that until I read the choice studies, which tested effort and menu length directly and found that neither moves the result much. Comprehension moves it. Plan choice is an optimization with more variables than a person can hold in their head, and we hand it, without tools, to the one party who has never seen the math. This paper walks through what that costs, who pays it, why the popular fixes backfire, and what a system built for the job has to look like.
A pay cut you can sign up for
The cleanest evidence in benefits research comes from one large US firm that let its 23,894 employees compose their own coverage: pick a deductible, pick a coinsurance rate, pick a copay tier, pick an out-of-pocket maximum. Forty-eight combinations. Every combination carried the same insurer, the same network, the same covered services. The only thing an employee could change was the financial plumbing, which gave the menu a property almost no real menu has: every choice could be scored objectively against every other choice, for every possible amount of care. Bhargava, Loewenstein and Sydnor published the score in the Quarterly Journal of Economics as Choose to Lose.
The pricing did something strange. Cutting the deductible from $1,000 to $750 cost an average of $528 in extra annual premium. The most that $250 of extra deductible coverage can ever return is $250. Anyone buying that upgrade accepted a loss of at least $278 a year before seeing a single doctor, and the rest of the low-deductible menu was priced the same way. In the language of economics, nearly every low-deductible plan was dominated: more expensive at every possible level of health spending, sick or healthy, lucky or not.
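The dominance arithmetic can be checked mechanically. The sketch below makes a simplifying assumption: each plan is reduced to an annual premium and a deductible, and the member pays all spending below the deductible and nothing above it. The $2,000 base premium is a hypothetical placeholder that cancels out of the comparison. Only the $528 premium gap and the two deductibles come from the study.

```python
# Dominance check for two deductible-only plans (simplified: no coinsurance,
# copays, or out-of-pocket maximum). BASE_PREMIUM is a hypothetical placeholder.

def annual_cost(premium: float, deductible: float, spending: float) -> float:
    """Total yearly outlay: premium plus the spending paid below the deductible."""
    return premium + min(spending, deductible)


def min_extra_cost(plan_a, plan_b, spending_levels) -> float:
    """Smallest amount by which plan_a costs more than plan_b across the levels tested.

    A positive result means plan_b is cheaper at every spending level tested.
    """
    return min(annual_cost(*plan_a, s) - annual_cost(*plan_b, s) for s in spending_levels)


BASE_PREMIUM = 2_000.0                                # hypothetical
high_deductible = (BASE_PREMIUM, 1_000.0)             # (premium, deductible)
low_deductible = (BASE_PREMIUM + 528.0, 750.0)        # $528 more for $250 less deductible

grid = range(0, 50_001, 50)  # annual health spending from $0 to $50,000 in $50 steps
print(min_extra_cost(low_deductible, high_deductible, grid))  # 278.0
```

The cost gap here is piecewise linear, with kinks only at the two deductibles, so any grid that includes those points finds the true minimum. Adding coinsurance and out-of-pocket caps adds more kinks, but the same check applies.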
61 percent of employees picked a dominated plan anyway. Adjust for the tax treatment of premiums and the figure is still 55 percent. The average employee in a dominated plan overpaid by $372 a year, which is 24 percent of the premium they chose and about 2 percent of salary. Picture a 2 percent pay cut, self-renewing every autumn, exchanged for nothing.
And it ran downhill. Employees in the firm's three lowest salary bands, two-thirds of the workforce, chose dominated plans at 63 percent, against 38 percent for everyone above them. In the full regression, an employee earning under $20,000 was 24 percentage points more likely to choose a dominated plan than one earning over $100,000, all else equal, and because the dollar loss was similar up and down the pay scale, the loss as a share of salary was steeply regressive. The next year, lower earners were also less likely to switch out: 20 percent switched plans, against 28 percent of higher earners. The most regressive tax in benefits is collected through a dropdown menu.
Effort was never the variable
Three explanations sound plausible, and the authors tested all three. Maybe 48 options is too many. Maybe people will not spend the time. Maybe some people knowingly pay extra to escape a big deductible, because peace of mind is worth something. So the authors ran experiments with the menu cut to four plans in a single table, differing only in deductible and premium, where search cost was close to zero. 66 percent of subjects still chose a dominated plan, a rate above the employees' own. Raising the stakes did not repair it. And the field choices were too internally inconsistent to read as a taste for protection: many employees paid heavily to trim the deductible while accepting cost-sharing elsewhere that handed the exposure right back.
What did predict the error was comprehension, measured directly. Among experiment participants who scored high on a basic insurance literacy test, 22 percent chose dominated plans; among those who scored low, 45 percent. Score people instead on whether they understood a plan well enough to estimate what care would cost under it, and the gap widens to 8 percent against 47. Among participants who scored high on every measure, dominated choice nearly vanished. Dominated choice behaves like a reading problem: once people can actually read the plan, it largely disappears.
Now look at how the population reads. When Loewenstein's group quizzed insured adults on the four input variables of every plan comparison, 78 percent understood a deductible, 72 percent a copay, 55 percent an out-of-pocket maximum, and 34 percent coinsurance. Fourteen percent understood all four. Asked to compute the cost of a four-day hospital stay under a simple plan, 11 percent got it right, and the misses ran to thousands of dollars. The overconfidence is the sharp edge: 57 percent said they understood coinsurance, but only 34 percent could demonstrate it. KFF found the same shape in a national sample: 4 percent answered all ten of its basic insurance questions, and 16 percent could compute an out-of-network lab bill that required applying a coinsurance rate. These are the variables of the optimization. Most of the people asked to optimize cannot read them, and do not know they cannot.
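The calculation that 89 percent of respondents missed takes only a few lines. The sketch below assumes one common cost-sharing order: a flat copay, then everything up to the deductible, then the coinsurance rate on the remainder, with the total capped at the out-of-pocket maximum. Real plans differ on details such as whether copays count toward the deductible. The plan parameters are hypothetical and are not the ones used in the study.

```python
def member_share(bill: float, deductible: float, coinsurance: float,
                 oop_max: float, copay: float = 0.0) -> float:
    """What the member pays on one bill, assuming no prior spending this year.

    Assumed order (hypothetical): the copay and everything below the deductible
    are paid in full, coinsurance applies above the deductible, and the total
    is capped at the out-of-pocket maximum.
    """
    below_deductible = min(bill, deductible)
    above_deductible = max(bill - deductible, 0.0)
    share = copay + below_deductible + coinsurance * above_deductible
    return min(share, oop_max)


# Hypothetical plan: $1,000 deductible, 25% coinsurance, $6,000 out-of-pocket max.
print(member_share(8_000.0, 1_000.0, 0.25, 6_000.0))   # 2750.0
print(member_share(24_000.0, 1_000.0, 0.25, 6_000.0))  # 6000.0 (the cap binds)
```

The second line shows where intuition fails most often. Coinsurance alone would imply $6,750, but the out-of-pocket maximum stops the bill at $6,000.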
Not one firm, and it does not heal
One firm could be one badly priced menu. It is not. Liu and Sydnor took national data on what employers actually offer and found that where a firm pairs a high-deductible plan with a lower-deductible plan, the high-deductible option carries lower maximum possible spending 62 percent of the time, and strictly dominates its sibling at roughly half of firms, with typical savings above $500 a year. Sinaiko and Hirth found a University of Michigan plan that was dominated outright; a third of covered workers sat in it. The trap in Choose to Lose is the normal architecture of American employer menus, which is why we gave it a paper of its own.
The natural rebuttal is that people learn. Medicare Part D is the longest-running test of that hope, and it failed. Abaluck and Gruber found that only about 12 percent of seniors chose the plan that minimized their drug costs, that the average enrollee could have cut total spending by about 30 percent, and that welfare would have been roughly 27 percent higher under fully rational choice. Their follow-up tracked the same market through 2010 and found foregone savings growing, with little learning detectable at the individual or cohort level. Honesty requires the footnote: Ketcham, Kuminoff and Powers published a formal Comment disputing the welfare framing, arguing most choices can be reconciled with consumer theory under full information, and Abaluck and Gruber replied. We rest nothing on the contested welfare magnitudes. The descriptive facts survived the exchange: few enrollees minimize cost, and the gap did not close with experience.
Meanwhile the wrong choice, once made, hardens. Handel measured inertia at a large employer and found workers behaving as if forgoing $2,032 a year to avoid switching plans. Ericson showed insurers pricing against exactly that: Part D carriers enter markets cheap and ratchet premiums on locked-in enrollees, so an older plan runs about 10 percent pricier than an identical newer one. Set those findings beside Voya's 91 percent re-selection rate and the picture closes. A wrong choice made once stings. Re-stamped unexamined every autumn, it becomes an annuity the employee pays year after year, and the supply side prices that annuity in.
Why the obvious fixes point at each other
By this point two fixes look obvious: shorten the menu, and nudge people out of their defaults. Both have been tested, and the results should slow down anyone selling either one alone. Shortening the menu was the four-plan experiment: 66 percent still chose dominated plans, because the comparison itself, at any menu length, exceeded the reader. Nudging is worse. Handel modeled a policy that eliminates three-quarters of measured inertia. Individual choices improve, exactly as intended. Then the healthy re-sort themselves into cheaper plans, the sick are left concentrated in expensive ones, and adverse selection worsens enough to roughly double the existing 8.2 percent welfare loss in his setting. The inertia everyone wants to cure was accidentally holding the risk pool together.
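A toy calculation makes the mechanism concrete. This is not Handel's model or his data. The claims figures are invented, the generous plan's premium is assumed to be set at its enrollees' average claims, and the nudge is assumed to move only healthy members out.

```python
from statistics import mean

# Toy numbers, not Handel's data: expected annual claims of members who sit in
# the generous plan today, several of the healthy ones only because of inertia.
healthy = [1_000.0] * 4
sick = [9_000.0] * 4


def break_even_premium(enrollees):
    """Community-rated premium that just covers the plan's average claims."""
    return mean(enrollees)


before = break_even_premium(healthy + sick)
# A nudge that removes 75 percent of inertia: assume 3 of the 4 healthy members
# switch to a leaner plan that suits them better, while every sick member stays.
after = break_even_premium(healthy[3:] + sick)
print(before, after)  # 5000.0 7400.0
```

Each healthy switcher made a better individual choice. The people left behind now face a 48 percent higher premium. A full model would go on to ask who leaves at the new price; this toy stops at the first step.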
Take both findings seriously and the space of honest fixes gets narrow. A blanket nudge moves everyone the same direction and unravels the pool. A simplified menu still asks a person to run a comparison they cannot read. What survives is harder and more specific: solve the comparison for one person at a time, correctly, while watching what the solutions do to the pool in aggregate. That is an engineering claim, and it is the claim this paper exists to make.
This is a job for a model, and we can say why
Choosing a plan is a per-person decision, made under uncertainty, repeated every year, with a measurable outcome. In machine learning that shape has a name: it is close to a contextual bandit. Each recommendation is conditioned on a context vector for one specific person, their family, their conditions, their budget, their tolerance for downside, and the system learns from how each recommended choice turns out. The same problem shape drives the recommendations people already trust for trivial decisions. Here it is pointed at one that costs real money.
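To make the problem shape concrete, here is a minimal contextual bandit. We chose the simplest member of the family for illustration: epsilon-greedy over a handful of coarse context buckets, learning each plan's mean realized cost. It is not our production design. A real system would condition on richer features, for example with LinUCB or Thompson sampling. The context names, plan names, and cost figures are all hypothetical.

```python
import random
from collections import defaultdict


class PlanBandit:
    """Epsilon-greedy contextual bandit over a fixed plan menu.

    Contexts are coarse hashable buckets (hypothetical, e.g. "healthy_saver").
    Feedback is the realized annual cost of the plan actually chosen, so the
    learner only sees the outcome of the arm it pulled.
    """

    def __init__(self, plans, epsilon=0.1, seed=0):
        self.plans = list(plans)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.cost_sum = defaultdict(float)  # (context, plan) -> summed realized cost
        self.n = defaultdict(int)           # (context, plan) -> outcomes observed

    def estimated_cost(self, context, plan):
        count = self.n[(context, plan)]
        # Untried plans score zero cost, so each one gets tried at least once.
        return self.cost_sum[(context, plan)] / count if count else 0.0

    def best(self, context):
        """Greedy choice: the plan with the lowest estimated annual cost."""
        return min(self.plans, key=lambda plan: self.estimated_cost(context, plan))

    def recommend(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.plans)  # explore
        return self.best(context)               # exploit

    def update(self, context, plan, realized_cost):
        """Record one outcome label: what the chosen plan actually cost that year."""
        self.cost_sum[(context, plan)] += realized_cost
        self.n[(context, plan)] += 1


# Hypothetical mean annual cost (premium plus out-of-pocket) by context and plan.
TRUE_MEAN_COST = {
    "healthy_saver": {"HDHP": 3_000.0, "PPO": 4_500.0},
    "chronic_condition": {"HDHP": 9_000.0, "PPO": 6_500.0},
}


def simulate(rounds=2_000, seed=0):
    bandit = PlanBandit(["HDHP", "PPO"], epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    contexts = list(TRUE_MEAN_COST)
    for _ in range(rounds):
        context = env.choice(contexts)
        plan = bandit.recommend(context)
        # Realized cost: the mean plus bounded noise standing in for a lucky or unlucky year.
        cost = TRUE_MEAN_COST[context][plan] + env.uniform(-500.0, 500.0)
        bandit.update(context, plan, cost)
    return bandit


bandit = simulate()
print(bandit.best("healthy_saver"), bandit.best("chronic_condition"))  # HDHP PPO
```

The same menu produces different answers for different contexts, and each answer is learned from realized costs rather than from clicks.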
Be precise about where the model earns its keep, because part of this job needs no model at all. Screening out dominated plans is arithmetic; a checklist does it, and any enrollment tool that fails to is malpractice. The model matters on the plans that survive the screen, where the right answer genuinely depends on the life: a family expecting a second child, a managed chronic condition, a thin emergency fund that makes a high deductible a different proposition than it is for a saver. The key shift is the objective. A generic benefits wizard optimizes for getting you to the end of the form, and it succeeds; the form gets finished, the dominated plan gets chosen, everyone moves on. The right objective is to minimize a person's realized cost across the whole year, including the care they will probably need and the tail risk they cannot afford to carry. Optimize the year, not the click.
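The two stages described above, an arithmetic screen followed by a per-person objective, can be sketched as follows. Several pieces are assumptions we are making for illustration: the plan fields, the per-person spending scenarios and probabilities, and the risk term. The risk term here is a hypothetical penalty on out-of-pocket bills that exceed the member's cash buffer. The dominance screen checks a dense spending grid, which is exact here because every kink in these cost curves falls on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    premium: float
    deductible: float
    coinsurance: float  # member's share of spending above the deductible
    oop_max: float      # cap on out-of-pocket spending (premiums excluded)

    def out_of_pocket(self, spending):
        below = min(spending, self.deductible)
        above = max(spending - self.deductible, 0.0)
        return min(below + self.coinsurance * above, self.oop_max)

    def annual_cost(self, spending):
        return self.premium + self.out_of_pocket(spending)


def is_dominated(plan, menu, grid):
    """True if another plan costs no more at every grid point and less at one."""
    for other in menu:
        if other is plan:
            continue
        costs = [(other.annual_cost(s), plan.annual_cost(s)) for s in grid]
        if all(o <= p for o, p in costs) and any(o < p for o, p in costs):
            return True
    return False


def score(plan, scenarios, cash_buffer, risk_weight):
    """Expected annual cost plus a penalty on bills the member cannot cover.

    scenarios: (total_spending, probability) pairs from a per-person forecast.
    cash_buffer: savings available for out-of-pocket costs (hypothetical input).
    """
    expected = sum(p * plan.annual_cost(s) for s, p in scenarios)
    shortfall = sum(p * max(plan.out_of_pocket(s) - cash_buffer, 0.0) for s, p in scenarios)
    return expected + risk_weight * shortfall


def recommend(menu, scenarios, cash_buffer, risk_weight, grid=range(0, 100_001, 50)):
    survivors = [p for p in menu if not is_dominated(p, menu, grid)]  # arithmetic screen
    return min(survivors, key=lambda p: score(p, scenarios, cash_buffer, risk_weight))


# Hypothetical menu; "PPO-Plus" costs more than "PPO" at every spending level.
MENU = [
    Plan("HDHP", premium=1_200.0, deductible=3_000.0, coinsurance=0.2, oop_max=5_000.0),
    Plan("PPO", premium=3_000.0, deductible=500.0, coinsurance=0.2, oop_max=2_500.0),
    Plan("PPO-Plus", premium=3_600.0, deductible=750.0, coinsurance=0.2, oop_max=3_000.0),
]

saver = [(500.0, 0.7), (5_000.0, 0.25), (40_000.0, 0.05)]
chronic = [(8_000.0, 0.6), (20_000.0, 0.3), (60_000.0, 0.1)]
print(recommend(MENU, saver, cash_buffer=10_000.0, risk_weight=1.0).name)   # HDHP
print(recommend(MENU, chronic, cash_buffer=1_000.0, risk_weight=2.0).name)  # PPO
```

The screen removes PPO-Plus for everyone. The objective then splits the survivors by life: the saver's forecast favors the high-deductible plan, while the member with a managed condition and a thin buffer is better served by the PPO.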
The recommender that grades itself on real outcomes
Here is the part that compounds, and the reason this sits in a research paper rather than a product brochure. Every recommendation produces an outcome: a plan was chosen, the year happened, the costs arrived. Those outcomes are labels. Labeled outcomes train a better recommender, which produces better outcomes, which produce better labels. The loop closes on itself, and unlike engagement metrics, it cannot be gamed without getting caught by reality, because the label is what the year actually cost.
a16z has argued, correctly in our view, that raw data volume makes a weak moat; rows are a commodity and scale effects flatten quickly. The asset that is hard to replicate is outcome-labeled benefits data: the record of what was chosen, what it actually cost, and whether it held when someone got sick. We will say plainly that this is an argument we hold rather than a settled finding, and we intend to publish the evidence as our own numbers accumulate.
Where this can go wrong, and what would change our mind
Personalization can overfit to noisy signals, and it can quietly encode bias against the very people Choose to Lose found erring most. So a recommendation a person cannot understand is a recommendation we will not ship. Every suggestion has to be explainable to the human receiving it and checkable against the plan documents themselves, and the person who disagrees keeps the last word.
Three findings would make us retract this paper's framing, and we are watching for all three. If recommendations tuned to individual context cannot beat the dumb rule (screen out dominated plans, then take the cheapest survivor), then the optimization story is overbuilt and a checklist would have done. If the error rate in our own system tracks income the way dominated choices did, we will have automated the original injustice instead of repairing it. And if our recommendations re-sort risk the way Handel's modeled nudge did, improving each choice while degrading the pool that prices everyone's coverage, the welfare claim fails even where the individual claim holds. Those are the three numbers we intend to publish, whichever way they come out.
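The first two of those numbers can be computed from outcome records along the lines below. This is a sketch under stated assumptions. It defines "error" as missing the plan that turned out cheapest in hindsight, which is one possible definition; the paper does not fix one. It also assumes a year's claims can be re-priced under plans the member did not choose, which ignores how coverage changes the care people use. The record fields and sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One member-year, re-priced under each plan after the year is over."""
    income_band: str
    model_cost: float     # realized cost under the plan the model recommended
    baseline_cost: float  # cost under the rule: drop dominated plans, take the cheapest premium
    best_cost: float      # cheapest plan on the menu in hindsight


def beats_baseline_rate(outcomes):
    """Share of member-years where the model's pick cost strictly less than the rule's."""
    return sum(o.model_cost < o.baseline_cost for o in outcomes) / len(outcomes)


def error_rate_by_band(outcomes, tolerance=0.0):
    """Share of member-years per income band where the pick missed the hindsight best."""
    errors, totals = defaultdict(int), defaultdict(int)
    for o in outcomes:
        totals[o.income_band] += 1
        errors[o.income_band] += o.model_cost > o.best_cost + tolerance
    return {band: errors[band] / totals[band] for band in totals}


# Hypothetical outcomes, for illustration only.
history = [
    Outcome("under_40k", 5_000.0, 5_200.0, 5_000.0),
    Outcome("under_40k", 6_100.0, 6_000.0, 5_800.0),
    Outcome("over_100k", 3_000.0, 3_400.0, 3_000.0),
    Outcome("over_100k", 4_000.0, 4_000.0, 4_000.0),
]
print(beats_baseline_rate(history))  # 0.5
print(error_rate_by_band(history))   # {'under_40k': 0.5, 'over_100k': 0.0}
```

A persistent gap between bands, like the one in this toy history, is exactly the signal the second test is looking for. The tolerance parameter keeps trivial misses from counting as errors.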
An enrollment engine that designs the plan around one life, and gets better every time it learns how the year turned out.
This sits directly on top of the rest of our research, so the parts connect cleanly. The relay paper showed why no one can see the whole picture. The no-price paper showed how to hold a person's plan, needs, and real costs in one place. Personalized enrollment is what you do once you can: stop handing people a menu and start designing the choice around their life. Here is how it assembles.
Screen the menu before anyone chooses
Dominated options are arithmetic, and at roughly half of firms that offer a high-deductible plan alongside a lower-deductible one, the lower-deductible plan is dominated. Keel flags them first, so no member ever starts from a trapped choice set.
Build the context from a real life
Family, conditions, budget, risk tolerance, and likely care become a structured per-person context, drawn from the benefits graph, never a guess from a generic profile.
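A per-person context might be represented as a small typed record that flattens into numeric features for the recommender. Every field name below is hypothetical. The sketch shows the shape of the idea and is not the benefits graph's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemberContext:
    """Hypothetical per-person context; field names are illustrative only."""
    household_size: int
    expecting_child: bool
    chronic_conditions: tuple[str, ...] = ()
    emergency_fund: float = 0.0               # liquid savings available for medical bills
    monthly_premium_budget: float = 0.0
    risk_tolerance: float = 0.5               # 0 = avoid any large bill, 1 = indifferent to variance

    def __post_init__(self):
        if not 0.0 <= self.risk_tolerance <= 1.0:
            raise ValueError("risk_tolerance must be between 0 and 1")

    def features(self) -> dict[str, float]:
        """Flatten the context into numeric features a recommender can consume."""
        return {
            "household_size": float(self.household_size),
            "expecting_child": 1.0 if self.expecting_child else 0.0,
            "num_chronic_conditions": float(len(self.chronic_conditions)),
            "emergency_fund": self.emergency_fund,
            "monthly_premium_budget": self.monthly_premium_budget,
            "risk_tolerance": self.risk_tolerance,
        }


ctx = MemberContext(household_size=3, expecting_child=True,
                    chronic_conditions=("asthma",), emergency_fund=1_500.0)
print(ctx.features()["num_chronic_conditions"])  # 1.0
```

Making the record frozen and validated keeps a recommendation traceable to the exact context it was computed from. That matters when the recommendation has to be explained to the member later.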
Optimize the year, not the click
Every surviving plan is scored against the person's predicted annual cost and downside risk, instead of nudging them to finish a form. The objective is their outcome, full stop.
Close the outcome loop
What was chosen, what it cost, and whether it protected the member becomes labeled data. That data trains the next recommendation. The system grades itself on reality, not on engagement.
Fathom and Amanda put the answer in reach
Fathom makes the recommendation, grounded in the plan documents, and explains in plain terms why this plan beats the default. Amanda walks one person through it at the moment they decide, and helps them avoid the kind of overspend Choose to Lose measured: $372 a year, on average, for employees in dominated plans.
The mistake in Choose to Lose was a hard optimization handed to the one party least equipped to solve it. We are moving that work to a system built for it, publishing the three numbers that would falsify us, and keeping the person in charge of the answer.