# Build vs Buy: The Real Cost of Building Your Own Adaptive Assessment Engine
When a product team decides to add adaptive assessment to their platform, the first question is build vs. buy. Engineering leads typically estimate 4-6 months of development for a "basic CAT engine." That estimate is technically correct and economically catastrophic.
## What Engineering Teams Underestimate
A Computerized Adaptive Testing (CAT) engine is not a recommendation algorithm with questions. It is a psychometric measurement system that must meet statistical standards developed over 50 years of educational and psychological measurement research. The engineering is the easy part. The psychometrics are where build projects fail.
**Component 1: IRT Parameter Estimation.** Every item in the assessment must have calibrated IRT parameters (typically a 3-parameter logistic model: difficulty, discrimination, and guessing). Calibrating a single item requires administering it to a minimum of 200 examinees and fitting the IRT model using marginal maximum likelihood estimation. For a 500-item bank, this means collecting 100,000 response records before the engine can operate adaptively.
Engineering estimate: "We'll use a simple Rasch model." Reality: Rasch (1PL) assumes all items have equal discrimination, which is empirically false for most content domains. When discrimination varies across items, a Rasch model misfits the data, which can bias ability estimates and make pass/fail decisions unreliable.
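To make the model difference concrete, here is a minimal sketch of the 3PL response function with Rasch as its special case, plus the calibration arithmetic from the paragraph above. The function names are illustrative, and the sketch assumes the plain logistic metric (no 1.7 scaling constant).

```python
import math

def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, a: discrimination, b: difficulty, c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_correct_rasch(theta: float, b: float) -> float:
    """Rasch (1PL) as the special case a = 1, c = 0 (assumed scaling)."""
    return p_correct_3pl(theta, a=1.0, b=b, c=0.0)

# Calibration volume implied by the text: 200 examinees per item, 500 items.
MIN_EXAMINEES_PER_ITEM = 200
BANK_SIZE = 500
required_responses = MIN_EXAMINEES_PER_ITEM * BANK_SIZE

print(required_responses)  # 100000
```

Two items with the same difficulty but different discrimination or guessing values give different probabilities at the same ability, which is exactly the variation the Rasch model cannot represent.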
**Component 2: Item Selection Algorithm.** The engine must select items that maximize information at the current ability estimate while satisfying constraints: content coverage, item exposure control, enemy item exclusion, and response time management. This is a constrained optimization problem, not a simple "pick the next hardest item" rule.
Engineering estimate: "Maximum Fisher information, one line of code." Reality: Maximum information without exposure control concentrates usage on a few items, so that 10% of the bank handles 90% of test sessions. Those items become known within weeks. Shadow test assembly or the Sympson-Hetter method adds 4-8 weeks of development and requires ongoing recalibration of the exposure parameters.
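The gap between "one line of code" and an exposure-controlled selector can be sketched as follows. This is a simplified illustration, not a production design: the `Item` fields, the `k` exposure parameter name, and `select_item` are assumptions made for the example, and content constraints and enemy-item rules are left out entirely. The exposure filter follows the Sympson-Hetter idea of administering a selected item only with probability `k` and otherwise setting it aside for that examinee.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: str
    a: float        # discrimination
    b: float        # difficulty
    c: float        # guessing
    k: float = 1.0  # exposure parameter in [0, 1] (hypothetical field)

def fisher_information(item: Item, theta: float) -> float:
    """Fisher information of a 3PL item at ability theta."""
    p = item.c + (1.0 - item.c) / (1.0 + math.exp(-item.a * (theta - item.b)))
    q = 1.0 - p
    return item.a ** 2 * (q / p) * ((p - item.c) / (1.0 - item.c)) ** 2

def select_item(pool, theta, unavailable, rng):
    """Return the most informative item that passes the exposure filter.

    `unavailable` holds ids already administered or rejected for this
    examinee; rejected ids are added to it. Returns None if nothing passes.
    """
    candidates = sorted(
        (it for it in pool if it.item_id not in unavailable),
        key=lambda it: fisher_information(it, theta),
        reverse=True,
    )
    for item in candidates:
        if rng.random() < item.k:
            return item
        unavailable.add(item.item_id)  # set aside for this examinee
    return None

pool = [
    Item("A", a=2.0, b=0.0, c=0.2, k=0.5),
    Item("B", a=1.0, b=0.0, c=0.2),
    Item("C", a=1.0, b=2.0, c=0.2),
]
chosen = select_item(pool, theta=0.0, unavailable=set(), rng=random.Random(42))
```

With `k = 1.0` for every item this reduces to plain maximum-information selection. The hard part the text refers to is choosing the `k` values, which are typically tuned by simulation and revisited as the bank and the examinee population change.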
**Component 3: Ability Estimation.** The engine must re-estimate the examinee's ability after each response using either maximum likelihood estimation (MLE) or expected a posteriori (EAP) estimation. MLE requires iterative numerical optimization (for example, Newton-Raphson) and EAP requires numerical integration over the ability scale. Each has edge cases: the MLE is undefined for all-correct or all-incorrect response strings because the likelihood has no finite maximum, and EAP requires a prior distribution that must be empirically justified.
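A minimal sketch of both points, under stated assumptions: EAP is computed on an evenly spaced grid with a standard normal prior (the prior, grid range, and number of points are illustrative choices, not recommendations), and the likelihood function shows why an all-correct response string gives MLE nothing finite to converge to.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, responses):
    """Likelihood of a 0/1 response string; items is a list of (a, b, c)."""
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_3pl(theta, a, b, c)
        value *= p if u == 1 else (1.0 - p)
    return value

def eap_estimate(items, responses, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate and posterior SD on an evenly spaced grid.

    Assumes a standard normal prior; in practice the prior needs
    empirical justification.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = [
        math.exp(-0.5 * t * t) * likelihood(t, items, responses) for t in grid
    ]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

items = [(1.2, 0.0, 0.2)] * 3
all_correct = [1, 1, 1]

# The likelihood keeps rising with theta, so MLE has no finite maximizer.
rising = [likelihood(t, items, all_correct) for t in (0.0, 2.0, 4.0)]

# EAP stays finite because the prior pulls the estimate back toward zero.
theta_hat, posterior_sd = eap_estimate(items, all_correct)
```

The same prior that keeps the EAP estimate finite also shrinks every estimate toward the prior mean, which is why the choice of prior has to be defended rather than assumed.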
**Component 4: Stopping Rules.** When has the engine collected enough information to make a reliable classification? The stopping rule must balance precision (don't stop too early) against efficiency (don't administer unnecessary items). Standard error thresholds, sequential probability ratio tests (SPRT), and minimum/maximum item constraints all interact.
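How these constraints interact can be sketched with the simplest variant, a standard error threshold bounded by minimum and maximum test lengths. The class name and all threshold values below are hypothetical placeholders, and SPRT-based classification rules are not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoppingRule:
    min_items: int = 10         # never stop before this many items
    max_items: int = 40         # always stop at this many items
    se_threshold: float = 0.30  # stop once the standard error is this small

    def should_stop(self, n_administered: int, standard_error: float) -> bool:
        """Apply the length bounds first, then the precision criterion."""
        if n_administered < self.min_items:
            return False
        if n_administered >= self.max_items:
            return True
        return standard_error <= self.se_threshold

rule = StoppingRule()
print(rule.should_stop(5, 0.10))   # False: below the minimum length
print(rule.should_stop(15, 0.25))  # True: precise enough
print(rule.should_stop(40, 0.90))  # True: maximum length reached
```

The order of the checks is itself a design decision: here the minimum length overrides precision, so a very consistent examinee still sees at least ten items.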
**Component 5: Scoring and Reporting.** Raw theta scores must be transformed to interpretable scales, mapped to proficiency levels, and reported with confidence intervals. Score reports must include domain-level sub-scores (which require separate estimation procedures) and must be comparable across test forms and time points.
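As a rough sketch of the first part of this step, the snippet below applies a linear transformation from theta to a reporting scale, attaches a confidence interval, and maps the score to a proficiency level. The slope, intercept, cut scores, and level names are invented for illustration; real values come out of scaling and standard-setting studies. Domain sub-scores and cross-form comparability are not covered.

```python
from bisect import bisect_right

# Hypothetical reporting scale: mean 500, 100 scale points per theta unit.
SLOPE = 100.0
INTERCEPT = 500.0

# Hypothetical cut scores and level names.
CUT_SCORES = [450, 550, 650]
LEVELS = ["Novice", "Developing", "Proficient", "Advanced"]

def to_scale_score(theta: float, se: float, z: float = 1.96):
    """Return (scale score, (lower, upper)) for an approximate 95% interval."""
    score = SLOPE * theta + INTERCEPT
    half_width = z * SLOPE * se
    return round(score), (round(score - half_width), round(score + half_width))

def proficiency_level(scale_score: int) -> str:
    """Map a scale score to a level; a score equal to a cut earns the higher level."""
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]

score, interval = to_scale_score(theta=0.5, se=0.3)
print(score, interval, proficiency_level(score))
# 550 (491, 609) Proficient
```

Note that the interval here spans two proficiency levels, which is the kind of result a score report has to communicate honestly.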
## The 3-Year Total Cost of Build

| Component | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| IRT engine development | $240K | - | - |
| Item bank development (500 items) | $180K | $60K | $60K |
| Psychometric calibration | $120K | $40K | $40K |
| Item exposure monitoring | $35K | $20K | $20K |
| DIF analysis and fairness review | $45K | $25K | $25K |
| Ongoing validation studies | - | $60K | $60K |
| Psychometrician staff (0.5 FTE) | $75K | $75K | $75K |
| Infrastructure and scaling | $40K | $25K | $25K |
| **Total** | **$735K** | **$305K** | **$305K** |
| **Cumulative** | **$735K** | **$1.04M** | **$1.345M** |

These figures assume a mid-market engineering cost base. For Bay Area teams, multiply by 1.4-1.8x.
## What Buy Provides
A production-grade adaptive assessment engine provides:
A typical enterprise adaptive testing API license costs $80,000-$180,000 per year depending on volume. Over 3 years that totals $240,000-$540,000, roughly 18-40% of the $1.345M cumulative build cost.
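The comparison can be checked with a few lines of arithmetic. The figures are taken directly from the cost table and the license range quoted in this article; only the variable names are new.

```python
from itertools import accumulate

# Yearly build totals from the cost table.
build_by_year = [735_000, 305_000, 305_000]
cumulative_build = list(accumulate(build_by_year))
build_total = cumulative_build[-1]

# Annual license range and horizon quoted in the text.
license_per_year = (80_000, 180_000)
years = 3
buy_low, buy_high = (fee * years for fee in license_per_year)

# Buy cost as a share of the 3-year build cost.
share_low = buy_low / build_total
share_high = buy_high / build_total

# Bay Area adjustment quoted in the text (1.4x to 1.8x).
bay_area_low = build_total * 1.4
bay_area_high = build_total * 1.8

print(cumulative_build)                           # [735000, 1040000, 1345000]
print(buy_low, buy_high)                          # 240000 540000
print(f"{share_low:.1%} to {share_high:.1%}")     # 17.8% to 40.1%
```

At the Bay Area multiplier the build total rises to roughly $1.9M-$2.4M, which pushes the license share lower still.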
## When Build Makes Sense
Build is justified only when:
For everyone else — platforms adding assessment as a feature, EdTech companies that need reliable measurement, enterprise L&D teams — the build decision destroys capital that should be spent on the product's actual differentiator.
**QLM's adaptive assessment API provides the full psychometric stack — IRT engine, item banking, adaptive delivery, and scoring — as an embeddable service.** Learn more at [quantumlearningmachines.com](https://quantumlearningmachines.com).