
DeepEval Quality Gates: How Spectr Prevents Bad Tests from Shipping

A behind-the-scenes look at how Spectr's eval pipeline scores AI-generated tests across five metrics before they ever reach your CI pipeline.

Spectr Engineering
May 2026

Every test Spectr generates is scored on five metrics before it ships. If the composite score falls below the threshold, the test is flagged, not silently merged into your suite.

The Five Metrics

Faithfulness, Coverage, Correctness, AssertionDensity, and AntiPattern avoidance each capture a different dimension of test quality.

  • Faithfulness — does the test actually test what the user story describes?
  • Coverage — what proportion of the described behaviour has at least one assertion?
  • Correctness — are the assertions logically sound given the code under test?
  • AssertionDensity — is there enough verification per line of test code?
  • AntiPattern — is the test free of anti-patterns such as magic sleeps, hardcoded selectors, and empty catch blocks?
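To make the composite score concrete, here is a minimal sketch of how five per-metric scores might be combined. This assumes each metric is normalized to 0–100 and weighted equally; the metric names and the equal weighting are illustrative assumptions, not Spectr's actual formula.

```python
# Hypothetical sketch: combine five metric scores (each 0-100) into
# one composite. Equal weights are an assumption for illustration;
# the real pipeline's weighting is not described in this post.
METRICS = (
    "faithfulness",
    "coverage",
    "correctness",
    "assertion_density",
    "anti_pattern",
)

def composite_score(scores: dict[str, float]) -> float:
    """Average the five metric scores, requiring all to be present."""
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metric scores: {missing}")
    return sum(scores[m] for m in METRICS) / len(METRICS)
```

A test that is faithful and correct but assertion-sparse would be pulled down by its AssertionDensity score rather than passing on syntax alone.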

Gate Levels

PASS (≥75) ships immediately. WARN (55–74) ships with a developer notification and is flagged in the PR comment. FAIL (<55) blocks the release gate until the test is revised or manually overridden.
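The threshold logic above maps directly to a small function; a sketch, using the cutoffs stated in this post (names are illustrative):

```python
# Gate levels as described above: PASS at 75 and up, WARN from
# 55 to 74, FAIL below 55.
def gate_level(score: float) -> str:
    if score >= 75:
        return "PASS"
    if score >= 55:
        return "WARN"
    return "FAIL"
```

Note that the boundaries are inclusive at the bottom of each band, so a score of exactly 75 ships and exactly 55 warns.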

Why This Matters

AI-generated tests can be syntactically correct and still provide zero quality signal. A test that asserts true === true will pass every time and protect nothing. The eval pipeline exists specifically to catch this class of silent failure.
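As an illustration of the class of failure described above, here is a hedged sketch of a tautological-assertion check. The regex patterns are illustrative only and are not Spectr's implementation; a real detector would work on a parsed AST rather than raw text.

```python
import re

# Illustrative patterns for assertions that pass unconditionally,
# e.g. `expect(true).toBe(true)` or `assert true === true` in a
# JavaScript test body. Not exhaustive; a sketch only.
TRIVIAL_PATTERNS = [
    re.compile(r"expect\(\s*true\s*\)\.toBe\(\s*true\s*\)"),
    re.compile(r"\btrue\s*===\s*true\b"),
]

def has_trivial_assertion(test_source: str) -> bool:
    """Return True if the test body matches a known tautology pattern."""
    return any(p.search(test_source) for p in TRIVIAL_PATTERNS)
```

A test flagged by a check like this provides no quality signal: it can never fail, so it can never protect the behaviour it claims to cover.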
