Private Beta

Verified Qiskit.
Not guesses.

GrayGate runs your quantum code through simulation before you see it. If it doesn't pass, you don't get broken output.

Pass Rate: 73.8%
Requests: 21
Beta Users: 1
Tokens: 10k
THE PROBLEM

Quantum code is hard to trust

When your circuit compiles but produces garbage distributions, you've already lost an afternoon. Current AI tools make this worse, not better.

Bugs that run

A wrong gate doesn't throw an error. It runs, simulates, and gives you counts that look plausible until you realize they're nonsense. Debugging quantum logic is slow because the feedback loop is broken.
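To make this concrete, here is a minimal sketch of a bug that runs. It uses a tiny pure-Python two-qubit statevector simulator written only for this illustration (it is not Qiskit and not part of GrayGate), with bitstrings ordered like Qiskit's counts keys. Swapping the control and target of the CNOT raises no error and still yields a clean 50/50 split, just not the entangled one.

```python
import math

# Toy 2-qubit statevector simulator, for illustration only.
# Basis index = 2*q1 + q0, so keys read "q1 q0" like Qiskit's counts.

def apply_h(state, qubit):
    """Apply a Hadamard gate to `qubit` (0 or 1)."""
    s = 1 / math.sqrt(2)
    new = list(state)
    mask = 1 << qubit
    for i in range(4):
        if not i & mask:              # i has the target bit 0, j has it 1
            j = i | mask
            new[i] = s * (state[i] + state[j])
            new[j] = s * (state[i] - state[j])
    return new

def apply_cx(state, control, target):
    """Flip `target` on every basis state where `control` is 1."""
    new = list(state)
    for i in range(4):
        if i & (1 << control):
            new[i ^ (1 << target)] = state[i]
    return new

def probabilities(state):
    """Map bitstring -> probability, dropping zero-probability outcomes."""
    return {
        format(i, "02b"): round(abs(a) ** 2, 6)
        for i, a in enumerate(state)
        if abs(a) ** 2 > 1e-12
    }

def bell_attempt(control, target):
    state = [1.0, 0.0, 0.0, 0.0]      # start in |00>
    state = apply_h(state, 0)
    state = apply_cx(state, control, target)
    return probabilities(state)

intended = bell_attempt(control=0, target=1)
buggy = bell_attempt(control=1, target=0)   # control and target swapped

print(intended)   # {'00': 0.5, '11': 0.5}
print(buggy)      # {'00': 0.5, '01': 0.5}
```

Both results look like a fair coin flip at a glance. Only a check against the expected correlations tells them apart.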

Stale training data

Qiskit 1.0 broke half the tutorials online. LLMs trained on 2021 examples still suggest the removed execute() function instead of backend.run(). The API moves faster than model weights update.
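A stale import like this can be caught without running anything. The sketch below is one possible approach, using only Python's standard `ast` module; it is not GrayGate's actual implementation, and the function name and the list of legacy names are illustrative assumptions that are far from exhaustive.

```python
import ast

# Names that could no longer be imported from the top-level `qiskit`
# package as of Qiskit 1.0. Illustrative, not a complete list.
LEGACY_QISKIT_NAMES = {"execute", "Aer"}

def find_legacy_imports(source: str) -> list[str]:
    """Return one message per `from qiskit import <legacy name>`."""
    problems = []
    tree = ast.parse(source)   # raises SyntaxError if the code does not parse
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "qiskit":
            for alias in node.names:
                if alias.name in LEGACY_QISKIT_NAMES:
                    problems.append(
                        f"line {node.lineno}: 'from qiskit import "
                        f"{alias.name}' is not available in Qiskit 1.0"
                    )
    return problems

old_style = """from qiskit import QuantumCircuit, Aer, execute
qc = QuantumCircuit(1)
"""

new_style = """from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator
"""

old_problems = find_legacy_imports(old_style)
new_problems = find_legacy_imports(new_style)
print(len(old_problems))   # 2
print(new_problems)        # []
```

A check like this only sees import statements. It says nothing about whether the circuit is correct, which is why execution still matters.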

No execution check

ChatGPT predicts the next likely token. It doesn't run your circuit. It doesn't know if the output compiles, let alone if the simulation produces valid Bell state correlations.

THE ARCHITECTURE

A 10-stage reliability pipeline

GrayGate wraps code generation in retrieval, planning, and two verification gates. Code only ships if simulation passes.

Pipeline Flow

1. Input Normalization → UserRequest
2. Intent Parsing → IntentSpec
RETRY LOOP (Stages 3-9)
3A. Query Planning → RetrievalPlan
3B. Retrieval → ContextBundle
4. Planning → ExecutionPlan
5. Code Generation → GeneratedOutput
6. Static Gate
7. Runtime Gate
8. Failure Analysis
9. Retry Decision
10. Output Assembly → FinalOutput + Report
✓ VERIFIED
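The control flow of the retry loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not GrayGate's code: `GateResult`, `run_retry_loop`, the callables, and `max_attempts` are all hypothetical names, and retrieval, planning, and generation are collapsed into a single `generate` call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class GateResult:
    passed: bool
    detail: str = ""

def run_retry_loop(
    generate: Callable[[Optional[str]], str],
    static_gate: Callable[[str], GateResult],
    runtime_gate: Callable[[str], GateResult],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return code that passed both gates, or None if every attempt failed."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)          # retrieval, planning, generation
        result = static_gate(code)         # static gate
        if result.passed:
            result = runtime_gate(code)    # runtime gate, only if static passed
        if result.passed:
            return code                    # ready for output assembly
        feedback = result.detail           # failure analysis feeds the retry
    return None                            # attempts exhausted: ship nothing

# Stub components: the first attempt has a syntax error, the second is fine.
attempts = iter(["qc.h(0", "qc.h(0)"])

def fake_generate(feedback):
    return next(attempts)

def fake_static(code):
    try:
        compile(code, "<generated>", "exec")   # parses only, does not execute
        return GateResult(True)
    except SyntaxError as exc:
        return GateResult(False, f"syntax error: {exc.msg}")

def fake_runtime(code):
    return GateResult(True)

verified = run_retry_loop(fake_generate, fake_static, fake_runtime)
print(verified)   # qc.h(0)
```

Returning `None` when attempts run out mirrors the rule that code only ships if verification passes.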

Example Output

bell_state.py ✓ Passed
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)
qc.measure_all()

sim = AerSimulator()
result = sim.run(qc).result()
counts = result.get_counts()

Verification Report

Static Gate Pass
Runtime Gate Pass
Acceptance Test Matched
Counts: {'00': 512, '11': 512}

Key insight: The runtime gate executes the generated code on Qiskit Aer and checks that the output matches the acceptance test defined during planning. If the distribution is wrong, no code is returned.
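One way such an acceptance test could be expressed is as a distance between measured counts and the expected distribution. The sketch below is an assumption about what a check of this kind might look like, not GrayGate's actual test: the function names and the 0.1 tolerance are hypothetical. A tolerance is needed because real shot counts rarely split exactly evenly.

```python
def total_variation_distance(counts: dict, expected: dict) -> float:
    """Half the L1 distance between observed frequencies and expected probabilities."""
    shots = sum(counts.values())
    if shots == 0:
        raise ValueError("counts must contain at least one shot")
    keys = set(counts) | set(expected)
    return 0.5 * sum(
        abs(counts.get(k, 0) / shots - expected.get(k, 0.0)) for k in keys
    )

def passes_acceptance(counts: dict, expected: dict, tolerance: float = 0.1) -> bool:
    """True if the measured distribution is within `tolerance` of the expected one."""
    return total_variation_distance(counts, expected) <= tolerance

# Expected outcome probabilities for a two-qubit Bell state.
BELL = {"00": 0.5, "11": 0.5}

good = passes_acceptance({"00": 512, "11": 512}, BELL)
bad = passes_acceptance({"00": 530, "01": 494}, BELL)
print(good)   # True
print(bad)    # False
```

The second set of counts is a perfectly plausible-looking 50/50 split, but its mass sits on the wrong bitstrings, so it fails.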

BENCHMARKS

Qiskit-HumanEval-Hard

151 challenging quantum programming tasks. GrayGate uses Gemini 3.0 Flash as its base model, then wraps it in verification. The wrapper more than doubles the pass rate.

Pass Rate Comparison

GrayGate (w/ Gemini 3.0 Flash) 73.8%
141 / 151
Gemini 3.0 Pro Preview 51.66%
78 / 151
Gemini 3.0 Flash 46.36%
70 / 151
Kimi K2-Thinking 33.11%
50 / 151
DeepSeek Reasoner 27.15%
41 / 151

Development Status

Active development

GrayGate improves weekly. Architecture and retrieval systems are under constant iteration.

Fine-tuning pipeline

Building infrastructure to train Qiskit-specialized models. Current results use off-the-shelf Gemini.

Autonomous research

Long-term: autonomous quantum algorithm research and evaluation systems.

These benchmarks reflect the current state. We're transparent about what works and what we're building.

Same base model. 2× the results.

The verification loop is the difference.

TEAM

Founder

Building tools for the next decade of quantum development.

Wyatt Greene

Founder

Building the verification engine and developer experience.

Longmont, CO

Design Partners

Looking for labs, startups, and educators to validate workflows.