Live Results

Qiskit-HumanEval-Hard Leaderboard

Qiskit-HumanEval-Hard comprises 151 challenging quantum programming tasks that test code generation, correctness, and execution. Models are evaluated on their ability to generate valid, runnable Qiskit code that passes simulation verification.
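As a rough illustration of what "generate code that passes verification" can mean in practice, here is a minimal sketch that assumes a HumanEval-style task format, where each task pairs generated source code with a `check` function. The names `run_task`, `candidate_source`, `check_source`, and `entry_point` are hypothetical, and the benchmark's actual harness is not described on this page. Plain Python stands in for Qiskit so the example runs without extra dependencies.

```python
def run_task(candidate_source: str, check_source: str, entry_point: str) -> bool:
    """Return True if the generated code defines `entry_point` and passes `check`.

    Assumption: tasks follow a HumanEval-style layout. A real harness would
    run this in a sandboxed subprocess with a timeout, not in-process.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)           # define the generated function
        exec(check_source, namespace)               # define check(candidate)
        namespace["check"](namespace[entry_point])  # raises on any failed assertion
    except Exception:
        # Syntax errors, import errors, runtime errors, and failed
        # assertions all count as a failed task in this sketch.
        return False
    return True


# Hypothetical task: return the outcomes expected from an ideal Bell state.
candidate = "def bell_outcomes():\n    return ['00', '11']\n"
check = "def check(candidate):\n    assert sorted(candidate()) == ['00', '11']\n"

passed = run_task(candidate, check, "bell_outcomes")
broken = run_task("def bell_outcomes(:\n", check, "bell_outcomes")
```

In a real Qiskit setting the `check` function would typically build or simulate the returned circuit and compare the result against a reference, but that step is left out here because the page does not specify it.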

Models Tested: 0
Total Tasks: 151
Version: Qiskit 2.3
Temperature: 0
LEADERBOARD

Model Rankings

No benchmark data available

Check back later for updated results.

ERROR ANALYSIS

Failure Modes

Understanding where models fail helps identify areas for improvement in quantum code generation.
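One way such failures could be grouped is by the stage at which the generated code breaks. The sketch below is an illustration only: the category names (`syntax_error`, `import_error`, `runtime_error`, `wrong_output`) and the `classify_failure` helper are hypothetical and are not the leaderboard's own taxonomy.

```python
from collections import Counter


def classify_failure(candidate_source: str, check_source: str, entry_point: str) -> str:
    """Label one attempt with a coarse, hypothetical failure category."""
    namespace: dict = {}
    try:
        compiled = compile(candidate_source, "<candidate>", "exec")
    except SyntaxError:
        return "syntax_error"       # code does not parse
    try:
        exec(compiled, namespace)
    except ImportError:
        return "import_error"       # e.g. a module or name missing in the pinned version
    except Exception:
        return "runtime_error"      # crashes while defining or setting up
    try:
        exec(check_source, namespace)
        namespace["check"](namespace[entry_point])
    except AssertionError:
        return "wrong_output"       # runs, but the result fails verification
    except Exception:
        return "runtime_error"      # crashes while being checked
    return "passed"


check = "def check(candidate):\n    assert candidate() == 2\n"
attempts = [
    "def f():\n    return 2\n",
    "def f():\n    return 1\n",
    "def f(:\n",
    "import not_a_real_module_xyz\n",
    "def f():\n    return 1 / 0\n",
]

# Tally categories across all attempts for a summary table.
tally = Counter(classify_failure(src, check, "f") for src in attempts)
```

A tally like this makes it easy to see whether most failures come from code that never runs or from code that runs but produces the wrong result.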