Scientific AI Benchmarks

Benchmarks and evaluations for AI systems used in scientific research, biology, medicine, chemistry, and technical discovery workflows.

Technician in a genomics laboratory operating DNA sequencing equipment

AI
Health Tech
Science Tech

OpenAI GeneBench-Pro Shows Scientific AI Agents Still Need Supervision

OpenAI’s GeneBench-Pro benchmark tests whether AI agents can make messy judgment calls in genomics and translational biology. GPT-5.6 Sol leads the field, but a 31.5% top score shows scientific AI still needs expert supervision before it can be trusted with consequential research decisions.

July 2, 2026