RL environment creation is becoming continuous QA
Why automating RL environment creation means building the loop around task generation.
Trust boundaries for coding agents: verifiers, artifacts, and network access.
Frontier coding agents caught cheating on long-horizon tasks: a leaderboard, a taxonomy of reward hacks, and what it means for benchmarks.
Hillclimbing — the practice of making a number go up for a capability that resists clean definition — is the core bottleneck on the path to AGI.