Old way (verbose, error-prone):
import cuda @cuda.kernel def vec_add(a, b, c): idx = cuda.thread_idx.x + cuda.block_idx.x * cuda.block_dim.x if idx < a.size: c[idx] = a[idx] + b[idx] vec_add[blocks, threads](a, b, c) cuda release news
Instead of writing raw PTX or using third-party wrappers, you can now write: Old way (verbose, error-prone): import cuda @cuda
NVIDIA just cut the official release of – and while hardware gets the headlines (looking at you, next-gen Rubin GPUs), this software update might be the real performance unlock for the next 18 months. The biggest quality-of-life shift: cuda
April 14, 2026 Reading time: 4 min
CUDA 13 Drops: Hopper Tuning, Python First-Class, and a Smarter Unified Memory Subtitle: What you need to know about NVIDIA’s biggest software leap since Ampere.
If you’re building HPC simulations, training LLMs, or optimizing edge inference, here’s what changed, what broke (sorry, legacy Kepler devs), and what to benchmark first. The biggest quality-of-life shift: cuda.compile and cuda.execute are now built into the core driver API.