Guray Ozen
Compiler Research at NVIDIA
Guray Ozen is a Principal Compiler Engineer on the Machine Learning Compiler team at NVIDIA.
Current Projects
Guray currently works on programming models built on the MLIR compiler infrastructure:
- CuTeDSL (CUTLASS Python DSL): A peak-performance DSL for composable, high-performance GPU kernels, tightly integrated with CUTLASS.
- Python DSL Infra for MLIR: The infrastructure I designed and built that powers CuTeDSL. While CuTeDSL is the public face, the DSL Infra is the underlying framework, used internally as the frontend layer for several MLIR-based compiler projects.
At its core, it is a multi-stage programming system embedded in Python, targeting MLIR. It cleanly separates two phases:
- Meta phase — full Python (dynamic typing, classes, metaprogramming) runs at compile time to configure and generate kernel code.
- Stage phase — only computation and control flow are captured as statically-typed IR for the GPU.
This design draws on classical work in multi-stage programming, partial evaluation, and phase-driven typing, but is shaped by a single pragmatic goal: writing fast GPU kernels. Python’s expressiveness is exactly what you want when constructing a kernel — and exactly what you don’t want when executing one. MLIR Python DSL Infra gives you all of it at compile time, then discards it before a single instruction reaches the GPU.
- CUDA Tile compiler: Focused on productivity and portability, targeting kernel composition and performance tuning across ML and HPC workloads.
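The meta/stage split described for the DSL Infra above can be illustrated with a toy sketch in plain Python. This is not the CuTeDSL or DSL Infra API: `Builder`, `Staged`, `dot_kernel`, `trace`, and the textual IR format are all hypothetical names invented for this illustration. The point is only that the Python loop and the `unroll` parameter run at trace time and vanish, while arithmetic on staged values is recorded as typed IR.

```python
from dataclasses import dataclass, field


@dataclass
class Builder:
    """Collects staged operations as lines of a toy, statically-typed IR."""
    ops: list = field(default_factory=list)
    counter: int = 0

    def emit(self, op, ty, *args):
        # Record one operation and return a handle to its result.
        name = f"%{self.counter}"
        self.counter += 1
        self.ops.append(f"{name} = {op} {', '.join(args)} : {ty}")
        return Staged(self, name, ty)


@dataclass
class Staged:
    """A value that exists only in the generated IR, not as a Python number."""
    builder: Builder
    name: str
    ty: str

    def __add__(self, other):
        return self.builder.emit("add", self.ty, self.name, other.name)

    def __mul__(self, other):
        return self.builder.emit("mul", self.ty, self.name, other.name)


def dot_kernel(xs, ys, unroll):
    # Meta phase: `unroll` is an ordinary Python int and this loop runs
    # while tracing. Neither appears in the recorded IR.
    acc = xs[0] * ys[0]
    for i in range(1, unroll):
        # Stage phase: `*` and `+` on Staged values are recorded, not computed.
        acc = acc + xs[i] * ys[i]
    return acc


def trace(unroll):
    """Run the meta phase once and return the staged IR as a list of lines."""
    b = Builder()
    xs = [Staged(b, f"%x{i}", "f32") for i in range(unroll)]
    ys = [Staged(b, f"%y{i}", "f32") for i in range(unroll)]
    result = dot_kernel(xs, ys, unroll)
    b.ops.append(f"return {result.name} : {result.ty}")
    return b.ops


if __name__ == "__main__":
    for line in trace(unroll=2):
        print(line)
```

With `unroll=2`, the recorded IR is four straight-line operations (two `mul`, one `add`, one `return`); the `for` loop has been fully evaluated away. A real system would emit MLIR rather than strings and would also capture staged control flow, which this sketch deliberately omits.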
Research Interests
His current focus centers on optimizing compilers and programming languages for GPU utilization in machine learning (ML) and high-performance computing (HPC). He has made key contributions to several production-grade compilers, including Clang, Flang, MLIR, IREE, and NVIDIA HPC (formerly PGI).
Previously, he was actively involved in language design for parallel programming models, such as OpenMP and OpenACC. He served as a voting member of the OpenMP Language Committee for NVIDIA and contributed extensively to the OpenACC language specification.
News
| Date | News |
|---|---|
| Dec 19, 2025 | TileIR’s cuda-tile MLIR dialect is now open source |
| Dec 01, 2025 | TileIR compiler is released with CUDA 13.1 |
| Oct 28, 2025 | Talk on CuTeDSL, CUTLASS Python DSL Infrastructure at LLVM’25 |
| Aug 18, 2025 | Now an official maintainer of the NVVM and NVGPU dialects in MLIR! |
| May 13, 2025 | CuTeDSL is released with CUTLASS 4.0 |
Work Experience
- Compiler Engineer, ML Compilers, NVIDIA, Switzerland (2024 - Present)
- Compiler Research Engineer, ML Compiler Systems Research, Google Research (DeepMind), Switzerland (2022 - 2024)
- Compiler Engineer, NVIDIA HPC Compilers, NVIDIA, Germany (2018 - 2022)
- Compiler Research Intern, NVIDIA, USA (2017)
- Short-term Researcher, Advanced Compilers Group, IBM T.J. Watson Research Center, Yorktown Heights, USA (2016)
- Research Assistant, Barcelona Supercomputing Center, Spain (2013 - 2017)
- Software Engineer, Veripark, Akbank, Istanbul (2010 - 2012)
Education
- PhD in Computer Architecture (Excellent Cum Laude), Universitat Politècnica de Catalunya (UPC), Spain (2018)
- MSc High-Performance Computing, Universitat Politècnica de Catalunya (UPC), Spain (2014)
- BSc Computer Science Engineering, Dokuz Eylul University, Turkey (2010)