Guray Ozen

Guray Ozen is a Principal Compiler Engineer on the Machine Learning Compiler team at NVIDIA.

Current Projects

Guray currently works on programming models built on the MLIR compiler infrastructure:

CuTeDSL (CUTLASS Python DSL): A DSL for writing composable, peak-performance GPU kernels, tightly integrated with CUTLASS.
Python DSL Infra for MLIR: The infrastructure that powers CuTeDSL, which Guray designed and built. While CuTeDSL is the public face, the DSL Infra is the underlying framework, used internally as the frontend layer for several MLIR-based compiler projects.

At its core, it is a multi-stage programming system embedded in Python, targeting MLIR. This design draws on classical work in multi-stage programming, partial evaluation, and phase-driven typing, but is shaped by a single pragmatic goal: writing fast GPU kernels. Python’s expressiveness is exactly what you want when constructing a kernel — and exactly what you don’t want when executing one. MLIR Python DSL Infra gives you all of it at compile time, then discards it before a single instruction reaches the GPU.

CUDA Tile Compiler: Focused on productivity and portability, targeting kernel composition and performance tuning across ML and HPC workloads.
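
The multi-stage idea described for the Python DSL Infra can be illustrated with a toy sketch. This is not CuTeDSL or the MLIR Python bindings: `Builder`, `emit`, and `staged_dot` are hypothetical names invented for illustration, and plain strings stand in for real IR. It only shows the general pattern, assuming a build-time constant `n`: ordinary Python control flow runs while the program is being constructed, and only the emitted operations remain afterwards.

```python
# Toy illustration of multi-stage programming; hypothetical names throughout,
# not the API of CuTeDSL or MLIR.
from dataclasses import dataclass, field


@dataclass
class Builder:
    """Collects emitted operations; a stand-in for a real IR builder."""
    ops: list = field(default_factory=list)
    counter: int = 0

    def emit(self, opname, *operands):
        # Give each result a fresh name and record the operation as text.
        name = f"%{self.counter}"
        self.counter += 1
        self.ops.append(f"{name} = {opname} {', '.join(operands)}")
        return name


def staged_dot(b, x, y, n):
    """n is a Python int known at build time; x and y name runtime buffers."""
    acc = b.emit("const", "0.0")
    for i in range(n):  # Python loop: runs now, leaves no loop in the output
        xi = b.emit("load", x, str(i))
        yi = b.emit("load", y, str(i))
        prod = b.emit("mul", xi, yi)
        acc = b.emit("add", acc, prod)
    return acc


builder = Builder()
result = staged_dot(builder, "%x", "%y", n=2)
program = "\n".join(builder.ops)
print(program)
```

The printed program is a flat sequence of operations: the Python loop, the `Builder` object, and the function call have all done their work before anything resembling device code exists.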

Research Interest

His current focus is on optimizing compilers and programming languages for efficient GPU utilization in machine learning (ML) and high-performance computing (HPC). He has made key contributions to several production-grade compilers, including Clang, Flang, MLIR, IREE, and the NVIDIA HPC compilers (formerly PGI).

Previously, he was actively involved in language design for parallel programming models, such as OpenMP and OpenACC. He served as a voting member of the OpenMP Language Committee for NVIDIA and contributed extensively to the OpenACC language specification.

News

Dec 19, 2025	TileIR’s cuda-tile MLIR Dialect is now open source
Dec 01, 2025	TileIR compiler is released with CUDA 13.1
Oct 28, 2025	Talk on CuTeDSL, CUTLASS Python DSL Infrastructure at LLVM’25
Aug 18, 2025	Now an official maintainer of the NVVM and NVGPU dialects in MLIR!
May 13, 2025	CuTeDSL is released with CUTLASS 4.0

Work Experience

Compiler Engineer, ML Compilers, NVIDIA, Switzerland (2024 - Present)
Compiler Research Engineer, ML Compiler Systems Research, Google Research (DeepMind), Switzerland (2022 - 2024)
Compiler Engineer, NVIDIA HPC Compilers, NVIDIA, Germany (2018 - 2022)
Compiler Research Intern, NVIDIA, USA (2017)
Short-term Researcher, Advanced Compilers Group, IBM T.J. Watson Research Center, Yorktown Heights, USA (2016)
Research Assistant, Barcelona Supercomputing Center, Spain (2013 - 2017)
Software Engineer, Veripark, Akbank, Istanbul (2010 - 2012)

Education

PhD in Computer Architecture (Excellent Cum Laude), Universitat Politècnica de Catalunya (UPC), Spain (2018)
MSc High-Performance Computing, Universitat Politècnica de Catalunya (UPC), Spain (2014)
BSc Computer Science Engineering, Dokuz Eylul University, Turkey (2010)

Selected talks and publications

2025

LLVM

CuTeDSL, CUTLASS Python DSL Infrastructure

Guray Ozen

2025

EuroLLVM

Bringing NVIDIA Blackwell support to LLVM and MLIR

Guray Ozen

2025

2024

ICML

NVDSL: Simplifying Tensor Cores with Python-Driven MLIR Metaprogramming

Guray Ozen

In ESFOMO Workshop at ICML, 2024

@inproceedings{ozen2024nvdsl,
  title = {NVDSL: Simplifying Tensor Cores with Python-Driven MLIR Metaprogramming},
  author = {Ozen, Guray},
  booktitle = {ESFOMO Workshop at ICML},
  year = {2024},
}

EuroLLVM

Zero to Hero: Programming Nvidia Hopper Tensor Core with MLIR’s NVGPU Dialect

Guray Ozen

2024