Guray Ozen

Compiler Research at NVIDIA


Guray Ozen is a Principal Compiler Engineer on the Machine Learning Compiler team at NVIDIA.

Current Projects

Guray currently works on programming models built on the MLIR compiler infrastructure:

  • CuTeDSL (CUTLASS Python DSL): A peak-performance DSL for composable, high-performance GPU kernels, tightly integrated with CUTLASS.
  • Python DSL Infra for MLIR: The infrastructure that Guray designed and built, and that powers CuTeDSL. While CuTeDSL is the public face, the DSL Infra is the underlying framework, used internally as the frontend layer for several MLIR-based compiler projects.

At its core, it is a multi-stage programming system embedded in Python, targeting MLIR. It cleanly separates two phases:

  • Meta phase — full Python (dynamic typing, classes, metaprogramming) runs at compile time to configure and generate kernel code.
  • Stage phase — only computation and control flow are captured as statically-typed IR for the GPU.

This design draws on classical work in multi-stage programming, partial evaluation, and phase-driven typing, but is shaped by a single pragmatic goal: writing fast GPU kernels. Python’s expressiveness is exactly what you want when constructing a kernel — and exactly what you don’t want when executing one. MLIR Python DSL Infra gives you all of it at compile time, then discards it before a single instruction reaches the GPU.
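The split between the two phases can be illustrated with a toy tracer. This is a minimal sketch of the general multi-stage idea only: `Builder`, `Staged`, and `make_dot_kernel` are hypothetical names invented for illustration, the text IR is a simplified stand-in for MLIR, and none of it reflects the actual CuTeDSL or DSL Infra API.

```python
from dataclasses import dataclass, field


@dataclass
class Builder:
    """Collects staged operations as lines of a toy, MLIR-like text IR."""
    ops: list = field(default_factory=list)
    counter: int = 0

    def emit(self, opname, operands, ty):
        # Allocate a fresh SSA-style name and record the operation.
        name = f"%{self.counter}"
        self.counter += 1
        self.ops.append(f"{name} = {opname} {', '.join(operands)} : {ty}")
        return Staged(self, name, ty)


@dataclass
class Staged:
    """A value that exists only in the stage phase: a name plus a static type."""
    builder: Builder
    name: str
    ty: str

    def _binary(self, opname, other):
        # Static typing is enforced while tracing, before any code is generated.
        if not isinstance(other, Staged) or other.ty != self.ty:
            raise TypeError(f"cannot {opname} {self.ty} with {other!r}")
        return self.builder.emit(opname, [self.name, other.name], self.ty)

    def __add__(self, other):
        return self._binary("add", other)

    def __mul__(self, other):
        return self._binary("mul", other)


def make_dot_kernel(n):
    """Meta phase: `n` is an ordinary Python int and the loop runs while tracing.

    Only the multiplies and adds on Staged values end up in the captured IR,
    so the Python loop is fully unrolled and leaves no trace in the output.
    """
    b = Builder()
    xs = [Staged(b, f"%x{i}", "f32") for i in range(n)]
    ys = [Staged(b, f"%y{i}", "f32") for i in range(n)]
    acc = xs[0] * ys[0]
    for i in range(1, n):
        acc = acc + xs[i] * ys[i]
    return b.ops, acc.name


if __name__ == "__main__":
    ir, result = make_dot_kernel(2)
    print("\n".join(ir))
    print("result:", result)
```

In this sketch the list comprehensions, the `range` loop, and the dataclasses all belong to the meta phase and vanish once tracing ends; what remains is a flat, statically typed sequence of operations, which is the part a real system would lower toward the GPU.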

  • CUDA Tile compiler: Focused on productivity and portability, targeting kernel composition and performance tuning across ML and HPC workloads.

Research Interests

His current focus centers on optimizing compilers and programming languages for GPU utilization in machine learning (ML) and high-performance computing (HPC). He has made key contributions to several production-grade compilers, including Clang, Flang, MLIR, IREE, and NVIDIA HPC (formerly PGI).

Previously, he was actively involved in language design for parallel programming models, such as OpenMP and OpenACC. He served as a voting member of the OpenMP Language Committee for NVIDIA and contributed extensively to the OpenACC language specification.

News

Dec 19, 2025 GitHub TileIR’s cuda-tile MLIR Dialect is now open source
Dec 01, 2025 NVIDIA TileIR compiler is released with CUDA 13.1
Oct 28, 2025 MLIR Talk on CuTeDSL, CUTLASS Python DSL Infrastructure at LLVM’25
Aug 18, 2025 MLIR Now an official maintainer of the NVVM and NVGPU dialects in MLIR!
May 13, 2025 NVIDIA CuTeDSL is released with CUTLASS 4.0

Work Experience

  • Compiler Engineer, ML Compilers, NVIDIA, Switzerland (2024 - Present)
  • Compiler Research Engineer, ML Compiler Systems Research, Google Research (DeepMind), Switzerland (2022 - 2024)
  • Compiler Engineer, NVIDIA HPC Compilers, NVIDIA, Germany (2018 - 2022)
  • Compiler Research Intern, NVIDIA, USA (2017)
  • Short-term Researcher, Advanced Compilers Group, IBM T.J. Watson Research Center, Yorktown Heights, USA (2016)
  • Research Assistant, Barcelona Supercomputing Center, Spain (2013 - 2017)
  • Software Engineer, Veripark, Akbank, Istanbul (2010 - 2012)

Education

  • PhD in Computer Architecture (Excellent Cum Laude), Universitat Politècnica de Catalunya (UPC), Spain (2018)
  • MSc High-Performance Computing, Universitat Politècnica de Catalunya (UPC), Spain (2014)
  • BSc Computer Science Engineering, Dokuz Eylul University, Turkey (2010)

Selected talks and publications

2025

  1. LLVM
    CuTeDSL, CUTLASS Python DSL Infrastructure
    Guray Ozen
    2025
  2. EuroLLVM
    Bringing NVIDIA Blackwell support to LLVM and MLIR
    Guray Ozen
    2025

2024

  1. ICML
    NVDSL: Simplifying Tensor Cores with Python-Driven MLIR Metaprogramming
    Guray Ozen
    In ESFOMO Workshop at ICML, 2024
  2. EuroLLVM
    Zero to Hero: Programming Nvidia Hopper Tensor Core with MLIR’s NVGPU Dialect
    Guray Ozen
    2024