Xiaozhe Yao 姚晓哲 · he / him · Zürich

I am a third-year doctoral student at the Systems Group, Department of Computer Science, ETH Zürich, advised by Prof. Dr. Ana Klimović.

My research explores the fundamental tensions among three pillars: optimizing systems for efficient ML, improving data quality and organization for ML, and developing frameworks that bridge the gap between algorithms and their practical deployment. Through this multi-faceted approach, my work aims to better understand and build AI systems.

I build scalable systems such as OpenTela, which has been deployed in production, and I am particularly interested in how to scale such systems even further with theoretical understanding. For example, how can we scale the number of models we serve under limited resources?

Prior to ETH Zürich, I earned a Master's degree in Data Science at the University of Zürich, advised by Prof. Dr. Michael Böhlen and Qing Chen. Before that, I completed my Bachelor's degree in Computer Science at Shenzhen University, advised by Prof. Dr. Shiqi Yu. I interned at Fundamental AI Research (FAIR) @ Meta in 2025 as a research scientist and at the Shenzhen Institute of Advanced Technology in 2016 as a data scientist. Between 2021 and 2022, I worked on project AID as an Innovator Fellow at the Library Lab, ETH Zürich.

If you have feedback on my research, interactions, or anything else, you can write here.

Education
ETH Zürich
Ph.D. Computer Science (ongoing) — Systems Group. Advisor: Prof. Ana Klimović.
Sep 2023 — now
University of Zürich
M.Sc. Data Science (summa cum laude, minor in Informatics) — Advisors: Prof. M. Böhlen, Q. Chen.
Sep 2019 — Apr 2022
Shenzhen University
B.Sc. Computer Science (honour in HPC) — Advisor: Prof. Shiqi Yu.
Sep 2013 — Jun 2017
Selected publications

* equal contribution · α alphabetical ordering · β students I advise

Better Systems
OpenTela: Unifying Decentralized HPC Clusters for Heterogeneous LLM Serving
Xiaozhe Yao, Youhe Jiang, Ilia Badanin, Qinghao Hu, Binhang Yuan, Imanol Schlag, Eiko Yoneki, Ana Klimović
To appear in OSDI · 2026 · code
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu*, Shang Yang*, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimović, Song Han
ASPLOS · 2026 · paper · code · MIT News
Navigating the Cost-Performance Pareto Frontier of Test-Time LLM Agent Adaptation
Konrad Szafer*, Xiaozhe Yao*, Maximilian Böther, Gregor Bachmann, Tiago Pimentel, Ana Klimović
Lifelong Agents @ ICLR · 2026 · paper
DeltaMoE: Memory-Efficient Inference for Merged Mixture of Experts with Delta Compression
Boyko Borisov β, Xiaozhe Yao, Nezihe Merve Gürel, Ana Klimović
SLLM @ ICLR · 2025 · paper
Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao*, Guoliang He, Xupeng Miao, Ana Klimović, Bin Cui, Binhang Yuan, Eiko Yoneki
ICML · 2025 · paper
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Youhe Jiang*, Fangcheng Fu*, Xiaozhe Yao*, Taiyi Wang, Bin Cui, Ana Klimović, Eiko Yoneki
MLSys · 2025 · paper
DeltaZip: Multi-Tenant Language Model Serving via Delta Compression
Xiaozhe Yao, Qinghao Hu, Ana Klimović
EuroSys · 2025 · paper · code
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment
Youhe Jiang*, Ran Yan*, Xiaozhe Yao*, Beidi Chen, Binhang Yuan
ICML · 2024 · paper
GPT-Zip: Deep Compression of Finetuned Large Language Models
Berivan Isik*α, Hermann Kumbong*α, Wanyi Ning*α, Xiaozhe Yao*α, Sanmi Koyejo, Ce Zhang
ES-FoMo @ ICML · 2023 · paper
SHiFT: An Efficient, Flexible Search Engine for Transfer Learning
Cedric Renggli, Xiaozhe Yao, Luka Kolar, Luka Rimanic, Ana Klimović, Ce Zhang
VLDB · 2023 · paper · code
MLPM: Machine Learning Package Manager
Xiaozhe Yao
MLOps Workshop @ MLSys · 2020 · code
Better Data
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther, Xiaozhe Yao, Tolga Kerimoglu, Dan Graur, Viktor Gsteiger, Ana Klimović
SIGMOD · 2026 · paper
Decluttering the data mess in LLM training
Maximilian Böther, Dan-Ovidiu Graur, Xiaozhe Yao, Ana Klimović
HotInfra · 2024 · paper
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber, Dan Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, et al.
NeurIPS D&B — Spotlight · 2024 · paper · code
DMLR: Data-centric Machine Learning Research — Past, Present and Future
Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, et al.
DMLR Journal · 2023 · paper
DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, et al.
NeurIPS D&B · 2023 · paper · code · blog
Others (community efforts)
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Swiss AI & collaborators
Community effort · 2025 · paper · huggingface · code
Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code
Aurora-M authors
Community effort · 2024 · paper · huggingface
Industry & research experience
ETH Zürich — Systems Group
Doctoral researcher — Advisor: Prof. Ana Klimović.
2023 — now
FAIR @ Meta
Research Scientist (Intern) — New York, US.
Jun 2025 — Sep 2025
Together AI
Research Consultant
Dec 2022 — Jun 2023
Library Lab, ETH Zürich
Innovator Fellow (project AID)
Jun 2021 — Feb 2022
AICAMP Co., Ltd.
Engineer & Co-Founder — Shenzhen / Hong Kong.
May 2018
Shenzhen University — Computer Vision Lab
Visiting Student
Shenzhen Institute of Advanced Technology, CAS
Data Scientist (Intern)
Mar 2016 — Mar 2017
Talks
From Serving Multiple LLMs to AI Agents
DAP Lab, Columbia University
Jul 2025
DeltaZip — Multi-tenant LLM Serving
Meta AI Org
Jul 2025
Chinese Science Club, Zürich
May 2025
Towards Decentralized LLM Serving
Systems Group, ETH Zürich
Mar 2025
MLSys Singapore
Apr 2024
AIDS Seminar: DeltaZip
Aalto University
Feb 2024
Industry Retreat: DeltaZip
Systems Group, ETH Zürich
Jan 2024
Techno Hour: Learned Cardinality Estimation
SAP
Apr 2022
Tech Talk: AID — Findable, Accessible, Usable AI
ETH Library
Feb 2022
Teaching

Teaching assistant at ETH Zürich, University of Zürich, and Shenzhen University.

Systems for AI Seminar
ETH Zürich — TA
Fall 2025
Large-Scale AI Engineering — Serving Large-Scale ML Models
ETH Zürich — Guest Lecturer
Spring / Fall 2025
Cloud Computing Architecture
ETH Zürich — Head TA
Spring 2025 / 2026
Seminar on Machine Learning Systems
ETH Zürich — TA
Fall 2024
Cloud Computing Architecture
ETH Zürich — TA
Spring 2024
Information Systems for Engineers
ETH Zürich — TA
Fall 2023
Foundations of Data Science
Univ. Zürich — TA
Fall 2021
Informatics II — Data Structures & Algorithms
Univ. Zürich — TA
Spring 2021 / 2022
Informatics I — Introduction to Programming
Univ. Zürich — TA
Fall 2020
Professional English for Computer Science
Shenzhen U. — TA
Spring 2019
Web Programming
Shenzhen U. — TA
2015 / 2016
Data Structures and Algorithms
Shenzhen U. — TA
2014
Student projects

I am fortunate to work with students at all levels on their projects and theses.

Serverless Agentic AI: programming model and execution system design.
Toppings: Delta / LoRA co-serving for LLMs.
Characterising the NVIDIA GH200 Accelerator for LLM applications (co-supervised with Foteini Strati).
Sparse Mixture of Experts Serving via Delta Compression (SLLM Workshop).
Analytical Model for LLM Inference.
LongerLoRA: Scaling Up Low-Rank Adaptors For Longer Context.
Community service
Reviewer
NeurIPS
2025
Reviewer
COLM
2025
Reviewer
SLLM Workshop @ ICLR
2025
Reviewer
Efficient Large Vision Models @ CVPR
2025
Reviewer
Workshop Proposals @ ICLR
2025
Co-organizer & PC Co-chair
DMLR Workshop @ ICLR
2024
Reviewer
DEEM Workshop @ SIGMOD
2023 / 2024
Reviewer
DMLR Journal
Photographer
IAPR TC4 Winter School on Biometrics, Shenzhen
Jan 2019
Open source