Qitian Wu

Broad Institute of MIT and Harvard

wuqitian [AT] mit.edu

Google Scholar | GitHub | Twitter | Medium

What's New

  • [2025.06] The extended version of our DIFFormer paper (ICLR'23) was accepted to JMLR.
  • [2025.05] Three papers were accepted to ICML. See you in Vancouver.
  • [2025.05] Gave a talk on scaling up graph Transformers at NESS at Yale.
  • [2025.04] I will serve as an Area Chair for NeurIPS 2025.
  • [2025.01] Three papers were accepted to ICLR, with one selected as a spotlight.
  • [2024.11] Gave a talk on scaling up graph Transformers at LoG seminar in New York.
  • [2024.09] Joined the Broad Institute of MIT and Harvard as a postdoctoral fellow.
  • [2024.05] Three papers were accepted to ICML, congrats to Chenxiao and Tianyi.
  • [2024.01] Two papers (with one selected as Oral) were accepted to WWW. See you in Singapore.
  • [2023.12] Honored to receive the Academic Scholar Star award at SJTU.
  • [2023.10] Gave a talk on physics-inspired learning with non-IID data at ByteDance AI Lab.
  • [2023.09] Two papers were accepted to NeurIPS 2023. See you in New Orleans.
  • [2023.09] Honored to receive the National PhD Scholarship.
  • [2023.07] Gave a talk on graph Transformers at the LoG seminar (in Chinese). See the video here.
  • [2023.05] One paper was accepted to KDD 2023, congrats to Wentao.
  • [2023.03] Gave a talk on learning on graphs with open world assumptions at Bosch AI Center.
  • [2023.02] One paper on combinatorial drug recommendation was accepted to WWW 2023, congrats to Nianzu.
  • [2023.02] Gave a talk on graph Transformers at AI Times.
  • [2023.01] Three papers were accepted to ICLR 2023, with one selected as a spotlight (notable-top-25%); congrats to Chenxiao.
  • [2022.11] Two papers were selected for spotlight presentation (less than 5%) at NeurIPS 2022!
  • [2022.10] I was awarded the National PhD Scholarship (top 1%).
  • [2022.10] I will give a talk on graph OOD generalization at the LoG seminar.
  • [2022.09] Five papers on OOD generalization and graph Transformers were accepted to NeurIPS'22.
  • [2022.05] Two papers on GNNs and causal learning were accepted to SIGKDD'22.
  • [2022.04] One paper on negative sampling was accepted to IJCAI'22.
  • [2022.03] I will give a talk on out-of-distribution generalization at AI Drive PaperWeekly.
  • [2022.01] One paper on learning with graph distribution shifts was accepted to ICLR'22.
  • [2021.12] I will give a talk on open-world learning at the Baiyulan Young Researcher Forum.
  • [2021.11] I was awarded the Baidu Scholarship (only 10 recipients worldwide).
  • [2021.10] I was awarded the Microsoft Research PhD Fellowship (only 11 recipients in Asia).
  • [2021.09] Three papers were accepted to NeurIPS'21.
  • [2021.08] One paper on sequential recommendation was accepted to CIKM'21 as spotlight.
  • [2021.04] I was selected as a Global Top 100 AI Rising Star!


    Bio

    I am currently a postdoctoral fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard, working with Caroline Uhler. Prior to this, I received my PhD in Computer Science from Shanghai Jiao Tong University (SJTU), supervised by Junchi Yan, and worked with David Wipf, Hongyuan Zha and Michael Bronstein. Before that, I received my Bachelor's degree (Microelectronics, with a minor in Mathematics) and Master's degree (Computer Science) from SJTU, and worked as a research intern at Tencent WeChat, Amazon Web Services and BioMap AI Lab.

    My general research interests revolve around scalable and generalizable machine learning. On the methodology side, I currently focus on empowering foundation models with multi-modal reasoning capabilities, scaling up computational backbones (e.g., Transformers) to large-scale data, and enhancing the generalization and reliability of AI systems. On the application side, I explore applying these methods to critical challenges in a broad range of real-world applications, such as scientific discovery and recommender systems.

    I am a recipient of the Microsoft Research PhD Fellowship, the Baidu PhD Fellowship and the Rising Star in Artificial Intelligence award.

    Research Summary

    My current research aims to improve and broaden the capabilities of AI models, especially their scalability and generalizability, by developing theoretically principled and practically useful methods that shed light on ML algorithmic design and facilitate problem solving in real applications.

    • For scalability, our works explore new Transformer architectures that scale up global attention to large interconnected data. The first model, NodeFormer [in NeurIPS'22], introduces a pioneering Transformer for large graphs that reduces the quadratic complexity of all-pair attention to linear. The follow-up model, SGFormer [in NeurIPS'23], adopts a simplified single-layer attention that achieves linear complexity without any approximation; in its extended version we supplement theoretical understanding of the model design. Beyond computational efficiency, we investigate the inherent mechanisms of neural architectures. To this end, our work DIFFormer [in ICLR'23] derives a scalable attention model inspired by a physical process, namely diffusion equations with an energy constraint. In its extended version [in JMLR], we present a more in-depth discussion of how energy-constrained diffusion can serve as a unified framework for different architectures (MLPs, GNNs and Transformers). Along this path, our recent work AdvDIFFormer [in ICML'25] further extends the model via advective diffusion equations, endowing Transformers with inherent generalization power. A minimal sketch of the linear-attention idea behind these models appears after this list.
    • For generalizability, our works endeavor to understand the generalization limits of neural networks under distribution shifts. On one side, we study the challenging out-of-distribution generalization problem: the first work, EERM [in ICLR'22], formulates this problem for structured data and introduces a new learning algorithm based on the invariance principle, and follow-up works, including CaNet [in WWW'24] and GLIND [in ICML'24], address this challenge through causal intervention. On the other side, we study out-of-distribution detection, which aims to improve the reliability of AI systems, e.g., GNNSafe [in ICLR'23].
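
    To make the scalability point concrete, below is a minimal, hypothetical sketch of kernelized linear attention, the general trick behind NodeFormer- and SGFormer-style models: reordering the matrix products lets all-pair aggregation over N nodes cost O(N) rather than O(N^2). This is an illustrative example under assumed dimensions and a placeholder ReLU feature map (the function name linear_attention and all shapes are my choices for exposition), not the official implementation; NodeFormer, for instance, uses random features to approximate softmax attention.

    import torch

    def linear_attention(x, Wq, Wk, Wv, eps=1e-6):
        # x: (N, d) node features; Wq, Wk, Wv: (d, d) projection weights.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # Positive feature map keeps attention weights nonnegative
        # (a placeholder choice for illustration).
        phi_q, phi_k = torch.relu(q) + eps, torch.relu(k) + eps
        # Key step: compute phi_k^T v first -- a small (d, d) matrix --
        # instead of ever forming the (N, N) attention matrix.
        kv = phi_k.t() @ v                                   # (d, d)
        normalizer = phi_q @ phi_k.sum(0, keepdim=True).t()  # (N, 1)
        return (phi_q @ kv) / normalizer                     # (N, d)

    N, d = 10_000, 64
    x = torch.randn(N, d)
    Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    out = linear_attention(x, Wq, Wk, Wv)
    print(out.shape)  # torch.Size([10000, 64])

    Because time and memory scale linearly in N, this style of attention is what makes global attention over millions of nodes feasible on a single GPU.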

    Publications

    The most recent works can be found on Google Scholar.

    Selected

    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

    Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023 oral presentation, ranked among the top 0.5%

    Summary: We propose a geometric diffusion framework with energy constraints and show that its solution aligns with widely used attention networks; building on this, we derive diffusion-based Transformers.

    NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

    Qitian Wu, Wentao Zhao, Zenan Li, David Wipf and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Summary: We propose a scalable graph Transformer with efficient all-pair message passing achieved in O(N) complexity. Global attention over 2M nodes requires only 4GB of memory.

    Handling Distribution Shifts on Graphs: An Invariance Perspective

    Qitian Wu, Hengrui Zhang, Junchi Yan and David Wipf

    International Conference on Learning Representations (ICLR) 2022

    Summary: We formulate out-of-distribution generalization on graphs and discuss how to leverage the (causal) invariance principle to handle graph-based distribution shifts.

    All (in chronological order)

    Transformers from Diffusion: A Unified Framework for Neural Message Passing

    Qitian Wu, David Wipf and Junchi Yan

    Journal of Machine Learning Research (JMLR) 2025 extended from DIFFormer (ICLR 2023)

    Supercharging Graph Transformers with Advective Diffusion

    Qitian Wu, Chenxiao Yang, Kaipeng Zeng and Michael Bronstein

    International Conference on Machine Learning (ICML) 2025

    Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning

    Yang Li, Jiale Ma, Yebin Yang, Qitian Wu, Hongyuan Zha and Junchi Yan

    International Conference on Machine Learning (ICML) 2025

    TabNAT: A Continuous-Discrete Joint Generative Framework for Tabular Data

    Hengrui Zhang, Liancheng Fang, Qitian Wu and Philip S Yu

    International Conference on Machine Learning (ICML) 2025

    DiffPuter: Empowering Diffusion Models for Missing Data Imputation

    Hengrui Zhang, Liancheng Fang, Qitian Wu and Philip S Yu

    International Conference on Learning Representations (ICLR) 2025 spotlight presentation

    SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

    Wujiang Xu, Qitian Wu, Zujie Liang, Jiaojiao Han, Xuying Ning, Yunxiao Shi, Wenfang Lin and Yongfeng Zhang

    International Conference on Learning Representations (ICLR) 2025

    Regularizing Energy among Training Samples for Out-of-Distribution Generalization

    Yiting Chen, Qitian Wu and Junchi Yan

    International Conference on Learning Representations (ICLR) 2025

    Learning Divergence Fields for Shift-Robust Message Passing

    Qitian Wu, Fan Nie, Chenxiao Yang and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    How Graph Neural Networks Learn: Lessons from Training Dynamics

    Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    Graph Out-of-Distribution Detection Goes Neighborhood Shaping

    Tianyi Bao, Qitian Wu, Zetian Jiang, Yiting Chen, Jiawei Sun and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    Graph Out-of-Distribution Generalization via Causal Intervention

    Qitian Wu, Fan Nie, Chenxiao Yang, Tianyi Bao and Junchi Yan

    The Web Conference (WWW) 2024 oral presentation

    Rethinking Cross-Domain Sequential Recommendation Under Open-World Assumptions

    Wujiang Xu, Qitian Wu, Runzhong Wang, Mingming Ha, Qiongxu Ma, Linxun Chen, Bing Han and Junchi Yan

    The Web Conference (WWW) 2024

    SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

    Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2023

    Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift

    Yongduo Sui, Qitian Wu, Jiancan Wu, Qing Cui, Longfei Li, Jun Zhou, Xiang Wang and Xiangnan He

    Advances in Neural Information Processing Systems (NeurIPS) 2023

    GraphGlow: Universal and Generalizable Structure Learning for Graph Neural Networks

    Wentao Zhao, Qitian Wu, Chenxiao Yang and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023

    MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning

    Nianzu Yang, Kaipeng Zeng, Qitian Wu and Junchi Yan

    The Web Conference (WWW) 2023

    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

    Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023 oral presentation, ranked among the top 0.5%

    Energy-based Out-of-Distribution Detection for Graph Neural Networks

    Qitian Wu, Yiting Chen, Chenxiao Yang and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023

    Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and Multi-Layer Perceptrons

    Chenxiao Yang, Qitian Wu, Jiahua Wang and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023

    NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

    Qitian Wu, Wentao Zhao, Zenan Li, David Wipf and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Learning Substructure Invariance for Out-of-Distribution Molecular Representations

    Nianzu Yang, Kaipeng Zeng, Qitian Wu, Xiaosong Jia and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks

    Chenxiao Yang, Qitian Wu and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment

    Chenxiao Yang, Qitian Wu, Qingsong Wen, Zhiqiang Zhou, Liang Sun and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    GraphDE: A Generative Framework for Debiased Learning and Out-of-Distribution Detection on Graphs

    Zenan Li, Qitian Wu, Fan Nie and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    Variational Inference for Training Graph Neural Networks in Low-Data Regime through Joint Structure-Label Estimation

    Danning Lao, Xinyu Yang, Qitian Wu and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2022

    DICE: Domain-attack Invariant Causal Learning for Improved Data Privacy Protection and Adversarial Robustness

    Qibing Ren, Yiting Chen, Yichuan Mo, Qitian Wu and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2022

    Handling Distribution Shifts on Graphs: An Invariance Perspective

    Qitian Wu, Hengrui Zhang, Junchi Yan and David Wipf

    International Conference on Learning Representations (ICLR) 2022

    Trading Hard Negatives and True Negatives: A Debiased Contrastive Collaborative Filtering Approach

    Chenxiao Yang, Qitian Wu, Jipeng Jin, Junwei Pan, Xiaofeng Gao and Guihai Chen

    International Joint Conference on Artificial Intelligence (IJCAI) 2022

    ScaleGCN: Efficient and Effective Graph Convolution via Channel-Wise Scale Transformation

    Tianqi Zhang, Qitian Wu, Junchi Yan, Yunan Zhao and Bing Han

    IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2022

    Towards Open-World Recommendation: An Inductive Model-based Collaborative Filtering Approach

    Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Junchi Yan and Hongyuan Zha

    International Conference on Machine Learning (ICML) 2021 spotlight presentation

    Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

    Qitian Wu, Chenxiao Yang and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

    Hengrui Zhang, Qitian Wu, Junchi Yan, David Wipf and Philip S. Yu

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators

    Qitian Wu, Han Gao and Hongyuan Zha

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    Seq2Bubbles: Region-Based Embedding Learning for User Behaviors in Sequential Recommenders

    Qitian Wu, Chenxiao Yang, Shuodian Yu, Xiaofeng Gao and Guihai Chen

    ACM International Conference on Information & Knowledge Management (CIKM) 2021 spotlight presentation

    Learning High-Order Graph Convolutional Networks via Adaptive Layerwise Aggregation Combination

    Tianqi Zhang, Qitian Wu and Junchi Yan

    IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2021

    SentiMem: Attentive Memory Networks for Sentiment Classification in User Review

    Xiaosong Jia, Qitian Wu, Xiaofeng Gao and Guihai Chen

    International Conference on Database Systems for Advanced Applications (DASFAA) 2020

    Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling

    Qitian Wu, Zixuan Zhang, Xiaofeng Gao, Junchi Yan and Guihai Chen

    Advances in Neural Information Processing Systems (NeurIPS) 2019

    Feature Evolution Based Multi-Task Learning for Collaborative Filtering with Social Trust

    Qitian Wu, Lei Jiang, Xiaofeng Gao, Xiaochun Yang and Guihai Chen

    International Joint Conference on Artificial Intelligence (IJCAI) 2019

    Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination

    Qitian Wu, Yirui Gao, Xiaofeng Gao, Paul Weng and Guihai Chen

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2019

    Dual Graph Attention Networks for Deep Latent Representation of Multifaceted Social Effects in Recommender Systems

    Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao and Guihai Chen

    The Web Conference (WWW) 2019 long oral presentation

    EPAB: Early Pattern Aware Bayesian Model for Social Content Popularity Prediction

    Qitian Wu, Chaoqi Yang, Xiaofeng Gao, Peng He and Guihai Chen

    IEEE International Conference on Data Mining (ICDM) 2018

    Adversarial Training Model Unifying Feature Driven and Point Process Perspectives for Event Popularity Prediction

    Qitian Wu, Chaoqi Yang, Hengrui Zhang, Xiaofeng Gao, Paul Weng and Guihai Chen

    ACM International Conference on Information & Knowledge Management (CIKM) 2018

    EPOC: A Survival Perspective Early Pattern Detection Model for Outbreak Cascades

    Chaoqi Yang, Qitian Wu, Xiaofeng Gao and Guihai Chen

    International Conference on Database and Expert Systems Applications (DEXA) 2018

    Honors & Awards

    Eric and Wendy Schmidt Center Postdoctoral Fellowship, 2024

    SJTU Scholar Star (the highest university-level academic award for PhD students), 2023

    National Scholarship (awarded to only 0.2% of PhD students nationwide), 2022, 2023

    Baidu PhD Fellowship (only 10 recipients worldwide), 2021

    Microsoft Research PhD Fellowship (only 11 recipients in Asia), 2021

    Global Top 100 Rising Star in Artificial Intelligence, 2021

    Yuanqing Yang Scholarship (only 3 master's students in the department), 2019

    Outstanding Graduate in Shanghai (top 5%), 2018

    Outstanding Undergraduate Thesis, 2018

    Outstanding Winner, INFORMS Award, Mathematical Contest in Modeling, Data Insights Problem (top 3 out of 4,748 teams worldwide; the INFORMS Award is given to only one team), 2018

    Lixin Tang Scholarship (only 60 students across all academic levels university-wide), 2017, 2018

    National Scholarship (top 1% of undergraduate students), 2016, 2017

    First-Class Academic Excellence Scholarship (ranked first in the department), 2016, 2017

    National Second Prize, China Undergraduate Mathematical Contest in Modeling, 2016

    First Prize, Physics Contest of Chinese College Students, 2015

    Service

    Area Chair/Reviewer for Conferences
    ICML (2021-2025), NeurIPS (2021-2025), ICLR (2022-2025), SIGKDD (2023), WWW (2023),
    AAAI (2021-2023), IJCAI (2021-2023), CVPR (2021-2023), ICCV (2021)

    Reviewer for Journals
    Nature BioMed. Eng., TPAMI, TKDE, TNNLS

    Acknowledgement

    This website is built on a template by Martin Saveski.