Qitian Wu

Broad Institute of MIT and Harvard

wuqitian [AT] mit.edu

Google Scholar | GitHub | Twitter | Medium

What's New

  • [2025.06] The extended version of our DIFFormer paper (ICLR'23) was accepted to JMLR.
  • [2025.05] Three papers were accepted to ICML. See you in Vancouver.
  • [2025.05] Gave a talk on scaling up graph Transformers at NESS at Yale.
  • [2025.04] I will serve as an Area Chair for NeurIPS 2025.
  • [2025.01] Three papers were accepted to ICLR, with one selected as a spotlight.
  • [2024.11] Gave a talk on scaling up graph Transformers at LoG seminar in New York.
  • [2024.09] Joined the Broad Institute of MIT and Harvard as a postdoctoral fellow.
  • [2024.05] Three papers were accepted to ICML, congrats to Chenxiao and Tianyi.
  • [2024.01] Two papers (with one selected as Oral) were accepted to WWW. See you in Singapore.
  • [2023.12] Honored to receive the Academic Scholar Star award at SJTU.
  • [2023.10] Gave a talk on physics-inspired learning with non-IID data at ByteDance AI Lab.
  • [2023.09] Two papers were accepted to NeurIPS 2023. See you in New Orleans.
  • [2023.09] Honored to receive the National PhD Scholarship.
  • [2023.07] Gave a talk on graph Transformers at the LoG seminar (in Chinese). See the video here.
  • [2023.05] One paper was accepted to KDD 2023, congrats to Wentao.
  • [2023.03] Gave a talk on learning on graphs with open world assumptions at Bosch AI Center.
  • [2023.02] One paper on combinatorial drug recommendation was accepted to WWW 2023, congrats to Nianzu.
  • [2023.02] Gave a talk on graph Transformers at AI Times.
  • [2023.01] Three papers were accepted to ICLR 2023, with one selected as a spotlight (notable-top-25%); congrats to Chenxiao.
  • [2022.11] Two papers were selected for spotlight presentation (less than 5%) at NeurIPS 2022!
  • [2022.10] I was awarded the National PhD Scholarship (top 1%).
  • [2022.10] I will give a talk on graph OOD generalization at the LoG seminar.
  • [2022.09] Five papers on OOD generalization and graph Transformers were accepted to NeurIPS'22.
  • [2022.05] Two papers on GNNs and causal learning were accepted to SIGKDD'22.
  • [2022.04] One paper on negative sampling was accepted to IJCAI'22.
  • [2022.03] I will give a talk on out-of-distribution generalization at AI Drive PaperWeekly.
  • [2022.01] One paper on learning with graph distribution shifts was accepted to ICLR'22.
  • [2021.12] I will give a talk on open-world learning at the Baiyulan Young Researcher Forum.
  • [2021.11] I was awarded the Baidu Scholarship (only 10 recipients worldwide).
  • [2021.10] I was awarded the Microsoft Research PhD Fellowship (only 11 recipients in Asia).
  • [2021.09] Three papers were accepted to NeurIPS'21.
  • [2021.08] One paper on sequential recommendation was accepted to CIKM'21 as spotlight.
  • [2021.04] I was selected as a Global Top 100 AI Rising Star!


    Bio

    I am currently a postdoctoral fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard, working with Caroline Uhler. Prior to this, I received my PhD in Computer Science from Shanghai Jiao Tong University (SJTU), supervised by Junchi Yan, and worked with David Wipf, Hongyuan Zha and Michael Bronstein. Before that, I received my Bachelor's degree (Microelectronics, with a minor in Mathematics) and Master's degree (Computer Science) from SJTU, and worked as a research intern at Tencent WeChat, Amazon Web Services and BioMap AI Lab.

    My general research interests revolve around scalable and generalizable machine learning. On the methodology side, I currently focus on empowering foundation models with multi-modal reasoning capabilities, scaling up computational backbones (e.g., Transformers) to large-scale data, and enhancing the generalization and reliability of AI systems. On the application side, I explore applying these methods to critical challenges in a broad range of real-world applications, such as scientific discovery and recommender systems.

    I am a recipient of the Microsoft Research PhD Fellowship, the Baidu PhD Fellowship and the Rising Star in Artificial Intelligence award.

    Research Summary

    My current research aims to improve and broaden the capabilities of AI models, especially their scalability and generalizability, by developing theoretically principled and practically useful methods that shed light on ML algorithmic design and facilitate problem solving in real applications.

    • For scalability, our works explore new Transformer architectures that scale up global attention to large interconnected data. The first model, NodeFormer [in NeurIPS'22], introduces a pioneering Transformer for large graphs that reduces the quadratic complexity of all-pair attention to linear. The follow-up model, SGFormer [in NeurIPS'23], adopts a simplified single-layer attention that achieves linear complexity without any approximation; in its extended version we supplement theoretical understanding of the model design. Beyond computational efficiency, we investigate the inherent mechanisms of neural architectures. To this end, our work DIFFormer [in ICLR'23] derives a scalable attention model inspired by a physical process, namely diffusion equations with an energy constraint. In its extended version [in JMLR], we present a more in-depth discussion of how energy-constrained diffusion can serve as a unified framework for different architectures (MLPs, GNNs and Transformers). Along this path, our recent work AdvDIFFormer [in ICML'25] further extends the model via advective diffusion equations, endowing Transformers with inherent generalization power. A minimal sketch of the linear-attention idea behind these models appears after this list.
    • For generalizability, our works endeavor to understand the generalization limits of neural networks under distribution shifts. On one side, we study the challenging out-of-distribution generalization problem: the first work, EERM [in ICLR'22], formulates this problem for structured data and introduces a new learning algorithm based on the invariance principle, and follow-up works, including CaNet [in WWW'24] and GLIND [in ICML'24], address this challenge through causal intervention. On the other side, we study out-of-distribution detection, which aims to improve the reliability of AI systems, e.g., GNNSafe [in ICLR'23].
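
    To make the scalability point concrete, below is a minimal, hypothetical sketch of kernelized linear attention, the general trick behind NodeFormer- and SGFormer-style models: reordering the matrix products lets all-pair aggregation over N nodes cost O(N) rather than O(N^2). This is an illustrative example under assumed dimensions and a placeholder ReLU feature map (the function name linear_attention and all shapes are my choices for exposition), not the official implementation; NodeFormer, for instance, uses random features to approximate softmax attention.

    import torch

    def linear_attention(x, Wq, Wk, Wv, eps=1e-6):
        # x: (N, d) node features; Wq, Wk, Wv: (d, d) projection weights.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # Positive feature map keeps attention weights nonnegative
        # (a placeholder choice for illustration).
        phi_q, phi_k = torch.relu(q) + eps, torch.relu(k) + eps
        # Key step: compute phi_k^T v first -- a small (d, d) matrix --
        # instead of ever forming the (N, N) attention matrix.
        kv = phi_k.t() @ v                                   # (d, d)
        normalizer = phi_q @ phi_k.sum(0, keepdim=True).t()  # (N, 1)
        return (phi_q @ kv) / normalizer                     # (N, d)

    N, d = 10_000, 64
    x = torch.randn(N, d)
    Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    out = linear_attention(x, Wq, Wk, Wv)
    print(out.shape)  # torch.Size([10000, 64])

    Because time and memory scale linearly in N, this style of attention is what makes global attention over millions of nodes feasible on a single GPU.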

    Publications

    The most recent works can be found on Google Scholar.

    Selected

    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

    Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023 oral presentation, ranked among the top 0.5%

    Summary: We propose a geometric diffusion framework with energy constraints and show that its solution aligns with widely used attention networks; building on this, we derive diffusion-based Transformers.

    NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

    Qitian Wu, Wentao Zhao, Zenan Li, David Wipf and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Summary: We propose a scalable graph Transformer with efficient all-pair message passing achieved in O(N) complexity. Global attention over 2M nodes requires only 4GB of memory.

    Handling Distribution Shifts on Graphs: An Invariance Perspective

    Qitian Wu, Hengrui Zhang, Junchi Yan and David Wipf

    International Conference on Learning Representations (ICLR) 2022

    Summary: We formulate out-of-distribution generalization on graphs and discuss how to leverage the (causal) invariance principle to handle graph-based distribution shifts.

    All (in chronological order)

    Transformers from Diffusion: A Unified Framework for Neural Message Passing

    Qitian Wu, David Wipf and Junchi Yan

    Journal of Machine Learning Research (JMLR) 2025 extended from DIFFormer (ICLR 2023)

    Supercharging Graph Transformers with Advective Diffusion

    Qitian Wu, Chenxiao Yang, Kaipeng Zeng and Michael Bronstein

    International Conference on Machine Learning (ICML) 2025

    Generative Modeling Reinvents Supervised Learning: Label Repurposing with Predictive Consistency Learning

    Yang Li, Jiale Ma, Yebin Yang, Qitian Wu, Hongyuan Zha and Junchi Yan

    International Conference on Machine Learning (ICML) 2025

    TabNAT: A Continuous-Discrete Joint Generative Framework for Tabular Data

    Hengrui Zhang, Liancheng Fang, Qitian Wu and Philip S Yu

    International Conference on Machine Learning (ICML) 2025

    DiffPuter: Empowering Diffusion Models for Missing Data Imputation

    Hengrui Zhang, Liancheng Fang, Qitian Wu and Philip S Yu

    International Conference on Learning Representations (ICLR) 2025 spotlight presentation

    SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

    Wujiang Xu, Qitian Wu, Zujie Liang, Jiaojiao Han, Xuying Ning, Yunxiao Shi, Wenfang Lin and Yongfeng Zhang

    International Conference on Learning Representations (ICLR) 2025

    Regularizing Energy among Training Samples for Out-of-Distribution Generalization

    Yiting Chen, Qitian Wu and Junchi Yan

    International Conference on Learning Representations (ICLR) 2025

    Learning Divergence Fields for Shift-Robust Message Passing

    Qitian Wu, Fan Nie, Chenxiao Yang and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    How Graph Neural Networks Learn: Lessons from Training Dynamics

    Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    Graph Out-of-Distribution Detection Goes Neighborhood Shaping

    Tianyi Bao, Qitian Wu, Zetian Jiang, Yiting Chen, Jiawei Sun and Junchi Yan

    International Conference on Machine Learning (ICML) 2024

    Graph Out-of-Distribution Generalization via Causal Intervention

    Qitian Wu, Fan Nie, Chenxiao Yang, Tianyi Bao and Junchi Yan

    The Web Conference (WWW) 2024 oral presentation

    Rethinking Cross-Domain Sequential Recommendation Under Open-World Assumptions

    Wujiang Xu, Qitian Wu, Runzhong Wang, Mingming Ha, Qiongxu Ma, Linxun Chen, Bing Han and Junchi Yan

    The Web Conference (WWW) 2024

    SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

    Qitian Wu, Wentao Zhao, Chenxiao Yang, Hengrui Zhang, Fan Nie, Haitian Jiang, Yatao Bian and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2023

    Unleashing the Power of Graph Data Augmentation on Covariate Distribution Shift

    Yongduo Sui, Qitian Wu, Jiancan Wu, Qing Cui, Longfei Li, Jun Zhou, Xiang Wang and Xiangnan He

    Advances in Neural Information Processing Systems (NeurIPS) 2023

    GraphGlow: Universal and Generalizable Structure Learning for Graph Neural Networks

    Wentao Zhao, Qitian Wu, Chenxiao Yang and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023

    MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning

    Nianzu Yang, Kaipeng Zeng, Qitian Wu and Junchi Yan

    The Web Conference (WWW) 2023

    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

    Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023 oral presentation, ranked among the top 0.5%

    Energy-based Out-of-Distribution Detection for Graph Neural Networks

    Qitian Wu, Yiting Chen, Chenxiao Yang and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023

    Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and Multi-Layer Perceptrons

    Chenxiao Yang, Qitian Wu, Jiahua Wang and Junchi Yan

    International Conference on Learning Representations (ICLR) 2023

    NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification

    Qitian Wu, Wentao Zhao, Zenan Li, David Wipf and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Learning Substructure Invariance for Out-of-Distribution Molecular Representations

    Nianzu Yang, Kaipeng Zeng, Qitian Wu, Xiaosong Jia and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022 spotlight presentation

    Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks

    Chenxiao Yang, Qitian Wu and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment

    Chenxiao Yang, Qitian Wu, Qingsong Wen, Zhiqiang Zhou, Liang Sun and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    GraphDE: A Generative Framework for Debiased Learning and Out-of-Distribution Detection on Graphs

    Zenan Li, Qitian Wu, Fan Nie and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2022

    Variational Inference for Training Graph Neural Networks in Low-Data Regime through Joint Structure-Label Estimation

    Danning Lao, Xinyu Yang, Qitian Wu and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2022

    DICE: Domain-attack Invariant Causal Learning for Improved Data Privacy Protection and Adversarial Robustness

    Qibing Ren, Yiting Chen, Yichuan Mo, Qitian Wu and Junchi Yan

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2022

    Handling Distribution Shifts on Graphs: An Invariance Perspective

    Qitian Wu, Hengrui Zhang, Junchi Yan and David Wipf

    International Conference on Learning Representations (ICLR) 2022

    Trading Hard Negatives and True Negatives: A Debiased Contrastive Collaborative Filtering Approach

    Chenxiao Yang, Qitian Wu, Jipeng Jin, Junwei Pan, Xiaofeng Gao and Guihai Chen

    International Joint Conference on Artificial Intelligence (IJCAI) 2022

    ScaleGCN: Efficient and Effective Graph Convolution via Channel-Wise Scale Transformation

    Tianqi Zhang, Qitian Wu, Junchi Yan, Yunan Zhao and Bing Han

    IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2022

    Towards Open-World Recommendation: An Inductive Model-based Collaborative Filtering Approach

    Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Junchi Yan and Hongyuan Zha

    International Conference on Machine Learning (ICML) 2021 spotlight presentation

    Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

    Qitian Wu, Chenxiao Yang and Junchi Yan

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

    Hengrui Zhang, Qitian Wu, Junchi Yan, David Wipf and Philip S. Yu

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators

    Qitian Wu, Han Gao and Hongyuan Zha

    Advances in Neural Information Processing Systems (NeurIPS) 2021

    Seq2Bubbles: Region-Based Embedding Learning for User Behaviors in Sequential Recommenders

    Qitian Wu, Chenxiao Yang, Shuodian Yu, Xiaofeng Gao and Guihai Chen

    ACM International Conference on Information & Knowledge Management (CIKM) 2021 spotlight presentation

    Learning High-Order Graph Convolutional Networks via Adaptive Layerwise Aggregation Combination

    Tianqi Zhang, Qitian Wu and Junchi Yan

    IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2021

    SentiMem: Attentive Memory Networks for Sentiment Classification in User Review

    Xiaosong Jia, Qitian Wu, Xiaofeng Gao and Guihai Chen

    International Conference on Database Systems for Advanced Applications (DASFAA) 2020

    Learning Latent Process from High-Dimensional Event Sequences via Efficient Sampling

    Qitian Wu, Zixuan Zhang, Xiaofeng Gao, Junchi Yan and Guihai Chen

    Advances in Neural Information Processing Systems (NeurIPS) 2019

    Feature Evolution Based Multi-Task Learning for Collaborative Filtering with Social Trust

    Qitian Wu, Lei Jiang, Xiaofeng Gao, Xiaochun Yang and Guihai Chen

    International Joint Conference on Artificial Intelligence (IJCAI) 2019

    Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination

    Qitian Wu, Yirui Gao, Xiaofeng Gao, Paul Weng and Guihai Chen

    ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2019

    Dual Graph Attention Networks for Deep Latent Representation of Multifaceted Social Effects in Recommender Systems

    Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao and Guihai Chen

    The Web Conference (WWW) 2019 long oral presentation

    EPAB: Early Pattern Aware Bayesian Model for Social Content Popularity Prediction

    Qitian Wu, Chaoqi Yang, Xiaofeng Gao, Peng He and Guihai Chen

    IEEE International Conference on Data Mining (ICDM) 2018

    Adversarial Training Model Unifying Feature Driven and Point Process Perspectives for Event Popularity Prediction

    Qitian Wu, Chaoqi Yang, Hengrui Zhang, Xiaofeng Gao, Paul Weng and Guihai Chen

    ACM International Conference on Information & Knowledge Management (CIKM) 2018

    EPOC: A Survival Perspective Early Pattern Detection Model for Outbreak Cascades

    Chaoqi Yang, Qitian Wu, Xiaofeng Gao and Guihai Chen

    International Conference on Database and Expert Systems Applications (DEXA) 2018

    Honors & Awards

    Eric and Wendy Schmidt Center Postdoctoral Fellowship, 2024

    SJTU Scholar Star (the highest university-level academic award for PhD students), 2023

    National Scholarship (awarded to only 0.2% of PhD students nationwide), 2022, 2023

    Baidu PhD Fellowship (only 10 recipients worldwide), 2021

    Microsoft Research PhD Fellowship (only 11 recipients in Asia), 2021

    Global Top 100 Rising Star in Artificial Intelligence, 2021

    Yuanqing Yang Scholarship (only 3 master's students in the department), 2019

    Outstanding Graduate in Shanghai (top 5%), 2018

    Outstanding Undergraduate Thesis, 2018

    Outstanding Winner, INFORMS Award, Mathematical Contest in Modeling, Data Insights Problem (top 3 out of 4,748 teams worldwide; the INFORMS Award is given to only one team), 2018

    Lixin Tang Scholarship (only 60 students across all academic levels university-wide), 2017, 2018

    National Scholarship (top 1% of undergraduate students), 2016, 2017

    First-Class Academic Excellence Scholarship (ranked first in the department), 2016, 2017

    National Second Prize, China Undergraduate Mathematical Contest in Modeling, 2016

    First Prize, Physics Contest of Chinese College Students, 2015

    Service

    Area Chair/Reviewer for Conferences
    ICML (2021-2025), NeurIPS (2021-2025), ICLR (2022-2025), SIGKDD (2023), WWW (2023),
    AAAI (2021-2023), IJCAI (2021-2023), CVPR (2021-2023), ICCV (2021)

    Reviewer for Journals
    Nature BioMed. Eng., TPAMI, TKDE, TNNLS

    Acknowledgement

    This website is built on a template by Martin Saveski.