About Me

I am an incoming Ph.D. student at Princeton University. Before joining Princeton, I studied at Peking University (PKU), majoring in applied mathematics with a double major in computer science. I was fortunate to be advised by Professor Liwei Wang on research in machine learning theory. In the summer of 2019, I spent a wonderful time at MIT as a research intern supervised by Professor Sasha Rakhlin. I also work closely with Professor Jason D. Lee.

I have a broad range of interests spanning many fields in machine learning, e.g., optimization, representation learning, and architecture design (Transformers, graph neural networks, etc.). To summarize in one sentence, I am mostly interested in theories that can inspire better algorithms. My tenet is to make machine learning more general [1, 2, 3], efficient [5, 6, 9], and reliable [4, 7, 8] (please see my publications below).

If you are interested in collaborating with me or just want to have a chat, please feel free to contact me via e-mail or WeChat : )


News

  • Three papers accepted by ICML 2021! May, 2021
  • Two papers accepted by NeurIPS 2020! Sep., 2020
  • Graduated from PKU. July, 2020

Selected Publications

  1. (Preprint) Do Transformers Really Perform Bad for Graph Representation?

    Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

    Highlight: Make Transformers great again on graph classification by introducing three graph structural encodings! Achieves SOTA performance on several benchmarks!

  2. (Preprint) Towards a Theoretical Framework of Out-of-Distribution Generalization

    Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, Liwei Wang

    Highlight: We formalize the out-of-distribution (OOD) generalization problem and derive generalization bounds and a model selection algorithm based on our framework.

  3. (ICML 2021) A Theory of Label Propagation for Subpopulation Shift

    Tianle Cai*, Ruiqi Gao*, Jason D. Lee*, Qi Lei*

    Highlight: Subpopulation shift is a ubiquitous component of natural distribution shift. We propose a general theoretical framework for learning under subpopulation shift based on label propagation, and our insights can help improve domain adaptation algorithms.

  4. (ICML 2021) Towards Certifying $\ell_\infty$ Robustness using Neural Networks with $\ell_\infty$-dist Neurons

    Bohang Zhang, Tianle Cai, Zhou Lu, Di He, Liwei Wang

    Highlight: A new architecture with inherent $\ell_\infty$-robustness and a tailored training pipeline. Achieves SOTA performance on several benchmarks!


  5. (ICML 2021) GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

    Tianle Cai*, Shengjie Luo*, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang

    Highlight: A principled normalization scheme specially designed for graph neural networks. Achieves SOTA on several graph classification benchmarks.

    [Code], [Third-party implementation in Microsoft's ptgnn library. (Thanks for the very quick reaction and implementation, MS!)]

  6. (NeurIPS 2020) Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

    Jingtong Su*, Yihang Chen*, Tianle Cai*, Tianhao Wu, Ruiqi Gao, Liwei Wang, Jason D. Lee

    Highlight: We sanity-check several existing pruning methods and find that the performance of a large group of them relies only on each layer's pruning ratio. This finding inspires us to design an efficient, data-independent, training-free pruning method as a byproduct.


  7. (NeurIPS 2020) Locally Differentially Private (Contextual) Bandits Learning

    Kai Zheng, Tianle Cai, Weiran Huang, Zhenguo Li, Liwei Wang

    Highlight: A simple black-box reduction framework that improves bounds for private bandits learning.


  8. (NeurIPS 2019 Spotlight, 2.4% acceptance rate) Convergence of Adversarial Training in Overparametrized Networks

    Ruiqi Gao*, Tianle Cai*, Haochuan Li, Liwei Wang, Cho-Jui Hsieh, Jason D. Lee

    Highlight: For overparameterized neural networks, we prove that adversarial training converges to a global minimum (with zero loss).


  9. (NeurIPS 2019 Beyond First Order Method in ML Workshop) Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

    Tianle Cai*, Ruiqi Gao*, Jikai Hou*, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang

    Highlight: A provably convergent second-order optimization method for overparameterized networks on regression problems! As lightweight as SGD at each iteration, but converges much faster than SGD in real-world applications.


Talks

  • Towards Understanding Optimization of Deep Learning at IJTCS [slides] [video]

  • A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems at the PKU machine learning workshop [slides]


Experiences

  • Visiting Research Student at Simons Institute, UC Berkeley
    • Program: Foundations of Deep Learning
    • June, 2019 - July, 2019
  • Visiting Research Internship at MIT
    • Advisor: Professor Sasha Rakhlin
    • June, 2019 - Sept., 2019
  • Visiting Research Student at Princeton
    • Host: Professor Jason D. Lee
    • Sept., 2019 - Oct., 2019