publications

2026

  1. TileLang: Bridge Programmability and Performance in Modern Neural Kernels
    Lei Wang, Yu Cheng, Yining Shi, and 9 more authors
    The Fourteenth International Conference on Learning Representations (ICLR 2026), 2026
    Oral Presentation
  2. Sparse Attention Adaptation for Long Reasoning
    Yizhao Gao, Shuming Guo, Shijie Cao, and 12 more authors
    The Fourteenth International Conference on Learning Representations (ICLR 2026), 2026
  3. MetaAttention: A Unified and Performant Attention Framework across Hardware Backends
    Feiyang Chen, Yu Cheng, Lei Wang, and 8 more authors
    In Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026
  4. MiMo-V2-Flash Technical Report
    Bangjun Xiao, Bingquan Xia, Bo Yang, and 122 more authors
    arXiv preprint arXiv:2601.02780, 2026

2025

  1. PipeThreader: Software-Defined Pipelining for Efficient DNN Execution
    Yu Cheng, Lei Wang, Yining Shi, and 9 more authors
    The 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’25), 2025
  2. HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
    Yizhao Gao, Jianyu Wei, Qihao Zhang, and 11 more authors
    arXiv preprint arXiv:2602.03560, 2025
  3. NeuStream: Bridging Deep Learning Serving and Stream Processing
    Haochen Yuan, Yuanqing Wang, Wenhao Xie, and 5 more authors
    The 20th European Conference on Computer Systems (EuroSys ’25), 2025

2024

  1. Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor
    Yiqi Liu, Yuqi Xue, Yu Cheng, and 4 more authors
    30th ACM Symposium on Operating Systems Principles (SOSP 2024), 2024

2023

  1. GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning
    Hanyu Zhao, Zhi Yang, Yu Cheng, and 8 more authors
    Proceedings of the ACM on Management of Data, 2023

2022

  1. Zoomer: Boosting Retrieval on Web-Scale Graphs by Regions of Interest
    Yuezihan Jiang, Yu Cheng, Hanyu Zhao, and 6 more authors
    In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022