publications
2026
- TileLang: Bridge Programmability and Performance in Modern Neural Kernels. The Fourteenth International Conference on Learning Representations (ICLR 2026), 2026. Oral Presentation
- Sparse Attention Adaptation for Long Reasoning. The Fourteenth International Conference on Learning Representations (ICLR 2026), 2026
- MetaAttention: A Unified and Performant Attention Framework across Hardware Backends. In Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2026
- MiMo-V2-Flash Technical Report. arXiv preprint arXiv:2601.02780, 2026
2025
- PipeThreader: Software-Defined Pipelining for Efficient DNN Execution. The 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’25), 2025
- HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing. arXiv preprint arXiv:2602.03560, 2025
- NeuStream: Bridging Deep Learning Serving and Stream Processing. The 20th European Conference on Computer Systems (EuroSys ’25), 2025
2024
- Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor. 30th ACM Symposium on Operating Systems Principles (SOSP 2024), 2024
2023
- GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning. Proceedings of the ACM on Management of Data, 2023
2022
- Zoomer: Boosting retrieval on web-scale graphs by regions of interest. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022