Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys/ASPLOS), networking conferences (NSDI/SIGCOMM), and mobile computing conferences (MobiCom/MobiSys/SenSys/UbiComp).
Surveys
- A Survey of Multi-Tenant Deep Learning Inference on GPU https://arxiv.org/pdf/2203.09040.pdf
- Full Stack Optimization of Transformer Inference: a Survey https://arxiv.org/abs/2302.14017
Edge-based Acceleration
- [SenSys’23] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU | Chinese University of Hong Kong, City University of Hong Kong
- [SenSys’23 Best Paper Runner-up Award] nnPerf: Demystifying DNN Runtime Inference Latency on Mobile Platforms | Beijing University of Posts and Telecommunications
Cloud-based Acceleration