Target venues: systems conferences (OSDI/SOSP/ATC/EuroSys/ASPLOS), networking conferences (NSDI/SIGCOMM), mobile conferences (MobiCom/MobiSys/SenSys/UbiComp).

Surveys

  1. A Survey of Multi-Tenant Deep Learning Inference on GPU https://arxiv.org/pdf/2203.09040.pdf
  2. Full Stack Optimization of Transformer Inference: a Survey https://arxiv.org/abs/2302.14017

Edge-based Acceleration

Cloud-based Acceleration