[ArXiv 2023.08] Retentive Network: A Successor to Transformer for Large Language Models https://arxiv.org/pdf/2307.08621.pdf | Microsoft Research, Tsinghua University
[ICCV’21 Best Paper Award] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows https://arxiv.org/pdf/2103.14030.pdf | Microsoft Research