A Big Collection of Modified Attention Mechanisms
A few days ago, while browsing GitHub, I came across a big list called "awesome-fast-attention", which curates a series of papers on efficient attention variants, along with their citation counts, source-code implementations, complexity, and key highlights.
Efficient Attention
The paper and source-code links collected in the list are reproduced in the references at the end of this post.
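Several of the entries collected here (for example [7] and [52] in the references below) make attention linear in sequence length by reordering the computation so that the n×n attention matrix is never materialized. The snippet below is a minimal NumPy sketch of that general idea, not the implementation of any particular paper; the elu(x)+1 feature map is one common choice from the linear-attention literature and is assumed here purely for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds the full (n, n) score matrix -> O(n^2) time and memory."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention: phi(Q) @ (phi(K).T @ V), the (n, n) matrix is never formed -> O(n)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, keeps features positive
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                    # (d, d_v) key-value summary, independent of sequence length
    z = Qp @ Kp.sum(axis=0) + eps    # (n,) per-query normalizer
    return (Qp @ kv) / z[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)  # (8, 4)
    print(linear_attention(Q, K, V).shape)   # (8, 4): same shape, approximate weighting
```

With this reordering the cost scales roughly as O(n·d²) instead of O(n²·d), which is what makes these variants attractive for long documents; other entries in the list (sparse, local, or routing attention) reach a similar goal by restricting which positions attend to which.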
Articles
A Survey of Long-Term Context in Transformers [60]
Transformers Assemble (PART I)
Transformers Assemble (PART II)
Transformers Assemble (PART III)
Transformers Assemble (PART IV)
Transformers Assemble (PART V)
ICLR 2020 | Depth-Adaptive Transformer
Memory Transformer: A Simple and Straightforward Transformer Modification
[ICLR 2020] Transformer Complex-order: A New Approach to Positional Encoding
References
[1]Generating Wikipedia by Summarizing Long Sequences: https://arxiv.org/abs/1801.10198v1
[2]memory-compressed-attention: https://github.com/lucidrains/memory-compressed-attention
[3]CBAM: Convolutional Block Attention Module: https://arxiv.org/abs/1807.06521v2
[4]attention-module: https://github.com/Jongchan/attention-module
[5]CCNet: Criss-Cross Attention for Semantic Segmentation: https://arxiv.org/abs/1811.11721v2
[6]CCNet: https://github.com/speedinghzl/CCNet
[7]Efficient Attention: Attention with Linear Complexities: https://arxiv.org/abs/1812.01243v8
[8]efficient-attention: https://github.com/cmsflash/efficient-attention
[9]Star-Transformer: https://arxiv.org/abs/1902.09113v2
[10]fastNLP: https://github.com/fastnlp/fastNLP/blob/master/fastNLP/modules/encoder/star_transformer.py
[11]Generating Long Sequences with Sparse Transformers: https://arxiv.org/abs/1904.10509v1
[12]torch-blocksparse: https://github.com/ptillet/torch-blocksparse
[13]GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond: https://arxiv.org/abs/1904.11492v1
[14]GCNet: https://github.com/xvjiarui/GCNet
[15]SCRAM: Spatially Coherent Randomized Attention Maps: https://arxiv.org/abs/1905.10308v1
[16]Interlaced Sparse Self-Attention for Semantic Segmentation: https://arxiv.org/abs/1907.12273v2
[17]Permutohedral Attention Module for Efficient Non-Local Neural Networks: https://arxiv.org/abs/1907.00641v2
[18]Permutohedral_attention_module: https://github.com/SamuelJoutard/Permutohedral_attention_module
[19]Large Memory Layers with Product Keys: https://arxiv.org/abs/1907.05242v2
[20]XLM: https://github.com/facebookresearch/XLM
[21]Expectation-Maximization Attention Networks for Semantic Segmentation: https://arxiv.org/abs/1907.13426v2
[22]EMANet: https://github.com/XiaLiPKU/EMANet
[23]Compressive Transformers for Long-Range Sequence Modelling: https://arxiv.org/abs/1911.05507v1
[24]compressive-transformer-pytorch: https://github.com/lucidrains/compressive-transformer-pytorch
[25]BP-Transformer: Modelling Long-Range Context via Binary Partitioning: https://arxiv.org/abs/1911.04070v1
[26]BPT: https://github.com/yzh119/BPT
[27]Axial Attention in Multidimensional Transformers: https://arxiv.org/abs/1912.12180v1
[28]axial-attention: https://github.com/lucidrains/axial-attention
[29]Reformer: The Efficient Transformer: https://arxiv.org/abs/2001.04451v2
[30]trax: https://github.com/google/trax/tree/master/trax/models/reformer
[31]Transformer on a Diet: https://arxiv.org/abs/2002.06170v1
[32]transformer-on-diet: https://github.com/cgraywang/transformer-on-diet
[33]Sparse Sinkhorn Attention: https://arxiv.org/abs/2002.11296v1
[34]sinkhorn-transformer: https://github.com/lucidrains/sinkhorn-transformer
[35]SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection: https://arxiv.org/abs/2003.09833v2
[36]Efficient Content-Based Sparse Attention with Routing Transformers: https://arxiv.org/abs/2003.05997v1
[37]routing-transformer: https://github.com/lucidrains/routing-transformer
[38]Longformer: The Long-Document Transformer: https://arxiv.org/abs/2004.05150v1
[39]longformer: https://github.com/allenai/longformer
[40]Neural Architecture Search for Lightweight Non-Local Networks: https://arxiv.org/abs/2004.01961v1
[41]AutoNL: https://github.com/LiYingwei/AutoNL
[42]ETC: Encoding Long and Structured Data in Transformers: https://arxiv.org/abs/2004.08483v2
[43]Multi-scale Transformer Language Models: https://arxiv.org/abs/2005.00581v1
[44]Synthesizer: Rethinking Self-Attention in Transformer Models: https://arxiv.org/abs/2005.00743v1
[45]Jukebox: A Generative Model for Music: https://arxiv.org/abs/2005.00341v1
[46]jukebox: https://github.com/openai/jukebox
[47]GMAT: Global Memory Augmentation for Transformers: https://arxiv.org/abs/2006.03274v1
[48]gmat: https://github.com/ag1988/gmat
[49]Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers: https://arxiv.org/abs/2006.03555v1
[50]google-research: https://github.com/google-research/google-research/tree/master/performer/fast_self_attention
[51]Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer: https://arxiv.org/abs/2006.05174v1
[52]Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention: https://arxiv.org/abs/2006.16236v2
[53]fast-transformers: https://github.com/idiap/fast-transformers
[54]Linformer: Self-Attention with Linear Complexity: https://arxiv.org/abs/2006.04768v3
[55]linformer-pytorch: https://github.com/tatp22/linformer-pytorch
[56]Real-time Semantic Segmentation with Fast Attention: https://arxiv.org/abs/2007.03815v2
[57]Fast Transformers with Clustered Attention: https://arxiv.org/abs/2007.04825v1
[58]fast-transformers: https://github.com/idiap/fast-transformers
[59]Big Bird: Transformers for Longer Sequences: https://arxiv.org/abs/2007.14062v1
[60]A Survey of Long-Term Context in Transformers: https://www.pragmatic.ml/a-survey-of-methods-for-incorporating-long-term-context/