[Paper] Separable Self-attention for Mobile Vision Transformers

ju_young 2022. 12. 26. 00:44

MobileViT is a light-weight network that combines the strengths of CNNs and ViTs. This paper introduces separable self-attention, whose complexity is linear in the number of tokens.

Separable self-attention replaces the quadratic multi-headed attention (MHA) with two linear computations while still encoding global information.
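As a rough sketch of what these two linear computations look like, written in the notation I recall from the paper (input $x \in \mathbb{R}^{k \times d}$ with $k$ tokens, and projections $W_I \in \mathbb{R}^{d}$ and $W_K \in \mathbb{R}^{d \times d}$; the symbols and shapes are my assumption, not stated in this post): the first computes one context score per token, and the second uses those scores to pool the key projections into a single context vector.

$$
c_s = \mathrm{softmax}(xW_I) \in \mathbb{R}^{k}, \qquad
c_v = \sum_{i=1}^{k} c_s(i)\,(xW_K)(i) \in \mathbb{R}^{d}
$$

Neither step forms a $k \times k$ attention matrix, which is where the linear cost in $k$ comes from.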

The context vector $c_v$, which carries the contextual information, is multiplied element-wise with $ReLU(xW_v)$ and then passed through a linear layer with weight $W_O$.
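The steps above can be sketched in a few lines of dependency-free Python. This is a minimal single-sample illustration, not the authors' implementation: the function and helper names are hypothetical, the score projection `w_i` and key projection `w_k` are assumed from the paper's description, and biases, batching, and convolutional projections are left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """x: k tokens x d channels. w_i: d x 1. w_k, w_v, w_o: d x d (assumed shapes)."""
    k, d = len(x), len(x[0])
    # Linear computation 1: one scalar score per token, normalised over the k tokens.
    scores = softmax([row[0] for row in matmul(x, w_i)])
    # Linear computation 2: context vector c_v as the score-weighted sum of key projections.
    keys = matmul(x, w_k)
    c_v = [sum(scores[i] * keys[i][j] for i in range(k)) for j in range(d)]
    # Element-wise product of ReLU(x W_v) with c_v, broadcast over all tokens.
    values = matmul(x, w_v)
    gated = [[max(0.0, v) * c for v, c in zip(row, c_v)] for row in values]
    # Final linear layer with weight W_O.
    return matmul(gated, w_o), scores, c_v


def rand_matrix(rows, cols):
    """Hypothetical helper: random weights, for demonstration only."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
k, d = 6, 4
x = rand_matrix(k, d)
y, scores, c_v = separable_self_attention(
    x, rand_matrix(d, 1), rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
)
print(len(y), len(y[0]))  # 6 4: same shape as the input
```

Every step touches each token a constant number of times, so the cost grows linearly with the token count `k`, and the output keeps the input's `k x d` shape.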
