Attention Mechanism | AI Insight Note

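The attention mechanism is a technique that lets a model look at every position of a sequence at once and assign higher weight to the parts that matter most. It drew wide interest in 2015, when Bahdanau et al. applied it to machine translation, and it later became the core of the Transformer architecture.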

Core Concepts

Attention is built from three components: Query, Key, and Value.

Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

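•Query (Q): representation of the position currently being processed
•Key (K): representations of the positions that can be attended to
•Value (V): the actual information content carried by each position
•√d_k: scaling factor, where d_k is the key dimension; it keeps the dot products from growing with d_k, which stabilizes gradients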
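The role of the √d_k term can be checked numerically. The sketch below assumes query and key entries are independent with zero mean and unit variance (an idealization, not something the note states); under that assumption the raw dot product has a standard deviation near √d_k, while the scaled one stays near 1. The helper name `dot_product_std` and the sample size are made up for illustration.

```python
import math
import torch

torch.manual_seed(0)

def dot_product_std(d_k, n=10000):
    # Hypothetical helper: draw n random query/key pairs with unit-variance
    # entries and measure the spread of their dot products.
    q = torch.randn(n, d_k)
    k = torch.randn(n, d_k)
    raw = (q * k).sum(dim=-1)          # unscaled dot products
    scaled = raw / math.sqrt(d_k)      # the scaling used in the formula above
    return raw.std().item(), scaled.std().item()

for d_k in (16, 64, 256):
    raw_std, scaled_std = dot_product_std(d_k)
    print(f"d_k={d_k:4d}  raw std={raw_std:6.2f}  scaled std={scaled_std:5.2f}")
```

Large unscaled scores push softmax toward a near one-hot output, where its gradient is close to zero; dividing by √d_k keeps the scores in a range where softmax remains sensitive to its inputs.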

Self-Attention Implementation

python

import torch
import torch.nn.functional as F
import math

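def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 get a very negative score, so softmax gives them ~0 weight
        scores = scores.masked_fill(mask == 0, -1e9)
    # Normalize over the key dimension so each query's weights sum to 1
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V), attn_weights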

# Example: seq_len=5, d_model=64
Q = torch.randn(1, 5, 64)
K = torch.randn(1, 5, 64)
V = torch.randn(1, 5, 64)
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)  # torch.Size([1, 5, 64])
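The example above leaves the mask argument unused. As a rough sketch of how it might be applied, the block below restates the same function so that it runs on its own and passes a lower-triangular (causal) mask, so that each position attends only to itself and earlier positions. The names `x` and `causal_mask` are illustrative, and using one tensor for Q, K, and V is a simplification: a real self-attention layer would first apply learned projections.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Same function as above, repeated so this sketch is self-contained
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V), attn_weights

seq_len = 5
x = torch.randn(1, seq_len, 64)  # one sequence used as Q, K, and V (no learned projections)

# Lower-triangular mask: row i has ones in columns 0..i, zeros after
causal_mask = torch.tril(torch.ones(seq_len, seq_len))

output, weights = scaled_dot_product_attention(x, x, x, mask=causal_mask)
print(weights[0])  # entries above the diagonal are zero
```

The (5, 5) mask broadcasts over the batch dimension of the (1, 5, 5) score tensor, so the same mask is applied to every sequence in the batch.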

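Types of Attention

Type	Description	Used in
Soft attention	Weighted sum over all positions	Translation, summarization
Hard attention	Selects a single position (non-differentiable)	Image captioning
Self-attention	Relations within the same sequence	Transformer
Multi-head attention	Several representation subspaces attended in parallel	BERT, GPT
Cross-attention	Attends to a different sequence	Encoder-decoder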
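To make the self-attention and cross-attention rows concrete, here is a minimal sketch using PyTorch's built-in `torch.nn.MultiheadAttention`. The tensor names, sequence lengths, and the choice of 8 heads are assumptions made for illustration. A single module is reused for both calls only to keep the example short; an encoder-decoder model would normally use separate layers with separate weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 8 heads over a 64-dimensional model, i.e. 8 dimensions per head
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

encoder_out = torch.randn(1, 5, 64)  # illustrative source sequence, length 5
decoder_in = torch.randn(1, 3, 64)   # illustrative target sequence, length 3

# Self-attention: query, key, and value all come from the same sequence
self_out, self_w = mha(decoder_in, decoder_in, decoder_in)

# Cross-attention: queries come from the decoder, keys and values from the encoder
cross_out, cross_w = mha(decoder_in, encoder_out, encoder_out)

print(self_out.shape, self_w.shape)
print(cross_out.shape, cross_w.shape)
```

In both cases the output has one vector per query position. The weight matrix has one row per query and one column per key, which is why the cross-attention weights are not square when the two sequences differ in length. By default the returned weights are averaged across heads.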
