Layer Normalization

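Layer Normalization (LN) normalizes each sample across its feature dimensions. Unlike batch normalization, its statistics do not depend on the batch size, which makes it a good fit for Transformers, RNNs, and small-batch training.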
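The batch-size independence can be checked with a small experiment. The following is a minimal sketch, assuming PyTorch is available; the tensor sizes and the choice of `nn.BatchNorm1d` as the contrasting layer are illustrative assumptions, not part of the original note. It normalizes two samples once as part of a batch of four and once on their own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ln = nn.LayerNorm(8)
bn = nn.BatchNorm1d(8)  # training mode by default, so it uses batch statistics

x = torch.randn(4, 8)   # (batch, features); sizes chosen only for illustration
pair = x[:2]            # the first two samples, taken out of the batch

# LayerNorm computes statistics per sample, so batch mates do not matter.
ln_same = torch.allclose(ln(x)[:2], ln(pair), atol=1e-6)
# BatchNorm computes statistics across the batch, so the result changes.
bn_same = torch.allclose(bn(x)[:2], bn(pair), atol=1e-6)

print("LayerNorm unchanged:", ln_same)
print("BatchNorm unchanged:", bn_same)
```

If the sketch behaves as intended, the LayerNorm outputs for the two samples agree in both settings, while the BatchNorm outputs do not.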

Formula

For an input $x \in \mathbb{R}^d$ (a single sample within the batch):

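$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \quad \mu = \frac{1}{d}\sum_j x_j, \quad \sigma^2 = \frac{1}{d}\sum_j (x_j - \mu)^2$$

Here $\epsilon$ is a small constant for numerical stability. Learnable parameters $\gamma, \beta \in \mathbb{R}^d$ then scale and shift the result:

$$y_i = \gamma_i \hat{x}_i + \beta_i$$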
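The normalization step can be traced by hand on a tiny vector. The sketch below uses plain Python with no learnable parameters; the helper name `layer_norm_1d` and the input values are hypothetical, chosen only to keep the arithmetic easy to follow. For $x = (1, 2, 3, 4)$ we get $\mu = 2.5$ and $\sigma^2 = 1.25$.

```python
import math

def layer_norm_1d(x, eps=1e-5):
    """Normalize one sample (a list of floats) with the formula for x_hat."""
    d = len(x)
    mu = sum(x) / d                            # mean over the d features
    var = sum((v - mu) ** 2 for v in x) / d    # biased variance (divide by d)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

x_hat = layer_norm_1d([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in x_hat])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

The normalized values have zero mean and, up to the small effect of $\epsilon$, unit variance.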

Implementation

python

import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    def __init__(self, normalized_shape, eps=1e-5):
        super().__init__()
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.normalized_shape = normalized_shape
        self.eps   = eps
        self.gamma = nn.Parameter(torch.ones(normalized_shape))
        self.beta  = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x):
        # Normalize over the last len(normalized_shape) dimensions
        dims = tuple(range(-len(self.normalized_shape), 0))
        mean = x.mean(dim=dims, keepdim=True)
        # unbiased=False divides by d, matching the variance in the formula
        var  = x.var(dim=dims, keepdim=True, unbiased=False)
        x_hat = (x - mean) / (var + self.eps).sqrt()
        # Learnable per-feature scale (gamma) and shift (beta)
        return self.gamma * x_hat + self.beta

# Usage in a Transformer (here with PyTorch's built-in nn.LayerNorm)
ln = nn.LayerNorm(512)  # embed_dim
x  = torch.randn(32, 128, 512)  # (batch, seq, embed)
print(ln(x).shape)  # torch.Size([32, 128, 512])
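One way to gain confidence in a hand-written implementation like the one above is to compare it against `torch.nn.functional.layer_norm`. The sketch below is a standalone check, not the note's own test: the helper `manual_layer_norm` and the tensor sizes are assumptions made for illustration. It also shows why the biased variance matters for matching PyTorch.

```python
import torch
import torch.nn.functional as F

def manual_layer_norm(x, gamma, beta, eps=1e-5, unbiased=False):
    """Layer norm over the last dimension, written out step by step."""
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=unbiased)
    return gamma * (x - mean) / (var + eps).sqrt() + beta

torch.manual_seed(0)
x = torch.randn(2, 5, 16)      # (batch, seq, embed); sizes are illustrative
gamma = torch.randn(16)        # random scale and shift to exercise the affine part
beta = torch.randn(16)

ref = F.layer_norm(x, (16,), weight=gamma, bias=beta, eps=1e-5)

# Biased variance (divide by d) should agree with the reference.
matches = torch.allclose(manual_layer_norm(x, gamma, beta), ref, atol=1e-5)
# Unbiased variance (divide by d - 1) rescales the output and should not agree.
unbiased_matches = torch.allclose(
    manual_layer_norm(x, gamma, beta, unbiased=True), ref, atol=1e-5
)

print("biased variance matches:", matches)
print("unbiased variance matches:", unbiased_matches)
```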

Pre-LN vs Post-LN

python

# Post-LN (original Transformer): normalize after the residual addition
def post_ln_transformer_block(x, attn, ffn, ln1, ln2):
    x = ln1(x + attn(x))
    x = ln2(x + ffn(x))
    return x

# Pre-LN (most modern LLMs, GPT-2 onward): normalize before each sublayer
def pre_ln_transformer_block(x, attn, ffn, ln1, ln2):
    x = x + attn(ln1(x))
    x = x + ffn(ln2(x))
    return x

# Pre-LN: more stable training, which helps with deep models
# Post-LN: can perform slightly better, but training may be unstable
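To make the difference concrete, the two orderings can be run with real submodules. This is a minimal sketch under stated assumptions: the sizes, the `nn.MultiheadAttention` wrapper, and the two-layer feed-forward network are illustrative choices, and the same modules are shared between both variants only to keep the example short. It compares the per-token mean of the block output in each case.

```python
import torch
import torch.nn as nn

def post_ln_block(x, attn, ffn, ln1, ln2):
    x = ln1(x + attn(x))
    return ln2(x + ffn(x))

def pre_ln_block(x, attn, ffn, ln1, ln2):
    x = x + attn(ln1(x))
    return x + ffn(ln2(x))

torch.manual_seed(0)
d = 64  # embedding size, chosen only for illustration
mha = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
attn = lambda h: mha(h, h, h, need_weights=False)[0]  # self-attention output only
ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
ln1, ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

x = torch.randn(2, 10, d)  # (batch, seq, embed)
with torch.no_grad():
    post_out = post_ln_block(x, attn, ffn, ln1, ln2)
    pre_out = pre_ln_block(x, attn, ffn, ln1, ln2)

# Largest absolute per-token mean of each output
post_mean = post_out.mean(-1).abs().max().item()
pre_mean = pre_out.mean(-1).abs().max().item()
print(post_out.shape, pre_out.shape)
print(f"max |per-token mean|: post={post_mean:.2e}, pre={pre_mean:.2e}")
```

The Post-LN output ends in a LayerNorm, so each token comes out normalized. The Pre-LN output is the raw residual stream, which is why Pre-LN stacks typically apply one final LayerNorm after the last block.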

RMS Norm (mainstream in recent LLMs)

python

class RMSNorm(nn.Module):
    """Skips mean subtraction for cheaper computation (used in LLaMA and others)."""
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d))

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.weight
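The relationship between RMS Norm and Layer Normalization can be probed numerically. The sketch below is an illustration under assumptions: the function `rms_norm` is a weight-free stand-in for the class above, and the tensor sizes and the offset of 3.0 are arbitrary. It checks that the two coincide on zero-mean input, differ once the input has a nonzero mean, and that RMS Norm is insensitive to rescaling the input.

```python
import torch
import torch.nn.functional as F

def rms_norm(x, eps=1e-6):
    """RMS normalization over the last dimension, without the learnable weight."""
    return x / x.pow(2).mean(-1, keepdim=True).add(eps).sqrt()

torch.manual_seed(0)
x = torch.randn(4, 32)
centered = x - x.mean(-1, keepdim=True)  # force zero mean per sample
shifted = x + 3.0                        # give each sample a clear nonzero mean

# With zero mean, the RMS equals the standard deviation, so both norms agree.
same_when_centered = torch.allclose(
    rms_norm(centered), F.layer_norm(centered, (32,), eps=1e-6), atol=1e-5
)
# With a nonzero mean, LayerNorm removes the offset and RMS Norm does not.
same_when_shifted = torch.allclose(
    rms_norm(shifted), F.layer_norm(shifted, (32,), eps=1e-6), atol=1e-3
)
# Rescaling the input leaves the RMS-normalized output (almost) unchanged.
scale_invariant = torch.allclose(rms_norm(10.0 * x), rms_norm(x), atol=1e-5)

print(same_when_centered, same_when_shifted, scale_invariant)
```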
