Weight Initialization

Weight initialization is the way a neural network's weights are set before training begins. A poor choice causes vanishing gradients or exploding gradients, either of which hinders training.

Core Problems

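All weights = 0:          every neuron in a layer receives the same update → symmetry is never broken
Initial values too large: exploding gradients
Initial values too small: vanishing gradients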
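
The effect of the initial scale can be seen in a small simulation. The following is a minimal sketch in plain Python and is not part of the original note: the helper `activation_scale`, the width of 32, and the depth of 10 are illustrative assumptions. It pushes one random vector through a stack of purely linear layers (no activation function) and reports the root-mean-square size of the final output. Gradients in the backward pass are multiplied by the same weight matrices (transposed), so they shrink or grow in a similar way.

```python
import math
import random

def activation_scale(weight_std, width=32, depth=10, seed=0):
    """Pass one random vector through `depth` linear layers and return the output RMS."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Each output unit is a dot product of x with freshly drawn N(0, weight_std^2) weights.
        x = [sum(rng.gauss(0.0, weight_std) * xi for xi in x) for _ in range(width)]
    return math.sqrt(sum(v * v for v in x) / width)

tiny = activation_scale(0.01)                       # weights too small
huge = activation_scale(1.0)                        # weights too large
balanced = activation_scale(1.0 / math.sqrt(32))    # variance = 1 / fan_in

print(f"std=0.01        -> output RMS {tiny:.3e}")
print(f"std=1.0         -> output RMS {huge:.3e}")
print(f"std=1/sqrt(32)  -> output RMS {balanced:.3e}")
```

Each layer multiplies the signal size by roughly `weight_std * sqrt(width)`, so only the third setting keeps the output near its input scale across all ten layers.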

Initialization Methods

python

import torch
import torch.nn as nn
import numpy as np

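def initialize_weights(layer, method='kaiming_normal'):
    """Initialize the weights of a Linear/Conv2d layer in place and zero its bias."""
    if isinstance(layer, (nn.Linear, nn.Conv2d)):
        if method == 'xavier_uniform':
            # Xavier/Glorot uniform: suited to sigmoid/tanh activations
            nn.init.xavier_uniform_(layer.weight)

        elif method == 'xavier_normal':
            # Xavier/Glorot normal
            nn.init.xavier_normal_(layer.weight)

        elif method == 'kaiming_uniform':
            # He (Kaiming) uniform: suited to ReLU-family activations
            nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')

        elif method == 'kaiming_normal':
            # He (Kaiming) normal
            nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')

        elif method == 'orthogonal':
            # Orthogonal initialization: useful for RNNs
            nn.init.orthogonal_(layer.weight)

        else:
            # Fail loudly rather than silently keeping the layer's default weights
            raise ValueError(f"unknown initialization method: {method!r}")

        if layer.bias is not None:
            nn.init.zeros_(layer.bias)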

# Apply to the model
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10)
)
model.apply(lambda m: initialize_weights(m, 'kaiming_normal'))
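
As a quick sanity check, the standard deviation of a freshly initialized layer can be compared against the He target of sqrt(2 / fan_in). This is a minimal sketch, assuming a standalone `nn.Linear(784, 256)` layer and a fixed seed chosen only for illustration. The names `expected_std` and `measured_std` are not from the original note.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # illustrative seed so repeated runs give the same numbers

layer = nn.Linear(784, 256)  # weight shape is (256, 784), so fan_in = 784
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')

expected_std = math.sqrt(2.0 / 784)        # He: variance = 2 / fan_in
measured_std = layer.weight.std().item()   # empirical std over all 256 * 784 weights

print(f"expected std: {expected_std:.5f}")
print(f"measured std: {measured_std:.5f}")
```

With about 200,000 weights in the layer, the two values should agree closely. A large gap would suggest the initializer was not applied to that layer.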

Formula Comparison

Method	Variance	Suitable activations
Random (N(0,1))	1	None
Xavier	2/(fan_in + fan_out)	sigmoid, tanh
He (Kaiming)	2/fan_in	ReLU, Leaky ReLU
LeCun	1/fan_in	SELU
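
The variances in the table translate directly into a standard deviation for a normal distribution or a bound for a uniform one. The sketch below is illustrative only: `init_std` and `uniform_bound` are hypothetical helper names, and the 784 → 256 layer size is borrowed from the model above. It relies on the fact that U(-a, a) has variance a²/3, so a = sqrt(3) · std.

```python
import math

def init_std(method, fan_in, fan_out):
    """Return the target standard deviation for the named scheme."""
    variances = {
        "xavier": 2.0 / (fan_in + fan_out),
        "he": 2.0 / fan_in,
        "lecun": 1.0 / fan_in,
    }
    return math.sqrt(variances[method])

def uniform_bound(std):
    """Return a such that U(-a, a) has the given standard deviation."""
    return math.sqrt(3.0) * std

for name in ("xavier", "he", "lecun"):
    std = init_std(name, fan_in=784, fan_out=256)
    print(f"{name:>6}: std = {std:.4f}, uniform bound = {uniform_bound(std):.4f}")
```

For this layer the standard deviations come out to roughly 0.0439 (Xavier), 0.0505 (He), and 0.0357 (LeCun), all far below the 1.0 of plain N(0,1) sampling.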

Experiment: Gradient Flow by Initialization

python

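def check_gradient_flow(model, x):
    """Print the gradient norm of each parameter after one backward pass."""
    model.zero_grad()  # clear gradients left over from earlier calls
    x.requires_grad_(True)
    y = model(x)
    y.sum().backward()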

    for name, param in model.named_parameters():
        if param.grad is not None:
            grad_norm = param.grad.norm().item()
            print(f"{name}: grad_norm = {grad_norm:.6f}")

x = torch.randn(32, 784)
check_gradient_flow(model, x)
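
To see how much the choice of initializer matters, the same kind of measurement can be repeated on a deeper network under two schemes. The following is a minimal sketch rather than the note's own experiment: the helper `first_layer_grad_norm`, the depth of 20, the width of 64, and the seed are all assumptions made for illustration. It compares a small fixed-scale normal initialization (std = 0.01) with He initialization and reports the gradient norm that reaches the first layer.

```python
import torch
import torch.nn as nn

def first_layer_grad_norm(init_fn, depth=20, width=64, seed=0):
    """Build a deep ReLU MLP, initialize it with init_fn, return the first layer's grad norm."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    net = nn.Sequential(*layers)

    for m in net.modules():
        if isinstance(m, nn.Linear):
            init_fn(m.weight)
            nn.init.zeros_(m.bias)

    x = torch.randn(32, width)
    net(x).sum().backward()
    return net[0].weight.grad.norm().item()

small_norm = first_layer_grad_norm(lambda w: nn.init.normal_(w, std=0.01))
he_norm = first_layer_grad_norm(lambda w: nn.init.kaiming_normal_(w, nonlinearity='relu'))

print(f"normal(std=0.01): first-layer grad norm = {small_norm:.3e}")
print(f"kaiming_normal:   first-layer grad norm = {he_norm:.3e}")
```

With the small fixed scale, the gradient arriving at the first layer should be many orders of magnitude smaller than with He initialization, which is the vanishing-gradient behavior described above.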

Related Notes

Generative Adversarial Network

Convolutional Neural Network

Fully Connected Network