Random Forest - Artificial Intelligence > Machine Learning | AI Insight Note

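Random Forest is a bagging-based learning algorithm that ensembles many decision trees. Each tree is trained on a random bootstrap sample of the data and considers a random subset of features when splitting, which keeps the trees diverse. It is used for both classification and regression.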

How It Works

Training data
  ↓ Bootstrap sampling (N times)
Tree-1 (random feature subset)
Tree-2 (random feature subset)
...
Tree-N
  ↓ Voting (classification) / averaging (regression)
Final prediction
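The flow above can be sketched by hand with plain decision trees. This is a minimal illustration, not how scikit-learn implements its forest: the tree count `n_trees = 25`, the seed, and the use of the Iris data are assumptions chosen for the example, and predictions are made on the training data only to keep the sketch short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_trees = 25  # illustrative value, not taken from the note

trees = []
for _ in range(n_trees):
    # Bootstrap: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes every split consider a random feature subset
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    trees.append(tree.fit(X[idx], y[idx]))

# Collect each tree's predictions: shape (n_trees, n_samples)
all_preds = np.stack([t.predict(X) for t in trees])

# Majority vote per sample (classification); for regression, take the mean instead
votes = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=3).argmax(), 0, all_preds
)
print(f"Ensemble accuracy on the training data: {(votes == y).mean():.4f}")
```

Hard voting is used here because it mirrors the diagram. scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities, so its output can differ slightly from a hand-rolled vote like this one.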

Implementation

python

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

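X, y = load_iris(return_X_y=True)
# Fix the split seed so the reported accuracy is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_features='sqrt',  # number of features considered at each split
    max_depth=None,       # maximum depth (None = unlimited)
    min_samples_split=2,  # minimum samples required to split a node
    random_state=42
)
rf.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, rf.predict(X_test)):.4f}")

# Feature importances (one value per feature)
importances = rf.feature_importances_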

Hyperparameter Tuning

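Parameter	Role	Tuning direction
n_estimators	Number of trees	More trees give more stable predictions with diminishing returns (slower training and inference)
max_depth	Maximum tree depth	Smaller values reduce overfitting
max_features	Number of features considered at each split	Classification default: sqrt(number of features)
min_samples_leaf	Minimum samples per leaf	Larger values regularize more strongly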
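One way to explore the parameters in the table is a small cross-validated grid search. The sketch below is only an illustration: the candidate values in `param_grid`, the 5-fold setting, and the Iris data are assumptions made for the example, not recommendations from this note.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical candidate values, kept small so the search runs quickly
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(max_features="sqrt", random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation for every combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.4f}")
```

On a dataset this small the combinations tend to score very similarly, so the "best" setting mostly reflects noise; the pattern matters more than the specific result.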

Feature Importance

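A random forest computes each feature's importance from how much that feature contributes to impurity reduction across the trees. The scores are used for feature selection and for interpreting the model.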
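As a small sketch of how these scores can be read, the example below fits a forest on the Iris data and prints the features ranked by importance. The dataset and the settings (`n_estimators=100`, `random_state=42`) are assumptions carried over from the implementation example for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(data.data, data.target)

# Impurity-based importances: one non-negative value per feature, normalized to sum to 1
importances = rf.feature_importances_

# Rank features from most to least important
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{data.feature_names[i]:<20} {importances[i]:.3f}")
```

Impurity-based scores are computed from the training data and can favor features with many distinct values, so it is worth treating the ranking as a first look and not as a final verdict.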

