Search Autocomplete System Design - null | AI Insight Note

검색 자동완성(Search Autocomplete)은 사용자가 타이핑할 때 실시간으로 검색어를 추천하는 시스템이다. Google, 네이버 검색창이 대표적 예다.

요구사항

기능 요구사항:
- 접두어 기반 상위 5개 검색어 추천
- 응답 시간 100ms 이하
- 최근 2주 인기 검색어 반영

규모:
- 일 1억 검색 (초당 ~1160회)
- QPS 피크: 초당 5,000회
- 검색어 후보: 수억 개

Trie 자료구조

python

from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False
        self.frequency = 0
        self.top_searches = []  # 캐시된 상위 검색어

class AutocompleteSystem:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str, freq: int):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True
        node.frequency = freq

    def search(self, prefix: str, top_k: int = 5) -> list:
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # 캐시된 결과 반환 (미리 계산)
        return node.top_searches[:top_k]

    def precompute_top_k(self, node: TrieNode, path: str, top_k: int):
        """모든 노드에 상위 k개 검색어 미리 계산"""
        if node.is_end:
            candidates = [(path, node.frequency)]
        else:
            candidates = []
        for ch, child in node.children.items():
            candidates.extend(self.precompute_top_k(child, path + ch, top_k))
        candidates.sort(key=lambda x: -x[1])
        node.top_searches = [w for w, _ in candidates[:top_k]]
        return candidates

분산 시스템 설계

데이터 수집 서비스:
- 검색 로그 수집 (Kafka)
- 배치 집계 (Spark, 주 1회)
- Trie 업데이트

쿼리 서비스:
- Trie 서버 (메모리 캐시)
- 접두어로 서버 분할 (샤딩)
  예: a-m → 서버1, n-z → 서버2

캐시 전략:
- 인기 접두어는 CDN/엣지 캐시
- TTL: 1시간

Search Autocomplete System Design검색 자동완성 설계

요구사항

Trie 자료구조

분산 시스템 설계

관련 문서