Edge AI

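Edge AI runs AI inference directly where the data is generated (on devices and gateways) rather than in the cloud. Its main advantages are lower latency, stronger privacy protection, and the ability to operate offline.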

Cloud AI vs Edge AI

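Aspect	Cloud AI	Edge AI
Inference location	Remote server	Local device
Latency (typical)	50-500 ms	1-10 ms
Privacy	Data must be sent off-device	Data is processed locally
Bandwidth	High	Minimal
Offline operation	Not possible	Possible
Model size	Effectively unlimited	Limited (MB to GB range)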
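
To make the latency row concrete, the sketch below turns the table's latency ranges into an upper bound on how many inferences per second a control loop can complete when each request must finish before the next one starts. The function name `max_sequential_rate` is hypothetical, and the calculation ignores batching and pipelining, so treat it as rough arithmetic rather than a benchmark.

```python
def max_sequential_rate(latency_ms: float) -> float:
    """Upper bound on back-to-back inferences per second when each
    request must finish before the next one starts."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms

# Latency ranges (ms) taken from the comparison table above
latency_ranges_ms = {"cloud": (50, 500), "edge": (1, 10)}

# For each option: (rate at worst-case latency, rate at best-case latency)
rate_bounds = {
    name: (max_sequential_rate(worst), max_sequential_rate(best))
    for name, (best, worst) in latency_ranges_ms.items()
}

for name, (low, high) in rate_bounds.items():
    print(f"{name}: {low:.0f} to {high:.0f} inferences/s")
```

Under these figures, a strictly sequential loop tops out at roughly 2 to 20 inferences per second against the cloud and 100 to 1000 on the device, which is why closed-loop tasks such as robot control tend to favor local inference.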

Model optimization techniques

python

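# Quantization - TensorFlow Lite
import tensorflow as tf

# Float32 model -> INT8 quantization
# model_path (SavedModel directory) and representative_data_gen (a generator
# yielding lists of sample input arrays) must be defined by the caller
converter = tf.lite.TFLiteConverter.from_saved_model(model_path)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Calibrate activation ranges with a representative dataset
converter.representative_dataset = representative_data_gen
# Restrict the converted graph to INT8 kernels; input/output tensors stay
# float32 unless inference_input_type / inference_output_type are set to tf.int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()
# Typical result: roughly 4x smaller model and 2-4x faster inference, depending on hardware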

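# Pruning
import tensorflow_model_optimization as tfmot

# Ramp sparsity from 50% to 80% over the first 1000 training steps
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.50,
        final_sparsity=0.80,
        begin_step=0, end_step=1000)
}
# model: an existing tf.keras model (defined elsewhere)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
# Fine-tune with the tfmot.sparsity.keras.UpdatePruningStep() callback, then
# call tfmot.sparsity.keras.strip_pruning(pruned_model) before export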
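
To show what these two techniques do to the weights themselves, here is a minimal pure-Python sketch of affine INT8 quantization and magnitude pruning on a toy weight list. It is a simplified illustration, not how TensorFlow Lite or tfmot implement them internally: the helper names (`quantize_int8`, `dequantize_int8`, `prune_by_magnitude`) and the sample weights are hypothetical, and real converters calibrate per tensor or per channel.

```python
import struct

def quantize_int8(weights):
    """Affine (asymmetric) quantization of floats to int8 values."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(v - zero_point) * scale for v in q]

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

# Toy weights, chosen only for illustration
weights = [-1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 1.5, 2.0]

q, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(q, scale, zero_point)

# Storage cost: 4 bytes per float32 weight vs 1 byte per int8 weight
float32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"<{len(q)}b", *q))
max_error = max(abs(a - b) for a, b in zip(weights, restored))

pruned = prune_by_magnitude(weights, 0.5)

print(float32_bytes, int8_bytes, max_error, pruned)
```

The 4x size reduction falls directly out of the storage width (32 bits down to 8), while the accuracy cost shows up as a round-trip error of at most about half a quantization step. Pruning, by contrast, leaves the storage width alone and only pays off once the zeros are compressed or skipped by a sparsity-aware runtime.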

Edge AI hardware

Hardware	Performance	Power	Use case
NVIDIA Jetson Orin	275 TOPS	15-60 W	Robotics, autonomous driving
Google Coral TPU	4 TOPS	2 W	Embedded
Apple Neural Engine	35 TOPS	<5 W	iPhone
Raspberry Pi 5	~1 TOPS	12 W	Prototyping
Qualcomm AI Engine	48 TOPS	<5 W	Smartphones
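
One way to read this table is in terms of efficiency. The sketch below computes TOPS per watt from the figures above, using the upper end of each power range (60 W for the Jetson, 5 W for the "<5 W" entries), so the results are rough lower bounds. Peak TOPS figures are not measured the same way across vendors, so this is a ballpark comparison only.

```python
# (peak TOPS, upper-bound power in watts) copied from the table above
hardware = {
    "NVIDIA Jetson Orin": (275, 60),
    "Google Coral TPU": (4, 2),
    "Apple Neural Engine": (35, 5),
    "Raspberry Pi 5": (1, 12),
    "Qualcomm AI Engine": (48, 5),
}

def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency as peak throughput divided by power draw."""
    return tops / watts

efficiency = {name: tops_per_watt(*spec) for name, spec in hardware.items()}

# Most efficient first
ranked = sorted(efficiency, key=efficiency.get, reverse=True)

for name in ranked:
    print(f"{name}: {efficiency[name]:.2f} TOPS/W")
```

By this measure the phone-class accelerators lead, the Jetson trades efficiency for absolute throughput, and the Raspberry Pi 5 sits far behind, which matches its listed role as a prototyping board.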

Inference frameworks

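cpp

// TensorFlow Lite inference (C++)
#include <cstring>
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// input_data and input_size (in bytes) come from the caller; returns false on failure
bool RunInference(const float* input_data, size_t input_size) {
  // Load the model and build an interpreter with the built-in op resolver
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return false;
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder builder(*model, resolver);
  if (builder(&interpreter) != kTfLiteOk) return false;
  if (interpreter->AllocateTensors() != kTfLiteOk) return false;

  // Input -> inference -> output
  float* input = interpreter->typed_input_tensor<float>(0);
  std::memcpy(input, input_data, input_size);
  if (interpreter->Invoke() != kTfLiteOk) return false;
  float* output = interpreter->typed_output_tensor<float>(0);
  (void)output;  // read the results from output here
  return true;
}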
