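A Complete Guide to Distributed Rate Limiting: Stopping API Abuse with Redis + Token Bucket

It is 9 a.m. on a Monday, and your API server's CPU hits 100%. The logs show a single user sending 10,000 requests per second. Legitimate users are getting 503 errors, and the support line is flooded with calls. The cloud bill has jumped to ten times its usual level, and management calls an emergency meeting.

How do you prevent this?

The answer is rate limiting. This article walks through building a distributed rate limiter suited to production use with Redis and the token bucket algorithm, with working code along the way.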

Why Rate Limiting Is Essential

Problems That Happen in Practice

1. API Abuse

// A malicious user's script
for (let i = 0; i < 1000000; i++) {
  fetch('https://api.example.com/expensive-operation')
    .then(response => response.json())
    .then(data => console.log(data))
}

// Result:
// - One million requests are fired without waiting for responses
// - The database is overloaded
// - Legitimate users cannot be served
// - Cloud costs spike

Cost impact:

Normal traffic: 100 req/sec
Malicious traffic: 10,000 req/sec (100x)

AWS Lambda cost:
Normal: $100/month
Under attack: $10,000/month

Database connections:
Normal: 50
Under attack: 5,000 → connection pool exhausted!

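2. DDoS Attacks

# Distributed denial-of-service attack
# Attacking from 100 IPs at once
for ip in $(cat bot-ips.txt); do
  curl -X POST https://api.example.com/signup \
    -H "X-Forwarded-For: $ip" \
    -d '{"email":"fake@email.com"}' &
done

# Repeated in a loop, this reaches 100,000 signup requests per minute!
# The email delivery cost alone comes to $5,000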

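3. Crawler/Bot Traffic

# An ill-behaved crawler
import time

import requests

for page in range(1, 100000):
    response = requests.get(
        f'https://api.example.com/products?page={page}'
    )
    # Pause only 0.001 s between requests (up to 1,000 requests per second!)
    time.sleep(0.001)

# Result:
# - Malicious scraping, not a legitimate SEO bot
# - Wasted server resources
# - Slower responses for legitimate users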

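4. Accidental Loops

// A developer's mistake
function updateDashboard() {
  fetch('/api/stats')
    .then(data => {
      renderChart(data)
      updateDashboard() // Warning: the recursive call has no exit condition!
    })
}

updateDashboard() // The infinite loop starts here!

// After the production deploy...
// Thousands of users send endless requests at the same time
// → the server goes down!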

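The Economic Value of Rate Limiting

Cost savings:

Scenario: an e-commerce API (illustrative figures)

Without Rate Limiting:
- API calls: 100 million/month
- AWS cost: $50,000
- DB cost: $30,000
- Total: $80,000/month

With Rate Limiting:
- API calls: 20 million/month (legitimate traffic only)
- AWS cost: $10,000
- DB cost: $6,000
- Total: $16,000/month

Savings: $64,000/month = $768,000/year!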
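The arithmetic behind the savings figure can be checked in a few lines. This is only a worked calculation on the scenario's own illustrative numbers; the variable names are chosen for this sketch.

```python
# Monthly costs in USD, taken from the illustrative scenario above.
without_rate_limiting = {"aws": 50_000, "db": 30_000}
with_rate_limiting = {"aws": 10_000, "db": 6_000}

monthly_before = sum(without_rate_limiting.values())
monthly_after = sum(with_rate_limiting.values())

monthly_savings = monthly_before - monthly_after
annual_savings = monthly_savings * 12
reduction_pct = 100 * monthly_savings / monthly_before

print(f"Monthly savings: ${monthly_savings:,}")
print(f"Annual savings:  ${annual_savings:,}")
print(f"Cost reduction:  {reduction_pct:.0f}%")
```

The 80% cost reduction mirrors the 80% drop in API calls (100 million to 20 million), which assumes cost scales linearly with request volume.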

Rate Limiting Algorithms Compared

1. Token Bucket ⭐ Recommended!

How it works:

Bucket capacity: 100 tokens
Refill rate: 10 tokens/sec

[Token bucket]
┌─────────────┐
│ ○ ○ ○ ○ ○ │ Tokens: 100
│ ○ ○ ○ ○ ○ │
│ ○ ○ ○ ○ ○ │
└─────────────┘

1 request = 1 token consumed
10 tokens are added every second

Example implementation:

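import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens, e.g. 100
        self.tokens = float(capacity)   # start with a full bucket
        self.refill_rate = refill_rate  # tokens per second, e.g. 10
        self.last_refill = time.monotonic()

    def consume(self, tokens=1):
        # Refill in proportion to elapsed time. Tokens are kept as a float:
        # truncating with int() here would discard the refill whenever calls
        # arrive less than 1 / refill_rate seconds apart.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate
        )
        self.last_refill = now

        # Check whether enough tokens are available
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True   # allow the request
        return False      # reject the request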

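Pros:

Handles burst traffic: up to 100 requests can be served at once
Flexible: sustains 10 req/sec on average while allowing short bursts of up to 100 requests
User-friendly: tolerates temporary spikes

Cons:

More complex to implement than a simple counter
Requires synchronized state in a distributed environment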
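The burst-then-steady behavior is easier to verify when the clock is passed in explicitly rather than read from the system. The sketch below is a minimal variant written for this illustration (the class name `ClockedTokenBucket` and the `now` parameter are assumptions, not part of the implementation above).

```python
class ClockedTokenBucket:
    """Token bucket whose caller supplies the current time in seconds."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full
        self.last_refill = now

    def consume(self, now, tokens=1):
        # Add tokens for the time that has passed, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = ClockedTokenBucket(capacity=100, refill_rate=10)

# 150 requests arrive at t=0: only the 100 tokens in the bucket are served.
burst_allowed = sum(bucket.consume(now=0.0) for _ in range(150))

# One second later, 10 tokens have refilled, so 10 more requests pass.
later_allowed = sum(bucket.consume(now=1.0) for _ in range(150))

print(burst_allowed, later_allowed)
```

Injecting the clock keeps the example deterministic, which also makes the limiter straightforward to unit-test.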

2. Leaky Bucket

How it works:

[Requests accumulate in a queue]
 ↓ ↓ ↓
┌─────────────┐
│ REQ REQ REQ │
│ REQ REQ │ Queue size: 100
│ │
└──────┬──────┘
 ↓ Processed at a constant rate (10 req/sec)
 [Processed]

Implementation:

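import time
from collections import deque

class LeakyBucket:
    def __init__(self, capacity, leak_rate, handler):
        self.capacity = capacity
        self.queue = deque()        # deque gives O(1) pops from the left
        self.leak_rate = leak_rate  # requests processed per second
        self.handler = handler      # callable invoked for each request

    def add_request(self, request):
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # queue is full

    def process(self):
        # Drain the queue at a constant rate
        while self.queue:
            request = self.queue.popleft()
            self.handler(request)
            time.sleep(1 / self.leak_rate)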

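Pros:

Constant processing rate: protects downstream systems
Simple to implement

Cons:

Cannot serve bursts: excess requests wait in the queue or are dropped
Requires queue management (memory use)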
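To see what "cannot serve bursts" means in numbers, the following sketch computes when each request of a burst would be handled by a queue that drains at a constant rate. It assumes the whole burst arrives at t=0 and that the first request is processed immediately; the function name is hypothetical.

```python
def leaky_bucket_burst(n_requests, capacity, leak_rate):
    """Return (accepted, dropped, finish_times) for a burst arriving at t=0."""
    accepted = min(n_requests, capacity)   # the queue holds at most `capacity`
    dropped = n_requests - accepted        # the rest are rejected
    # The i-th queued request is processed i / leak_rate seconds after the start.
    finish_times = [i / leak_rate for i in range(accepted)]
    return accepted, dropped, finish_times


accepted, dropped, finish_times = leaky_bucket_burst(
    n_requests=150, capacity=100, leak_rate=10
)
print(accepted, dropped, finish_times[-1])
```

With a queue of 100 and a leak rate of 10 req/sec, the last accepted request waits almost 10 seconds, whereas a token bucket with the same parameters would have served all 100 immediately.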

3. Fixed Window

How it works:

Requests are counted per minute

00:00 - 00:59: 100 requests
01:00 - 01:59: 100 requests
02:00 - 02:59: 100 requests

The counter resets at the start of each window

Implementation:

import time

class FixedWindow:
    def __init__(self, limit, window_size):
        self.limit = limit              # e.g. 100 requests
        self.window_size = window_size  # e.g. 60 seconds
        self.counter = {}               # note: old window keys are never evicted here

    def allow_request(self, user_id):
        current_window = int(time.time() // self.window_size)
        key = f"{user_id}:{current_window}"

        count = self.counter.get(key, 0)

        if count < self.limit:
            self.counter[key] = count + 1
            return True
        return False

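Pros:

Very simple to implement
Memory-efficient: one counter per user per window

Cons:

Boundary problem: up to twice the limit can pass around a window boundary

Around the 00:59 → 01:00 boundary:
00:59:50 ~ 00:59:59: 100 requests
01:00:00 ~ 01:00:09: 100 requests
→ 200 requests in 20 seconds! (the rate limit is bypassed)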
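The boundary problem can be reproduced with a small simulation. The sketch below uses a fixed-window counter with an explicit clock (the class name and the `now` parameter are assumptions made for the demonstration) and sends 100 requests just before a window boundary and 100 just after it.

```python
class FixedWindowCounter:
    """Fixed-window limiter whose caller supplies the current time in seconds."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.counts = {}  # (user_id, window index) -> request count

    def allow(self, user_id, now):
        window = int(now // self.window_size)
        key = (user_id, window)
        count = self.counts.get(key, 0)
        if count < self.limit:
            self.counts[key] = count + 1
            return True
        return False


limiter = FixedWindowCounter(limit=100, window_size=60)

# 100 requests in the last 10 seconds of window 0 (t = 50.0 .. 59.9)
late = [50 + i * 0.1 for i in range(100)]
# 100 requests in the first 10 seconds of window 1 (t = 60.0 .. 69.9)
early = [60 + i * 0.1 for i in range(100)]

allowed = sum(limiter.allow("user-1", t) for t in late + early)
print(allowed)  # all 200 pass within a 20-second span
```

Each window sees exactly 100 requests, so neither counter trips, even though the client sent twice the limit in a third of the window length.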

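4. Sliding Window Log

How it works:

Record the timestamp of every request in the last 60 seconds

Now: 10:05:30

Log:
10:04:31 (59 seconds ago)
10:04:45 (45 seconds ago)
10:05:10 (20 seconds ago)
10:05:29 (1 second ago)

10:04:00 (90 seconds ago - excluded)

Implementation: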

from collections import deque
import time

class SlidingWindowLog:
    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.requests = {}  # {user_id: deque([timestamps])}

    def allow_request(self, user_id):
        now = time.time()
        window_start = now - self.window_size

        if user_id not in self.requests:
            self.requests[user_id] = deque()

        user_log = self.requests[user_id]

        # Drop timestamps that have fallen out of the window
        while user_log and user_log[0] < window_start:
            user_log.popleft()

        if len(user_log) < self.limit:
            user_log.append(now)
            return True
        return False

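Pros:

Accurate limiting: no boundary problem
Fair distribution

Cons:

High memory use: every timestamp is stored
Performance: pruning and counting the log gets slow under heavy traffic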
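Both the accuracy and the memory cost show up in a short simulation. The sketch below is a single-user sliding-log limiter with an explicit clock (names are assumptions for this illustration), fed 100 requests just before the one-minute mark and 100 just after it.

```python
from collections import deque


class SlidingLogLimiter:
    """Sliding-window-log limiter for one user; caller supplies the time."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now):
        window_start = now - self.window_size
        # Discard timestamps that are no longer inside the window.
        while self.log and self.log[0] <= window_start:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


limiter = SlidingLogLimiter(limit=100, window_size=60)

late = [50 + i * 0.1 for i in range(100)]    # t = 50.0 .. 59.9
early = [60 + i * 0.1 for i in range(100)]   # t = 60.0 .. 69.9

allowed = sum(limiter.allow(t) for t in late + early)
stored = len(limiter.log)
print(allowed, stored)
```

Only the first 100 requests pass, because the window always covers the previous 60 seconds regardless of minute boundaries. The price is that the limiter keeps one timestamp per accepted request, up to `limit` entries per user.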

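Algorithm Selection Guide

| Algorithm | Implementation difficulty | Accuracy | Memory use | Burst handling | Recommended use |
|----------------|------------|-------|--------|-----------|----------------------------|
| Token Bucket | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | Yes | **API Gateway (top recommendation)** |
| Leaky Bucket | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | No | Background job processing |
| Fixed Window | ⭐ | ⭐ | ⭐ | Not controlled (2x at window boundaries) | Simple prototypes |
| Sliding Log | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | No (strict limit) | When exact limiting is required |

Recommended for production: Token Bucket + Redis ⭐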

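Implementing Distributed Rate Limiting with Redis

Why Redis?

1. Fast

In-memory data store
Response time: typically under 1 ms on a local network
Throughput: on the order of 100,000 ops/sec per instance

2. Atomicity

Lua scripts prevent race conditions
Safe under concurrent access from multiple servers

3. Expiration (TTL)

Stale data is deleted automatically
No manual cleanup of old keys

Basic Implementation (Node.js + Redis)

Install:

npm install ioredis express

A simple Fixed Window implementation: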

const express = require('express')
const Redis = require('ioredis')

const app = express()
const redis = new Redis({
  host: 'localhost',
  port: 6379
})

async function rateLimit(userId, limit = 100, window = 60) {
  const key = `rate_limit:${userId}:${Math.floor(Date.now() / 1000 / window)}`

  // Increment the counter for the current window
  const current = await redis.incr(key)

  // Set the TTL on the first request of the window
  if (current === 1) {
    await redis.expire(key, window)
  }

  // Enforce the limit
  if (current > limit) {
    const ttl = await redis.ttl(key)
    throw new Error(`Rate limit exceeded. Retry after ${ttl} seconds`)
  }

  return {
    allowed: true,
    remaining: limit - current
  }
}

// Usage example (assumes req.user is set by an authentication middleware)
app.get('/api/data', async (req, res) => {
  try {
    const result = await rateLimit(req.user.id, 100, 60)

    res.set('X-RateLimit-Limit', '100')
    res.set('X-RateLimit-Remaining', String(result.remaining))

    // Actual API logic
    res.json({ data: 'success' })
  } catch (error) {
    res.status(429).json({ error: error.message })
  }
})

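The problem:

// INCR and EXPIRE are two separate commands, so together they are not atomic.
// INCR itself is atomic: exactly one caller sees current === 1.
// But if that caller fails before EXPIRE runs, nobody else sets the TTL.

Server 1: current = redis.incr(key) // 1 → responsible for setting the TTL
Server 1: (crash or connection error before redis.expire(key, 60) runs)

Server 2: current = redis.incr(key) // 2
Server 2: if (current === 1) redis.expire(key, 60) // skipped

// Result: the TTL is never set!
// → the key stays in Redis forever (memory leak)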

Production-Grade Implementation: Lua Script

Guaranteeing atomicity with Lua:

-- rate_limit.lua
-- Token bucket algorithm

local key = KEYS[1]
local capacity = tonumber(ARGV[1])  -- maximum tokens, e.g. 100
local rate = tonumber(ARGV[2])      -- refill rate, e.g. 10 tokens/sec
local requested = tonumber(ARGV[3]) -- tokens requested
local now = tonumber(ARGV[4])       -- caller's clock, in seconds

-- Load the current state from Redis
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

-- Initialize a new bucket
if tokens == nil then
  tokens = capacity
  last_refill = now
end

-- Elapsed time, clamped so clock skew between servers never removes tokens
local elapsed = math.max(0, now - last_refill)

-- Refill whole tokens
local refill = math.floor(elapsed * rate)
tokens = math.min(capacity, tokens + refill)

-- Advance last_refill only by the time those tokens account for,
-- so the fractional remainder carries over to the next call
if refill > 0 then
  last_refill = last_refill + refill / rate
end

-- Handle the request
local allowed = 0
local remaining = tokens

if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
  remaining = tokens
end

-- Update Redis (HSET accepts multiple fields since Redis 4.0; HMSET is deprecated)
redis.call('HSET', key, 'tokens', tokens, 'last_refill', last_refill)
redis.call('EXPIRE', key, 3600) -- 1-hour TTL

-- Return the result
return {allowed, remaining, capacity}
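Because a Lua script is awkward to unit-test, it can help to keep a pure-Python mirror of the bucket arithmetic and test that instead. The sketch below is such a mirror under two stated assumptions: negative elapsed time is clamped to zero, and `last_refill` advances only by the time that whole tokens account for, so fractional progress carries over. The function names are hypothetical.

```python
import math


def token_bucket_step(state, capacity, rate, requested, now):
    """One rate-limit check. `state` is (tokens, last_refill) or None for a new key.

    Returns (allowed, remaining, new_state).
    """
    if state is None:
        tokens, last_refill = capacity, now
    else:
        tokens, last_refill = state

    elapsed = max(0.0, now - last_refill)   # ignore clocks that run backwards
    refill = math.floor(elapsed * rate)     # whole tokens only
    tokens = min(capacity, tokens + refill)
    if refill > 0:
        last_refill += refill / rate        # keep the fractional remainder

    allowed = tokens >= requested
    if allowed:
        tokens -= requested
    return allowed, tokens, (tokens, last_refill)


def retry_after(requested, remaining, rate):
    """Seconds to wait until `requested` tokens are available, rounded up."""
    return math.ceil((requested - remaining) / rate)


state = None
results = []
for t in [0.0, 0.0, 0.25, 0.75, 1.0]:
    allowed, remaining, state = token_bucket_step(
        state, capacity=2, rate=2, requested=1, now=t
    )
    results.append(allowed)
print(results)
```

With a capacity of 2 and a rate of 2 tokens/sec, the request at t=0.25 is rejected, the one at t=0.75 consumes the token earned at t=0.5, and the one at t=1.0 still succeeds because the leftover 0.25 seconds was not discarded.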

Using it from Node.js:

const fs = require('fs')
const express = require('express')
const Redis = require('ioredis')

const app = express()
const redis = new Redis()

// Load the Lua script
const rateLimitScript = fs.readFileSync('./rate_limit.lua', 'utf8')

async function tokenBucketRateLimit(
  userId,
  capacity = 100, // at most 100 tokens
  rate = 10,      // refill 10 tokens per second
  requested = 1   // tokens requested
) {
  const key = `rate_limit:${userId}`
  const now = Date.now() / 1000 // in seconds

  const result = await redis.eval(
    rateLimitScript,
    1, // number of keys
    key,
    capacity,
    rate,
    requested,
    now
  )

  const [allowed, remaining, limit] = result

  return {
    allowed: allowed === 1,
    remaining: remaining,
    limit: limit,
    // Seconds until enough tokens have refilled
    retryAfter: allowed === 0 ? Math.ceil((requested - remaining) / rate) : null
  }
}

// Express middleware
function rateLimitMiddleware(options = {}) {
  const {
    capacity = 100,
    rate = 10,
    getUserId = (req) => req.user?.id || req.ip
  } = options

  return async (req, res, next) => {
    const userId = getUserId(req)

    try {
      const result = await tokenBucketRateLimit(userId, capacity, rate, 1)

      // Set the rate limit headers
      res.set('X-RateLimit-Limit', String(result.limit))
      res.set('X-RateLimit-Remaining', String(result.remaining))

      if (!result.allowed) {
        res.set('Retry-After', String(result.retryAfter))
        return res.status(429).json({
          error: 'Too Many Requests',
          retryAfter: result.retryAfter
        })
      }

      next()
    } catch (error) {
      console.error('Rate limit error:', error)
      // Let the request through when the limiter itself fails (fail-open)
      next()
    }
  }
}

// Usage
app.use('/api', rateLimitMiddleware({
  capacity: 1000, // burst: 1,000 requests
  rate: 100       // sustained: 100 req/sec
}))

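Java + Spring Boot Implementation

build.gradle:

dependencies {
    // Spring Data Redis with the Lettuce client
    implementation 'org.springframework.boot:spring-boot-starter-data-redis'
    implementation 'io.lettuce:lettuce-core'
    // Not used by the code below; only needed if you adopt Bucket4j instead of the Lua script
    implementation 'com.github.vladimir-bukhtoyarov:bucket4j-core:7.6.0'
    implementation 'com.github.vladimir-bukhtoyarov:bucket4j-redis:7.6.0'
    // Note: the @Data and @AllArgsConstructor annotations used below also require Lombok
}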

RateLimitService.java:

import java.util.Collections;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Service;

@Service
public class RateLimitService {

    @Autowired
    private StringRedisTemplate redisTemplate;

    private static final String RATE_LIMIT_SCRIPT =
        "local key = KEYS[1]\n" +
        "local capacity = tonumber(ARGV[1])\n" +
        "local rate = tonumber(ARGV[2])\n" +
        "local requested = tonumber(ARGV[3])\n" +
        "local now = tonumber(ARGV[4])\n" +
        "\n" +
        "local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')\n" +
        "local tokens = tonumber(bucket[1]) or capacity\n" +
        "local last_refill = tonumber(bucket[2]) or now\n" +
        "\n" +
        "local elapsed = math.max(0, now - last_refill)\n" +
        "local refill = math.floor(elapsed * rate)\n" +
        "tokens = math.min(capacity, tokens + refill)\n" +
        "\n" +
        "if refill > 0 then\n" +
        "  last_refill = last_refill + refill / rate\n" +
        "end\n" +
        "\n" +
        "local allowed = 0\n" +
        "if tokens >= requested then\n" +
        "  tokens = tokens - requested\n" +
        "  allowed = 1\n" +
        "end\n" +
        "\n" +
        "redis.call('HSET', key, 'tokens', tokens, 'last_refill', last_refill)\n" +
        "redis.call('EXPIRE', key, 3600)\n" +
        "\n" +
        "return {allowed, tokens, capacity}\n";

    // Built once so the script's SHA1 is computed a single time, not per request
    @SuppressWarnings("rawtypes")
    private static final DefaultRedisScript<List> SCRIPT =
        new DefaultRedisScript<>(RATE_LIMIT_SCRIPT, List.class);

    @SuppressWarnings("unchecked")
    public RateLimitResult checkRateLimit(
        String userId,
        int capacity,
        double rate,
        int requested
    ) {
        String key = "rate_limit:" + userId;
        double now = System.currentTimeMillis() / 1000.0;

        List<Long> result = redisTemplate.execute(
            SCRIPT,
            Collections.singletonList(key),
            String.valueOf(capacity),
            String.valueOf(rate),
            String.valueOf(requested),
            String.valueOf(now)
        );

        boolean allowed = result.get(0) == 1L;
        int remaining = result.get(1).intValue();
        int limit = result.get(2).intValue();

        return new RateLimitResult(allowed, remaining, limit);
    }
}

// RateLimitResult.java (separate file; requires Lombok)
import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class RateLimitResult {
    private boolean allowed;
    private int remaining;
    private int limit;
}

RateLimitInterceptor.java:

// Servlet imports are jakarta.servlet.http.* on Spring Boot 3
// and javax.servlet.http.* on Spring Boot 2.
@Component
public class RateLimitInterceptor implements HandlerInterceptor {

    @Autowired
    private RateLimitService rateLimitService;

    @Override
    public boolean preHandle(
        HttpServletRequest request,
        HttpServletResponse response,
        Object handler
    ) throws Exception {
        String userId = getUserId(request);

        RateLimitResult result = rateLimitService.checkRateLimit(
            userId,
            1000, // capacity
            100,  // rate (tokens/sec)
            1     // requested tokens
        );

        response.setHeader("X-RateLimit-Limit", String.valueOf(result.getLimit()));
        response.setHeader("X-RateLimit-Remaining", String.valueOf(result.getRemaining()));

        if (!result.isAllowed()) {
            response.setStatus(429);
            // At 100 tokens/sec the next token arrives within 10 ms;
            // Retry-After takes whole seconds, so round up to 1
            response.setHeader("Retry-After", "1");
            response.setContentType("application/json");
            response.getWriter().write("{\"error\":\"Too Many Requests\"}");
            return false;
        }

        return true;
    }

    private String getUserId(HttpServletRequest request) {
        // Extract the userId from the JWT, or fall back to the IP address
        String token = request.getHeader("Authorization");
        if (token != null) {
            // JWT parsing logic (extractUserIdFromToken is a placeholder you must implement)
            return extractUserIdFromToken(token);
        }
        return request.getRemoteAddr();
    }
}

Python + FastAPI Implementation

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import redis.asyncio as redis
import time
import math

app = FastAPI()

# Redis connection
redis_client = redis.from_url("redis://localhost:6379")

# Lua script
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now

local elapsed = math.max(0, now - last_refill)
local refill = math.floor(elapsed * rate)
tokens = math.min(capacity, tokens + refill)

if refill > 0 then
  last_refill = last_refill + refill / rate
end

local allowed = 0
if tokens >= requested then
  tokens = tokens - requested
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'last_refill', last_refill)
redis.call('EXPIRE', key, 3600)

return {allowed, tokens, capacity}
"""

async def check_rate_limit(
    user_id: str,
    capacity: int = 100,
    rate: float = 10,
    requested: int = 1
):
    key = f"rate_limit:{user_id}"
    now = time.time()

    result = await redis_client.eval(
        RATE_LIMIT_SCRIPT,
        1,
        key,
        capacity,
        rate,
        requested,
        now
    )

    allowed = result[0] == 1
    remaining = int(result[1])
    limit = int(result[2])

    return {
        "allowed": allowed,
        "remaining": remaining,
        "limit": limit,
        "retry_after": math.ceil((requested - remaining) / rate) if not allowed else None
    }

# Middleware
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Client IP; swap in an authenticated user ID where one is available
    user_id = request.client.host if request.client else "unknown"

    result = await check_rate_limit(user_id, capacity=1000, rate=100)

    headers = {
        "X-RateLimit-Limit": str(result["limit"]),
        "X-RateLimit-Remaining": str(result["remaining"]),
    }

    # Reject before the handler runs, so blocked requests do no work
    if not result["allowed"]:
        headers["Retry-After"] = str(result["retry_after"])
        return JSONResponse(
            status_code=429,
            content={"error": "Too Many Requests"},
            headers=headers
        )

    response = await call_next(request)
    response.headers.update(headers)
    return response

@app.get("/api/data")
async def get_data():
    return {"message": "Success"}

Enterprise-Grade Advanced Features

1. Tiered Rate Limits

Apply different limits depending on the user's plan:

const RATE_LIMITS = {
  free: { capacity: 100, rate: 10 },          // 100 burst, 10/sec
  pro: { capacity: 1000, rate: 100 },         // 1000 burst, 100/sec
  enterprise: { capacity: 10000, rate: 1000 } // 10000 burst, 1000/sec
}

async function getUserRateLimit(userId) {
  const user = await db.users.findOne({ id: userId })
  const tier = user?.subscription || 'free'
  // Fall back to the free tier for unknown users or plan names
  return RATE_LIMITS[tier] || RATE_LIMITS.free
}

app.use('/api', async (req, res, next) => {
  const userId = req.user.id
  const limits = await getUserRateLimit(userId)

  const result = await tokenBucketRateLimit(
    userId,
    limits.capacity,
    limits.rate,
    1
  )

  if (!result.allowed) {
    // Suggest an upgrade
    return res.status(429).json({
      error: 'Rate limit exceeded',
      message: 'Upgrade to Pro for higher limits',
      upgradeUrl: '/pricing'
    })
  }

  next()
})
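The tier lookup is worth isolating as a pure function, because a missing user record or an unrecognized plan name should degrade to the most restrictive tier rather than raise an error. A minimal sketch, with the table values copied from above and the function name chosen for this illustration:

```python
RATE_LIMITS = {
    "free": {"capacity": 100, "rate": 10},
    "pro": {"capacity": 1000, "rate": 100},
    "enterprise": {"capacity": 10000, "rate": 1000},
}


def limits_for(subscription):
    """Return the limits for a plan name, defaulting to the free tier."""
    # `subscription` may be None (no plan on record) or an unknown string.
    return RATE_LIMITS.get(subscription or "free", RATE_LIMITS["free"])


print(limits_for("pro"))
print(limits_for(None))
print(limits_for("legacy-gold"))
```

Defaulting to the free tier is a design choice: it keeps the limiter working when plan data is stale, at the cost of temporarily under-serving a paying customer whose plan name is not in the table.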

2. Dynamic Rate Limits (Based on Server Load)

const os = require('os')

async function getDynamicRateLimit() {
  // cpu.times counters are cumulative since boot, so they cannot show the
  // current load. Use the 1-minute load average per core instead
  // (os.loadavg() always returns zeros on Windows).
  const cpuUsage = Math.min(1, os.loadavg()[0] / os.cpus().length)

  const memUsage = 1 - os.freemem() / os.totalmem()

  // Tighten the limits when the server is under heavy load
  if (cpuUsage > 0.8 || memUsage > 0.8) {
    return { capacity: 50, rate: 5 }   // strict
  } else if (cpuUsage > 0.5 || memUsage > 0.5) {
    return { capacity: 100, rate: 10 } // normal
  } else {
    return { capacity: 200, rate: 20 } // lenient
  }
}
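Separating the threshold logic from the measurement makes the policy testable without loading a real server. The sketch below takes CPU and memory utilization as fractions between 0 and 1 and returns the same three tiers; the function name is an assumption for this illustration.

```python
def dynamic_limit(cpu_usage, mem_usage):
    """Pick bucket parameters from the busier of CPU and memory (0.0 to 1.0)."""
    load = max(cpu_usage, mem_usage)  # either resource can trigger tightening
    if load > 0.8:
        return {"capacity": 50, "rate": 5}     # strict
    if load > 0.5:
        return {"capacity": 100, "rate": 10}   # normal
    return {"capacity": 200, "rate": 20}       # lenient


print(dynamic_limit(0.9, 0.1))
print(dynamic_limit(0.3, 0.6))
print(dynamic_limit(0.2, 0.2))
```

Taking the maximum of the two readings is equivalent to the "CPU or memory above the threshold" conditions, since the tiers are checked from strictest to most lenient.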

3. Per-Endpoint Limits

const ENDPOINT_LIMITS = {
  '/api/search': { capacity: 10, rate: 1 },   // expensive search
  '/api/user': { capacity: 100, rate: 10 },   // ordinary reads
  '/api/upload': { capacity: 5, rate: 0.5 },  // file uploads
  '/api/stats': { capacity: 1000, rate: 100 } // lightweight stats
}

app.use('/api', async (req, res, next) => {
  // Inside middleware mounted at '/api', req.path omits the mount prefix,
  // so rebuild the full path to match the keys above
  const endpoint = req.baseUrl + req.path
  const limits = ENDPOINT_LIMITS[endpoint] || { capacity: 100, rate: 10 }

  const result = await tokenBucketRateLimit(
    `${req.user.id}:${endpoint}`,
    limits.capacity,
    limits.rate,
    1
  )

  if (!result.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded for ' + endpoint
    })
  }

  next()
})

4. Combined IP-Based + User-Based Limits

async function dualRateLimit(req) {
  const ip = req.ip
  const userId = req.user?.id

  // 1. IP limit (DDoS defense)
  const ipLimit = await tokenBucketRateLimit(
    `ip:${ip}`,
    500, // 500 burst per IP
    50   // 50/sec
  )

  if (!ipLimit.allowed) {
    throw new Error('IP rate limit exceeded')
  }

  // 2. User limit (API abuse defense)
  if (userId) {
    const userLimit = await tokenBucketRateLimit(
      `user:${userId}`,
      1000, // 1000 burst per user
      100   // 100/sec
    )

    if (!userLimit.allowed) {
      throw new Error('User rate limit exceeded')
    }
  }

  return { allowed: true }
}
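The interaction between the two layers is easiest to see with in-memory buckets and tiny limits. The sketch below is a simplified stand-in for the Redis-backed version (the class names, the `now` parameter, and the return strings are assumptions for this illustration): the IP bucket is checked first, then the user bucket.

```python
class Bucket:
    """Minimal token bucket; the caller supplies the time in seconds."""

    def __init__(self, capacity, rate, now):
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = float(capacity), now

    def consume(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class DualLimiter:
    def __init__(self, ip_limit=(500, 50), user_limit=(1000, 100)):
        self.ip_limit = ip_limit        # (capacity, rate) per IP
        self.user_limit = user_limit    # (capacity, rate) per user
        self.buckets = {}

    def _bucket(self, key, capacity, rate, now):
        if key not in self.buckets:
            self.buckets[key] = Bucket(capacity, rate, now)
        return self.buckets[key]

    def check(self, ip, user_id, now):
        # Layer 1: per-IP limit, applied to every request.
        if not self._bucket(f"ip:{ip}", *self.ip_limit, now).consume(now):
            return "ip_limited"
        # Layer 2: per-user limit, applied only to authenticated requests.
        if user_id is not None:
            if not self._bucket(f"user:{user_id}", *self.user_limit, now).consume(now):
                return "user_limited"
        return "allowed"


limiter = DualLimiter(ip_limit=(3, 1), user_limit=(2, 1))
outcomes = [limiter.check("1.2.3.4", "alice", 0.0) for _ in range(3)]
outcomes.append(limiter.check("1.2.3.4", "bob", 0.0))
print(outcomes)
```

Note that a request rejected by the user layer has already spent an IP token. In this run the third request from alice uses the last IP token, so bob, who shares the address, is rejected at the IP layer. Whether that is acceptable depends on how many users typically share one IP.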

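Production Deployment Strategy

1. High-Availability Redis (Replication + Sentinel)

A replicated setup with Sentinel failover for high availability. Note that this is not Redis Cluster, which shards keys across nodes:

# docker-compose.yml
version: '3'
services:
  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    ports:
      - "6379:6379"
    volumes:
      - redis-master-data:/data

  redis-replica-1:
    image: redis:7-alpine
    # --replicaof replaces the deprecated --slaveof (Redis 5+)
    command: redis-server --replicaof redis-master 6379 --appendonly yes
    volumes:
      - redis-replica-1-data:/data
    depends_on:
      - redis-master

  redis-replica-2:
    image: redis:7-alpine
    command: redis-server --replicaof redis-master 6379 --appendonly yes
    volumes:
      - redis-replica-2-data:/data
    depends_on:
      - redis-master

  redis-sentinel-1:
    image: redis:7-alpine
    # Assumes a writable /etc/redis/sentinel.conf is provided to the container.
    # A quorum of 2 needs at least two Sentinels to agree; run three in production.
    # Monitoring by hostname also requires "sentinel resolve-hostnames yes" (Redis 6.2+).
    command: >
      redis-sentinel /etc/redis/sentinel.conf
      --sentinel monitor mymaster redis-master 6379 2
      --sentinel down-after-milliseconds mymaster 5000
      --sentinel parallel-syncs mymaster 1
      --sentinel failover-timeout mymaster 10000
    depends_on:
      - redis-master

volumes:
  redis-master-data:
  redis-replica-1-data:
  redis-replica-2-data: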

2. Monitoring Dashboard

Prometheus + Grafana:

const prometheus = require('prom-client')

// Metric definitions
const rateLimitCounter = new prometheus.Counter({
  name: 'rate_limit_requests_total',
  help: 'Total rate limit checks',
  // result: allowed, blocked
  // Caution: a per-user label creates one time series per user; with many
  // users, prefer a coarser label such as the plan tier
  labelNames: ['user', 'result']
})

const rateLimitHistogram = new prometheus.Histogram({
  name: 'rate_limit_duration_seconds',
  help: 'Rate limit check duration',
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1]
})

// Usage
async function tokenBucketRateLimitWithMetrics(userId, capacity, rate, requested) {
  // startTimer() returns a function that records the elapsed seconds when called
  const stopTimer = rateLimitHistogram.startTimer()

  const result = await tokenBucketRateLimit(userId, capacity, rate, requested)

  stopTimer()

  rateLimitCounter.inc({
    user: userId,
    result: result.allowed ? 'allowed' : 'blocked'
  })

  return result
}

// Prometheus endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType)
  res.end(await prometheus.register.metrics())
})

Grafana queries:

# Rate limit block rate (%)
# sum() is required: without it, the division only matches series with
# identical labels, so blocked is divided by blocked
sum(rate(rate_limit_requests_total{result="blocked"}[5m]))
/ sum(rate(rate_limit_requests_total[5m])) * 100

# Average check duration
sum(rate(rate_limit_duration_seconds_sum[5m]))
/ sum(rate(rate_limit_duration_seconds_count[5m]))

# Top 10 users by blocked requests over the last hour
topk(10, sum by (user) (increase(rate_limit_requests_total{result="blocked"}[1h])))

3. Alerting

# prometheus-alerts.yml
groups:
  - name: rate_limiting
    interval: 30s
    rules:
      # The rate limit block rate is high
      - alert: HighRateLimitBlockRate
        expr: |
          sum(rate(rate_limit_requests_total{result="blocked"}[5m]))
          / sum(rate(rate_limit_requests_total[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High rate limit block rate"
          description: "{{ $value | humanizePercentage }} of requests are being blocked"

      # Redis connection failure (redis_up is exposed by redis_exporter)
      - alert: RedisConnectionFailure
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis is down"
          description: "Rate limiting may not work properly"

      # Rate limit checks are getting slower
      - alert: SlowRateLimitCheck
        expr: |
          sum(rate(rate_limit_duration_seconds_sum[5m]))
          / sum(rate(rate_limit_duration_seconds_count[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rate limit checks are slow"
          description: "Average duration: {{ $value }}s"

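Troubleshooting Guide

Problem 1: Redis Out of Memory

Symptom:

Error: OOM command not allowed when used memory > 'maxmemory'

Fix:

# redis.conf (comments must be on their own line)
maxmemory 2gb
# Evict least-recently-used keys across the whole keyspace
maxmemory-policy allkeys-lru

# Or, to evict only keys that have a TTL, use this line instead:
# maxmemory-policy volatile-lru

Application-level fix:

-- In the Lua script: use a shorter TTL
-- Any TTL of at least capacity / rate seconds is safe, because a bucket
-- idle for that long would have refilled completely anyway
redis.call('EXPIRE', key, 300) -- 5 minutes (down from 1 hour)

-- Or pass a per-user TTL from the application as an extra argument,
-- e.g. 3600 for premium users and 300 for everyone else
redis.call('EXPIRE', key, tonumber(ARGV[5]))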

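Problem 2: Race Condition (Without a Lua Script)

Symptom:

Keys pile up because their TTL was never set

Fix:

// Wrong
const current = await redis.incr(key)
if (current === 1) {
  await redis.expire(key, 60) // Not atomic with INCR: never runs if the process fails here
}

// Right, option 1: a Lua script
const result = await redis.eval(luaScript,...)

// Right, option 2: send INCR and EXPIRE in one MULTI/EXEC transaction.
// For fixed-window keys that already contain the window number,
// refreshing the TTL on every request is harmless.
// (SETEX followed by INCR would not work: SETEX resets the counter each time.)
const [[, count]] = await redis.multi().incr(key).expire(key, 60).exec()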

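Problem 3: Inaccurate Counting in a Distributed Environment

Symptom:

Different servers use different Redis instances
Counts are split across instances, so the limit is not enforced correctly

Fix:

// Wrong sharding: each app server talks to its own Redis
const serverId = process.env.SERVER_ID
const redis = new Redis({ host: `redis-${serverId}` })

// Right sharding: consistent hashing on the user ID
// (constructor and getNode() follow the consistent-hashing package's README;
// verify them against the version you install)
const ConsistentHashing = require('consistent-hashing')
const ring = new ConsistentHashing(['redis-1', 'redis-2', 'redis-3'])

const redisClients = {
  'redis-1': new Redis({ host: 'redis-1' }),
  'redis-2': new Redis({ host: 'redis-2' }),
  'redis-3': new Redis({ host: 'redis-3' })
}

function getRedisForUser(userId) {
  const server = ring.getNode(userId) // the same userId always maps to the same server
  return redisClients[server]
}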
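How consistent hashing keeps a user pinned to one Redis node can be shown with a small self-contained ring. This is a sketch using only the standard library, not the API of any particular package; the class name, the MD5 hash, and the 100 virtual nodes per server are assumptions for this illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Place several points per node on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["redis-1", "redis-2", "redis-3"])
users = [f"user-{i}" for i in range(200)]
before = {u: ring.get(u) for u in users}

ring.remove("redis-3")
moved = [u for u in users if ring.get(u) != before[u]]

# Only users that lived on the removed node change servers.
print(all(before[u] == "redis-3" for u in moved))
```

When a node leaves, only the keys it owned move, so every other user's bucket state stays where it was. With modulo-based sharding (`hash % N`), most users would move and their limits would effectively reset.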

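Practical Checklists

### Before Implementation

□ Choose an algorithm (Token Bucket recommended)
□ Install and configure Redis
□ Write the Lua script
□ Decide the rate limit values
  - Capacity (burst allowance)
  - Rate (sustained throughput)
  - TTL (key expiration)
□ Decide how to identify users (IP, JWT, API key)

### Before Deployment

□ Set up high-availability Redis (replication + Sentinel)
□ Set up monitoring (Prometheus + Grafana)
□ Set up alerts (AlertManager)
□ Complete load testing
□ Confirm the fail-open policy (when Redis is down)
□ Write user-friendly error messages
□ Add the HTTP headers
  - X-RateLimit-Limit
  - X-RateLimit-Remaining
  - Retry-After

### In Operation

□ Monitor the daily block rate
□ Check Redis memory usage
□ Track response times
□ Collect user feedback
□ Tune the rate limit values
□ Analyze logs (to detect malicious users)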

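Conclusion: A Complete Rate Limiting System

In production, rate limiting is a necessity, not an option. To summarize what this guide covered:

Core Principles

Token Bucket + Redis: a widely used, well-tested combination
Lua scripts: atomic updates that prevent race conditions
Tiered limits: flexibility that matches the business model
Monitoring: real-time tracking and alerts

Business Value

Rate limiting implementation cost: $1,000 (development + infrastructure)
Annual savings: $768,000 (API abuse prevented, using the illustrative scenario above)

ROI: 76,700%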
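The ROI figure follows from the usual definition, net gain divided by cost. A quick check using the two illustrative numbers above (the variable names are chosen for this sketch):

```python
implementation_cost = 1_000   # USD, development + infrastructure
annual_savings = 768_000      # USD, from the e-commerce scenario

# ROI = (gain - cost) / cost, expressed as a percentage
roi_pct = (annual_savings - implementation_cost) / implementation_cost * 100

print(f"ROI: {roi_pct:,.0f}%")
```

The result is only as reliable as its inputs: it counts a single year of savings and treats the $1,000 implementation cost as the entire cost.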

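Final Advice

"It doesn't have to be perfect from day one."

Week 1: Start with a simple Fixed Window
Week 2: Upgrade to Token Bucket
Week 3: Set up high-availability Redis
Week 4: Add monitoring and alerts

Improve step by step and settle it into production.

Once in place, rate limiting keeps protecting your service with only periodic tuning. Now go protect your API from abusive users and DDoS attacks!

Next post preview: "A Complete Guide to Solving the GraphQL N+1 Query Problem: 100x Faster with DataLoader"

Questions and experiences are welcome in the comments! Did you run into any difficulties while implementing rate limiting?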
