๋น„์ง€๋„ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜

๋น„์ง€๋„ ํ•™์Šต

๋น„์ง€๋„ ํ•™์Šต์€ ๋ ˆ์ด๋ธ”์ด ์—†๋Š” ์‘๋‹ต์œผ๋กœ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๊ธฐ๊ณ„ ํ•™์Šต์˜ ํ•œ ์œ ํ˜•์ž…๋‹ˆ๋‹ค. ๋ชฉํ‘œ๋Š” ๋ฐ์ดํ„ฐ ๋‚ด์—์„œ ํŒจํ„ด, ๊ตฌ์กฐ ๋˜๋Š” ๊ด€๊ณ„๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ ˆ์ด๋ธ”์ด ์žˆ๋Š” ์˜ˆ์ œ์—์„œ ๋ชจ๋ธ์ด ํ•™์Šตํ•˜๋Š” ๊ฐ๋… ํ•™์Šต๊ณผ ๋‹ฌ๋ฆฌ, ๋น„์ง€๋„ ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ ˆ์ด๋ธ”์ด ์—†๋Š” ๋ฐ์ดํ„ฐ๋กœ ์ž‘์—…ํ•ฉ๋‹ˆ๋‹ค. ๋น„์ง€๋„ ํ•™์Šต์€ ํด๋Ÿฌ์Šคํ„ฐ๋ง, ์ฐจ์› ์ถ•์†Œ ๋ฐ ์ด์ƒ ํƒ์ง€์™€ ๊ฐ™์€ ์ž‘์—…์— ์ž์ฃผ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ๋‚ด์˜ ์ˆจ๊ฒจ์ง„ ํŒจํ„ด์„ ๋ฐœ๊ฒฌํ•˜๊ฑฐ๋‚˜ ์œ ์‚ฌํ•œ ํ•ญ๋ชฉ์„ ๊ทธ๋ฃนํ™”ํ•˜๊ฑฐ๋‚˜ ๋ฐ์ดํ„ฐ์˜ ๋ณธ์งˆ์ ์ธ ํŠน์„ฑ์„ ์œ ์ง€ํ•˜๋ฉด์„œ ๋ณต์žก์„ฑ์„ ์ค„์ด๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

K-ํ‰๊ท  ํด๋Ÿฌ์Šคํ„ฐ๋ง

K-ํ‰๊ท ์€ ๊ฐ ์ ์„ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํด๋Ÿฌ์Šคํ„ฐ ํ‰๊ท ์— ํ• ๋‹นํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ K๊ฐœ์˜ ํด๋Ÿฌ์Šคํ„ฐ๋กœ ๋ถ„ํ• ํ•˜๋Š” ์ค‘์‹ฌ ๊ธฐ๋ฐ˜ ํด๋Ÿฌ์Šคํ„ฐ๋ง ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค:

  1. ์ดˆ๊ธฐํ™”: K๊ฐœ์˜ ์ดˆ๊ธฐ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ(์„ผํŠธ๋กœ์ด๋“œ)์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ๋ณดํ†ต ๋ฌด์ž‘์œ„๋กœ ๋˜๋Š” k-means++์™€ ๊ฐ™์€ ๋” ์Šค๋งˆํŠธํ•œ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
  2. ํ• ๋‹น: ๊ฑฐ๋ฆฌ ๋ฉ”ํŠธ๋ฆญ(์˜ˆ: ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ)์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์„ผํŠธ๋กœ์ด๋“œ์— ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค.
  3. ์—…๋ฐ์ดํŠธ: ๊ฐ ํด๋Ÿฌ์Šคํ„ฐ์— ํ• ๋‹น๋œ ๋ชจ๋“  ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์˜ ํ‰๊ท ์„ ์ทจํ•˜์—ฌ ์„ผํŠธ๋กœ์ด๋“œ๋ฅผ ์žฌ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
  4. ๋ฐ˜๋ณต: ํด๋Ÿฌ์Šคํ„ฐ ํ• ๋‹น์ด ์•ˆ์ •ํ™”๋  ๋•Œ๊นŒ์ง€(์„ผํŠธ๋กœ์ด๋“œ๊ฐ€ ๋” ์ด์ƒ ํฌ๊ฒŒ ์ด๋™ํ•˜์ง€ ์•Š์Œ) 2-3๋‹จ๊ณ„๋ฅผ ๋ฐ˜๋ณตํ•ฉ๋‹ˆ๋‹ค.
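
The four steps above can be sketched directly in NumPy. This is a minimal illustration rather than a production implementation: the function name `kmeans`, its parameters, and the two synthetic blobs are assumptions made for the example, and initialization is plain random sampling (not k-means++).

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Plain Lloyd's algorithm with random initialization (illustrative only)."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct data points as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assignment: index of the nearest centroid (Euclidean) for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: mean of the points assigned to each cluster
        #    (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Repeat until the centroids stop moving
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return labels, centroids

# Two well-separated synthetic blobs (hypothetical data)
demo_rng = np.random.default_rng(1)
blob_a = demo_rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
blob_b = demo_rng.normal(loc=[10, 10], scale=0.5, size=(100, 2))
X_demo = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(X_demo, k=2)
print("Cluster sizes:", np.bincount(labels, minlength=2))
print("Centroids:\n", centroids)
```

In practice `sklearn.cluster.KMeans` is the better choice, since it adds k-means++ initialization and multiple restarts (`n_init`).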

> [!TIP]
> *Use cases in cybersecurity:* K-Means is used for intrusion detection by clustering network events. For example, researchers applied K-Means to the KDD Cup 99 intrusion dataset and found that it effectively partitioned traffic into normal and attack clusters. In practice, security analysts might cluster log entries or user behavior data to find groups of similar activity. Points that do not belong to any well-formed cluster may indicate anomalies (e.g., a new malware variant forming its own small cluster). K-Means can also help with malware family classification by grouping binaries based on behavior profiles or feature vectors.

#### Selection of K

The number of clusters (K) is a hyperparameter that must be defined before running the algorithm. Techniques such as the Elbow Method or the Silhouette Score can help determine an appropriate value for K by evaluating clustering performance:

- **Elbow Method**: Plot the sum of squared distances from each point to its assigned cluster centroid as a function of K. Look for the "elbow" point where the rate of decrease changes sharply; it indicates a suitable number of clusters.
- **Silhouette Score**: Compute the silhouette score for different values of K. A higher silhouette score indicates better-defined clusters.
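
Both heuristics can be computed in one loop with scikit-learn. The sketch below assumes three synthetic, well-separated blobs (`X_k` is made up for the illustration); `inertia_` is the sum of squared distances used for the elbow curve, and `silhouette_score` comes from `sklearn.metrics`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs
rng_k = np.random.RandomState(0)
X_k = np.vstack([
    rng_k.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng_k.normal(loc=[5, 5], scale=0.5, size=(100, 2)),
    rng_k.normal(loc=[0, 5], scale=0.5, size=(100, 2)),
])

inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_k)
    inertias[k] = km.inertia_  # sum of squared distances (elbow curve)
    silhouettes[k] = silhouette_score(X_k, km.labels_)
    print(f"K={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("K with the highest silhouette score:", best_k)
```

On real data the elbow is often less pronounced than on clean blobs, so it helps to look at both curves rather than trusting a single number.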

#### Assumptions and Limitations

K-Means assumes that **clusters are spherical and equally sized**, which may not hold for all datasets. It is sensitive to the initial placement of the centroids and can converge to a local minimum. K-Means is also a poor fit for clusters of varying density or non-globular shape, and for features on different scales. Preprocessing such as normalization or standardization may be needed so that all features contribute equally to the distance calculation.

<details>
<summary>Example -- Clustering Network Events
</summary>

Below we simulate network traffic data and cluster it with K-Means. Suppose we have events with features such as connection duration and byte count. We create 3 clusters of "normal" traffic and 1 small cluster representing an attack pattern, then run K-Means to see whether it separates them.
```python
import numpy as np
from sklearn.cluster import KMeans

# Simulate synthetic network traffic data (e.g., [duration, bytes]).
# Three normal clusters and one small attack cluster.
rng = np.random.RandomState(42)
normal1 = rng.normal(loc=[50, 500], scale=[10, 100], size=(500, 2))   # Cluster 1
normal2 = rng.normal(loc=[60, 1500], scale=[8, 200], size=(500, 2))   # Cluster 2
normal3 = rng.normal(loc=[70, 3000], scale=[5, 300], size=(500, 2))   # Cluster 3
attack = rng.normal(loc=[200, 800], scale=[5, 50], size=(50, 2))      # Small attack cluster

X = np.vstack([normal1, normal2, normal3, attack])

# Run K-Means clustering into 4 clusters (we expect it to find the 4 groups)
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)

# Analyze resulting clusters
clusters, counts = np.unique(labels, return_counts=True)
print(f"Cluster labels: {clusters}")
print(f"Cluster sizes: {counts}")
print("Cluster centers (duration, bytes):")
for idx, center in enumerate(kmeans.cluster_centers_):
    print(f"  Cluster {idx}: {center}")
```
In this example, K-Means is asked for 4 clusters. Ideally the small attack cluster (with an unusually high duration of ~200) forms its own cluster because of its distance from the normal clusters. Note, however, that the features are not scaled here, so the byte count dominates the Euclidean distance. K-Means may instead split one of the wide normal clusters and absorb the attack points into a neighboring cluster. Standardizing the features first makes the intended separation more likely. We print the cluster sizes and centers to interpret the result. In a real scenario, one could label a cluster with few points as potentially anomalous or inspect its members for malicious activity.

</details>

### Hierarchical Clustering

Hierarchical clustering builds a hierarchy of clusters using either a bottom-up (agglomerative) or a top-down (divisive) approach:

1. **Agglomerative (Bottom-Up)**: Start with each data point as its own cluster and iteratively merge the closest clusters until a single cluster remains or a stopping criterion is met.
2. **Divisive (Top-Down)**: Start with all data points in a single cluster and iteratively split clusters until each data point is its own cluster or a stopping criterion is met.

Agglomerative clustering requires a definition of inter-cluster distance and a linkage criterion that decides which clusters to merge. Common linkage methods include single linkage (distance between the closest points of two clusters), complete linkage (distance between the farthest points), and average linkage. The distance metric is often Euclidean. The choice of linkage affects the shape of the resulting clusters. There is no need to specify the number of clusters K in advance; you can "cut" the dendrogram at a chosen level to obtain the desired number of clusters.

Hierarchical clustering produces a dendrogram, a tree-like structure that shows the relationships between clusters at different levels of granularity.
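
Cutting the dendrogram can be illustrated with SciPy. The sketch below builds the merge tree once with average linkage and then cuts it in two ways, by a target number of clusters and by a merge-distance threshold. The data (`X_h`) and the threshold `2.0` are assumptions chosen for the illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: three tight 2-D groups of 20 points each
rng_h = np.random.RandomState(0)
X_h = np.vstack([
    rng_h.normal(loc=[0, 0], scale=0.3, size=(20, 2)),
    rng_h.normal(loc=[5, 0], scale=0.3, size=(20, 2)),
    rng_h.normal(loc=[0, 5], scale=0.3, size=(20, 2)),
])

# Build the merge tree once; each row of Z_avg records one merge
Z_avg = linkage(X_h, method="average", metric="euclidean")
print("Merge steps:", Z_avg.shape[0])  # n_samples - 1

# Cut the same tree at two different levels without re-running the clustering
labels_k3 = fcluster(Z_avg, t=3, criterion="maxclust")     # at most 3 clusters
labels_dist = fcluster(Z_avg, t=2.0, criterion="distance")  # cut at merge distance 2.0
print("Cluster sizes (maxclust=3):", np.bincount(labels_k3)[1:])  # labels start at 1
print("Clusters at distance 2.0:", len(np.unique(labels_dist)))
```

Swapping `method="average"` for `"single"`, `"complete"`, or `"ward"` is a quick way to see how the linkage choice changes the tree.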

> [!TIP]
> *Use cases in cybersecurity:* Hierarchical clustering can organize events or entities into a tree to reveal relationships. For example, in malware analysis, agglomerative clustering can group samples by behavioral similarity, revealing a hierarchy of malware families and variants. In network security, one can cluster IP traffic flows and use the dendrogram to view subgroups of traffic (e.g., by protocol, then by behavior). Because K does not have to be chosen up front, it is useful when exploring new data where the number of attack categories is unknown.

#### Assumptions and Limitations

Hierarchical clustering does not assume a particular cluster shape and can capture nested clusters. It is useful for discovering a taxonomy or relationships among groups (e.g., grouping malware into family subgroups). It is deterministic (no random initialization issues). Its main advantage is the dendrogram, which gives insight into the clustering structure of the data at all scales; security analysts can choose a cutoff that yields meaningful clusters. However, it is computationally expensive (typically $O(n^2)$ time, or worse for naive implementations) and is not feasible for very large datasets. It is also a greedy procedure: once a merge or split has been made it cannot be undone, so an early mistake can lead to suboptimal clusters. Outliers can also affect some linkage strategies (single linkage can cause a "chaining" effect in which clusters are linked through outliers).

<details>
<summary>Example -- Agglomerative Clustering of Events
</summary>

We reuse the synthetic data from the K-Means example (3 normal clusters + 1 attack cluster) and apply agglomerative clustering. We then show how to obtain a dendrogram and cluster labels.
```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

# X is the (1550, 2) array built in the K-Means example

# Perform agglomerative clustering (bottom-up) on the data
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=0, linkage='ward')
# distance_threshold=0 gives the full tree without cutting (we can cut manually)
agg.fit(X)

print(f"Number of merge steps: {agg.n_clusters_ - 1}")  # should equal number of points - 1
# Create a dendrogram using SciPy for visualization (optional)
Z = linkage(X, method='ward')
# Normally, you would plot the dendrogram. Here we'll just compute cluster labels for a chosen cut:
clusters_3 = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)
print(f"Labels with 3 clusters: {np.unique(clusters_3)}")
print(f"Cluster sizes for 3 clusters: {np.bincount(clusters_3)}")
```
</details>

### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a density-based clustering algorithm that groups closely packed points together and marks points in low-density regions as outliers. It is particularly useful for datasets with non-spherical cluster shapes and noise; clusters of markedly different density are harder for it (see the limitations below).

DBSCAN works by defining two parameters:

- **Epsilon (ε)**: The maximum distance between two points for them to be considered neighbors (and so potentially part of the same cluster).
- **MinPts**: The minimum number of points required to form a dense region (a core point).

DBSCAN identifies core points, border points, and noise points:

- **Core point**: A point with at least MinPts neighbors within distance ε.
- **Border point**: A point within distance ε of a core point but with fewer than MinPts neighbors of its own.
- **Noise point**: A point that is neither a core point nor a border point.

Clustering proceeds by picking an unvisited core point, marking it as a new cluster, and then recursively adding all points that are density-reachable from it (core points, their neighbors, and so on). Border points are added to the cluster of a nearby core point. After all reachable points have been expanded, DBSCAN moves to another unvisited core point to start a new cluster. Points not reached from any core point remain labeled as noise.

> [!TIP]
> *Use cases in cybersecurity:* DBSCAN is useful for anomaly detection in network traffic. For instance, normal user activity might form one or more dense clusters in feature space, while novel attack behavior appears as scattered points that DBSCAN labels as noise (outliers). It has been used to cluster network flow records, where port scans or denial-of-service traffic can show up as sparse regions of points. Another application is grouping malware variants: if most samples cluster by family but a few fit nowhere, those few could be zero-day malware. Because noise is flagged explicitly, security teams can focus on investigating those outliers.

#### Assumptions and Limitations

**Assumptions and strengths**: DBSCAN does not assume spherical clusters; it can find clusters of arbitrary shape (including chain-like or adjacent clusters). It determines the number of clusters automatically from the data density and can effectively identify outliers as noise, which makes it well suited to real-world data with irregular shapes and noise. It is robust to outliers (unlike K-Means, it does not force them into a cluster). It works well when clusters have roughly uniform density.

**Limitations**: DBSCAN's performance depends on choosing appropriate ε and MinPts values. It can struggle with data of varying density, because a single ε cannot accommodate both dense and sparse clusters. If ε is too small, most points are labeled as noise; if it is too large, clusters may be merged incorrectly. DBSCAN can also be inefficient on very large datasets (naively $O(n^2)$, although spatial indexing can help). In high-dimensional feature spaces the notion of "distance within ε" can become less meaningful (the curse of dimensionality), and DBSCAN may need careful parameter tuning or may fail to find intuitive clusters. Extensions such as HDBSCAN address some of these issues (e.g., varying density).

<details>
<summary>Example -- Clustering with Noise
</summary>

```python
import numpy as np
from sklearn.cluster import DBSCAN

# rng is the np.random.RandomState created in the K-Means example

# Generate synthetic data: 2 normal clusters and 5 outlier points
cluster1 = rng.normal(loc=[100, 1000], scale=[5, 100], size=(100, 2))
cluster2 = rng.normal(loc=[120, 2000], scale=[5, 100], size=(100, 2))
outliers = rng.uniform(low=[50, 50], high=[180, 3000], size=(5, 2))  # scattered anomalies
data = np.vstack([cluster1, cluster2, outliers])

# Run DBSCAN with chosen eps and MinPts
eps = 15.0    # radius for neighborhood
min_pts = 5   # minimum neighbors to form a dense region
db = DBSCAN(eps=eps, min_samples=min_pts).fit(data)
labels = db.labels_  # cluster labels (-1 for noise)

# Analyze clusters and noise
num_clusters = len(set(labels) - {-1})
num_noise = np.sum(labels == -1)
print(f"DBSCAN found {num_clusters} clusters and {num_noise} noise points")
print("Cluster labels for first 10 points:", labels[:10])
```
In this snippet, `eps` and `min_samples` were chosen to suit the scale of the data (15.0 in feature units, and 5 points required to form a cluster). Ideally DBSCAN finds 2 clusters (the normal traffic clusters) and flags the 5 injected outliers as noise; because the features are unscaled, some points in the sparse tails of the normal clusters may be labeled as noise as well. We print the number of clusters and noise points to check this. In a real setting, one might iterate over ε (using a k-distance graph heuristic to choose it) and MinPts (often set to roughly the data dimensionality + 1 as a rule of thumb) to find a stable clustering. Explicit noise labels help separate potential attack data for further analysis.

</details>
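
The core/border/noise distinction from the DBSCAN description can be read back from a fitted scikit-learn model. This is a small sketch under assumed data (one dense blob plus two isolated points); it relies on `core_sample_indices_` and on the convention that noise points get the label `-1`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: one dense blob plus two isolated points
rng_d = np.random.RandomState(0)
blob = rng_d.normal(loc=[0, 0], scale=0.3, size=(200, 2))
isolated = np.array([[10.0, 10.0], [-10.0, 8.0]])
X_d = np.vstack([blob, isolated])

db_demo = DBSCAN(eps=0.5, min_samples=5).fit(X_d)
db_labels = db_demo.labels_

# Core points are reported directly by scikit-learn
core_mask = np.zeros(len(X_d), dtype=bool)
core_mask[db_demo.core_sample_indices_] = True
# Noise points carry the label -1
noise_mask = db_labels == -1
# Border points belong to a cluster but are not core points themselves
border_mask = ~core_mask & ~noise_mask

print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", noise_mask.sum())
```

Every point falls into exactly one of the three groups, so the three counts add up to the number of samples.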

### Principal Component Analysis (PCA)

PCA is a **dimensionality reduction** technique that finds a new set of orthogonal axes (principal components) capturing the maximum variance in the data. In simple terms, PCA rotates and projects the data onto a new coordinate system such that the first principal component (PC1) explains the largest possible variance, the second (PC2) explains the largest variance orthogonal to PC1, and so on. Mathematically, PCA computes the eigenvectors of the data's covariance matrix. These eigenvectors are the principal component directions, and the corresponding eigenvalues give the amount of variance explained by each. PCA is often used for feature extraction, visualization, and noise reduction.

It is useful when the dataset dimensions contain **significant linear dependencies or correlations**.

PCA works by identifying the principal components of the data, which are the directions of maximum variance. The steps are:
1. **Standardization**: Center the data by subtracting the mean, and scale it to unit variance.
2. **Covariance Matrix**: Compute the covariance matrix of the standardized data to understand the relationships between features.
3. **Eigenvalue Decomposition**: Perform an eigenvalue decomposition of the covariance matrix to obtain the eigenvalues and eigenvectors.
4. **Select Principal Components**: Sort the eigenvalues in descending order and select the top K eigenvectors corresponding to the largest eigenvalues. These eigenvectors form the new feature space.
5. **Transform Data**: Project the standardized data onto the new feature space using the selected principal components.

PCA is widely used for data visualization, noise reduction, and as a preprocessing step for other machine learning algorithms. It helps reduce the dimensionality of the data while retaining its essential structure.
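
The five steps above map almost line for line onto NumPy. The following is a minimal sketch on made-up data (three features, the third nearly a linear function of the first); the variable names are illustrative only.

```python
import numpy as np

rng_p = np.random.RandomState(0)
# Hypothetical data: 3 features, the third is almost a linear function of the first
f1 = rng_p.normal(size=500)
f2 = rng_p.normal(size=500)
f3 = 2.0 * f1 + rng_p.normal(scale=0.1, size=500)
X_p = np.column_stack([f1, f2, f3])

# 1. Standardization
X_std = (X_p - X_p.mean(axis=0)) / X_p.std(axis=0, ddof=1)
# 2. Covariance matrix (features in columns)
C = np.cov(X_std, rowvar=False)
# 3. Eigendecomposition (eigh: C is symmetric; eigenvalues come back ascending)
eigvals, eigvecs = np.linalg.eigh(C)
# 4. Sort descending and keep the top K directions
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
K = 2
W = eigvecs[:, :K]
# 5. Project the standardized data
X_proj = X_std @ W

explained = eigvals / eigvals.sum()
print("Explained variance ratio:", np.round(explained, 3))
print("Projected shape:", X_proj.shape)
```

Because the third feature is nearly redundant, two components are expected to retain almost all of the variance, and the projected features are uncorrelated (their covariance matrix is diagonal).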

#### Eigenvalues and Eigenvectors

An eigenvalue is a scalar that indicates the amount of variance captured by its corresponding eigenvector. An eigenvector represents a direction in feature space along which the data varies; the eigenvector with the largest eigenvalue is the direction of greatest variance.

Let A be a square matrix and v a nonzero vector such that: `A * v = λ * v`
where:
- A is a square matrix such as [ [1, 2], [2, 1]] (in PCA, the covariance matrix plays this role)
- v is an eigenvector (e.g., [1, 1])

Then `A * v = [ [1, 2], [2, 1]] * [1, 1] = [3, 3]`, which equals `3 * v`, so the eigenvalue is λ = 3.
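
The arithmetic above can be checked numerically. This short sketch uses `np.linalg.eigh`, which is intended for symmetric matrices such as this one.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
v = np.array([1.0, 1.0])

Av = A @ v
print("A @ v =", Av)      # A @ v = [3. 3.]
lam = Av[0] / v[0]
print("lambda =", lam)    # lambda = 3.0
print("A @ v == lambda * v:", np.allclose(Av, lam * v))

# All eigenpairs at once (eigenvalues are returned in ascending order)
eigvals_A, eigvecs_A = np.linalg.eigh(A)
print("Eigenvalues:", eigvals_A)
```

The second eigenvalue of this matrix is -1 (eigenvector along [1, -1]). A genuine covariance matrix is positive semi-definite and therefore never has negative eigenvalues, so the matrix here illustrates the eigenvalue equation rather than a realistic covariance.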

#### Eigenvalues and Eigenvectors in PCA

Let's explain this with an example. Imagine a dataset containing many 100x100-pixel grayscale images of faces. Each pixel can be treated as a feature, so there are 10,000 features per image (that is, each image is a vector with 10,000 components). To reduce the dimensionality of this dataset with PCA, you would follow these steps:

1. **Standardization**: Center the data by subtracting the mean of each feature (pixel) from the dataset.
2. **Covariance Matrix**: Compute the covariance matrix of the standardized data, which captures how the features (pixels) vary together.
   - The covariance between two variables (pixels in this case) indicates how much they change together, so the idea is to find which pixels tend to increase or decrease together in a linear relationship.
   - For example, if pixel 1 and pixel 2 tend to increase together, the covariance between them is positive.
   - The covariance matrix is a 10,000x10,000 matrix in which each entry is the covariance between two pixels.
3. **Solve the eigenvalue equation**: The equation to solve is `C * v = λ * v`, where C is the covariance matrix, v is an eigenvector, and λ is an eigenvalue. It can be solved with:
   - **Eigenvalue decomposition**: Perform an eigenvalue decomposition of the covariance matrix to obtain the eigenvalues and eigenvectors.
   - **Singular Value Decomposition (SVD)**: Alternatively, use SVD to decompose the data matrix into singular values and vectors, which also yields the principal components.
4. **Select Principal Components**: Sort the eigenvalues in descending order and select the top K eigenvectors corresponding to the largest eigenvalues. These eigenvectors represent the directions of maximum variance in the data.
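
The two routes in step 3 give the same result, which a small experiment can show. The sketch below uses random data as a stand-in for images (50 samples with 400 "pixel" features; the sizes and names are assumptions). The squared singular values of the centered data matrix, divided by `n - 1`, match the eigenvalues of the covariance matrix, and the SVD route never has to form that matrix.

```python
import numpy as np

rng_f = np.random.RandomState(0)
# Hypothetical stand-in for an image dataset: 50 samples, 400 "pixel" features
X_f = rng_f.normal(size=(50, 400))
Xc = X_f - X_f.mean(axis=0)  # center each feature

# Route 1: eigendecomposition of the 400x400 covariance matrix
C_f = np.cov(Xc, rowvar=False)
evals = np.linalg.eigvalsh(C_f)[::-1]  # descending

# Route 2: SVD of the centered data matrix, no covariance matrix needed
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
evals_from_svd = S**2 / (len(Xc) - 1)

K_f = 10
components = Vt[:K_f]          # rows are the top-K principal directions
X_reduced = Xc @ components.T  # shape (50, 10)

print("Top 3 eigenvalues (covariance):", np.round(evals[:3], 3))
print("Top 3 eigenvalues (SVD):       ", np.round(evals_from_svd[:3], 3))
print("Reduced shape:", X_reduced.shape)
```

When there are far more features than samples, as in the 10,000-pixel face example, the SVD route is usually the more practical one.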

> [!TIP]
> *Use cases in cybersecurity:* A common use of PCA in security is feature reduction for anomaly detection. For instance, an intrusion detection system with 40+ network metrics (such as the NSL-KDD features) can use PCA to reduce them to a handful of components, either to visualize the data or to feed it into a clustering algorithm. Analysts can plot network traffic in the space of the first two principal components to see whether attacks separate from normal traffic. PCA can also help eliminate redundant features (such as bytes sent and bytes received when they are correlated), making detection algorithms more robust and faster.

#### Assumptions and Limitations

PCA assumes that **the principal axes of variance are meaningful**. It is a linear method, so it captures linear correlations in the data. It is unsupervised because it uses only the feature covariance. Advantages of PCA include noise reduction (small-variance components often correspond to noise) and decorrelation of features. It is computationally efficient for moderately high dimensions and is often useful as a preprocessing step for other algorithms (to mitigate the curse of dimensionality). One limitation is that PCA is restricted to linear relationships: it will not capture complex nonlinear structure (whereas autoencoders or t-SNE might). PCA components can also be hard to interpret in terms of the original features, because they are combinations of them. In cybersecurity, caution is needed: an attack that causes only a subtle change in a low-variance feature might not show up in the top PCs (since PCA prioritizes variance, not necessarily "interestingness").

<details>
<summary>Example -- Reducing Dimensions of Network Data
</summary>

Suppose we have network connection logs with multiple features (e.g., durations, bytes, counts). We generate a synthetic 4-dimensional dataset with some correlation between features and use PCA to reduce it to 2 dimensions for visualization or further analysis.
```python
import numpy as np
from sklearn.decomposition import PCA

# normal1, normal2, normal3 and rng come from the K-Means example

# Create synthetic 4D data (3 clusters similar to before, but add correlated features)
# Base features: duration, bytes (as before)
base_data = np.vstack([normal1, normal2, normal3])  # 1500 points from earlier normal clusters
# Add two more features correlated with existing ones, e.g. packets = bytes/50 + noise, errors = duration/10 + noise
packets = base_data[:, 1] / 50 + rng.normal(scale=0.5, size=len(base_data))
errors = base_data[:, 0] / 10 + rng.normal(scale=0.5, size=len(base_data))
data_4d = np.column_stack([base_data[:, 0], base_data[:, 1], packets, errors])

# Apply PCA to reduce 4D data to 2D
pca = PCA(n_components=2)
data_2d = pca.fit_transform(data_4d)
print("Explained variance ratio of 2 components:", pca.explained_variance_ratio_)
print("Original shape:", data_4d.shape, "Reduced shape:", data_2d.shape)
# We can examine a few transformed points
print("First 5 data points in PCA space:\n", data_2d[:5])
```
Here we took the earlier normal traffic clusters and extended each data point with two additional features (packets and errors) that correlate with bytes and duration. PCA then compresses the 4 features into 2 principal components. We print the explained variance ratio, which might show, for example, that the 2 components capture more than 95% of the variance (meaning little information is lost). Because the features are not standardized here, the high-variance byte count dominates the first component; in practice you would usually standardize first. The output also shows the data shape going from (1500, 4) to (1500, 2), and the first few points in PCA space are printed as an example. In practice, one could plot `data_2d` to check visually whether the clusters are distinguishable. If an anomaly were present, it might appear as a point lying far from the main clusters in PCA space. PCA thus helps distill complex data into a manageable form for human interpretation or as input to other algorithms.

</details>

### Gaussian Mixture Models (GMM)

A Gaussian Mixture Model assumes that the data is generated from a mixture of **several Gaussian (normal) distributions with unknown parameters**. In essence, it is a probabilistic clustering model: it tries to softly assign each point to one of K Gaussian components. Each Gaussian component k has a mean vector (μ_k), a covariance matrix (Σ_k), and a mixing weight (π_k) that represents how prevalent that cluster is. Unlike K-Means, which makes hard assignments, a GMM gives each point a probability of belonging to each cluster.

GMM fitting is typically done with the Expectation-Maximization (EM) algorithm:

- **Initialization**: Start with initial guesses for the means, covariances, and mixing coefficients (or use the result of K-Means as a starting point).
- **E-step (Expectation)**: Given the current parameters, compute the responsibility of each cluster for each point: essentially `r_nk = P(z_k | x_n)`, where z_k is the latent variable indicating cluster membership for point x_n. This uses Bayes' theorem to compute the posterior probability that each point belongs to each cluster under the current parameters. The responsibilities are computed as:

  $$
  r_{nk} = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
  $$

  where:
  - $\pi_k$ is the mixing coefficient for cluster k (the prior probability of cluster k),
  - $\mathcal{N}(x_n \mid \mu_k, \Sigma_k)$ is the Gaussian probability density function for point $x_n$ given mean $\mu_k$ and covariance $\Sigma_k$.
- **M-step (Maximization)**: Update the parameters using the responsibilities computed in the E-step:
  - Update each mean μ_k as the weighted average of the points, where the weights are the responsibilities.
  - Update each covariance Σ_k as the responsibility-weighted covariance of the points around μ_k.
  - Update each mixing coefficient π_k as the average responsibility for cluster k.
- **Iterate** the E and M steps until convergence (the parameters stabilize or the likelihood improvement falls below a threshold).
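
For reference, the M-step updates described above are commonly written as follows, using the responsibilities $r_{nk}$ from the E-step and $N$ data points. The symbol $N_k$ (the effective number of points assigned to cluster k) is notation introduced here and does not appear elsewhere in this page.

$$
N_k = \sum_{n=1}^{N} r_{nk}, \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^\top, \qquad
\pi_k = \frac{N_k}{N}
$$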

The result is a set of Gaussian distributions that collectively model the overall data distribution. The fitted GMM can be used for clustering by assigning each point to the Gaussian with the highest probability, or the probabilities can be kept to express uncertainty. One can also evaluate the likelihood of new points to see whether they fit the model (useful for anomaly detection).
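
The EM loop can be sketched in a few lines of NumPy and SciPy. This is a simplified illustration, not how scikit-learn implements `GaussianMixture`. The function name `fit_gmm`, the small ridge added to each covariance, and the synthetic data are assumptions. Initialization uses hard K-Means labels, one of the options mentioned above.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.cluster import KMeans

def fit_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    """Minimal EM for a full-covariance GMM (illustrative, not numerically hardened)."""
    n, d = X.shape
    # Initialization: hard K-Means labels give the first (one-hot) responsibilities
    init_labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)
    resp = np.eye(K)[init_labels]
    prev_ll = -np.inf
    for _ in range(n_iters):
        # M-step: responsibility-weighted mixing weights, means and covariances
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        covs = []
        for k in range(K):
            diff = X - means[k]
            cov_k = (resp[:, k, None] * diff).T @ diff / Nk[k]
            covs.append(cov_k + 1e-6 * np.eye(d))  # small ridge keeps it invertible
        # E-step: r_nk proportional to pi_k * N(x_n | mu_k, Sigma_k)
        dens = np.column_stack([
            weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
            for k in range(K)
        ])
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # Stop when the log-likelihood no longer improves
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, np.array(covs), resp

# Hypothetical data: two Gaussian blobs
rng_g = np.random.RandomState(0)
X_g = np.vstack([
    rng_g.normal(loc=[0, 0], scale=1.0, size=(150, 2)),
    rng_g.normal(loc=[6, 6], scale=1.0, size=(150, 2)),
])
weights, means, covs, resp = fit_gmm(X_g, K=2)
print("Mixing weights:", np.round(weights, 3))
print("Means:\n", np.round(means, 2))
print("Responsibilities of first point:", np.round(resp[0], 3))
```

Each row of `resp` sums to 1, which is what makes the assignment "soft": a point between two components would show split responsibilities rather than a single label.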

> [!TIP]
> *Use cases in cybersecurity:* GMM can be used for anomaly detection by modeling the distribution of normal data: any point with a very low probability under the learned mixture is flagged as an anomaly. For example, you could train a GMM on legitimate network traffic features; an attack connection that does not resemble any learned cluster would have a low likelihood. GMMs are also used to cluster activities whose clusters may have different shapes, for example grouping users by behavior profile, where each profile's features are roughly Gaussian but have their own variance structure. Another scenario is phishing detection: legitimate email features might form one Gaussian cluster, known phishing another, and a new phishing campaign might show up either as a separate Gaussian or as low-likelihood points relative to the existing mixture.

#### Assumptions and Limitations

GMM is a generalization of K-Means that incorporates covariance, so clusters can be ellipsoidal (not only spherical). With full covariance matrices it handles clusters of different sizes and shapes. Soft clustering is an advantage when cluster boundaries are fuzzy: in cybersecurity, for example, an event may have traits of several attack types, and a GMM can reflect that uncertainty with probabilities. A GMM also provides a probabilistic density estimate of the data, which is useful for detecting outliers (points with low likelihood under all mixture components).

On the downside, GMM requires the number of components K to be specified (although criteria such as BIC/AIC can be used to select it). EM can converge slowly or to a local optimum, so initialization matters (EM is often run several times). If the data does not actually follow a mixture of Gaussians, the model may be a poor fit. There is also a risk that one Gaussian shrinks to cover just an outlier (regularization or a minimum bound on the covariance can mitigate this).

<details>
<summary>Example -- Soft Clustering and Anomaly Scores
</summary>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# base_data is the (1500, 2) array of normal traffic built in the PCA example

# Fit a GMM with 3 components to the normal traffic data
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(base_data)  # using the 1500 normal data points from PCA example

# Print the learned Gaussian parameters
print("GMM means:\n", gmm.means_)
print("GMM covariance matrices:\n", gmm.covariances_)

# Take a sample attack-like point and evaluate it
sample_attack = np.array([[200, 800]])  # an outlier similar to earlier attack cluster
probs = gmm.predict_proba(sample_attack)
log_likelihood = gmm.score_samples(sample_attack)
print("Cluster membership probabilities for sample attack:", probs)
print("Log-likelihood of sample attack under GMM:", log_likelihood)
```
In this code, we train a GMM with 3 Gaussians on the normal traffic (assuming we know there are 3 profiles of legitimate traffic). The printed means and covariances describe these clusters (for instance, one mean may be near [50, 500], corresponding to the center of one cluster). We then test a suspicious connection [duration=200, bytes=800]. `predict_proba` gives the probability that this point belongs to each of the 3 clusters. Because these are posterior probabilities that sum to 1, they cannot all be low; for a point this far from the normal clusters we expect them to be heavily skewed toward whichever component is least distant. The outlier status shows up in `score_samples` (the log-likelihood): a very low value indicates that the point does not fit the model well and flags it as an anomaly. In practice, one could set a threshold on the log-likelihood to decide whether a point is unlikely enough to be considered malicious. GMM therefore provides a principled way to do anomaly detection and also yields soft clusters that acknowledge uncertainty.

</details>

### Isolation Forest

**Isolation Forest** is an ensemble anomaly detection algorithm based on the idea of randomly isolating points. The principle is that anomalies are few and different, so they are easier to isolate than normal points. An Isolation Forest builds many binary isolation trees (random decision trees) that partition the data randomly. At each node of a tree, a random feature is selected and a random split value is chosen between the minimum and maximum of that feature for the data in that node. This split divides the data into two branches. The tree grows until every point is isolated in its own leaf or a maximum tree height is reached.
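The splitting procedure can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: it follows a single point down a freshly randomized tree each time, skips the subsampling a real Isolation Forest uses, and the helper names and toy data are assumptions.

```python
import random

def isolation_path_length(point, data, rng, max_depth=50):
    """Count the random axis-aligned splits needed to isolate `point` within `data`."""
    depth = 0
    subset = data
    while len(subset) > 1 and depth < max_depth:
        feature = rng.randrange(len(point))          # pick a random feature
        values = [row[feature] for row in subset]
        lo, hi = min(values), max(values)
        if lo == hi:                                  # cannot split on this feature
            break
        split = rng.uniform(lo, hi)                   # pick a random split value
        # keep only the branch that contains `point`
        if point[feature] < split:
            subset = [row for row in subset if row[feature] < split]
        else:
            subset = [row for row in subset if row[feature] >= split]
        depth += 1
    return depth

rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]  # dense cluster
outlier = (100.0, 100.0)                                           # far from the cluster
data = normal + [outlier]

def mean_path(point, trees=100):
    """Average path length of `point` over many random trees."""
    return sum(isolation_path_length(point, data, rng) for _ in range(trees)) / trees

avg_outlier = mean_path(outlier)
avg_inlier = mean_path(normal[0])
print("avg path (outlier):", avg_outlier, "avg path (inlier):", avg_inlier)
```

The outlier is usually cut off by the very first split, while a point inside the dense cluster needs many more splits before it is alone.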

Anomaly detection is performed by observing the path length of each point in these random trees, that is, the number of splits required to isolate the point. Intuitively, anomalies (outliers) tend to be isolated sooner, because a random split is more likely to separate an outlier lying in a sparse region than a normal point inside a dense cluster. Isolation Forest computes an anomaly score from the average path length over all trees: the shorter the average path, the more anomalous the point. In the original formulation, scores are normalized to [0,1], where values close to 1 indicate a very likely anomaly.
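The normalization can be written out explicitly. The sketch below uses the score from the original Isolation Forest paper, $s(x, n) = 2^{-E[h(x)] / c(n)}$, where $c(n)$ is the average path length of an unsuccessful binary search tree lookup over $n$ points; the harmonic number is approximated with $\ln(i) + 0.5772$. The function names are illustrative, and note that scikit-learn's `score_samples` reports the opposite of this score.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Score in (0, 1]: near 1 = likely anomaly, around 0.5 = unremarkable."""
    return 2.0 ** (-avg_path_length / c(n))

n = 256  # sub-sample size used to build each tree (a common choice)
print("short path:", anomaly_score(4.0, n))
print("average path:", anomaly_score(c(n), n))
print("long path:", anomaly_score(12.0, n))
```

A point whose average path equals $c(n)$ scores exactly 0.5; shorter paths push the score toward 1 and longer paths toward 0.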

> [!TIP]
> *Use cases in cybersecurity:* Isolation Forest has been used successfully in intrusion detection and fraud detection. For example, if you train an Isolation Forest on network traffic logs that contain mostly normal behavior, the forest produces short paths for odd traffic (such as an IP using a rarely seen port or an unusual packet-size pattern) and flags it for inspection. Because it does not require labeled attacks, it is suitable for detecting unknown attack types. It can also be deployed on user login data to detect account takeovers (anomalous login times or locations are isolated quickly). In one use case, an Isolation Forest can protect an enterprise by monitoring system metrics and raising an alert when a combination of metrics (CPU, network, file changes) looks very different from historical patterns.

#### Assumptions and Limitations

**Advantages**: Isolation Forest requires no distributional assumption; it targets isolation directly. It is efficient on high-dimensional data and large datasets (building the forest costs roughly $O(n\log n)$, and less when each tree is built on a subsample), because each tree isolates points using only a few features and splits. It tends to handle numerical features well and can be faster than distance-based methods, which may be $O(n^2)$. It also produces an anomaly score automatically, so you can set a threshold for alerts (or use the contamination parameter to derive the cutoff from an expected anomaly fraction).

**Limitations**: Because of its random nature, results can differ slightly between runs (with enough trees the effect is minor). If the data has many irrelevant features, or if the anomalies do not stand out strongly in any feature, isolation may be ineffective (random splits can isolate normal points by chance, although averaging over many trees mitigates this). Isolation Forest also generally assumes that anomalies are a small minority, which is usually true in cybersecurity scenarios.

<details>
<summary>Example -- Detecting Outliers in Network Logs
</summary>

We'll reuse the earlier test dataset (normal traffic plus some attack points) and run an Isolation Forest to see whether it can separate the attacks. For the demonstration, we assume that about 15% of the data is anomalous.
```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Combine normal and attack test data from autoencoder example
X_test_if = test_data  # (120 x 2 array with 100 normal and 20 attack points)
# Train Isolation Forest (unsupervised) on the test set itself for demo (in practice train on known normal)
iso_forest = IsolationForest(n_estimators=100, contamination=0.15, random_state=0)
iso_forest.fit(X_test_if)
# Predict anomalies (-1 for anomaly, 1 for normal)
preds = iso_forest.predict(X_test_if)
anomaly_scores = iso_forest.decision_function(X_test_if)  # the higher, the more normal
print("Isolation Forest predicted labels (first 20):", preds[:20])
print("Number of anomalies detected:", np.sum(preds == -1))
print("Example anomaly scores (lower means more anomalous):", anomaly_scores[:5])
```

In this code, we instantiate `IsolationForest` with 100 trees and set `contamination=0.15` (we expect about 15% anomalies, so the model sets its score threshold so that roughly 15% of the points are flagged). We fit it on `X_test_if`, which contains a mix of normal and attack points (note: normally you fit on training data and then predict on new data; here we fit and predict on the same set so the results can be observed directly).

The output shows the predicted labels for the first 20 points (-1 indicates an anomaly). We also print how many anomalies were detected in total and a few example anomaly scores. We expect about 18 of the 120 points to be labeled -1 (because the contamination was 15%). If our 20 attack samples really are the most outlying points, most of them should appear among the -1 predictions. The anomaly score (Isolation Forest's decision function) is higher for normal points and lower (more negative) for anomalies; we print a few values to see the separation. In practice, you can sort the data by score to surface and investigate the top outliers. Isolation Forest therefore offers an efficient way to sift through large volumes of unlabeled security data and pick out the most irregular instances for human analysis or further automated review.
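The "sort by score and review the top outliers" step can be sketched without any library. The scores below are hypothetical stand-ins for `decision_function` output (lower means more anomalous), and `top_outliers` is an illustrative helper, not part of scikit-learn.

```python
def top_outliers(scores, fraction):
    """Indices of the lowest-scoring `fraction` of points (lower = more anomalous)."""
    k = round(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k]

# Hypothetical scores: 100 normal points (positive) followed by 20 attacks (negative)
scores = [0.10 + 0.001 * i for i in range(100)] + [-0.20 - 0.001 * i for i in range(20)]

flagged = top_outliers(scores, fraction=0.15)
print("flagged:", len(flagged))               # 18, i.e. 15% of 120
print("most anomalous index:", flagged[0])
```

With 120 points and a 15% budget, 18 indices are returned, mirroring what `contamination=0.15` does when it picks the cutoff.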

### t-SNE (t-distributed Stochastic Neighbor Embedding)

**t-SNE** is a nonlinear dimensionality reduction technique designed specifically for visualizing high-dimensional data in 2 or 3 dimensions. It converts similarities between data points into joint probability distributions and tries to preserve the structure of local neighborhoods in the low-dimensional projection. Put simply, t-SNE places points in (for example) 2D so that points that are similar in the original space end up close together and dissimilar points end up far apart.

The algorithm has three main stages:

1. **Compute pairwise affinities in the high-dimensional space:** For each pair of points, t-SNE computes the probability that the pair would be picked as neighbors (by centering a Gaussian distribution on each point and measuring distances; the perplexity parameter influences the effective number of neighbors considered).
2. **Compute pairwise affinities in the low-dimensional (e.g. 2D) space:** Initially, the points are placed randomly in 2D. t-SNE defines a similar probability for distances in this map, using a Student t-distribution kernel whose tails are heavier than a Gaussian's, which gives distant points more freedom.
3. **Gradient descent:** t-SNE iteratively moves the points in 2D to minimize the Kullback-Leibler (KL) divergence between the high-dimensional affinity distribution and the low-dimensional one. This makes the 2D arrangement reflect the high-dimensional structure as closely as possible: points that were close in the original space attract each other and distant points repel each other until a balance is found.

The result is often a visually meaningful scatter plot in which the clusters in the data become apparent.
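The objective behind these stages can be illustrated with a deliberately simplified, dependency-free sketch. It assumes a single global `sigma` instead of the per-point bandwidths that perplexity determines, and it only evaluates the KL divergence for two hand-made 2D layouts rather than running gradient descent; all names and data are illustrative.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(weights):
    """Scale a matrix of non-negative weights so that all entries sum to 1."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def gaussian_affinities(points, sigma=1.0):
    """High-dimensional affinities P from a Gaussian kernel (simplified: one global sigma)."""
    n = len(points)
    raw = [[0.0 if i == j else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n)] for i in range(n)]
    return normalize(raw)

def student_t_affinities(points):
    """Low-dimensional affinities Q from a heavy-tailed Student t kernel (1 degree of freedom)."""
    n = len(points)
    raw = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(points[i], points[j]))
            for j in range(n)] for i in range(n)]
    return normalize(raw)

def kl_divergence(P, Q):
    """KL(P || Q), the cost that t-SNE minimizes."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

# Two tight pairs in 3D: points 0-1 are neighbors, points 2-3 are neighbors
high_dim = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5)]
good_map = [(0, 0), (0.5, 0), (10, 10), (10.5, 10)]   # keeps neighbors together
bad_map = [(0, 0), (10, 10), (0.5, 0), (10.5, 10)]    # splits the neighbors apart

P = gaussian_affinities(high_dim)
kl_good = kl_divergence(P, student_t_affinities(good_map))
kl_bad = kl_divergence(P, student_t_affinities(bad_map))
print("KL good layout:", kl_good, "KL bad layout:", kl_bad)
```

The layout that keeps original neighbors together has a much lower KL divergence, which is the direction gradient descent pushes the map in.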

> [!TIP]
> *Use cases in cybersecurity:* t-SNE is often used to visualize high-dimensional security data for human analysis. For example, in a security operations center, analysts can take an event dataset with dozens of features (port numbers, frequencies, byte counts, etc.) and use t-SNE to produce a 2D plot. In this plot, attacks may separate from normal data or form clusters of their own, which makes them easier to identify. It has been applied to malware datasets to reveal groupings of malware families, and to network intrusion data where different attack types cluster distinctly, guiding further investigation. In essence, t-SNE provides a way to see structure in cyber data.

#### Assumptions and Limitations

t-SNE is excellent for the visual discovery of patterns. It can reveal clusters, sub-clusters and outliers that linear methods (such as PCA) may not. It has been used in cybersecurity research to visualize complex data such as malware behavior profiles or network traffic patterns. Because it preserves local structure, it is good at showing natural groupings.

However, t-SNE is computationally heavier (approximately $O(n^2)$), so very large datasets may require sampling. It also has hyperparameters (perplexity, learning rate, number of iterations) that can affect the output; for example, different perplexity values may reveal clusters at different scales. t-SNE plots are sometimes misinterpreted: distances in the map are not directly meaningful globally (the method focuses on local neighborhoods, so clusters can appear artificially well separated). In addition, t-SNE is mainly a visualization tool: it offers no straightforward way to project new data points, and it is not designed to be used as preprocessing for predictive modeling (UMAP is a faster alternative that addresses some of these issues).

**Example -- Visualizing Network Connections**

We'll use t-SNE to reduce a multi-feature dataset to 2D. As an illustration, we take the earlier 4D data (which had 3 natural clusters of normal traffic) and add a few anomalous points. We then run t-SNE and visualize the result.

```python
# 1 ----------------------------------------------------------------------
#    Create synthetic 4-D dataset
#      - Three clusters of "normal" traffic (duration, bytes)
#      - Two correlated features: packets & errors
#      - Five outlier points to simulate suspicious traffic
# ------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Base (duration, bytes) clusters
normal1 = rng.normal(loc=[50, 500],  scale=[10, 100], size=(500, 2))
normal2 = rng.normal(loc=[60, 1500], scale=[8,  200], size=(500, 2))
normal3 = rng.normal(loc=[70, 3000], scale=[5,  300], size=(500, 2))

base_data = np.vstack([normal1, normal2, normal3])       # (1500, 2)

# Correlated features
packets = base_data[:, 1] / 50 + rng.normal(scale=0.5, size=len(base_data))
errors  = base_data[:, 0] / 10 + rng.normal(scale=0.5, size=len(base_data))

data_4d = np.column_stack([base_data, packets, errors])  # (1500, 4)

# Outlier / attack points
outliers_4d = np.column_stack([
    rng.normal(250, 1, size=5),     # extreme duration
    rng.normal(1000, 1, size=5),    # moderate bytes
    rng.normal(5, 1, size=5),       # very low packets
    rng.normal(25, 1, size=5)       # high errors
])

data_viz = np.vstack([data_4d, outliers_4d])             # (1505, 4)

# 2 ----------------------------------------------------------------------
#    Standardize features (recommended for t-SNE)
# ------------------------------------------------------------------------
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_viz)

# 3 ----------------------------------------------------------------------
#    Run t-SNE to project 4-D -> 2-D
# ------------------------------------------------------------------------
tsne = TSNE(
    n_components=2,
    perplexity=30,
    learning_rate='auto',
    init='pca',
    random_state=0
)
data_2d = tsne.fit_transform(data_scaled)
print("t-SNE output shape:", data_2d.shape)  # (1505, 2)

# 4 ----------------------------------------------------------------------
#    Visualize: normal traffic vs. outliers
# ------------------------------------------------------------------------
plt.figure(figsize=(8, 6))
plt.scatter(
    data_2d[:-5, 0], data_2d[:-5, 1],
    label="Normal traffic",
    alpha=0.6,
    s=10
)
plt.scatter(
    data_2d[-5:, 0], data_2d[-5:, 1],
    label="Outliers / attacks",
    alpha=0.9,
    s=40,
    marker="X",
    edgecolor='k'
)

plt.title("t-SNE Projection of Synthetic Network Traffic")
plt.xlabel("t-SNE component 1")
plt.ylabel("t-SNE component 2")
plt.legend()
plt.tight_layout()
plt.show()
```

Here we combined the earlier 4D normal dataset with a handful of extreme outliers (the outliers have one feature, "duration", set very high, among other changes, to simulate an odd pattern). We run t-SNE with a typical perplexity of 30. The output `data_2d` has shape (1505, 2). In the resulting scatter plot we would expect to see roughly three dense groups corresponding to the 3 normal clusters, with the 5 outliers sitting apart from them (because the five synthetic outliers are nearly identical to one another, they typically land together as one tiny group). In an interactive workflow, you can color the points by label (normal, or which cluster, versus anomaly) to verify this structure. Even without labels, an analyst may notice those 5 points sitting in empty space on the 2D plot and flag them. This shows how t-SNE can be a powerful aid for visual anomaly detection and cluster inspection in cybersecurity data, complementing the automated algorithms above.

### HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)

**HDBSCAN** is an extension of DBSCAN that removes the need to pick a single global `eps` value and can recover clusters of **different density** by building a hierarchy of density-connected components and then condensing it. Compared with plain DBSCAN, it usually

* extracts more intuitive clusters when some clusters are dense and others are sparse,
* has only one real hyperparameter (`min_cluster_size`) and a sensible default,
* gives every point a cluster-membership *probability* and an **outlier score** (`outlier_scores_`), which is very handy for threat-hunting dashboards.

> [!TIP]
> *Use cases in cybersecurity:* HDBSCAN is very popular in modern threat-hunting pipelines, and you will often see it inside the notebook-based hunting playbooks shipped with commercial XDR suites. One practical recipe is to cluster HTTP beaconing traffic during IR: user agent, interval and URI length often form several tight groups of legitimate software updaters, while C2 beacons remain as tiny low-density clusters or as pure noise.

**Example -- Finding Beaconing C2 Channels**

```python
import pandas as pd
from hdbscan import HDBSCAN
from sklearn.preprocessing import StandardScaler

# df is assumed to be a pandas DataFrame with features extracted from proxy logs
features = [
    "avg_interval",        # seconds between requests
    "uri_length_mean",     # average URI length
    "user_agent_entropy"   # Shannon entropy of UA string
]
X = StandardScaler().fit_transform(df[features])

hdb = HDBSCAN(min_cluster_size=15,   # at least 15 similar beacons to be a group
              metric="euclidean",
              prediction_data=True)
labels = hdb.fit_predict(X)

df["cluster"] = labels

# Anything with label == -1 is noise -> inspect as potential C2
suspects = df[df["cluster"] == -1]
print("Suspect beacon count:", len(suspects))
```

</details>

---

### Robustness & Security Considerations - Poisoning & Adversarial Attacks (2023-2025)

Recent work has shown that **unsupervised learners are *not* immune to active attackers**:

* **Data poisoning against anomaly detectors.** Chen *et al.* (IEEE S&P 2024) demonstrated that adding as little as 3% crafted traffic can shift the decision boundary of Isolation Forest and ECOD so that real attacks look normal. The authors released an open-source PoC (`udo-poison`) that automatically synthesizes poison points.
* **Backdooring clustering models.** The *BadCME* technique (BlackHat EU 2023) implants a tiny trigger pattern; whenever that trigger appears, a K-Means-based detector quietly places the event inside a "benign" cluster.
* **Evasion of DBSCAN/HDBSCAN.** A 2025 academic pre-print from KU Leuven showed that an attacker can craft beaconing patterns that deliberately fall into density gaps, effectively hiding inside the *noise* label.

Mitigations that are gaining traction:

1. **Model sanitization / TRIM.** Before every retraining epoch, discard the 1-2% of points with the highest loss (trimmed maximum likelihood) to make poisoning dramatically harder.
2. **Consensus ensembling.** Combine several heterogeneous detectors (e.g. Isolation Forest + GMM + ECOD) and raise an alert if *any* model flags a point. Research indicates that this raises the attacker's cost by more than 10x.
3. **Distance-based defense for clustering.** Recompute the clusters with `k` different random seeds and ignore points that constantly hop between clusters.
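The first two mitigations can be sketched in a few lines of plain Python. The helper names are hypothetical, the loss values and detector flags are made-up inputs, and a real pipeline would take them from the fitted models rather than from literals.

```python
def trim_highest_loss(points, losses, fraction=0.02):
    """TRIM-style sanitization: drop the `fraction` of points with the highest loss."""
    n_drop = int(len(points) * fraction)
    by_loss = sorted(range(len(points)), key=lambda i: losses[i])
    keep = by_loss[: len(points) - n_drop]
    return [points[i] for i in sorted(keep)]  # preserve the original order

def consensus_alert(flags_per_detector):
    """OR-vote: alert on a point if any detector flags it."""
    return [any(column) for column in zip(*flags_per_detector)]

# Hypothetical training set where two points have a suspiciously high loss
points = list(range(100))
losses = [1.0] * 100
losses[7], losses[42] = 50.0, 99.0
clean = trim_highest_loss(points, losses, fraction=0.02)
print("kept:", len(clean), "dropped 7 and 42:", 7 not in clean and 42 not in clean)

# Hypothetical per-detector flags for three events (rows = detectors)
flags = [
    [False, True,  False],   # e.g. Isolation Forest
    [False, False, False],   # e.g. GMM
    [False, False, True],    # e.g. ECOD
]
print("alerts:", consensus_alert(flags))  # [False, True, True]
```

An OR-vote maximizes recall at the price of more false positives; requiring agreement from several detectors instead is the usual trade-off when alert volume matters.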

---

### Modern Open-Source Tooling (2024-2025)

* **PyOD 2.x** (released May 2024) added *ECOD*, *COPOD* and GPU-accelerated *AutoFormer* detectors. It now ships a `benchmark` sub-command that lets you compare 30+ algorithms on your dataset with **one line of code**:
```bash
pyod benchmark --input logs.csv --label attack --n_jobs 8
```
* **Anomalib v1.5** (February 2025) is vision-focused but also includes a generic **PatchCore** implementation, which is handy for screenshot-based phishing-page detection.
* **scikit-learn** includes a native `sklearn.cluster.HDBSCAN` estimator (added in version 1.3), so the external contrib package is not required for basic HDBSCAN clustering.
<details>
<summary>Quick PyOD example -- ECOD + Isolation Forest ensemble</summary>

```python
from pyod.models.ecod import ECOD
from pyod.models.iforest import IForest
from pyod.utils.data import generate_data, evaluate_print

# Recent PyOD versions return (X_train, X_test, y_train, y_test)
X_train, X_test, y_train, y_test = generate_data(
    n_train=5000, n_test=1000, n_features=16,
    contamination=0.02, random_state=42)

models = [ECOD(), IForest()]

# Score averaging: mean of the detectors' decision scores (higher = more anomalous).
# Raw scores are on different scales; standardize them first for a fairer average.
anomaly_scores = sum(m.fit(X_train).decision_function(X_test) for m in models) / len(models)

evaluate_print("Ensemble", y_test, anomaly_scores)
```

</details>

## References

- [HDBSCAN - Hierarchical density-based clustering](https://github.com/scikit-learn-contrib/hdbscan)
- Chen, X. *et al.* "On the Vulnerability of Unsupervised Anomaly Detection to Data Poisoning." *IEEE Symposium on Security and Privacy*, 2024.


