# Unsupervised Learning Algorithms
## Unsupervised Learning

Unsupervised learning is a type of machine learning in which a model is trained on data without labeled responses. The goal is to find patterns, structures, or relationships in the data. Unlike supervised learning, which learns from labeled examples, unsupervised learning algorithms work with unlabeled data.

Unsupervised learning is often used for tasks such as clustering, dimensionality reduction, and anomaly detection. It can help discover hidden patterns in data, group similar items together, or reduce the complexity of data while preserving its essential features.
### K-Means Clustering

K-Means is a centroid-based clustering algorithm that partitions data into K clusters by assigning each point to the nearest cluster mean. The algorithm works as follows:

1. **Initialization**: Choose K initial cluster centers (centroids), usually at random or with a smarter method such as k-means++.
2. **Assignment**: Assign each data point to the nearest centroid according to a distance metric (e.g., Euclidean distance).
3. **Update**: Recompute each centroid as the mean of all data points assigned to that cluster.
4. **Repeat**: Repeat steps 2 and 3 until the cluster assignments stabilize (the centroids no longer move significantly).
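
The four steps above can be sketched in a few lines of plain Python. This is a minimal, dependency-free illustration of Lloyd's algorithm, not scikit-learn's implementation. The function name `kmeans`, the toy `events` list, and the naive choice of the first K points as initial centroids are assumptions made for the example (in practice k-means++ initialization is preferable).

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's algorithm on a list of equal-length tuples."""
    # 1. Initialization: naively take the first k points as centroids (assumption).
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # 2. Assignment: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # 4. Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # 3. Update: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy (duration, bytes)-style events: two tight groups far apart.
events = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(events, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

Even though both initial centroids start inside the same group, the assignment and update steps pull them apart within a couple of iterations. On less well-separated data a poor start can leave the algorithm in a local minimum, which is why several restarts are commonly used.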
> [!TIP]
> *Use cases in cybersecurity:* K-Means is used for intrusion detection by clustering network events. For example, researchers applied K-Means to the KDD Cup 99 intrusion dataset and found that it effectively partitioned traffic into normal and attack clusters. In practice, security analysts may cluster log entries or user-behavior data to find groups of similar activity; points that do not belong to any well-formed cluster may indicate anomalies (e.g., a new malware variant forming its own small cluster). K-Means can also help classify malware families by grouping binaries according to behavior profiles or feature vectors.
#### Choosing K

The number of clusters (K) is a hyperparameter that must be set before running the algorithm. Techniques such as the Elbow Method and the Silhouette Score help determine a suitable value of K by evaluating clustering quality:

- **Elbow Method**: Plot the sum of squared distances from each point to its assigned cluster centroid as a function of K. Look for the "elbow" where the rate of decrease changes sharply; it indicates a suitable number of clusters.
- **Silhouette Score**: Compute the silhouette score for different values of K. A higher silhouette score indicates better-defined clusters.
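
As a rough illustration of the silhouette idea, the sketch below computes the mean silhouette coefficient of a given labeling in plain Python. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the lowest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`. The helper name `silhouette_score` and the toy data are illustrative assumptions; with scikit-learn one would normally call `sklearn.metrics.silhouette_score` instead.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


events = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
good = silhouette_score(events, [0, 0, 0, 1, 1, 1])  # labels follow the real groups
bad = silhouette_score(events, [0, 1, 0, 1, 0, 1])   # labels mix the groups
print(f"good labeling: {good:.3f}, bad labeling: {bad:.3f}")
```

When choosing K, the same score would be computed for the labeling produced at each candidate K and the K with the highest score preferred.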
#### Assumptions and Limitations

K-Means assumes that **clusters are spherical and of similar size**, which does not hold for every dataset. It is sensitive to the initial placement of the centroids and may converge to a local minimum. K-Means is also a poor fit for datasets with varying densities or non-globular shapes, and for features on different scales. Preprocessing steps such as normalization or standardization may be needed so that all features contribute equally to the distance calculation.
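
To see why scaling matters for a distance-based method, the following sketch standardizes two features (z-scores) and compares Euclidean distances before and after. The rows and the helper name `standardize` are made up for illustration.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


# (duration in seconds, bytes transferred)
rows = [(50.0, 500.0), (55.0, 1500.0), (200.0, 520.0)]

# Raw distances from row 0: the bytes column dominates.
raw = (math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))

scaled_rows = standardize(rows)
scaled = (math.dist(scaled_rows[0], scaled_rows[1]), math.dist(scaled_rows[0], scaled_rows[2]))

print("raw distances:   ", raw)
print("scaled distances:", scaled)
```

In raw units the event with 1,000 extra bytes looks far more distant than the event whose duration is four times longer; after standardization both deviations carry comparable weight.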
<details>
<summary>Example -- Clustering Network Events
</summary>

Below we simulate network traffic data and cluster it with K-Means. Suppose we have events with features such as connection duration and byte count. We create three clusters of "normal" traffic and one small cluster representing an attack pattern, then run K-Means to see whether it separates them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Simulate synthetic network traffic data (e.g., [duration, bytes]).
# Three normal clusters and one small attack cluster.
rng = np.random.RandomState(42)
normal1 = rng.normal(loc=[50, 500], scale=[10, 100], size=(500, 2))   # Cluster 1
normal2 = rng.normal(loc=[60, 1500], scale=[8, 200], size=(500, 2))   # Cluster 2
normal3 = rng.normal(loc=[70, 3000], scale=[5, 300], size=(500, 2))   # Cluster 3
attack = rng.normal(loc=[200, 800], scale=[5, 50], size=(50, 2))      # Small attack cluster

X = np.vstack([normal1, normal2, normal3, attack])

# Run K-Means clustering into 4 clusters (we expect it to find the 4 groups)
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)

# Analyze resulting clusters
clusters, counts = np.unique(labels, return_counts=True)
print(f"Cluster labels: {clusters}")
print(f"Cluster sizes: {counts}")
print("Cluster centers (duration, bytes):")
for idx, center in enumerate(kmeans.cluster_centers_):
    print(f"  Cluster {idx}: {center}")
```

In this example, K-Means is asked to find 4 clusters. The small attack cluster (with an unusually high duration of about 200) will ideally form its own cluster given its distance from the normal clusters. Because the raw features are on different scales, standardizing them first usually helps. We print the cluster sizes and centers to interpret the result. In a real scenario, one could label a cluster with few points as a potential anomaly or inspect its members for malicious activity.
</details>
### Hierarchical Clustering

Hierarchical clustering builds a hierarchy of clusters using either a bottom-up (agglomerative) or a top-down (divisive) approach:

1. **Agglomerative (bottom-up)**: Start with each data point as its own cluster and iteratively merge the closest clusters until a single cluster remains or a stopping criterion is met.
2. **Divisive (top-down)**: Start with all data points in a single cluster and iteratively split clusters until each data point is its own cluster or a stopping criterion is met.

Agglomerative clustering requires a definition of inter-cluster distance and a linkage criterion to decide which clusters to merge. Common linkage methods include single linkage (distance between the closest points of two clusters), complete linkage (distance between the farthest points), and average linkage; the distance metric is often Euclidean. The choice of linkage affects the shape of the resulting clusters. There is no need to specify the number of clusters K in advance; you can "cut" the dendrogram at a chosen level to obtain the desired number of clusters.

Hierarchical clustering produces a dendrogram, a tree-like structure that shows the relationships between clusters at different levels of granularity.
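
The linkage criteria mentioned above differ only in how they summarize the pairwise distances between two clusters. A small sketch follows; the function name `linkage_distance` and the sample clusters are illustrative assumptions, not part of any library.

```python
import math
from itertools import product


def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under a given linkage criterion."""
    dists = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair of points
        return min(dists)
    if method == "complete":    # farthest pair of points
        return max(dists)
    if method == "average":     # mean over all pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

for method in ("single", "complete", "average"):
    print(method, linkage_distance(cluster_a, cluster_b, method))
```

An agglomerative algorithm would evaluate this distance for every pair of current clusters and merge the pair with the smallest value, which is why the choice of criterion changes which clusters grow together.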
> [!TIP]
> *Use cases in cybersecurity:* Hierarchical clustering can organize events or entities into a tree to reveal relationships. For example, in malware analysis, agglomerative clustering can group samples by behavioral similarity, revealing a hierarchy of malware families and variants. In network security, one might cluster IP traffic flows and use the dendrogram to inspect subgroups of traffic (e.g., by protocol, then by behavior). Because K does not have to be chosen up front, it is useful when exploring new data for which the number of attack categories is unknown.
#### Assumptions and Limitations

Hierarchical clustering does not assume a particular cluster shape and can capture nested clusters. It is useful for discovering taxonomy or relations among groups (e.g., grouping malware by family and subgroup). It is deterministic (no random-initialization issues). A key advantage is the dendrogram, which gives insight into the data's clustering structure at all scales; security analysts can choose an appropriate cutoff to identify meaningful clusters. However, it is computationally expensive (typically $O(n^2)$ time or worse for naive implementations) and not feasible for very large datasets. It is also a greedy procedure: once a merge or split is made it cannot be undone, so an early mistake can lead to suboptimal clusters. Outliers can also affect some linkage strategies (single linkage can cause a "chaining" effect, where clusters are linked through outliers).
<details>
<summary>Example -- Agglomerative Clustering of Events
</summary>

We reuse the synthetic data from the K-Means example (3 normal clusters + 1 attack cluster) and apply agglomerative clustering. We then show how to obtain a dendrogram and cluster labels.
```python
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram
# Perform agglomerative clustering (bottom-up) on the data
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=0, linkage='ward')
# distance_threshold=0 gives the full tree without cutting (we can cut manually)
agg.fit(X)
print(f"Number of merge steps: {agg.n_clusters_ - 1}") # should equal number of points - 1
# Create a dendrogram using SciPy for visualization (optional)
Z = linkage(X, method='ward')
# Normally, you would plot the dendrogram. Here we'll just compute cluster labels for a chosen cut:
clusters_3 = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)
print(f"Labels with 3 clusters: {np.unique(clusters_3)}")
print(f"Cluster sizes for 3 clusters: {np.bincount(clusters_3)}")
```

</details>

### DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN is a density-based clustering algorithm that groups points that are closely packed together and marks points in low-density regions as outliers. It is particularly useful for datasets whose clusters have arbitrary, non-spherical shapes.

DBSCAN is controlled by two parameters:

- **Epsilon (ε)**: The maximum distance between two points for them to be considered part of the same neighborhood.
- **MinPts**: The minimum number of points required to form a dense region (core point).

DBSCAN distinguishes core points, border points, and noise points:

- **Core point**: A point with at least MinPts neighbors within distance ε.
- **Border point**: A point that lies within distance ε of a core point but has fewer than MinPts neighbors.
- **Noise point**: A point that is neither a core point nor a border point.

Clustering proceeds by picking an unvisited core point, marking it as a new cluster, and recursively adding all points that are density-reachable from it (core points, their neighbors, and so on). Border points are added to the cluster of a nearby core point. After all reachable points have been expanded, DBSCAN moves to another unvisited core point to start a new cluster. Points that are not reachable from any core point are labeled noise.
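
The core/border/noise distinction can be made concrete with a short, dependency-free sketch. It only classifies points and does not build the clusters themselves. The function name `classify_points` and the sample coordinates are assumptions for illustration, and the neighborhood count includes the point itself (the convention scikit-learn uses for `min_samples`).

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (DBSCAN definitions)."""
    # Neighborhood of each point: indices within distance eps, including itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighbors):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (10, 10)]
kinds = classify_points(points, eps=1.5, min_pts=4)
for p, kind in zip(points, kinds):
    print(p, kind)
```

The four points of the unit square are dense enough to be core points, the point at (2.2, 1) only touches one of them and becomes a border point, and the far-away point is noise.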
> [!TIP]
> *Use cases in cybersecurity:* DBSCAN is useful for anomaly detection in network traffic. For example, normal user activity may form one or more dense clusters in feature space, while novel attack behavior appears as scattered points that DBSCAN labels as noise (outliers). It has been used to cluster network flow records, where port scans or denial-of-service traffic show up as sparse regions of points. Another application is grouping malware variants: if most samples cluster by family but a few fit nowhere, those few may be zero-day malware. The ability to flag noise lets security teams focus their investigation on those outliers.
#### Assumptions and Limitations

**Assumptions and strengths**: DBSCAN does not assume spherical clusters; it can find clusters of arbitrary shape (even chain-like or adjacent clusters). It determines the number of clusters automatically from the data density and effectively identifies outliers as noise. This makes it powerful for real-world data with irregular shapes and noise. It is robust to outliers (unlike K-Means, it does not force them into clusters). It works well when clusters have roughly uniform density.

**Limitations**: DBSCAN's performance depends on choosing appropriate values of ε and MinPts. It can struggle with data of varying density: a single ε cannot accommodate both dense and sparse clusters. If ε is too small, most points are labeled noise; if it is too large, clusters may be merged incorrectly. DBSCAN can also be inefficient on very large datasets (naively $O(n^2)$, although spatial indexing can help). In high-dimensional feature spaces the notion of "distance within ε" can become less meaningful (the curse of dimensionality), so DBSCAN may need careful parameter tuning or may fail to find intuitive clusters. Extensions such as HDBSCAN address some of these issues (such as varying density).
<details>
<summary>Example -- Clustering with Noise
</summary>

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Seeded generator, created here so the snippet runs on its own
rng = np.random.RandomState(42)

# Generate synthetic data: 2 normal clusters and 5 outlier points
cluster1 = rng.normal(loc=[100, 1000], scale=[5, 100], size=(100, 2))
cluster2 = rng.normal(loc=[120, 2000], scale=[5, 100], size=(100, 2))
outliers = rng.uniform(low=[50, 50], high=[180, 3000], size=(5, 2))  # scattered anomalies
data = np.vstack([cluster1, cluster2, outliers])

# Run DBSCAN with chosen eps and MinPts
eps = 15.0    # radius for neighborhood
min_pts = 5   # minimum neighbors to form a dense region
db = DBSCAN(eps=eps, min_samples=min_pts).fit(data)
labels = db.labels_  # cluster labels (-1 for noise)

# Analyze clusters and noise
num_clusters = len(set(labels) - {-1})
num_noise = np.sum(labels == -1)
print(f"DBSCAN found {num_clusters} clusters and {num_noise} noise points")
print("Cluster labels for first 10 points:", labels[:10])
```

In this snippet, `eps` and `min_samples` are tuned to the scale of the data (15.0 in feature units, and 5 points required to form a dense region). DBSCAN should find 2 clusters (the normal traffic clusters) and flag the 5 injected outliers as noise; the code prints the number of clusters and noise points so this can be checked. Because the features are unscaled, points in the sparse tails of a normal cluster may be labeled noise as well. In a real setting, one would iterate over ε (for example with the k-distance graph heuristic) and MinPts (a common rule of thumb is the data dimensionality + 1) to find a stable clustering. The ability to label noise explicitly helps isolate potential attack data for further analysis.
</details>
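### Principal Component Analysis (PCA)

PCA is a **dimensionality reduction** technique that finds a new set of orthogonal axes (principal components) capturing the maximum variance in the data. In simple terms, PCA rotates and projects the data onto a new coordinate system such that the first principal component (PC1) explains the largest possible variance, the second (PC2) explains the largest variance orthogonal to PC1, and so on. Mathematically, PCA computes the eigenvectors of the data's covariance matrix. These eigenvectors are the principal component directions, and the corresponding eigenvalues give the amount of variance explained by each. PCA is frequently used for feature extraction, visualization, and noise reduction.

This technique is useful when the dimensions of a dataset contain **significant linear dependencies or correlations**.

The steps involved in PCA are:

1. **Standardization**: Center the data by subtracting the mean and scale it to unit variance.
2. **Covariance matrix**: Compute the covariance matrix of the standardized data to understand the relationships between features.
3. **Eigenvalue decomposition**: Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and eigenvectors.
4. **Select principal components**: Sort the eigenvalues in descending order and select the top K eigenvectors corresponding to the largest eigenvalues. These eigenvectors form the new feature space.
5. **Transform the data**: Project the original data onto the new feature space using the selected principal components.

PCA is also widely used as a preprocessing step for other machine learning algorithms, because it reduces the dimensionality of the data while retaining its essential structure.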
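
For two features, the steps above can be followed by hand, because a 2x2 covariance matrix has a closed-form eigen-decomposition. The sketch below is a teaching aid under stated assumptions rather than a general implementation: it only centers the data (scaling to unit variance is skipped because both toy features share a unit), and the function name `pca_2d` and the sample rows are made up. For real data one would use `sklearn.decomposition.PCA`.

```python
import math
import statistics


def pca_2d(rows):
    """PCA for exactly two features via the closed-form 2x2 eigen-decomposition."""
    xs, ys = zip(*rows)
    n = len(rows)
    # 1. Center the data (assumption: no rescaling, both features share a unit).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    # 2. Covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(v * v for v in cx) / (n - 1)
    syy = sum(v * v for v in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # 3. Eigenvalues of a symmetric 2x2 matrix.
    mid = (sxx + syy) / 2
    rad = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mid + rad, mid - rad
    # 4. Eigenvector for the largest eigenvalue (the first principal component).
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # 5. Project the centered data onto PC1 (2-D reduced to 1-D).
    scores = [a * pc1[0] + b * pc1[1] for a, b in zip(cx, cy)]
    return (lam1, lam2), pc1, scores


rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]  # two strongly correlated features
(lam1, lam2), pc1, scores = pca_2d(rows)
explained = lam1 / (lam1 + lam2)
print("eigenvalues:", lam1, lam2)
print("PC1 direction:", pc1)
print(f"variance explained by PC1: {explained:.4f}")
```

Because the two features move almost in lockstep, nearly all of the variance lies along a single diagonal direction, so one component is enough to describe the data.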
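#### Eigenvalues and Eigenvectors

In PCA, each eigenvector of the covariance matrix is a direction in feature space, and its eigenvalue is a scalar giving the amount of variance captured along that direction. The eigenvector with the largest eigenvalue is the direction along which the data varies the most.

Let A be a square matrix and v a nonzero vector such that: `A * v = λ * v`
where:
- A is a square matrix such as [ [1, 2], [2, 1]] (in PCA, A is the covariance matrix)
- v is an eigenvector (e.g., [1, 1])

Then `A * v = [ [1, 2], [2, 1]] * [1, 1] = [3, 3]`, which is the eigenvector v scaled by the eigenvalue λ, so λ = 3.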
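
The arithmetic can be checked directly. The sketch below multiplies the example matrix by two candidate vectors using a small helper (`matvec` is a name chosen here, not a library function). It also shows the matrix's second eigenvector, [1, -1], whose eigenvalue is -1; a genuine covariance matrix is positive semi-definite and could not have a negative eigenvalue, so this matrix is only a numeric illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


A = [[1, 2], [2, 1]]

v1 = [1, 1]
Av1 = matvec(A, v1)
print("A * v1 =", Av1)   # a multiple of v1, so v1 is an eigenvector

v2 = [1, -1]
Av2 = matvec(A, v2)
print("A * v2 =", Av2)   # also a multiple of v2, with a negative factor

# Recover each eigenvalue as the ratio between A*v and v (first component).
lambda1 = Av1[0] / v1[0]
lambda2 = Av2[0] / v2[0]
print("eigenvalues:", lambda1, lambda2)
```

The two eigenvectors are orthogonal (their dot product is zero), which is the property PCA relies on when it builds a new set of perpendicular axes.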
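#### Eigenvalues and Eigenvectors in PCA

Let's explain this with an example. Suppose you have a dataset with many 100x100-pixel grayscale pictures of faces. Each pixel can be treated as a feature, so there are 10,000 features per image (or a vector of 10,000 components per image). To reduce the dimensionality of this dataset with PCA, you would follow these steps:

1. **Standardization**: Center the data by subtracting the mean of each feature (pixel) from the dataset.
2. **Covariance matrix**: Compute the covariance matrix of the standardized data, which captures how features (pixels) vary together.
   - The covariance between two variables (pixels, in this case) indicates how much they change together. The idea is to find out which pixels tend to increase or decrease together in a linear relationship.
   - For example, if pixel 1 and pixel 2 tend to increase together, the covariance between them is positive.
   - The covariance matrix is a 10,000x10,000 matrix in which each entry is the covariance between two pixels.
3. **Solve the eigenvalue equation**: The eigenvalue equation to solve is `C * v = λ * v`, where C is the covariance matrix, v is an eigenvector, and λ is an eigenvalue. It can be solved with methods such as:
   - **Eigenvalue decomposition**: Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvalues and eigenvectors.
   - **Singular Value Decomposition (SVD)**: Alternatively, use SVD to decompose the data matrix into singular values and vectors, which also yields the principal components.
4. **Select principal components**: Sort the eigenvalues in descending order and select the top K eigenvectors corresponding to the largest eigenvalues. These eigenvectors represent the directions of maximum variance in the data.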
> [!TIP]
> *Use cases in cybersecurity:* A common use of PCA in security is feature reduction for anomaly detection. For example, an intrusion detection system with 40+ network metrics (such as the NSL-KDD features) can use PCA to reduce them to a handful of components, summarizing the data for visualization or for feeding into clustering algorithms. Analysts may plot network traffic in the space of the first two principal components to see whether attacks separate from normal traffic. PCA can also help eliminate redundant features (such as bytes sent and bytes received when they are correlated), making detection algorithms more robust and faster.
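#### Assumptions and Limitations

PCA assumes that **the principal axes of variance are meaningful**. It is a linear method, so it captures linear correlations in the data. It is unsupervised, since it uses only the feature covariance. Advantages of PCA include noise reduction (small-variance components often correspond to noise) and decorrelation of features. It is computationally efficient for moderately high dimensions and is often a useful preprocessing step for other algorithms (to mitigate the curse of dimensionality). One limitation is that PCA is restricted to linear relationships: it will not capture complex nonlinear structure (whereas autoencoders or t-SNE might). PCA components can also be hard to interpret in terms of the original features, because each is a combination of them. In cybersecurity, some caution is needed: an attack that causes only a subtle change in a low-variance feature may not show up in the top principal components (PCA prioritizes variance, not necessarily "interestingness").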
<details>
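<summary>Example -- Reducing Dimensions of Network Data
</summary>

Suppose we have network connection logs with multiple features (e.g., duration, bytes, counts). We generate a synthetic 4-dimensional dataset with some correlated features and use PCA to reduce it to 2 dimensions for visualization or further analysis.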
```python
from sklearn.decomposition import PCA
# Create synthetic 4D data (3 clusters similar to before, but add correlated features)
# Base features: duration, bytes (as before)
base_data = np.vstack([normal1, normal2, normal3]) # 1500 points from earlier normal clusters
# Add two more features correlated with existing ones, e.g. packets = bytes/50 + noise, errors = duration/10 + noise
packets = base_data[:, 1] / 50 + rng.normal(scale=0.5, size=len(base_data))
errors = base_data[:, 0] / 10 + rng.normal(scale=0.5, size=len(base_data))
data_4d = np.column_stack([base_data[:, 0], base_data[:, 1], packets, errors])
# Apply PCA to reduce 4D data to 2D
pca = PCA(n_components=2)
data_2d = pca.fit_transform(data_4d)
print("Explained variance ratio of 2 components:", pca.explained_variance_ratio_)
print("Original shape:", data_4d.shape, "Reduced shape:", data_2d.shape)
# We can examine a few transformed points
print("First 5 data points in PCA space:\n", data_2d[:5])
```

Here we take the normal-traffic clusters from earlier and extend each data point with two additional features (packets and errors) that correlate with bytes and duration. PCA then compresses the 4 features into 2 principal components. We print the explained variance ratio, which may show that, for example, more than 95% of the variance is captured by the 2 components (meaning little information is lost). The output also shows the data shape shrinking from (1500, 4) to (1500, 2), and the first few points in PCA space are printed as an example. In practice, one can plot `data_2d` to check visually whether the clusters are distinguishable. If an anomaly is present, it may appear as a point lying away from the main clusters in PCA space. PCA thus helps distill complex data into a form that is manageable for human interpretation or as input to other algorithms.

</details>
### Gaussian Mixture Models (GMM)

A Gaussian Mixture Model assumes that the data is generated from a mixture of **several Gaussian (normal) distributions with unknown parameters**. In essence, it is a probabilistic clustering model: it tries to softly assign each point to one of K Gaussian components. Each Gaussian component k has a mean vector (μ_k), a covariance matrix (Σ_k), and a mixing weight (π_k) that represents how prevalent that cluster is. Unlike K-Means, which makes "hard" assignments, GMM gives each point a probability of belonging to each cluster.

GMM fitting is typically done with the Expectation-Maximization (EM) algorithm:

- **Initialization**: Start with initial guesses for the means, covariances, and mixing coefficients (or use K-Means results as a starting point).
- **E-step (Expectation)**: Given the current parameters, compute the responsibility of each cluster for each point: essentially `r_nk = P(z_k | x_n)`, where z_k is the latent variable indicating cluster membership of point x_n. This uses Bayes' theorem to compute the posterior probability of each point belonging to each cluster under the current parameters. The responsibilities are computed as:

$$
r_{nk} = \frac{\pi_k \mathcal{N}(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x_n | \mu_j, \Sigma_j)}
$$

  where:
  - $\pi_k$ is the mixing coefficient of cluster k (the prior probability of cluster k),
  - $\mathcal{N}(x_n | \mu_k, \Sigma_k)$ is the Gaussian probability density of point $x_n$ given mean $\mu_k$ and covariance $\Sigma_k$.

- **M-step (Maximization)**: Update the parameters using the responsibilities computed in the E-step:
  - Update each mean μ_k as the weighted average of the points, where the weights are the responsibilities.
  - Update each covariance Σ_k as the weighted covariance of the points assigned to cluster k.
  - Update each mixing coefficient π_k as the average responsibility for cluster k.
- **Iterate** the E and M steps until convergence (the parameters stabilize or the likelihood improvement falls below a threshold).

The result is a set of Gaussian distributions that collectively model the overall data distribution. The fitted GMM can be used to cluster by assigning each point to the Gaussian with the highest probability, or the probabilities can be kept to express uncertainty. One can also evaluate the likelihood of new points to see whether they fit the model (useful for anomaly detection).
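
The E and M steps can be written out for the one-dimensional case, where each covariance matrix reduces to a scalar variance. This is a minimal sketch for intuition, not the scikit-learn implementation: the function names, the starting parameters, the fixed 20 iterations, and the small variance floor (a simple form of the regularization mentioned later) are all assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, weights, means, variances):
    """Responsibilities r_nk = pi_k N(x_n | mu_k, var_k) / sum_j pi_j N(x_n | mu_j, var_j)."""
    resp = []
    for x in xs:
        joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp


def m_step(xs, resp, var_floor=1e-6):
    """Re-estimate weights, means and variances from the responsibilities."""
    k = len(resp[0])
    n_k = [sum(r[c] for r in resp) for c in range(k)]
    means = [sum(r[c] * x for r, x in zip(resp, xs)) / n_k[c] for c in range(k)]
    variances = [
        max(sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / n_k[c], var_floor)
        for c in range(k)
    ]
    weights = [n / len(xs) for n in n_k]
    return weights, means, variances


xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]                       # two obvious 1-D groups
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]  # rough initial guess
for _ in range(20):
    resp = e_step(xs, weights, means, variances)
    weights, means, variances = m_step(xs, resp)

print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

Each row of `resp` sums to 1, which is what makes the assignment "soft": a point halfway between two components would split its responsibility between them instead of being forced into one.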
> [!TIP]
> *Use cases in cybersecurity:* GMM can be used for anomaly detection by modeling the distribution of normal data: any point with very low probability under the learned mixture is flagged as an anomaly. For example, you could train a GMM on features of legitimate network traffic; an attack connection that resembles none of the learned clusters would have a low likelihood. GMMs are also used to cluster activities where clusters may have different shapes. For example, when grouping users by behavior profile, the features of each profile may be roughly Gaussian but each with its own variance structure. As another scenario, in phishing detection, legitimate email features might form one Gaussian cluster and known phishing another, while a new phishing campaign might show up either as a separate Gaussian or as low-likelihood points relative to the existing mixture.
#### Assumptions and Limitations

GMM is a generalization of K-Means that incorporates covariance, so clusters can be ellipsoidal (not just spherical). It handles clusters of different sizes and shapes if the covariance is full. Soft clustering is an advantage when cluster boundaries are fuzzy: in cybersecurity, for example, an event may have traits of multiple attack types, and GMM can reflect that uncertainty with probabilities. GMM also provides a probability density estimate of the data, which is useful for detecting outliers (points with low likelihood under all mixture components).

On the downside, GMM requires specifying the number of components K (although criteria such as BIC/AIC can be used to select it). EM can sometimes converge slowly or to a local optimum, so initialization matters (EM is often run several times). If the data does not actually follow a mixture of Gaussians, the model may fit poorly. There is also a risk of one Gaussian shrinking to cover just an outlier (although regularization or a minimum covariance bound can mitigate that).
<details>
<summary>Example -- Soft Clustering & Anomaly Scores
</summary>

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM with 3 components to the normal traffic data
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(base_data)  # using the 1500 normal data points from PCA example

# Print the learned Gaussian parameters
print("GMM means:\n", gmm.means_)
print("GMM covariance matrices:\n", gmm.covariances_)

# Take a sample attack-like point and evaluate it
sample_attack = np.array([[200, 800]])  # an outlier similar to earlier attack cluster
probs = gmm.predict_proba(sample_attack)
log_likelihood = gmm.score_samples(sample_attack)
print("Cluster membership probabilities for sample attack:", probs)
print("Log-likelihood of sample attack under GMM:", log_likelihood)
```

In this code we train a GMM with 3 Gaussians on the normal traffic (assuming we know there are 3 profiles of legitimate traffic). The printed means and covariances describe these clusters (for instance, one mean might be around [50,500], corresponding to the center of one cluster). We then test a suspicious connection [duration=200, bytes=800]. `predict_proba` gives the posterior probability of this point belonging to each of the 3 clusters. These probabilities always sum to 1, so for a point far from every cluster they are typically heavily skewed toward whichever component is least unlikely rather than all being small. The overall `score_samples` value (log-likelihood) is the better anomaly signal: a very low value indicates that the point does not fit the model well, flagging it as an anomaly. In practice, one could set a threshold on the log-likelihood to decide whether a point is unlikely enough to be treated as suspicious. GMM thus provides a principled way to do anomaly detection and also yields soft clusters that acknowledge uncertainty.

</details>
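### Isolation Forest

**Isolation Forest** is an ensemble anomaly detection algorithm based on the idea of randomly isolating points. The principle is that anomalies are few and different, so they are easier to isolate than normal points. An Isolation Forest builds many binary isolation trees (random decision trees) that partition the data randomly. At each node of a tree, a random feature is selected and a random split value is chosen between the minimum and maximum of that feature. The split divides the data into two branches. The tree is grown until every point is isolated in its own leaf or a maximum tree height is reached.

Anomaly detection is performed by observing the path length of each point in these random trees, i.e., the number of splits required to isolate the point. Intuitively, anomalies (outliers) tend to be isolated sooner, because a random split is more likely to separate an outlier (which lies in a sparse region) than a normal point in a dense cluster. The Isolation Forest computes an anomaly score from the average path length over all trees: the shorter the average path, the more anomalous the point. Scores are usually normalized to [0,1], where 1 means very anomalous.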
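
The normalization from path length to a [0,1] score can be sketched as follows. This uses the scoring formula from the original Isolation Forest paper (Liu, Ting and Zhou, 2008), `s = 2^(-E[h(x)] / c(n))`, where `c(n)` is the average path length of an unsuccessful binary-search-tree lookup over `n` points. The function names and the sample path lengths are illustrative assumptions. Note that scikit-learn's `decision_function` uses the opposite sign convention (higher means more normal).

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Score in (0, 1]: close to 1 is anomalous, around 0.5 or below is unremarkable."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


n = 256  # sub-sample size used to build each tree (assumed for this demo)
c_n = average_path_length(n)
short_path = anomaly_score(4.0, n)    # isolated after only a few splits
typical_path = anomaly_score(c_n, n)  # isolated after an average number of splits
long_path = anomaly_score(12.0, n)    # buried deep inside a dense region

print(f"c({n}) = {c_n:.3f}")
print(f"short path:   {short_path:.3f}")
print(f"typical path: {typical_path:.3f}")
print(f"long path:    {long_path:.3f}")
```

A point whose average path equals `c(n)` scores exactly 0.5, so scores well above 0.5 are the ones worth investigating.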
> [!TIP]
> *Use cases in cybersecurity:* Isolation Forests have been used successfully in intrusion detection and fraud detection. For example, train an Isolation Forest on network traffic logs that mostly contain normal behavior; the forest will produce short paths for odd traffic (such as an IP using an unheard-of port or an unusual packet-size pattern), flagging it for inspection. Because it does not require labeled attacks, it is suitable for detecting unknown attack types. It can also be deployed on user login data to detect account takeovers (anomalous login times or locations are isolated quickly). In one use case, an Isolation Forest might protect an enterprise by monitoring system metrics and raising an alert when a combination of metrics (CPU, network, file changes) looks very different from historical patterns (short isolation paths).
#### Assumptions and Limitations

**Advantages**: Isolation Forest requires no distributional assumption; it targets isolation directly. It is efficient on high-dimensional data and large datasets (building the forest is near-linear, roughly $O(n\log n)$), since each tree isolates points using only a subset of the features and splits. It tends to handle numerical features well and can be faster than distance-based methods, which may be $O(n^2)$. It also provides an anomaly score automatically, so you can set a threshold for alerts (or use the contamination parameter to determine the cutoff automatically from an expected anomaly fraction).

**Limitations**: Because of its random nature, results can vary slightly between runs (although with enough trees this effect is minor). If the data has many irrelevant features, or if anomalies do not differ strongly in any feature, isolation may be less effective (random splits can isolate normal points by chance, although averaging over many trees mitigates this). Isolation Forest also generally assumes that anomalies are a small minority (which is usually true in cybersecurity scenarios).
<details>
<summary>Example -- Detecting Outliers in Network Logs
</summary>

We use a test dataset that contains normal points and some attack points, and run an Isolation Forest to see whether it can separate the attacks. For demonstration, we assume that about 15% of the data is expected to be anomalous.
```python
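import numpy as np
from sklearn.ensemble import IsolationForest

# The original test_data comes from a separate autoencoder example that is not part of this section.
# Stand-in (an assumption for this demo): 100 normal and 20 attack points, giving a 120 x 2 array.
rng_if = np.random.RandomState(1)
test_data = np.vstack([
    rng_if.normal(loc=[50, 500], scale=[10, 100], size=(100, 2)),  # normal traffic
    rng_if.normal(loc=[200, 800], scale=[5, 50], size=(20, 2)),    # attack-like traffic
])
X_test_if = test_data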
# Train Isolation Forest (unsupervised) on the test set itself for demo (in practice train on known normal)
iso_forest = IsolationForest(n_estimators=100, contamination=0.15, random_state=0)
iso_forest.fit(X_test_if)
# Predict anomalies (-1 for anomaly, 1 for normal)
preds = iso_forest.predict(X_test_if)
anomaly_scores = iso_forest.decision_function(X_test_if) # the higher, the more normal
print("Isolation Forest predicted labels (first 20):", preds[:20])
print("Number of anomalies detected:", np.sum(preds == -1))
print("Example anomaly scores (lower means more anomalous):", anomaly_scores[:5])
```

In this code, we instantiate `IsolationForest` with 100 trees and set `contamination=0.15` (meaning we expect about 15% anomalies; the model sets its score threshold so that about 15% of points are flagged). We fit it on `X_test_if`, which contains a mix of normal and attack points (note: normally you would fit on training data and then predict on new data; here we fit and predict on the same set to observe the results directly).

The output shows the predicted labels for the first 20 points (where -1 indicates an anomaly). We also print the total number of anomalies detected and some example anomaly scores. We would expect roughly 18 of the 120 points to be labeled -1 (since contamination was 15%). If our 20 attack samples are truly the most outlying points, most of them should appear among those -1 predictions. The anomaly score (Isolation Forest's decision function) is higher for normal points and lower (more negative) for anomalies; we print a few values to see the separation. In practice, one might sort the data by score to see the top outliers and investigate them. Isolation Forest thus provides an efficient way to sift through large volumes of unlabeled security data and pick out the most irregular instances for human analysis or further automated scrutiny.

</details>
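### t-SNE (t-Distributed Stochastic Neighbor Embedding)

**t-SNE** is a nonlinear dimensionality reduction technique designed specifically for visualizing high-dimensional data in 2 or 3 dimensions. It converts similarities between data points into joint probability distributions and tries to preserve the structure of local neighborhoods in the lower-dimensional projection. In simpler terms, t-SNE places points in (say) 2D such that points that are similar in the original space end up close together and dissimilar points end up far apart with high probability.

The algorithm has three main stages:

1. **Compute pairwise affinities in high-dimensional space**: For each pair of points, t-SNE computes the probability of picking that pair as neighbors (this is done by centering a Gaussian on each point and measuring distances; the perplexity parameter influences the effective number of neighbors considered).
2. **Compute pairwise affinities in low-dimensional (e.g., 2D) space**: Initially, points are placed randomly in 2D. t-SNE defines a similar probability for distances in this map (using a Student's t-distribution kernel, which has heavier tails than a Gaussian, to give distant points more freedom).
3. **Gradient descent**: t-SNE then iteratively moves the points in 2D to minimize the Kullback-Leibler (KL) divergence between the high-dimensional affinity distribution and the low-dimensional one. This makes the 2D arrangement reflect the high-dimensional structure as closely as possible: points that were close in the original space attract each other, and points that were far apart repel, until a balance is found.

The result is often a visually meaningful scatter plot in which clusters in the data become apparent.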
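
The objective that the gradient-descent stage minimizes can be evaluated by hand for a tiny case. The sketch below computes the low-dimensional Student-t affinities, `q_ij ∝ (1 + ||y_i - y_j||^2)^-1`, for two candidate 2D layouts and compares their KL divergence against a target affinity matrix `P`. The matrix `P` here is a hypothetical stand-in for the Gaussian, perplexity-calibrated affinities that t-SNE would compute from the high-dimensional data, and the function names are chosen for this example.

```python
import math


def student_t_affinities(layout):
    """Low-dimensional affinities q_ij, normalized over all pairs i != j."""
    n = len(layout)
    raw = [
        [0.0 if i == j else 1.0 / (1.0 + math.dist(layout[i], layout[j]) ** 2)
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def kl_divergence(p, q):
    """KL(P || Q) summed over all pairs with p_ij > 0."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0
    )


# Hypothetical high-dimensional affinities: points 0 and 1 are close neighbors, point 2 is far.
P = [
    [0.00, 0.40, 0.05],
    [0.40, 0.00, 0.05],
    [0.05, 0.05, 0.00],
]

good_layout = [(0.0, 0.0), (0.5, 0.0), (5.0, 0.0)]  # keeps 0 and 1 together
bad_layout = [(0.0, 0.0), (5.0, 0.0), (0.5, 0.0)]   # separates 0 and 1

kl_good = kl_divergence(P, student_t_affinities(good_layout))
kl_bad = kl_divergence(P, student_t_affinities(bad_layout))
print(f"KL for layout preserving neighbors: {kl_good:.4f}")
print(f"KL for layout breaking neighbors:   {kl_bad:.4f}")
```

The layout that keeps the true neighbors together has a much lower divergence, and this is the quantity that gradient descent drives down by moving the 2D points.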
> [!TIP]
> *Use cases in cybersecurity:* t-SNE is often used to **visualize high-dimensional security data for human analysis**. For example, in a security operations center, analysts could take an event dataset with dozens of features (port numbers, frequencies, byte counts, etc.) and use t-SNE to produce a 2D plot. In that plot, attacks may form their own clusters or separate from normal data, making them easier to identify. It has been applied to malware datasets to see groupings of malware families, and to network intrusion data where different attack types cluster distinctly, guiding further investigation. Essentially, t-SNE provides a way to see structure in cyber data.
#### Assumptions and Limitations

t-SNE is great for visual discovery of patterns. It can reveal clusters, subclusters, and outliers that linear methods such as PCA might miss. It has been used in cybersecurity research to visualize complex data such as malware behavior profiles or network traffic patterns. Because it preserves local structure, it is good at showing natural groupings.

However, t-SNE is computationally heavy (approximately $O(n^2)$), so very large datasets may require sampling. It also has hyperparameters (perplexity, learning rate, number of iterations) that affect the output; for example, different perplexity values may reveal clusters at different scales. t-SNE plots can sometimes be misinterpreted: distances in the map are not directly meaningful globally (the method focuses on local neighborhoods, and clusters can sometimes appear artificially well separated). t-SNE is also mainly a visualization tool: it does not provide a straightforward way to project new data points without recomputing, and it is not intended as preprocessing for predictive modeling (UMAP is an alternative that addresses some of these issues and is faster).
<details>
<summary>Example -- Visualizing Network Connections
</summary>

We use t-SNE to reduce a multi-feature dataset to 2D. For illustration, we take the earlier 4D data (which had 3 natural clusters of normal traffic) and add a few anomaly points. We then run t-SNE and visualize the result.

```python
# 1. Create synthetic 4-D dataset
#    - Three clusters of "normal" traffic (duration, bytes)
#    - Two correlated features: packets & errors
#    - Five outlier points to simulate suspicious traffic
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Base (duration, bytes) clusters
normal1 = rng.normal(loc=[50, 500], scale=[10, 100], size=(500, 2))
normal2 = rng.normal(loc=[60, 1500], scale=[8, 200], size=(500, 2))
normal3 = rng.normal(loc=[70, 3000], scale=[5, 300], size=(500, 2))
base_data = np.vstack([normal1, normal2, normal3])  # (1500, 2)

# Correlated features
packets = base_data[:, 1] / 50 + rng.normal(scale=0.5, size=len(base_data))
errors = base_data[:, 0] / 10 + rng.normal(scale=0.5, size=len(base_data))
data_4d = np.column_stack([base_data, packets, errors])  # (1500, 4)

# Outlier / attack points
outliers_4d = np.column_stack([
    rng.normal(250, 1, size=5),    # extreme duration
    rng.normal(1000, 1, size=5),   # moderate bytes
    rng.normal(5, 1, size=5),      # very low packets
    rng.normal(25, 1, size=5),     # high errors
])
data_viz = np.vstack([data_4d, outliers_4d])  # (1505, 4)

# 2. Standardize features (recommended for t-SNE)
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_viz)

# 3. Run t-SNE to project 4-D -> 2-D
tsne = TSNE(
    n_components=2,
    perplexity=30,
    learning_rate='auto',
    init='pca',
    random_state=0
)
data_2d = tsne.fit_transform(data_scaled)
print("t-SNE output shape:", data_2d.shape)  # (1505, 2)

# 4. Visualize: normal traffic vs. outliers
plt.figure(figsize=(8, 6))
plt.scatter(
    data_2d[:-5, 0], data_2d[:-5, 1],
    label="Normal traffic",
    alpha=0.6,
    s=10
)
plt.scatter(
    data_2d[-5:, 0], data_2d[-5:, 1],
    label="Outliers / attacks",
    alpha=0.9,
    s=40,
    marker="X",
    edgecolor='k'
)
plt.title("t-SNE Projection of Synthetic Network Traffic")
plt.xlabel("t-SNE component 1")
plt.ylabel("t-SNE component 2")
plt.legend()
plt.tight_layout()
plt.show()
```

Here we combine the earlier 4D normal dataset with a handful of extreme outliers (the outliers have one feature, "duration", set very high, along with other unusual values, to simulate an odd pattern). We run t-SNE with a typical perplexity of 30. The output `data_2d` has shape (1505, 2). In the resulting plot we would expect to see roughly three tight clusters corresponding to the 3 normal clusters, with the 5 outliers appearing as isolated points far from them. In an interactive workflow, the points could be colored by label (normal or which cluster, versus anomaly) to verify this structure. Even without labels, an analyst might notice those 5 points sitting in empty space on the 2D plot and flag them. This shows how t-SNE can be a powerful aid to visual anomaly detection and cluster inspection in cybersecurity data, complementing the automated algorithms above.

</details>
HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)
HDBSCAN is an extension of DBSCAN that removes the need to choose a single global eps value. It recovers clusters of differing density by building a hierarchy of density-connected components and then condensing it. Compared with vanilla DBSCAN, it usually:
- Extracts more intuitive clusters when some clusters are dense and others are sparse.
- Has only one real hyperparameter (min_cluster_size), which comes with a sensible default.
- Gives every point a cluster-membership probability and an outlier score (outlier_scores_), which is very handy for threat-hunting dashboards.
Tip
Use cases in cybersecurity: HDBSCAN is popular in modern threat-hunting pipelines and often appears in the notebook-based hunting playbooks shipped with commercial XDR suites. One practical recipe is to cluster HTTP beaconing traffic during incident response (IR): user agent, request interval, and URI length often form several tight groups of legitimate software updaters, while C2 beacons remain as tiny low-density clusters or as pure noise.
<details>
<summary>Example: Finding beaconing C2 channels</summary>

```python
import pandas as pd
from hdbscan import HDBSCAN
from sklearn.preprocessing import StandardScaler

# df is assumed to be a pandas DataFrame of features extracted from proxy logs
features = [
    "avg_interval",        # seconds between requests
    "uri_length_mean",     # average URI length
    "user_agent_entropy",  # Shannon entropy of UA string
]
X = StandardScaler().fit_transform(df[features])

hdb = HDBSCAN(
    min_cluster_size=15,   # at least 15 similar beacons to be a group
    metric="euclidean",
    prediction_data=True,
)
labels = hdb.fit_predict(X)

df["cluster"] = labels
# Anything with label == -1 is noise: inspect as potential C2
suspects = df[df["cluster"] == -1]
print("Suspect beacon count:", len(suspects))
```

</details>
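The example above assumes that `df` already holds the three features. As a rough sketch of how those columns might be derived from raw proxy records, the snippet below uses only the standard library. The record layout (a list of timestamp, URI, and User-Agent tuples for one client) and the helper names `shannon_entropy` and `beacon_features` are hypothetical; real proxy logs will need their own parsing.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def beacon_features(requests):
    """Summarise one client's requests.

    `requests` is a non-empty list of (unix_timestamp, uri, user_agent)
    tuples; this layout is an assumption made for illustration.
    """
    times = sorted(t for t, _, _ in requests)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        # Mean number of seconds between consecutive requests
        "avg_interval": sum(gaps) / len(gaps) if gaps else 0.0,
        # Mean URI length in characters
        "uri_length_mean": sum(len(u) for _, u, _ in requests) / len(requests),
        # Mean entropy of the User-Agent strings seen for this client
        "user_agent_entropy": sum(shannon_entropy(ua) for _, _, ua in requests) / len(requests),
    }

# Hypothetical client that calls home every 60 seconds.
sample = [(1000 + 60 * i, "/api/v1/ping", "Mozilla/5.0") for i in range(5)]
features = beacon_features(sample)
print(features)
```

One such dictionary per client could then be collected into a DataFrame and passed to the scaler and clusterer.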
---
### Robustness and Security Considerations: Poisoning and Adversarial Attacks (2023-2025)
Recent work has shown that **unsupervised learners are *not* immune to active attackers**:
* **Data poisoning against anomaly detectors.** Chen *et al.* (IEEE S&P 2024) showed that adding as little as 3% crafted traffic can shift the decision boundary of Isolation Forest and ECOD so that real attacks look normal. The authors released an open-source PoC (`udo-poison`) that automatically synthesises poison points.
* **Backdooring clustering models.** The *BadCME* technique (BlackHat EU 2023) implants a tiny trigger pattern; whenever that trigger appears, a K-Means-based detector quietly places the event inside a "benign" cluster.
* **Evasion of DBSCAN/HDBSCAN.** A 2025 academic preprint from KU Leuven showed that an attacker can craft beaconing patterns that deliberately fall into density gaps, effectively hiding under the *noise* label.
泚ç®ãéããŠããç·©åçïŒ
1. **ã¢ãã«ã®ãµãã¿ã€ãº / TRIMã** ãã¹ãŠã®åãã¬ãŒãã³ã°ãšããã¯ã®åã«ã1â2%ã®æé«æå€±ãã€ã³ãïŒããªã ãããæå€§å°€åºŠïŒãç Žæ£ããæ¯ç©æ£åžãåçã«å°é£ã«ããŸãã
2. **ã³ã³ã»ã³ãµã¹ã¢ã³ãµã³ãã«ã** è€æ°ã®ç°ç𮿀åºåšïŒäŸïŒIsolation Forest + GMM + ECODïŒãçµã¿åããã*ãããã*ã®ã¢ãã«ããã€ã³ãããã©ã°ä»ãããå Žåã«èŠåãçºããŸããç ç©¶ã«ãããšãããã«ããæ»æè
ã®ã³ã¹ãã10å以äžå¢å ããŸãã
3. **ã¯ã©ã¹ã¿ãªã³ã°ã®ããã®è·é¢ããŒã¹ã®é²åŸ¡ã** `k` ç°ãªãã©ã³ãã ã·ãŒãã§ã¯ã©ã¹ã¿ãåèšç®ããåžžã«ã¯ã©ã¹ã¿ãç§»åãããã€ã³ããç¡èŠããŸãã
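As a rough illustration of the first mitigation, the sketch below applies the trimming idea to a deliberately simple one-dimensional Gaussian model: each round refits the mean and standard deviation on the points currently kept, scores every point by its loss, and keeps only the lowest-loss points. The function name `trimmed_gaussian_fit`, the 2% trim fraction, the number of rounds, and the synthetic data are assumptions for illustration; this is not the procedure from any specific paper.

```python
import math
import random
import statistics

def trimmed_gaussian_fit(values, trim_frac=0.02, n_rounds=5):
    """Fit mean/std while repeatedly discarding the highest-loss points."""
    n_keep = len(values) - math.ceil(trim_frac * len(values))
    kept = list(range(len(values)))
    for _ in range(n_rounds):
        sample = [values[i] for i in kept]
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample) or 1e-9
        # Squared z-score: matches the Gaussian negative log-likelihood up to
        # scale and an additive constant, so it ranks points identically.
        loss = [((v - mu) / sigma) ** 2 for v in values]
        # Re-select from ALL points, so a wrongly dropped point can return.
        kept = sorted(range(len(values)), key=loss.__getitem__)[:n_keep]
    sample = [values[i] for i in kept]
    return statistics.fmean(sample), statistics.pstdev(sample), set(kept)

rng = random.Random(42)
clean = [rng.gauss(0.0, 1.0) for _ in range(980)]
poison = [8.0] * 20          # 2% crafted points pulling the mean upward
values = clean + poison

naive_mu = statistics.fmean(values)
mu, sigma, kept = trimmed_gaussian_fit(values)
print(f"naive mean={naive_mu:.3f}, trimmed mean={mu:.3f}, trimmed std={sigma:.3f}")
```

Trimming only helps while the poisoned fraction stays at or below the trim fraction; a larger injection would survive the cut, which is why the text above speaks of making poisoning harder rather than impossible.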
---
### Modern Open-Source Tooling (2024-2025)
* **PyOD 2.x** (released May 2024) added *ECOD*, *COPOD*, and GPU-accelerated *AutoFormer* detectors. It also ships a `benchmark` subcommand that compares 30+ algorithms on your dataset with **one line of code**:
```bash
pyod benchmark --input logs.csv --label attack --n_jobs 8
```
- Anomalib v1.5 (February 2025) focuses on vision but also includes a generic PatchCore implementation, which is handy for screenshot-based phishing page detection.
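- scikit-learn ships its own HDBSCAN implementation as `sklearn.cluster.HDBSCAN` (added in version 1.3), so the external contrib package is no longer required for basic clustering. The estimator exposes `labels_` and `probabilities_`; it does not provide a `score_samples` method, so per-point outlier scores (`outlier_scores_`) still come from the contrib `hdbscan` package.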
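<details>
<summary>Quick PyOD example: ECOD + Isolation Forest ensemble</summary>

```python
from pyod.models.ecod import ECOD
from pyod.models.iforest import IForest
from pyod.utils.data import generate_data, evaluate_print

# generate_data returns (X_train, X_test, y_train, y_test) by default
X_train, X_test, y_train, y_test = generate_data(
    n_train=5000, n_test=1000, n_features=16,
    contamination=0.02, random_state=42)

models = [ECOD(), IForest(random_state=42)]

def standardized_scores(model):
    """Fit on the training set and z-score the test scores so detectors are comparable."""
    model.fit(X_train)
    train = model.decision_scores_
    return (model.decision_function(X_test) - train.mean()) / train.std()

# Average the standardized scores (higher = more anomalous); this is score averaging, not a vote
anomaly_scores = sum(standardized_scores(m) for m in models) / len(models)

evaluate_print("Ensemble", y_test, anomaly_scores)
```

</details>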
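The snippet above averages the scores of two detectors, which behaves differently from an "alert if any model flags a point" rule. The dependency-free sketch below contrasts the two combination strategies on made-up scores; the score values, the z-score normalisation, the threshold of 2.0, and the helper names are assumptions chosen only to make the difference visible.

```python
import statistics

def zscores(scores):
    """Standardise one detector's scores so detectors are comparable."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores) or 1.0
    return [(s - mu) / sigma for s in scores]

def average_rule(score_lists, threshold=2.0):
    """Flag a point when the mean of its standardised scores exceeds the threshold."""
    z = [zscores(s) for s in score_lists]
    return [statistics.fmean(col) > threshold for col in zip(*z)]

def any_rule(score_lists, threshold=2.0):
    """Flag a point when at least one detector's standardised score exceeds the threshold."""
    z = [zscores(s) for s in score_lists]
    return [max(col) > threshold for col in zip(*z)]

# Made-up scores on different scales for 10 events.
detector_a = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.1, 5.0]  # only A dislikes event 9
detector_b = [10, 12, 11, 10, 95, 12, 11, 10, 12, 11]            # only B dislikes event 4

print("average:", average_rule([detector_a, detector_b]))
print("any:    ", any_rule([detector_a, detector_b]))
```

With these numbers the any-rule flags events 4 and 9, each disliked by a single detector, while averaging dilutes both below the threshold. The any-rule is harder to evade but produces more false positives, so the choice depends on alert budget.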
## References
- [HDBSCAN: Hierarchical density-based clustering](https://github.com/scikit-learn-contrib/hdbscan)
- Chen, X. *et al.* "On the Vulnerability of Unsupervised Anomaly Detection to Data Poisoning." *IEEE Symposium on Security and Privacy*, 2024.


