Deep Learning

Tip

Learn & practice AWS Hacking: HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Azure Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Deep Learning

๋”ฅ ๋Ÿฌ๋‹์€ ์—ฌ๋Ÿฌ ์ธต(๋”ฅ ์‹ ๊ฒฝ๋ง)์„ ๊ฐ€์ง„ ์‹ ๊ฒฝ๋ง์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ์˜ ๋ณต์žกํ•œ ํŒจํ„ด์„ ๋ชจ๋ธ๋งํ•˜๋Š” ๋จธ์‹  ๋Ÿฌ๋‹์˜ ํ•˜์œ„ ์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค. ์ปดํ“จํ„ฐ ๋น„์ „, ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ ๋ฐ ์Œ์„ฑ ์ธ์‹ ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ๋†€๋ผ์šด ์„ฑ๊ณต์„ ๊ฑฐ๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

Neural Networks

์‹ ๊ฒฝ๋ง์€ ๋”ฅ ๋Ÿฌ๋‹์˜ ๊ธฐ๋ณธ ๊ตฌ์„ฑ ์š”์†Œ์ž…๋‹ˆ๋‹ค. ์ด๋“ค์€ ์ธต์œผ๋กœ ๊ตฌ์„ฑ๋œ ์ƒํ˜ธ ์—ฐ๊ฒฐ๋œ ๋…ธ๋“œ(๋‰ด๋Ÿฐ)๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ฐ ๋‰ด๋Ÿฐ์€ ์ž…๋ ฅ์„ ๋ฐ›๊ณ , ๊ฐ€์ค‘์น˜ ํ•ฉ์„ ์ ์šฉํ•œ ํ›„, ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅ์œผ๋กœ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. ์ธต์€ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

  • Input Layer: ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ›๋Š” ์ฒซ ๋ฒˆ์งธ ์ธต.
  • Hidden Layers: ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์— ๋ณ€ํ™˜์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ค‘๊ฐ„ ์ธต. ์ˆจ๊ฒจ์ง„ ์ธต๊ณผ ๊ฐ ์ธต์˜ ๋‰ด๋Ÿฐ ์ˆ˜๋Š” ๋‹ค์–‘ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ์•„ํ‚คํ…์ฒ˜๋กœ ์ด์–ด์ง‘๋‹ˆ๋‹ค.
  • Output Layer: ๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•˜๋Š” ์ตœ์ข… ์ธต์œผ๋กœ, ๋ถ„๋ฅ˜ ์ž‘์—…์—์„œ ํด๋ž˜์Šค ํ™•๋ฅ ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

Activation Functions

๋‰ด๋Ÿฐ์˜ ์ธต์ด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•  ๋•Œ, ๊ฐ ๋‰ด๋Ÿฐ์€ ์ž…๋ ฅ์— ๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค(z = w * x + b), ์—ฌ๊ธฐ์„œ w๋Š” ๊ฐ€์ค‘์น˜, x๋Š” ์ž…๋ ฅ, b๋Š” ํŽธํ–ฅ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ๋‰ด๋Ÿฐ์˜ ์ถœ๋ ฅ์€ ๋ชจ๋ธ์— ๋น„์„ ํ˜•์„ฑ์„ ๋„์ž…ํ•˜๊ธฐ ์œ„ํ•ด ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ํ†ต๊ณผํ•ฉ๋‹ˆ๋‹ค. ์ด ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋Š” ๋‹ค์Œ ๋‰ด๋Ÿฐ์ด โ€œํ™œ์„ฑํ™”๋˜์–ด์•ผ ํ•˜๋Š”์ง€์™€ ์–ผ๋งˆ๋‚˜โ€œ๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋„คํŠธ์›Œํฌ๋Š” ๋ฐ์ดํ„ฐ์˜ ๋ณต์žกํ•œ ํŒจํ„ด๊ณผ ๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ชจ๋“  ์—ฐ์† ํ•จ์ˆ˜๋ฅผ ๊ทผ์‚ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋”ฐ๋ผ์„œ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋Š” ์‹ ๊ฒฝ๋ง์— ๋น„์„ ํ˜•์„ฑ์„ ๋„์ž…ํ•˜์—ฌ ๋ฐ์ดํ„ฐ์˜ ๋ณต์žกํ•œ ๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • Sigmoid: ์ž…๋ ฅ ๊ฐ’์„ 0๊ณผ 1 ์‚ฌ์ด์˜ ๋ฒ”์œ„๋กœ ๋งคํ•‘ํ•˜๋ฉฐ, ์ด์ง„ ๋ถ„๋ฅ˜์— ์ž์ฃผ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • ReLU (Rectified Linear Unit): ์ž…๋ ฅ์ด ์–‘์ˆ˜์ผ ๊ฒฝ์šฐ ์ž…๋ ฅ์„ ์ง์ ‘ ์ถœ๋ ฅํ•˜๊ณ , ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด 0์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋‹จ์ˆœ์„ฑ๊ณผ ๋”ฅ ๋„คํŠธ์›Œํฌ ํ›ˆ๋ จ์˜ ํšจ๊ณผ์„ฑ ๋•๋ถ„์— ๋„๋ฆฌ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • Tanh: ์ž…๋ ฅ ๊ฐ’์„ -1๊ณผ 1 ์‚ฌ์ด์˜ ๋ฒ”์œ„๋กœ ๋งคํ•‘ํ•˜๋ฉฐ, ์ฃผ๋กœ ์ˆจ๊ฒจ์ง„ ์ธต์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • Softmax: ์›์‹œ ์ ์ˆ˜๋ฅผ ํ™•๋ฅ ๋กœ ๋ณ€ํ™˜ํ•˜๋ฉฐ, ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ ์ถœ๋ ฅ ์ธต์—์„œ ์ž์ฃผ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
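The activation functions above can be tried directly in PyTorch, the framework used by the code examples later in this page; the input values below are just illustrative:

```python
import torch

z = torch.tensor([-2.0, 0.0, 3.0])   # example pre-activations (z = w * x + b)

sig  = torch.sigmoid(z)              # squashes each value into (0, 1)
rel  = torch.relu(z)                 # zeroes out negatives, keeps positives
tan  = torch.tanh(z)                 # squashes each value into (-1, 1)
soft = torch.softmax(z, dim=0)       # turns the scores into probabilities

print(rel)         # tensor([0., 0., 3.])
print(soft.sum())  # the softmax outputs sum to 1 (up to float error)
```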

Backpropagation

Backpropagation is the algorithm used to train neural networks by adjusting the weights of the connections between neurons. It works by computing the gradient of the loss function with respect to each weight and updating the weights in the opposite direction of the gradient to minimize the loss. The steps involved in backpropagation are:

  1. Forward Pass: Compute the output of the network by passing the input through the layers and applying activation functions.
  2. Loss Calculation: Compute the loss (error) between the predicted output and the true target using a loss function (e.g., mean squared error for regression, cross-entropy for classification).
  3. Backward Pass: Compute the gradients of the loss with respect to each weight using the chain rule of calculus.
  4. Weight Update: Update the weights using an optimization algorithm (e.g., stochastic gradient descent, Adam) to minimize the loss.
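The four steps above can be sketched for a single neuron using PyTorch's autograd; the numbers here (input 1.0, target 2.0, learning rate 0.1) are made up for illustration:

```python
import torch

# One neuron y = w * x + b, trained for a single step of plain gradient descent
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x, target = torch.tensor(1.0), torch.tensor(2.0)
lr = 0.1

pred = w * x + b                 # 1. Forward pass
loss = (pred - target) ** 2      # 2. Loss calculation (squared error)
loss.backward()                  # 3. Backward pass: fills w.grad and b.grad
with torch.no_grad():            # 4. Weight update, against the gradient
    w -= lr * w.grad
    b -= lr * b.grad

print(w.item(), b.item())        # w moved to ~0.8 and b to ~0.3, reducing the loss
```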

Convolutional Neural Networks (CNNs)

ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง(CNN)์€ ์ด๋ฏธ์ง€์™€ ๊ฐ™์€ ๊ฒฉ์ž ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋œ ํŠน์ˆ˜ํ•œ ์œ ํ˜•์˜ ์‹ ๊ฒฝ๋ง์ž…๋‹ˆ๋‹ค. ์ด๋“ค์€ ๊ณต๊ฐ„์  ํŠน์ง•์˜ ๊ณ„์ธต ๊ตฌ์กฐ๋ฅผ ์ž๋™์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ ๋•๋ถ„์— ์ปดํ“จํ„ฐ ๋น„์ „ ์ž‘์—…์—์„œ ํŠนํžˆ ํšจ๊ณผ์ ์ž…๋‹ˆ๋‹ค.

CNN์˜ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • Convolutional Layers: ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ•™์Šต ๊ฐ€๋Šฅํ•œ ํ•„ํ„ฐ(์ปค๋„)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์„ ์ ์šฉํ•˜์—ฌ ์ง€์—ญ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ํ•„ํ„ฐ๋Š” ์ž…๋ ฅ ์œ„๋ฅผ ์Šฌ๋ผ์ด๋“œํ•˜๋ฉฐ ์ ๊ณฑ์„ ๊ณ„์‚ฐํ•˜์—ฌ ํŠน์ง• ๋งต์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • Pooling Layers: ์ค‘์š”ํ•œ ํŠน์ง•์„ ์œ ์ง€ํ•˜๋ฉด์„œ ํŠน์ง• ๋งต์˜ ๊ณต๊ฐ„ ์ฐจ์›์„ ์ค„์ž…๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ํ’€๋ง ์—ฐ์‚ฐ์—๋Š” ์ตœ๋Œ€ ํ’€๋ง๊ณผ ํ‰๊ท  ํ’€๋ง์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
  • Fully Connected Layers: ํ•œ ์ธต์˜ ๋ชจ๋“  ๋‰ด๋Ÿฐ์„ ๋‹ค์Œ ์ธต์˜ ๋ชจ๋“  ๋‰ด๋Ÿฐ์— ์—ฐ๊ฒฐํ•˜๋ฉฐ, ์ „ํ†ต์ ์ธ ์‹ ๊ฒฝ๋ง๊ณผ ์œ ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ธต์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋ถ„๋ฅ˜ ์ž‘์—…์„ ์œ„ํ•ด ๋„คํŠธ์›Œํฌ์˜ ๋์—์„œ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.

CNN์˜ Convolutional Layers ๋‚ด๋ถ€์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ตฌ๋ถ„๋„ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค:

  • Initial Convolutional Layer: ์›์‹œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ(์˜ˆ: ์ด๋ฏธ์ง€)๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ์ฒซ ๋ฒˆ์งธ ํ•ฉ์„ฑ๊ณฑ ์ธต์œผ๋กœ, ์—ฃ์ง€ ๋ฐ ํ…์Šค์ฒ˜์™€ ๊ฐ™์€ ๊ธฐ๋ณธ ํŠน์ง•์„ ์‹๋ณ„ํ•˜๋Š” ๋ฐ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • Intermediate Convolutional Layers: ์ดˆ๊ธฐ ์ธต์—์„œ ํ•™์Šตํ•œ ํŠน์ง•์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ตฌ์ถ•๋œ ํ›„์† ํ•ฉ์„ฑ๊ณฑ ์ธต์œผ๋กœ, ๋„คํŠธ์›Œํฌ๊ฐ€ ๋” ๋ณต์žกํ•œ ํŒจํ„ด๊ณผ ํ‘œํ˜„์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
  • Final Convolutional Layer: ์™„์ „ ์—ฐ๊ฒฐ ์ธต ์ด์ „์˜ ๋งˆ์ง€๋ง‰ ํ•ฉ์„ฑ๊ณฑ ์ธต์œผ๋กœ, ๊ณ ์ˆ˜์ค€์˜ ํŠน์ง•์„ ์บก์ฒ˜ํ•˜๊ณ  ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ค€๋น„ํ•ฉ๋‹ˆ๋‹ค.

Tip

CNN์€ ๊ฒฉ์ž ํ˜•ํƒœ์˜ ๋ฐ์ดํ„ฐ์—์„œ ํŠน์ง•์˜ ๊ณต๊ฐ„์  ๊ณ„์ธต ๊ตฌ์กฐ๋ฅผ ํ•™์Šตํ•˜๊ณ  ๊ฐ€์ค‘์น˜ ๊ณต์œ ๋ฅผ ํ†ตํ•ด ๋งค๊ฐœ๋ณ€์ˆ˜ ์ˆ˜๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ ๋•๋ถ„์— ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ๊ฐ์ฒด ํƒ์ง€ ๋ฐ ์ด๋ฏธ์ง€ ๋ถ„ํ•  ์ž‘์—…์— ํŠนํžˆ ํšจ๊ณผ์ ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ด๋“ค์€ ์ด์›ƒ ๋ฐ์ดํ„ฐ(ํ”ฝ์…€)๊ฐ€ ๋จผ ํ”ฝ์…€๋ณด๋‹ค ๋” ๊ด€๋ จ์„ฑ์ด ๋†’์„ ๊ฐ€๋Šฅ์„ฑ์ด ์žˆ๋Š” ํŠน์ง• ์ง€์—ญ์„ฑ ์›์น™์„ ์ง€์›ํ•˜๋Š” ๋ฐ์ดํ„ฐ์—์„œ ๋” ์ž˜ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ํ…์ŠคํŠธ์™€ ๊ฐ™์€ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ํ•ด๋‹น๋˜์ง€ ์•Š์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, CNN์ด ๋ณต์žกํ•œ ํŠน์ง•์„ ์‹๋ณ„ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ๊ณต๊ฐ„์  ๋งฅ๋ฝ์„ ์ ์šฉํ•  ์ˆ˜ ์—†๋‹ค๋Š” ์ ์— ์œ ์˜ํ•˜์‹ญ์‹œ์˜ค. ์ฆ‰, ์ด๋ฏธ์ง€์˜ ์„œ๋กœ ๋‹ค๋ฅธ ๋ถ€๋ถ„์—์„œ ๋ฐœ๊ฒฌ๋œ ๋™์ผํ•œ ํŠน์ง•์€ ๋™์ผํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Example defining a CNN

์—ฌ๊ธฐ์—์„œ๋Š” 48x48 ํฌ๊ธฐ์˜ RGB ์ด๋ฏธ์ง€ ๋ฐฐ์น˜๋ฅผ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ์‚ฌ์šฉํ•˜๊ณ , ํŠน์ง•์„ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด ํ•ฉ์„ฑ๊ณฑ ์ธต๊ณผ ์ตœ๋Œ€ ํ’€๋ง์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•ด ์™„์ „ ์—ฐ๊ฒฐ ์ธต์„ ์‚ฌ์šฉํ•˜๋Š” ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง(CNN)์„ ์ •์˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์„ค๋ช…์„ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค์Œ์€ PyTorch์—์„œ 1๊ฐœ์˜ ํ•ฉ์„ฑ๊ณฑ ์ธต์„ ์ •์˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค: self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1).

  • in_channels: ์ž…๋ ฅ ์ฑ„๋„ ์ˆ˜. RGB ์ด๋ฏธ์ง€์˜ ๊ฒฝ์šฐ, ์ด๋Š” 3(๊ฐ ์ƒ‰์ƒ ์ฑ„๋„๋งˆ๋‹ค ํ•˜๋‚˜)์ž…๋‹ˆ๋‹ค. ๊ทธ๋ ˆ์ด์Šค์ผ€์ผ ์ด๋ฏธ์ง€์˜ ๊ฒฝ์šฐ, ์ด๋Š” 1์ด ๋ฉ๋‹ˆ๋‹ค.

  • out_channels: ํ•ฉ์„ฑ๊ณฑ ์ธต์ด ํ•™์Šตํ•  ์ถœ๋ ฅ ์ฑ„๋„(ํ•„ํ„ฐ) ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ด๋Š” ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜์— ๋”ฐ๋ผ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์ž…๋‹ˆ๋‹ค.

  • kernel_size: ํ•ฉ์„ฑ๊ณฑ ํ•„ํ„ฐ์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์ธ ์„ ํƒ์€ 3x3์ด๋ฉฐ, ์ด๋Š” ํ•„ํ„ฐ๊ฐ€ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ 3x3 ์˜์—ญ์„ ์ปค๋ฒ„ํ•จ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” in_channels์—์„œ out_channels๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” 3ร—3ร—3 ์ƒ‰์ƒ ์Šคํƒฌํ”„์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  1. ๊ทธ 3ร—3ร—3 ์Šคํƒฌํ”„๋ฅผ ์ด๋ฏธ์ง€ ํ๋ธŒ์˜ ์™ผ์ชฝ ์ƒ๋‹จ ๋ชจ์„œ๋ฆฌ์— ๋†“์Šต๋‹ˆ๋‹ค.
  2. ๊ฐ ๊ฐ€์ค‘์น˜๋ฅผ ๊ทธ ์•„๋ž˜์˜ ํ”ฝ์…€์— ๊ณฑํ•˜๊ณ  ๋ชจ๋‘ ๋”ํ•œ ํ›„, ํŽธํ–ฅ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ํ•˜๋‚˜์˜ ์ˆซ์ž๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.
  3. ๊ทธ ์ˆซ์ž๋ฅผ ๋นˆ ๋งต์˜ ์œ„์น˜(0, 0)์— ๊ธฐ๋กํ•ฉ๋‹ˆ๋‹ค.
  4. ์Šคํƒฌํ”„๋ฅผ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ํ•œ ํ”ฝ์…€ ์Šฌ๋ผ์ด๋“œ(์ŠคํŠธ๋ผ์ด๋“œ = 1)ํ•˜๊ณ  ์ „์ฒด 48ร—48 ๊ทธ๋ฆฌ๋“œ๋ฅผ ์ฑ„์šธ ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณตํ•ฉ๋‹ˆ๋‹ค.
  • padding: ์ž…๋ ฅ์˜ ๊ฐ ์ธก๋ฉด์— ์ถ”๊ฐ€๋˜๋Š” ํ”ฝ์…€ ์ˆ˜์ž…๋‹ˆ๋‹ค. ํŒจ๋”ฉ์€ ์ž…๋ ฅ์˜ ๊ณต๊ฐ„ ์ฐจ์›์„ ๋ณด์กดํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜์–ด ์ถœ๋ ฅ ํฌ๊ธฐ๋ฅผ ๋” ์ž˜ ์ œ์–ดํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, 3x3 ์ปค๋„์„ ๊ฐ€์ง„ 48x48 ํ”ฝ์…€ ์ž…๋ ฅ์˜ ๊ฒฝ์šฐ, ํŒจ๋”ฉ 1์€ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ ํ›„ ์ถœ๋ ฅ ํฌ๊ธฐ๋ฅผ ๋™์ผํ•˜๊ฒŒ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค(48x48). ์ด๋Š” ํŒจ๋”ฉ์ด ์ž…๋ ฅ ์ด๋ฏธ์ง€ ์ฃผ์œ„์— 1ํ”ฝ์…€์˜ ๊ฒฝ๊ณ„๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์ปค๋„์ด ๊ฐ€์žฅ์ž๋ฆฌ๋ฅผ ์Šฌ๋ผ์ด๋“œํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•˜์—ฌ ๊ณต๊ฐ„ ์ฐจ์›์„ ์ค„์ด์ง€ ์•Š๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.

๊ทธ๋Ÿฐ ๋‹ค์Œ ์ด ์ธต์˜ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜ ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • (3x3x3 (์ปค๋„ ํฌ๊ธฐ) + 1 (ํŽธํ–ฅ)) x 32 (out_channels) = 896 ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜.

๊ฐ ํ•ฉ์„ฑ๊ณฑ ์ธต์˜ ๊ธฐ๋Šฅ์€ ์ž…๋ ฅ์˜ ์„ ํ˜• ๋ณ€ํ™˜์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด๋ฏ€๋กœ ์‚ฌ์šฉ๋œ ๊ฐ ์ปค๋„๋งˆ๋‹ค ํŽธํ–ฅ(+1)์ด ์ถ”๊ฐ€๋ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ์ •์‹์œผ๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค:

Y = f(W * X + b)

W๋Š” ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ(ํ•™์Šต๋œ ํ•„ํ„ฐ, 3x3x3 = 27๊ฐœ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜)์ด๊ณ , b๋Š” ๊ฐ ์ถœ๋ ฅ ์ฑ„๋„์— ๋Œ€ํ•ด +1์ธ ๋ฐ”์ด์–ด์Šค ๋ฒกํ„ฐ์ž…๋‹ˆ๋‹ค.

self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)์˜ ์ถœ๋ ฅ์€ (batch_size, 32, 48, 48) ํ˜•ํƒœ์˜ ํ…์„œ๊ฐ€ ๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ 32๋Š” 48x48 ํ”ฝ์…€ ํฌ๊ธฐ์˜ ์ƒˆ๋กœ ์ƒ์„ฑ๋œ ์ฑ„๋„ ์ˆ˜์ž…๋‹ˆ๋‹ค.
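As a quick sanity check of the numbers above, both the 896-parameter count and the output shape can be verified in a few lines of PyTorch:

```python
import torch
from torch import nn

conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)

# (3*3*3 weights + 1 bias) per filter, times 32 filters = 896
n_params = sum(p.numel() for p in conv1.parameters())
print(n_params)  # 896

# Shape check on a dummy batch of 8 RGB images of 48x48 pixels
x = torch.randn(8, 3, 48, 48)
out = conv1(x)
print(out.shape)  # torch.Size([8, 32, 48, 48])
```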

๊ทธ๋Ÿฐ ๋‹ค์Œ, ์ด ํ•ฉ์„ฑ๊ณฑ ์ธต์„ ๋‹ค๋ฅธ ํ•ฉ์„ฑ๊ณฑ ์ธต์— ์—ฐ๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1).

์ด๊ฒƒ์€ ๋‹ค์Œ์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค: (32x3x3 (์ปค๋„ ํฌ๊ธฐ) + 1 (๋ฐ”์ด์–ด์Šค)) x 64 (์ถœ๋ ฅ ์ฑ„๋„) = 18,496๊ฐœ์˜ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜์™€ (batch_size, 64, 48, 48) ํ˜•ํƒœ์˜ ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

๋ณด์‹œ๋‹ค์‹œํ”ผ ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ์ˆ˜๋Š” ๊ฐ ์ถ”๊ฐ€ ํ•ฉ์„ฑ๊ณฑ ์ธต๊ณผ ํ•จ๊ป˜ ๋น ๋ฅด๊ฒŒ ์ฆ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค, ํŠนํžˆ ์ถœ๋ ฅ ์ฑ„๋„ ์ˆ˜๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ.

๋ฐ์ดํ„ฐ ์‚ฌ์šฉ๋Ÿ‰์„ ์ œ์–ดํ•˜๋Š” ํ•œ ๊ฐ€์ง€ ์˜ต์…˜์€ ๊ฐ ํ•ฉ์„ฑ๊ณฑ ์ธต ๋’ค์— ์ตœ๋Œ€ ํ’€๋ง์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ตœ๋Œ€ ํ’€๋ง์€ ํŠน์ง• ๋งต์˜ ๊ณต๊ฐ„ ์ฐจ์›์„ ์ค„์—ฌ ๋งค๊ฐœ๋ณ€์ˆ˜ ์ˆ˜์™€ ๊ณ„์‚ฐ ๋ณต์žก์„ฑ์„ ์ค„์ด๋Š” ๋ฐ ๋„์›€์ด ๋˜๋ฉฐ ์ค‘์š”ํ•œ ํŠน์ง•์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ ์–ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2). ์ด๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ 2x2 ํ”ฝ์…€ ๊ทธ๋ฆฌ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ๊ฐ ๊ทธ๋ฆฌ๋“œ์—์„œ ์ตœ๋Œ€ ๊ฐ’์„ ์ทจํ•ด ํŠน์ง• ๋งต์˜ ํฌ๊ธฐ๋ฅผ ์ ˆ๋ฐ˜์œผ๋กœ ์ค„์ด๋Š” ๊ฒƒ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ๋˜ํ•œ, stride=2๋Š” ํ’€๋ง ์ž‘์—…์ด ํ•œ ๋ฒˆ์— 2ํ”ฝ์…€์”ฉ ์ด๋™ํ•จ์„ ์˜๋ฏธํ•˜๋ฉฐ, ์ด ๊ฒฝ์šฐ ํ’€๋ง ์˜์—ญ ๊ฐ„์˜ ๊ฒน์นจ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค.

์ด ํ’€๋ง ์ธต์„ ์‚ฌ์šฉํ•˜๋ฉด ์ฒซ ๋ฒˆ์งธ ํ•ฉ์„ฑ๊ณฑ ์ธต ์ดํ›„์˜ ์ถœ๋ ฅ ํ˜•ํƒœ๋Š” self.conv2์˜ ์ถœ๋ ฅ์— self.pool1์„ ์ ์šฉํ•œ ํ›„ (batch_size, 64, 24, 24)๊ฐ€ ๋˜์–ด ์ด์ „ ์ธต์˜ ํฌ๊ธฐ๋ฅผ 1/4๋กœ ์ค„์ž…๋‹ˆ๋‹ค.
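To make the "take the maximum of each 2x2 grid" idea concrete, here is a tiny hand-made feature map (values chosen arbitrarily) passed through nn.MaxPool2d:

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One sample, one channel, 4x4 feature map:
# each non-overlapping 2x2 window keeps only its maximum
fmap = torch.tensor([[[[1., 2., 5., 6.],
                       [3., 4., 7., 8.],
                       [9., 1., 2., 2.],
                       [1., 1., 3., 4.]]]])
out = pool(fmap)
print(out)        # the four window maxima: 4, 8, 9, 4
print(out.shape)  # torch.Size([1, 1, 2, 2]): both spatial dimensions halved
```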

Tip

It's important to pool after the convolutional layers to reduce the spatial dimensions of the feature maps, which helps to control the number of parameters and the computational complexity while making the initial parameters learn important features. See the convolutions before a pooling layer as a way to extract features from the input data (like lines and edges); this information will still be present in the pooled output, but the next convolutional layer will not be able to see the original input data, only the pooled output, which is a reduced version of the previous layer containing that information. In the usual order Conv → ReLU → Pool, each 2×2 pooling window now operates on feature activations ("edge present / absent"), not raw pixel intensities, so keeping the strongest activation really does keep the most salient evidence.

๊ทธ๋Ÿฐ ๋‹ค์Œ ํ•„์š”ํ•œ ๋งŒํผ์˜ ํ•ฉ์„ฑ๊ณฑ ๋ฐ ํ’€๋ง ์ธต์„ ์ถ”๊ฐ€ํ•œ ํ›„, ์ถœ๋ ฅ์„ ํ‰ํƒ„ํ™”ํ•˜์—ฌ ์™„์ „ ์—ฐ๊ฒฐ ์ธต์— ๊ณต๊ธ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ๋ฐฐ์น˜์˜ ๊ฐ ์ƒ˜ํ”Œ์— ๋Œ€ํ•ด ํ…์„œ๋ฅผ 1D ๋ฒกํ„ฐ๋กœ ์žฌ๊ตฌ์„ฑํ•˜์—ฌ ์ˆ˜ํ–‰๋ฉ๋‹ˆ๋‹ค:

x = x.view(-1, 64*24*24)

๊ทธ๋ฆฌ๊ณ  ์ด์ „์˜ ํ•ฉ์„ฑ๊ณฑ ๋ฐ ํ’€๋ง ๋ ˆ์ด์–ด์—์„œ ์ƒ์„ฑ๋œ ๋ชจ๋“  ํ›ˆ๋ จ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ฐ€์ง„ ์ด 1D ๋ฒกํ„ฐ๋กœ, ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์™„์ „ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด๋ฅผ ์ •์˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

self.fc1 = nn.Linear(64 * 24 * 24, 512)

์ด ๋ ˆ์ด์–ด๋Š” ์ด์ „ ๋ ˆ์ด์–ด์˜ ํ‰ํƒ„ํ™”๋œ ์ถœ๋ ฅ์„ ๊ฐ€์ ธ์™€ 512๊ฐœ์˜ ์ˆจ๊ฒจ์ง„ ์œ ๋‹›์— ๋งคํ•‘ํ•ฉ๋‹ˆ๋‹ค.

์ด ๋ ˆ์ด์–ด๊ฐ€ ์ถ”๊ฐ€ํ•œ (64 * 24 * 24 + 1 (bias)) * 512 = 3,221,504 ๊ฐœ์˜ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ฃผ๋ชฉํ•˜์„ธ์š”. ์ด๋Š” ํ•ฉ์„ฑ๊ณฑ ๋ ˆ์ด์–ด์— ๋น„ํ•ด ์ƒ๋‹นํ•œ ์ฆ๊ฐ€์ž…๋‹ˆ๋‹ค. ์ด๋Š” ์™„์ „ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด๊ฐ€ ํ•œ ๋ ˆ์ด์–ด์˜ ๋ชจ๋“  ๋‰ด๋Ÿฐ์„ ๋‹ค์Œ ๋ ˆ์ด์–ด์˜ ๋ชจ๋“  ๋‰ด๋Ÿฐ์— ์—ฐ๊ฒฐํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ์ˆ˜๊ฐ€ ๋งŽ์•„์ง‘๋‹ˆ๋‹ค.

Finally, we can add an output layer to produce the final class logits:

self.fc2 = nn.Linear(512, num_classes)

์ด๊ฒƒ์€ (512 + 1 (bias)) * num_classes์˜ ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ num_classes๋Š” ๋ถ„๋ฅ˜ ์ž‘์—…์˜ ํด๋ž˜์Šค ์ˆ˜์ž…๋‹ˆ๋‹ค (์˜ˆ: GTSRB ๋ฐ์ดํ„ฐ์…‹์˜ ๊ฒฝ์šฐ 43).

Another common practice is to add a dropout layer before the fully connected layers to prevent overfitting. This can be done with:

self.dropout = nn.Dropout(0.5)

์ด ๋ ˆ์ด์–ด๋Š” ํ›ˆ๋ จ ์ค‘ ์ž…๋ ฅ ์œ ๋‹›์˜ ์ผ๋ถ€๋ฅผ ๋ฌด์ž‘์œ„๋กœ 0์œผ๋กœ ์„ค์ •ํ•˜์—ฌ ํŠน์ • ๋‰ด๋Ÿฐ์— ๋Œ€ํ•œ ์˜์กด๋„๋ฅผ ์ค„์ž„์œผ๋กœ์จ ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ค๋‹ˆ๋‹ค.

CNN Code example

import torch
import torch.nn as nn
import torch.nn.functional as F

class MY_NET(nn.Module):
    def __init__(self, num_classes=32):
        super(MY_NET, self).__init__()
        # Initial conv layer: 3 input channels (RGB), 32 output channels, 3x3 kernel, padding 1
        # This layer will learn basic features like edges and textures
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=32, kernel_size=3, padding=1
        )
        # Output: (Batch Size, 32, 48, 48)

        # Conv Layer 2: 32 input channels, 64 output channels, 3x3 kernel, padding 1
        # This layer will learn more complex features based on the output of conv1
        self.conv2 = nn.Conv2d(
            in_channels=32, out_channels=64, kernel_size=3, padding=1
        )
        # Output: (Batch Size, 64, 48, 48)

        # Max Pooling 1: Kernel 2x2, Stride 2. Reduces spatial dimensions by half (1/4th of the previous layer).
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Output: (Batch Size, 64, 24, 24)

        # Conv Layer 3: 64 input channels, 128 output channels, 3x3 kernel, padding 1
        # This layer will learn even more complex features based on the output of conv2
        # Note that the number of output channels can be adjusted based on the complexity of the task
        self.conv3 = nn.Conv2d(
            in_channels=64, out_channels=128, kernel_size=3, padding=1
        )
        # Output: (Batch Size, 128, 24, 24)

        # Max Pooling 2: Kernel 2x2, Stride 2. Reduces spatial dimensions by half again.
        # Reducing the dimensions further helps to control the number of parameters and computational complexity.
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Output: (Batch Size, 128, 12, 12)

        # From the second pooling layer, we will flatten the output to feed it into fully connected layers.
        # The feature size is calculated as follows:
        # Feature size = Number of output channels * Height * Width
        self._feature_size = 128 * 12 * 12

        # Fully Connected Layer 1 (Hidden): Maps flattened features to hidden units.
        # This layer will learn to combine the features extracted by the convolutional layers.
        self.fc1 = nn.Linear(self._feature_size, 512)

        # Fully Connected Layer 2 (Output): Maps hidden units to class logits.
        # Output size MUST match num_classes
        self.fc2 = nn.Linear(512, num_classes)

        # Dropout layer configuration with a dropout rate of 0.5.
        # This layer is used to prevent overfitting by randomly setting a fraction of the input units to zero during training.
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        """
        The forward method defines the forward pass of the network.
        It takes an input tensor `x` and applies the convolutional layers, pooling layers, and fully connected layers in sequence.
        The input tensor `x` is expected to have the shape (Batch Size, Channels, Height, Width), where:
        - Batch Size: Number of samples in the batch
        - Channels: Number of input channels (e.g., 3 for RGB images)
        - Height: Height of the input image (e.g., 48 for 48x48 images)
        - Width: Width of the input image (e.g., 48 for 48x48 images)
        The output of the forward method is the logits for each class, which can be used for classification tasks.
        Args:
            x (torch.Tensor): Input tensor of shape (Batch Size, Channels, Height, Width)
        Returns:
            torch.Tensor: Output tensor of shape (Batch Size, num_classes) containing the class logits.
        """

        # Conv1 -> ReLU -> Conv2 -> ReLU -> Pool1 -> Conv3 -> ReLU -> Pool2
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.pool2(x)
        # At this point, x has shape (Batch Size, 128, 12, 12)

        # Flatten the output to feed it into fully connected layers
        x = torch.flatten(x, 1)

        # Apply dropout to prevent overfitting
        x = self.dropout(x)

        # First FC layer with ReLU activation
        x = F.relu(self.fc1(x))

        # Apply Dropout again
        x = self.dropout(x)
        # Final FC layer to get logits
        x = self.fc2(x)
        # Output shape will be (Batch Size, num_classes)
        # Note that the output is not passed through a softmax activation here, as it is typically done in the loss function (e.g., CrossEntropyLoss)
        return x

CNN Code training example

๋‹ค์Œ ์ฝ”๋“œ๋Š” ์ผ๋ถ€ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ์œ„์—์„œ ์ •์˜ํ•œ MY_NET ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•ฉ๋‹ˆ๋‹ค. ์ฃผ๋ชฉํ•  ๋งŒํ•œ ๋ช‡ ๊ฐ€์ง€ ํฅ๋ฏธ๋กœ์šด ๊ฐ’์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • EPOCHS๋Š” ๋ชจ๋ธ์ด ํ›ˆ๋ จ ์ค‘ ์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์„ ๋ณด๋Š” ํšŸ์ˆ˜์ž…๋‹ˆ๋‹ค. EPOCH์ด ๋„ˆ๋ฌด ์ž‘์œผ๋ฉด ๋ชจ๋ธ์ด ์ถฉ๋ถ„ํžˆ ํ•™์Šตํ•˜์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ๊ณ , ๋„ˆ๋ฌด ํฌ๋ฉด ๊ณผ์ ํ•ฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • LEARNING_RATE๋Š” ์ตœ์ ํ™”๊ธฐ์˜ ๋‹จ๊ณ„ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ์ž‘์€ ํ•™์Šต๋ฅ ์€ ๋А๋ฆฐ ์ˆ˜๋ ด์œผ๋กœ ์ด์–ด์งˆ ์ˆ˜ ์žˆ๊ณ , ํฐ ํ•™์Šต๋ฅ ์€ ์ตœ์  ์†”๋ฃจ์…˜์„ ์ดˆ๊ณผํ•˜์—ฌ ์ˆ˜๋ ด์„ ๋ฐฉํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • WEIGHT_DECAY๋Š” ํฐ ๊ฐ€์ค‘์น˜์— ๋Œ€ํ•ด ํŒจ๋„ํ‹ฐ๋ฅผ ๋ถ€์—ฌํ•˜์—ฌ ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๋Š” ์ •๊ทœํ™” ํ•ญ์ž…๋‹ˆ๋‹ค.

ํ›ˆ๋ จ ๋ฃจํ”„์™€ ๊ด€๋ จํ•˜์—ฌ ์•Œ์•„๋‘๋ฉด ์ข‹์€ ํฅ๋ฏธ๋กœ์šด ์ •๋ณด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • criterion = nn.CrossEntropyLoss()๋Š” ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ ์ž‘์—…์— ์‚ฌ์šฉ๋˜๋Š” ์†์‹ค ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ์†Œํ”„ํŠธ๋งฅ์Šค ํ™œ์„ฑํ™”์™€ ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ์†์‹ค์„ ๋‹จ์ผ ํ•จ์ˆ˜๋กœ ๊ฒฐํ•ฉํ•˜์—ฌ ํด๋ž˜์Šค ๋กœ์ง“์„ ์ถœ๋ ฅํ•˜๋Š” ๋ชจ๋ธ ํ›ˆ๋ จ์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.
  • ๋ชจ๋ธ์ด ์ด์ง„ ๋ถ„๋ฅ˜๋‚˜ ํšŒ๊ท€์™€ ๊ฐ™์€ ๋‹ค๋ฅธ ์œ ํ˜•์˜ ์ถœ๋ ฅ์„ ์˜ˆ์ƒํ•˜๋Š” ๊ฒฝ์šฐ, ์ด์ง„ ๋ถ„๋ฅ˜์—๋Š” nn.BCEWithLogitsLoss(), ํšŒ๊ท€์—๋Š” nn.MSELoss()์™€ ๊ฐ™์€ ๋‹ค๋ฅธ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)๋Š” ๋”ฅ ๋Ÿฌ๋‹ ๋ชจ๋ธ ํ›ˆ๋ จ์— ์ธ๊ธฐ ์žˆ๋Š” ์„ ํƒ์ธ Adam ์ตœ์ ํ™”๋ฅผ ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๊ธฐ์šธ๊ธฐ์˜ ์ฒซ ๋ฒˆ์งธ ๋ฐ ๋‘ ๋ฒˆ์งธ ๋ชจ๋ฉ˜ํŠธ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ฐ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ํ•™์Šต๋ฅ ์„ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค.
  • optim.SGD (ํ™•๋ฅ ์  ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•) ๋˜๋Š” optim.RMSprop์™€ ๊ฐ™์€ ๋‹ค๋ฅธ ์ตœ์ ํ™”๊ธฐ๋„ ํ›ˆ๋ จ ์ž‘์—…์˜ ํŠน์ • ์š”๊ตฌ ์‚ฌํ•ญ์— ๋”ฐ๋ผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • model.train() ๋ฉ”์„œ๋“œ๋Š” ๋ชจ๋ธ์„ ํ›ˆ๋ จ ๋ชจ๋“œ๋กœ ์„ค์ •ํ•˜์—ฌ ๋“œ๋กญ์•„์›ƒ ๋ฐ ๋ฐฐ์น˜ ์ •๊ทœํ™”์™€ ๊ฐ™์€ ๋ ˆ์ด์–ด๊ฐ€ ํ‰๊ฐ€์™€ ๋น„๊ตํ•˜์—ฌ ํ›ˆ๋ จ ์ค‘์— ๋‹ค๋ฅด๊ฒŒ ๋™์ž‘ํ•˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค.
  • optimizer.zero_grad()๋Š” ์—ญ์ „ํŒŒ ์ด์ „์— ๋ชจ๋“  ์ตœ์ ํ™”๋œ ํ…์„œ์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ์ง€์›๋‹ˆ๋‹ค. ์ด๋Š” PyTorch์—์„œ ๊ธฐ์šธ๊ธฐ๊ฐ€ ๊ธฐ๋ณธ์ ์œผ๋กœ ๋ˆ„์ ๋˜๊ธฐ ๋•Œ๋ฌธ์— ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ง€์šฐ์ง€ ์•Š์œผ๋ฉด ์ด์ „ ๋ฐ˜๋ณต์˜ ๊ธฐ์šธ๊ธฐ๊ฐ€ ํ˜„์žฌ ๊ธฐ์šธ๊ธฐ์— ์ถ”๊ฐ€๋˜์–ด ์ž˜๋ชป๋œ ์—…๋ฐ์ดํŠธ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • loss.backward()๋Š” ๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์†์‹ค์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉฐ, ์ด๋Š” ์ดํ›„ ์ตœ์ ํ™”๊ธฐ๊ฐ€ ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • optimizer.step()์€ ๊ณ„์‚ฐ๋œ ๊ธฐ์šธ๊ธฐ์™€ ํ•™์Šต๋ฅ ์— ๋”ฐ๋ผ ๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค.
import torch, torch.nn.functional as F
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from tqdm import tqdm
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

# ---------------------------------------------------------------------------
# 1. Globals
# ---------------------------------------------------------------------------
IMG_SIZE      = 48               # model expects 48x48
NUM_CLASSES   = 10               # MNIST has 10 digits
BATCH_SIZE    = 64               # batch size for training and validation
EPOCHS        = 5                # number of training epochs
LEARNING_RATE = 1e-3             # initial learning rate for Adam optimiser
WEIGHT_DECAY  = 1e-4             # L2 regularisation to prevent overfitting

# Channel-wise mean / std for MNIST (grayscale => repeat for 3-channel input)
MNIST_MEAN = (0.1307, 0.1307, 0.1307)
MNIST_STD  = (0.3081, 0.3081, 0.3081)

# ---------------------------------------------------------------------------
# 2. Transforms
# ---------------------------------------------------------------------------
# 1) Baseline transform: resize + tensor (no colour/aug/no normalise)
transform_base = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),      # Resize - force all images to 48x48 so the CNN sees a fixed geometry
    transforms.Grayscale(num_output_channels=3),  # Grayscale->RGB - MNIST is 1-channel; duplicate into 3 channels for convnet
    transforms.ToTensor(),                        # ToTensor - convert PIL image [0-255] -> float tensor [0.0-1.0]
])

# 2) Training transform: augment + normalise
transform_norm = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),      # keep 48x48 input size
    transforms.Grayscale(num_output_channels=3),  # still need 3 channels
    transforms.RandomRotation(10),                # RandomRotation(+-10 deg) - small tilt => rotation-invariance, combats overfitting
    transforms.ColorJitter(brightness=0.2,
                           contrast=0.2),         # ColorJitter - pseudo-RGB brightness/contrast noise; extra variety
    transforms.ToTensor(),                        # convert to tensor before numeric ops
    transforms.Normalize(mean=MNIST_MEAN,
                         std=MNIST_STD),          # Normalize - zero-centre & scale so every channel ~ N(0,1)
])

# 3) Test/validation transform: only resize + normalise (no aug)
transform_test = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),      # same spatial size as train
    transforms.Grayscale(num_output_channels=3),  # match channel count
    transforms.ToTensor(),                        # tensor conversion
    transforms.Normalize(mean=MNIST_MEAN,
                         std=MNIST_STD),          # keep test data on same scale as training data
])

# ---------------------------------------------------------------------------
# 3. Datasets & loaders
# ---------------------------------------------------------------------------
train_set = datasets.MNIST("data", train=True,  download=True, transform=transform_norm)
test_set  = datasets.MNIST("data", train=False, download=True, transform=transform_test)

train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
test_loader  = DataLoader(test_set,  batch_size=256,        shuffle=False)

print(f"Training on {len(train_set)} samples, validating on {len(test_set)} samples.")

# ---------------------------------------------------------------------------
# 4. Model / loss / optimiser
# ---------------------------------------------------------------------------
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model  = MY_NET(num_classes=NUM_CLASSES).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

# ---------------------------------------------------------------------------
# 5. Training loop
# ---------------------------------------------------------------------------
for epoch in range(1, EPOCHS + 1):
    model.train()                          # Set model to training mode enabling dropout and batch norm

    running_loss = 0.0                     # sums batch losses to compute epoch average
    correct      = 0                       # number of correct predictions
    total        = 0                       # number of samples seen

    # tqdm wraps the loader to show a live progress-bar per epoch
    for X_batch, y_batch in tqdm(train_loader, desc=f"Epoch {epoch}", leave=False):
        # 5-a) Move data to GPU (if available) ----------------------------------
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        # 5-b) Forward pass -----------------------------------------------------
        logits = model(X_batch)            # raw class scores (shape: [B, NUM_CLASSES])
        loss   = criterion(logits, y_batch)

        # 5-c) Backward pass & parameter update ---------------------------------
        optimizer.zero_grad()              # clear old gradients
        loss.backward()                    # compute new gradients
        optimizer.step()                   # gradient -> weight update

        # 5-d) Statistics -------------------------------------------------------
        running_loss += loss.item() * X_batch.size(0)     # sum of (batch loss x batch size)
        preds   = logits.argmax(dim=1)                    # predicted class labels
        correct += (preds == y_batch).sum().item()        # correct predictions in this batch
        total   += y_batch.size(0)                        # samples processed so far

    # 5-e) Epoch-level metrics --------------------------------------------------
    epoch_loss = running_loss / total
    epoch_acc  = 100.0 * correct / total
    print(f"[Epoch {epoch}] loss = {epoch_loss:.4f} | accuracy = {epoch_acc:.2f}%")

print("\nTraining finished.\n")

# ---------------------------------------------------------------------------
# 6. Evaluation on test set
# ---------------------------------------------------------------------------
model.eval()  # Set model to evaluation mode (disables dropout and batch norm)
with torch.no_grad():
    logits_all, labels_all = [], []
    for X, y in test_loader:
        logits_all.append(model(X.to(device)).cpu())
        labels_all.append(y)
    logits_all = torch.cat(logits_all)
    labels_all = torch.cat(labels_all)
    preds_all  = logits_all.argmax(1)

test_loss = criterion(logits_all, labels_all).item()
test_acc  = (preds_all == labels_all).float().mean().item() * 100

print(f"Test loss: {test_loss:.4f}")
print(f"Test accuracy: {test_acc:.2f}%\n")

print("Classification report (precision / recall / F1):")
print(classification_report(labels_all, preds_all, zero_division=0))

print("Confusion matrix (rows = true, cols = pred):")
print(confusion_matrix(labels_all, preds_all))

์ˆœํ™˜ ์‹ ๊ฒฝ๋ง (RNNs)

์ˆœํ™˜ ์‹ ๊ฒฝ๋ง (RNNs)์€ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๋‚˜ ์ž์—ฐ์–ด์™€ ๊ฐ™์€ ์ˆœ์ฐจ์  ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์„ค๊ณ„๋œ ์‹ ๊ฒฝ๋ง์˜ ํ•œ ์ข…๋ฅ˜์ž…๋‹ˆ๋‹ค. ์ „ํ†ต์ ์ธ ํ”ผ๋“œํฌ์›Œ๋“œ ์‹ ๊ฒฝ๋ง๊ณผ ๋‹ฌ๋ฆฌ, RNNs๋Š” ์ž์‹ ์—๊ฒŒ ๋‹ค์‹œ ์—ฐ๊ฒฐ๋˜๋Š” ์—ฐ๊ฒฐ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด, ์‹œํ€€์Šค์˜ ์ด์ „ ์ž…๋ ฅ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์บก์ฒ˜ํ•˜๋Š” ์ˆจ๊ฒจ์ง„ ์ƒํƒœ๋ฅผ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

RNNs์˜ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • ์ˆœํ™˜ ๋ ˆ์ด์–ด: ์ด ๋ ˆ์ด์–ด๋Š” ์ž…๋ ฅ ์‹œํ€€์Šค๋ฅผ ํ•œ ๋ฒˆ์— ํ•œ ์‹œ๊ฐ„ ๋‹จ๊ณ„์”ฉ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, ํ˜„์žฌ ์ž…๋ ฅ๊ณผ ์ด์ „ ์ˆจ๊ฒจ์ง„ ์ƒํƒœ์— ๋”ฐ๋ผ ์ˆจ๊ฒจ์ง„ ์ƒํƒœ๋ฅผ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด RNNs๋Š” ๋ฐ์ดํ„ฐ์˜ ์‹œ๊ฐ„์  ์˜์กด์„ฑ์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ˆจ๊ฒจ์ง„ ์ƒํƒœ: ์ˆจ๊ฒจ์ง„ ์ƒํƒœ๋Š” ์ด์ „ ์‹œ๊ฐ„ ๋‹จ๊ณ„์˜ ์ •๋ณด๋ฅผ ์š”์•ฝํ•œ ๋ฒกํ„ฐ์ž…๋‹ˆ๋‹ค. ๊ฐ ์‹œ๊ฐ„ ๋‹จ๊ณ„์—์„œ ์—…๋ฐ์ดํŠธ๋˜๋ฉฐ, ํ˜„์žฌ ์ž…๋ ฅ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ๋งŒ๋“œ๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • ์ถœ๋ ฅ ๋ ˆ์ด์–ด: ์ถœ๋ ฅ ๋ ˆ์ด์–ด๋Š” ์ˆจ๊ฒจ์ง„ ์ƒํƒœ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ตœ์ข… ์˜ˆ์ธก์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋งŽ์€ ๊ฒฝ์šฐ, RNNs๋Š” ์–ธ์–ด ๋ชจ๋ธ๋ง๊ณผ ๊ฐ™์€ ์ž‘์—…์— ์‚ฌ์šฉ๋˜๋ฉฐ, ์ด ๊ฒฝ์šฐ ์ถœ๋ ฅ์€ ์‹œํ€€์Šค์˜ ๋‹ค์Œ ๋‹จ์–ด์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ์ž…๋‹ˆ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด, ์–ธ์–ด ๋ชจ๋ธ์—์„œ RNN์€ โ€œThe cat sat on theโ€œ์™€ ๊ฐ™์€ ๋‹จ์–ด ์‹œํ€€์Šค๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ณ , ์ด์ „ ๋‹จ์–ด๋“ค์ด ์ œ๊ณตํ•˜๋Š” ๋งฅ๋ฝ์— ๋”ฐ๋ผ ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ฒฝ์šฐ โ€œmatโ€œ์ž…๋‹ˆ๋‹ค.
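A minimal sketch of that idea in PyTorch (vocabulary size, dimensions, and token ids below are all made up; a real model would first be trained on a corpus):

```python
import torch
from torch import nn

vocab_size, embed_dim, hidden_dim = 10, 8, 16

embed = nn.Embedding(vocab_size, embed_dim)          # token id -> vector
rnn   = nn.RNN(embed_dim, hidden_dim, batch_first=True)
head  = nn.Linear(hidden_dim, vocab_size)            # hidden state -> next-word scores

tokens = torch.tensor([[1, 4, 2, 7, 3]])             # e.g. "The cat sat on the" as ids
out, h_n = rnn(embed(tokens))                        # out: one hidden state per time step
probs = torch.softmax(head(out[:, -1, :]), dim=-1)   # distribution over the next word
print(out.shape, probs.shape)                        # (1, 5, 16) and (1, 10)
```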

Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU)

RNNs are particularly effective for tasks involving sequential data, such as language modeling, machine translation, and speech recognition. However, they can struggle with long-range dependencies due to problems like the vanishing gradient.

To address this, specialized architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) were developed. These architectures introduce gating mechanisms that control the flow of information, allowing them to capture long-range dependencies more effectively.

  • LSTM: LSTM networks use three gates (input gate, forget gate, and output gate) to regulate the flow of information in and out of the cell state, enabling them to remember or forget information over long sequences. The input gate controls how much new information to add based on the input and the previous hidden state, while the forget gate controls how much information to discard. Combining the input gate and the forget gate we get the new cell state. Finally, combining the new cell state with the input and the previous hidden state we also get the new hidden state.
  • GRU: GRU networks simplify the LSTM architecture by combining the input and forget gates into a single update gate, making them computationally more efficient while still capturing long-range dependencies.
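Both variants are available as drop-in layers in PyTorch; this sketch (with arbitrary dimensions) shows that the LSTM carries an extra cell state and, because of its extra gate, has more parameters than the GRU:

```python
import torch
from torch import nn

input_dim, hidden_dim = 8, 16
lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
gru  = nn.GRU(input_dim, hidden_dim, batch_first=True)

x = torch.randn(2, 5, input_dim)     # batch of 2 sequences, 5 time steps each
lstm_out, (h, c) = lstm(x)           # LSTM returns a hidden state AND a cell state
gru_out, h_gru = gru(x)              # GRU keeps only a hidden state

lstm_params = sum(p.numel() for p in lstm.parameters())
gru_params  = sum(p.numel() for p in gru.parameters())
print(lstm_params, gru_params)       # 1664 1248: 4 gated weight blocks vs 3
```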

LLMs (Large Language Models)

Large Language Models (LLMs) are a type of deep learning model specifically designed for natural language processing tasks. They are trained on vast amounts of text data and can generate human-like text, answer questions, translate languages, and perform various other language-related tasks. LLMs are typically based on transformer architectures, which use self-attention mechanisms to capture relationships between words in a sequence, allowing them to understand context and generate coherent text.

Transformer Architecture

The transformer architecture is the foundation of many LLMs. It consists of an encoder-decoder structure, where the encoder processes the input sequence and the decoder generates the output sequence. The key components of the transformer architecture include:

  • Self-Attention Mechanism: This mechanism allows the model to weigh the importance of different words in a sequence when generating representations. It computes attention scores based on the relationships between words, enabling the model to focus on relevant context.
  • Multi-Head Attention: This component allows the model to capture multiple relationships between words by using multiple attention heads, each focusing on different aspects of the input.
  • Positional Encoding: Since transformers have no built-in notion of word order, positional encodings are added to the input embeddings to provide information about the position of words in the sequence.

ํ™•์‚ฐ ๋ชจ๋ธ

ํ™•์‚ฐ ๋ชจ๋ธ์€ ํ™•์‚ฐ ๊ณผ์ •์„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ํ•™์Šตํ•˜๋Š” ์ƒ์„ฑ ๋ชจ๋ธ์˜ ํ•œ ์ข…๋ฅ˜์ž…๋‹ˆ๋‹ค. ์ด๋“ค์€ ์ด๋ฏธ์ง€ ์ƒ์„ฑ๊ณผ ๊ฐ™์€ ์ž‘์—…์— ํŠนํžˆ ํšจ๊ณผ์ ์ด๋ฉฐ ์ตœ๊ทผ ๋ช‡ ๋…„ ๋™์•ˆ ์ธ๊ธฐ๋ฅผ ์–ป๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ™•์‚ฐ ๋ชจ๋ธ์€ ๊ฐ„๋‹จํ•œ ๋…ธ์ด์ฆˆ ๋ถ„ํฌ๋ฅผ ๋ณต์žกํ•œ ๋ฐ์ดํ„ฐ ๋ถ„ํฌ๋กœ ์ ์ง„์ ์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค. ํ™•์‚ฐ ๋ชจ๋ธ์˜ ์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • ์ •๋ฐฉํ–ฅ ํ™•์‚ฐ ๊ณผ์ •: ์ด ๊ณผ์ •์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ ์ง„์ ์œผ๋กœ ๋…ธ์ด์ฆˆ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ฐ„๋‹จํ•œ ๋…ธ์ด์ฆˆ ๋ถ„ํฌ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ •๋ฐฉํ–ฅ ํ™•์‚ฐ ๊ณผ์ •์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๊ฐ ์ˆ˜์ค€์ด ๋ฐ์ดํ„ฐ์— ์ถ”๊ฐ€๋œ ํŠน์ • ์–‘์˜ ๋…ธ์ด์ฆˆ์— ํ•ด๋‹นํ•˜๋Š” ์ผ๋ จ์˜ ๋…ธ์ด์ฆˆ ์ˆ˜์ค€์œผ๋กœ ์ •์˜๋ฉ๋‹ˆ๋‹ค.
  • ์—ญ๋ฐฉํ–ฅ ํ™•์‚ฐ ๊ณผ์ •: ์ด ๊ณผ์ •์€ ์ •๋ฐฉํ–ฅ ํ™•์‚ฐ ๊ณผ์ •์„ ์—ญ์ „์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•์„ ํ•™์Šตํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์ ์ง„์ ์œผ๋กœ ๋””๋…ธ์ด์ฆˆํ•˜์—ฌ ๋ชฉํ‘œ ๋ถ„ํฌ์—์„œ ์ƒ˜ํ”Œ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์—ญ๋ฐฉํ–ฅ ํ™•์‚ฐ ๊ณผ์ •์€ ๋ชจ๋ธ์ด ๋…ธ์ด์ฆˆ ์ƒ˜ํ”Œ์—์„œ ์›๋ž˜ ๋ฐ์ดํ„ฐ๋ฅผ ์žฌ๊ตฌ์„ฑํ•˜๋„๋ก ์œ ๋„ํ•˜๋Š” ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ๋ฉ๋‹ˆ๋‹ค.

๋˜ํ•œ, ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ์—์„œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ํ™•์‚ฐ ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค:

  1. ํ…์ŠคํŠธ ์ธ์ฝ”๋”ฉ: ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋Š” ํ…์ŠคํŠธ ์ธ์ฝ”๋”(์˜ˆ: ๋ณ€ํ™˜๊ธฐ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ž ์žฌ ํ‘œํ˜„์œผ๋กœ ์ธ์ฝ”๋”ฉ๋ฉ๋‹ˆ๋‹ค. ์ด ํ‘œํ˜„์€ ํ…์ŠคํŠธ์˜ ์˜๋ฏธ๋ฅผ ์บก์ฒ˜ํ•ฉ๋‹ˆ๋‹ค.
  2. ๋…ธ์ด์ฆˆ ์ƒ˜ํ”Œ๋ง: ๊ฐ€์šฐ์‹œ์•ˆ ๋ถ„ํฌ์—์„œ ๋ฌด์ž‘์œ„ ๋…ธ์ด์ฆˆ ๋ฒกํ„ฐ๊ฐ€ ์ƒ˜ํ”Œ๋ง๋ฉ๋‹ˆ๋‹ค.
  3. ํ™•์‚ฐ ๋‹จ๊ณ„: ๋ชจ๋ธ์€ ์ผ๋ จ์˜ ํ™•์‚ฐ ๋‹จ๊ณ„๋ฅผ ์ ์šฉํ•˜์—ฌ ๋…ธ์ด์ฆˆ ๋ฒกํ„ฐ๋ฅผ ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ์— ํ•ด๋‹นํ•˜๋Š” ์ด๋ฏธ์ง€๋กœ ์ ์ง„์ ์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ๋‹จ๊ณ„๋Š” ์ด๋ฏธ์ง€๋ฅผ ๋””๋…ธ์ด์ฆˆํ•˜๊ธฐ ์œ„ํ•ด ํ•™์Šต๋œ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜๋Š” ๊ฒƒ์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.
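The forward process has a convenient closed form; this toy sketch (the linear schedule values are illustrative, not taken from any particular paper) shows how a noise schedule takes a sample from "almost clean" to "almost pure noise":

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # noise added at each step
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative fraction of signal kept

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating t steps."""
    a = alpha_bars[t]
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

x0 = torch.randn(16)                  # stand-in for a flattened image
noise = torch.randn(16)
x_early = q_sample(x0, 0, noise)      # nearly the original data
x_late  = q_sample(x0, T - 1, noise)  # nearly pure Gaussian noise

print(alpha_bars[0].item(), alpha_bars[-1].item())  # close to 1, close to 0
```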
