Malware Analysis

tip

Learn & practice AWS Hacking:HackTricks Training AWS Red Team Expert (ARTE)
Learn & practice GCP Hacking: HackTricks Training GCP Red Team Expert (GRTE)
Learn & practice Az Hacking: HackTricks Training Azure Red Team Expert (AzRTE)

Support HackTricks

Forensics CheatSheets

https://www.jaiminton.com/cheatsheet/DFIR/#

Online Services

Offline Antivirus and Detection Tools

Yara

Install

bash
sudo apt-get install -y yara

Prepare rules

Use this script to download and merge all the YARA malware rules from GitHub: https://gist.github.com/andreafortuna/29c6ea48adf3d45a979a78763cdc7ce9
Create the rules directory and run the script. It produces a file called malware_rules.yar containing all the merged YARA malware rules.

bash
wget https://gist.githubusercontent.com/andreafortuna/29c6ea48adf3d45a979a78763cdc7ce9/raw/4ec711d37f1b428b63bed1f786b26a0654aa2f31/malware_yara_rules.py
mkdir rules
python malware_yara_rules.py

Scan

bash
yara -w malware_rules.yar image  #Scan 1 file
yara -w malware_rules.yar folder #Scan the files in a folder (add -r to recurse into subfolders)
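When scanning a folder, the default output is one "rule path" pair per line, which gets noisy quickly. The following is a minimal sketch that groups such output by file. The rule names and paths in `sample` are made up for illustration, and the helper name `group_yara_matches` is hypothetical.

```python
from collections import defaultdict

def group_yara_matches(output: str) -> dict:
    """Group `yara` CLI output lines ("<rule> <path>") by scanned file."""
    matches = defaultdict(list)
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The rule name never contains spaces; the rest of the line is the path.
        rule, _, path = line.partition(" ")
        if path:
            matches[path].append(rule)
    return dict(matches)

# Hypothetical output, as if captured from: yara -w -r malware_rules.yar folder
sample = (
    "Webshell_PHP folder/a.php\n"
    "Suspicious_Base64 folder/a.php\n"
    "Miner_ELF folder/bin/x\n"
)

for path, rules in group_yara_matches(sample).items():
    print(path, "->", ", ".join(rules))
```

Files hit by several unrelated rules are usually the ones worth opening first.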

yarGen: Check for malware and create rules

You can use the tool yarGen to generate YARA rules from a binary. Check out these tutorials: Part 1, Part 2, Part 3

bash
python3 yarGen.py --update
python3 yarGen.py --excludegood -m ../../mals/

ClamAV

Install

bash
sudo apt-get install -y clamav

Scan

bash
sudo freshclam      #Update rules
clamscan filepath   #Scan 1 file
clamscan folderpath #Scan the whole folder

Capa

Capa detects potentially malicious capabilities in executables (PE, ELF, .NET). It reports things such as ATT&CK tactics and suspicious capabilities, for example:

  • check for OutputDebugString error
  • run as a service
  • create process

Get it in the GitHub repo.

IOCs

IOC stands for Indicator Of Compromise. An IOC is a set of conditions that identify some potentially unwanted software or confirmed malware. Blue Teams use these definitions to search for such malicious files in their systems and networks.
Sharing these definitions is very useful: when malware is identified on a computer and an IOC is created for it, other Blue Teams can use that IOC to identify the same malware faster.

A tool to create or modify IOCs is IOC Editor.
You can use tools such as Redline to search for defined IOCs in a device.

Loki

Loki is a scanner for Simple Indicators of Compromise.
Detection is based on four detection methods:

1. File Name IOC
   Regex match on full file path/name

2. Yara Rule Check
   Yara signature matches on file data and process memory

3. Hash Check
   Compares known malicious hashes (MD5, SHA1, SHA256) with scanned files

4. C2 Back Connect Check
   Compares process connection endpoints with C2 IOCs (new since version v.10)
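
To make the first and third detection methods concrete, here is a minimal sketch of a file name regex check combined with a hash check. It is not Loki's implementation: the IOC values, the regex, and the function name `check_file` are hypothetical and only illustrate the idea.

```python
import hashlib
import re

# Hypothetical IOC sets for illustration; real scanners load these from signature files.
FILENAME_IOCS = [re.compile(r"[/\\]tmp[/\\]\.[a-z0-9]{6,}\.sh$")]
HASH_IOCS = {hashlib.sha256(b"malicious sample bytes").hexdigest()}

def check_file(path: str, data: bytes) -> list:
    """Return the names of the IOC checks that fire for one file."""
    findings = []
    # File name IOC: regex match on the full path
    if any(rx.search(path) for rx in FILENAME_IOCS):
        findings.append("filename_ioc")
    # Hash check: compare MD5/SHA1/SHA256 of the content with known-bad hashes
    digests = {hashlib.new(algo, data).hexdigest() for algo in ("md5", "sha1", "sha256")}
    if digests & HASH_IOCS:
        findings.append("hash_ioc")
    return findings

print(check_file("/tmp/.abc123.sh", b"#!/bin/sh"))
print(check_file("/home/user/notes.txt", b"malicious sample bytes"))
```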

Linux Malware Detect

Linux Malware Detect (LMD) is a malware scanner for Linux released under the GNU GPLv2 license, that is designed around the threats faced in shared hosted environments. It uses threat data from network edge intrusion detection systems to extract malware that is actively being used in attacks and generates signatures for detection. In addition, threat data is also derived from user submissions with the LMD checkout feature and malware community resources.

rkhunter

Tools like rkhunter can be used to check the filesystem for possible rootkits and malware.

bash
sudo ./rkhunter --check -r / -l /tmp/rkhunter.log [--report-warnings-only] [--skip-keypress]

FLOSS

FLOSS is a tool that will try to find obfuscated strings inside executables using different techniques.

PEpper

PEpper checks some basic stuff inside the executable (binary data, entropy, URLs and IPs, some yara rules).

PEstudio

PEstudio is a tool that extracts information from Windows executables, such as imports, exports, and headers. It also checks VirusTotal and flags potential ATT&CK techniques.

Detect It Easy(DiE)

DiE is a tool to detect whether a file is encrypted and to identify packers.

NeoPI

NeoPI is a Python script that uses a variety of statistical methods to detect obfuscated and encrypted content within text/script files. The intended purpose of NeoPI is to aid in the detection of hidden web shell code.
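
Entropy is one of the statistical measures this kind of tool relies on. The following is a minimal sketch of byte-level Shannon entropy, not NeoPI's own code. The sample inputs are made up, and `bytes(range(256))` is only a stand-in for encrypted or compressed content.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

plain = b"<?php echo 'hello world'; ?>" * 8
packed = bytes(range(256))  # stand-in for encrypted/compressed content

print(round(shannon_entropy(plain), 2), round(shannon_entropy(packed), 2))
```

Plain source code tends to score well below the 8 bits/byte maximum, so script files that approach it deserve a closer look.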

php-malware-finder

PHP-malware-finder does its very best to detect obfuscated/dodgy code as well as files using PHP functions often used in malware/webshells.

Apple Binary Signatures

When analysing a malware sample, always check the signature of the binary, as the developer who signed it may already be linked to malware.

bash
#Get signer
codesign -vv -d /bin/ls 2>&1 | grep -E "Authority|TeamIdentifier"

#Check if the app’s contents have been modified
codesign --verify --verbose /Applications/Safari.app

#Check if the signature is valid
spctl --assess --verbose /Applications/Safari.app

Detection Techniques

File Stacking

If you know that a folder containing the files of a web server was last legitimately updated on a certain date, check the creation and modification dates of all the files in the web server. If any date is suspicious, inspect that file.
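
A minimal sketch of this check, assuming you know the date of the last legitimate update: it walks a directory tree and lists every file modified after that cutoff. The function name `files_modified_after` is hypothetical. Timestamps can be forged, so treat the result as a lead rather than proof.

```python
import os
from datetime import datetime, timezone

def files_modified_after(root: str, cutoff: datetime) -> list:
    """Return (path, mtime) pairs for files under root modified after cutoff, oldest first."""
    suspicious = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                mtime = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
            except OSError:
                continue  # broken symlink or unreadable entry
            if mtime > cutoff:
                suspicious.append((path, mtime))
    return sorted(suspicious, key=lambda item: item[1])
```

For example, `files_modified_after("/var/www/html", datetime(2024, 1, 15, tzinfo=timezone.utc))` would list everything touched after a deployment on that (assumed) date.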

Baselines

If the files of a folder shouldn't have been modified, you can calculate the hash of the original files of the folder and compare them with the current ones. Anything modified will be suspicious.
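
The comparison can be sketched in a few lines, assuming a baseline was recorded while the folder was known to be clean. The helper names `hash_tree` and `diff_baseline` are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import os

def hash_tree(root: str) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    digests = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digests[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return digests

def diff_baseline(baseline: dict, current: dict) -> dict:
    """Report files that were added, removed, or modified since the baseline."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in baseline.keys() & current.keys()
                           if baseline[p] != current[p]),
    }
```

Store the output of `hash_tree()` somewhere the attacker cannot reach (for example as JSON on another host), then later run `diff_baseline(saved, hash_tree(root))`.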

Statistical Analysis

When information is saved in logs, you can compute statistics such as how many times each file of a web server was accessed, as a web shell might be among the most accessed.
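
As a minimal sketch, the per-file request count can be derived from an access log in the common/combined format. The regex, the function name `count_requests`, and the sample lines (documentation IP addresses, made-up paths) are assumptions for illustration.

```python
import re
from collections import Counter

# Matches the request part of a common/combined log line, e.g. "GET /index.php HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT) (\S+) HTTP/[\d.]+"')

def count_requests(log_lines) -> Counter:
    """Count requests per path, ignoring the query string."""
    counts = Counter()
    for line in log_lines:
        m = REQUEST_RE.search(line)
        if m:
            counts[m.group(1).split("?", 1)[0]] += 1
    return counts

sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /index.php HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "POST /uploads/img.php?cmd=id HTTP/1.1" 200 31',
    '198.51.100.7 - - [10/Oct/2023:14:02:15 +0000] "POST /uploads/img.php?cmd=whoami HTTP/1.1" 200 9',
    '198.51.100.7 - - [10/Oct/2023:14:02:20 +0000] "POST /uploads/img.php?cmd=ls HTTP/1.1" 200 120',
    '203.0.113.5 - - [10/Oct/2023:14:05:00 +0000] "GET /about.html HTTP/1.1" 200 512',
]

for path, hits in count_requests(sample_log).most_common():
    print(hits, path)
```

Which end of the distribution is interesting depends on the site, so it is worth reviewing both the most and the least requested files.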


Android in-app native telemetry (no root)

On Android, you can instrument native code inside the target app process by preloading a tiny logger library before other JNI libs initialize. This gives early visibility into native behavior without system-wide hooks or root. A popular approach is SoTap: drop libsotap.so for the right ABI into the APK and inject a System.loadLibrary("sotap") call early (e.g., static initializer or Application.onCreate), then collect logs from internal/external paths or Logcat fallback.

See the Android native reversing page for setup details and log paths:

Reversing Native Libraries


Android/JNI native string deobfuscation with angr + Ghidra

Some Android malware and RASP-protected apps hide JNI method names and signatures by decoding them at runtime before calling RegisterNatives. When Frida/ptrace instrumentation is killed by anti-debug, you can still recover the plaintext offline by executing the in-binary decoder with angr and then pushing results back into Ghidra as comments.

Key idea: treat the decoder inside the .so as a callable function, execute it on the obfuscated byte blobs in .rodata, and concretize the output bytes up to the first \x00 (C-string terminator). Keep angr and Ghidra using the same image base to avoid address mismatches.

Workflow overview

  • Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and RegisterNatives setup.
  • Run angr (CPython3) to execute the decoder for each target string and dump results.
  • Annotate in Ghidra: auto-comment decoded strings at each call site for fast JNI reconstruction.

Ghidra triage (JNI_OnLoad pattern)

  • Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures.

  • Typical JNINativeMethod per Oracle docs:

    typedef struct {
        char *name;      // e.g., "nativeFoo"
        char *signature; // e.g., "()V", "()[B"
        void *fnPtr;     // native implementation address
    } JNINativeMethod;
    
  • Look for calls to RegisterNatives. If the library constructs the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that is an ideal target for offline execution.
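
To illustrate the structure layout, here is a minimal sketch that unpacks a raw JNINativeMethod array into its three pointers per entry. It assumes a little-endian target and a table whose bytes you have already extracted. When the library builds the table at runtime, the name and signature pointers refer to decoded buffers rather than static strings. The function name and the addresses in the example are hypothetical.

```python
import struct
from collections import namedtuple

JNINativeMethod = namedtuple("JNINativeMethod", "name_ptr signature_ptr fn_ptr")

def parse_jni_native_methods(blob: bytes, count: int, ptr_size: int = 8) -> list:
    """Unpack `count` JNINativeMethod entries (three pointers each) from raw bytes."""
    fmt = "<" + ("Q" if ptr_size == 8 else "I") * 3  # little-endian, 64- or 32-bit pointers
    entry_size = struct.calcsize(fmt)
    return [JNINativeMethod(*struct.unpack_from(fmt, blob, i * entry_size))
            for i in range(count)]

# Hypothetical two-entry table for a 64-bit library
table = struct.pack("<QQQ", 0x1000, 0x1010, 0x2000) + struct.pack("<QQQ", 0x1020, 0x1030, 0x2100)
for entry in parse_jni_native_methods(table, 2):
    print(hex(entry.name_ptr), hex(entry.signature_ptr), hex(entry.fn_ptr))
```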

angr setup (execute the decoder offline)

  • Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small.

    import angr, json
    
    project = angr.Project(
        '/path/to/libtarget.so',
        load_options={'main_opts': {'base_addr': 0x00100000}},
        auto_load_libs=False,
    )
    
    ENCODING_FUNC_ADDR = 0x00100e10  # decoder function discovered in Ghidra
    
    def decode_string(enc_addr, length):
        # fresh blank state per evaluation
        st = project.factory.blank_state()
        outbuf = st.heap.allocate(length)
        call = project.factory.callable(ENCODING_FUNC_ADDR, base_state=st)
        ret_ptr = call(enc_addr, outbuf, length)  # returns outbuf pointer
        rs = call.result_state
        raw = rs.solver.eval(rs.memory.load(ret_ptr, length), cast_to=bytes)
        return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore')
    
    # Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B
    print(decode_string(0x00100933, 5))
    
  • At scale, build a static map of call sites to the decoder’s arguments (encoded_ptr, size). Wrappers may hide arguments, so you may create this mapping manually from Ghidra xrefs if API recovery is noisy.

    # call_site -> (encoded_addr, size)
    call_site_args_map = {
        0x00100f8c: (0x00100b81, 0x41),
        0x00100fa8: (0x00100bca, 0x04),
        0x00100fcc: (0x001007a0, 0x41),
        0x00100fe8: (0x00100933, 0x05),
        0x0010100c: (0x00100c62, 0x41),
        0x00101028: (0x00100c15, 0x16),
        0x00101050: (0x00100a49, 0x101),
        0x00100cf4: (0x00100821, 0x11),
        0x00101170: (0x00100940, 0x101),
        0x001011cc: (0x0010084e, 0x13),
        0x00101334: (0x001007e9, 0x0f),
        0x00101478: (0x0010087d, 0x15),
        0x001014f8: (0x00100800, 0x19),
        0x001015e8: (0x001008e6, 0x27),
        0x0010160c: (0x00100c33, 0x13),
    }
    
    decoded_map = { hex(cs): decode_string(enc, sz)
                    for cs, (enc, sz) in call_site_args_map.items() }
    
    print(json.dumps(decoded_map, indent=2))
    with open('decoded_strings.json', 'w') as f:
        json.dump(decoded_map, f, indent=2)
    

Annotate call sites in Ghidra

Option A: Jython-only comment writer (use a pre-computed JSON)

  • Since angr requires CPython3, keep deobfuscation and annotation separated. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write PRE_COMMENTs at each call site (and include the caller function name for context):

    #@category Android/Deobfuscation
    # Jython in Ghidra 10/11
    import json
    from ghidra.program.model.listing import CodeUnit
    
    # Ask for the JSON produced by the angr script
    f = askFile('Select decoded_strings.json', 'Load')
    mapping = json.load(open(f.absolutePath, 'r'))  # keys as hex strings
    
    fm = currentProgram.getFunctionManager()
    rm = currentProgram.getReferenceManager()
    
    # Replace with your decoder address to locate call-xrefs (optional)
    ENCODING_FUNC_ADDR = 0x00100e10
    enc_addr = toAddr(ENCODING_FUNC_ADDR)
    
    callsite_to_fn = {}
    for ref in rm.getReferencesTo(enc_addr):
        if ref.getReferenceType().isCall():
            from_addr = ref.getFromAddress()
            fn = fm.getFunctionContaining(from_addr)
            if fn:
                callsite_to_fn[from_addr.getOffset()] = fn.getName()
    
    # Write comments from JSON
    for k_hex, s in mapping.items():
        cs = int(k_hex, 16)
        site = toAddr(cs)
        caller = callsite_to_fn.get(cs, None)
        text = s if caller is None else '%s @ %s' % (s, caller)
        currentProgram.getListing().setComment(site, CodeUnit.PRE_COMMENT, text)
    print('[+] Annotated %d call sites' % len(mapping))
    

Option B: Single CPython script via pyhidra/ghidra_bridge

  • Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process running angr. This allows calling decode_string() and immediately setting PRE_COMMENTs without an intermediate file. The logic mirrors the Jython script: build callsite→function map via ReferenceManager, decode with angr, and set comments.

Why this works and when to use it

  • Offline execution sidesteps RASP/anti-debug: no ptrace, no Frida hooks required to recover strings.
  • Keeping Ghidra and angr base_addr aligned (e.g., 0x00100000) ensures that function/data addresses match across tools.
  • Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00.

Notes and pitfalls

  • Respect the target ABI/calling convention. project.factory.callable picks one based on arch; if arguments look shifted, specify cc explicitly.
  • If the decoder expects zeroed output buffers, initialize outbuf with zeros in the state before the call.
  • For position-independent Android .so, always supply base_addr so addresses in angr match those seen in Ghidra.
  • Use currentProgram.getReferenceManager() to enumerate call-xrefs even if the app wraps the decoder behind thin stubs.

For angr basics, see: angr basics


Deobfuscating Dynamic Control-Flow (JMP/CALL RAX Dispatchers)

Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a jmp rax or call rax. A small dispatcher (typically nine instructions) sets the final target depending on the CPU ZF/CF flags, completely breaking static CFG recovery.

The technique, showcased by the SLOW#TEMPEST loader, can be defeated with the workflow below, which relies only on IDAPython and the Unicorn CPU emulator.

1. Locate every indirect jump / call

python
import idautils, idc

# Iterate over every instruction of the function under the cursor
for ea in idautils.FuncItems(idc.here()):
    mnem = idc.print_insn_mnem(ea)
    if mnem in ("jmp", "call") and idc.print_operand(ea, 0) == "rax":
        print(f"[+] Dispatcher found @ {ea:X}")

2. Extract the dispatcher byte-code

python
import idc

def get_dispatcher_start(jmp_ea, count=9):
    # Walk back `count` instructions from the indirect jmp/call
    s = jmp_ea
    for _ in range(count):
        s = idc.prev_head(s, 0)
    return s

# jmp_ea: address of a `jmp rax` / `call rax` found in step 1
start = get_dispatcher_start(jmp_ea)
size  = jmp_ea + idc.get_item_size(jmp_ea) - start
code  = idc.get_bytes(start, size)
with open(f"{start:X}.bin", "wb") as f:
    f.write(code)

3. Emulate it twice with Unicorn

python
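from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
from unicorn.x86_const import UC_X86_REG_EFLAGS, UC_X86_REG_RAX

def run(code, zf=0, cf=0, tail_len=2):
    # tail_len: size of the trailing `jmp rax` / `call rax` (FF E0 / FF D0 = 2 bytes).
    # Emulation stops before it; executing it would jump to unmapped memory.
    BASE = 0x1000
    mu = Uc(UC_ARCH_X86, UC_MODE_64)
    mu.mem_map(BASE, 0x1000)
    mu.mem_write(BASE, code)
    # Bit 1 of the flags register is always set; ZF is bit 6, CF is bit 0
    mu.reg_write(UC_X86_REG_EFLAGS, 0x2 | (zf << 6) | cf)
    mu.reg_write(UC_X86_REG_RAX, 0)
    mu.emu_start(BASE, BASE + len(code) - tail_len)
    return mu.reg_read(UC_X86_REG_RAX)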

Run run(code,0,0) and run(code,1,1) to obtain the false and true branch targets.

4. Patch back a direct jump / call

python
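import struct, ida_bytes

def patch_direct(ea, target, is_call=False):
    # Writes 5 bytes at ea; a bare `jmp rax` / `call rax` is only 2 bytes,
    # so make sure the 3 bytes that follow are not needed.
    op   = 0xE8 if is_call else 0xE9           # CALL rel32 or JMP rel32
    disp = (target - (ea + 5)) & 0xFFFFFFFF     # relative to the next instruction
    ida_bytes.patch_bytes(ea, bytes([op]) + struct.pack('<I', disp))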
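
The displacement arithmetic can be checked outside IDA. The sketch below builds the same 5-byte encoding in plain Python. The helper name `encode_rel32` and the addresses are hypothetical.

```python
import struct

def encode_rel32(ea: int, target: int, is_call: bool = False) -> bytes:
    """Encode JMP rel32 (E9) or CALL rel32 (E8) placed at `ea` and going to `target`."""
    opcode = 0xE8 if is_call else 0xE9
    # rel32 is measured from the end of the 5-byte instruction; mask to 32 bits
    # so that backward (negative) displacements wrap into two's complement.
    disp = (target - (ea + 5)) & 0xFFFFFFFF
    return bytes([opcode]) + struct.pack("<I", disp)

print(encode_rel32(0x1000, 0x1010).hex())                # forward jump
print(encode_rel32(0x1010, 0x1000, is_call=True).hex())  # backward call
```

A forward jump from 0x1000 to 0x1010 has displacement 0x1010 - 0x1005 = 0x0B, while the backward call wraps to 0xFFFFFFEB.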

After patching, force IDA to re-analyse the function so the full CFG and Hex-Rays output are restored:

python
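import ida_auto, ida_funcs
# reanalyze_function expects a func_t, not an address
ida_funcs.reanalyze_function(ida_funcs.get_func(ea))
ida_auto.auto_wait()  # block until auto-analysis has finished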

5. Label indirect API calls

Once the real destination of every call rax is known you can tell IDA what it is so parameter types & variable names are recovered automatically:

python
idc.set_callee_name(call_ea, resolved_addr, 0)  # IDA 8.3+

Practical benefits

  • Restores the real CFG → decompilation goes from 10 lines to thousands.
  • Enables string-cross-reference & xrefs, making behaviour reconstruction trivial.
  • Scripts are reusable: drop them into any loader protected by the same trick.

AdaptixC2: Configuration Extraction and TTPs

See the dedicated page:

Adaptixc2 Config Extraction And Ttps
