Advanced Static Analysis and Reverse Engineering

Unit 8 - Advanced Static Analysis and Reverse Engineering


Static Analysis

Basic Static Analysis

  • Trying to understand a program (or in our case, a piece of malware), without actually running it
  • Looking at the code and structure of the program

Examples of this would be:

  • Analysing the sample with AVs
  • Hash signatures
  • Strings
  • Function names

Analysing the sample with AVs

  • May confirm maliciousness
  • Signature-based vs. Heuristic
  • Lots of techniques to counter detection

Hash signatures

  • Known malicious samples
  • Sharing
  • Label samples
  • Reduce size of datasets for experiments
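A hash signature can be computed in a few lines. The following is a minimal sketch using Python's standard hashlib module; the function name file_hashes and the chunk size are illustrative choices, not part of any particular tool.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5, SHA-1 and SHA-256 hex digests of the file at `path`."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so that large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Calling file_hashes on a sample (for instance a hypothetical sample.bin) gives digests that can be looked up or shared. Changing a single byte of the file changes every digest, which is why packed or modified variants no longer match known signatures.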

Strings

  • IP addresses/URLs
  • Function names
  • DLLs
  • Error messages
  • Packer signatures
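To make the idea concrete, here is a rough Python approximation of what a 'strings'-style tool does for ASCII text. The function name extract_strings, the minimum length of 4 and the sample bytes are assumptions made for illustration; real tools also handle other encodings such as UTF-16.

```python
import re


def extract_strings(data, min_length=4):
    """Return printable ASCII runs of at least `min_length` bytes found in `data`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes standing in for a sample: readable text surrounded by binary noise.
sample = b"\x00\x01http://example.com/a\x90\x90KERNEL32.dll\x00ab\x00"
print(extract_strings(sample))   # ['http://example.com/a', 'KERNEL32.dll']
```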

Function names

  • In Windows, each word starts with an uppercase letter, e.g. SetLayout
  • Imports
    • Import functions from another program
  • Exports
    • Conversely, export functions to another program
  • Link Libraries
    • Static linking: The library code is copied into the program's executable
    • Runtime linking: The program loads a library only when one of its functions is needed
    • Dynamic linking: The referenced libraries are loaded as soon as the program starts

Obfuscation

But what happens if our malware was packed/obfuscated?

  • AVs may not tag it as malicious
  • Hash signatures will differ
  • Not all strings will be shown
  • Not all function names will be displayed, but some functions can hint that a sample is packed, e.g. GetProcAddress, VirtualAlloc

Portable Executable (PE) File Format

Basic structure of a Windows executable file

  • PE header (metadata)
  • Sections:
    • .text
    • .idata
    • .data
    • .rdata
    • .edata
    • .pdata
    • .rsrc
    • .reloc
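The section names can be read directly from the headers. The sketch below uses only Python's struct module and assumes a well-formed file; the function name list_sections is hypothetical, and dedicated PE tools perform many more checks than this.

```python
import struct


def list_sections(data):
    """Return the section names found in the PE image held in `data` (bytes)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature: not a DOS/PE executable")
    # e_lfanew, at offset 0x3C of the DOS header, points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (_machine, section_count, _timestamp, _symtab, _symcount,
     optional_size, _flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    # The section table follows the 20-byte file header and the optional header.
    table = pe_offset + 4 + 20 + optional_size
    names = []
    for index in range(section_count):
        # Each section header is 40 bytes; the first 8 hold the NUL-padded name.
        start = table + index * 40
        raw = data[start:start + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names
```

On a real sample one would pass in the bytes of the file, e.g. list_sections(open(path, "rb").read()), and compare the result against the typical names listed above.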

Why do you need to know about it?

  • Understand the inner workings of what you are working with
  • Spot uncommon sections which could raise suspicion
  • Possibly identify packers/obfuscation methods
  • Manual unpacking
  • Modifying an executable
  • Research

What do you need to remember?

  • What the PE file format is and its structure
  • What the PE header contains
  • What some of the typical sections are
  • What a malware analyst could find by looking at the PE header and sections of an executable
  • Examples of related tools

Tools

  • www.virustotal.com
  • 'strings'
  • md5sum/sha1sum/sha1deep/WinMD5/etc.
  • PEiD: detecting packed files
  • PEView: examining PE Files
  • PE Browse: browsing a PE header
  • PE Explorer: browsing a PE header
  • ImpREC: rebuilding the Import Table
  • LordPE: dumping an executable from memory
  • Dependencies: exploring DLLs and functions imported by a piece of malware.

Reverse Engineering

Reversing

Reverse engineering is a process where an engineered artefact (such as a car, a jet engine, or a software program) is deconstructed in a way that reveals its innermost details, such as its design and architecture.

  • i.e. dissecting a product to understand how it works
  • An advanced static analysis technique
  • Can vary from easy... to very challenging
  • Requires a lot of patience

Why learn Reverse Engineering?

  • Strengthen understanding (applies to literally everything)
  • Penetration-Testing
  • Problem solving skills
  • Solving legacy system issues
  • Benefits of looking at code written by others (think open source)

Why apply it to Malware Analysis?

  • Basic static analysis not always sufficient
  • Complex problems require complex solutions

Key Concepts

Machine code

  • Result of compiling a high-level language
  • Raw binary data

Low-level languages

  • Machine code is too much hassle
  • Hence, we use assembly
  • Uses "mnemonics": human-readable names for opcodes
  • Different instructions depending on platform
  • Different syntax... e.g. Intel vs. AT&T
  • Different processors require different approaches

High-level languages

  • E.g. C, C++
  • Easy to use and understand

Interpreted languages

  • E.g. C#, Java
  • Code → intermediate language (bytecode) → machine code

Processor and registers

Registers are small data storage units inside the processor

  • Quick access
  • Different registers serve different purposes

Description                    32-bit register  16-bit register  8-bit registers
Extended Accumulator Register  EAX              AX               AH/AL
Extended Base Register         EBX              BX               BH/BL
Extended Counter Register      ECX              CX               CH/CL
Extended Data Register         EDX              DX               DH/DL
Extended Base Pointer          EBP              BP               -
Extended Stack Pointer         ESP              SP               -
Extended Source Index          ESI              SI               -
Extended Destination Index     EDI              DI               -

Register(s)             Description
EIP                     Extended Instruction Pointer
EFLAGS                  Represent the outcome of computations and control the operation of the CPU
CS, DS, ES, FS, GS, SS  Segment registers (used to describe different segments of memory)
Fetch and execute cycle

Main memory (RAM)

  • Data: contains values
  • Code: instructions to be fetched by the CPU to execute
  • Heap: dynamic memory, allocated and freed while the program runs
  • Stack: local variables and function parameters; also used to control program flow

What could happen if an attacker corrupted the instruction pointer?

  • Modify it so they could then execute their own code in memory
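The effect of a corrupted instruction pointer can be illustrated with a toy fetch-and-execute loop. This is a deliberately simplified model and not how a real CPU is implemented: the three mnemonics, the single accumulator and the function name run are all assumptions made for the example.

```python
def run(program, start=0, max_steps=100):
    """Toy fetch-execute loop: `program` is a list of (mnemonic, operand) pairs."""
    ip = start            # stand-in for the instruction pointer
    acc = 0               # single accumulator register
    for _ in range(max_steps):
        mnemonic, operand = program[ip]   # fetch
        ip += 1                           # point at the next instruction
        if mnemonic == "add":             # execute
            acc += operand
        elif mnemonic == "jmp":
            ip = operand                  # a jump is just a write to ip
        elif mnemonic == "halt":
            return acc
    raise RuntimeError("step limit reached")


program = [
    ("add", 1),      # 0
    ("halt", None),  # 1: normal end of the program
    ("add", 100),    # 2: never reached in normal execution
    ("halt", None),  # 3
]
print(run(program))            # 1
print(run(program, start=2))   # 100: same memory, different instruction pointer
```

The second call stands in for an attacker who has overwritten the instruction pointer: nothing in memory changed, yet different instructions were executed.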

Assembly Primers

Instructions

  • Mnemonics, e.g. ADD, MOV, SUB, XOR, INC
  • May take zero or more operands, e.g. RET (usually none), POP (one)

Operands

  • Registers
  • Values
  • Memory addresses
  • x86 → little-endian
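Little-endian means that the least significant byte of a multi-byte value is stored at the lowest address. A quick way to see this is with Python's struct module, as sketched here; the value 0x4037C4 is only an example.

```python
import struct

value = 0x4037C4
stored = struct.pack("<I", value)            # 32-bit unsigned, little-endian
print(stored.hex())                          # c4374000
print(hex(struct.unpack("<I", stored)[0]))   # 0x4037c4
print(struct.pack(">I", value).hex())        # 004037c4 (big-endian, for contrast)
```

This is why addresses and constants appear "backwards" when looking at a raw hex dump of x86 code or data.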

Syntax

  • A source and a destination
  • Intel (<mnemonic> <dst>,<src>)
  • AT&T (<mnemonic> <src>,<dst>)

Simple instructions

Instruction           Description
mov eax, ebx          Copies the contents of EBX into the EAX register
mov eax, 0x42         Copies the value 0x42 into the EAX register
mov eax, [0x4037C4]   Copies the 4 bytes at the memory location 0x4037C4 into the EAX register
mov eax, [ebx]        Copies the 4 bytes at the memory location specified by the EBX register into the EAX register
mov eax, [ebx+esi*4]  Copies the 4 bytes at the memory location specified by the result of the equation ebx+esi*4 into the EAX register
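The ebx+esi*4 form is typical of array indexing: EBX holds the base address, ESI the index, and 4 is the size of each element in bytes. The sketch below models it on a made-up block of memory; the helper name load_dword and the values are hypothetical.

```python
import struct

# Hypothetical flat memory: four 32-bit little-endian values starting at address 0.
memory = struct.pack("<4I", 0x11111111, 0x22222222, 0x33333333, 0x44444444)


def load_dword(memory, address):
    """Model `mov eax, [address]`: read 4 bytes and decode them as little-endian."""
    return struct.unpack_from("<I", memory, address)[0]


ebx, esi = 0, 2                           # base address and element index
eax = load_dword(memory, ebx + esi * 4)   # effective address = 0 + 2*4 = 8
print(hex(eax))                           # 0x33333333
```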

Arithmetic

Instruction    Description
sub eax, 0x10  Subtracts 0x10 from EAX
add eax, ebx   Adds EBX to EAX and stores the result in EAX
inc edx        Increments EDX by 1
dec edx        Decrements EDX by 1
mul ebx        Multiplies EAX by EBX and stores the 64-bit result in EDX:EAX (mul takes a register or memory operand, not an immediate value)
div ebx        Divides EDX:EAX by EBX and stores the quotient in EAX and the remainder in EDX (div likewise takes a register or memory operand)
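The way mul and div use the EDX:EAX register pair can be modelled as follows. This sketch covers only the unsigned 32-bit forms; the function names mul32 and div32 are illustrative, and the flags these instructions affect are ignored.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Model unsigned `mul`: return (EDX, EAX) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx, eax, operand):
    """Model unsigned `div`: return (EAX, EDX) = (quotient, remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand & MASK32)
    if quotient > MASK32:
        # On real hardware this case raises a divide error exception.
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder


print([hex(v) for v in mul32(0x80000000, 0x50)])   # ['0x28', '0x0']
print(div32(0, 1000, 0x75))                        # (8, 64)
```

The first call shows why the upper half is needed: the product no longer fits in 32 bits, so the high part ends up in EDX.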
Instruction     Description
xor eax, eax    Clears the EAX register
or eax, 0x7575  Performs the logical OR operation on EAX with 0x7575
mov eax, 0xA
shl eax, 2      Shifts the EAX register to the left 2 bits; these two instructions result in EAX = 0x28, because 1010 (0xA in binary) shifted 2 bits left is 101000 (0x28)
mov bl, 0xA
ror bl, 2       Rotates the BL register to the right 2 bits; these two instructions result in BL = 10000010, because 00001010 rotated 2 bits right is 10000010
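The shift and rotate results above can be checked with a short Python model. The helper names shl and ror and the bits parameter are assumptions for the example; the difference to note is that a shift discards the bits that fall off, while a rotate feeds them back in at the other end.

```python
def shl(value, count, bits=32):
    """Shift left, discarding bits that fall off the top of the register."""
    return (value << count) & ((1 << bits) - 1)


def ror(value, count, bits=8):
    """Rotate right: bits leaving the low end re-enter at the high end."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value >> count) | (value << (bits - count))) & mask


print(hex(shl(0xA, 2)))              # 0x28
print(format(ror(0xA, 2), "08b"))    # 10000010
```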

Conditionals

cmp dst, src - CMP performs the same subtraction as SUB (dst - src) but discards the result, so the operands are left unchanged and only the flags are updated. The zero flag (ZF) and carry flag (CF) are the ones most often checked afterwards.

cmp dst, src  ZF  CF
dst = src     1   0
dst < src     0   1
dst > src     0   0
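The table can be reproduced with a small model of cmp for unsigned 32-bit operands. The function name cmp_flags is made up for this sketch, and only ZF and CF are modelled; the real instruction updates further flags as well.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(dst, src):
    """Return (ZF, CF) as set by `cmp dst, src` on unsigned 32-bit operands."""
    dst &= MASK32
    src &= MASK32
    result = (dst - src) & MASK32     # the subtraction is computed, then discarded
    zf = int(result == 0)             # zero flag: operands were equal
    cf = int(dst < src)               # carry flag: an unsigned borrow occurred
    return zf, cf


for dst, src in [(5, 5), (3, 5), (7, 5)]:
    print(dst, src, cmp_flags(dst, src))
# 5 5 (1, 0)
# 3 5 (0, 1)
# 7 5 (0, 0)
```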

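test dst, src - TEST performs the same bitwise AND as the AND instruction but discards the result, so the operands are left unchanged and only the flags are updated. ZF is set to 1 if the result is zero (no bit is set in both operands), otherwise to 0. A common idiom is test eax, eax, which sets ZF when EAX is 0.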
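As a small illustration, the zero flag after test can be modelled as a bitwise AND followed by a comparison with zero. The function name zf_after_test is hypothetical and only ZF is modelled here.

```python
def zf_after_test(dst, src):
    """Return ZF as set by `test dst, src`: 1 when the bitwise AND is zero."""
    result = dst & src & 0xFFFFFFFF
    return int(result == 0)


print(zf_after_test(0, 0))            # 1: like `test eax, eax` with EAX = 0
print(zf_after_test(0x1234, 0x1234))  # 0: EAX is non-zero
print(zf_after_test(0b1000, 0b0100))  # 1: no bit is set in both operands
```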

Branching

Instruction  Description
jz loc       Jump to specified location if ZF = 1
jnz loc      Jump to specified location if ZF = 0
je loc       Same as jz, but commonly used after a cmp instruction. Jump will occur if the destination operand equals the source operand
jne loc      Same as jnz, but commonly used after a cmp instruction. Jump will occur if the destination operand is not equal to the source operand
jg loc       Performs signed comparison jump after a cmp if the destination operand is greater than the source operand
jge loc      Performs signed comparison jump after a cmp if the destination operand is greater than or equal to the source operand
ja loc       Same as jg, but an unsigned comparison is performed
jae loc      Same as jge, but an unsigned comparison is performed
jl loc       Performs signed comparison jump after a cmp if the destination operand is less than the source operand
jle loc      Performs signed comparison jump after a cmp if the destination operand is less than or equal to the source operand
jb loc       Same as jl, but an unsigned comparison is performed
jbe loc      Same as jle, but an unsigned comparison is performed
jo loc       Jump if the previous instruction set the overflow flag (OF = 1)
js loc       Jump if the sign flag is set (SF = 1)
jecxz loc    Jump to location if ECX = 0
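The difference between the signed jumps (jg, jl, ...) and their unsigned counterparts (ja, jb, ...) only shows when the top bit of an operand is set. The sketch below models the outcome of the comparison rather than the individual flags; the helper names are made up for illustration.

```python
def as_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def ja_taken(dst, src):
    """`cmp dst, src` followed by `ja`: unsigned greater-than."""
    return (dst & 0xFFFFFFFF) > (src & 0xFFFFFFFF)


def jg_taken(dst, src):
    """`cmp dst, src` followed by `jg`: signed greater-than."""
    return as_signed32(dst) > as_signed32(src)


dst, src = 0xFFFFFFFF, 1
print(ja_taken(dst, src))   # True:  4294967295 > 1 when treated as unsigned
print(jg_taken(dst, src))   # False: -1 > 1 is false when treated as signed
```

When reading disassembly, the choice of jump therefore hints at whether the original variables were signed or unsigned.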

Recognising IF Statements

Recognising FOR loop

Recognising WHILE loop

Recognising SWITCH statements

"Hello World" Example


Current Research

  • Eilam, E. (2011). Reversing: Secrets of Reverse Engineering. John Wiley & Sons.
  • Sikorski, M., & Honig, A. (2012). Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press.
  • http://www.peter-cockerell.net/aalp/html/frames.html
  • Penetration-Testing module: Lecture 5 Exploitation/Vulnerability Validation