How to Quickly Solve Disassembly Problems

The disassembly problems in DCU’s Secure Programming course follow a recognizable pattern, so a fixed procedure solves them quickly.

Prerequisite Skills

  1. Familiarity with assembly instructions
  2. Understanding the % and $ prefixes for registers and immediate values
  3. Knowledge of direct and indirect addressing
  4. Familiarity with one worked example of C code compiled to assembly

Approach to Solving

  1. Identify the number of parameters
  2. Identify the number of local variables
  3. Recognize the loop body
  4. Analyze remaining code snippets
  5. Identify the return value

Identify the Number of Parameters

The address held in ebp stores the caller's saved frame pointer, and ebp+4 stores the return address. Since the problems typically assume all parameters are of type int or int*, each parameter occupies 4 bytes, so ebp+0x8, ebp+0xc, and ebp+0x10 correspond to the first, second, and third parameters, respectively.
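The layout described above can be modelled as a C struct whose field offsets match the ebp-relative offsets. This is only an illustrative sketch: the struct and the helper param_offset are hypothetical names, and it assumes a 32-bit frame in which every slot is 4 bytes wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the stack as seen from ebp in a 32-bit frame.
   Real compilers do not emit such a type; the names are illustrative. */
struct frame_from_ebp {
    uint32_t saved_ebp;   /* 0x0(%ebp):  caller's saved frame pointer */
    uint32_t return_addr; /* 0x4(%ebp):  return address               */
    int32_t  param1;      /* 0x8(%ebp):  first parameter              */
    int32_t  param2;      /* 0xc(%ebp):  second parameter             */
    int32_t  param3;      /* 0x10(%ebp): third parameter              */
};

/* Offset from ebp of the n-th parameter (1-based), assuming 4-byte slots. */
static size_t param_offset(unsigned n) {
    return offsetof(struct frame_from_ebp, param1) + 4u * (n - 1u);
}
```

Under these assumptions, param_offset(2) evaluates to 0xc, which is the offset used for the second parameter in the listings.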

By quickly scanning the code for operands of the form 0x__(%ebp) with a positive offset and taking the largest offset, the number of parameters can be determined as (offset - 4) / 4. For instance, a largest offset of 0xc gives (12 - 4) / 4 = 2 parameters.

For example:

push %ebp               <foo+0>
mov %esp, %ebp          <foo+1>
sub $0x4, %esp          <foo+3>
mov 0x8(%ebp), %eax     <foo+6>
mov %eax, -0x4(%ebp)    <foo+9>
mov -0x4(%ebp), %eax    <foo+12>
cmp 0x10(%ebp), %eax    <foo+15>
jge <foo+32>            <foo+18>
mov 0xc(%ebp), %eax     <foo+20>
incl (%eax)             <foo+23>
lea -0x4(%ebp), %eax    <foo+25>
incl (%eax)             <foo+28>
jmp <foo+12>            <foo+30>
mov $0x0, %eax          <foo+32>
leave                   <foo+37>
ret                     <foo+38>

Here, the largest positive offset is 0x10(%ebp), that is 16, so the parameter count is (16 - 4) / 4 = 3.
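The arithmetic can be written as a small helper. This is a minimal sketch under the same assumptions as the text: every parameter is 4 bytes wide and the last parameter is actually referenced somewhere in the code. The function name param_count is made up for illustration.

```c
/* Number of 4-byte parameters implied by the largest positive
   ebp-relative offset found in the listing. */
static int param_count(int max_offset) {
    if (max_offset < 8) {
        return 0; /* nothing above the return address is read */
    }
    return (max_offset - 4) / 4;
}
```

For the listing above, param_count(0x10) gives 3.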

We can construct the framework of the code as:

int foo(int a, int b, int c) {

}

a, b, and c correspond to ebp+0x8, ebp+0xc, and ebp+0x10, respectively. The caller pushes parameters onto the stack in reverse order (right to left), so the smaller the offset from ebp, the earlier the parameter appears in the list.

For now, assume all are int types. Adjust later if inconsistencies are found.

Identify the Number of Local Variables

The number of local variables is determined by the third line of the code: sub $0x4, %esp. The amount subtracted from esp is the number of bytes reserved for local variables.

In this example, sub $0x4, %esp reserves 4 bytes, so there is one local variable, stored at -0x4(%ebp). Assume it is an int and name it i.
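The same bookkeeping can be sketched in C. The helper names are hypothetical, and the result should be read as an upper bound: it assumes every local is a 4-byte int, whereas a compiler may reserve extra bytes for alignment or temporaries.

```c
/* Number of 4-byte local slots implied by `sub $N, %esp`. */
static int local_slot_count(int reserved_bytes) {
    return reserved_bytes / 4;
}

/* ebp-relative offset of the k-th local slot (1-based): -4, -8, ... */
static int local_offset(int k) {
    return -4 * k;
}
```

With sub $0x4, %esp this yields one slot at offset -4, matching the -0x4(%ebp) operand in the listing.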

The code expands to:

int foo(int a, int b, int c) {
    int i;
}

Recognize the Loop Body

Loops are typically while or for loops. To identify:

  1. Judgment Entry:

    • Look for a comparison instruction (e.g., cmp) followed by a conditional jump (e.g., jge or jle).
    • Together these mark the start of a condition check.
  2. Loop Body:

    • An unconditional jmp that jumps backward signifies a loop. Its target is the beginning of the condition check.
  3. Condition:

    • Combine the comparison with the instructions that load its operands to form the complete condition.

Example:

push %ebp               <foo+0>
mov %esp, %ebp          <foo+1>
sub $0x4, %esp          <foo+3>
mov 0x8(%ebp), %eax     <foo+6>
mov %eax, -0x4(%ebp)    <foo+9>
mov -0x4(%ebp), %eax    <foo+12>
cmp 0x10(%ebp), %eax    <foo+15>
jge <foo+32>            <foo+18>
mov 0xc(%ebp), %eax     <foo+20>
incl (%eax)             <foo+23>
lea -0x4(%ebp), %eax    <foo+25>
incl (%eax)             <foo+28>
jmp <foo+12>            <foo+30>
mov $0x0, %eax          <foo+32>
leave                   <foo+37>
ret                     <foo+38>

Judgment Entry:

The combination of cmp and jge indicates a judgment entry.

Loop:

The jmp instruction at <foo+30> jumps back to <foo+12>, where the loop condition check begins.

Condition:

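  • mov -0x4(%ebp), %eax: Loads the value of i into eax.
  • cmp 0x10(%ebp), %eax: Compares eax (value of i) with c.

In AT&T syntax, cmp subtracts its first operand from its second, so this computes i - c and sets the flags; jge then jumps when i >= c. In assembly, conditions are reversed compared to C: the jump leaves the loop when its condition is met, so the loop continues while the opposite holds. Thus, the C condition is i - c < 0, which is normally written i < c.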
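The inversion rule can be captured in a small lookup table. The sketch below is illustrative only: the function name is made up, the table covers just the common signed jumps, and it assumes the conditional jump is the one that exits the loop, with lhs being the second AT&T operand of cmp and rhs the first.

```c
#include <string.h>

/* Given the conditional jump that exits the loop, return the C operator
   for the loop-continue condition in `while (lhs OP rhs)`.
   Returns NULL for jumps that are not in this small table. */
static const char *loop_operator_for_exit_jump(const char *jump) {
    static const struct {
        const char *jump;
        const char *op;
    } table[] = {
        {"jge", "<"},  {"jg", "<="}, {"jle", ">"},
        {"jl", ">="},  {"je", "!="}, {"jne", "=="},
    };
    for (size_t k = 0; k < sizeof table / sizeof table[0]; k++) {
        if (strcmp(jump, table[k].jump) == 0) {
            return table[k].op;
        }
    }
    return NULL;
}
```

For the example, the exit jump is jge with lhs i and rhs c, so the lookup gives "<" and the loop reads while (i < c).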

The code updates to:

int foo(int a, int b, int c) {
    int i;
    while (i - c < 0) {

    }
}

Analyze Remaining Code Snippets

Before the Loop:

mov 0x8(%ebp), %eax     <foo+6>
mov %eax, -0x4(%ebp)    <foo+9>

These lines assign the value of a to i:

i = a;

Inside the Loop:

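mov 0xc(%ebp), %eax     <foo+20>
incl (%eax)             <foo+23>
lea -0x4(%ebp), %eax    <foo+25>
incl (%eax)             <foo+28>

  • mov 0xc(%ebp), %eax loads the value of b into eax, and incl (%eax) increments the value at that address. Because b is dereferenced, it must be a pointer (int *) rather than an int:

(*b)++;

  • lea -0x4(%ebp), %eax loads the address of i into eax, and incl (%eax) increments the value at that address, which is i itself:

i++;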
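The difference between mov and lea in these four instructions can be mirrored in C with an explicit pointer. This is a sketch for illustration: the function name loop_body_once and the temporary p (standing in for eax) are not part of the original problem.

```c
/* One pass through the loop body; returns the updated value of i. */
static int loop_body_once(int *b, int i) {
    int *p;     /* plays the role of eax */

    p = b;      /* mov 0xc(%ebp), %eax  : load the pointer stored in b */
    (*p)++;     /* incl (%eax)          : increment the int b points to */

    p = &i;     /* lea -0x4(%ebp), %eax : compute the address of i */
    (*p)++;     /* incl (%eax)          : increment i through that address */

    return i;
}
```

Calling it with a pointer to an int holding 10 and i equal to 3 leaves the pointed-to int at 11 and returns 4, which is the combined effect of (*b)++ and i++.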

The updated code becomes:

int foo(int a, int *b, int c) {
    int i;
    i = a;
    while (i - c < 0) {
        (*b)++;
        i++;
    }
}

Identify the Return Value

In the 32-bit x86 calling conventions used in these problems, an int or pointer return value is passed back in the eax register.

mov $0x0, %eax

This indicates the function returns 0:

return 0;

Final Code

int foo(int a, int *b, int c) {
    int i;
    i = a;
    while (i - c < 0) {
        (*b)++;
        i++;
    }
    return 0;
}
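One way to check a reconstruction like this is to write a second version that follows the assembly instruction by instruction, using labels and goto, and confirm that both behave the same. The sketch below does that; foo_goto and foo_structured are illustrative names, and the structured version writes the condition as i < c, the usual form of i - c < 0.

```c
/* Literal translation: each statement maps to the instructions noted. */
static int foo_goto(int a, int *b, int c) {
    int i = a;              /* foo+6, foo+9            */
check:
    if (i >= c) goto done;  /* foo+12..foo+18: cmp, jge */
    (*b)++;                 /* foo+20, foo+23          */
    i++;                    /* foo+25, foo+28          */
    goto check;             /* foo+30: jmp             */
done:
    return 0;               /* foo+32                  */
}

/* Structured version, equivalent to the final code above. */
static int foo_structured(int a, int *b, int c) {
    int i = a;
    while (i < c) {
        (*b)++;
        i++;
    }
    return 0;
}
```

With a = 2 and c = 5, both versions increment *b three times and return 0. When a >= c, the loop body never runs and *b is left unchanged.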