How the ARM Function Call Stack Frame Works (With C & Assembly Examples)

When writing low-level embedded C code for ARM Cortex-M, understanding the ARM function call stack frame is essential for analyzing memory usage, optimizing performance, and debugging hard faults. This post dives deep into the ARM calling conventions, stack frame setup, and the distinction between caller-saved and callee-saved registers — all through practical examples.

ARM Cortex-M Architecture Essentials:

ARM Cortex-M CPUs use a load/store architecture: instructions operate on registers, and memory is reached only through explicit loads and stores. Registers therefore carry function arguments, local variables, return addresses, and more.

✅ Key Registers in Cortex-M:

Register	Role
R0–R3	Function arguments, return values (caller-saved)
R4–R11	Local variables (callee-saved)
R12	Scratch register (caller-saved)
R13	Stack Pointer (SP)
R14	Link Register (LR – holds return address)
R15	Program Counter (PC)
xPSR	Program Status Register

📐 ARM AAPCS Calling Convention:

The ARM Architecture Procedure Call Standard (AAPCS) defines how parameters are passed and how registers are used across function boundaries.

🔹 Register Responsibility:

Register	Who Saves?	Usage
R0–R3	Caller-saved	Arguments, return value
R4–R11	Callee-saved	Local variables
R12	Caller-saved	Scratch
LR (R14)	Caller-saved	Return address (overwritten by BL, so a function that makes calls must save its own LR first)

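This split keeps calls cheap and predictable: a callee can use R0–R3 and R12 without any save/restore overhead, and a caller can keep long-lived values in R4–R11 without spilling them around every call.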

🛠️ What Happens on a Function Call?

When a function is called:

The first four word-sized arguments are placed in R0–R3; any remaining arguments are passed on the stack.
BL (Branch with Link) saves the return address in LR (R14) and branches to the callee.
The callee may use R0–R3 freely.
If the callee wants to use R4–R11, it must save and restore them on the stack.
On return, the result is placed in R0, and BX LR (or a POP that loads PC) transfers control back to the caller.
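
To make the first step concrete, here is a minimal C sketch with six integer arguments. The names sum6 and call_sum6 are made up for illustration. The comments describe where AAPCS is expected to place each argument; check the generated assembly from your own toolchain to confirm it.

```c
#include <stdint.h>

/* Hypothetical example: six word-sized arguments.
 * Under AAPCS, a..d are expected in R0..R3. e and f do not fit in
 * registers, so the caller stores them on the stack before the BL. */
int32_t sum6(int32_t a, int32_t b, int32_t c, int32_t d,
             int32_t e, int32_t f)
{
    return a + b + c + d + e + f;   /* result is returned in R0 */
}

/* Hypothetical caller: loads 1..4 into R0..R3 and passes 5 and 6
 * through stack slots that the callee reads relative to SP. */
int32_t call_sum6(void)
{
    return sum6(1, 2, 3, 4, 5, 6);
}
```

Keeping frequently called functions to four or fewer word-sized parameters usually avoids this extra stack traffic.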

🧪 Example 1: Simple Function (No Stack Frame):

int add(int a, int b) 
{
    return a + b;
}

Assembly Output (Thumb-2):

add:
    ADD R0, R0, R1
    BX  LR

a and b come in R0 and R1.
Result returned in R0.
No stack frame — efficient!

🧱 Example 2: Function With Local Variable (Stack Frame)

int squareAdd(int a, int b) 
{
    int temp = a * a;
    return temp + b;
}

Assembly (IAR/GCC):

squareAdd:
    PUSH {R4, LR}       ; Save callee-saved R4 and LR
    MUL R4, R0, R0      ; temp = a * a
    ADD R0, R4, R1      ; return temp + b
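    POP {R4, PC}        ; Restore R4 and return (the saved LR is loaded into PC)

This shows callee-saved register use (R4), which requires a PUSH/POP pair. The listing is illustrative: with optimization enabled, a compiler would typically keep temp in a caller-saved register, or emit a single MLA R0, R0, R0, R1, and need no stack frame for this leaf function.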

🧵 Caller-Saved vs Callee-Saved Registers:

✅ Caller-Saved (R0–R3, R12, LR)

Caller is responsible for saving them before calling another function if needed.
Callee can use them freely.

✅ Callee-Saved (R4–R11)

Callee must save & restore these if used.
Ensures caller’s values are preserved.

⚠️ Misconception Clarified
Q: I saw a squareAdd function using R0 in calculations. Isn’t R0 a caller-saved register?

✅ Yes — and that’s exactly why it’s allowed.
The callee function can freely use R0 without saving it, because it’s the caller’s responsibility to preserve R0 if it still needs its value after the call.
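
The same rule seen from the caller's side: a minimal sketch, with hypothetical function names, of a value that must survive a call. The register choices in the comments are what a compiler typically does, not a guarantee, and an optimizer may inline the call entirely.

```c
/* Hypothetical callee: free to overwrite R0..R3 and R12. */
int scale(int x)
{
    return x * 3;
}

/* x arrives in R0, and R0 also carries the return value of scale(),
 * so the original x is gone after the call. Because x is still needed,
 * the compiler has to park a copy somewhere that survives the call:
 * usually a callee-saved register such as R4 (which this function then
 * saves and restores itself) or a stack slot. */
int scale_plus_original(int x)
{
    int scaled = scale(x);   /* R0 now holds x * 3 */
    return scaled + x;       /* x is read back from its preserved copy */
}
```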

Stack Frame Layout (Simplified) – Inside the ARM Function Call Stack Frame

When a function uses local variables, calls nested functions, or needs to preserve callee-saved registers, the compiler generates a stack frame using PUSH/POP and SP adjustments. Let’s examine this with a complete example:

int multiply(int a, int b) 
{
    return a * b;
}

int compute_area(int length, int width) 
{
    int area;
    int margin = 2;

    area = multiply(length, width);
    area = area - margin;

    return area;
}

⚙️ Compiler Output (Simplified Assembly – Thumb-2):

compute_area:
    PUSH    {R4, LR}           ; Save R4 (callee-saved) and LR (return address)
    SUB     SP, SP, #4         ; Allocate space for local variable `margin`

    MOV     R2, #2             ; R2 = 2
    STR     R2, [SP, #0]       ; Store 2 to stack → margin = 2

    BL      multiply           ; Call multiply(length, width)
    MOV     R4, R0             ; Save result to area (R4)

    LDR     R2, [SP, #0]       ; Load margin from stack
    SUB     R0, R4, R2         ; area = area - margin → return in R0

    ADD     SP, SP, #4         ; Deallocate local space
    POP     {R4, PC}           ; Restore R4 and return (the saved LR is loaded into PC)
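
The prologue of the listing above can be traced by hand with a small host-side model. This is only a sketch: StackModel and its helpers are hypothetical names, the stack is an ordinary array, and nothing here touches real CPU registers. It mirrors PUSH {R4, LR} followed by SUB SP, SP, #4 and the store of margin.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* Hypothetical model of a full-descending stack of 32-bit words. */
typedef struct {
    uint32_t mem[STACK_WORDS];  /* mem[STACK_WORDS - 1] is the highest address */
    uint32_t sp;                /* word index of the current top of stack */
} StackModel;

/* Empty stack: SP points just past the highest word. */
void stack_init(StackModel *s)
{
    s->sp = STACK_WORDS;
}

/* Models PUSH {R4, LR}: SP drops by two words, and the
 * lower-numbered register ends up at the lower address. */
void push_r4_lr(StackModel *s, uint32_t r4, uint32_t lr)
{
    s->sp -= 2u;
    s->mem[s->sp]      = r4;    /* [SP, #0] */
    s->mem[s->sp + 1u] = lr;    /* [SP, #4] */
}

/* Models SUB SP, SP, #4 followed by STR R2, [SP, #0]. */
void alloc_margin(StackModel *s, uint32_t margin)
{
    s->sp -= 1u;
    s->mem[s->sp] = margin;
}

/* Models LDR Rt, [SP, #byte_offset] for word-aligned offsets. */
uint32_t load_sp_offset(const StackModel *s, uint32_t byte_offset)
{
    return s->mem[s->sp + byte_offset / 4u];
}
```

After push_r4_lr and alloc_margin, margin sits at [SP, #0], the saved R4 at [SP, #4], and the saved LR at [SP, #8].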

🧱 Stack Frame Structure:

Here’s what the stack looks like during the execution of compute_area():

High Address
  |
  |   [Previous Stack Frame]
  |------------------------- ← SP before function
  |   Return address (LR)   ← pushed first
  |   R4                    ← saved register
  |------------------------- ← SP after PUSH
  |   margin (value = 2)    ← space for local
  |------------------------- ← SP after SUB
  |
Low Address

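The stack grows downward, so local variables sit at lower addresses than the saved LR and R4.
AAPCS requires SP to be 8-byte aligned at every public function boundary. The listing above is simplified: the 8-byte PUSH plus a 4-byte local leaves SP only 4-byte aligned at the call to multiply, so a real compiler that kept margin on the stack would reserve 8 bytes (4 for margin plus 4 of padding).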
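
The effect of the 8-byte alignment rule on frame size can be worked out with a short calculation. The helper names below are hypothetical, and the sketch assumes every saved register is one 32-bit word.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 8. */
uint32_t align_up_8(uint32_t bytes)
{
    return (bytes + 7u) & ~7u;
}

/* Hypothetical helper: total frame size in bytes for `saved_regs`
 * pushed 32-bit registers plus `local_bytes` of local storage,
 * padded so that SP stays 8-byte aligned. */
uint32_t frame_bytes(uint32_t saved_regs, uint32_t local_bytes)
{
    return align_up_8(saved_regs * 4u + local_bytes);
}
```

For compute_area, two saved registers (R4 and LR) plus one 4-byte local add up to 12 bytes, which frame_bytes(2, 4) rounds up to 16.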

💡 Optimization Notes:

If a function uses no callee-saved registers and makes no calls, PUSH/POP may be omitted.
If no locals need stack storage, the SUB/ADD SP pair is skipped.
Leaf functions (those that don't call others) often need no stack frame at all.
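
A minimal pair for checking these notes against your own compiler; the function names are hypothetical. Assuming a GNU Arm toolchain, something like arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O1 -S emits the assembly, and adding -fstack-usage writes a per-function stack size report.

```c
/* Leaf function: makes no calls and needs only R0..R3,
 * so it is expected to compile without any PUSH/POP. */
int clamp_to_byte(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Non-leaf function: each BL overwrites LR, and b plus the first
 * result must survive a call, so the compiler is expected to save LR
 * and at least one callee-saved register (unless it inlines the calls). */
int sum_clamped(int a, int b)
{
    return clamp_to_byte(a) + clamp_to_byte(b);
}
```

Comparing the two functions in the generated assembly shows which of them pays for a stack frame.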

🧠 Final Thoughts:

Understanding how function calls work in ARM Cortex-M helps you:
Debug efficiently (especially crashes due to stack overflows),
Write better low-level code (e.g., ISRs, context switching),
Read compiler output (for performance tuning),
Understand when and why registers are saved/restored.

Reference:

Procedure Call Standard for the Arm Architecture (AAPCS), published by Arm.