When writing low-level embedded C code for ARM Cortex-M, understanding the ARM function call stack frame is essential for analyzing memory usage, optimizing performance, and debugging hard faults. This post dives deep into the ARM calling conventions, stack frame setup, and the distinction between caller-saved and callee-saved registers — all through practical examples.
ARM Cortex-M Architecture Essentials:
ARM Cortex-M CPUs use a load/store architecture: data is processed in registers rather than directly in memory. Registers therefore carry function arguments, local variables, return addresses, and more.
✅ Key Registers in Cortex-M:
| Register | Role |
|---|---|
| R0–R3 | Function arguments, return values (caller-saved) |
| R4–R11 | Local variables (callee-saved) |
| R12 | Scratch register (caller-saved) |
| R13 | Stack Pointer (SP) |
| R14 | Link Register (LR – holds return address) |
| R15 | Program Counter (PC) |
| xPSR | Program Status Register |
📐 ARM AAPCS Calling Convention:
The ARM Architecture Procedure Call Standard (AAPCS) defines how parameters are passed and how registers are used across function boundaries.
🔹 Register Responsibility:
| Register | Who Saves? | Usage |
|---|---|---|
| R0–R3 | Caller-saved | Arguments, return value |
| R4–R11 | Callee-saved | Local variables |
| R12 | Caller-saved | Scratch |
| LR (R14) | Caller-saved | Return address (overwritten by each BL) |
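This division keeps calls cheap and predictable: a short function can work entirely in R0–R3 and R12 without touching the stack, while a caller can rely on R4–R11 holding the same values after any call returns.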
🛠️ What Happens on a Function Call?
When a function is called:
- The first four word-sized arguments are placed in R0–R3; any remaining arguments go on the stack.
- BL (Branch with Link) saves the return address in LR (R14).
- The callee may use R0–R3 freely.
- If the callee wants to use R4–R11, it must save and restore them using the stack.
- Before returning, the callee places the result in R0, then returns with BX LR.
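The argument-passing rule in the first step can be modeled on a host machine. The sketch below is a simplification that assumes every argument is a 32-bit integer or pointer (64-bit values, floats, and structs follow extra rules that are not modeled here). `ArgSlot` and `arg_location` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Where a word-sized argument lives at the moment of the call. */
typedef struct {
    int in_register;      /* 1 if passed in R0-R3, 0 if passed on the stack */
    unsigned reg;         /* register number (0-3), valid when in_register  */
    size_t stack_offset;  /* byte offset from SP at call time, otherwise    */
} ArgSlot;

/* Simplified model: arguments 0-3 use R0-R3, the rest are placed on the
 * stack in order, 4 bytes each. Assumes all arguments are 32 bits wide. */
ArgSlot arg_location(unsigned index, unsigned arg_count)
{
    ArgSlot slot = {0, 0, 0};

    assert(index < arg_count);  /* the argument must exist */

    if (index < 4) {
        slot.in_register = 1;
        slot.reg = index;
    } else {
        slot.stack_offset = (size_t)(index - 4) * 4;
    }
    return slot;
}
```

For a hypothetical six-argument function, this model places the fifth argument at `[SP, #0]` and the sixth at `[SP, #4]` at the point of the call.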
🧪 Example 1: Simple Function (No Stack Frame):
int add(int a, int b)
{
return a + b;
}
Assembly Output (Thumb2):
add:
ADD R0, R0, R1
BX LR
- a and b come in R0 and R1.
- Result returned in R0.
- No stack frame — efficient!
🧱 Example 2: Function With Local Variable (Stack Frame)
int squareAdd(int a, int b)
{
int temp = a * a;
return temp + b;
}
Assembly (illustrative Thumb2, written to show a callee-saved register in use):
squareAdd:
PUSH {R4, LR} ; Save callee-saved R4 and LR
MUL R4, R0, R0 ; temp = a * a
ADD R0, R4, R1 ; return temp + b
POP {R4, LR} ; Restore registers
BX LR
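This shows callee-saved register use (R4), requiring a PUSH/POP. With optimization enabled, a compiler would usually keep temp in R0–R3 and skip the stack entirely, since squareAdd is a leaf function.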
🧵 Caller-Saved vs Callee-Saved Registers:
✅ Caller-Saved (R0–R3, R12, LR)
- Caller is responsible for saving them before calling another function if needed.
- Callee can use them freely.
✅ Callee-Saved (R4–R11)
- Callee must save & restore these if used.
- Ensures caller’s values are preserved.
⚠️ Misconception Clarified
Q: I saw a squareAdd function using R0 in calculations. Isn’t R0 a caller-saved register?
✅ Yes — and that’s exactly why it’s allowed.
The callee function can freely use R0 without saving it, because it’s the caller’s responsibility to preserve R0 if it still needs its value after the call.
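The contract can also be modeled in plain C on a host machine. This is only a sketch: `Cpu`, `push`, `pop`, and `model_square_add` are hypothetical stand-ins for real hardware, and the model mirrors the illustrative squareAdd listing while leaving out LR for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of R0-R12 plus a small descending stack. */
typedef struct {
    uint32_t r[13];               /* R0-R12                                  */
    uint32_t stack[STACK_WORDS];
    unsigned sp;                  /* index of the last pushed word; an empty
                                     stack has sp == STACK_WORDS             */
} Cpu;

static void push(Cpu *cpu, uint32_t value)
{
    assert(cpu->sp > 0);          /* model-level overflow check */
    cpu->stack[--cpu->sp] = value;
}

static uint32_t pop(Cpu *cpu)
{
    assert(cpu->sp < STACK_WORDS);
    return cpu->stack[cpu->sp++];
}

/* Models squareAdd: R0 is overwritten freely, R4 is saved and restored. */
void model_square_add(Cpu *cpu)
{
    push(cpu, cpu->r[4]);                 /* PUSH {R4}: callee-saved      */
    cpu->r[4] = cpu->r[0] * cpu->r[0];    /* MUL R4, R0, R0               */
    cpu->r[0] = cpu->r[4] + cpu->r[1];    /* ADD R0, R4, R1 (the result)  */
    cpu->r[4] = pop(cpu);                 /* POP {R4}: restore for caller */
}
```

After the call, R0 holds the result and its original value is gone, while R4 reads back exactly as the caller left it.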
Stack Frame Layout (Simplified) – Inside the ARM Function Call Stack Frame
When a function uses local variables, calls nested functions, or needs to preserve callee-saved registers, the compiler generates a stack frame using PUSH/POP and SP adjustments. Let’s examine this with a complete example:
int multiply(int a, int b)
{
return a * b;
}
int compute_area(int length, int width)
{
int area;
int margin = 2;
area = multiply(length, width);
area = area - margin;
return area;
}
⚙️ Compiler Output (Simplified Assembly – Thumb2):
compute_area:
PUSH {R4, LR} ; Save R4 (callee-saved) and LR (return address)
SUB SP, SP, #4 ; Allocate space for local variable `margin`
MOV R2, #2 ; R2 = 2
STR R2, [SP, #0] ; Store 2 to stack → margin = 2
BL multiply ; Call multiply(length, width)
MOV R4, R0 ; Save result to area (R4)
LDR R2, [SP, #0] ; Load margin from stack
SUB R0, R4, R2 ; area = area - margin → return in R0
ADD SP, SP, #4 ; Deallocate local space
POP {R4, LR} ; Restore R4 and LR
BX LR ; Return
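To make the effect of each SP-changing instruction in the listing above concrete, the following host-side sketch tracks SP as a plain byte address. `FrameLayout` and `compute_area_frame` are hypothetical names, and the entry SP is an arbitrary value supplied by the caller of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses of each item in the compute_area frame (hypothetical model). */
typedef struct {
    uint32_t lr_addr;       /* where the saved LR lands          */
    uint32_t r4_addr;       /* where the saved R4 lands          */
    uint32_t margin_addr;   /* where the local `margin` is kept  */
    uint32_t sp_after_sub;  /* SP once the frame is fully set up */
} FrameLayout;

FrameLayout compute_area_frame(uint32_t sp_on_entry)
{
    FrameLayout f;
    uint32_t sp = sp_on_entry;

    assert(sp_on_entry >= 12);  /* room for 3 words in this model */

    sp -= 8;                /* PUSH {R4, LR}: two words            */
    f.r4_addr = sp;         /* lowest-numbered register is lowest  */
    f.lr_addr = sp + 4;

    sp -= 4;                /* SUB SP, SP, #4                      */
    f.margin_addr = sp;     /* STR R2, [SP, #0]                    */
    f.sp_after_sub = sp;
    return f;
}
```

For an entry SP of 0x20001000, the model puts LR at 0x20000FFC, R4 at 0x20000FF8, and margin at 0x20000FF4.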
🧱 Stack Frame Structure:
Here’s what the stack looks like during the execution of compute_area():
High Address
|
| [Previous Stack Frame]
|------------------------- ← SP before function
| Return address (LR) ← pushed first
| R4 ← saved register
|------------------------- ← SP after PUSH
| margin (value = 2) ← space for local
|------------------------- ← SP after SUB
|
Low Address
- The stack grows downward, so local variables are at lower addresses than the saved LR/R4.
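- The AAPCS requires SP to be 8-byte aligned at every public function call. The simplified listing above leaves SP only 4-byte aligned at BL multiply (8 bytes pushed plus 4 for margin), so a real compiler would reserve 8 bytes for locals or push an extra register as padding.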
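The alignment arithmetic can be sketched in a few lines of C. This assumes SP was already 8-byte aligned on entry to the function; `align_up` and `frame_padding` are hypothetical helper names, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment` (a power of two). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Padding bytes needed so the whole frame is a multiple of 8 bytes. */
size_t frame_padding(size_t saved_regs_bytes, size_t locals_bytes)
{
    size_t raw = saved_regs_bytes + locals_bytes;
    return align_up(raw, 8) - raw;
}
```

For the frame above, 8 bytes of saved registers plus 4 bytes for margin gives 12, which rounds up to 16, so `frame_padding(8, 4)` reports 4 bytes of padding.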
💡 Optimization Notes:
- If a function uses no callee-saved registers and makes no calls (so LR is never overwritten), PUSH/POP may be omitted.
- If no locals exist, SUB/ADD SP is skipped.
- Leaf functions (those that don’t call others) are often stackless.
🧠 Final Thoughts:
Understanding how function calls work in ARM Cortex-M helps you:
- Debug efficiently (especially crashes due to stack overflows)
- Write better low-level code (e.g., ISRs, context switching)
- Read compiler output (for performance tuning)
- Understand when and why registers are saved/restored