In the past few years, I’ve spent a fair bit of time disassembling code for embedded devices. Over time, I’ve built up mental lists of properties that embedded devices can have that facilitate or frustrate disassembly. I’m sure that by now, I’ve forgotten some of these properties, so I decided it was time to make some lists.
And so, without further ado and in no particular order, here are properties of embedded devices that facilitate disassembly.
- Contains internal symbols.
- Dynamically linked (so it contains external symbols).
- Uses example code (often without checks for error conditions) with few modifications.
- Contains debugging messages/error messages.
- Provides a UNIX-like environment, complete with a debugger.
- Optimizations turned on. This typically makes debugging one’s own code harder, since the debugger has trouble single-stepping source-level statements. But when I’m disassembling, I don’t have the source, so I don’t care about mapping assembly back to C, and optimization eliminates the constant loading and storing of values, superfluous register moves, dead code, etc.
- Brings UART pins out on the board. I’m happy to solder wires to an MM232R and read your debugging messages while the device operates or interact with the shell you helpfully connected to the UART for testing.
- Runs code directly from the ROM.
And now properties that frustrate disassembly. Since the absence of a property in the first list tends to frustrate disassembly, those absences are not repeated below.
- Statically linked. I end up wasting lots of time disassembling various standard library functions.
- Optimizations turned on. Sometimes the compiler is simply smarter than I am, and things like slightly different versions of a loop depending on data address alignment are a serious time-waster (for the one whose time matters most: me). So I said optimizations both facilitate and frustrate disassembly; what’s your point?
- Contains hand-written assembly that doesn’t follow standard parameter passing conventions.
- Written in C++. In particular, virtual function calls.
- Contains misleading/incorrect error messages.
- Passes data using global variables.
- Changes argument order. Think foo(a,b,c) calls bar(c,a,b) calls baz(b,c,a).
- Runs on an architecture that has branch delay slots.
- Contains machine generated code. Think Lex or Yacc.
- Is a state machine.
- Uses tightly coupled threads.
- Written by programmers who don’t understand basic synchronization mechanisms.
- Code gets copied piece by piece from ROM to RAM by a bootloader. (Substitute Flash for ROM as desired.)
- Uses memory overlays.
- Monolithic.
- Self-modifying.
- Contains triply nested loops (or deeper).
- Contains mutually recursive functions.
- Passes around function pointers.
- Written for an architecture not supported by IDA Pro.
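To make the argument-order and function-pointer items above concrete, here is a minimal C sketch. Everything in it is hypothetical: foo, bar, and baz are just the names used in the list item, and the handlers table and dispatch function are invented for illustration. It is not taken from any real firmware.

```c
/* Hypothetical functions named after the list item above. Each one
   forwards the same three values in a rotated order, so the value that
   arrives in the first argument register of foo shows up in a different
   register at every level of the call chain. */
int baz(int b, int c, int a) { return a * 100 + b * 10 + c; }
int bar(int c, int a, int b) { return baz(b, c, a); }
int foo(int a, int b, int c) { return bar(c, a, b); }

/* Hypothetical handler table. When the index is only known at run time,
   the call below is an indirect branch, and the disassembly alone does
   not say which of the three functions is the target. */
typedef int (*handler_fn)(int, int, int);
static const handler_fn handlers[] = { foo, bar, baz };

int dispatch(unsigned which, int x, int y, int z) {
    /* The modulo keeps the index inside the three-entry table. */
    return handlers[which % 3](x, y, z);
}
```

With these definitions, dispatch(0, 1, 2, 3) goes through foo and returns 123, while dispatch(1, 1, 2, 3) enters at bar and returns 231. In the source the shuffle is obvious from the parameter names; in a disassembly there are no names, only registers, so each level has to be traced by hand.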
Some of these properties are intentional, others are fortuitous, and some are simply mistakes. I have encountered all of these properties at one time or another, but no single embedded device contains all of them.
I am very thankful to designers/implementers of embedded devices who include elements from the first list and curse the names of those who include elements of the second list. The two are not mutually exclusive.