Writing a Simple Mach-O Parser with Python ctypes

Overview

When trying to understand an executable file format, there is no better way than writing a parser for it. In this post we will walk through building a simple parser for the Mach-O file format using Python and ctypes. Plenty of existing libraries already do what we are about to cover, but the point here is to really understand the structure of the target file format, so we will dive into it ourselves.

The Mach-O

The Mach object format is used by iOS and OS X to house executables, object code, shared libraries, and dynamically loaded code.

https://en.wikipedia.org/wiki/Mach-O

In this post we are only going to inspect the Mach-O's header structure. But you can leverage what is covered here to parse the entire file format as you see fit. Here is what the header structure looks like.

// Structs from <mach-o/loader.h>

struct mach_header {
  uint32_t magic;
  uint32_t cputype;
  uint32_t cpusubtype;
  uint32_t filetype;
  uint32_t ncmds;
  uint32_t sizeofcmds;
  uint32_t flags;
};

http://llvm.org/docs/doxygen/html/Support2MachO8h_source.html

The magic field identifies the file as a Mach-O, and its value tells us whether the binary is 32-bit or 64-bit and which byte order it was written in. The cputype and cpusubtype fields describe the CPU architecture (for example, ARM) and its subtype (for example, ARM v7). Our parser will target these three fields and read their respective values.
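To make the role of the magic field concrete, here is a minimal sketch that classifies the first four bytes of a file. The constant values come from <mach-o/loader.h>; the function name classify_magic and the choice to always decode the bytes as little-endian are assumptions made for illustration, and fat (universal) binaries are deliberately not handled.

```python
import struct

# Magic values from <mach-o/loader.h>
MH_MAGIC = 0xFEEDFACE     # 32-bit, same byte order as the reader
MH_CIGAM = 0xCEFAEDFE     # 32-bit, byte-swapped relative to the reader
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit, same byte order as the reader
MH_CIGAM_64 = 0xCFFAEDFE  # 64-bit, byte-swapped relative to the reader


def classify_magic(data):
    """Return (bits, byte_order) for a thin Mach-O, or None if unrecognized.

    Hypothetical helper: the bytes are always decoded as little-endian, so a
    "swapped" magic means the file itself was written big-endian.
    """
    if len(data) < 4:
        return None
    (magic,) = struct.unpack("<I", bytes(data[:4]))
    table = {
        MH_MAGIC: (32, "little"),
        MH_CIGAM: (32, "big"),
        MH_MAGIC_64: (64, "little"),
        MH_CIGAM_64: (64, "big"),
    }
    return table.get(magic)
```

A 32-bit little-endian binary starts with the bytes CE FA ED FE on disk, which this helper reports as (32, "little").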

Python ctypes

" .. ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python .. "

https://docs.python.org/2/library/ctypes.html#module-ctypes

The ctypes library is incredibly powerful, because it allows us to read in our target binary's header and create an exact representation of it as specified by the mach_header struct. We can do this by leveraging structured data types, which are used to create native (C) structures.

https://docs.python.org/2/library/ctypes.html#structured-data-types

Building the Parser

We now have all the fundamentals that we need in order to build a parser for the Mach-O file format. First let's create our MachOHeader class that will subclass Structure from the ctypes library in order to construct the Mach-O header.

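from ctypes import Structure, c_uint

class MachOHeader(Structure):
    # Mirrors struct mach_header: seven consecutive 32-bit unsigned fields
    _fields_ = [
        ("magic", c_uint),
        ("cputype", c_uint),
        ("cpusubtype", c_uint),
        ("filetype", c_uint),
        ("ncmds", c_uint),
        ("sizeofcmds", c_uint),
        ("flags", c_uint)
    ]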
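One way to sanity-check a layout like this is to confirm that the structure is exactly 28 bytes (seven 32-bit fields) and that it round-trips a hand-built header. The sketch below is an illustration and not part of the original script. It assumes a little-endian target and therefore uses LittleEndianStructure with c_uint32; the class name MachOHeader32 and the sample values are made up for the test.

```python
import struct
from ctypes import LittleEndianStructure, c_uint32, sizeof


class MachOHeader32(LittleEndianStructure):
    # Hypothetical variant with explicit widths and an explicit byte order
    _fields_ = [
        ("magic", c_uint32),
        ("cputype", c_uint32),
        ("cpusubtype", c_uint32),
        ("filetype", c_uint32),
        ("ncmds", c_uint32),
        ("sizeofcmds", c_uint32),
        ("flags", c_uint32),
    ]


# Synthetic header: MH_MAGIC, CPU_TYPE_ARM (12), CPU_SUBTYPE_ARM_V7 (9),
# MH_EXECUTE (2), and zeros for the remaining fields
sample = struct.pack("<7I", 0xFEEDFACE, 12, 9, 2, 0, 0, 0)
header = MachOHeader32.from_buffer_copy(sample)

print("size: {0} bytes".format(sizeof(MachOHeader32)))
print("magic: {0:#x}".format(header.magic))
```

Pinning the byte order in the class means the parsed values do not depend on the machine running the script.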

Next we need to populate this structure with our target binary.

# b is the path to the target binary (for example, taken from sys.argv[1])
try:
    with open(b, "rb") as f:
        print("[*] Loading : {0}".format(b))
        binary = bytearray(f.read())
except IOError:
    print("[*] Cannot open {0} (!) ".format(b))
    raise SystemExit(1)

# Copy the start of the binary into our MachOHeader structure
macho_header = MachOHeader.from_buffer_copy(binary)

The from_buffer_copy() method creates a ctypes instance by copying the bytes from the source buffer, which must be readable.

https://docs.python.org/2/library/ctypes.html#ctypes._CData.from_buffer_copy
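A small sketch of what "copying" means in practice, using a hypothetical two-field Pair structure rather than the Mach-O header: from_buffer_copy() takes a snapshot of the bytes, while from_buffer() shares memory with a writable buffer. It also shows that a source buffer smaller than the structure raises ValueError, which is worth catching when a file may be truncated.

```python
from ctypes import Structure, c_uint32


class Pair(Structure):
    # Hypothetical 8-byte structure used only for this demonstration
    _fields_ = [("a", c_uint32), ("b", c_uint32)]


buf = bytearray(8)                   # eight zero bytes
copied = Pair.from_buffer_copy(buf)  # independent snapshot of the bytes
shared = Pair.from_buffer(buf)       # view onto buf (buf must be writable)

buf[0] = 1                           # mutate the source after construction

# The copy still sees zero; the shared view sees the new byte
copy_unchanged = (copied.a == 0)
view_changed = (shared.a != 0)

# A buffer shorter than the structure is rejected
try:
    Pair.from_buffer_copy(bytearray(4))
    too_small_rejected = False
except ValueError:
    too_small_rejected = True
```

For a parser that only reads the header, the copying variant is the simpler choice because the resulting structure stays valid regardless of what happens to the source buffer afterwards.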

Now we should be able to read the values from the fields defined within our MachOHeader structure. In this example our target binary is a 32-bit ARM v7 executable, so the magic field will be 0xfeedface, and the cputype and cpusubtype fields will be 12 and 9, respectively.

# MH_MAGIC: 32-bit Mach-O in the host's byte order
if macho_header.magic == 0xfeedface:
    print("[*] Loaded 32-bit architecture (!) ")

# CPU_TYPE_ARM: 12
if macho_header.cputype == 12:
    print("[*] CPU Type : ARM ")

# CPU_SUBTYPE_ARM_V7: 9
if macho_header.cpusubtype == 9:
    print("[*] CPU SUBTYPE : ARM_v7")
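As more architectures are added, a chain of if statements gets unwieldy. One possible alternative is a pair of lookup tables, sketched below. The numeric values are the ones defined in <mach/machine.h>; the helper name describe_cpu and the small selection of entries are assumptions made for illustration, and the tables are far from complete.

```python
CPU_ARCH_ABI64 = 0x01000000    # flag OR-ed into cputype for 64-bit ABIs
CPU_SUBTYPE_MASK = 0xFF000000  # high bits of cpusubtype hold capability flags

CPU_TYPES = {
    7: "X86",
    7 | CPU_ARCH_ABI64: "X86_64",
    12: "ARM",
    12 | CPU_ARCH_ABI64: "ARM64",
}

ARM_SUBTYPES = {
    6: "ARM_V6",
    9: "ARM_V7",
    11: "ARM_V7S",
}


def describe_cpu(cputype, cpusubtype):
    """Return (type_name, subtype_name) as strings (hypothetical helper)."""
    type_name = CPU_TYPES.get(cputype, "unknown ({0})".format(cputype))
    # Strip the capability bits before looking up the subtype
    subtype = cpusubtype & ~CPU_SUBTYPE_MASK & 0xFFFFFFFF
    if cputype == 12:
        subtype_name = ARM_SUBTYPES.get(subtype, "unknown ({0})".format(subtype))
    else:
        subtype_name = str(subtype)
    return type_name, subtype_name
```

With the values from this post, describe_cpu(12, 9) returns ("ARM", "ARM_V7").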

So here is our final script -> https://gist.github.com/rotlogix/d3c0858b3b76c3777daf

Let's run the damn thing!

python parser.py bin  
[*] Loading : bin
[*] Loaded 32-bit architecture (!)
[*] CPU Type : ARM
[*] CPU SUBTYPE : ARM_v7

Conclusion

Hopefully this demonstrated how easy it is to write a few lines of code in order to accomplish the very fruitful goal of learning more about binary file formats.