The C programming language is not memory safe and it is not type safe. In practice, this means that a simple error that confuses the type of an object (or, as we’ll see, forgets to check the type of an object) can lead to code accessing arbitrary memory. When an adversary is able to arrange for an object of one type to be confused for an object of another type, we call this a type confusion attack, and it frequently results in complete software compromise.
In many cases, a modern C compiler with warnings turned on will catch simple examples of this, but there’s one programming pattern where the compiler cannot help you, because the code itself relies on type confusion made possible by guarantees in the language standard. That pattern is hand-rolled inheritance, and unlike C, most other modern (and even not that modern) languages rule this behavior out.
Let’s take a look at an example. Let’s say we want to have a linked list containing multiple types of objects. Here’s one approach.
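```c
enum {
    FOO_TYPE,
    BAR_TYPE,
};

struct Foo {
    // Common members.
    int type;
    char *name;
    struct Foo *next; // Next element in the list, whatever its actual type.
    // Unique members.
    char *data;
};

struct Bar {
    // Common members.
    int type;
    char *name;
    struct Foo *next; // Also a struct Foo *, so every element is linked the same way.
    // Unique members.
    long x;
};
```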
Notice that both struct Foo and struct Bar have the same three initial members in common: a type field, a name, and a next pointer.
The members of a structure in C are laid out in memory in the order in which they are declared, possibly with padding between them for alignment reasons but never before the first member (C23 §6.7.2.1). The upshot is that we can convert a pointer to a struct Foo into a pointer to a struct Bar and then access the three members they have in common. Here’s an example that walks such a linked list and prints out the name field of each element in the list.
```c
#include <stdio.h>

// Print the name of every element; only common members are accessed.
void print_list(struct Foo *head_of_list) {
    for (struct Foo *p = head_of_list; p != NULL; p = p->next) {
        puts(p->name);
    }
}
```
Notice that even if some of the elements of the list are struct Bar, this code works correctly, because the only members it accesses are name and next, and those are at the same location in struct Foo and struct Bar.
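The layout assumption that makes this work can be checked by the compiler rather than left implicit. The following is a minimal sketch, assuming a C11 or later compiler (for static_assert from <assert.h>): it repeats the struct definitions from above and asserts, at compile time, that each common member sits at the same offset in both structures. If someone later reorders or inserts a member in only one of them, the build fails instead of the list walk silently reading the wrong bytes.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FOO_TYPE,
    BAR_TYPE,
};

struct Foo {
    int type;
    char *name;
    struct Foo *next;
    char *data;
};

struct Bar {
    int type;
    char *name;
    struct Foo *next;
    long x;
};

// Each common member must be at the same byte offset in both structures;
// otherwise code that reads it through a struct Foo * would read the wrong bytes.
static_assert(offsetof(struct Foo, type) == offsetof(struct Bar, type),
              "type must be at the same offset in Foo and Bar");
static_assert(offsetof(struct Foo, name) == offsetof(struct Bar, name),
              "name must be at the same offset in Foo and Bar");
static_assert(offsetof(struct Foo, next) == offsetof(struct Bar, next),
              "next must be at the same offset in Foo and Bar");
```

Note that this checks only the layout. It says nothing about whether a particular element really is a struct Foo or a struct Bar, which is exactly the problem the rest of this post is about.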
The problem comes in when you decide to access one of the unique members. Let’s update our print_list function.
```c
// BUG: p may actually point to a struct Bar, which has no data member.
void print_list(struct Foo *head_of_list) {
    for (struct Foo *p = head_of_list; p != NULL; p = p->next) {
        printf("%s: %s\n", p->name, p->data);
    }
}
```
The problem, of course, is that when p points to a struct Bar, we’ve tried to access a data member that struct Bar doesn’t have. This is type confusion, and it’s undefined behavior. The compiler is free to do anything it wants with this code, but since it cannot tell that the code has a bug, it’s likely to treat the value it loads from p->data as a pointer to a string and then try to print it. Since long and char * usually have the same size and alignment (for example, on 64-bit Linux and macOS), this means we’ll be using the x member of the struct Bar as if it were a pointer. This example is likely to crash.
The fix is simple: don’t access a member unless you know the structure has the type you expect. Since C cannot help us here, we have to use the type member to tell the two apart. This leads to the following correct code.
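```c
void print_list(struct Foo *head_of_list) {
    for (struct Foo *p = head_of_list; p != NULL; p = p->next) {
        if (p->type == FOO_TYPE) {
            printf("%s: %s\n", p->name, p->data);
        } else {
            // With only two types, anything that isn't a Foo is a Bar.
            struct Bar *b = (struct Bar *)p;
            // %lX expects an unsigned long, so convert x explicitly.
            printf("%s: 0x%lX\n", b->name, (unsigned long)b->x);
        }
    }
}
```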
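The else branch in print_list assumes that anything that is not a struct Foo must be a struct Bar, which holds only as long as there are exactly two types. A slightly more defensive variant, sketched below, dispatches on the type member with a switch and refuses to touch the unique members of an element whose tag it doesn’t recognize. To make the behavior easy to check, this hypothetical format_element function writes into a caller-supplied buffer with snprintf instead of printing, and returns snprintf’s result; the struct definitions are repeated from above so the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

enum {
    FOO_TYPE,
    BAR_TYPE,
};

struct Foo {
    int type;
    char *name;
    struct Foo *next;
    char *data;
};

struct Bar {
    int type;
    char *name;
    struct Foo *next;
    long x;
};

// Hypothetical helper: formats one list element into buf (at most n bytes).
// Unique members are read only after the type tag has been checked.
int format_element(const struct Foo *p, char *buf, size_t n) {
    switch (p->type) {
    case FOO_TYPE:
        return snprintf(buf, n, "%s: %s", p->name, p->data);
    case BAR_TYPE: {
        const struct Bar *b = (const struct Bar *)p;
        return snprintf(buf, n, "%s: 0x%lX", b->name, (unsigned long)b->x);
    }
    default:
        // Unknown tag: only the common members are safe to read, so report
        // the tag instead of guessing which unique members exist.
        return snprintf(buf, n, "%s: <unknown type %d>", p->name, p->type);
    }
}
```

Adding a third type then forces a decision at every switch, rather than having the new type silently fall into the Bar branch.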
C is the problem
The root of the issue here is that C enables this style of code but provides no tools to help you use it safely. Most modern languages simply disallow unchecked conversions like this and give programmers safe tools for expressing the same pattern.
In Java, for example, we would use a base class that both Foo and Bar inherit from. If we then tried to cast an instance of Foo to Bar, we would get a runtime error, specifically a ClassCastException.
In Rust, this error would be prevented at compile time. It’s simply not possible to treat a Foo as a Bar without using an unsafe block.
Even C++ provides the tools to handle this safely. It won’t stop you from casting a Foo * to a Bar * exactly as in C; however, you can use C++’s object-oriented features to create a base class, as we did with Java, and then use dynamic_cast to convert from a base pointer to a Foo or Bar pointer (this requires the base class to be polymorphic, that is, to have at least one virtual function). When dynamic_cast is applied to a pointer, we don’t get an exception if the downcast fails; instead, the result is nullptr.
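C has no dynamic_cast, but the pointer form of it can be approximated by hand: a small helper that checks the type tag and returns the converted pointer only when the tag matches, and NULL otherwise. The sketch below repeats the struct Foo and struct Bar definitions from earlier so it compiles on its own; as_foo, as_bar, and sum_bar_x are hypothetical names chosen for illustration.

```c
#include <stddef.h>

enum {
    FOO_TYPE,
    BAR_TYPE,
};

struct Foo {
    int type;
    char *name;
    struct Foo *next;
    char *data;
};

struct Bar {
    int type;
    char *name;
    struct Foo *next;
    long x;
};

// Hypothetical checked downcast: returns p only if its tag says it is a Foo.
struct Foo *as_foo(struct Foo *p) {
    return (p != NULL && p->type == FOO_TYPE) ? p : NULL;
}

// Hypothetical checked downcast: returns p as a struct Bar * only if its tag
// says it is a Bar, mirroring how dynamic_cast on a pointer yields nullptr.
struct Bar *as_bar(struct Foo *p) {
    return (p != NULL && p->type == BAR_TYPE) ? (struct Bar *)p : NULL;
}

// Example use: sum the x members of every struct Bar in a mixed list,
// skipping elements of any other type.
long sum_bar_x(struct Foo *head) {
    long total = 0;
    for (struct Foo *p = head; p != NULL; p = p->next) {
        struct Bar *b = as_bar(p);
        if (b != NULL) {
            total += b->x;
        }
    }
    return total;
}
```

Unlike dynamic_cast, nothing in C forces callers to go through these helpers instead of a raw cast, so this is a convention that reduces mistakes rather than a guarantee.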
This is not a theoretical problem but a real one that affects real code bases, large and small. I ran across an instance of this issue in some code using libxml2. Libxml2 takes XML or (old) HTML and produces a tree of nodes representing the document. There are several different types of nodes, including element nodes, attribute nodes, and text nodes. Element nodes (and text nodes, for that matter) are represented by an xmlNode structure, whereas attributes are represented by an xmlAttr structure. These are structured similarly to struct Foo and struct Bar above in that they have some members in common, including a type, a name, and the pointers children, last, parent, next, and prev that point to other nodes in the tree. Other nodes, like the xmlDtd (which comes from a <!DOCTYPE ...>), behave similarly.
In the case I saw, an xmlAttr * was being treated as an xmlNode *, which was fine for traversing the list of attributes on an element. However, the code then accessed the properties field of the xmlNode, and xmlAttr doesn’t have a properties field. This type confusion was leading to crashes. The fix was simple: check the type field exactly as in my example above.
Note that using an xmlNode * to point to an xmlAttr is an expected use of the API. See, for example, xmlSetNs, which sets a namespace on a node of type either XML_ELEMENT_NODE or XML_ATTRIBUTE_NODE. Its first argument is a pointer to an xmlNode, so if you wish to set the namespace of an attribute node, you need to cast the address to an xmlNode *.
Not using C is the solution
We need to stop using C when we have alternatives. I see no other way.
To quote Fish in a Barrel, “Stop writing C/C++.”