Details of C/C++ Dereferences and C++ Calls to
By Justin Poirier
There are many very specific cases of C/C++ dereferences and the C++ "delete" operator for which the average programmer does not bother to memorize the behaviour the language prescribes. Cataloguing these cases provides a single point of reference, so that the programmer can stop looking them up on demand, and gives a more thorough understanding of memory transactions.
Many beginning C/C++ programmers assume that when a C/C++ program is executed, it has some kind of run-time system, implemented by compiler-generated or operating system code, which executes concurrently with the explicitly written code and manages memory. This assumption may be the result of previous experience using Java, with its extensive run-time system.
The assumption may hold that this imaginary run-time system checks for things like out-of-bounds array accesses and incorrect typecasts. More relevant to our discussion, it may hold that the system keeps track of the state of all pointers in use by the program, including which distinct item in memory, if any, a pointer refers to. Several factors may cause this impression. First, beginners may not realize that C/C++ systems aren't even required to check for access to uninitialized pointers and other variables at compile time, let alone perform complete run-time tracking of pointers; knowing this would make the idea seem less likely. This is unlike the Java compilers beginners may be used to, which feature a thorough check for such access [1]. Second, many beginners believe that a process's dynamically allocated memory is interspersed with that of other processes running on the system, and that the operating system allocates memory for all programs. If this were the case, it would require tracking of individual pointers and their pointees. Finally, the very fact that C/C++ pointers have different types despite normally holding addresses of the same format [2] might imply that some system running in the background will require pointers defined by the programmer to match the types of the items they point to.
In reality, such a system does not typically exist [3]. A process typically has its own private heap contained in a relatively large block of memory [4]. The C "malloc()" function allocates blocks for use by the program by sub-dividing the heap according to a strategy like First Fit or Buddy Allocation. The structure of the heap's division is simply tracked within the heap itself, as all free blocks are connected via pointers to form a linked data structure. The same process occurs with the C++ "new" operator, and in fact the implementation of new often makes use of malloc(). When it comes time to free memory, the C "free()" function or C++ delete operator (which often uses free() [5]) determines the size of the block to be freed not by interacting with the imagined run-time system, but typically by looking a few bytes before the start of the block, where the size will have been stashed by malloc()/new. free() trusts that the parameter it is passed points to a dynamically allocated block, and therefore that the size info will be present.
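The size-stashing idea can be sketched with a toy allocator. This is only an illustration under stated assumptions: the names toy::alloc and toy::stored_size, the fixed 4096-byte arena and the simple bump-pointer strategy are all hypothetical, and real malloc()/new implementations use more elaborate layouts and support reuse of freed blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy allocator (not how any particular malloc() works):
// blocks are carved out of a fixed arena, and each block's size is
// stashed in a small header located just before the block.
namespace toy {

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kHeaderBytes = kAlign;  // space reserved for the size
static_assert(kHeaderBytes >= sizeof(std::size_t), "header must fit a size_t");

alignas(std::max_align_t) unsigned char arena[4096];
std::size_t next_free = 0;  // offset of the first unused byte in the arena

// Returns a block of at least n bytes, or nullptr if the arena is full.
void* alloc(std::size_t n) {
    if (n > sizeof(arena)) return nullptr;
    // Round the payload up so the next header stays aligned.
    std::size_t payload = (n + kAlign - 1) / kAlign * kAlign;
    if (next_free + kHeaderBytes + payload > sizeof(arena)) return nullptr;
    unsigned char* header = arena + next_free;
    std::memcpy(header, &n, sizeof n);  // stash the size before the block
    next_free += kHeaderBytes + payload;
    return header + kHeaderBytes;
}

// Recovers the size by stepping back from the block to its header.
// Like free(), it simply trusts that `block` came from alloc().
std::size_t stored_size(const void* block) {
    assert(block != nullptr);
    const unsigned char* header =
        static_cast<const unsigned char*>(block) - kHeaderBytes;
    std::size_t n;
    std::memcpy(&n, header, sizeof n);
    return n;
}

}  // namespace toy
```

Passing stored_size() an address that did not come from alloc() would make it read whatever bytes happen to precede that address, which mirrors the trust free() places in its parameter.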
Dereferences in C/C++ are implemented by nothing more than compiler-generated code: each dereference expression is replaced with code that treats the expression as the value of the memory contents starting at the address in the pointer and extending for the size of the data type. Here again, there is implicit trust that the memory region in fact holds an item of the data type.
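As a rough model of this, a dereference of a pointer to T behaves like copying sizeof(T) bytes from the address and interpreting them as a T. The helper name load_as below is hypothetical, and the sketch is restricted to trivially copyable types; it is meant to show the idea, not the code a compiler actually emits.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: read sizeof(T) bytes starting at `addr` and
// interpret them as a T. No check is made that a T really lives there.
template <typename T>
T load_as(const void* addr) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only covers trivially copyable types");
    assert(addr != nullptr);
    T value;
    std::memcpy(&value, addr, sizeof(T));
    return value;
}
```

For an int x and a pointer p to it, load_as<int>(p) yields the same value as *p, because both read the same sizeof(int) bytes.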
Knowing that the processes we've described are all that go on when a basic memory operation occurs, we can now compile a list of a few special cases of deletions and dereferences, and it shouldn't be surprising when a case is not protected against. For example, the case where a wrongly cast pointer is dereferenced is listed below as undefined. With our knowledge of how dereferences work, it is easy to imagine why this might be the case. A dereference merely looks up the memory contents of a region with the size of the type referenced by the cast pointer. It does not know anything about the layout of the process's memory, so if the pointer is cast to a pointer to a larger type, the memory the dereference looks up might cut into a neighbouring item in memory. Such a stray memory transaction has an unpredictable effect on the subsequent operation of the program. Furthermore, the region looked up by a wrongly cast pointer's dereference might even cut into memory outside the process's address space, if the pointee was actually of a smaller type and resided near the end of the process's memory block. This would typically cause a protection fault on systems with virtual memory and a general protection fault on systems with segmentation-based memory protection.
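The "cutting into a neighbouring item" effect can be imitated without actually invoking undefined behaviour. In the sketch below, which uses hypothetical names and an assumed layout of two adjacent 2-byte items in a 4-byte buffer, a 4-byte read starting at the first item is performed with memcpy. Its result changes when only the second item is modified, which is what a dereference through a pointer wrongly cast to a wider type would be exposed to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads 4 bytes starting at `item_start`, regardless of how large the
// item that really lives there is.
std::uint32_t read_as_u32(const unsigned char* item_start) {
    assert(item_start != nullptr);
    std::uint32_t wide;
    std::memcpy(&wide, item_start, sizeof wide);
    return wide;
}

// Item A occupies bytes 0..1 and item B occupies bytes 2..3 of `memory`.
// Returns true if a wide read that starts at A is affected by a change
// made only to B.
bool wide_read_depends_on_neighbour() {
    unsigned char memory[4] = {0x11, 0x22, 0x33, 0x44};
    std::uint32_t before = read_as_u32(memory);
    memory[2] = 0x99;  // modify item B only
    std::uint32_t after = read_as_u32(memory);
    return before != after;
}
```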
For any of the cases listed below as "undef", a particular C/C++ system might in fact check for incorrect behaviour. By calling such cases undefined, we simply mean that a C/C++ system is not required to do so.
A few of the items listed are frequently asked questions regarding cases at real risk of happening; the others are very specific cases that one might ponder out of curiosity. We start with cases where the pointer's address could wind up being anywhere in memory. The simplest example of this involves explicitly deleting/dereferencing a specific address using a literal number of the format, typically unsigned int, used internally by the system for pointers. An example of this would be "delete (char*)100u;". Note that while delete is an operator defined to act on cast pointers, some compilers may perform the extra step of dissecting operands that are cast on the spot, as in our example, and not allow the value being cast to be a literal.
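A pointer can be built from an integer literal without ever deleting or dereferencing it, which makes the cast itself safe to experiment with. The sketch below uses std::uintptr_t rather than unsigned int as the integer type, which is an assumption on our part; the function name is hypothetical, and the integer-to-pointer mapping is implementation-defined, so the round trip shown is what mainstream platforms do rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>

// Builds a char* from the literal 100 and converts it back to an integer.
// The pointer is never dereferenced or deleted; doing either would be
// undefined, since nothing is known to live at that address.
std::uintptr_t literal_address_round_trip() {
    char* p = reinterpret_cast<char*>(static_cast<std::uintptr_t>(100u));
    // Assumption: on mainstream platforms a non-zero integer does not
    // map to the null pointer.
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p);
}
```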
| Question | Notes | Possible locations of the operand address | delete | deref |
| What happens if I delete/deref a cast literal? | These cases may cause the operand address of the delete/deref to be in any of the locations listed to the right. | heap, code segment, stack, data segment, bss segment, outside the process's address space | undef | undef |
| What happens if I delete/deref a pointer that's already been deleted? | | | | |
| | | | no effect | run-time error |
| What happens if I delete/deref an uninitialized pointer? | | | | |
| What happens if I delete/deref an uncast void pointer? | | | undef | compile-time error |
| What happens if I delete/deref a wrongly cast pointer? | | | undef | undef |
| What happens if I delete/deref the address of a local, static or global variable? | | | undef | allowed |
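The last row can be illustrated with a short sketch. The variable and function names are hypothetical. Dereferencing the address of a local, static or global variable is ordinary, well-defined code; the corresponding delete is left in a comment because it is undefined, as none of these objects were created by new.

```cpp
#include <cassert>

int global_counter = 7;

// Reads a global, a static and a local variable through pointers to them.
int read_through_pointers() {
    static int static_counter = 5;
    int local_counter = 3;

    int* pg = &global_counter;
    int* ps = &static_counter;
    int* pl = &local_counter;
    assert(pg != nullptr && ps != nullptr && pl != nullptr);

    // delete pg;  // undefined: the pointee was not allocated with new

    return *pg + *ps + *pl;  // allowed: each pointer refers to a live int
}
```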