Instead of a thunk insert the dispatch into the original function.
If the original function should be executed the prolog just jumps to the "real" code in the function. Otherwise the replacement function is called.
There is one little complication here: when the replacement function calls the original function, the original function should not dispatch to the replacement again.
To pass this information, we use a flag in thread local storage.
The setting and reading of the flag is done in two new runtime functions.
rdar://problem/51043781