Use the HeapWalk API for heap iteration instead of the Heap32First/Next API, which was known to be slow. Since HeapWalk only works locally, it requires using a remote thread and a DLL.