In general, the performance differences between gnumalloc and this malloc are small. The major difference appears when primary storage is seriously over-committed: gnumalloc then wastes time paging in pages it is not going to use, and differences in wall-clock time of as much as a factor of five have been observed. Apart from that, gnumalloc and this implementation are pretty much neck and neck performance-wise.
Several legacy programs in the BSD 4.4 Lite distribution contained code that relied on the memory returned by malloc being zeroed. In a couple of cases free(3) was called more than once for the same allocation, and in a few cases free(3) was even called with pointers to objects in the data section or on the stack.
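For illustration, the first two habits have simple portable alternatives, sketched below. This is a minimal sketch, not code from the BSD sources; the function names `alloc_counters`, `alloc_counters_memset` and `free_and_clear` are hypothetical. It assumes the caller needs an array of zeroed `int` counters. Calling free(3) on data-section or stack objects has no such workaround; the only fix is not to do it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n zero-filled counters.
 * malloc(3) makes no promise about the contents of the memory it
 * returns, so code that needs zeroed memory must ask for it. */
static int *alloc_counters(size_t n)
{
    /* calloc(3) returns zero-filled memory. */
    return calloc(n, sizeof(int));
}

/* Hypothetical alternative: malloc(3) followed by an explicit memset(3).
 * Note: unlike calloc, this sketch does not check n * sizeof(int)
 * for overflow. */
static int *alloc_counters_memset(size_t n)
{
    int *p = malloc(n * sizeof(int));
    if (p != NULL)
        memset(p, 0, n * sizeof(int));
    return p;
}

/* Hypothetical helper: free an allocation and clear the caller's
 * pointer, so an accidental second call becomes free(NULL),
 * which is a no-op. */
static void free_and_clear(int **pp)
{
    free(*pp);
    *pp = NULL;
}
```

Clearing the pointer only protects the copy that is passed in. Other aliases of the same allocation can still be freed twice, so this reduces the risk of a double free rather than eliminating it.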
A couple of users have reported that using this malloc on other platforms yielded "pretty impressive results", but no formal benchmarks have been run.