BVH4 is just broken in my opinion: a 64-byte BVH4 node doesn't make sense with 128-byte cachelines, because you still fetch the second 64 bytes, which you often won't need. The intersection HW isn't as expensive as memory hierarchy bandwidth, so BVH8 is basically a "free"(-ish) improvement and it...
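To make the cacheline arithmetic concrete, here's a rough back-of-the-envelope sketch. It assumes (my simplification, not a model of any specific GPU) that every node read pulls in whole 128-byte lines, that a BVH4 node is 64 bytes and a BVH8 node is 128 bytes, and that the other half of a BVH4 line is never useful. The helper names are just made up for illustration.

```python
# Simplified model (assumption): nodes are read in whole 128-byte cache lines,
# a BVH4 node is 64 B and a BVH8 node is 128 B. Not tied to any real hardware.
CACHE_LINE_BYTES = 128


def bytes_fetched(node_bytes: int) -> int:
    """Bytes pulled through the memory hierarchy to read one node."""
    lines = -(-node_bytes // CACHE_LINE_BYTES)  # ceiling division
    return lines * CACHE_LINE_BYTES


def utilization(node_bytes: int) -> float:
    """Fraction of the fetched bytes that actually belong to the node."""
    return node_bytes / bytes_fetched(node_bytes)


def bytes_per_child_test(node_bytes: int, children: int) -> float:
    """Fetched bytes amortized over the child boxes tested per node visit."""
    return bytes_fetched(node_bytes) / children


for name, size, width in [("BVH4", 64, 4), ("BVH8", 128, 8)]:
    print(f"{name}: utilization {utilization(size):.2f}, "
          f"{bytes_per_child_test(size, width):.1f} B per child box")
```

Under those assumptions BVH4 uses half of each fetched line (32 B of traffic per child box tested) while BVH8 uses all of it (16 B per child box). The model ignores the case where the other 64 bytes hold a sibling node you end up visiting anyway, which is why it's "often" and not "always" wasted.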