Yes, the cost of accurately raytracing sound waves is huge. Not so much because the final mixing or the per-channel effects would be expensive; even old EAX hardware could handle that with ease.
The cost comes from having to raytrace with multiple bounces off diffuse surfaces across the entire(!) scene to determine the predominant reverb sources. That's like raytracing a scene full of almost totally reflective objects, which is pretty much the worst case for any ray / path tracer.
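To put a rough number on that worst case, here is a back-of-the-envelope sketch. It assumes every surface reflects the same fixed fraction of the incoming energy, and that a ray stops mattering once it has lost 60 dB (a threshold borrowed from the usual RT60 reverb convention). The function name and the reflectance values are made up for illustration, not measured from any real scene or engine.

```python
import math


def bounces_until_decay(reflectance, decay_db=60.0):
    """Bounces needed before a ray's energy has dropped by decay_db decibels.

    Assumes (hypothetically) that every bounce keeps the same fraction
    `reflectance` of the energy, so energy after n bounces is reflectance**n.
    """
    if not 0.0 < reflectance < 1.0:
        raise ValueError("reflectance must be strictly between 0 and 1")
    # Solve reflectance**n <= 10**(-decay_db / 10) for the smallest integer n.
    return math.ceil((-decay_db / 10.0) / math.log10(reflectance))


if __name__ == "__main__":
    # Illustrative values only: a fairly absorbent surface vs. a hard wall.
    for r in (0.5, 0.9):
        print(f"reflectance {r}: {bounces_until_decay(r)} bounces")
```

Under these assumptions a surface that keeps half the energy per bounce needs 20 bounces, while one that keeps 90% needs 132. With diffuse reflection every one of those bounces also scatters in all directions, which is why the ray count gets out of hand so quickly.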
I actually wouldn't compare this to EAX x.0, but rather to A3D 2.0 / 3.0, which already did exactly this, wavetracing, before. EAX only does hardware mixing of effects specified by the game engine, whereas A3D accepted level geometry, albeit only a simplified version, and raytraced the sound sources within it.
If Nvidia really did raytrace it accurately on a normal-mapped scene with diffuse reflections, using the same geometry as the graphics portion, for both ears independently, then it's no surprise that it performed so badly. I don't think this is the way to go yet, not unless the cost of raytracing for sound is at least offset by reusing the results in other parts of the render path as well. And even then it shouldn't use the full scene geometry, only a good approximation of it.