In the generic case the projected area is a parallelogram (which is of course also a trapezoid, just a more special one).
The special case you've drawn is a rhombus, which occurs when the two derivative vectors have equal length.
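A minimal Python sketch of that first-order picture, assuming the footprint is the parallelogram spanned by the ddx and ddy vectors (the function names are illustrative, not part of any API):

```python
import math

def footprint_corners(uv, ddx, ddy):
    """Corners of the parallelogram spanned by the screen-space derivatives,
    using the first-order (linear) approximation of the texture mapping."""
    u, v = uv
    return [
        (u, v),
        (u + ddx[0], v + ddx[1]),
        (u + ddx[0] + ddy[0], v + ddx[1] + ddy[1]),
        (u + ddy[0], v + ddy[1]),
    ]

def classify_footprint(ddx, ddy, eps=1e-9):
    """'degenerate' if ddx and ddy are collinear, 'rhombus' if they have equal
    length (a square is included here), otherwise 'parallelogram'."""
    cross = ddx[0] * ddy[1] - ddx[1] * ddy[0]
    if abs(cross) < eps:
        return "degenerate"
    if math.isclose(math.hypot(*ddx), math.hypot(*ddy), rel_tol=1e-9):
        return "rhombus"
    return "parallelogram"
```

For example, ddx = (1, 0) with ddy = (0.6, 0.8) gives a rhombus, while ddx = (2, 0) with ddy = (0, 1) gives a plain parallelogram.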
I fear that texldd has a high cost because the driver/hardware doesn't know that the passed ddx/ddy parameters are coherent across the quad, and basically calculates things four times.
I don't know how well-founded that theory is, or whether the driver could somehow tell the hardware that a texldd is coherent so it executes faster (I guess no games use that yet, so there was no reason to optimize for it).
Maybe, maybe not; it depends on how independent the texture samplers are per pixel. I guess it could be 4 times X cycles with higher-quality filtering.
I always thought texldd was somewhat flawed, or at least that there should also be a "texldc" that takes virtual texture coordinates instead of derivatives. That way coherent (though not necessarily cache-friendly) access would be guaranteed.
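A rough sketch of what such a hypothetical texldc could compute, assuming it receives the texture coordinates of all four pixels in the quad and derives one shared pair of derivatives with coarse finite differences anchored at the top-left pixel (real hardware may use other conventions, e.g. separate per-row/per-column differences). Both function names are made up for illustration:

```python
def quad_derivatives(quad_uv):
    """quad_uv: ((tl, tr), (bl, br)) texture coordinates of a 2x2 pixel quad,
    each a (u, v) tuple. Returns one shared (ddx, ddy) pair for the whole quad,
    using coarse finite differences anchored at the top-left pixel."""
    (tl, tr), (bl, _br) = quad_uv
    ddx = (tr[0] - tl[0], tr[1] - tl[1])
    ddy = (bl[0] - tl[0], bl[1] - tl[1])
    return ddx, ddy

def texldc_footprints(quad_uv):
    """Hypothetical 'texldc': every pixel in the quad samples with the same
    derivatives, so the four lookups are coherent by construction."""
    ddx, ddy = quad_derivatives(quad_uv)
    (tl, tr), (bl, br) = quad_uv
    return [(uv, ddx, ddy) for uv in (tl, tr, bl, br)]
```

Because the derivatives are computed once per quad rather than passed in per pixel, the sampler never has to assume they might differ between the four lookups.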
The projected area of a 2x2 pixel quad into texture space will always be a convex quadrilateral with straight edges as long as the only distortion applied is the standard perspective division. There is, however, no guarantee that any two opposite edges will be parallel, so the shape is in general neither a rhombus, a parallelogram, nor even a trapezoid.
In practice, however, it seems to work OK to just take the two longest non-opposite edges and perform the derivative/trilinear/aniso calculations as if these two edges formed two sides of a virtual parallelogram.
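To see this numerically, here is a Python sketch that pushes the corners of a 2x2 screen-space quad through a 3x3 projective mapping (the matrix values are made up for illustration) and counts how many pairs of opposite edges stay parallel:

```python
def project(h, x, y):
    """Apply a 3x3 projective mapping h (row-major nested lists) to the
    screen point (x, y), including the perspective division by w."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

def cross(a, b):
    return a[0] * b[1] - a[1] * b[0]

def quad_edges(h, size=2.0):
    """Edge vectors of the projected quad, in winding order."""
    corners = [(0, 0), (size, 0), (size, size), (0, size)]
    pts = [project(h, x, y) for x, y in corners]
    return [(pts[(i + 1) % 4][0] - pts[i][0], pts[(i + 1) % 4][1] - pts[i][1])
            for i in range(4)]

def parallel_opposite_pairs(edges, eps=1e-9):
    """0 = general quadrilateral, 1 = trapezoid, 2 = parallelogram."""
    return sum(1 for i in (0, 1) if abs(cross(edges[i], edges[i + 2])) < eps)

def is_convex(edges):
    """Strictly convex if consecutive edges always turn the same way."""
    turns = [cross(edges[i], edges[(i + 1) % 4]) > 0 for i in range(4)]
    return all(turns) or not any(turns)
```

With the bottom row [0.5, 0.25, 1] no pair of opposite edges is parallel, yet the quad stays convex; with [0.5, 0, 1] one pair survives (a trapezoid), and an affine mapping keeps both (a parallelogram).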
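A sketch of that heuristic, under my own reading of "two longest non-opposite edges" (the longest edge plus the longer of its two neighbours). The LOD/anisotropy step follows the approximation suggested in the EXT_texture_filter_anisotropic specification; texel units and one-pixel spacing between neighbouring corners are assumptions, and the function names are illustrative:

```python
import math

def edges_from_corners(pts):
    return [(pts[(i + 1) % 4][0] - pts[i][0], pts[(i + 1) % 4][1] - pts[i][1])
            for i in range(4)]

def virtual_derivatives(pts):
    """pts: projected quad corners in winding order, in texel units, with
    neighbouring corners assumed one pixel apart on screen. Returns the longest
    edge and its longer neighbour as a virtual (ddx, ddy) pair."""
    edges = edges_from_corners(pts)
    lengths = [math.hypot(*e) for e in edges]
    i = max(range(4), key=lambda k: lengths[k])                    # longest edge
    j = max(((i + 1) % 4, (i + 3) % 4), key=lambda k: lengths[k])  # longest neighbour
    return edges[i], edges[j]

def lod_and_aniso(a, b, max_aniso=16):
    """Returns (mip LOD, number of anisotropic samples)."""
    pa, pb = math.hypot(*a), math.hypot(*b)
    pmax, pmin = max(pa, pb), min(pa, pb)
    if pmax == 0:
        return float("-inf"), 1  # zero-size footprint: finest mip level
    n = min(math.ceil(pmax / pmin), max_aniso) if pmin > 0 else max_aniso
    return math.log2(pmax / n), n
```

For a footprint stretched 4:1 along u, such as corners (0, 0), (4, 0), (4, 1), (0, 1), this picks (4, 0) and (0, 1) and returns LOD 0 with 4 samples.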