more of them:
GigaPixel GP1/2 (free FSAA, faster than anything else when using FSAA 4x), GP3 (EMBM, 8-layer, T&L, DXT1, 80KB SRAM)
I'm not so sure GP3 was even capable of EMBM:
http://users.otenet.gr/~ailuros/gp-3.pdf
As for the "free FSAA": yeah, right, sure, ok. It was plain multisampling, which means, as in all other cases, that it's nearly fillrate- and bandwidth-"free" (always in a very relative sense) compared to supersampling. In that sense GPUs have had "free FSAA" ever since the dawn of R300.
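To put rough numbers on the multisampling versus supersampling point, here's a minimal sketch. It's a simplification under my own assumptions (one opaque full-screen layer, no extra work on polygon edges), and the function names are made up for illustration, not taken from any driver or API:

```python
def shaded_fragments(width, height, samples, mode):
    """Rough count of fragments that get textured/shaded for one full-screen layer.

    Assumption: supersampling shades every sample, while multisampling shades
    once per pixel and only replicates the result to the covered samples.
    """
    pixels = width * height
    if mode == "ssaa":
        return pixels * samples   # every sample is textured and shaded
    if mode == "msaa":
        return pixels             # one shaded result per pixel
    raise ValueError("mode must be 'ssaa' or 'msaa'")


def stored_samples(width, height, samples):
    """Both methods still have to store 'samples' color/Z values per pixel."""
    return width * height * samples


if __name__ == "__main__":
    w, h, n = 1024, 768, 4
    print("SSAA shaded fragments:", shaded_fragments(w, h, n, "ssaa"))
    print("MSAA shaded fragments:", shaded_fragments(w, h, n, "msaa"))
    print("Samples stored either way:", stored_samples(w, h, n))
```

So the texturing/shading cost stays at the no-AA level with multisampling, while the sample storage does not shrink, which is why "free" is only ever relative.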
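GP1 wasn't capable of AF (anisotropic filtering), by the way. Since multisampling only smooths polygon edges and does nothing for texture content, that makes multisampling by itself a rather moot point.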
http://users.otenet.gr/~ailuros/gp-1.pdf
Here are the numbers 3dfx gave to the public when they bought Gigapixel and advertised GP1 and GP2 for licensing:
Despite it being a 100MHz core, from what I recall, the results aren't anything to knock my socks off.
OAK WARP5 (Windows accelerator and Rendering Processor) - 1997, trilinear, free FSAA
The same goes for the "free FSAA" claim here: to get close to a zero performance penalty for FSAA, you're either testing a case where the system is completely CPU-bound, or the GPU has a very unbalanced pipeline with a heavier concentration on AA than anything else that was ever available. PowerVR's MBX also claims "FSAA4free"; of course it is nearly resource-free, but from what I recall it's also only 2x-sample multisampling on one axis.
From R300 up to today's GPUs, if you run at a resolution where the system is CPU-bound enough, the performance penalty for multisampling is relatively small and as "free" as in any of the cases mentioned above. On today's GPUs, as long as you don't exceed 1280*1024, even 4xMSAA taxes fillrate/bandwidth and memory footprint by only very small amounts.
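For a sense of scale on the memory footprint side, here's a back-of-the-envelope sketch. The 32-bit color and 32-bit Z/stencil per sample are my assumptions, the function name is hypothetical, and it deliberately ignores color/Z compression, the resolved buffer and the front buffer, so treat it as raw sample storage only:

```python
MIB = 1024 * 1024


def sample_buffer_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Raw storage for the multisampled color + Z/stencil buffers.

    Assumptions: every sample carries its own color and depth value,
    and nothing is compressed.
    """
    return width * height * samples * (color_bytes + depth_bytes)


if __name__ == "__main__":
    no_aa = sample_buffer_bytes(1280, 1024, 1)
    msaa4 = sample_buffer_bytes(1280, 1024, 4)
    print("1280*1024, no AA :", no_aa / MIB, "MiB")
    print("1280*1024, 4xMSAA:", msaa4 / MIB, "MiB")
```

Whether that counts as small obviously depends on how much onboard memory the card has.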