NVIDIA celebrates 25 years since the GeForce 256

My very first big-budget GPU - the Asus Annihilator GeForce 256 DDR.

My dad was like, why the f is this graphics card $400. I was like, that's what it costs. I'm so lucky he didn't take it back. What incredible moments I had with that card.
 
It was nothing new when it came out if you look outside the PC space. Arcade machines had had T&L for years at that point.
 
Yes, but these were usually a separate processor from the rasterizer. Even the Naomi 2 had its T&L chip, Elan, as a separate processor. Model 2 and 3 used various different processors for T&L. Even professional graphics cards for the PC had options for geometry processors. But the GeForce 256 is the first to have it integrated on the same chip as the rasterizer?
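
To make the division of labour concrete, here is a rough sketch (illustrative Python, not any real API of the era) of the per-vertex work a fixed-function T&L stage performs - transform to clip space plus a simple diffuse light. Before the GeForce 256 this ran on the host CPU or on a separate geometry chip; NV10's claim to fame was doing it on the same die as the rasterizer.

import numpy as np

def transform_and_light(position, normal, mvp, light_dir, light_color, ambient):
    # Transform: object-space position -> clip space via the 4x4 modelview-projection matrix
    clip_pos = mvp @ np.append(position, 1.0)
    # Light: one directional light, Lambertian diffuse term plus ambient (purely illustrative)
    n = normal / np.linalg.norm(normal)
    n_dot_l = max(float(np.dot(n, -light_dir)), 0.0)
    return clip_pos, ambient + light_color * n_dot_l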
 
NV10 was just ridiculously ambitious and ahead of its time.

It had a die size of ~138mm2 on 220nm which is by far the largest die of any GPU at the time (and it beat everyone else to TSMC 220nm as well AFAIK). It was a 4x1 pipeline with free trilinear (upgraded to 4x2 bilinear on NV15) while the best competitors at the time were 2x1 with bilinear. It beat everyone else to T&L and DX7 by months (ATI) or even years (3DFX if they had survived). It was *ridiculously* DRAM bandwidth limited for the SDR version, and still *very* DRAM and CPU limited for the DDR version that came just 2 months later. It was the first GPU with DDR by 6+ months. It was actually cheaper to manufacture than the 3DFX VSA-100 dual-chip in the Voodoo 5500 given that was 2x112mm2 on 250nm, and once ported to 180nm, it was also way more area efficient than ATI's Radeon (88mm2 for GF2 GTS vs 111mm2 for R100 - ATI's key advantage was HyperZ/Early-Z which NVIDIA didn't have before GF3 just 9 months after R100) while being faster most of the time and with significantly better drivers by that point.

The Riva 128 and TNT were extremely competitive chips for certain markets, but they had significant trade-offs as they were engineered extremely quickly due to NVIDIA running out of money after the NV1/SEGA fiasco (especially the Riva 128). The NV10 architecture didn't have any trade-offs (except no 3DFX Glide support obviously); it was just best-in-class all around. 3DFX went bankrupt just 16 months later.

It's honestly kind of insane that the (ex-SGI) NV10 Lead Architect said in the DF interview that he was hired in Summer 1997 and the GeForce 256 was publicly available in October 1999. That's basically 18 months from concept to tape-out for a brand new architecture! I guess that's slow compared to the Riva 128's insane 6 months from concept to tape-out starting from scratch after firing half their employees when they ran out of money(!), but it's still incredibly impressive.
 
I wonder if anyone tried DDR with the Rage 128 VR, or why it did not happen.
Wow, I think you're right, the Rage 128 VR did support DDR back in 1998 (but it was either 128-bit SDR or 64-bit DDR, so it was just a cost reduction option rather than a performance improvement - and that's assuming DDR was cheap enough, which I'm not sure it was): http://www.bitsavers.org/components/ati/Rage_128/Rage_128_Overview.pdf and https://bitsavers.computerhistory.org/components/ati/Rage_128/Rage_128_GC_Spec.pdf
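
A quick sanity check shows why 64-bit DDR at the same memory clock is a cost play rather than a speedup (the clock value below is arbitrary, purely illustrative):

clock_mhz = 100                      # arbitrary memory clock, same for both configurations
sdr_128bit = 128 / 8 * clock_mhz     # MB/s: 128-bit bus, 1 transfer per clock
ddr_64bit = 64 / 8 * clock_mhz * 2   # MB/s: 64-bit bus, 2 transfers per clock
assert sdr_128bit == ddr_64bit       # same peak bandwidth, but half the board traces and memory chips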

What would a TNT without trade-offs look like?
AFAIK the main downside of the TNT was SW/driver maturity, which was greatly improved by the time the GeForce 256 came out. In terms of HW trade-offs, I think RAMDAC 2D quality was far from best-in-class and they missed their clock targets pretty badly, but both those things were already mostly fixed with the TNT2. The GeForce 256 didn't come out of nowhere, and the TNT2 was a very competitive GPU, but it wasn't so clearly ahead of everything else when it came out as the GeForce 256 DDR was.
 
Everyone planning for 250 nm in 1998 missed the clock targets.
Original TNT was TSMC 350nm though, which they had already used for the Riva 128ZX a few months earlier. NVIDIA claims the clock target miss was due to power consumption being too high for passive cooling, which seems plausible given how rudimentary pre-tape-out power consumption estimation must have been at the time...

Somehow I had completely forgotten/missed that the TNT(2) didn't have HW triangle setup at all, which is why it was so much more CPU dependent than the Voodoo 2/3, and the GeForce 256 just skipped that step to go straight from nothing to full T&L.
(EDIT: my mistake, the sites claiming that were clearly just confused about why early NVIDIA drivers had higher CPU overhead)

---

For the sake of balance, I feel like I should temper my praise a little bit, and point out that in many ways, TNT and GeForce 256 were pretty straightforward bruteforce architectures, and there is nothing particularly innovative in their design as far as I can tell. The architecture of PowerVR TBDR GPUs at the same time seems significantly more complex, and ATI's early GPUs also had some tricks that NVIDIA only added much later. Heck, nearly everyone beat NVIDIA in terms of architectural cleverness in one way or another at some point between 1996 and 2004 including Matrox/3DLabs/etc...

But complexity should only ever be an unfortunate means to an end, never a goal in itself; a simpler architecture is easier to implement in HW without bugs, easier to develop great drivers for, etc... In my opinion, NVIDIA didn't win thanks to their architecture; they won because they had *by far* the best engineering execution and velocity of any of the early GPU companies. And I believe a good but relatively simple and unexciting high-level architecture was a key part of what made that possible.
 
Original TNT was TSMC 350nm though, which they had already used for the Riva 128ZX a few months earlier. NVIDIA claims the clock target miss was due to power consumption being too high for passive cooling, which seems plausible given how rudimentary pre-tape-out power consumption estimation must have been at the time...

Somehow I had completely forgotten/missed that the TNT(2) didn't have HW triangle setup at all, which is why it was so much more CPU dependent than the Voodoo 2/3, and the GeForce 256 just skipped that step to go straight from nothing to full T&L.

The original announcement might have been lost to time, but there are still references to it proclaiming a 250 nm process. And since it wasn't mature enough, back to 350 it went.

What made you think the TNTs did not have a triangle setup?
 
I always found the Rendition card fascinating, as it was among the first 3D accelerators I used, and it had that intriguing programmable aspect paired with the fixed-function elements.

I think the Verite 2100/2200 (and perhaps the V1000 as well) had triangle setup, but the speed of some of those functions compared to the CPU meant the benefit was highly contingent on the vintage of the processor. There was that prototype Hercules card with the Fujitsu co-processor, which may have approximated GeForce 256-style transform and lighting.

It is interesting how some of the market segmentation and cost reduction strategies are consistent across the eras - S3 in the 2D realm with DRAM versus VRAM, then NVIDIA with the M64 and Vanta etc.
 
Yes, but these were usually a separate processor from the rasterizer. Even the Naomi 2 had its T&L chip, Elan, as a separate processor. Model 2 and 3 used various different processors for T&L. Even professional graphics cards for the PC had options for geometry processors. But the GeForce 256 is the first to have it integrated on the same chip as the rasterizer?
Not at all disputing the point generally, but in the console area - I think the RSP component of the Nintendo 64's RCP did transform and lighting calculations?
 