NVIDIA GF100 & Friends speculation

All their consumer software is free - "Cuda", PhysX, Drivers, OpenCL...
It certainly wasn't free to develop, so those development costs have to be amortised into the price of NV's hardware products. Just because you can download it without paying... well, you get the idea.

Let's just stop this pointless train of discussion, please.
 
So let me get this straight: Most of you think that NVIDIA is dumb and:

1) Wasn't expecting Cypress to be smaller than Fermi
Fermi was well under way before the shock and awe of R770. Even AMD engineers weren't convinced that R770 was the right thing to do and were planning to make Cypress much bigger than it turned out. And remember R770's SIMDs/TUs grew by 20% "last-minute" when it turned out it was pad limited, so RV770 was faster than AMD originally planned for.

I doubt anyone here thought NVidia expected Cypress to be bigger than Fermi. After all, NVidia knew that that's not possible at TSMC, you know, since they have a big chip that's basically as big as anyone dares to go at TSMC. And there's a general suspicion that NVidia is addicted to the biggest possible chips.

2) Took absolutely no precautions to ensure that Fermi would be profitable
NVidia simply didn't adjust GF100 to the vagaries of 40nm at TSMC. Notice that AMD did adjust Cypress. Maybe that's simply because AMD is used to debugging TSMC's nodes early in their life, while NVidia "sits back"?

Fermi will undoubtedly be the right chip for 40nm at TSMC, at some point. The question is, when? Will GF102/GF112 (or whatever its shrunken, tweaked successor ends up being called) on 28nm arrive before that? In theory NVidia should have less trouble at 28nm, now that it's been through the pain of this new architecture at 40nm. Depends how painful TSMC finds 28nm, I guess.

3) Doesn't know how to design chips
Clearly has had problems executing since 2007. Despite that, it decided to bet the pot on a major re-design with the biggest possible chip on a process that TSMC was clearly struggling with.

and despite the forward-looking architecture and its key elements, the performance over previous generations is barely 30% higher.
Count me out of that. I'm expecting it to be substantially faster than GTX285. Them old TMUs and ROPs are a disaster zone, for a start. And the ALUs, well...

If the new architecture is as fast as NVidia's claiming then NVidia could easily have afforded to make GF100 smaller to take account of the manufacturing problems at TSMC.

4) Will charge an arm and a leg for it, despite not having a good performance lead over the competition
I'm sure they'll sell all of them, regardless of price.

Jawed
 
3) Doesn't know how to design chips
I think they weren't expecting TSMC to have quite this many problems on 40 nm ...
4) Will charge an arm and a leg for it
They will charge whatever maximizes their profits, which in the short term I expect is an arm and a leg simply because of supply and demand ... whether they have a big or a small lead. A lead is a lead, and I think enough people will pay the premium for the fastest single GPU card for that to exhaust their supplies (which I think will be extremely limited) regardless.
 
Clearly has had problems executing since 2007. Despite that, it decided to bet the pot on a major re-design with the biggest possible chip on a process that TSMC was clearly struggling with.

To be fair it wouldn't have looked like an extraordinary risk at the outset. Theoretically they would have had a GT214/GT212 in the bag before Fermi came to market. But of course, things often don't go according to plan and hindsight is 20/20 etc....
 
To be fair it wouldn't have looked like an extraordinary risk at the outset. Theoretically they would have had a GT214/GT212 in the bag before Fermi came to market. But of course, things often don't go according to plan and hindsight is 20/20 etc....
Cypress is designed for time to market, whereas Fermi specialises in a rich feature set.

Jawed
 
Cypress is designed for time to market, whereas Fermi specialises in a rich feature set.

:LOL:

Seriously though I wonder if Nvidia will do anything major on 28nm. It seems Fermi's featureset is well ahead of DirectX already and has addressed a lot of the concerns in the compute community as well. It should be a much easier ride than this one was.
 
It seems Fermi's featureset is well ahead of DirectX already

What is this based on? If we look beyond the re-branding of features mandated by DX11 for promo-PDF usage, I'm having trouble figuring out the parts where the featureset is well ahead of DX... maybe I'm not looking where I should be looking?
 
Like what?



CUDA vs CS? Parallel geometry processing where the API assumes otherwise?
CUDA and CS are roughly similar; obviously CUDA takes better advantage of NV hardware, so it is a little ahead there. No API restricts you to scalar geometry processing. It's entirely up to the driver how it deals with them, as long as the basic triangle order guarantee is kept.
 
CUDA and CS are roughly similar; obviously CUDA takes better advantage of NV hardware, so it is a little ahead there. No API restricts you to scalar geometry processing. It's entirely up to the driver how it deals with them, as long as the basic triangle order guarantee is kept.

Yes, but the API doesn't explicitly facilitate parallel processing. On the contrary it actually makes it difficult to do due to exactly the in-order requirement you mentioned. Hence the hardware is ahead of the software in this case.
 
That's like saying a kettle doesn't explicitly facilitate making 2 cups of tea.

You can be as glib as you like but I'm sure you get my point. So if this doesn't count, then what would possibly count as hardware superseding the DirectX API? Do you not think that Fermi's architecture makes many things possible that are not specified by Microsoft as requirements for DX compliance? AMD's old tessellator is an easy example and so is G80's compute support. Why is it any different now?
 
Yes, but the API doesn't explicitly facilitate parallel processing. On the contrary it actually makes it difficult to do due to exactly the in-order requirement you mentioned. Hence the hardware is ahead of the software in this case.
Well no, anybody can have parallel geometry if you don't require in-order output; it just means you have almost no way of knowing what comes out at the other end (imagine what happens with the z-buffer off).
Regardless, the order guarantee is in the backend; there is nothing stopping you, in any API, from processing the geometry in parallel as long as you keep the pixels in order (you have to, to make non-z-buffer rendering work). There have already been parallel geometry engines that worked fine with the in-order output requirement (think SGI).
 
Regardless, the order guarantee is in the backend; there is nothing stopping you, in any API, from processing the geometry in parallel

You're taking a different slant. You're saying the API doesn't explicitly prevent something. I'm saying that the hardware enables something not explicitly enabled by the API. See the difference? With your perspective you can always say the API is as advanced as the hardware since it defines the output.

And why are we assuming that Cypress is purely limited to DX capabilities as well?

Who said anything about Cypress? :LOL:
 
Fermi's support for register-indirect branching is something they've beaten the drum about. Is that even actually needed for DX11, or in competing chip lines?
The same goes for Fermi's exception handling capability, which overlaps with the indirection in control flow.
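
Just for illustration, here's a rough CUDA sketch (the kernel and function names are mine, not anything NVidia has published) of the sort of thing register-indirect branching buys you: calling through a device-side function pointer chosen at run time, which pre-Fermi parts generally couldn't do.

[code]
// Minimal sketch: an indirect device call, the kind of control flow
// Fermi's register-indirect branching enables (compute capability 2.0).
__device__ float op_add(float a, float b) { return a + b; }
__device__ float op_mul(float a, float b) { return a * b; }

typedef float (*binary_op_t)(float, float);   // device function pointer type

__global__ void apply_op(const float* x, const float* y, float* out,
                         int n, int which)
{
    // The call target is only known at run time, so the compiler has to
    // emit an indirect call rather than inlining one fixed function.
    binary_op_t op = (which == 0) ? op_add : op_mul;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = op(x[i], y[i]);
}
[/code]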
 
Do you not think that Fermi's architecture makes many things possible that are not specified by Microsoft as requirements for DX compliance?
For what it's worth I think there might be a few things in Fermi that will turn up in D3D11.1 or D3D12, e.g. looser constraints on the use of UAVs (not that I've studied this topic closely). I expect Fermi to be forward-looking, frankly.

But I'm intrigued to see what it is you're thinking of specifically. I haven't spent time on CUDA 3.0 to see what clues lie therein. A quick rummage in G.1:
  • Floating-point atomic addition operating on 32-bit words in global and shared memory (Section B.10)
  • __ballot()
  • __threadfence_system()
  • __syncthreads_count()
  • __syncthreads_and()
  • __syncthreads_or()
apparently reveals the entire set of new CUDA features in Fermi. Some of those are required for CS5.0. Manipulating/inspecting predicates is important stuff.
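
To make a couple of those concrete, here's a rough sketch (kernel and buffer names are my own invention) exercising the floating-point atomicAdd and the new vote/barrier intrinsics from that list; it needs a Fermi-class part (sm_20) to compile and run.

[code]
// Illustrative only: float atomicAdd on global memory, __ballot() and
// __syncthreads_count() are the Fermi-only pieces here (sm_20).
__global__ void count_and_sum(const float* data, int n,
                              float* global_sum,        // running sum
                              unsigned int* warp_masks, // one word per warp
                              int* positive_count)      // # of elements > 0
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? data[i] : 0.0f;

    // Fermi: atomic add on a 32-bit float in global memory (Section B.10).
    atomicAdd(global_sum, v);

    // Fermi: __ballot() packs one predicate bit per thread of the warp.
    unsigned int mask = __ballot(v > 0.0f);
    if ((threadIdx.x & 31) == 0)
        warp_masks[i >> 5] = mask;   // assumes one slot per launched warp

    // Fermi: barrier that also counts how many threads' predicate was true.
    int block_positives = __syncthreads_count(v > 0.0f);
    if (threadIdx.x == 0)
        atomicAdd(positive_count, block_positives);
}
[/code]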

What about OpenCL 1.1 which is due this summer-ish?

AMD's old tessellator is an easy example and so is G80's compute support. Why is it any different now?
I'm not saying you're wrong. Just curious to see what you're thinking of specifically.

Jawed
 
:LOL:

Seriously though I wonder if Nvidia will do anything major on 28nm. It seems Fermi's featureset is well ahead of DirectX already and has addressed a lot of the concerns in the compute community as well. It should be a much easier ride than this one was.

I guess that remains to be seen. But just because features are present on a GPU doesn't automatically mean they'll be used to any great degree. ATi has had hardware tessellation present since, when? The 2000 series, I think. That was back in 2007. And tessellation is only now just beginning to gain developer support. And neither Ageia nor Nvidia seem to have been able to make any huge impact with PhysX (pun not intended) despite both their best efforts. And that started back in 2006. To say that Nvidia is far enough ahead of its competition to be able to coast a generation is dubious at best and disastrous at worst.

And there is nothing to say that all these hardware technological advancements in Fermi will work properly, either. Remember the PureVideo debacle? A major advertised feature of the then revolutionary NV40 ended up being borked:

Anandtech: NV4x's Video Processor - What Happened?

Given Fermi's extreme lateness, the number of respins and its sheer 3 billion transistor complexity, I've been wondering if perhaps Nvidia doesn't have another NV40 situation on their hands.
 