NVIDIA GF100 & Friends speculation

All their consumer software is free - "Cuda", PhysX, Drivers, OpenCL...
It certainly wasn't free to develop, so those development costs have to be amortised into the price of NV's hardware products. Just because you can download it without paying... well, you get the idea.

Let's just stop this pointless train of discussion, please.
 
So let me get this straight: Most of you think that NVIDIA is dumb and:

1) Wasn't expecting Cypress to be smaller than Fermi
Fermi was well under way before the shock and awe of R770. Even AMD engineers weren't convinced that R770 was the right thing to do and were planning to make Cypress much bigger than it turned out. And remember R770's SIMDs/TUs grew by 20% "last-minute" when it turned out it was pad limited, so RV770 was faster than AMD originally planned for.

I doubt anyone here thought NVidia expected Cypress to be bigger than Fermi. After all, NVidia knew that that's not possible at TSMC, you know, since they have a big chip that's basically as big as anyone dares to go at TSMC. And there's a general suspicion that NVidia is addicted to the biggest possible chips.

2) Took absolutely no precautions to ensure that Fermi would be profitable
NVidia simply didn't adjust GF100 to the vagaries of 40nm at TSMC. Notice that AMD did adjust Cypress. Maybe that's simply because AMD is used to debugging TSMC's nodes early in their life, while NVidia "sits back"?

Fermi will undoubtedly be the right chip for 40nm at TSMC, at some point. The question is, when? Will GF102/GF112 (or whatever its shrunken, tweaked successor ends up being called) on 28nm arrive before that? In theory NVidia should have less trouble at 28nm, now that it's been through the pain of this new architecture at 40nm. Depends how painful TSMC finds 28nm, I guess.

3) Doesn't know how to design chips
Clearly has had problems executing since 2007. Despite that, it decided to bet the pot on a major re-design with the biggest possible chip on a process that TSMC was clearly struggling with.

and despite the forward-looking architecture and its key elements, the performance over previous generations is barely 30% higher.
Count me out of that. I'm expecting it to be substantially faster than GTX285. Them old TMUs and ROPs are a disaster zone, for a start. And the ALUs, well...

If the new architecture is as fast as NVidia's claiming then NVidia could easily have afforded to make GF100 smaller to take account of the manufacturing problems at TSMC.

4) Will charge an arm and a leg for it, despite not having a good performance lead over the competition
I'm sure they'll sell all of them, regardless of price.

Jawed
 
3) Doesn't know how to design chips
I think they weren't expecting TSMC to have quite this many problems on 40 nm ...
4) Will charge an arm and a leg for it
They will charge whatever maximizes their profits, which in the short term I expect is an arm and a leg simply because of supply and demand ... whether they have a big or a small lead. A lead is a lead, and I think enough people will pay the premium for the fastest single GPU card for that to exhaust their supplies (which I think will be extremely limited) regardless.
 
Clearly has had problems executing since 2007. Despite that, it decided to bet the pot on a major re-design with the biggest possible chip on a process that TSMC was clearly struggling with.

To be fair it wouldn't have looked like an extraordinary risk at the outset. Theoretically they would have had a GT214/GT212 in the bag before Fermi came to market. But of course, things often don't go according to plan and hindsight is 20/20 etc....
 
To be fair it wouldn't have looked like an extraordinary risk at the outset. Theoretically they would have had a GT214/GT212 in the bag before Fermi came to market. But of course, things often don't go according to plan and hindsight is 20/20 etc....
Cypress is designed for time to market, whereas Fermi specialises in a rich feature set.

Jawed
 
Cypress is designed for time to market, whereas Fermi specialises in a rich feature set.

:LOL:

Seriously though I wonder if Nvidia will do anything major on 28nm. It seems Fermi's featureset is well ahead of DirectX already and has addressed a lot of the concerns in the compute community as well. It should be a much easier ride than this one was.
 
It seems Fermi's featureset is well ahead of DirectX already

What is this based on? If we look beyond the re-branding of features mandated by DX11 for promo-PDF usage, I'm having trouble figuring out the parts where the featureset is well ahead of DX... maybe I'm not looking where I should be looking?
 
Like what?



CUDA vs CS? Parallel geometry processing where the API assumes otherwise?
CUDA and CS are roughly similar; obviously CUDA takes better advantage of NV hardware, so it is a little ahead there. No API restricts you to scalar geometry processing. It's entirely up to the driver how it deals with them, as long as the basic triangle order guarantee is kept.
 
CUDA and CS are roughly similar; obviously CUDA takes better advantage of NV hardware, so it is a little ahead there. No API restricts you to scalar geometry processing. It's entirely up to the driver how it deals with them, as long as the basic triangle order guarantee is kept.

Yes, but the API doesn't explicitly facilitate parallel processing. On the contrary it actually makes it difficult to do due to exactly the in-order requirement you mentioned. Hence the hardware is ahead of the software in this case.
 
That's like saying a kettle doesn't explicitly facilitate making 2 cups of tea.

You can be as glib as you like but I'm sure you get my point. So if this doesn't count, then what would possibly count as hardware superseding the DirectX API? Do you not think that Fermi's architecture makes many things possible that are not specified by Microsoft as requirements for DX compliance? AMD's old tessellator is an easy example and so is G80's compute support. Why is it any different now?
 
Yes, but the API doesn't explicitly facilitate parallel processing. On the contrary it actually makes it difficult to do due to exactly the in-order requirement you mentioned. Hence the hardware is ahead of the software in this case.
Well no, anybody can have parallel geometry if you don't require in-order output; it just means you have almost no way of knowing what comes out at the other end (imagine what happens with the z-buffer off).
Regardless, the order guarantee is in the backend; there is nothing stopping you, in any API, from processing the geometry in parallel as long as you keep the pixels in order (you have to, to make non-z-buffer rendering work). There have already been parallel geometry engines that worked fine with the in-order output requirement (think SGI).
 
Regardless, the order guarantee is in the backend; there is nothing stopping you, in any API, from processing the geometry in parallel

You're taking a different slant. You're saying the API doesn't explicitly prevent something. I'm saying that the hardware enables something not explicitly enabled by the API. See the difference? With your perspective you can always say the API is as advanced as the hardware since it defines the output.

And why are we assuming that Cypress is purely limited to DX capabilities as well?

Who said anything about Cypress? :LOL:
 
Fermi's support for register-indirect branching is something they've beaten the drum about. Is that even actually needed for DX11, or in competing chip lines?
The same goes for Fermi's exception handling capability, which overlaps with the indirection in control flow.
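
Just for illustration, here's a rough CUDA sketch (the kernel and function names are mine, not anything NVidia has published) of the sort of thing register-indirect branching buys you: calling through a device-side function pointer chosen at run time, which pre-Fermi parts generally couldn't do.

[code]
// Minimal sketch: an indirect device call, the kind of control flow
// Fermi's register-indirect branching enables (compute capability 2.0).
__device__ float op_add(float a, float b) { return a + b; }
__device__ float op_mul(float a, float b) { return a * b; }

typedef float (*binary_op_t)(float, float);   // device function pointer type

__global__ void apply_op(const float* x, const float* y, float* out,
                         int n, int which)
{
    // The call target is only known at run time, so the compiler has to
    // emit an indirect call rather than inlining one fixed function.
    binary_op_t op = (which == 0) ? op_add : op_mul;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = op(x[i], y[i]);
}
[/code]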
 
Do you not think that Fermi's architecture makes many things possible that are not specified by Microsoft as requirements for DX compliance?
For what it's worth I think there might be a few things in Fermi that will turn up in D3D11.1 or D3D12, e.g. looser constraints on the use of UAVs (not that I've studied this topic closely). I expect Fermi to be forward-looking, frankly.

But I'm intrigued to see what it is you're thinking of specifically. I haven't spent time on CUDA 3.0 to see what clues lie therein. A quick rummage in G.1:
  • Floating-point atomic addition operating on 32-bit words in global and shared memory (Section B.10)
  • __ballot()
  • __threadfence_system()
  • __syncthreads_count()
  • __syncthreads_and()
  • __syncthreads_or()
apparently reveals the entire set of new CUDA features in Fermi. Some of those are required for CS5.0. Manipulating/inspecting predicates is important stuff.
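
To make a couple of those concrete, here's a rough sketch (kernel and buffer names are my own invention) exercising the floating-point atomicAdd and the new vote/barrier intrinsics from that list; it needs a Fermi-class part (sm_20) to compile and run.

[code]
// Illustrative only: float atomicAdd on global memory, __ballot() and
// __syncthreads_count() are the Fermi-only pieces here (sm_20).
__global__ void count_and_sum(const float* data, int n,
                              float* global_sum,        // running sum
                              unsigned int* warp_masks, // one word per warp
                              int* positive_count)      // # of elements > 0
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? data[i] : 0.0f;

    // Fermi: atomic add on a 32-bit float in global memory (Section B.10).
    atomicAdd(global_sum, v);

    // Fermi: __ballot() packs one predicate bit per thread of the warp.
    unsigned int mask = __ballot(v > 0.0f);
    if ((threadIdx.x & 31) == 0)
        warp_masks[i >> 5] = mask;   // assumes one slot per launched warp

    // Fermi: barrier that also counts how many threads' predicate was true.
    int block_positives = __syncthreads_count(v > 0.0f);
    if (threadIdx.x == 0)
        atomicAdd(positive_count, block_positives);
}
[/code]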

What about OpenCL 1.1 which is due this summer-ish?

AMD's old tessellator is an easy example and so is G80's compute support. Why is it any different now?
I'm not saying you're wrong. Just curious to see what you're thinking of specifically.

Jawed
 
:LOL:

Seriously though I wonder if Nvidia will do anything major on 28nm. It seems Fermi's featureset is well ahead of DirectX already and has addressed a lot of the concerns in the compute community as well. It should be a much easier ride than this one was.

I guess that remains to be seen. But just because features are present on a GPU doesn't automatically mean they'll be used to any great degree. ATi has had hardware tessellation present since, when? The 2000 series, I think. That was back in 2007. And tessellation is only now just beginning to gain developer support. And neither Ageia nor Nvidia seem to have been able to make any huge impact with PhysX (pun not intended) despite both their best efforts. And that started back in 2006. To say that Nvidia is far enough ahead of its competition to be able to coast a generation is dubious at best and disastrous at worst.

And there is nothing to say that all these hardware technological advancements in Fermi will work properly, either. Remember the PureVideo debacle? A major advertised feature of the then revolutionary NV40 ended up being borked:

Anandtech: NV4x's Video Processor - What Happened?

Given Fermi's extreme lateness, the number of respins and its sheer 3 billion transistor complexity, I've been wondering if perhaps Nvidia doesn't have another NV40 situation on their hands.
 