Nvidia GT300 core: Speculation

You clearly haven't read what I wrote about data transfer errors. We are dealing with GDDR5: it won't fail, it will scale badly or even hurt performance. Moreover, no app is entirely ALU-limited or bandwidth-limited; bottlenecks are dynamic and constantly change while rendering a single frame.
BTW, I'm not just talking about the memory modules "failing"; the GDDR5 interface can fail as well.

Granted, I don't know anything yet, but assuming the 384-bit bus for GF100 is true, what guarantees that we won't see something similar here too?
 
I, on the other hand, believe that CPU-style caches don't scale. LRB's rendering pipeline is ample proof of that. We'll need scratchpad memories, just like the Cell/GPUs of today. However, the one thing I'd change over Cell is to allow vector scatter/gather from global memory as well, and not just async DMAs.
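To sketch what I mean (a toy CUDA kernel of my own, with made-up names and a fixed 256-thread block, not anything tied to an actual GT300 design): shared memory plays the role of the software-managed scratchpad, and the gather comes straight out of global memory instead of going through a staged async DMA.

// Toy sketch: software-managed scratchpad (shared memory) plus a vector
// gather directly from global memory via an index array. Illustrative only.
__global__ void gather_scale(const float *src, const int *idx,
                             float *dst, float scale, int n)
{
    __shared__ float tile[256];            // scratchpad; assumes blockDim.x == 256

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = src[idx[i]];   // gather from global memory
    __syncthreads();

    if (i < n)
        dst[i] = tile[threadIdx.x] * scale;
}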

Cell programmers might be banging their heads against walls, stones, etc., but GPU programmers have gotten on pretty well over the last 2.5 years with CUDA.
 
But GPU programmers have gotten on pretty well over the last 2.5 years with CUDA.
If you believe that, you haven't read enough CUDA-based research papers :)
edit: sooner or later Nvidia & ATI will add proper coherent r/w caches to their architectures; it's just a matter of time.
 
You clearly haven't read what I wrote about data transfer errors. We are dealing with GDDR5: it won't fail, it will scale badly or even hurt performance. Moreover, no app is entirely ALU-limited or bandwidth-limited; bottlenecks are dynamic and constantly change while rendering a single frame.
Yes, I did read what you wrote and I do understand it. And nothing you say contradicts the fact that Crysis scaled better with engine clock. It doesn't matter if the memory wasn't scaling as well due to errors: a 9% engine clock increase gave a 5% performance boost. If both engine and memory clocks were increased by 9%, the maximum gain we'd expect would be 9%. So a 9% memory clock increase could give at most 4% more performance.

Engine clock is having a larger impact here. Note that engine speed regulates more than just ALU speed; it also controls ROP performance, vertex rates, etc.
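Spelled out as arithmetic (a trivial sketch of the upper-bound argument; treating the two contributions as at best additive is an assumption of the bound, not a measurement):

// Upper-bound arithmetic for the clock-scaling argument above.
// Assumption: raising both clocks by 9% can yield at most a 9% gain.
#include <cstdio>

int main()
{
    const double combined_bound = 9.0; // max gain if both clocks rise 9%
    const double engine_gain    = 5.0; // measured gain from engine clock alone
    const double memory_bound   = combined_bound - engine_gain; // at most 4%
    printf("memory clock can contribute at most %.1f%%\n", memory_bound);
    return 0;
}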

-FUDie
 
If you believe that, you haven't read enough CUDA-based research papers :)
Maybe. But I'd like to see someone use r/w cache coherency on, say, an O(50)-core chip with high performance to be convinced otherwise.

edit: sooner or later Nvidia & ATI will add proper coherent r/w caches to their architectures; it's just a matter of time.

I am in the software-managed caches camp for now. R/w coherent caches hurt more than they help in the O(50)-core regime, as your compute increases as O(p) but your communication increases as O(p^2).
 
Maybe it is possible to reduce the O(p^2) to something lower, but I am still waiting for something that uses the r/w coherency of caches on an O(50)-core chip with high performance.
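As a toy model of that scaling (numbers of my own, purely illustrative, not measurements of any chip):

// Per-step useful work grows as O(p) with core count p, while worst-case
// coherence traffic (every core sharing lines with every other) grows as
// roughly O(p^2), so the overhead per unit of work grows linearly with p.
#include <cstdio>

int main()
{
    for (int p = 10; p <= 50; p += 10) {
        double compute   = p;              // O(p) useful work
        double coherence = (double)p * p;  // O(p^2) coherence traffic
        printf("p=%2d  compute=%5.0f  coherence=%6.0f  ratio=%4.1f\n",
               p, compute, coherence, coherence / compute);
    }
    return 0;
}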
 
Seriously, if Rys is already doing diagrams of GF100 (while those for HD5k are still not out yet?), I'll definitely wait for the GF100 before deciding where to sink my money.
Those for HD 5870 are done, and were done before I started work on GF100 (thanks Alex!). We'll publish on it soon.
 
GPU specifications
This is the meat you always want to read first. So, here is how it goes:
* 3.0 billion transistors
* 40nm TSMC
* 384-bit memory interface
* 512 shader cores [renamed into CUDA Cores]
* 32 CUDA cores per Shader Cluster
* 1MB L1 cache memory [divided into 16KB Cache - Shared Memory]
* 768KB L2 unified cache memory
* Up to 6GB GDDR5 memory
* Half Speed IEEE 754 Double Precision
BSN
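A quick sanity check on how those numbers hang together (my arithmetic, not part of the BSN piece):

// Derived figures from the rumoured specs: cluster count and L1 per cluster.
#include <cstdio>

int main()
{
    const int cuda_cores        = 512;
    const int cores_per_cluster = 32;
    const int total_l1_kb       = 1024;  // the "1MB L1" from the list

    const int clusters  = cuda_cores / cores_per_cluster;  // 16 clusters
    const int l1_per_kb = total_l1_kb / clusters;          // 64KB per cluster
    printf("%d clusters, %dKB of L1/shared memory per cluster\n",
           clusters, l1_per_kb);
    return 0;
}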
 
GF100? Where did this come from? I know about G300, but GF100???

edit: and GT212, what the bloody hell is that?

GT212 was, IMHO, a 40nm/D3D10.1 project which would have been a pretty dumb release, considering that it also had a 384-bit bus and 32 SPs/cluster. It wouldn't have come close to GF100 though, but most likely to a future performance iteration of it. I'd say that if they had any common sense when they cancelled that project, they moved its human resources into a GF10x performance GPU project.

Since you're asking questions, I hope someone can now understand the reason for the intentional false information in supposed roadmaps. They just "named" the D12U something like "GTX 280 1.5GB".
 
Maybe it could happen, though like Charlie I would question whether it would be wise to try to out-Larrabee Larrabee.

Looks like that's exactly what they're trying to do. Strange that there's no mention of any graphics-specific bits so far. Not saying there aren't any, but the focus seems to have veered sharply away from graphics.

A clean-sheet design that would basically abandon a huge chunk of the G80-GT200 framework would take time and resources to bring about. Given the design cycles for something like that, the roughly four years since the completion of G80 (assuming GT200's somewhat underwhelming improvements meant it was a secondary effort) would be a frighteningly tight timeline in which to architect a general-purpose VLSI architecture.

That's true, but the same could be said for G71->G80, which was an even bigger change. Though they are trying to do more stuff now, which could have put a strain on resources.

A big problem I see, as was noted in the discussion concerning the latency of Nvidia's atomic ops, is how long the read-write-read process is for GPUs with their read-only caches. As far as general computation is concerned, rearchitecting how the caches interact would be something Nvidia would be interested in looking at...

It's probably safe to assume that if they're serious about computing, the performance of atomics would have been high on their to-do list. Side question: are the existing caches on GPUs generally useful for non-texture data (not referring to specialized caches like the PTVC)?
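On the atomics point, a minimal CUDA illustration (my own sketch, not anything from an NVIDIA disclosure): a naive counter where every thread issues a global atomicAdd, next to a version that accumulates in shared memory and issues one global atomic per block, which is the usual way to dodge the long round trip today.

// Naive: every matching thread does a global atomic, paying the full
// round-trip latency past the read-only caches each time.
__global__ void count_naive(const int *flags, int *counter, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i])
        atomicAdd(counter, 1);
}

// Staged: accumulate per block in shared memory, then issue a single
// global atomic per block.
__global__ void count_staged(const int *flags, int *counter, int n)
{
    __shared__ int local;
    if (threadIdx.x == 0) local = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i])
        atomicAdd(&local, 1);   // shared-memory atomic (cc >= 1.2)
    __syncthreads();

    if (threadIdx.x == 0 && local)
        atomicAdd(counter, local);
}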
 