Nvidia GT300 core: Speculation

It's funny that Charlie was so busy hailing Nvidia's downfall that the "journalist" in him neither noticed nor questioned those obvious inconsistencies :LOL:

Hadn't you heard? "Obvious Inconsistencies" is actually the name of his next website.
 
http://www.semiaccurate.com/2009/09/01/nvidia-roadmaps-turn/

Easier to post the pic than try to sum it up in words:

[Image: Nvidia roadmap by quarter (NV_quarter_roadmap.jpg)]

There's a second picture that fleshes things out, except it looks horribly confused: it names the GTX280 (not the GTX285) and assigns it 1792MB of memory.

Jawed

Wow. Interesting... That article looks amazingly like the wild guess I made last week in the other topic.... :oops::sly:
 

Yeah, if that post was pure speculation, I guess we found the source for that table. :LOL: (Not saying that Charlie himself constructed it from the post, of course.)
 
Maybe the roadmap is what NVidia used to confuse AMD: this is the roadmap that led AMD to think it would be able to show Evergreen months before NVidia could show anything. Now, things have changed.

Jawed
 
Oh, thanks :)

But there's already a 55nm overclocked GTX280. It's called the GTX285. And 55nm anything will be worthless against Cypress. So he hasn't really explained why either that table or his article make any sense.

I think he means something faster than the GTX285. After all, it was introduced months ago, when Nvidia began shipping GT200b. Since then, the process has probably improved and they should be able to release something 10% faster or so.
 
How about indexable registers in GT300? In G80 and GT200, every register access must be resolvable at compile time. You can use thread-local arrays inside kernels, but any array indexed dynamically is placed in local memory, which is off-chip. This significantly hurts many workloads that need arrays but cannot afford trips to off-chip memory without losing performance.

I am not 100% sure, but I think AMD GPUs already allow indexable registers. If GT300 added indexable registers, that would be a BIG plus for CUDA.

What do you think about possibility of indexable registers in GT300?
 
You are right, timothy, but on GT200 the shared memory space is only a quarter the size of the register file. Real context storage needs registers, as Volkov showed. He got away with statically addressed registers there, but in other situations that is not always possible.

In GT200, the register file is 64KB per SM and there is 16KB of shared memory per SM. If the architecture is broadly GT200-based, I expect GT300 to have at least a 128KB register file and 32KB of shared memory (the minimum mandated by DX11). That already reaches 62.5% of LRB's per-core cache, but with much less flexibility.

But suppose you convert all of that to shared memory: provide very few registers per thread (say 4-5, mostly for storing pointers to each thread's area in shared memory) and store the per-thread context in shared memory instead. Then you gain the flexibility of LRB, in that you can get good throughput even with fewer threads.

IOW, shrink the register file and use shared memory to emulate present-day local memory (which is off-chip today).

This way you get the benefit of fast indexed registers without having to modify the hardware architecture much. After all, the real per-thread registers are still statically addressed.
 