Nvidia GT300 core: Speculation

Yeah, well - it's kinda easy and nice (PR-wise) to ride the open-standards wave.

I wonder why AMD hasn't bought all the rights to a proprietary technology with about 30% market share and then put it into the public domain, making it accessible to everyone via Open Standards(tm). Havok*, for example?
Because effectively that's exactly what those statements say about Nvidia and PhysX.

They bought Ageia and PhysX (an asset), invested (I don't know how much) in its further development and (I'm inclined to think) a whole lot more in its marketing, and now they want to earn money with it. I wonder why people think that's somehow unethical for a company.

What's so damn immoral about advancing technology and trying to earn money with it at the same time, if the alternative seems to be waiting for Godot?

How many people bought into the hype and purchased cards based on it? What was delivered? How far would all this have progressed had they not tried to monopolize game physics and simply worked towards an open standard? How was PhysX good for anyone, except possibly Nvidia?

Edit: Didn't read to the end of the thread!
 
Code:
<14:30:54>      AlexV - did you see the gt300 leak?
<14:31:38>        rys - Nope
<14:32:53>      AlexV - http://golubev.com/about_cpu_and_gpu_2_en.htm
<14:32:57>      AlexV - bottom of page
<14:34:06>        rys - It's not right
<14:34:30>      AlexV - well that takes care of it

Given that it's silly season around here anyway, I shudder to think what it's going to be like in a few weeks! :p

You know, Rys... I haven't seen you on IM in ages. :(
 
A few weeks? What's going to change in a few weeks? 2H 2010 might be Charlie FUD, but what are the chances that AMD would call their shot and tell everyone they were going to be first with only a six-week lead? Of all the information available, I still find AMD's 100% certainty of being first the strongest reason not to expect anything before Christmas (product-wise, at any rate).
 
He's referring to his own speculation here.

Nvidia was always aiming for a launch close to Black Friday, the day after Thanksgiving, which falls on November 27th this year. It's important to launch the product before that date, since most of the holiday shopping is done around then.
 
Trinibwoy... it doesn't, but when you're looking four months into the future and a lot can still go wrong, it's a rather small margin of error.
 
Sure, but Rys isn't necessarily referring to GT300 showing up at Newegg. There are lots of other tidbits that could leak or be released before that.
 
Regarding the MIMD rumour, it's quite unlikely that GT300 has an instruction decoder and scheduler (+ I$?) per SP. Perhaps it refers to being able to get some sort of task parallelism while running in compute mode.
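
FWIW, "task parallelism in compute mode" could mean something as mundane as letting independent kernels share the chip. Here's a minimal CUDA sketch of what that would look like from the software side, using the stream API as it exists today (kernel names and sizes are made up, and whether the hardware could actually overlap the two launches is exactly what's being speculated):
Code:
#include <cstdio>
#include <cuda_runtime.h>

// Two trivially independent kernels. On current parts they run back to
// back; a task-parallel GT300 could in principle run them side by side.
__global__ void scaleA(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

__global__ void offsetB(float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc((void **)&a, n * sizeof(float));
    cudaMalloc((void **)&b, n * sizeof(float));

    // Separate streams tell the driver the two launches are independent.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    scaleA <<<n / 256, 256, 0, s0>>>(a, n);
    offsetB<<<n / 256, 256, 0, s1>>>(b, n);

    cudaThreadSynchronize();  // wait for both streams
    printf("done\n");

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}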
 
Ever since I read that dynamic warp formation paper, whenever I hear about MIMD on GPUs I assume it's still SIMD hardware that just looks like MIMD from the running program's point of view. I've never really understood why the data sent to a SIMD has to come from the same warp. The SIMD doesn't care, does it?
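
In CUDA terms, the case that paper targets looks like the kernel below (ordinary code, the predicate is just for illustration). Lanes of one warp disagree on the branch, so SIMD hardware runs both sides serially with lanes masked off; dynamic warp formation would instead regroup lanes from different warps that agree on the branch:
Code:
__global__ void divergent(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // If the lanes of a warp split across this branch, today's hardware
    // executes sqrtf for some lanes, then the else-path for the rest.
    // Regrouped "MIMD-looking" warps would keep every SIMD lane busy.
    if (in[i] > 0.0f)
        out[i] = sqrtf(in[i]);
    else
        out[i] = 0.0f;
}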

Could they practically extend GT200's scoreboarding mechanism to simply collect bundles of "ready" threads and their associated operands from any and all running warps? It would probably require some sort of operand buffering and trickier prioritization, but it doesn't sound much different from what they're doing now anyway.
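
As a toy illustration of that bundle-collecting idea (plain C++, every name invented, and certainly not how NV actually does it): bucket the ready lanes by PC across all warps, and each bucket can then issue under a single decode.
Code:
#include <cstdio>
#include <map>
#include <vector>

// Toy model: a "ready" lane is a (warp, pc) pair whose operands have landed.
struct Lane { int warp; int pc; };
static const int SIMD_WIDTH = 8;

int main() {
    // Ready lanes from three warps, parked at two PCs after a branch.
    Lane init[] = { {0, 0x10}, {0, 0x20}, {1, 0x10}, {1, 0x10},
                    {2, 0x20}, {2, 0x10}, {0, 0x10}, {1, 0x20} };
    std::vector<Lane> ready(init, init + 8);

    // Bucket by PC: one decode covers a whole bucket, whichever warps
    // the lanes came from. Mixing *different* instructions in a bundle
    // would instead cost one decode per distinct instruction.
    std::map<int, std::vector<Lane> > byPc;
    for (size_t i = 0; i < ready.size(); ++i)
        byPc[ready[i].pc].push_back(ready[i]);

    for (std::map<int, std::vector<Lane> >::iterator it = byPc.begin();
         it != byPc.end(); ++it) {
        int lanes = (int)it->second.size();
        if (lanes > SIMD_WIDTH) lanes = SIMD_WIDTH;  // one SIMD issue/cycle
        printf("bundle @ pc 0x%x: %d/%d lanes, 1 decode\n",
               (unsigned)it->first, lanes, SIMD_WIDTH);
    }
    return 0;
}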
 
There are two different arguments being mixed here. One is being able to re-converge your threads and pack into a warp only (or mostly) threads that share the same IP. The other is being able to schedule instructions from divergent control flow, or even from different programs, into the same warp. The latter is far more complex (and requires more instruction decoders).
 
BTW, what actually gets accumulated into warps at the moment? Are only vertices/fragments of a single draw call combined, or does it look at what changes between draw calls?
 
There are two different arguments being mixed here. One is being able to re-converge your threads and pack into a warp only (or mostly) threads that share the same IP. The other is being able to schedule instructions from divergent control flow, or even from different programs, into the same warp. The latter is far more complex (and requires more instruction decoders).

Could you expand on that a bit? Why is packing by PC easier than packing by instruction?
 
Could you expand on that a bit? Why is packing by PC easier than packing by instruction?

Packing by PC *is* packing by instruction (not the other way around though :) ).
What's harder is packing different instructions into the same warp, since you need to improve, among other things, your instruction decoding rate.
 
What's harder is packing different instructions into the same warp, since you need to improve, among other things, your instruction decoding rate.

I'd be concerned about operand fetching as well.

Does this get easier if the instruction set is simplified? I don't recall the exact details, but the instruction set already seems pretty sparse, with MADD being the odd outlier and the operand types (int16 vs. int32 vs. float vs. double?) contributing. Do we gain much by splitting the MADD? At some cost to operand bandwidth, one could get greater use out of the two math units, and it is a simplification (no more triple operand fetches, fewer instructions to support).
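
To make the operand-count point concrete, a hedged CUDA sketch (the __fmul_rn/__fadd_rn intrinsics are real and documented never to be contracted into a MAD; everything else is back-of-envelope, not vendor data). The fused form is one instruction reading three source operands; the split form is two instructions of two operands each, trading issue slots for narrower operand fetch:
Code:
__global__ void madForms(float *out, const float *a,
                         const float *b, const float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The compiler contracts this into a single MAD:
    // one instruction, three source operands fetched together.
    float fused = a[i] * b[i] + c[i];

    // These intrinsics are never contracted, so this stays two
    // instructions with two source operands each -- twice the issue
    // slots, but no triple operand fetch (and a second rounding step).
    float t     = __fmul_rn(a[i], b[i]);
    float split = __fadd_rn(t, c[i]);

    out[i] = fused + split;  // keep both results live
}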
 
Packing by PC *is* packing by instruction (not the other way around though :) ).
What's harder is to pack different instructions in the same warp, as you need to improve, among other things, your instructions decoding rate.

No, I'm saying to pack the same instruction, but not necessarily the same PC. A PC specifies not only an instruction but an instruction at a specific point in the program. I'm not sure why you'd need a faster decoding rate if you're packing by decoded instruction. There'd be more latency between decode and execution, but that shouldn't matter.

Edit: I'm assuming here that instructions and operand addresses are kept separate. If that's not the case then ignore me :)
 
No, I'm saying to pack the same instruction, but not necessarily the same PC. A PC specifies not only an instruction but an instruction at a specific point in the program. I'm not sure why you'd need a faster decoding rate if you're packing by decoded instruction. There'd be more latency between decode and execution, but that shouldn't matter.
You're assuming you can decode more instructions without stalling execution, but that only works if you increase the instruction decoding rate (unless you also want to assume that current parts are unbalanced...)
 
Ok, now I understand what you're saying. :oops: But that investment could be amortized over wider SIMDs or something (which they probably need to do regardless to avoid AMD completely running away with flops/mm).
 
Ok, now I understand what you're saying. :oops: But that investment could be amortized over wider SIMDs or something (which they probably need to do regardless to avoid AMD completely running away with flops/mm).
Well, if you make it wider, then you need even more decoders to be able to fill a bigger warp with useful work to do :)
 