FrameBuffer
Banned
If they turn out to be true, I hope Charlie removes all his negativity about the card from here and his site, as he will have to eat major crow.
And if not, are we to assume you'll do the same?
But it would be quite a waste of resources. Remember that all units (in both subblocks) contain extended multiplier capabilities anyway. Why would they add that to a unit in the SP-only subblock if it is wasted for DP, which needs an even larger multiplier (and the size of a multiplier scales roughly with the square of the number of bits to multiply)? They could have solved this by issuing 32-bit integer multiplies only to the DP subblock. That would still be an enormous increase in throughput and actually sensible, as 32-bit integer multiplies are not that important anyway (compare with the integer multiplier capabilities of CPUs).

For some reason, Nvidia doesn't seem to want, or isn't able, to go the AMD way. Their chips tend to be quite big anyway, so making the basic units even bigger for coupling is maybe a worse choice with respect to their basic architecture than if they had also gone for a 5-way VLIW.
Plus, it would go completely against their traditional strategy: first implement a feature as an experiment, then make it useful, and only after that go fully along that route.
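To put rough numbers on the multiplier-size argument above, here is a minimal sketch. It assumes a pure quadratic area model (the "roughly with the square" rule of thumb from the post, not a measured figure for any Nvidia design) and uses the IEEE 754 significand widths of 24 bits for single precision and 53 bits for double precision. The function name `relative_multiplier_area` is hypothetical.

```python
# Toy estimate: multiplier area grows roughly with the square of the operand width.
# This is a rule-of-thumb model only, not data from any real GPU.

def relative_multiplier_area(bits: int, baseline_bits: int = 24) -> float:
    """Area relative to a baseline-width multiplier under an assumed n^2 scaling."""
    if bits <= 0 or baseline_bits <= 0:
        raise ValueError("bit widths must be positive")
    return (bits / baseline_bits) ** 2

# Operand widths: IEEE 754 single significand, 32-bit integer, IEEE 754 double significand.
WIDTHS = {"fp32 significand": 24, "int32": 32, "fp64 significand": 53}

if __name__ == "__main__":
    for name, bits in WIDTHS.items():
        ratio = relative_multiplier_area(bits)
        print(f"{name}: {bits} bits -> {ratio:.2f}x the fp32 multiplier")
```

Under this model a full 32-bit integer multiplier costs about 1.8x an SP multiplier and a DP multiplier about 4.9x, which is why the post suggests routing 32-bit integer multiplies to the units that already carry the wide multiplier.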
> I don't think that's accurate. At least one patent highlights the ability of the dispatcher (running at base clock) to issue instructions to the SFU and ALU pipelines in alternate cycles, even in G80 class hardware. Each ALU instruction runs for 4 hot-clocks or 2 base-clocks which provides a window to do so. The SFU and ALU pipelines presumably have dedicated operand collectors to support this as well.

Yes, that's right for the older chips. Do we know if Fermi can still do this? ALU instructions now only run for 2 (hot) clocks. Of course, even if the chip can't do it, it would still be able to co-issue at least one fp op from the other scheduler.
Right, the instructions are issued (they run much longer) over two clocks (one half warp per clock) now, instead of a half warp every two (hot) clocks. But the same was true in a sense for G80/GT200 too. There the scheduler only alternated between the ALUs and SFUs; now the schedulers can issue all types of instructions every two cycles.
GF100's maximum load temperature is 55 C.
OMG, if that one is true I will eat my Razer hat!
Hey, if those benches are right and NV's yields and die size are as rumored, it means an excellent new top-end part and 5870s for $299, so what's the matter with that?
http://www.pcgameshardware.com/aid,...DX-11-Update-Radeon-HD-5970-results/Practice/
Dirt2 performance; now if a GF100 can yield 148 fps in 1920*1200 with 4x Supersampling and 16xAF then of course pigs can fly
oops, good catch there for "run for two cycles"...
> Not saying it's true, but as we all discussed before, some loads favor one architecture over the other. In GRID for example, RV770s mopped the floor with GT200s and yet GT200s were faster in almost everything else. Why can't this be a similar case for GF100 vs RV870 (if true)?

I don't know, but he corrected it later, saying that Dirt 2 was MSAA, but AVP3 was SSAA!
> "The SFU pipeline is decoupled from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied."

No, that's not the reason. I thought the decoupling meant it could sort of self-schedule for handling the full warp in 8 clocks, but it still needed "normal" instruction issue first (hence only on the first (two) clocks nothing else could be issued). If that's fully decoupled from the dispatch unit, that's kind of weird. Would SFU instructions have a completely separate scheduler? That also doesn't really fit with the other quotes about dual-issue.
That pretty explicitly says that the dispatch unit can issue more instructions while the SFU is running. Why do you take it to mean the opposite? Are you taking "dispatch unit" and "scheduler" to be two different things? I believe they're one and the same.
What trinibwoy said.
I think I still don't quite get how the dual warp schedulers work.
The Fermi whitepaper states that "Fermi’s dual warp scheduler selects two warps, and issues one instruction from each warp to a group of sixteen cores, sixteen load/store units, or four SFUs." (note the "or") and "Most instructions can be dual issued; two integer instructions, two floating instructions, or a mix of integer, floating point, load, store, and SFU instructions can be issued concurrently".
And also (for SFU) "a warp executes over eight clocks. The SFU pipeline is decoupled from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied."
That's confusing to me. The latter somehow seems to suggest that after an instruction is issued to the SFU it'll just run for eight clocks without any further dispatch needed, but in turn it's not possible to issue anything else at the same time (from one scheduler).
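The clock counts quoted above follow from simple arithmetic: a 32-thread warp spread across N lanes occupies them for 32 / N clocks. A minimal sketch, assuming the lane counts from the whitepaper quote (sixteen cores, sixteen load/store units, four SFUs) and eight ALU lanes per multiprocessor for G80-class hardware. The helper name `clocks_per_warp` is hypothetical.

```python
# Clocks needed to push one warp through a group of execution lanes.
WARP_SIZE = 32  # threads per warp

def clocks_per_warp(lanes: int, warp_size: int = WARP_SIZE) -> int:
    """Clocks a lane group is occupied by one warp, assuming one thread per lane per clock."""
    if lanes <= 0 or warp_size % lanes != 0:
        raise ValueError("lane count must evenly divide the warp size")
    return warp_size // lanes

# Lane counts: Fermi figures come from the whitepaper quote; the G80 figure is an assumption.
UNIT_GROUPS = {
    "Fermi cores (16)": 16,
    "Fermi load/store units (16)": 16,
    "Fermi SFUs (4)": 4,
    "G80 ALUs (8, hot clocks)": 8,
}

if __name__ == "__main__":
    for name, lanes in UNIT_GROUPS.items():
        print(name, "->", clocks_per_warp(lanes), "clocks per warp")
```

This reproduces the "two clocks" for ALU instructions and "eight clocks" for SFU instructions on Fermi, and the 4 hot clocks mentioned for the older chips. It says nothing about whether the scheduler is blocked during those clocks, which is the actual point of confusion.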
> Are you sure you're not over-complicating it? "Decoupled" simply means that the schedulers can issue an instruction and then go about their business issuing other instructions without blocking on the completion of the first one. It doesn't imply that there's a dedicated SFU scheduler.

Yes, OK, but then it's unrelated to dual issue.
> In terms of "self-scheduling", I'm not sure that's the right term. In past architectures the ALUs would process a warp over 2 base-clocks but the instruction itself is issued in only the first. It's the operands that are fed over 2 clocks, not instruction issue. Depending on operand collector bandwidth all the operands for a Fermi SFU instruction may be provided in the first clock as well or fed in over 2 or 4 clocks.

Ah, OK, I thought that's how it handled SFU issue: first clock for ALU, second for SFU.
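The "decoupled" reading discussed above can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not a description of the real hardware: a single in-order scheduler, one issue slot every 2 clocks, ALU instructions occupying their units for 2 clocks and SFU instructions for 8 clocks. The names `run`, `ISSUE_INTERVAL` and `OCCUPANCY` are hypothetical.

```python
# Toy model: does the scheduler block until an instruction completes, or only
# until its next issue slot? All timings below are assumptions for illustration.
ISSUE_INTERVAL = 2                 # assumed clocks between issue slots of one scheduler
OCCUPANCY = {"alu": 2, "sfu": 8}   # assumed clocks a unit group is busy per warp instruction

def run(stream, decoupled):
    """Return the clock at which the last instruction in `stream` finishes."""
    clock = 0                                   # next clock the scheduler may issue
    unit_free = {unit: 0 for unit in OCCUPANCY} # clock at which each unit group is free
    finish = 0
    for unit in stream:
        start = max(clock, unit_free[unit])     # wait for the scheduler and the unit
        done = start + OCCUPANCY[unit]
        unit_free[unit] = done
        finish = max(finish, done)
        # Decoupled: the scheduler moves on at its next issue slot.
        # Coupled: the scheduler waits for the instruction to complete.
        clock = start + ISSUE_INTERVAL if decoupled else done
    return finish

if __name__ == "__main__":
    stream = ["sfu", "alu", "alu", "alu"]
    print("decoupled:", run(stream, decoupled=True))   # 8
    print("coupled:  ", run(stream, decoupled=False))  # 14
```

In this model the three ALU instructions are hidden entirely behind the 8-clock SFU instruction when issue is decoupled, which matches the reading that the scheduler simply does not block on SFU completion and needs no dedicated SFU scheduler.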