NVIDIA GF100 & Friends speculation

Ooh, interesting, that should mean it can despatch an instruction to each of the 3 SIMDs and a load/store. That seems better than GF100.
Oh I see, if we count the L/S units then it's back to being efficient again.

A quick question: compared to prior and competing architectures, Fermi seems to elaborate more on these L/S units. Were they rudimentary in G80 and GT200/Cypress, or did they simply evolve much better in Fermi?
 
I'm sure there'll be three players in the market before that happens: Intel, AMD and Nvidia, so the loss won't be quite as bad as you'd think. It's karma really. When Nvidia bested 3DFX there looked to be only one big player, but then ATI came along and... ;)

Well, when 3DFX were bested you still had multiple GPU vendors that had previously dominated the PC scene (ATI, S3, Trident, Cirrus Logic, Matrox, etc.), who during that time were bumbling along. We're lucky that out of those ATI managed to get it together and become world-class competition to Nvidia. Matrox and S3 almost but not quite made it. Trident and Cirrus Logic, well, the less said about them the better. :D

There isn't a similar pool of former PC graphics heavyweights with cash to compete anymore. Even former 3D specialists like 3DLabs were still around back then but are no longer present. We're left with basically Matrox (very doubtful they'll spend the cash to be anything other than a niche player) and Intel (who are most likely re-evaluating their entry into the graphics market and whether it's a good idea).

The barrier of entry in cost and technology is very high right now as Intel found out. I'm not sure a 3rd player will have an opportunity until there's a radical shift in graphics rendering perhaps brought on by current rendering technology hitting a wall.

Although, who knows, if Nvidia or ATI manages to put the other out of business and gets so complacent that they don't innovate for a period of many years, that might open up an opportunity.

Regards,
SB
 
Well, when 3DFX were bested you still had multiple GPU vendors that had previously dominated the PC scene (ATI, S3, Trident, Cirrus Logic, Matrox, etc.), who during that time were bumbling along. We're lucky that out of those ATI managed to get it together and become world-class competition to Nvidia. Matrox and S3 almost but not quite made it. Trident and Cirrus Logic, well, the less said about them the better. :D

Yes, the less that's said the better... :p

There isn't a similar pool of former PC graphics heavyweights with cash to compete anymore. Even former 3D specialists like 3DLabs were still around back then but are no longer present. We're left with basically Matrox (very doubtful they'll spend the cash to be anything other than a niche player) and Intel (who are most likely re-evaluating their entry into the graphics market and whether it's a good idea).

The barrier of entry in cost and technology is very high right now as Intel found out. I'm not sure a 3rd player will have an opportunity until there's a radical shift in graphics rendering perhaps brought on by current rendering technology hitting a wall.

I suspect there are a number of Chinese companies looking at the market with more than a little envy. Given the significantly lower cost of labour in China and the number of over-educated, under-employed engineers there, it makes sense that a new entrant could come from either China or perhaps India, which has pretty similar conditions and trained engineers courtesy of various outsourcing projects.

In any case I doubt Intel has abandoned graphics; I suspect they will take a bottom-up approach, given that 'good enough' is probably the best part of the market to be in, especially when your product already makes up the majority of GPUs shipped overall. So the question is how they intend to tackle the mid-range if they have decided that the high end is too much to bother with. Increased environmental regulation and a need to offer better performance/watt overall can only help them make a case in the market.

Although, who knows, if Nvidia or ATI manages to put the other out of business and gets so complacent that they don't innovate for a period of many years, that might open up an opportunity.

Regards,
SB

Well, there's always that third player, Intel, which you cannot ignore. AMD Fusion is reason enough for Intel to have their own reasonably high-performance GPU mated to their CPUs, considering AMD is their major competition.
 
Although, who knows, if Nvidia or ATI manages to put the other out of business and gets so complacent that they don't innovate for a period of many years, that might open up an opportunity.
Well, I don't think that would happen. What I think would happen is that one IHV may be forced to target their products solely for the lower end of the market for a while. An IHV could last a very long time even without a top-performing flagship product, as long as their low-mid range products are decent value and have features that differentiate them from the competition. They may not do great, but they could definitely survive until they can get back on their feet.

The only other option, really, would be gross mismanagement. Which does occasionally happen.
 
The only other option, really, would be gross mismanagement. Which does occasionally happen.

Yeah, like 3dfx. Although ATI is further handicapped by having to hope the CPU side can continue to do well enough against Intel that it doesn't sink the business. On the other hand it's in Intel's best interest to keep AMD around, which is why there isn't significant pressure on AMD CPU chips in the sub-100 USD segment.

Regards,
SB
 
On the other hand it's in Intel's best interest to keep AMD around, which is why there isn't significant pressure on AMD CPU chips in the sub-100 USD segment.

Regards,
SB
I don't understand this point. Why do you think keeping AMD around is in Intel's best interest?

In any event, AMD has been around making x86 CPUs ever since IBM first picked up Intel's processors. I'd be willing to bet they'll stick around for a while longer.
 
Did you guys read Anand's piece?
They say that increasing the core count per SM resulted in the cores being utilized in a superscalar fashion, where the driver and the dispatch logic share roles in distributing the code.

This could mean that in a worst-case scenario a GTX 460 could perform like a 256-core part.

Also (to Jawed): register file size didn't increase.
 
Strange that no one is testing tessellation benchmarks. Only the Chinese sites had some strange Heaven benchmark runs with just the GTX 460 against the 5770 and 5830, and the final scores were quite close.
Any tessellation tests against the GTX 470 and GTX 480? :?:
 
I don't understand this point. Why do you think keeping AMD around is in Intel's best interest?

In any event, AMD has been around making x86 CPUs ever since IBM first picked up Intel's processors. I'd be willing to bet they'll stick around for a while longer.

Because they'd have all the anti-monopoly agencies (or whatever they're called) hanging them by their balls if AMD went under for any reason.
 
Looks like this multiprocessor setup will be the "tryout" template for a 28nm shrink of a retrofitted GF100 design -- 768 SPs, 128 TMUs, etc.
 
A quick question: compared to prior and competing architectures, Fermi seems to elaborate more on these L/S units. Were they rudimentary in G80 and GT200/Cypress, or did they simply evolve much better in Fermi?
R600 has load/store. NVidia GPUs earlier than Fermi issue non-texture loads and non-graphics stores as instructions with memory-addressed operands/resultants.

The scheduling of load/store in Fermi basically means it's more like texture fetch or pixel export: latency can be hidden and it doesn't occupy operand collector capacity. Instead load/store act directly upon the register file. Older NVidia GPUs could hide the latency, but it impinged on the operand collector.
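To make that concrete, here's a minimal CUDA sketch (my own, not from any post here; the kernel name and the scale-by-k operation are invented for illustration). The global load below is handed to the SM's load/store units and its result is written straight into the register file, so the scheduler can keep other warps' ALU work flowing while the fetch is in flight, rather than the load tying up the operand collector as on pre-Fermi parts.

Code:
// Hypothetical example: a plain global load, dependent math, then a global store.
// On Fermi both memory operations go through the SM's load/store units and the
// fetched value lands directly in the register file; latency is hidden by
// switching to other warps in the meantime.
__global__ void scale_copy(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];   // global load, issued to the L/S units
        out[i] = v * k;    // dependent multiply, followed by a global store
    }
}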
 
Did you guys read Anand's piece?
They say that increasing the core count per SM resulted in the cores being utilized in a superscalar fashion, where the driver and the dispatch logic share roles in distributing the code.
The GF100 whitepapers actually make a virtue out of the fact that superscalar issue is not required :rolleyes:

whitepaper said:
Because warps execute independently, GF100’s scheduler does not need to check for dependencies from within the instruction stream. Using this elegant model of dual-issue, GF100 achieves near peak hardware performance.

Basically it means the scheduling hardware gets even more complex. Which is why FLOPS/mm² is still appalling.
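As a toy illustration of what that extra scheduling hardware has to find (my own sketch, not from the whitepaper; the kernel and array names are made up), here are two statements in one thread that share no operands, so neither result feeds the other. Only if the dispatcher spots that independence can it co-issue them from the same warp.

Code:
// Hypothetical kernel with ILP inside a single warp's instruction stream:
// the two MAD lines are independent, so GF104-style superscalar dispatch
// could in principle issue them together. Chain them instead (feed c[i]
// into the second line) and that extra issue slot is wasted.
__global__ void two_independent_mads(const float *a, const float *b,
                                     float *c, float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] * 2.0f + 1.0f;   // MAD #1
        d[i] = b[i] * 3.0f + 4.0f;   // MAD #2, shares no operands with #1
    }
}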

This could mean that in a worst-case scenario a GTX 460 could perform like a 256-core part.

Also (to Jawed): register file size didn't increase.
Not surprising. But I do expect register file bandwidth to be a real bottleneck and yet another reason why "performance will be more like a 256-core part". We'll see.

Jawed
 
Not surprising. But I do expect register file bandwidth to be a real bottleneck and yet another reason why "performance will be more like a 256-core part". We'll see.

Jawed

Do we have any examples where AMD's ILP comes back to bite them, and where we can expect the same for NV?
 
Do we have any examples where AMD's ILP comes back to bite them, and where we can expect the same for NV?
Something simple like issuing MAD to all three SIMDs where there's no operand shared by any of the 3 instructions.

Or all of that, plus a store.

In ATI it's not possible to issue 15 distinct operands in one instruction (consisting of 5 MADs). Only 12 distinct operands are available.

In GF104 it appears that only 2 warps can issue at a time. Each warp can then issue up to 2 instructions. So we're talking about warp A issuing a MAD with operands 1, 2 and 3, warp A issuing a MAD with operands 4, 5 and 6, warp B issuing a MAD with operands 7, 8 and 9, and warp B issuing a store with operand 10.

So the question is, is there enough register file bandwidth to support issue with those 10 operands?
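Just to spell out the tally in that example (nothing new, only restating the operands listed above): warp A's two MADs need 3 + 3 source reads, warp B's MAD needs another 3 and its store needs 1 for the data (ignoring addressing), so the register file has to deliver 3 + 3 + 3 + 1 = 10 distinct operands in the same issue window to sustain that peak.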

Jawed
 
GF104 has the same cache structure as GF100, you were right. ;)

Yes, because the amount of registers and L1 cache went up along with the CC count per SM... :rolleyes:

Oops!

Finally, we have the ROPs. There haven’t been any significant changes here, but the ROP count does affect compute performance by impacting memory bandwidth and L2 cache. Even though NVIDIA keeps the same number of SMs on both the 1GB and 768MB versions of the GTX 460, the latter will have less L2 cache, which may impact compute performance. Compute performance on the GTX 460 may also be impacted by pressure on the registers and L1 cache: NVIDIA increased the number of CUDA cores per SM, but not the size of the Register File or the amount of L1 cache/shared memory, so there are now additional CUDA cores fighting for the same resources. In worst-case scenarios, this can hurt the efficiency of GF104 compared to GF100.
 
Every SM has 64KB of L1 cache / shared memory and every ROP partition has 128KB of L2 cache.
Looks like the exact same structure.

SM / L1 numbers stayed the same, but the density inside the SM increased, leaving you with fewer registers and less L1 per CUDA core; see Jawed's description above and the rough per-core tally below.

GF100: 32 CC per SM, 16 load/store units, 64KB L1/shared
GF104: 48 CC per SM, 16 load/store units, 64KB L1/shared
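A rough per-core tally (my arithmetic, assuming the commonly quoted 128KB, i.e. 32K x 32-bit, register file per SM for both chips): GF100 works out to 64KB / 32 = 2KB of L1/shared and 128KB / 32 = 4KB of registers per CUDA core, while GF104 gets 64KB / 48 ≈ 1.33KB of L1/shared and 128KB / 48 ≈ 2.67KB of registers per core, i.e. a third less of each per core.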
 