NVIDIA GF100 & Friends speculation

by this "nVidias "AUSUM equivalent"" I'm not saying that it'll have a switch or anything, but just regular OC by the end user, but who knows maybe it'll have a switch...Also I'm not saying that the OC headroom will be awesome, if the stock voltages are low, but once the voltages are cranked up to regular GTX 5xx levels, it should fly.
Well yes, the chips should be able to clock higher, obviously. However, I'm not sure that'll be very practical. As long as you only increase clocks it should be fine, but there's probably not that much headroom. If you increase voltage and clocks to GTX 580 levels, you're looking at 600W (unthrottled) Furmark power consumption, and even in games you'll reach about 500W. I don't know if the card is built to really handle that (VRMs) or just goes up in smoke, but even if it is, you'll almost certainly need ear protection (not that this would be any different with an OC'd HD 6990). Sure, if all you care about is the highest 3DMark score then that doesn't matter, but otherwise you'll probably need to invest in water cooling as well.
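Just to put a very rough number on that: to first order, power scales linearly with clock and quadratically with voltage. A minimal back-of-the-envelope sketch - the 365W board power and 607MHz core clock are the stock GTX 590 figures and 772MHz is the GTX 580 clock, but the voltages (0.91V -> 1.00V) are hypothetical round numbers picked purely for illustration:

```c
/* Back-of-the-envelope power scaling: P ~ f * V^2.
 * 365W / 607MHz are the stock GTX 590 board power and core clock,
 * 772MHz is the GTX 580 core clock; the voltages are hypothetical
 * round numbers used only for illustration. */
#include <stdio.h>

int main(void)
{
    double p_stock = 365.0;                  /* W,   GTX 590 board power      */
    double f_stock = 607.0, f_oc = 772.0;    /* MHz, GTX 590 -> GTX 580 clock */
    double v_stock = 0.91,  v_oc = 1.00;     /* V,   assumed core voltages    */

    double p_oc = p_stock * (f_oc / f_stock) * (v_oc / v_stock) * (v_oc / v_stock);
    printf("estimated overclocked board power: %.0f W\n", p_oc);  /* ~560 W */
    return 0;
}
```

That lands around 560W, which is in the same ballpark as the 600W unthrottled Furmark figure above, given that Furmark pushes well past typical gaming power to begin with.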
 
So nvidia has done it - an asymmetric memory configuration (though with 6 chips, not the 4 or 8 I called BS on).
I'm really interested in how the memory management is actually done - at the very least it seems to work well enough, as I didn't see any obvious anomalies in the benchmarks at first sight. The performance increase of the 550 over a 450 seems SLIGHTLY above what you could expect from the clock increase alone (roughly 15%): not really without AA, but with AA it's something like an additional 5% faster. So I guess that's ok (I never really expected it to make much of a difference, as the GTS 450 wasn't much constrained by bandwidth to begin with).
The (load) power consumption isn't that good, and despite nvidia apparently trying really hard to beat the HD 5770 (high clocks/voltage/power consumption), it fails to really distance itself from it - at a (for now) much higher price.
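(For reference, going by the reference clocks: the GTS 450 runs at 783MHz core / 3608MHz memory and the GTX 550 Ti at 900MHz / 4104MHz, i.e. roughly a 15% core and 14% memory clock bump - that's where the "roughly 15%" above comes from.)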
 
I wouldn't be surprised if the card could handle 500W, and with perhaps hand-picked chips it should perform pretty well. If all nVidia cared about was a practical card instead of a halo performance card, they either wouldn't release this at all or would release a more sensible 2x560 card, but yeah, one probably doesn't build a silent rig around a GTX 590 :)
 
Some Fermi-based cards' bandwidth numbers, measured with the CUDA bandwidth test:

GTX 460   @ 675/3600MHz:  58952.9 MB/s
GTX 460   @ 950/4600MHz:  78165.6 MB/s
GTX 560Ti @ 880/4000MHz:  82228.0 MB/s
GTX 570   @ 732/3800MHz: 119327.5 MB/s
GTX 580   @ 823/4276MHz: 149636.8 MB/s

Any comments on why the GTX 460 numbers are half of what they should be??
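(For anyone wondering what that test actually measures, here's roughly what a device-to-device bandwidth measurement along the lines of NVIDIA's bandwidthTest sample boils down to. This is just a minimal sketch, not the actual sample code; the buffer size, repeat count and the read+write doubling convention are my assumptions.)

```c
/* Minimal sketch of a device-to-device bandwidth measurement, roughly
 * what a "CUDA bandwidth test" boils down to (not the actual NVIDIA
 * sample; buffer size and repeat count are arbitrary, and error
 * checking is omitted for brevity). Compile with nvcc. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t size = 64 << 20;   /* 64 MiB test buffer */
    const int    reps = 100;
    void *src, *dst;
    cudaMalloc(&src, size);
    cudaMalloc(&dst, size);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; i++)
        cudaMemcpy(dst, src, size, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    /* Each copied byte is both read and written, so device-to-device
     * figures are conventionally reported as 2 * bytes / time. */
    double mbps = 2.0 * (double)size * reps / (ms / 1000.0) / (1024.0 * 1024.0);
    printf("device-to-device bandwidth: %.1f MB/s\n", mbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```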
 
Hm, after reading some 550 Ti reviews, it is now safe to say: GF106/116 is still the worst part of the line-up in perf./mm², and at current prices in perf./$ as well.
Not that it was unexpected...

Any comments on why the GTX 460 numbers are half of what they should be??
Maybe they tested the 768 MB model?
Edit: or are these your own numbers? In that case it might simply be a bug, maybe even the same bug as with the 550 Ti (mentioned in the 550 review on AnandTech).
 
I have to say NV got the price totally wrong for the GTX 550. If they had launched it at the same price as the GTS 450, i.e. $129, it might have been acceptable. But when a Radeon 5770 sells for $110 and even the GTS 450 sells for under $100, how did they think they'd get away with $149? :???: Might as well get a Radeon 6850 for just over $160 and get way higher performance.
 
KFA2 plans dual GTX 560 Ti graphics card
The keen-eyed among you may notice that the prototype board pictured below sports a pair of GF104 GPUs, similar to those found on GeForce GTX 460 cards. However, KFA2 claims that this is because it originally developed the card for the GF104 chip. The company says it's since decided to upgrade the card's GPUs to the GF114 chip found on new GeForce GTX 560 Ti cards.
http://www.bit-tech.net/news/hardware/2011/03/15/kfa2-plan-gtx-560ti-sli-on-a-single-pcb/1
 
So, we saw that the 590 has at least 3 DVI connectors; do you guys think they're going to try and beat AMD at 5x1 or 2x3 setups? (quad-SLI)
 
Anand's GTX 550 review:
GF116 has 3 64-bit memory controllers, each of which is attached to a pair of GDDR5 chips running in 32bit mode ... The best case scenario is always going to be that the entire 192-bit bus is in use, giving the card 98.5GB/sec of memory bandwidth (192bit * 4104MHz / 8), meanwhile the worst case scenario is that only 1 64-bit memory controller is in use, reducing memory bandwidth to a much more modest 32.8GB/sec.
Why would that happen? The way I understand it, there are two 64-bit controllers with 1Gb chips and one 64-bit controller with 2Gb chips, so the reduced bandwidth could only occur on the last controller. Why would it happen across the entire array?
 
Technically "reduced bandwidth" could happen on any configuration, not just in this case. There is a "hashing" algorithm to make sure data is spreading over all memory controllers. So in some very rare situation, it's possible that all data you want is in a single memory chip so the bandwidth will be reduced. The larger memory chip only makes this a bit more likely to happen.
 
That may be, but what happens once you have equal amounts in all 3 pools, yet there's still excess capacity in the 3rd pool with the 2Gb chips?

So is it just that anything above 768 MB is limited to 32.8 GB/sec?

Regards,
SB
 
Generally it's not assigned this way. It's more likely to be interleaved. For example, you can assign bytes 0 ~ 15 to the first memory chip, 16 ~ 31 to the second, 32 ~ 47 to the third, 48 ~ 63 to the fourth, and so on.

In the case where one memory controller has a larger memory chip, you'll have to assign two interleaved sections to that controller: for example, bytes 0 ~ 15 to the first, 16 ~ 31 to the second, and 32 ~ 63 to the third (and there's no fourth memory controller).

If your access pattern is linear (sequential) or truly randomized, it should be very unlikely that all your memory accesses hit the third memory controller.
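Just to illustrate the kind of mapping you're describing (purely a hypothetical sketch, not NVIDIA's actual hashing scheme): with a 64-byte stripe split 16/16/32, a linear walk hits the three controllers in a 1:1:2 ratio.

```c
/* Hypothetical sketch of the uneven interleaving described above
 * (NOT NVIDIA's actual hashing scheme): a 64-byte stripe is split so
 * controllers 0 and 1 each serve 16 bytes and controller 2, the one
 * with the larger chips, serves 32 bytes. */
#include <stdio.h>
#include <stdint.h>

static int controller_for(uint64_t addr)
{
    uint64_t off = addr % 64;   /* position within a 64-byte stripe */
    if (off < 16) return 0;     /* bytes  0..15 -> controller 0 */
    if (off < 32) return 1;     /* bytes 16..31 -> controller 1 */
    return 2;                   /* bytes 32..63 -> controller 2 */
}

int main(void)
{
    /* A linear walk spreads accesses 1:1:2 across the three controllers,
     * which is exactly the imbalance discussed in the follow-up posts. */
    int hits[3] = {0, 0, 0};
    for (uint64_t a = 0; a < 4096; a++)
        hits[controller_for(a)]++;
    printf("controller hits: %d / %d / %d\n", hits[0], hits[1], hits[2]);
    return 0;
}
```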
 
So basically you never attain the theoretical 98.5 GB/s transfer rate, but you also don't drop down to the worst-case 32.8 GB/s?

I'm assuming that with half of all data going to 1 memory controller, it could be a rather significant bottleneck.

Regards,
SB
 
In the case where one memory controller has a larger memory chip, you'll have to assign two interleaved sections to that controller: for example, bytes 0 ~ 15 to the first, 16 ~ 31 to the second, and 32 ~ 63 to the third (and there's no fourth memory controller).
If you do it that way for all the memory, however, you'd effectively have a 128-bit interface (half the data goes to the 3rd controller).
Given anandtech's finding, I think it's likely it's done similarly to how it's done with CPUs: memory is interleaved evenly over all channels for the first 768MB, and so the rest really only has a 64-bit interface (see the sketch below).
I think this isn't really that bad, for two reasons:
a) the memory above 768MB is mostly there for marketing reasons anyway.
b) even if it is used in the rare game with high resolution (and AA), it should be possible to put low-bandwidth stuff there (compressed textures maybe), not really hampering performance that much (at the very least it's still much faster than swapping things out to main memory).
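Here's a sketch of that split layout, to contrast it with the even interleaving above (again purely hypothetical granularity and layout, not how the hardware actually hashes addresses):

```c
/* Hypothetical "CPU-style" split mapping for a 1GB / 192-bit card:
 * the first 768MB are striped evenly over all three 64-bit
 * controllers, the remaining 256MB live only on controller 2 (the
 * one with the larger 2Gb chips). Stripe size is made up. */
#include <stdio.h>
#include <stdint.h>

#define STRIPE 32u                        /* bytes per controller per stripe      */
#define SPLIT  (768u * 1024u * 1024u)     /* end of the evenly interleaved region */

static int controller_for(uint64_t addr)
{
    if (addr < SPLIT)
        return (int)((addr / STRIPE) % 3);   /* full 192-bit bandwidth here */
    return 2;                                /* 64-bit only above 768MB     */
}

int main(void)
{
    /* below the split: round-robin over all three controllers */
    printf("%d %d %d\n", controller_for(0), controller_for(32), controller_for(64));
    /* above the split: always the controller with the 2Gb chips */
    printf("%d\n", controller_for(800ull * 1024 * 1024));
    return 0;
}
```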
 
mczak: Yes, as long as their memory management can avoid the last 256MB - not a usual requirement, and generally their memory management has seemed a little sub-par.
 