AMD: R7xx Speculation

Yeah I do have a problem with the word. Especially coming from people who should be a bit more insightful.

My point is that the comparison would be useless with a PPU as well. Or an 8-core CPU. The combined 3dmark score was useless from the beginning. This just makes it even more so. The whole "Nvidia is cheating again" mantra is very catchy but this time it seems to be bred from ignorance more than anything else.

In any case, how would Nvidia avoid "cheating" in this case? Doesn't 3dmark just make calls to the PhysX API?

Does CPU test 2 actually get a boost from having a PPU installed? And do the GPU physics feature tests (flags and particles) get boosts from these PhysX drivers? If we're only seeing an increase in CPU test 2 and nothing else, including the actual GPU physics tests, NV is definitely cheating.
 
So what is Crysis bottlenecked by? The GTX 280 shows a comparatively small (peaking at about 40%) performance bump over the 9800 GTX/8800 GTX, and we know how unfriendly the game is to multi-GPU solutions.
 
So what is Crysis bottlenecked by? The GTX 280 shows a comparatively small (peaking at about 40%) performance bump over the 9800 GTX/8800 GTX, and we know how unfriendly the game is to multi-GPU solutions.

It's bottlenecked by poor optimisation. They admitted as much with the announcement of Crysis: Warhead, which is supposed to run at 30-35 fps on high settings on a low-end rig. Not sure what res, but I'd wager 1280x1024.

They also said these optimisations are possible in Crysis, but they do not have the time to implement them. For me they would be better off making a patch for the original, as a lot of Joe Bloggs are put off by the enormous requirements needed to run the game at a decent res with acceptable fps.
 
Does CPU test 2 actually get a boost from having a PPU installed? And do the GPU physics feature tests (flags and particles) get boosts from these PhysX drivers? If we're only seeing an increase in CPU test 2 and nothing else, including the actual GPU physics tests, NV is definitely cheating.

Your logic is quite flawed.

1. Yes, CPU test 2 does get a boost from a PPU: http://www.planetx64.com/index.php?...k=view&id=1185&Itemid=20&limit=1&limitstart=9
2. GPU physics feature tests do not get boosts because they do not use PhysX (simple, eh?)

CPU test 2 is the only test that uses PhysX. So obviously that's the only test that would benefit from PhysX hardware acceleration. This is my point - the "cheating" hype is so strong people miss the simple and obvious facts. CPU test 2 is neither a CPU nor GPU physics test. It's a PhysX test.
 
That would quadruple the number of thread contexts that the SIMD sequencers would have to pick through in order to set up the SIMD's execution schedule.
Similarly, there would be four times as many instruction queues.

Whatever storage holds the instructions for an ALU clause would be accessed every cycle, as opposed to twice every 8 cycles.

It would require four times as many branch units to resolve branches, and a complex operation like a transcendental or integer multiply would require four times as many transcendental units to keep that throughput equivalent.
I don't agree with most of your assertions here. You don't have to be able to branch on every scalar instruction; branching every fourth would match the old design's branching performance. As for transcendentals, I said let's ignore them for simplicity, but if you want to go there then I will.

Okay, so let's compare the old design (A) with the new one (B). A has 16x(4x1D + 1D) SIMD units, and B has 64x1D + 16x1D SIMD units (MAD + transcendental). A has a "macrobatch" of two 64-thread batches, and B's consists of eight 64-thread batches. A's macrobatches can be switched every 8 cycles, B's every 32 cycles. Both have instruction packets of (4x1D + 1D), but B has the additional flexibility of dependency in the MAD parts.

Every 8 cycles for A:
Load two batches, execute an instruction packet on each, branch up to twice.

Every 32 cycles in B:
Load eight batches, execute an instruction packet on each, branch up to eight times. Note that the 16 trans. units can operate on all eight batches in this time.

You can see that instruction packet throughput and branch throughput are the same in both systems, so you don't really need more resources there for decoding/fetching/whatever. You just need a little more pipelining for the same scheduling system in A to handle B. The register file may need to be a bit smarter with dependencies, but I don't see much of a problem there, particularly with the use of a temp register. The only big change is the same one I was asking about earlier: switching instructions every clock instead of every 4 clocks within the SIMD arithmetic logic.
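
For what it's worth, here's the arithmetic spelled out (a quick Python sketch of the A/B numbers above; the one-packet-and-one-branch-per-batch-per-window assumption is mine):

Code:
# Back-of-the-envelope check of the A vs. B bookkeeping above.
# A: 16x(4x1D + 1D) SIMD, macrobatch = two 64-thread batches, switched every 8 cycles.
# B: 64x1D MAD + 16x1D trans, macrobatch = eight 64-thread batches, switched every 32 cycles.
# Assumption (mine): each batch issues one (4x1D + 1D) packet and at most one branch per window.

def per_cycle(batches_per_window, cycles_per_window):
    # packets issued per cycle; branch opportunities per cycle are the same ratio
    return batches_per_window / cycles_per_window

print(per_cycle(2, 8))    # A: 0.25 packets (and branches) per cycle
print(per_cycle(8, 32))   # B: 0.25 packets (and branches) per cycle

# Transcendentals: 16 trans units finish a 64-thread batch in 4 cycles,
# so eight batches fit exactly in B's 32-cycle window -- same trans
# throughput as A without quadrupling the unit count.
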
It also appears that the single lane layout in G80 was a stumbling block to getting higher DP FLOPs, or AMD just lucked out that its scheme allowed for a quicker path to DP math.
I guess that makes sense, but part of the problem is that NVidia is working with a smaller batch size, making it harder to go SIMD with the DP.
 
So what is Crysis bottlenecked by? The GTX 280 shows a comparatively small (peaking at about 40%) performance bump over the 9800 GTX/8800 GTX, and we know how unfriendly the game is to multi-GPU solutions.

OT to this thread, but I've been wondering the same thing in general. Why are the 4000 series and GT200 series boasting double nearly every functional unit (not double the texture units in Nvidia, but more than double in ATI), possibly double the bandwidth, yet often only 30-40% performance increases? It's odd.
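
Purely my own back-of-the-envelope, not anything from either IHV: if only part of the frame time actually scales with the doubled units and the rest (CPU, setup, driver overhead, whatever) doesn't, the gain gets capped Amdahl-style, e.g.:

Code:
# Amdahl-style sketch with made-up fractions: p = fraction of frame time that
# scales with the doubled shader/texture units, the rest stays fixed.
def gain_from_doubling(p, unit_scale=2.0):
    return 1.0 / ((1.0 - p) + p / unit_scale) - 1.0

for p in (0.5, 0.6, 0.7):
    print(p, round(gain_from_doubling(p), 2))  # ~0.33, ~0.43, ~0.54 faster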
 
OT to this thread, but I've been wondering the same thing in general. Why are the 4000 series and GT200 series boasting double nearly every functional unit (not double the texture units in Nvidia, but more than double in ATI), possibly double the bandwidth, yet often only 30-40% performance increases? It's odd.

Yeah, especially in Crysis. I've been wondering the same thing myself. In some tests the new cards are only a few percentage points faster. It's as if Crysis just simply refuses to run faster...
 
Yeah I do have a problem with the word. Especially coming from people who should be a bit more insightful.

My point is that the comparison would be useless with a PPU as well. Or an 8-core CPU. The combined 3dmark score was useless from the beginning. This just makes it even more so. The whole "Nvidia is cheating again" mantra is very catchy but this time it seems to be bred from ignorance more than anything else.

In any case, how would Nvidia avoid "cheating" in this case? Doesn't 3dmark just make calls to the PhysX API?
The driver that enables PhysX does not meet FM's policy for approval.
 
CPU test 2 is neither a CPU nor GPU physics test. It's a PhysX test.
Uhh, am I the only one who sees a problem with this statement?

It may not be a test of the CPU, but it is clearly categorized as a CPU test as that is its name. Anyway, FutureMark is the one that screwed this up. PhysX should never have been part of the core score in the first place, as it's an exclusive technology with minimal applicability to game performance.
 
The driver that enables PhysX does not meet FM's policy for approval.

That's not under debate. The question is whether it's Nvidia's duty to disable their PhysX driver when 3dmark is running. IMO it's not. 3dmark makes calls to the PhysX API. If GPU accelerated PhysX wasn't factored into the assumptions that FM made in designing the test that way then they are at fault. They also vastly underestimated how much faster a GPU would be than a PPU (assuming Nvidia really isn't cheating to get to those performance numbers).

Uhh, am I the only one who sees a problem with this statement?

It may not be a test of the CPU, but it is clearly categorized as a CPU test as that is its name.

They can call it whatever they want. What it does is what matters. And what it does is make calls to the PhysX API.

Anyway, FutureMark is the one that screwed this up. PhysX should never have been part of the core score in the first place, as it's an exclusive technology with minimal applicability to game performance.

Exactly.
 
CPU test 2 is neither a CPU nor GPU physics test. It's a PhysX test.

It's a PhysX test alright. And PhysX was originally meant to be done on the CPU or - if you had one installed - on a PPU. I'm sure they would've removed the test had they known that NV was going to take over Ageia and turn their GPU into a PPU.

Futuremark needs to take a stand. They might approve of it if you use a separate GF card (not in SLI) to process physics instead of doing partial physics on the GPU like it's being done now. The way it's done now just goes against their own Driver Approval Policy... it artificially boosts the 3DM score in a way that it's not meant to be benched...
 
Xenos is not as related to the R600 as you might think at first glance; team compositions for the two projects were somewhat different, and R600 is quite different in practice from Xenos.

Granted, but you would still expect lessons learned from Xenos to be utilised by the R600 team, and thus it seems unlikely that Xenos is actually more efficient per functional unit than R600. I mean, surely they model changes at the design stage, and if a change results in worse performance, they don't implement it?

I guess you could argue though that R600 added a hell of a lot more transistors for what turned out to be a rather minimal functional unit boost, while RV770 seems to have done the opposite, bringing its performance/transistor into what we would assume is the region of Xenos (and R580 for that matter). The addition of DX10 functionality to R600 does skew that comparison, though.
 
Not saying it can't, but with the cards we are looking at, it's not.
When you see the 4870 performing over 20% faster than the 4850 in games without AA, you'll see how wrong you are. The 4850 has less BW per MAD than RSX.

Even G92 will improve substantially with more BW. Not linearly, of course, but I can imagine a 60/40 split between core clock and memory, e.g. a 10%/0% core/mem increase gives a 6% increase, 0%/10% gives 4%.
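
Something like this toy model is what I have in mind (the 60/40 weights are just picked to match the example, nothing measured):

Code:
# Toy scaling model: perf gain ~ 0.6 * core clock gain + 0.4 * memory clock gain
def perf_gain(core_gain, mem_gain, core_w=0.6, mem_w=0.4):
    return core_w * core_gain + mem_w * mem_gain

print(round(perf_gain(0.10, 0.00), 3))  # 0.06 -> 10% core, 0% mem gives ~6%
print(round(perf_gain(0.00, 0.10), 3))  # 0.04 -> 0% core, 10% mem gives ~4%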
 
That's not under debate. The question is whether it's Nvidia's duty to disable their PhysX driver when 3dmark is running. IMO it's not. 3dmark makes calls to the PhysX API. If GPU accelerated PhysX wasn't factored into the assumptions that FM made in designing the test that way then they are at fault. They also vastly underestimated how much faster a GPU would be than a PPU (assuming Nvidia really isn't cheating to get to those performance numbers).

Cheating or not, the results are invalid and do not make for useful comparisons. Really that's the point here.
 
It's a PhysX test alright. And PhysX was originally meant to be done on the CPU or - if you had one installed - on a PPU. I'm sure they would've removed the test had they known that NV was going to take over Ageia and turn their GPU into a PPU.

Futuremark needs to take a stand. They might approve of it if you use a separate GF card (not in SLI) to process physics instead of doing partial physics on the GPU like it's being done now. The way it's done now just goes against their own Driver Approval Policy... it artificially boosts the 3DM score in a way that it's not meant to be benched...

Yep, they should take a stand and admit that the PhysX test was a stupid idea in the first place and remove it completely. I don't get this - "PhysX was originally meant to run on a CPU or PPU". I've never seen any restriction placed on what type of hardware can process a workload as long as the output is valid.

The fundamental concept of 3dmark vantage has always been flawed. But now all of a sudden it's a complete travesty because Nvidia cards have an advantage over the competition.

This could all be fixed if FM would design a comprehensive system test that incorporates all the various technologies under evaluation. How can you design an isolated, standalone PhysX test and then get mad when it runs in isolation on hardware that would otherwise be busy with another workload? Design the freaking test properly and we avoid all this unnecessary flag waving.

Cheating or not, the results are invalid and do not make for useful comparisons. Really that's the point here.

Yep, and it's up to FM to fix it.
 
I'm sure they would've removed the test had they known that NV was going to take over Ageia and turn their GPU into a PPU.

Well, you've been round here as long as I have; maybe together we could guess when the following sentiments were first expressed here:

a) Ageia has no future, it exists solely to be taken over by one of AMD, ATI, NVIDIA or Intel
b) the GPU physics concept is heavily flawed because it sucks resources from a (the!) piece of hardware most heavily influencing gaming graphics performance

I'm thinking it was late 2006 when that became obvious to many or most of us here. Maybe early 2007.
 
Cheating or not, the results are invalid and do not make for useful comparisons. Really that's the point here.

The results are as valid as the benchmark. No amount of "driver approval policy" makes the benchmark more valid if it purports to represent the future, yet fails to account for the bleedin' obvious scenarios in the future.

Worse still if the company marketing the benchmark insists on encapsulating everything in a single score because "that's what its customers demand", despite that being rather loosely representative of real in-game performance. Worse still that the hardware review community insists on using that benchmark despite it having loopholes you could fit Kim Kardashian's ass through, the potential of which has been obvious to many here for 18 months or so now.

And suddenly this is all NVIDIA's fault? We seem to be in danger of using NVIDIA's aggressive marketing tactics as some sort of cover-all, get-off-the-hook-free ticket for shortcomings in the rest of the system, as far as I can see.
 
When you see the 4870 performing over 20% faster than the 4850 in games without AA, you'll see how wrong you are. The 4850 has less BW per MAD than RSX.

Even G92 will improve substantially with more BW. Not linearly, of course, but I can imagine a 60/40 split between core clock and memory, e.g. a 10%/0% core/mem increase gives a 6% increase, 0%/10% gives 4%.


Not saying it can't be ;) just not in this situation we are talking about right now.
 