Which Future Hardware?

Dave Baumann

It seems that Lars has had a response from ATI over the Aquamark alpha blending issue (although I don't yet see it at Tom's). The method they used to reach a conclusion sounds interesting:

Riva Station (http://www.rivastation.com/go_e.htm / http://www.rivastation.com/news/news_de.htm#1069335362) said:
It took much longer than I anticipated, but we reached a conclusion on the AM3 "explosion" issue. The initial suspicions about our alpha blenders turned out to be true. To prove this we had to run the test scene on a future HW emulator - which took all this effort.

To summarize at a high level - we have single-bit differences when alpha blending compared to ref-rast image. When accumulating several layers of alpha blend objects, some pixels may end up looking darker on R300 family than refrast. We are rendering all the layers and the deviation we have in rendering each pixel is within an acceptable tolerance level from refrast - as evidenced by the fact that we pass all the WHQL DCT tests that compare images rendered on R300 v/s refrast.
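
To put that in concrete terms, here's a rough sketch (Python, with made-up layer values - just an illustration of the principle, not a description of R300's blend unit) of how a sub-LSB error in each blend can add up over many layers of smoke:

```python
# Illustration only: accumulate N alpha-blended layers into an 8-bit
# framebuffer, once with exact float maths ("refrast-like") and once
# truncating to an integer after every blend ("hardware-like").
LAYERS = 30            # number of overlapping smoke layers (arbitrary)
SRC, ALPHA = 40, 0.15  # per-layer colour and alpha (arbitrary)

exact = 0.0   # reference accumulation kept in floating point
hw = 0        # framebuffer value truncated on every write

for _ in range(LAYERS):
    exact = SRC * ALPHA + exact * (1.0 - ALPHA)
    hw = int(SRC * ALPHA + hw * (1.0 - ALPHA))  # loses < 1 LSB per layer

print(round(exact), hw)  # prints: 40 34 - several counts darker after 30 layers
```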

Now, the question is - which hardware was it being emulated on?

Given that one of the major deviations between the R3x0 pipeline and the Refrast is that the pixel pipeline is FP24 whereas the Refrast default is FP32, this is probably why the refrast and FX images are similar in this respect while the Radeon differs. If we take ATI's response at face value, then presumably the pipeline of the "Future" hardware emulator is closer to the Refrast's specification, so it doesn't cause these single-bit issues, which could indicate an FP32 pipeline.

So, what future hardware? R420 or R500? If R420 is still only at the emulation stage then they are looking late to the party; however, if R420 is still FP24 (which, according to Richard Huddy, is still the specification of PS3.0) then it may well still show the same accumulated single-bit differences.
 
It could be a hardware emulator for one of the consoles...
To do an emulation you wouldn't need to run it on an x86-based workstation - not that this is relevant anyway.

HW emulation for the console parts should be about ready now given the timeframe for the release of the next gen hardware.

I would speculate that this HW is not the next Radeon but the one after that for the same reason you speculate on. ATI would be running very late if it was the R420 and only at the emulation stage.

Perhaps they should have borrowed a GFFX? Oh wait they did that already :p
 
Every time Lars is at THG he seems to get Larsngitis with respect to any issues of cheating. Unless of course it's ATI under suspicion.
 
My betting is that it must be R500. We know that the lead time for new chips is easily a couple of years, and that this tech was probably demoed to win the XBox 2 contract. On the other hand, we know that R420 has taped out and is due for the spring 2004 market, making it well beyond the emulation stage.

R420 sticking with 24-bit and R500 going to 32-bit seems to be the current consensus, so it would make sense.
 
Dave,
as I read that statement, it's the (fixed function) alpha blending logic that's causing these single bit differences here. The error is accumulated over lots of layers to become visible.

I just don't see any connection to shader precision. If you want to put it into a functional block, it would be the ROP, not the pixel shader.
 
Yes, remember that modern video cards do not support blending with FP buffers, which basically means that the blenders are all made for use with an 8-bit framebuffer, so FP24 has nothing to do with it.

Anyway, if it is true that this is a series of 1-bit errors, then what we are seeing is a total lack of error-correction hardware in the blenders. Apparently ATI's engineers didn't feel there was any need to be accurate beyond 1-2 stages of blending. Simply making the last bit pseudo-random would keep the 1-bit errors from accumulating (making the last bit random could be as simple as a flip-flop that is toggled every time a blended pixel is written, though it would obviously look better with a more chaotic function).
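
For what it's worth, a quick extension of the same kind of toy model (Python, purely illustrative, not a claim about how any real ROP works) shows why toggling the last bit on every blended write is enough to stop the error from piling up in one direction:

```python
# Illustration only: plain truncation vs. truncation with a toggled LSB
# ("poor man's dither") while accumulating many blend layers.
LAYERS = 30
SRC, ALPHA = 40, 0.15      # arbitrary per-layer colour and alpha

exact, trunc, dither = 0.0, 0, 0
toggle = 0                 # stands in for the flip-flop mentioned above

for _ in range(LAYERS):
    exact = SRC * ALPHA + exact * (1.0 - ALPHA)
    trunc = int(SRC * ALPHA + trunc * (1.0 - ALPHA))
    # adding the toggled bit before truncating makes roughly half the writes round up
    dither = int(SRC * ALPHA + dither * (1.0 - ALPHA) + toggle)
    toggle ^= 1

print(round(exact), trunc, dither)  # prints: 40 34 41 - the dithered result hugs the reference
```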

While this is disappointing, it won't affect performance.
 
DaveBaumann said:
Given that one of the major deviations between the R3x0 pipeline and the Refrast is that the pixel pipeline is FP24 whereas the Refrast default is FP32 ...
Excuse me?

I thought the refrast "default" is what you specify in an app.
 
Ostsol said:
Chalnoth said:
While this is disappointing, it won't affect performance.
I wonder if Lars knows that. . .
Lars obviously didn't have the same explanation for why there was a difference. He had another explanation that seemed completely plausible to me: the drivers were automatically enabling an alpha test in conjunction with the alpha blend, which removes from the pipeline any pixel whose alpha is below a set value - pixels that would presumably be too dim to make a visible difference with one or two layers of transparency, but that show up when many layers are used.

This seemed perfectly plausible to me, given that not only was the smoke dimmer in the ATI shot, but the brightest/dimmest parts of the center of the smoke were not the brightest/dimmest parts in the nVidia shot. On top of that, the already dimmer (and smaller) smoke plume in the background was *much* dimmer in the ATI shot. If this background plume used fewer layers of transparency than the foreground one (which might be a good optimization technique), then the alpha-test explanation would make sense.

Of course, the two reasons for the difference give essentially identical effects, so obviously they both seem plausible when only looking at the effects. The fact remains that only one causes a performance difference.

And, there is always the possibility that the alpha level at which ATI cuts off for the forced alpha test is 1/256, which would produce single-bit errors, and would give a performance increase. But making the math errors always undershoot the "actual" answer would cause more dimming than setting an alpha test at 1/256.

Anyway, the obvious way to test this is simply to have the programmers of the benchmark enable an alpha test on the nVidia card in conjunction with the alpha blend, and see if they can exactly reproduce the ATI results.
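
A little thought experiment along those lines (Python, with invented layer colours, alphas and cutoff - nothing here comes from the actual benchmark) shows why a forced alpha test would both dim the accumulated plume and save fill rate:

```python
# Illustration only: blend a stack of bright, low-alpha smoke layers with and
# without a forced alpha test that discards fragments below a cutoff.
LAYERS = [(200, 0.12), (210, 0.04), (190, 0.08), (205, 0.03),
          (195, 0.10), (215, 0.02), (200, 0.06), (185, 0.04)]  # (colour, alpha) - made up
CUTOFF = 0.05          # hypothetical forced alpha-test threshold

def accumulate(layers, alpha_test=False):
    dst = 0.0
    for colour, alpha in layers:
        if alpha_test and alpha < CUTOFF:
            continue   # fragment killed before it ever reaches the blender
        dst = colour * alpha + dst * (1.0 - alpha)
    return round(dst)

print(accumulate(LAYERS))                   # ~79: every layer blended
print(accumulate(LAYERS, alpha_test=True))  # ~62: dim layers culled, plume visibly darker
```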
 
Reverend said:
DaveBaumann said:
Given that one of the major deviations between the R3x0 pipeline and the Refrast is that the pixel pipeline is FP24 whereas the Refrast default is FP32 ...
Excuse me?

I thought the refrast "default" is what you specify in an app.

Hmm, that seems like a silly way to design an API - allowing an application to attempt to tell a renderer what level of precision to use, since 99% of the time it's only going to have full precision and partial precision.


Back on the emulation: possibly they were emulating an R420 in software or on an FPGA and were able to get feedback on its internal registers.

Remember, they never said the future hardware rendered the scene the same as refrast.
 
The wording for "future hardware simulator" seems kind of ambiguous to me.

One interpretation is that they ran an emulator for future hardware, but that wouldn't make any sense when the goal was to find a potential problem in extant hardware.

I thought it meant they loaded the HDL code for R350 into an FPGA setup (or something similar) and ran a simulation on that. That is, they used a device that is used to simulate future hardware, rather than a simulation of future hardware.

Unless future hardware uses the same alpha blending setup.
 
bloodbob said:
Reverend said:
DaveBaumann said:
Given that one of the major deviations between the R3x0 pipeline and the Refrast is that the pixel pipeline is FP24 whereas the Refrast default is FP32 ...
Excuse me?

I thought the refrast "default" is what you specify in an app.

Hmm, that seems like a silly way to design an API - allowing an application to attempt to tell a renderer what level of precision to use, since 99% of the time it's only going to have full precision and partial precision.
Either Dave didn't type what he actually meant, or I misunderstood him.

The refrast will render what you tell it to render, that was my point.
 
They probably have a new software emulator with the ability to debug the process and see what is happening internally at each stage. I would imagine they would use the "future" hardware emulator because it would be better than the current hardware emulator. Maybe it runs on faster hardware. I don't think you can read too much into what was said. It could be a PCI Express 9800 simulation, who knows.

I would expect that software simulators are used on a regular basis for debugging complex graphical issues. I don't think you could debug things that easily running full speed on hardware without breakpoints or the ability to trace registers, cache, and memory addresses.
 
I'd just ignore the whole "future hardware" part. It's obviously been dropped in there on purpose, to get a discussion going and distract from the hardware issue. Tell you what, I find it decent enough that they're actually coming forward and admitting there's something wrong with the chips. That doesn't happen too often in the graphics biz. I don't need PR droplets mixed in for the situation to look satisfactory.

They could just as well have stated "While playing with our current VHDL code on a hardware simulator, we've found out that we're having single-bit errors in our alpha blender". They didn't ...
 
Reverend said:
DaveBaumann said:
Given that one of the major deviations between the R3x0 pipeline and the Refrast is that the pixel pipeline is FP24 whereas the Refrast default is FP32 ...
Excuse me?

I thought the refrast "default" is what you specify in an app.

The only thing you can specify in an app is "partial precision", which pushes down to FP16. If you don't specify partial precision you will get full precision, and full precision in the refrast is FP32. There is no actual way of specifying "FP24" precision; that's just an allowable "full precision" precision.
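
To make the tiers concrete, here's a small sketch (Python; the mantissa widths are the commonly quoted ones - 10 bits for FP16, 16 for ATI's FP24, 23 for FP32 - and the function is a simplification that ignores exponent range and denormals):

```python
import math

# Commonly quoted mantissa widths (an assumption, not something stated in the thread):
# FP16 = s10e5, ATI's FP24 = s16e7, FP32 = s23e8.
FORMATS = {"FP16": 10, "FP24": 16, "FP32": 23}

def quantize(x, mantissa_bits):
    """Round x to the nearest value representable with the given mantissa width."""
    if x == 0.0:
        return 0.0
    exponent = math.floor(math.log2(abs(x)))
    step = 2.0 ** (exponent - mantissa_bits)   # spacing of representable values near x
    return round(x / step) * step

value = 1.0 / 3.0
for name, bits in FORMATS.items():
    q = quantize(value, bits)
    print(f"{name}: error {abs(q - value):.1e}, step near 1.0 = {2.0 ** -bits:.1e}")
```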
 
zeckensack said:
They could just as well have stated "While playing with our current VHDL code on a hardware simulator, we've found out that we're having single-bit errors in our alpha blender". They didn't ...

Well, that's actually what they did say. The "future HW emulator" bit is in there to show the lengths they went to to find out what the issue was, and because most people won't understand "VHDL".
 
The alpha blending is a bit (no pun intended) too dark - you actually can't get 255 in the framebuffer when alpha blending is enabled. Very few pixels are even two bits off.
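
For anyone puzzled by how a blender can fall short of 255, here's one made-up example of the sort of fixed-point shortcut that produces exactly that symptom (pure speculation on my part, not a description of R300's actual blend unit): if each term is truncated before the sum, blending two fully white pixels already loses a count:

```python
# Speculative illustration: blend two white (255) pixels at 50% alpha,
# truncating each product to an integer before summing, as a cheap
# fixed-point blender might.
src, dst, alpha = 255, 255, 0.5

result = int(src * alpha) + int(dst * (1.0 - alpha))
print(result)   # 127 + 127 = 254 - one count short of full white
```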
 