two interesting slides about 9800XT from [H]

hjs · Oct 3, 2003

OpenGL guy said:
and leave it at that.

No, that is Joe Chien, a software director from Silicon Valley. The dude on the left some of you may know from the forums as OpenGL Guy.

Heathen · Oct 3, 2003

I pegged him as taller for some reason and red haired... :? :?

Frank · Oct 3, 2003

Should we regard the address op the R3x0 performs separately as a 'stand-alone' instruction, compared to the instruction set of the NV3x? Probably not.

But if we don't, we should also take temp regs and constants into account. And that costs the NV3x quite a few extra clocks. On the ATi, you would want to use as many regs and constants as possible. They are free, if we don't count the address op and note that it gets written in the same pulse anyway.

While the NV3x wins on some special calculations, it has to spend more time shuffling things around, expanding calculations to use as few temp regs as possible and loading constants.

In the end, just about any program you write will execute in fewer clock pulses on an ATi. And it has twice as many pipelines.

OpenGL guy · Oct 3, 2003

Heathen said:
I pegged him as taller for some reason and red haired... :? :?

~6' 2" is not tall enough? I knew I should've worn risers

andypski · Oct 4, 2003

Chalnoth said:
And if this is what ATI is basing their performance comparison on, then it is severely flawed. Their description of nVidia's number of operations per clock is very different from the description they apply to ATI's hardware. Similarly, it doesn't take into account other functions. According to David Kirk, the NV3x can do a sin/cos in 2 cycles, while ATI takes 7-8. If this comparison is on a per-pipeline basis, then nVidia could do sin/cos functions in half the time on a per-clock basis.

"How useful is sincos() as a dedicated instruction?" is something that I find myself asking.

If you have a spare sampler (pretty likely in most shaders) and spare texture instruction slots (also likely) you could get sin or cos (or some other function or combination of functions) with a texture lookup - either from a floating point texture with no filtering, or (probably preferably) from a high-precision integer texture with linear filtering. Then you could get full advantage from the parallel nature of texture lookups on R3xx - in a shader with lots more ALU ops than texture ops you could effectively get sin or cos in 0 cycles - this sounds better than 2 to me.

Using a 1-D, 1-component, 2048 entry texture for either sin or cos, I expect the accuracy could be pretty good for most purposes.

A 2048 entry 16-bit fixed point table would introduce an additional error (at the sampling points) of about 3e-5 when compared to a 32-bit float implementation (such as on a current CPU). I haven't bothered to work out the maximum error at the linearly interpolated intermediate points yet, but I reckon it's probably pretty useable.

As a reference the maximum permitted absolute error for the sincos instruction in a pixel shader is 0.002, so there's probably some wiggle room - it seems like a workable method.

You could also get sin and cos from one lookup if you use a 2 component texture, just as with the sincos instruction.
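For anyone who wants to reproduce the sampling-point figure, here is a minimal Python sketch. It assumes an unsigned normalized encoding that maps [-1, 1] onto the integer codes 0..65535, and it assumes entries are stored at texel centers; neither detail is stated in the post, and the helper names (`encode`, `decode`, `build_table`) are made up for illustration.

```python
import math

SIZE = 2048                  # table entries (from the post)
BITS = 16                    # bits per entry (from the post)
LEVELS = (1 << BITS) - 1     # highest integer code; assumed unsigned normalized format


def encode(value):
    """Quantize a value in [-1, 1] to an integer code (assumed encoding)."""
    return round((value * 0.5 + 0.5) * LEVELS)


def decode(code):
    """Map an integer code back to [-1, 1]."""
    return code / LEVELS * 2.0 - 1.0


def build_table(size=SIZE):
    """Store sin() sampled at texel centers (i + 0.5) / size over one period."""
    return [encode(math.sin(2.0 * math.pi * (i + 0.5) / size)) for i in range(size)]


def max_error_at_samples(table):
    """Largest absolute error of the decoded table at the sampling points."""
    size = len(table)
    return max(
        abs(decode(code) - math.sin(2.0 * math.pi * (i + 0.5) / size))
        for i, code in enumerate(table)
    )


table = build_table()
err = max_error_at_samples(table)
print(f"max error at sampling points: {err:.2e}")
```

Under these assumptions the error at the sampling points is bounded by half a quantization step, 1/65535 (about 1.5e-5). A full step is about 3e-5, so the result is of the same order as the figure quoted above.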

- Andy.

Sxotty · Oct 4, 2003

OpenGL guy said:
I'll just say your analysis and conclusion are severely flawed and leave it at that.

We would rather you didn't, and instead expounded upon whether these are accurate.

Quitch · Oct 6, 2003

hjs said:
OpenGL guy said:

and leave it at that.

ps. nice pic at DH.

No, that is Joe Chien, a software director from Silicon Valley. The dude on the left some of you may know from the forums as OpenGL Guy.

Meeting in a condemned building by the looks of it

K.I.L.E.R · Oct 6, 2003

Nice beer GL Guy.

andypski · Oct 6, 2003

andypski said:
A 2048 entry 16-bit fixed point table would introduce an additional error (at the sampling points) of about 3e-5 when compared to a 32-bit float implementation (such as on a current CPU). I haven't bothered to work out the maximum error at the linearly interpolated intermediate points yet, but I reckon it's probably pretty useable.

As a reference the maximum permitted absolute error for the sincos instruction in a pixel shader is 0.002, so there's probably some wiggle room - it seems like a workable method.

Just did a quick check, and the error over the whole range with linear interpolation stays at around 3e-5, so doing it this way seems fine compared to the macro implementation.

So there you have it - for a really accurate sin/cos on an R300 you can use a texture and get it in 0 cycles (sometimes).

Definitely better than 2
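A quick check along these lines can be sketched in Python. This is only a software model of a linearly filtered 1-D lookup, not what the hardware does bit for bit: it assumes wrap addressing, texel centers at (i + 0.5) / size, an unsigned normalized 16-bit encoding of [-1, 1], and exact filter weights. The names `sample_linear` and `lut_sin` are hypothetical.

```python
import math

SIZE, BITS = 2048, 16
LEVELS = (1 << BITS) - 1     # assumed unsigned normalized encoding of [-1, 1]


def build_table(size, levels):
    """Quantized sin() at texel centers, already decoded back to [-1, 1]."""
    table = []
    for i in range(size):
        exact = math.sin(2.0 * math.pi * (i + 0.5) / size)
        code = round((exact * 0.5 + 0.5) * levels)
        table.append(code / levels * 2.0 - 1.0)
    return table


def sample_linear(table, u):
    """Linearly filtered lookup at coordinate u in [0, 1) with wrap addressing."""
    size = len(table)
    x = u * size - 0.5           # shift so integer x lands on a texel center
    i0 = math.floor(x)
    frac = x - i0
    a = table[i0 % size]         # Python's % wraps negative indices as well
    b = table[(i0 + 1) % size]
    return a + (b - a) * frac


def lut_sin(table, angle):
    """Approximate sin(angle) by looking up angle / (2*pi) in the table."""
    return sample_linear(table, angle / (2.0 * math.pi))


table = build_table(SIZE, LEVELS)
steps = 200_000
worst = max(
    abs(lut_sin(table, 2.0 * math.pi * k / steps) - math.sin(2.0 * math.pi * k / steps))
    for k in range(steps)
)
print(f"max error over the whole range: {worst:.2e}")
```

In this model the interpolation term for a 2048-entry table is bounded by (2*pi/2048)^2 / 8, roughly 1.2e-6, so the quantization of the entries dominates and the total stays in the 1e-5 range, well under the 0.002 limit mentioned earlier.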

jimbob0i0 · Oct 6, 2003

andypski said:
andypski said:

A 2048 entry 16-bit fixed point table would introduce an additional error (at the sampling points) of about 3e-5 when compared to a 32-bit float implementation (such as on a current CPU). I haven't bothered to work out the maximum error at the linearly interpolated intermediate points yet, but I reckon it's probably pretty useable.

As a reference the maximum permitted absolute error for the sincos instruction in a pixel shader is 0.002, so there's probably some wiggle room - it seems like a workable method.

Just did a quick check, and the error over the whole range with linear interpolation stays at around 3e-5, so doing it this way seems fine compared to the macro implementation.

So there you have it - for a really accurate sin/cos on an R300 you can use a texture and get it in 0 cycles (sometimes).

Definitely better than 2

LMAO - thanks for the update Andy.... I can see it now in a follow-up interview with DK

Q: "Do you have any further comments on instruction lengths for certain functions on the NV3x and R3x0 hardware following the revelation that it is not 7 or 8 instructions as you guessed but 0 compared to your 2"....

A: "Ah but that is a cheating hack which will use up texture instructions... as they aren't doing multitexturing it will significantly affect performance. With our improved CineFX 2.1 architecture due in NV40 we will not only cope with Sin/Cos faster but give you an additional texture lookup for free"

KimB · Oct 6, 2003

andypski said:
Just did a quick check, and the error over the whole range with linear interpolation stays at around 3e-5, so doing it this way seems fine compared to the macro implementation.

Is that with a 16-bit integer texture? Is filtering supported with that texture format on the R3xx?

nelg · Oct 6, 2003

OpenGL guy, you look like you had one too many in that photo. Remember, no drinking and drivering.

andypski · Oct 6, 2003

Chalnoth said:
Is that with a 16-bit integer texture? Is filtering supported with that texture format on the R3xx?

Yes and yes.

With an 8-bit texture the maximum error would be around 0.004 (or about double the permitted spec error for sincos), which might still be usable in many cases, but since 16-bit textures are no problem for the hardware it doesn't really make much sense to sacrifice the accuracy. It would be better to cut down on the table size to get better caching characteristics - you can probably go down to 256 (or even 128) entries without killing the accuracy too much.
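The trade-off between bit depth and table size can be explored with a small sweep. The sketch below is an illustrative model only: it assumes an unsigned normalized encoding of [-1, 1], texel centers at (i + 0.5) / size, wrap addressing and exact linear filter weights. The function name `max_lut_error` and the probe density are arbitrary choices, not anything taken from the hardware or the spec.

```python
import math


def max_lut_error(size, bits, probes_per_texel=16):
    """Max |lookup - sin| for a linearly filtered table of `size` entries."""
    levels = (1 << bits) - 1
    table = []
    for i in range(size):
        exact = math.sin(2.0 * math.pi * (i + 0.5) / size)
        code = round((exact * 0.5 + 0.5) * levels)      # quantize to `bits` bits
        table.append(code / levels * 2.0 - 1.0)         # decode back to [-1, 1]

    worst = 0.0
    total = size * probes_per_texel
    for k in range(total):
        u = k / total
        x = u * size - 0.5                               # texel-center convention
        i0 = math.floor(x)
        frac = x - i0
        a, b = table[i0 % size], table[(i0 + 1) % size]  # wrap addressing
        approx = a + (b - a) * frac
        worst = max(worst, abs(approx - math.sin(2.0 * math.pi * u)))
    return worst


SPEC_LIMIT = 0.002   # permitted absolute error for sincos, as quoted in the thread

for size, bits in [(2048, 16), (256, 16), (128, 16), (2048, 8)]:
    err = max_lut_error(size, bits)
    verdict = "within" if err <= SPEC_LIMIT else "over"
    print(f"{size:5d} entries, {bits:2d}-bit: max error {err:.2e} ({verdict} limit)")
```

In this model the 8-bit table lands near 0.004 because half a quantization step is 1/255, while the 16-bit tables with 256 or 128 entries are limited by interpolation error instead and still come in under 0.002.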

- Andy.

Heathen · Oct 6, 2003

~6' 2" is not tall enough?

Not for an industry giant like you.

Dug myself out of that self made hole yet?

Dio · Oct 6, 2003

Heathen said:
I pegged him as taller for some reason and red haired...

The mad scottish driver writer?

OpenGL guy said:
~6' 2" is not tall enough?

Not quite...

K.I.L.E.R said:
Nice beer GL Guy

He's rarely seen without one...

Genghis Presley · Oct 6, 2003

nelg said:
OpenGL guy, you look like you had one too many in that photo. Remember, no drinking and drivering.

I think this may be related to the mysterious 'illness' he was suffering the next day.

Genghis.

OpenGL guy · Oct 6, 2003

Genghis Presley said:
nelg said:

OpenGL guy, you look like you had one too many in that photo. Remember, no drinking and drivering.

I think this may be related to the mysterious 'illness' he was suffering the next day.

Tattletale!

OpenGL guy · Oct 6, 2003

K.I.L.E.R said:
Nice beer GL Guy.

I'm pretty sure it was a margarita

Chris · Oct 7, 2003

ps. nice pic at DH.

No, that is Joe Chien, a software director from Silicon Valley. The dude on the left some of you may know from the forums as OpenGL Guy.

You can sure tell that all the available money at ATI is going into R&D from the look of ATI's booth in the photo

...or maybe GL guy has been "bad" and we're looking at his new office at ATI world headquarters. The good news is that in case of a fire, he'll already be in the fire escape...

Unit01 · Oct 11, 2003

OT

LOL crude jokers

BTW why is there a blue circle around one of em?
