Anand talk R580

DemoCoder said:
Moreover, with only so many temporary registers, maximum recursion depth is going to be practically limited to 4, both by SM3.0 and/or storage limitations.
Actually, you could probably get to a recursion level of 6 by packing values into 4-vectors (which will be enough for many integrals...remember that you're splitting things up into pieces as small as 1/64th that of the full integral, and doing cubic interpolation on each piece). But I suppose Mintmaster's right in that branching is an inherent part of recursion, even if only to break out of a loop.

Still, it may be undoable in PS3 hardware since I don't think there's any way to deal with registers as sort of a stack of memory, where you reference them via the value of a variable. This would be pretty necessary, I think, for building a tree recursive algorithm with any reasonable program length.
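For what it's worth, here's roughly what variable register indexing would buy if you had it: one copy of the function plus an explicit stack replaces the whole tree recursion. A minimal Python sketch of the idea (CPU-side, obviously not shader code), using a plain list as a stand-in for an indexable register file:

```python
# One function copy + an explicit stack in place of true recursion.
# The list plays the role of an indexable register file: "registers"
# addressed through a variable stack pointer, which PS3.0 lacks.
def fib_explicit_stack(n):
    stack = [n]   # pending subproblems
    total = 0
    while stack:
        m = stack.pop()
        if m <= 2:
            total += 1            # base case: fib(1) = fib(2) = 1
        else:
            stack.append(m - 1)   # push both subproblems instead of
            stack.append(m - 2)   # making two recursive calls
    return total
```

Each base-case leaf of the call tree contributes 1, so the sum comes out to fib(n); the point is just that depth is then bounded by stack storage, not by how many function copies were generated.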
 
The compiler could emulate recursive stack calls by generating multiple versions of a function, and then using dynamic branching or call with a predicate to determine whether to call the next one or not. Here's a quick sketch (might be some mistakes)

e.g. pseudocode
Code:
fib(1) = 1
fib(2) = 1
fib(n) = fib(n-1) + fib(n-2)

the compiler could generate (each level initializes its result register to 1 to handle the base case, and the accumulates are predicated so they're skipped along with the calls)
Code:
fib(n)
  r0 = 1 (base case: fib(1) = fib(2) = 1)
  r1 = n - 1
  call (with predicate n > 2) fib_1(r1) (result left in r2)
  (if n > 2) r0 = r2
  r1 = r1 - 1
  call (with predicate n > 2) fib_1(r1) (result left in r2)
  (if n > 2) r0 += r2

fib_1(r1)
  r2 = 1
  r3 = r1 - 1
  call (with predicate r1 > 2) fib_2(r3) (result left in r4)
  (if r1 > 2) r2 = r4
  r3 = r3 - 1
  call (with predicate r1 > 2) fib_2(r3) (result left in r4)
  (if r1 > 2) r2 += r4

ad infinitum

Each fib_n represents the function at stack depth n and consumes 2 registers (input/output)

Given 32 temporary registers, you could probably fit 12 levels of nesting. Of course, Fibonacci can be made tail recursive, but this is just a trivial example of how to unroll recursion and use a conditional call/predicate to terminate on the base recursive case.
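The same trick is easy to try on the CPU side first. Here's a hypothetical Python sketch of what the compiler would emit: one function copy per stack depth, with the "predicated call" rendered as an if guard, and the deepest copy falling back to the base case:

```python
# Recursion unrolled the way the compiler sketch above does it:
# one generated copy of fib per stack depth, chained together.
# The depth argument stands in for the register/copy budget.
def make_unrolled_fib(max_depth):
    # Deepest copy: no further calls available, so return the base case.
    fn = lambda n: 1
    for _ in range(max_depth):
        def level(n, nxt=fn):       # bind the next-deeper copy now
            if n <= 2:              # predicate fails: base case
                return 1
            # "call with predicate n > 2" into the next-deeper copy
            return nxt(n - 1) + nxt(n - 2)
        fn = level
    return fn

fib = make_unrolled_fib(12)   # 12 levels, as in the register estimate
```

Names here (`make_unrolled_fib` and so on) are illustrative only; on real SM3.0 hardware each level would additionally burn its pair of registers, which is where the 12-level estimate comes from.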
 
DemoCoder said:
The compiler could emulate recursive stack calls by generating multiple versions of a function, and then using dynamic branching or call with a predicate to determine whether to call the next one or not. Here's a quick sketch (might be some mistakes)
Yeah, I suppose that would work. But it'd be absurdly long. Hell, I may try it one of these days when I get time.
 
DemoCoder said:
In fact, tail recursion and loop iteration are provably identical. The only branch in most cases is to exit the loop. I merely pointed out even in tough cases, one can convert tree recursion to tail recursion if one is willing to recalculate.

Well I wasn't referring to that kind of recursion (I mean you could avoid loops altogether in any program), but still, as you mention, looping has branching, and it's not a negligible amount of branching either.
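Concretely, for the fib example the conversion in question looks like this (a Python sketch; Fibonacci happens to convert cleanly with an accumulator, no recomputation needed, and the loop version is what a compiler would reduce the tail call to, leaving only the loop-exit branch):

```python
# Tree-recursive fib (two self-calls per step) rewritten in
# tail-recursive accumulator form, then as the equivalent loop.

def fib_tail(n, a=1, b=1):
    """Tail-recursive: the recursive call is the last thing done."""
    if n <= 2:
        return b
    return fib_tail(n - 1, b, a + b)

def fib_loop(n):
    """The loop a compiler would produce from the tail call above:
    the only branch left is the loop-exit test."""
    a, b = 1, 1
    while n > 2:
        a, b = b, a + b
        n -= 1
    return b
```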

Programming models matter. One could construct a pixel shading API that is programmed in a register combiner like fashion, but it would be overwhelmingly hard to use. PS3.0 increases programmer productivity as well as making high level language support easier by making the static compiler's job easier. That's its biggest selling point, by bringing the instruction set closer to an orthogonal full CPU instruction set. The only thing it's missing is a real stack and scatter writes.
Okay, if you say so. I don't see the big improvement, though. If you want easier programming, why aren't you using HLSL? If you remove DB, I don't see how it's so much closer to a full CPU instruction set than PS2.0+. Other than input semantics, I keep hearing "better programming model" without anything to back it up. It seems like asm shading is the only thing that PS3.0 marginally improves.

Why should I accept your arbitrary assertion that dynamic branching is the only feature that defines a PS3.0 capable hardware
No, I said that's what defines a PS3.0 shader, not hardware. Obviously MS decides what hardware can carry the label. That's my claim. I'll easily admit NV30 is PS2.0+ hardware, it just runs such shaders slowly. I could write a shader that just moves an iterated value to the output, and tack ps_3_0 to the beginning. Is it fair to call that shader PS3.0?

What if I claimed that the defining feature of VS3.0 is vertex texture lookup, and since ATI doesn't support it, ATI has a half-ass implementation of VS3.0? Your whole argument seems to be an attempt to redefine NV4x as not SM3.0 capable.
Of course ATI's vertex shader is a half-assed implementation of VS3.0. No argument from me. In fact, I always thought high speed VT was the most important part of SM3.0, and was looking forward to it for years because it allows a means for pixel shader results to go back into the vertex shader, allowing a slew of effects not possible before.

You're blaming me for trying to say NV4x is not SM3.0 capable, when I explicitly mentioned in my last post that it's good NVidia can at least run DB shaders. My argument concerns what we and especially reviewers refer to as a "PS3.0 shader" or "PS3.0 effect"
 
Bill said:
I hesitate to even post this but: http://www.nextl3vel.org/board/index.php?showtopic=729

I thought R580 was 48 pipes? And he's saying R520 is really 32? Would explain the large die size, but don't we have an X-ray of the die? What does it show? How many quads?
:LOL: (<-- inspired by what I've learned from these forums)

Please ask him to point out the eight quads required for 32 "pipes" in this R520 die shot. Jawed and others were only able to identify four in another thread here.

You could guess R520 could fit 32 pipes in its ~300M transistor count when compared to R420's ~160M, but one look at the huge memory controller in the middle of the die should dispel that notion. Plus, ATI had to beef everything up to FP32 (which, according to Eric Demers, would require something like 50% more transistors than FP24) and tweak for better branching performance in addition to separating the texture units and ROPs from the shader units, all of which likely doesn't translate to a 1:1 ratio b/w R420's and R520's pixel "pipes."

Again, these new GPUs are redefining "pixel pipes" as we knew them. R580 will have 48 pixel shaders, not 48 old-school pixel pipes. Heck, even G70 can't be described as having 24 pixel pipes b/c it has only 16 ROPs.

So what about the R580?
Well for starters it is/was being developed by a separate team than the R520 team; the card bears little to no similarities. While based on the same design, they are not the same chips. Saying this is like saying that AMD and Intel processors, while being based on the same technologies, are exactly the same, with the only difference being the speed (which we all know to be wrong).
AFAIK: wrong, contradicting himself, why not use a car analogy :)? No offense intended to the poster, but it's hard to trust his interpretation of his sources when he says first that R520 and R580 "bear little to no similarities" and then that they're "based on the same design." Do we consider RV530 to bear "little to no similarity" to R520? More importantly, how many pipes does he consider RV530 to have, and how would he translate that to what we know of R580 (16-1-3-1)?

Edit: Oh, Jawed, you and your wanting to settle things definitively. ;P
 
Dave Baumann said:
I think there is a greater chance of R580 + GDDR3 and then the refresh being R580 with GDDR4.

Dave, would I be safe in assuming that if R580 does appear in GDDR3 form upon its first release, then when GDDR4 makes it onto the card itself I could simply buy a card with GDDR4 and install the R580 GPU onto that card? Now this does sound appealing to me.
We'll have to see how this plays out. But an All-In-Wonder card with a removable GPU sounds like a really cool deal. The only real question running through my head is: upon the release of R600, would the R580 card accept the R600 core? If this is the case, this is really exciting stuff:cool: .

Nvidia's idea of putting the latest GPU onto the M/B is also very interesting. But I see ATI's solution as more feasible, mainly because of the memory issue and the added heat to the M/B.:rolleyes:
 
Turtle 1 said:
Dave, would I be safe in assuming that if R580 does appear in GDDR3 form upon its first release, then when GDDR4 makes it onto the card itself I could simply buy a card with GDDR4 and install the R580 GPU onto that card? Now this does sound appealing to me.
We'll have to see how this plays out. But an All-In-Wonder card with a removable GPU sounds like a really cool deal. The only real question running through my head is: upon the release of R600, would the R580 card accept the R600 core? If this is the case, this is really exciting stuff:cool: .

Nvidia's idea of putting the latest GPU onto the M/B is also very interesting. But I see ATI's solution as more feasible, mainly because of the memory issue and the added heat to the M/B.:rolleyes:

It's very unlikely that they would do anything of the sort. Creating an upgrade path that the vast majority won't use just doesn't make sense for them.
 
SugarCoat said:
http://www.penstarsys.com/previews/graphics/nvidia/512_7800gtx/512gtx_3.htm

no idea how reliable that stuff is, will be interesting if true.

Probably as accurate as his article on R520 delays. Recall when he talked about his inside information on how only 16 of the 32 pipelines were working and the chip was delayed because of current leakage. In reality the chip only had 16 pipelines and the issue was a soft ground on the memory controller.
 