nv30 vs r300 article

For those who haven't yet figured out what to make of that statement: ... It's "how fast can you do a 32 instruction shader" now and not "how many single texture pixels can you plot" anymore.

I don't see how that contradicts Zephyr's remarks about nVidia's "quality" vs. "performance" of pixels.

Just because nVidia says (just going along with what you're saying), it's now about how fast you can execute shaders of Z instructions...that doesn't mean that's what's actually important...for these products.

Whether it's better for a product to be able to plot single texture pixels faster, or perform 32 instruction shaders, depends entirely on the software environment / target audience. And that's pretty much what Zephyr said.

Gaming applications...how many really care how fast you can execute 32 instruction shaders? Any at all?

"Professional / renderfarm" applications are pretty much the opposite. Is there any disagreement on this?

So based on that, and assuming R300 is a better "pixel plotter" (which is speculation in itself, based mostly on anticipated bandwidth) Zephyr is speculating that R300 may be better suited for current gaming apps, and NV30 for renderfarm apps, and "distant future" gaming apps. Seems like reasonable speculation to me. Doesn't mean it won't be completely wrong, of course. :)
 
DaveBaumann said:
MDolenc said:
And why would that be good for?
Regardless, this is not an inaccuracy since the hardware does support this number of instructions.
Exactly... INSTRUCTIONS. Instruction slots are different. In PS.1.x there is instruction co-pairing and even if you say:
mul r0.rgb, t0, v0
+ add r1.a, r1, c2
This still occupies one instruction slot (but contains two instructions).
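To make the slot-versus-instruction distinction concrete, here is a minimal Python sketch. It assumes only the convention described above: a line starting with "+" is co-issued with the previous instruction and shares its slot. The function name count_slots_and_instructions is made up for illustration and is not part of any real shader tool.

```python
def count_slots_and_instructions(shader_lines):
    """Count instructions and instruction slots in PS 1.x-style assembly.

    Assumption: a line starting with '+' is paired (co-issued) with the
    previous instruction and therefore does not open a new slot.
    """
    instructions = 0
    slots = 0
    for line in shader_lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        instructions += 1
        if not text.startswith("+"):
            slots += 1  # only unpaired lines open a new slot
    return instructions, slots


pair = [
    "mul r0.rgb, t0, v0",
    "+ add r1.a, r1, c2",
]
print(count_slots_and_instructions(pair))  # (2, 1)
```

So the pair above counts as two instructions but only one slot, which is why a slot limit and an instruction limit are not interchangeable numbers.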
 
Joe DeFuria said:
... Zephyr is speculating that R300 may be better suited for current gaming apps, and NV30 for renderfarm apps, and "distant future" gaming apps. Seems like reasonable speculation to me. Doesn't mean it won't be completely wrong, of course. :)

Yeah. Some input to the discussion from Huddy and Mitchell, from their GDC/Europe presentation about VS 2.0 + PS 2.0 hardware. They talk about doing Phong lighting in PS 2.0 using 3 instructions (dp3, rsq, mul) to do normalisation:

  • Radeon 9700 can do this ~800 million times per second
    That means that it makes sense to aim for pixel shaders whose length on average is roughly 15 instructions per pixel
    15 because:
    15 = ½ * 2400M / (1280 * 1024 * 60Hz)
    Depth complexity etc will reduce this…

Just a real world example from the R300.
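The arithmetic in the quoted slide can be checked with a short Python sketch. The 2400M figure is taken here to be 800 million normalisations per second times 3 instructions each; the ½ factor is copied from the slide as-is, and the parameter name fraction is only a label for it, not an explanation the slide gives.

```python
def instructions_per_pixel(instr_per_second, width, height, refresh_hz, fraction=0.5):
    """Average shader length affordable per pixel at a given resolution and rate.

    'fraction' is the 1/2 factor from the quoted slide; depth complexity
    (overdraw) is not modelled and would reduce the result further.
    """
    pixels_per_second = width * height * refresh_hz
    return fraction * instr_per_second / pixels_per_second


# 800 million 3-instruction normalisations per second = 2400M instructions/s
rate = 800e6 * 3
budget = instructions_per_pixel(rate, 1280, 1024, 60)
print(round(budget, 2))  # 15.26
```

The result is a little above 15, so the slide's "15" is a rounded figure rather than an exact equality.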
 
John Reynolds said:
Reverend said:
As I said in the news post thread, there is some misinformation. Most of the "bugs" have to do with your comments re DX9b2.1 (and, less so, how it relates to NV30/R300)... since Wavey said this is "verified" and hence "accurate", you may want to re-check or re-verify with whoever your sources are.

I'm sure Kristof didn't go through this article (or didn't go through in-depth), otherwise this would not have happened.

You might want to double-check your sources on what makes you so "sure" that Kristof didn't go through the article.

Regardless, these petty, condescending insults need to stop, Rev.
Stop being so sensitive to everything I post, John. All I did was say that your sources are incorrect even if they tell you they are (re Wavey said that it was verified to be correct). And that I hold K in high regard.

I'm trying to "help" the site, not insult it John.
 
Stop being so sensitive to everything I post, John. All I did was say that your sources are incorrect even if they tell you they are (re Wavey said that it was verified to be correct). And that I hold K in high regard.

Given the conversation that's already occurred on this article, it's hardly as though you are going to receive confirmation that K did verify it, is it? However, I am completely satisfied that the sources of the DX9 specifications are valid.

It would also have been helpful if you had said what you think the inaccuracies are so we could verify them, as we have asked / mentioned several times already.
 
Joe DeFuria said:
I don't see how that contradicts Zephyr's remarks about nVidia's "quality" vs. "performance" of pixels.

Only "better pixels" vs. "more pixels" doesn't have to mean shader effects in the least. It could also mean FSAA, anisotropic, and the like. I'm of the mind that it means both.
 
Reverend said:
Stop being so sensitive to everything I post, John. All I did was say that your sources are incorrect even if they tell you they are (re Wavey said that it was verified to be correct). And that I hold K in high regard.

I'm trying to "help" the site, not insult it John.

I'm not overly sensitive to all your posts. . .just the ones where you're obviously trying to insult Dave (I expect to see this article at [H]/ this wouldn't have happened if K went over the article, etc.) because of this petty behind-the-scenes squabbling/bickering that's been going on and that you keep fueling. And if you truly wanted to help you would've simply offered up the inaccuracies as repeatedly asked.
 
Whatever you say, or as you wish, John. That "bickering" has long been forgotten (I know a better B3D EIC when I see one and I'm talking about Wavey). Criticism isn't kindly taken when it comes from me I guess (perhaps understandably so). Also, I had PM'ed K earlier with my reason for not detailing all the wrong info in the article in a post at this site (in summary, it takes too much of my time as it'll be a longish "bug report" and I didn't really feel like it, being on dial-up).

BTW, if you took my mention of seeing this at [H] as an insult, then you're insulting [H] as well. I only mentioned [H] because Brent chose to use a warezed version of UT2003 in one of his articles at [H]... that's why I said I thought it was something that would appear at that site. But you may think differently of [H]. And honestly, I thought that K would've done a better job going through the article. That's not an insult, that's an opinion.
 
Reverend said:
Also, I had PM'ed K earlier with my reason for not detailing all the wrong info in the article in a post at this site (in summary, it takes too much of my time as it'll be a longish "bug report" and I didn't really feel like it, being on dial-up).

Sorry to point this out, but I guess you could have opted to spend your time writing at least some of this long bug report in the first place rather than having to go through the emotions. Okay, I know that this is probably an annoying reply, but since you obviously have something to contribute, Rev, please contribute it! ;)
 
MDolenc said:
I also always thought that vertex shaders 1.x are already floating point, aren't they?

Whoops…, my bad, the float precision should be in the PS table not in the VS table. It will be updated soon!

MDolenc said:
And what kind of "per channel masking" does NV30 and VS3.0 have that others can't match? VS1.x can do any kind of swizzle and can mask destination registers, so what can NV30 and VS3.0 do more?

"Per channel masking" here is the shorter form of "per channel condition codes & write masks". Per channel conditional masking seems more clear.

MDolenc said:
And you also got a bit messy with call nesting and dynamic and static flow control didn't you?

Would you mind giving me a clearer description or classification here?

MDolenc said:
Any hardware that will want to support VS2.0 will have to provide at least 12 temp registers, 16 integer registers and 16 boolean registers (that's 44 registers at any time) so NV30 will not be even a VS2.0 part ?

Please give your link or source here to prove NV30 does have 16 integer registers or boolean registers.

MDolenc said:
And that "sampler" row is quite a laugh. It's not a bug it's a feature! Or was it the other way around?

Seems I have noted it would/could be used in the tessellator. If I misunderstand it, please give more details here.

MDolenc said:
When both _abs and negate (-) are present, the _abs happen first in NV30 and PS3.0. Aren't we still talking about VS here.

It will be updated soon.:D

MDolenc said:
When will we see drivers from ATI that will expose 160 instruction slots?

It has been confirmed.

MDolenc said:
And how many constants and instruction slots will NV30 expose? They said somewhere that each constant costs one instruction slot, so instruction count can greatly vary based on how many constants will we use, right?

1024 instructions
512 constants or uniform parameters
- Each constant counts as one instruction

- NVIDIA Programmable Graphics Technology, P.15

It also got a confirmation. Besides, in the table we just list 1024 as Max Instruction Number.
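As a rough illustration of MDolenc's point, here is a Python sketch of how the usable instruction count would vary with the number of constants. It assumes one reading of the quoted slide: constants and instructions share a single pool of 1024 slots, with at most 512 constants. That interpretation, and the helper name remaining_instructions, are assumptions made for the example and are not confirmed NV30 behaviour.

```python
MAX_SLOTS = 1024      # "1024 instructions" from the quoted slide
MAX_CONSTANTS = 512   # "512 constants or uniform parameters"


def remaining_instructions(num_constants):
    """Instructions left if each constant consumes one slot (assumed model)."""
    if not 0 <= num_constants <= MAX_CONSTANTS:
        raise ValueError("constant count must be between 0 and 512")
    return MAX_SLOTS - num_constants


for n in (0, 64, 512):
    print(n, remaining_instructions(n))
# 0 1024
# 64 960
# 512 512
```

Under this reading the table's 1024 is a best case that only holds for a program using no constants at all.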

MDolenc said:
Didn't we come to conclusion that Radeon 9700 doesn't support arbitrary swizzles some time ago?

My source told me that R300 supports it. BUT, if you can give evidence that R300 does NOT support it, I will revise it.

MDolenc said:
Exactly... INSTRUCTIONS. Instruction slots are different. In PS.1.x there is instruction co-pairing and even if you say:
mul r0.rgb, t0, v0
+ add r1.a, r1, c2
This still occupies one instruction slot (but contains two instructions).

Are you sure that such a paired instruction in PS1.x occupies one instruction slot and counts as two instructions? And is the same true of instructions in the R300 PS?

Thank you for your good comments.
 
LeStoffer said:
Reverend said:
Also, I had PM'ed K earlier with my reason for not detailing all the wrong info in the article in a post at this site (in summary, it takes too much of my time as it'll be a longish "bug report" and I didn't really feel like it, being on dial-up).

Sorry to point this out, but I guess you could have opted to spend your time writing at least some of this long bug report in the first place rather than having to go through the emotions. Okay, I know that this is probably an annoying reply, but since you obviously have something to contribute, Rev, please contribute it! ;)
I spend my (rather expensive) time on the Net reading what I enjoy. To go through an article and spend time writing down errors I read is not what I enjoy nor can afford. Also, while I can be sure of the bugs I see, I'm also unsure of bugs I may miss (sireric, in his post at the news forum, brought forth some stuff I didn't notice, for example).
 
Also, while I can be sure of the bugs I see, I'm also unsure of bugs I may miss (sireric, in his post at the news forum, brought forth some stuff I didn't notice, for example).

That in no way precludes you from detailing what you think you do know. What you don't know shouldn't concern you.
 
Luminescent said:
Do you guys really think that the vertex processors and pixel processors on the NV30 will share some logic? I find that unlikely, since it would not fully exploit the parallelism possible in a streaming data processor with 3 vertex processors and, possibly, 8 pixel processors. Each pipeline should have individual units, even for the higher precision functions (e.g. Radeon 9700: scalar and vector units in parallel). Does anyone have any more information on processor sharing within the NV30?

Some speculate the former because of the small difference in transistor count between the NV30 and radeon 9700, but we should remember that the 9700 allocates logic to the encoding and decoding of memory formats, such as 3d float textures and such. The 9700 also holds 4 vertex pipelines. These extra convenience features may not be found on the NV30.

Sure, my little speculations could be wrong. To tell you the truth, I am very curious how nVidia can integrate so many functions/features in a 120M transistor chip without trade-offs! In particular, I am sure that LMAIII and the advanced pixel processor will consume a huge number of transistors. Then add 3 math units in VS and 8 math units in PS...

Basic said:
I believe "hax" at these fora is working at ATI, and here's what he said:

http://www.beyond3d.com/forum/viewtopic.php?t=2393&start=38

So it's rather arbitrary swizzles, but not completely.

It's not the complete 4 to 4 mapping.

Ok, It should be noted, thank you:)
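For scale, a "complete 4 to 4 mapping" means every destination channel can read any of the four source channels. A small Python sketch enumerates that full set; which subset the R300 actually supports is not stated in this thread, so the code does not try to model it.

```python
from itertools import product

COMPONENTS = "xyzw"


def all_swizzles():
    """Every possible 4-component source swizzle, e.g. 'xyzw' or 'wzyx'."""
    return ["".join(combo) for combo in product(COMPONENTS, repeat=4)]


swizzles = all_swizzles()
print(len(swizzles))              # 256
print(swizzles[0], swizzles[-1])  # xxxx wwww
```

That gives 4^4 = 256 possible swizzles, so "rather arbitrary, but not completely" means some of these 256 are unavailable.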
 
I appreciate your response, Zephyr; I thought no one had read it. It is just my 2 cents on the subject. Let us say we have a program which requires high precision math functions (exp2, sin, cos) on the vertices and on the pixels. I am no hardware engineer, but for reasonable performance there would either have to be 3 scalar units for each vertex pipe (for single cycle execution, assuming no overhead latency) or a single shared complex math unit. If there was only one complex math unit (assuming complex lighting calculations involve such a unit), then the unit would have to be SIMD-vector to address the 3 data registers (each with vertex instructions). I have only heard of 4-way and 2-way SIMD, so that would more or less rule out such a solution for 3 vertex pipes. The pixel pipelines, also parallel in nature, would require their own math unit, since sharing the three vertex pipeline units, or a single vertex unit, with 8 pixel pipes would be out of the question, unless the (hypothetical) "general" unit was a massively parallel ALU.
 
DaveBaumann said:
Also, while I can be sure of the bugs I see, I'm also unsure of bugs I may miss (sireric, in his post at the news forum, brought forth some stuff I didn't notice, for example).

That in no way precludes you from detailing what you think you do know. What you don't know shouldn't concern you.
I'm a perfectionist when I write anything :) In any case, the interest simply wasn't there for me to write a "bug report". I thought I'd just say there are inaccuracies and let others pick it up, which is what has happened (and all the better since I learned a few extra things as well re those "bugs" I missed).
 
One more opinion.

I think it was a great article but did not understand 99% of it.... erm can we have a version of the article for stupid people like me (or just for me?).

It was balanced but it did seem like an article written on the Somme in 1916 ... i.e. you take two foot we take 4 you take another, and then we both lose eight foot... er of land that is. If you get my meaning.
 
There was a comment about a drawback of using program space for constants, saying that if you want to change one constant, you'd need to change it in every place in the program where it's used.

Do we know that the constants are immediate (assembler speak)?
They don't have to be that just because they are stored in the same memory.

Either way, the time to transfer the constants is probably rather small compared to the penalty you get for having the state change at all.

Luminescent:
Hoping for instructions like exp/sin/cos/... to run in a single cycle is rather optimistic. Hoping that more than one of them could run in parallel in one cycle, even more so.

The more realistic ways are:
*) A separate complex scalar unit, doing one scalar instruction over a couple of cycles. And if the result isn't needed until some instructions later, the SIMD unit could run in parallel.
*) Do the complex scalar function in the SIMD unit, spreading the calculation over the SIMD elements to save cycles. It will still need multiple cycles though.
 