More info about RSX from NVIDIA

Jawed said:
Sorry, the onus of proof is on you. I've given you an example of a game where the increased capability of the G70 pipeline makes no difference. Now explain why it and others like it are showing no improvement beyond clock/pipeline-count.
And I've already shown you four benchmarks, in two of which the in-game performance is higher than the fillrate increase alone would account for. Given that memory bandwidth only increased by a paltry amount, this shows me that the confluence of changes that occurred between the NV40 and the G70 really does add up to a more efficient design.

These game benchmarks won't say what changes in the G70 allow it to be more efficient, just that it is. And if you only care about game benchmarks, don't bother trying to comment on which architecture changes made (or didn't make) the difference you're talking about.
 
I'm not disputing that G70 is more efficient in certain areas. In game tests both HDR and AF seem considerably better. You don't need synthetic tests to see this.

Jawed
 
Acert93 said:
Jawed said:
What's the use of fancy technology when games don't get faster because of it? We don't play synthetic benchmarks.

While I would not defend every "fancy technology" as delivering better games... it would seem that if it did accelerate performance in a benchmark, it theoretically could help *future* games designed around the added performance in certain areas.

What if it turns out that a feature is only active 1% of the time?...

NV40's dual-issue capability trips up badly because it's limited to four 32-bit register reads per clock - a principal reason why FP16 code is still favoured whenever possible. It's not difficult to generate such code (i.e. a dual-issued MUL and MAD). G70 prolly doesn't suffer this limitation; indeed, with the ability to perform MAD and MAD (which is as much as six 32-bit register reads) it would be nuts if NVidia kept G70 constrained this way.
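To put rough numbers on that, here's a back-of-envelope sketch (the four-reads-per-clock budget is the limit I'm describing above; the operand counts and the FP16 packing assumption are mine, purely for illustration):

Code:
# Rough register-read accounting for dual-issue on an NV40-style fragment pipe.
# Assumes a budget of four 32-bit register reads per clock (the limit described
# above) and that two FP16 operands can be packed into one 32-bit register read.

READ_BUDGET_PER_CLOCK = 4  # 32-bit register reads per clock

def reads_needed(operands, fp16=False):
    """32-bit register reads needed for a given number of source operands."""
    if fp16:
        # two half-precision operands fit in one 32-bit read
        return (operands + 1) // 2
    return operands

# Dual-issued MUL (2 source operands) + MAD (3 source operands)
fp32_reads = reads_needed(2) + reads_needed(3)                         # 5 -> over budget
fp16_reads = reads_needed(2, fp16=True) + reads_needed(3, fp16=True)   # 3 -> fits

print(f"FP32 MUL+MAD: {fp32_reads} reads vs budget of {READ_BUDGET_PER_CLOCK}")
print(f"FP16 MUL+MAD: {fp16_reads} reads vs budget of {READ_BUDGET_PER_CLOCK}")

On those assumptions a dual-issued FP32 MUL+MAD wants five reads against a budget of four, while the FP16 version fits comfortably - which is exactly why the compiler keeps reaching for partial precision.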

Obviously games are designed to the strengths of the target HW. So I guess the real question is: Is this extra performance something that can/will be utilized?

Well we're going to have to wait and see. In theory it'll take NVidia another 6 months just to write a compiler that can cope with G70's revised fragment pipeline architecture :LOL:

In the meantime, it seems to me that the fragment pipeline changes are so minor it's not worth shouting from the rooftops about it. The HDR and AF improvements plus the AA options are where the meat is at. The lack of MSAA with HDR is lame beyond belief though.

My observation is that GPU IHVs have gotten better about recognizing bottlenecks in the system and introducing new features and performance increases in the areas that matter the most. There are always exceptions (a recent example being that FP16 seems very underpowered on NV40), but it does seem IHVs are more realistic and wise about where they invest their transistor budget.

It seems to me that NVidia is doing something analogous to what Intel did with the Pentium 4 pipeline - making it longer and longer and creating ever more complex scenarios for the compiler and scheduler (to try) to target. When you write a piece of software that perfectly matches the architecture, you get fabulous results - e.g. video encoding on P4 systems. When you try to run a variety of applications, the simpler pipeline (i.e. Athlon) runs rings around the complex pipeline.

So, is this an area where games--especially on a closed platform like a console--could benefit from this extra shading performance, or are the other bottlenecks (memory, pipelines, fillrate) significant enough that this bump in performance won't be seen in most apps?

I think the thing people forget is that a pixel shader on RSX and a pixel shader on G70 (clocked at the same speed) will run with exactly the same performance. There is no gain to be made there just because RSX is running in a closed box.

The closed box will make a big difference to the vertex performance (of both PS3 and XB360 as compared with PC), but once the vertices are generated the pixels are solely the domain of the GPU.

Jawed
 
mckmas was talking about graphical performance, not a specific game implementation!

I must say the visuals from UE3 do stand out amongst next-gen screenshots at the moment, many of which (that I've seen) don't look too hot. I guess that's part of the reason it's riding a crest.
 
Watching vid now, the tiled textures get on my wick. It's not that hard to create non-repeating pavements and grass so why on earth do we STILL see these?!
 
Shifty Geezer said:
Watching vid now, the tiled textures get on my wick. It's not that hard to create non-repeating pavements and grass so why on earth do we STILL see these?!

Cos artists get bored otherwise?

Jawed
 
It seems the Sony guys are working on improving the Cg compiler; this is Alan Heirich's home page at Caltech:
http://alumnus.caltech.edu/~heirich/activities.htm
I work on the graphics architecture for the PlayStation 3, the most advanced graphics supercomputer ever commercialized and an awesome gaming console.
See the first paper (to appear at Graphics Hardware 2005 this summer)
Optimal Automatic Multi-pass Shader Partitioning by Dynamic Programming
From the abstract:
Complex shaders must be partitioned into multiple passes to execute on GPUs with limited hardware resources. Automatic partitioning gives rise to an NP-hard scheduling problem that can be solved by any number of established techniques.

Experimental results on a set of test cases with a commercial prerelease compiler for a popular high level shading language showed a DP algorithm had an average runtime cost of O(n^1.14966), which is less than O(n log n) on the region of interest in n. This demonstrates that efficient and optimal automatic shader partitioning can be an emergent byproduct of a DP-based code generator for a very high performance GPU.

IMHO the popular high level shading language is Cg (Nvidia just released Cg 1.4) and the very high performance GPU is RSX.

RSX should support very long shaders but he wrote:

The DP solution is motivated by a study of a very high performance GPU that supports large and complex shaders. The size of these shaders implies multi-pass execution and motivates the search for a scalable partitioning algorithm. This study is concerned with partitioning shaders for a very high performance GPU that has a very efficient intermediate storage mechanism. GPU resources that must be scheduled include live operands (register allocation), outstanding texture requests, instruction storage and rasterized interpolants. Every shader pass must observe the physical resource limits of the GPU, with any excess storage requirements satisfied from intermediate storage between passes. Each of these resource types has architecture-specific considerations that influence the cost function and the location of pass boundaries.

Maybe he's using the term 'multipass' in a new way: he sees intermediate internal results storage as the main problem and he's trying to alleviate it.
Any idea?
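Just to make the idea concrete, here's a toy sketch of partitioning-by-DP under per-pass resource limits (my own illustration with made-up costs and limits, not the algorithm or figures from the paper):

Code:
# Toy sketch: split a straight-line shader into contiguous passes so that each
# pass respects per-pass resource limits, minimising the number of passes (and
# hence the intermediate-storage traffic between them). The instruction costs
# and limits below are invented, not RSX/G70 figures.

INF = float("inf")

# per-instruction resource usage: (registers needed, texture fetches issued)
shader = [(2, 0), (3, 1), (4, 0), (4, 1), (3, 0), (2, 1), (5, 0), (3, 0)]

MAX_REGS = 6      # assumed per-pass register limit
MAX_FETCHES = 2   # assumed per-pass texture-request limit

def fits(segment):
    """Can this contiguous run of instructions execute as a single pass?"""
    return (max(r for r, _ in segment) <= MAX_REGS and
            sum(f for _, f in segment) <= MAX_FETCHES)

n = len(shader)
best = [INF] * (n + 1)   # best[i] = fewest passes covering the first i instructions
cut = [0] * (n + 1)      # cut[i]  = where the last pass of that solution starts
best[0] = 0

for i in range(1, n + 1):
    for j in range(i):                      # candidate last pass: shader[j:i]
        if best[j] + 1 < best[i] and fits(shader[j:i]):
            best[i], cut[i] = best[j] + 1, j

# recover the pass boundaries
passes, i = [], n
while i > 0:
    passes.append((cut[i], i))
    i = cut[i]
print("pass boundaries:", list(reversed(passes)), "pass count:", best[n])

A real code generator would track live ranges, interpolants and a proper cost function the way the paper describes, but the search for pass boundaries has the same basic shape.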
 
<Insert standard Shifty comment about stopping with the fanboi comments>

Actually, better yet...

How'd KK get dragged into this?! For crying out loud, that's a different topic that's totally unrelated! If you drag his comments in, someone somewhere will counter them, someone else will drag in all MS's brain-dead comments, and before you know where you are suddenly it's a fan boy fire-fight.

Comment on the topic and its relevance, discuss topics that derive from the discussion of it, but for crying out loud, when entering the B3D forum leave the PR campaigning at the door. :mad:
 
Jawed said:
What if it turns out that a feature is only active 1% of the time?...

NV40's dual-issue capability trips up badly because it's limited to four 32-bit register reads per clock - a principal reason why FP16 code is still favoured whenever possible. It's not difficult to generate such code (i.e. a dual-issued MUL and MAD).
Er, this shouldn't be an issue. From what I understand, register reads are limited by bandwidth, so dual-issue and co-issue shouldn't be limited by this at all.

G70 prolly doesn't suffer this limitation; indeed, with the ability to perform MAD and MAD (which is as much as six 32-bit register reads) it would be nuts if NVidia kept G70 constrained this way.
Since the G70 still shows performance increases from partial precision, it clearly has similar limitations. It may be able to read more per clock, but it is still limited in some way. As for how silly it is to do this, well, just consider that supporting more registers requires more transistors. Given a limited budget, it would mean cuts somewhere else.

It seems to me that NVidia is doing something analogous to what Intel did with the Pentium 4 pipeline - making it longer and longer and creating ever more complex scenarios for the compiler and scheduler (to try) to target. When you write a piece of software that perfectly matches the architecture, you get fabulous results - e.g. video encoding on P4 systems. When you try to run a variety of applications, the simpler pipeline (i.e. Athlon) runs rings around the complex pipeline.
Except that the main thing that keeps the Pentium 4 from having high performance is its deep pipelines. In a graphics architecture, you want to have deep pipelines in order to hide texture fetch latency. So no, I don't think this is analogous at all (after all, AMD is the one that is going for higher IPC, not Intel).
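As a rough illustration of why depth (or rather, lots of pixels in flight) is what hides memory latency - invented numbers, not actual NV40/G70 figures:

Code:
# How much work must be in flight to hide a texture fetch?
# The latency and ALU figures below are invented, purely for illustration.

fetch_latency_cycles = 200   # assumed round-trip latency of one texture fetch
alu_cycles_per_pixel = 8     # assumed ALU work per pixel between fetches

# With enough pixels queued up, one pixel's fetch latency is covered by the
# ALU work of the others, so the shader units never sit idle.
pixels_in_flight = -(-fetch_latency_cycles // alu_cycles_per_pixel)  # ceiling division
print(f"~{pixels_in_flight} pixels in flight to cover a "
      f"{fetch_latency_cycles}-cycle fetch")   # ~25 pixels here

That's the opposite of the P4 situation, where a mispredicted branch turns the long pipeline into a liability rather than a latency-hiding asset.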

I think the thing people forget is that a pixel shader on RSX and a pixel shader on G70 (clocked at the same speed) will run with exactly the same performance. There is no gain to be made there just because RSX is running in a closed box.
Now that depends upon how much people are willing to optimize the shaders.
 
scooby_dooby said:
mckmas8808 said:
I know this video is from the G70 tech demo, but this is the least that we can expect from the RSX. Watch with care. :D

http://streamingmovies.ign.com/pc/article/628/628005/unrealdemo_062105_wmvlowwide.wmv?

lol, judging from the vehicle they're driving... that's Huxley, an MMOFPS, which probably requires an HD.

So, you can expect to see that on PC and X360, but I dunno bout RSX! :p

Are you sure that's Huxley? This screenshot is from Unreal 2007 and it looks like the same vehicle: http://ve3d.ign.com/articles/614/614295p1.html

Also, the water bit they showed looks a lot like the water from the part where they looked out of the window in the PS3 conference. An Unreal Tournament 2007 fact sheet from the same link also says "includes the brand new...vehicle-focused combat raging across multiple maps..."

Plus, I think Tim Sweeney enjoys showing off UT2007 more than anything else.
 
It doesn't matter what game it is if it's on the Unreal 3 engine. We know both systems can run it just fine. We had the Unreal demo on Cell + 6800 Ultra SLI, and we have Gears of War coming to the X360, and Unreal 2007.

The future is great
 