More info about RSX from NVIDIA

mckmas8808 said:
Another point risen from Anandtech
WOW!!! :oops: Do you guys believe the conclusion that they have arrived at? :?:

"Nearly" sounds very subjective and it obviously is in this case. I do think we're going to see some incredible graphics that to the casual eye may rival something like TSW, but on a technical level it obviously won't be comparable. The more discerning, i.e. us, will also be a lot more picky than most. I for one don't think UE3 is "near" TSW, as Anand suggests with his comparison, but I could see how others might think that perhaps.
 
mckmas8808 said:
Another point risen from Anandtech

Even though features haven't been added to the vertex and pixel shaders directly, the increased power will allow game developers more freedom to generate more incredible and amazing experiences. Though not seen in any game out now or coming out in the near term, the 7800 GTX does offer the ability to render nearly "Spirits Within" quality graphics in real-time. Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware, but it is nice to know the 7800 GTX has the power to run these applications when they do come along.

WOW!!! :oops: Do you guys believe the conclusion that they have arrived at? :?:
I'm more shocked that he considers UT2007 equivalent to Spirits Within.
 
Titanio said:
In Cell's case at least it shouldn't matter, since it should also be an int op monster IIRC.
That's not strictly true... SPEs have some serious limitations for int ops.
 
DeanoC said:
Titanio said:
In Cell's case at least it shouldn't matter, since it should also be an int op monster IIRC.
That's not strictly true... SPEs have some serious limitations for int ops.

You noticed.

So have you looked at how many instructions the compiler generates to read an unaligned byte?
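For anyone wondering what Deano is hinting at, here is a rough, purely illustrative C sketch of what a single unaligned byte read expands into on an SPE, which has no scalar load/store and can only move aligned 16-byte quadwords. The function and buffer names are mine, not real SPU intrinsics, and a real compiler would emit an lqd/rotqby-style sequence rather than this plain C.

Code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: the SPE local store can only be read as aligned
 * 16-byte quadwords, so even one unaligned byte read expands into a
 * quadword load plus address masking plus an extract/rotate step. */
static uint8_t load_byte_spe_style(const uint8_t *p)
{
    uintptr_t addr   = (uintptr_t)p;
    uintptr_t qbase  = addr & ~(uintptr_t)15;   /* aligned quadword base        */
    uintptr_t offset = addr & 15;               /* byte lane inside the qword   */

    uint8_t quad[16];                           /* stand-in for a qword register */
    memcpy(quad, (const void *)qbase, 16);      /* the only load the SPU has     */

    return quad[offset];                        /* "rotate and extract" the lane */
}

int main(void)
{
    /* 16-byte aligned buffer so the masked load above stays in bounds. */
    static const uint8_t buf[32] __attribute__((aligned(16))) =
        { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
          16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };

    printf("%d\n", load_byte_spe_style(&buf[5]));   /* prints 5 */
    return 0;
}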
 
DeanoC said:
Titanio said:
In Cell's case at least it shouldn't matter, since it should also be an int op monster IIRC.
That's not strictly true... SPEs have some serious limitations for int ops.

I had read that they weren't given the same priority in the pipeline, but that otherwise it should be similar. That point (or something else?) may make a big difference, though?

Anyway, specifically regarding what we're talking about here, ints or flops aside I don't think there's much argument that Cell needs GPU assistance for HD decoding given the performance they've disclosed in that area previously.

(It'd be interesting if you could elaborate on your comment though!)
 
Actually it's probably broken down closer to this:

(PS) 24*2 ALUs, each of which can issue 2 instructions in co-issue = 96 inst +
(PS) 24 misc ops (aka 16b NRM) = 24 inst
(VS) 8 VALUs, each of which can issue 1 instruction = 8 inst +
(VS) 8 SALUs = 8 inst
Total = 136


That makes sense. And based on what we know about Xenos's 3x16 arrangement and the "33%" comment, it seems likely to be able to perform 16 norms per clock (9 flops/norm * 16 = 144 flops). So the equivalent instruction count for Xenos's comparable execution units:

(UNI) 48 ALUs, each of which can issue 2 instructions = 96 inst +
(UNI) 16 misc ops (aka norm) = 16 inst
Total = 112
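Just to make the arithmetic easy to check, here is a throwaway C snippet that reproduces the two instruction totals and the 16-norms-per-clock flop figure from the numbers quoted above. It adds nothing beyond that arithmetic; the 3 mul + 2 add + 1 rsqrt + 3 mul breakdown of a normalise is my assumption of where the 9 flops/norm comes from.

Code:
#include <stdio.h>

int main(void)
{
    /* G70: 24 pixel pipes x 2 ALUs, each ALU co-issuing 2 instructions,
       plus 24 mini-ALU ops (the 16b NRM), plus 8 vertex VALUs and 8 SALUs. */
    int g70 = (24 * 2 * 2) + 24 + 8 + 8;       /* = 136 */

    /* Xenos: 48 unified ALUs issuing 2 instructions each, plus 16 misc ops. */
    int xenos = (48 * 2) + 16;                 /* = 112 */

    /* One 3-component normalise: 3 mul + 2 add for the dot product,
       1 rsqrt, 3 mul to scale = 9 flops; 16 of them per clock. */
    int norm_flops = 9 * 16;                   /* = 144 */

    printf("G70 = %d inst, Xenos = %d inst, 16 norms = %d flops\n",
           g70, xenos, norm_flops);
    return 0;
}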
 
ralexand said:
mckmas8808 said:
Another point risen from Anandtech

Even though features haven't been added to the vertex and pixel shaders directly, the increased power will allow game developers more freedom to generate more incredible and amazing experiences. Though not seen in any game out now or coming out in the near term, the 7800 GTX does offer the ability to render nearly "Spirits Within" quality graphics in real-time. Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware, but it is nice to know the 7800 GTX has the power to run these applications when they do come along.

WOW!!! :oops: Do you guys believe the conclusion that they have arrived at? :?:
I'm more shocked that he considers UT2007 equivalent to Spirits Within.

Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware...

They didn't.
 
Chalnoth said:
Jawed said:
But designing a GPU for "peak" is clearly not working.

In all these reviews, the best case we're seeing is a 50% speed-up over 6800 Ultra in shader-limited cases. That 50% speed-up can be entirely explained by increased pipelines and clock.
One of the shader benches here at B3D showed 119% speedup over the 6800 Ultra. Two of the Shadermark shaders showed less than 50% speedup (which can likely be attributed to either an immature compiler, or the limitation not being in the shader). Most were much higher.

Yes, those synthetic shader benchmarks got faster. But nobody plays them, do they?

I'm referring solely to game benchmarks, which show no speed-up beyond that implied by core clock increase combined with vertex and pixel pipe increases.

Jawed
 
two said:
ralexand said:
mckmas8808 said:
Another point risen from Anandtech

Even though features haven't been added to the vertex and pixel shaders directly, the increased power will allow game developers more freedom to generate more incredible and amazing experiences. Though not seen in any game out now or coming out in the near term, the 7800 GTX does offer the ability to render nearly "Spirits Within" quality graphics in real-time. Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware, but it is nice to know the 7800 GTX has the power to run these applications when they do come along.

WOW!!! :oops: Do you guys believe the conclusion that they have arrived at? :?:
I'm more shocked that he considers UT2007 equivalent to Spirits Within.

Games that live up to this example (such as Unreal Tournament 2007) still have quite a ways to go before they make it into our hands and onto our hardware...

They didn't.

Um, yes they did. The rest is just saying UT2007 and similar games won't be out for a while.
 
What bothers me is that Jen-Hsun Huang said at E3 that the RSX will have the power of two 6800 Ultras. And at the same time, the G70 has been widely shown and said to have the power of two 6800 Ultras. This does not fit with the comments that the RSX is one generation beyond the G70.
 
Jawed said:
Yes, those synthetic shader benchmarks got faster. But nobody plays them, do they?

I'm referring solely to game benchmarks, which show no speed-up beyond that implied by core clock increase combined with vertex and pixel pipe increases.
Well, check out the B3D review, then. Once the system is no longer CPU-limited, the card is often higher-performing than just the core speed increase would indicate. This is really impressive given that the memory bandwidth didn't increase by nearly as much.

Examples:
(All at 1600x1200 8xS, 16-degree anisotropy)
UT2004: +73.7%
Far Cry: +57.5%
HL2: +76.2%
Doom3: +57.6%

Edit:
(note: pixel processing, by number of pipelines x clock speed, should be up 61.3%; memory bandwidth is up 9.1%)

But the real problem is that you can't really discern how the above performance increases relate to the various changes made in the core, since any number of other efficiency improvements beyond the shaders could be behind these high performance figures.

So, if you want to talk about shader performance, you really need to use synthetic benchmarks. If you see performance increases in simple synthetic benchmarks that both exercise shaders like those used in-game and remove the other limits on performance, then you can make an educated statement about shader performance.
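For reference, the 61.3% and 9.1% figures in that note are just clock-and-pipe ratios. A small sketch of where they come from, assuming the commonly published specs (16 pipes / 400MHz core / 1100MHz memory for the 6800 Ultra, 24 pipes / 430MHz core / 1200MHz memory for the 7800 GTX):

Code:
#include <stdio.h>

int main(void)
{
    /* Assumed published specs: 6800 Ultra vs 7800 GTX. */
    double pipes_old = 16, clk_old = 400, mem_old = 1100;
    double pipes_new = 24, clk_new = 430, mem_new = 1200;

    double pixel_uplift = (pipes_new * clk_new) / (pipes_old * clk_old) - 1.0;
    double mem_uplift   = mem_new / mem_old - 1.0;

    printf("theoretical pixel throughput: +%.2f%%\n", pixel_uplift * 100.0); /* 61.25% */
    printf("memory bandwidth:             +%.2f%%\n", mem_uplift * 100.0);   /* 9.09%  */

    /* Measured gains from the post above: UT2004 +73.7%, Far Cry +57.5%,
       HL2 +76.2%, Doom3 +57.6% -- two of the four exceed the raw pixel
       uplift despite the much smaller bandwidth increase. */
    return 0;
}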
 
SanGreal said:
From the chart in the beginning of the thread the RSX and G70 look almost identical. Was that all discredited at some point in the thread?
Not by the thread, but by Nvidia themselves. Point your browser to any site that covers Nvidia's G70 launch event.
 
SanGreal said:
From the chart in the beginning of the thread the RSX and G70 look almost identical. Was that all discredited at some point in the thread?
You probably mean its power, but FWIW, from your Anand link:
Anand said:
There will definitely be some differences between the RSX GPU and future PC GPUs, for a couple of reasons:

1) NVIDIA stated that they had never had as powerful a CPU as Cell, and thus the RSX GPU has to be able to swallow a much larger command stream than any of the PC GPUs as current generation CPUs are pretty bad at keeping the GPU fed.

2) The RSX GPU has a 35GB/s link to the CPU, much greater than any desktop GPU, and thus the turbo cache architecture needs to be reworked quite a bit for the console GPU to take better advantage of the plethora of bandwidth. Functional unit latencies must be adjusted, buffer sizes have to be changed, etc...

We did ask NVIDIA about technology like unified shader model or embedded DRAM. Their stance continues to be that at every GPU generation they design and test features like unified shader model, embedded DRAM, RDRAM, tiling rendering architectures, etc... and evaluate their usefulness. They have apparently done a unified shader model design and the performance just didn't make sense for their architecture.

NVIDIA isn't saying that a unified shader architecture doesn't make sense, but at this point in time, for NVIDIA GPUs, it isn't the best call. From NVIDIA's standpoint, a unified shader architecture offers higher peak performance (e.g. all pixel instructions, or all vertex instructions) but getting good performance in more balanced scenarios is more difficult. The other issue is that the instruction mix for pixel and vertex shaders are very different, so the optimal functional units required for each are going to be different. The final issue is that a unified shader architecture, from NVIDIA's standpoint, requires a much more complex design, which will in turn increase die area.

NVIDIA stated that they will eventually do a unified shader GPU, but before then there are a number of other GPU enhancements that they are looking to implement. Potentially things like a programmable ROP, programmable rasterization, programmable texturing, etc...
 
one said:
SanGreal said:
From the chart in the beginning of the thread the RSX and G70 look almost identical. Was that all discredited at some point in the thread?
You probably mean its power, but FWIW, from your Anand link:
Anand said:
There will definitely be some differences between the RSX GPU and future PC GPUs, for a couple of reasons:

1) NVIDIA stated that they had never had as powerful a CPU as Cell, and thus the RSX GPU has to be able to swallow a much larger command stream than any of the PC GPUs as current generation CPUs are pretty bad at keeping the GPU fed.

2) The RSX GPU has a 35GB/s link to the CPU, much greater than any desktop GPU, and thus the turbo cache architecture needs to be reworked quite a bit for the console GPU to take better advantage of the plethora of bandwidth. Functional unit latencies must be adjusted, buffer sizes have to be changed, etc...

We did ask NVIDIA about technology like unified shader model or embedded DRAM. Their stance continues to be that at every GPU generation they design and test features like unified shader model, embedded DRAM, RDRAM, tiling rendering architectures, etc... and evaluate their usefulness. They have apparently done a unified shader model design and the performance just didn't make sense for their architecture.

NVIDIA isn't saying that a unified shader architecture doesn't make sense, but at this point in time, for NVIDIA GPUs, it isn't the best call. From NVIDIA's standpoint, a unified shader architecture offers higher peak performance (e.g. all pixel instructions, or all vertex instructions) but getting good performance in more balanced scenarios is more difficult. The other issue is that the instruction mix for pixel and vertex shaders are very different, so the optimal functional units required for each are going to be different. The final issue is that a unified shader architecture, from NVIDIA's standpoint, requires a much more complex design, which will in turn increase die area.

NVIDIA stated that they will eventually do a unified shader GPU, but before then there are a number of other GPU enhancements that they are looking to implement. Potentially things like a programmable ROP, programmable rasterization, programmable texturing, etc...

So that means RSX is basically a G70 with higher bandwidth to the Cell processor.
 
nAo said:
version said:
mpeg4 removed
a SPE faster than 8 vertex shader removed too

32 pixelshader+4 redundancy pixelshader will be fine
I love you version, cause you keep mixing dreams with reality, go on dude, don't stop please :)
I think a reasonable and optimistic list is:
- video decoding
- ultrashadow
- stuff that can be considered 'legacy'
+ FlexIO interface (definitely there)
+ logic to access XDR and allow CELL CPU to access DDR (definitely there, details and performance unknown)
+ logic, cache and buffers to accommodate the expected very high input and output to/from CELL CPU.

Looks reasonable, and the transistor counts may hover around the same range.

What is very interesting to me is actually the ~100W power consumption for a 110nm chip. If both the CELL CPU and RSX are at 90nm, the PS3 debut models may hover just north of 100W in consumption. That's higher than last gen, but much lower than I was expecting this gen, considering the tech present within the box.
 
passby said:
What is very interesting to me is actually the ~100W power consumption for a 110nm chip. If both the CELL CPU and RSX are at 90nm, the PS3 debut models may hover just north of 100W in consumption. That's higher than last gen, but much lower than I was expecting this gen, considering the tech present within the box.
Yes, it's an amazing feat, as it's supposed to cram more than a big PS3 dev kit into that console box, power supply unit included. The PS3 uses an internal power supply unit while the Xbox 360 uses an external AC adapter.
 
I think a reasonable and optimistic list is:
- video decoding
- ultrashadow
- stuff that can be considered 'legacy'
+ FlexIO interface (definitely there)
+ logic to access XDR and allow CELL CPU to access DDR (definitely there, details and performance unknown)
+ logic, cache and buffers to accommodate the expected very high input and output to/from CELL CPU.

Looks reasonable, and the transistor counts may hover around the same range.

It depends on how integrated these things are. Video decoding? UltraShadow and the legacy stuff may be very closely tied in.

I highly doubt Nvidia will want UltraShadow gone. That is their baby; I'm sure they are going to want devs to use it so that we see it used on the PC more often.


Also, time is important. Sony keeps saying spring. Is that enough time to rip out these things and replace others while moving to 90nm and increasing the clock speeds? I dunno. It will certainly be interesting to see what the key differences are.
 