Xbox 1 Backwards compatibility

aaronspink said:
ATI R500: 48 * 0.5 = 24 Giga Ops per second. Don't need to account for wait states because of the multiple contexts provided in hardware.

Vince's PS3 on crack: 16 * 4 / 160 = 400 Mega Ops per second.

Didn't we already discuss what an SPU is, and didn't I mention it's widely believed that it can already handle 32 simultaneous contexts? That's not taking into account the work of the main DMAC and PU in arbitration.
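For what it's worth, here's a rough Python sketch of the arithmetic being thrown around. The unit counts, clocks, the 160-cycle latency, and the 32-context figure are just the numbers quoted in this thread, and the latency-hiding model is deliberately naive:

def peak_ops_per_sec(units, clock_hz):
    # One op per unit per cycle, no stalls.
    return units * clock_hz

def effective_ops_per_sec(units, clock_hz, miss_latency_cycles, contexts):
    # Naive model: each op eats the full miss latency, but with more hardware
    # contexts the stalls overlap and the units stay busier.
    cycles_per_op = max(1.0, miss_latency_cycles / contexts)
    return units * clock_hz / cycles_per_op

print(peak_ops_per_sec(48, 0.5e9))                       # "R500" figure: 2.4e10 ops/s
print(effective_ops_per_sec(16, 4e9, 160, contexts=1))   # latency exposed: 4.0e8 ops/s
print(effective_ops_per_sec(16, 4e9, 160, contexts=32))  # 32 contexts assumed: 1.28e10 ops/s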

aaronspink said:
I'll provide you your answer: a large number of hardware contexts optimized to cover the texture fetch latency.

As I already stated, this isn't dependent upon the unified SIMD Vector|Scalar datapath itself (which is what was being discussed); it's dependent upon the complex you build around it. A Synergistic Processor is just that.

aaronspink said:
A building block in the Sony design is not optimized for this workload because it can't handle the texture fetch latency as well as various other operations (sampling, filtering, early Z reject, etc) that are designed into the GPU.

I fully agree with you on the latter functions (the ROP/Pixel Engine functionality), which is what I believe Sony will utilize. I do think you're incorrect in your assessment of texture access efficiency.
 
Vince said:
Dave, we're all past the "fixed" versus "flexible" mentality. Tell me a new tale; explain to me how.

Ok, sure, but I'm not quite sure you are going to understand.


We have, basically, analogous unified SIMD pathways.

This would be incorrect.

We have roughly similar complexes built around them.

Where is the texture filtering? ROBs? Where is the Z reject? The complexes have striking differences.

One is a highly tuned SOI processor that clocks almost 10X as high and offers full flexibility and the other doesn't.

A 10 GHz 386 still isn't going to outperform a 1 GHz Alpha or 1.3 GHz Power4 or 1 GHz Pentium 3 or a 1 GHz Athlon. Just because something has a high clock speed doesn't mean it's fast.

Aaron Spink
speaking for myself inc.
 
aaronspink said:
This would be incorrect.

Where is the difference?

aaronspink said:
Vince said:
We have roughly similar complexes built around them.

Where is the texture filtering? ROBs? Where is the Z reject? The complexes have striking differences.

I was referring to the ALUs, which are disjoint from the raster ops in contemporary GPUs, AFAIK. The NV40 is like this: the shading complexes/units are composed of 2 ALUs and other associated units that are physically independent of the ROP/Z/Stencil functionality. I assume the X2's unified shaders will follow suit.

A 10 GHz 386 still isn't going to outperform a 1 GHz Alpha or 1.3 GHz Power4 or 1 GHz Pentium 3 or a 1 GHz Athlon. Just because something has a high clock speed doesn't mean it's fast.

When the constructs are similar, as they are in the unified SIMD Scalar|Vector processors, it surely applies. As I said concerning the answer that the difference lies in the number of units: it's not relevant whether the computational resources lie along the temporal or the spatial axis.
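To put a trivial worked example on the temporal-versus-spatial point (the unit counts and clocks below are placeholders, not claims about either design):

# Peak rate is units * clock either way: many slow units or a few fast ones.
wide_and_slow   = 48 * 0.5e9   # 48 units at 0.5 GHz -> 2.4e10 ops/s (spatial)
narrow_and_fast = 8 * 3.0e9    # 8 units at 3 GHz    -> 2.4e10 ops/s (temporal)
assert wide_and_slow == narrow_and_fast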
 
aaronspink said:
Vince said:
Dave, we're all past the "fixed" versus "flexible" mentality. Tell me a new tale; explain to me how.

Ok, sure, but I'm not quite sure you are going to understand.


We have, basically, analogous unified SIMD pathways.

This would be incorrect.

We have roughly similar complexes built around them.

Where is the texture filtering? ROBs? Where is the Z reject? The complexes have striking differences.

One is a highly tuned SOI processor that clocks almost 10X as high and offers full flexibility and the other doesn't.

A 10 GHz 386 still isn't going to outperform a 1 GHz Alpha or 1.3 GHz Power4 or 1 GHz Pentium 3 or a 1 GHz Athlon. Just because something has a high clock speed doesn't mean it's fast.

Aaron Spink
speaking for myself inc.

How about a 1.5 GHz 386 with 64MB L1 cache and 256MB L2 cache? (BTW, why did you use straight 1 GHz figures but 1.3 for the Power4? AFAIK the Alpha has the best MHz-for-MHz performance, then the Athlon, then the Power4, with the P3 last.)
 
Tuttle said:
aaronspink said:
Tuttle said:
Uh no.


With NVIDIA's only experience in console hardware being the horrendously overpriced and underperforming Xbox GPU, you can be certain that Sony has NVIDIA on a tight leash design-wise.

Hmm, well let's see. Overpriced? Underperforming? Compared to what? I'd wager no small amount of money that the production costs for the Nvidia GPU and associated memory are lower than for the Sony GS. Now, what MS pays may not be, but that is a different issue.

Performance would have to go to the Nvidia GPU.

The point being that you don't need experience in console hardware design to design graphics hardware, and experience in console hardware design does not equal experience in graphics hardware design.

The only reason that Nvidia is designing anything for PS3 is because Sony realized that its internal designs would not be sufficient. I would suspect that the leash that Sony has on Nvidia is made out of a wet noodle.


Aaron Spink
speaking for myself inc.

You don't actually believe that, do you? Do you actually think Sony's GS is more costly than the Xbox GPU???

After doing little more than giving MS a big fat expensive peecee video card to bolt onto the Xbox, NVIDIA will hopefully have been given quite an education in console GPU design by Sony over the past two years.
 
I don't know how you can fault Nvidia for the Xbox having a PC-like design with limited bandwidth. The console design for PS3 is done by Sony, just as Microsoft chose the console design for the Xbox. So if you have a problem with the Xbox as a whole, blame Microsoft.

In PS3 I expect specs like memory bandwidth to be dictated by Sony. As far as the graphics chip design is concerned I'd bet Sony suggested some features, but for the most part the design is all Nvidia. It seems Vince is thinking Sony will help design the ALUs to run at a faster clock rate. Maybe. I don't think anyone reading Beyond3D, that would talk, can answer Vince's question for sure.
 
3dcgi said:
I don't know how you can fault Nvidia for the Xbox having a PC-like design with limited bandwidth. The console design for PS3 is done by Sony, just as Microsoft chose the console design for the Xbox. So if you have a problem with the Xbox as a whole, blame Microsoft.

In PS3 I expect specs like memory bandwidth to be dictated by Sony. As far as the graphics chip design is concerned I'd bet Sony suggested some features, but for the most part the design is all Nvidia. It seems Vince is thinking Sony will help design the ALUs to run at a faster clock rate. Maybe. I don't think anyone reading Beyond3D, that would talk, can answer Vince's question for sure.

Coulda shoulda woulda. The fact remains: Nvidia has one entry on their console hardware resume, and it is an ugly one.

There is no chance of Sony just "suggesting features." None. The PS3 GPU is going to be a Sony design from top to bottom. End of story. The low-level details of that design will most likely be Nvidia's, though.
 
Tuttle said:
There is no chance of Sony just "suggesting features." None. The PS3 GPU is going to be a Sony design from top to bottom. End of story. The low-level details of that design will most likely be Nvidia's, though.
You're correct about the "suggesting features" quote. Of course they will dictate the features they'd like to see if Nvidia's design doesn't already support them. I just mistyped that. I don't see how you're so sure Sony is designing the GPU from top to bottom, though. I wouldn't believe that unless I heard it straight from Nvidia.
 
Tuttle said:
There is no chance of Sony just "suggesting features." None. The PS3 GPU is going to be a Sony design from top to bottom. End of story. The low-level details of that design will most likely be Nvidia's, though.
Ok, ok. We can just wait for public details to emerge after the presentation in February, and hopefully before the close of Q4 '04. I mean, we've already waited, what, two years for some half-obscured Cell details. We can wait another four months. :oops: Until then the scoop of this discussion will keep hitting a wall.
 
3dcgi said:
In PS3 I expect specs like memory bandwidth to be dictated by Sony. As far as the graphics chip design is concerned I'd bet Sony suggested some features, but for the most part the design is all Nvidia. It seems Vince is thinking Sony will help design the ALUs to run at a faster clock rate. Maybe. I don't think anyone reading Beyond3D, that would talk, can answer Vince's question for sure.

IMO there are a few scenarios that could happen:

1. Like Dave suggested, the PS3 GPU is NV's next-generation PC part, known as the NV50, which has been rumoured cancelled.

2. Like the first scenario, but Sony is helping NV get the clock speed up and the power down with their 90nm fabs, which will be quite mature by the time the PS3 GPU needs to go into production.

3. Sony contributes the Synergistic Processors, and NV contributes the rasterizer with their pixel shading technology.

4. Like the third, but NV modifies and restructures the synergistic processor to tune it for graphics work.

Just my two cents.
 
Fox5 said:
How about a 1.5ghz 386 with 64MB L1 cache and 256MB L2 cache?

When you say 386, do you mean the actual old-timer 386 chips?

Even with the impossibly huge caches, the 386 would lose, and lose badly. A simple register-to-register add takes 2 cycles on a 386 (memory-register adds took a minimum of 7 cycles). Rudimentary pipelining could shave off a cycle if no address offsets were needed for the next instruction, meaning that if the only thing you wanted to do was repeated register adds, 1.5 billion adds a second are possible.

On an Alpha EV6, even without caches that huge, something as simple as endless adding can be dispatched 4 times per clock cycle. So to match a 1 GHz Alpha on insanely pointless code, this 386 would have to run at 4 GHz while doing little more than nothing.
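Putting rough numbers on that (a sketch only; the 2-cycles-per-add and 4-wide-issue figures are the ones above, everything else is simplified away):

def adds_per_sec(clock_hz, adds_per_cycle):
    return clock_hz * adds_per_cycle

i386_ideal = adds_per_sec(1.5e9, 1.0)   # 1.5 GHz 386, best case: 1.5e9 adds/s
i386_usual = adds_per_sec(1.5e9, 0.5)   # 2 cycles per add:       7.5e8 adds/s
ev6        = adds_per_sec(1.0e9, 4.0)   # 1 GHz EV6, 4-wide:      4.0e9 adds/s
# The 386 would need roughly a 4 GHz clock just to tie the EV6 on this toy case.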

On any code that isn't trivially easy, the lack of pipelining would slam the single-issue 386 down to an execution rate of a fraction of an instruction per cycle. This is assuming the giant cache removes all memory considerations (it won't, especially not with the ISA exposed by a 386).

Far smaller caches already deliver something higher than 80% of that effectiveness, and with that much die area spent on cache, you could fit a half dozen Alphas to far better effect.

A modern processor, unless you are deliberately telling it to idle, will fail miserably at cutting its execution rate anywhere near that low, and it will have floating point capability to boot.

edit: not to say that superscalar processors are all that great at getting more than 1 instruction executed per clock, thanks to memory latency and other hazards. I think someone gave an average of 0.8 instructions per clock for a K8. If the old chips weren't orders of magnitude worse, modern chips would look much less impressive.
 
fox5 said:
and I've never seen a mass market drive or adapter to play the original games of 1 system on another.

Besides the SMS->GG (doesn't really count IMHO because they were effectively the same hardware) and the SMS->Gen adapters already mentioned...

Nintendo has released the Super Game Boy (GB->SNES) and the Game Boy Player (GB/GBC/GBA->GC).

Bleemcast was planned and sort of released (in single-game packs) for the DC (PSX->DC).

3DO had a hardware card that allowed you to play 3DO games on a PC.

Apple used to sell PC cards (effectively a PC on a card, to run PC apps on a Mac).

There were a few PSX emulators commercially available for PC and Mac (bleem!, CVGS).

Datel released a product known as the "Advance Game Port" for the GC that plugged into the memory card slot and played GBA games (no GB/GBC compatibility).

Speaking of Datel, and in the "how can this be legal" slot, the Action Replay MAX for PlayStation 2 offers a slew of features, including a "retro game emulator". Apparently (a friend has one) you can play Genesis games on it pretty well, and I've heard rumors that it supports SNES as well.
http://www.datel.co.uk/products.asp

Action Replays sell pretty well, too.
 
DaveBaumann said:
So why use that if you can get better performance out of units dedicated to the task it's required to do, for similar or smaller die sizes?

I had forgotten, but was reminded, that your claim of smaller die size in the context of the ALU is incorrect. Currently, both the NV40 and the R3xx-derived ALUs, and I believe the X2's as well, utilize separate Scalar|Vector constructs placed in parallel per ALU 'axis.' Gschwind, Hofstee and Altman's APU is based around a unified SIMD Vector|Scalar construct that appears to be more area-efficient and which doesn't have the drawbacks seen when using SIMD subword parallelism via partitioned Scalar constructs.

I believe your entire argument is suspect and untenable, which we'll see more clearly come ISSCC when the entire SPU complex is presented.
 
Just a question, Vince.

Do you believe that Toshiba's "rumored" GPU for PlayStation 3 was CELL-based in any way?

Just asking... ;).
 
Currently, both the NV40 and the R3xx-derived ALUs, and I believe the X2's as well, utilize separate Scalar|Vector constructs placed in parallel per ALU 'axis.'

Nope. Only on the VS currently, not for the PS. (PS on NV40 and R300 are vector, but capable of a co-issue split: up to Vec3 and scalar for R300, up to Vec3 and scalar or Vec2 + Vec2 on NV40.)
 
DaveBaumann said:
Currently, both the NV40 and the R3xx-derived ALUs, and I believe the X2's as well, utilize separate Scalar|Vector constructs placed in parallel per ALU 'axis.'

Nope. Only on the VS currently, not for the PS. (PS on NV40 and R300 are vector, but capable of a co-issue split: up to Vec3 and scalar for R300, up to Vec3 and scalar or Vec2 + Vec2 on NV40.)

Perhaps read the entire post first, Dave, and then respond to what was being talked about?
 
You talked about what you thought was the PS structure, Vince; I was correcting your incorrect ideas of that structure.
 
First of all, according to your own site, R3xx has parallel Vector and Scalar constructs, yes or no? And you totally missed my part about achieving subword parallelism and the drawbacks of partitioning, for some bizarre reason...
 
First of all, according to your own site, R3xx has parallel Vector and Scalar constructs, yes or no?

As your previous post said, the Vertex Shader has parallel vector and scalar ALU capabilities such that a "5D" operation can be achieved in a single cycle per unit; the Pixel Shaders, on the other hand, do not. They are "4D" units only (but have differing co-issue capabilities when up to 4 ops are required in two instructions).
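A toy sketch of what that co-issue split means in practice (the component widths follow the description above; the packing check itself is just an illustration, not how either chip actually schedules work):

R300_PS_SPLITS = {(3, 1)}           # Vec3 + scalar
NV40_PS_SPLITS = {(3, 1), (2, 2)}   # Vec3 + scalar, or Vec2 + Vec2

def can_coissue(width_a, width_b, allowed_splits):
    # Two independent ops share one 4-component issue slot only if their
    # widths match one of the allowed splits for that pixel shader.
    return tuple(sorted((width_a, width_b), reverse=True)) in allowed_splits

print(can_coissue(3, 1, R300_PS_SPLITS))  # True
print(can_coissue(2, 2, R300_PS_SPLITS))  # False on R300
print(can_coissue(2, 2, NV40_PS_SPLITS))  # True on NV40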
 
DaveBaumann said:
As your previous post said, the Vertex Shader has parallel vector and scalar ALU capabilities such that a "5D" operation can be achieved in a single cycle per unit; the Pixel Shaders, on the other hand, do not. They are "4D" units only (but have differing co-issue capabilities when up to 4 ops are required in two instructions).

I do believe it was you making the assumption (which I responded to, telling you to respond to what was written), and it was you who selectively quoted, leaving out the second half of the post.
 