What's the Xenon<>Xenos bandwidth?

Shifty Geezer · Jul 14, 2005

I thought MEMEXPORT worked across the XB360's RAM, but talk here, on that and other points, suggests the GPU has direct access to Xenon's L2 cache.

If so, this is a strong mirror of PS3. Do we have any details on this connection and its transfer rates? Anand talked of a modified DMA call, which to an uneducated me suggests going through the main RAM buffer but avoiding latency and consuming RAM.

Titanio · Jul 14, 2005

From Dave's article:

That's basically how bandwidth looks. 21.6GB/s aggregate.

Tap In · Jul 14, 2005

Not sure if this answers your question or not, but...

The Xenon's L2 is arranged as an n-way set associative cache (the value of n hasn't been publicly disclosed as of yet). A programmer can place a thread in write streaming mode, which means that the system will wire down (to use traditional virtual memory terminology) or lock (to use Microsoft's terminology) a set of cache lines and attach that set directly and exclusively to a particular thread. This set is then initialized as a FIFO queue, so that it can act as a write buffer to store the output of a data generation thread. The data generation thread feeds its vertex data output directly into this FIFO queue (bypassing the L1 cache entirely), and the GPU reads that vertex data directly from this queue using a modified DMA protocol. Thus the write buffer logically couples a single data generation thread to the GPU by acting as a conduit for vertex data.

http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars/5
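As a rough illustration of the write-buffer idea in the quoted passage, here is a minimal Python sketch of a fixed set of "locked" lines used as a FIFO between one producer and one consumer. Everything in it is an assumption made for illustration: the class name `WriteStreamFifo`, the `stream` helper, and the number of lines are hypothetical, and it models only the queueing behaviour, not the real cache hardware or the modified DMA protocol.

```python
from collections import deque


class WriteStreamFifo:
    """Toy model of a fixed set of 'locked' cache lines used as a FIFO."""

    def __init__(self, num_lines):
        self.num_lines = num_lines  # hypothetical count of locked lines
        self.lines = deque()        # lines written but not yet read

    def try_write(self, vertex_block):
        # Producer (the data generation thread): fails when every line is full.
        if len(self.lines) == self.num_lines:
            return False
        self.lines.append(vertex_block)
        return True

    def try_read(self):
        # Consumer (standing in for the GPU): takes the oldest line,
        # which frees that line for the producer to reuse.
        if not self.lines:
            return None
        return self.lines.popleft()


def stream(blocks, num_lines):
    """Push every block through the FIFO, draining whenever the producer stalls."""
    fifo = WriteStreamFifo(num_lines)
    received = []
    for block in blocks:
        while not fifo.try_write(block):
            received.append(fifo.try_read())
    while (item := fifo.try_read()) is not None:
        received.append(item)
    return received
```

The point of the sketch is that the data never needs a buffer larger than the locked lines: the producer simply stalls until the consumer has freed a line, and order is preserved.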

Shifty Geezer · Jul 14, 2005

Yes, TapIn, that's the Anand part I read about DMA requests. But it seems like it all goes over the main RAM bandwidth. Which isn't really something to moan about, because otherwise the two platforms would be even MORE similar than they currently are. We need some differences to argue about which is better!

Acert93 · Jul 14, 2005

Ars Technica's article mentions streaming vertex data directly from the CPU to the GPU via locking some of the L2 cache and suggests this as a way to save memory bandwidth and memory space.

I am not sure how this works. It would be interesting to know exactly how all the buses in the system work and if direct CPU<>GPU traffic limits memory bandwidth in any way.

Obviously I do not know

Squeak · Jul 15, 2005

Now that we are on the subject of the 360's bandwidth: in Dave's diagram it clearly says 32GB/s R/W between the Xenos main and daughter die.

Now some people, around the time the article was published, claimed that it really was 32GB/s write and 16GB/s read.

Can anyone confirm or refute that?

blakjedi · Jul 15, 2005

Did we ever settle on a total bandwidth value for x360?

The total I get from the diagram is:

21.6 GB/s CPU <> GPU
22.4 GB/s Northbridge <> Memory
32 GB/s Parent <> Daughter

76 GB/s Total

However, here's where the math gets fuzzy.

Squeak, there IS supposed to be an asymmetric read/write value between the parent and daughter: 32 GB/s write from Parent to Daughter and a 16 GB/s read value from Daughter to Parent, increasing the total to 92 GB/s, but I don't know where that went.

Did we ever resolve the 256 GB/s issue and whether or not it was valuable for inclusion in the system bandwidth totals?

Titanio · Jul 15, 2005

blakjedi said:
Did we ever settle on a total bandwidth value for x360?

The total I get from the diagram is:

21.6 GB/s CPU <> GPU
22.4 GB/s Northbridge <> Memory
32 GB/s Parent <> Daughter

76 GB/s Total

However, here's where the math gets fuzzy.

Squeak, there IS supposed to be an asymmetric read/write value between the parent and daughter: 32 GB/s write from Parent to Daughter and a 16 GB/s read value from Daughter to Parent, increasing the total to 92 GB/s, but I don't know where that went.

Did we ever resolve the 256 GB/s issue and whether or not it was valuable for inclusion in the system bandwidth totals?

You seem to want to count total bandwidth everywhere in the system? You're mixing chip-to-chip bandwidth with memory bandwidth, which suggests as much, and in that case you're missing lots of figures you could include (like bandwidth to cache, etc.).

IMO the bandwidth counts should be separated: chip-to-chip, system memory, on-chip memory, etc.

blakjedi · Jul 15, 2005

Titanio said:
You seem to want to count total bandwidth everywhere in the system? You're mixing chip-to-chip bandwidth with memory bandwidth, which suggests as much, and in that case you're missing lots of figures you could include (like bandwidth to cache, etc.).

IMO the bandwidth counts should be separated: chip-to-chip, system memory, on-chip memory, etc.

No... I remember this same discussion with Shifty. For example, most people count the 32 GB/s (or 32 GB/s up, 16 GB/s down) of bandwidth between parent and daughter because it is a chip-to-chip transfer where work is done that would normally use system memory bandwidth. The question was always how the 256 GB/s fits in, or whether it does. Shifty said, well, let's count the SPE bandwidth between logic and caches too... which I don't think is necessarily the right approach.

j^aws · Jul 15, 2005

Squeak said:
Now that we are on the subject of the 360's bandwidth: in Dave's diagram it clearly says 32GB/s R/W between the Xenos main and daughter die.

Now some people, around the time the article was published, claimed that it really was 32GB/s write and 16GB/s read.

Can anyone confirm or refute that?

AFAICS, that 32 GB/sec is 'net' bandwidth. Otherwise you'd see bi-directional separate 'arrows' in the diagram. So you can't 'write' 32 GB/sec AND 'read' 16 GB/sec in a single cycle.

Shifty Geezer · Jul 15, 2005

After lots of careful consideration, at the end of the day aggregate bandwidths of any form are a totally useless metric - they mean absolutely nothing.

eg. I've two pipelines shipping oil. One has 16 junctions with each passing 1 million gallons an hour. The other has 4 junctions with each passing 2 million gallons an hour. So one has a total bandwidth of 16 million G/h, the other a total of 8 million. But of course the throughput of the latter is 2x the former.
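The pipeline analogy can be put into a few lines of Python. This is only a sketch under one stated assumption: that the junctions sit in series, so the end-to-end throughput is set by the slowest junction. The function names are made up for the illustration.

```python
def aggregate_bandwidth(junctions):
    """Sum of every junction's rate: the 'marketing total'."""
    return sum(junctions)


def end_to_end_throughput(junctions):
    """With junctions in series (an assumption), the slowest one sets the rate."""
    return min(junctions)


# Rates in millions of gallons per hour, taken from the analogy above.
pipeline_a = [1.0] * 16  # 16 junctions at 1 million gallons/hour each
pipeline_b = [2.0] * 4   # 4 junctions at 2 million gallons/hour each

for name, pipe in (("A", pipeline_a), ("B", pipeline_b)):
    print(name, aggregate_bandwidth(pipe), end_to_end_throughput(pipe))
```

Pipeline A wins on the summed figure (16 against 8) while pipeline B moves twice as much oil per hour (2 against 1), which is the whole objection to aggregates.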

Each area of bandwidth corresponds to one or more activities of the program, and its relevance is only in that respect. E.g. Cell has umpteen trigabytes of internal BW, but it's only of use to SPE code execution, so can be discounted for rendering processes. Cell has 35 GB/s to RSX but that doesn't contribute anything to reading program data or working on AI. 256 GB/s for frame buffer work isn't gonna help any with physics.

Looking at aggregates really tells us nothing. PS3 has, what, 22.5+25+35 (Cell<>RSX)+5 (IO) = 87.5 GB/s. XB360 has a count of 22.5+32/48 (+256 even) for maybe 326.5 GB/s going by some people's counting. Giving 326.5 vs 87.5 as comparisons is totally ridiculous when, in the case of XB360, 256 GB/s of that is very specialist.
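For anyone wanting to check the sums, here is a small Python sketch using only the figures quoted in this post. The dictionary keys are illustrative labels chosen for the example (an assumption, not official bus names), and the 48 is the "32 write + 16 read" way of counting the parent/daughter link.

```python
# All figures in GB/s, copied from the post above. Keys are hypothetical labels.
ps3 = {
    "ram_pool_22_5": 22.5,
    "ram_pool_25": 25.0,
    "cell_rsx": 35.0,
    "io": 5.0,
}
xb360 = {
    "main_ram": 22.5,
    "parent_daughter": 48.0,   # 32 write + 16 read, by some people's counting
    "edram_internal": 256.0,   # the specialist frame buffer figure
}


def aggregate(buses, exclude=()):
    """Add up every listed figure, optionally leaving some out."""
    return sum(rate for name, rate in buses.items() if name not in exclude)


print(aggregate(ps3))                                  # 87.5
print(aggregate(xb360))                                # 326.5
print(aggregate(xb360, exclude=("edram_internal",)))   # 70.5
```

Dropping the single specialist figure takes the XB360 count from 326.5 to 70.5, which shows how much one number can distort a headline total.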

Perhaps BW could be counted as how much available for a purpose? You could create a table of things the consoles need to do and how much BW that functionality has. Something starting like (For PS3)

Game Code : 25 GB/s (let's keep things simple)
Textures : 22.5 GB/s
AI : 25 GB/s
CPU<>GPU communication : 35 GB/s
BackBuffer Processing : 22.5 GB/s

But even that's no use, as it doesn't show where savings are made. E.g. two platforms could both have 30 GB/s CPU to RAM, but if one shares that BW with the GPU to send it data, and the other communicates with the GPU over a separate bus, it frees up that BW so more is in reality available.
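A tiny Python sketch of that last point, under loud assumptions: the 8 GB/s of CPU-to-GPU traffic is an invented number for illustration, the function name is hypothetical, and real buses do not subtract this cleanly.

```python
def cpu_ram_bandwidth_left(cpu_ram_bw, cpu_to_gpu_traffic, shared_bus):
    """Bandwidth (GB/s) left for the CPU's own RAM access.

    If CPU-to-GPU traffic travels over the same CPU<>RAM link it eats into
    that link; with a separate bus the link stays fully available.
    """
    if shared_bus:
        return max(cpu_ram_bw - cpu_to_gpu_traffic, 0.0)
    return cpu_ram_bw


# Both platforms quote 30 GB/s CPU to RAM; 8 GB/s of GPU feed is hypothetical.
print(cpu_ram_bandwidth_left(30.0, 8.0, shared_bus=True))   # 22.0
print(cpu_ram_bandwidth_left(30.0, 8.0, shared_bus=False))  # 30.0
```

Both platforms would list the same 30 GB/s on a spec sheet, yet in this toy model one of them has noticeably less left for everything else.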

At the end of the day, truth be told, when all's said and done, the BW figures cannot be considered outside of the whole system. In essence, looking at these platforms piecemeal is giving false impressions. Each whole system design, with its whole approach to solving problems, must be considered as a whole, complete body. Then perhaps system-wide comparisons can be made like 'in rendering a frame buffer, this system has the advantage' and 'procedurally generating geometry, this one seems like it'll be on top'. Facts and figures from dissected hardware should only be used to understand the systems, not compare them.

j^aws · Jul 15, 2005

Shifty Geezer said:
After lots of careful consideration, at the end of the day aggregate bandwidths of any form are a totally useless metric - they mean absolutely nothing.
...

I disagree. I hold a Masters in Engineering in a field from which computer science borrowed a lot of methodologies. Bandwidth is just a flow of data. There are very well known methodologies that model flows of various physical properties. This is really no different. It's just a matter of understanding the methodologies!
