PS3 vs X360: Apples to Apples high level comparison...

Ok, so speaking of apples to apples...
Are the CPU cores in the X360 (3) and the Cell (1) exactly identical?
If this has already been addressed on this forum, I missed it.
 
This also has its own thread and is named almost the exact same as the question you just asked... It should be in the top 5 or so forum posts.
 
Mordecaii said:
This also has its own thread and is named almost the exact same as the question you just asked... It should be in the top 5 or so forum posts.
Yeah, I see. But at the time I clicked to post here, that thread was much further down the page. Besides, they weren't really getting anywhere with their discussion. Nobody really knows at the present time, I guess. So I retract the question until further notice.
 
Jawed said:
Well you've just "proven" that caching and data compression don't work.

You can't normalise memory bandwidths like that.


Shifty Geezer said:
Why would that be? Surely a listed figure such as 25 GB/s is 25 GB/s for the amount of RAM it connects to. Neither part has provided bandwidth-per-pin or bandwidth-per-megabyte-of-RAM figures.

Vaan said:
...
So I can't find any method to measure or compare memory bandwidths between the two systems. Let's see in a couple of months when we have the full final specs.

Well, you guys hit the nail on the head: the memory architecture is the hardest thing to normalise, hence my attempt to get something remotely comparable that at least considers the RAM distribution and the external bandwidth. It's better than just plain 'adding dem up'...

The X360 is a hybrid UMA setup and the PS3 is a NUMA setup, so direct comparisons between memory architectures are difficult at the best of times. This was a high-level attempt; any serious memory architecture comparison would need to take local, on-die and global memory, cache setups etc. into account...
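Since "considering the RAM distribution and external bandwidth" is doing a lot of work in that sentence, here's a minimal sketch of what I mean, in Python. The pool sizes and bandwidth figures are the commonly quoted pre-launch numbers, so treat them as assumptions, as is the idea that a RAM-weighted average is a fair way to normalise.

Code:
# Rough sketch: "normalise" external memory bandwidth by how much RAM
# sits behind each bus. Figures are the commonly quoted pre-launch
# numbers and are assumptions, not confirmed final specs.

pools = {
    "X360 unified GDDR3": {"size_mb": 512, "bw_gbs": 22.4},
    "PS3 XDR (CPU)":      {"size_mb": 256, "bw_gbs": 25.6},
    "PS3 GDDR3 (GPU)":    {"size_mb": 256, "bw_gbs": 22.4},
}

def weighted_bw(names):
    """RAM-weighted average of external bandwidth across the given pools."""
    total_mb = sum(pools[n]["size_mb"] for n in names)
    return sum(pools[n]["bw_gbs"] * pools[n]["size_mb"] / total_mb for n in names)

print("X360:", weighted_bw(["X360 unified GDDR3"]), "GB/s over 512 MB")
print("PS3: ", weighted_bw(["PS3 XDR (CPU)", "PS3 GDDR3 (GPU)"]), "GB/s over 512 MB")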

Laa-Yosh said:
...
or is it like Nvidia's architecture where the pixel pipes' shader ALUs have to do it as well? And just how many ALUs are there per pixel pipe in the RSX?



http://www.beyond3d.com/forum/viewtopic.php?p=522139

Well, this is what we know so far...in addition,

Kutaragi said:
...
For example, RSX is not a variant of nVIDIA's PC chip. CELL and RSX have a close relationship, and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are only separated by their main usage; there is no real distinction between them.

This architecture was designed to kill wasteful data copying and recalculation between CELL and RSX. RSX can directly refer to a result simulated by CELL, and CELL can directly refer to the shape of an object RSX has shaded (note: CELL and RSX have independent bidirectional bandwidths, so there is no contention). That is impossible for a shared-memory design, no matter how beautiful the rendering or how complicated the shading it can do.
...

http://www.beyond3d.com/forum/viewtopic.php?p=528125#528125



blakjedi said:
...
The question is: the CPU read bandwidth is only half that of the northbridge... wouldn't it have been better to have the same bandwidth for reads and writes, as the GPU does?

The northbridge has high bandwidth throughput to the GPU because it's on-die, AFAIK...

Off-chip interconnects will naturally have lower bandwidths...

Qroach said:
So this isn't really an apples to apples comparison?

Make your own fruit! :p

As I mentioned earlier, it's the closest I can get to normalising them, unless you can think of something even closer with what we know so far...?


Johnny Awesome said:
Not at all, but it was a nice attempt. :)

Thanks! :)

Riddlewire said:
Ok, so speaking of apples to apples...
Are the CPU cores in the X360 (3) and the Cell (1) exactly identical?
If this has already been addressed on this forum, I missed it.

They're not exactly identical. However, the CELL PPE and the XeCPU cores are both Power-based, 12-FLOPs-per-cycle, 2-way SMT, in-order cores...
 
Xmas said:
Jaws said:
2) Dot products


-PS3

claimed PS3 ~ 51 billion dot products per second

Cell ~ 8 per cycle (7 SPU + VMX)

8*3.2GHz~ 25.6 billion dot products per second

RSX ~ 51-25.6 ~ 25.4* billion dot products per second

* deduced from claim

PS3 ~ 51 billion dot products per second
I think this is wrong, and despite panajev's explanations, I guess they counted half of the max ops per cycle for both, i.e. 12.8 billion for Cell and 37.4 billion for RSX. It's the same way NVidia balanced NV40, half the shader ops can be dot products.

Xenos is capable of 24 billion dot products per second. If you allocate 37.4 billion to RSX, that's a helluva increase considering they're both on 90nm, no? :oops:
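To lay the two readings of the 51 billion claim side by side, here's the arithmetic as a quick Python sketch; the per-cycle rates are the figures claimed in this thread, not confirmed specs.

Code:
# Two readings of the claimed ~51 billion dot products/s for PS3.
# All per-cycle rates come from this thread and are assumptions.

CELL_CLK = 3.2e9   # Hz

# Reading 1: Cell manages 8 dot products/cycle (7 SPUs + PPE VMX),
# and RSX is credited with whatever is left of the claim.
cell_dots = 8 * CELL_CLK          # 25.6 billion/s
rsx_dots  = 51e9 - cell_dots      # ~25.4 billion/s, deduced

# Reading 2 (Xmas): count half the max ops per cycle as dot products
# for both chips, i.e. ~12.8 billion/s for Cell and ~37.4 billion/s for
# RSX (figures as stated above, not re-derived here).
cell_dots_alt = cell_dots / 2
rsx_dots_alt  = 37.4e9

print(f"Reading 1: Cell {cell_dots/1e9:.1f}, RSX {rsx_dots/1e9:.1f} billion dots/s")
print(f"Reading 2: Cell {cell_dots_alt/1e9:.1f}, RSX {rsx_dots_alt/1e9:.1f} billion dots/s")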
 
Yeah, that is a big difference. Didn't the Nvidia guy say that dot products per second was one of the most important things in a GPU? Or was it shader ops/sec? In one of the interviews he was saying that something had 250 shades of light or something. I don't know, it was in the conference.
 
Jaws said:
They're not exactly identical. However, the CELL PPE and the XeCPU cores are both Power-based, 12-FLOPs-per-cycle, 2-way SMT, in-order cores...

Was there confirmation of this? Specifically the in-order bit?
 
Jaws said:
Xenos is capable of 24 billion dot products per second. If you allocate 37.4 billion to RSX, that's a helluva increase considering they're both on 90nm, no? :oops:

I had asked in the Xenos thread whether the quoted work output for Xenos includes the eDRAM logic or just the shader part...

"However, using Sony's claim, 7 dot products per cycle * 3.2 GHz = 22.4 billion dot products per second for the CPU. That leaves 51 - 22.4 = 28.6 billion dot products per second that are left over for the GPU. That leaves 28.6 billion dot products per second / 550 MHz = 52 GPU ALU ops per clock."
 
Sony had nothing to do with that quote, by the way... Just wanted to mention that before the cries of "OMG TEH EVVIL $ONY HYPE MACHINE!!!11!!1!" This guy is mostly talking about being able to download a person's brain into a computer so you wouldn't "truly" die, and how it'll be possible by 2050...
 
I wonder what metric they were using. My pocket calculator is 1000 times more powerful than my brain when it comes to solving differential equations in a given time span.
 
http://www.extremetech.com/article2/0,1558,1818127,00.asp

The 48 ALUs are divided into three SIMD groups of 16. When it reaches the final shader pipe, each of the 16 ALUs has the ability to write out two samples to the 10MB of EDRAM. Thus, the chip is capable of writing out a maximum of 32 samples per clock. At 500MHz, that means a peak fill rate of 16 gigasamples. Each of the ALUs can perform 5 floating-point shader operations. Thus, the peak computational power of the shader units is 240 floating-point shader ops per cycle, or 120 billion shader ops per second at 500MHz

The 10MB of EDRAM is actually on a separate die, at least initially. As future process technologies become available, it is possible that it could be on the same piece of silicon as the GPU. Still, the EDRAM resides on the same package, and has a wide bus running at 2GHz to deliver 256GB/sec of bandwidth. That's a true 256GB/sec, not one of those fuzzy counting methods where the 256GB is "effective" bandwidth that accounts for all kinds of compression. The GPU writes the back buffer, Z buffer, and stencil buffer to the EDRAM. When the frame is finally ready to be drawn to the screen, the EDRAM transfers the back buffer to the 512MB of GDDR3 for scan-out. The EDRAM does not store any textures; the full 10MB gets pretty much filled up with 1280x720 HD resolution, including Z, stencil, and anti-aliasing sub-pixel samples.

There's even a little magic that happens at that phase. The EDRAM has built-in logic to perform Z compare, alpha blending, and resolving of anti-aliasing samples into pixels. Normally those operations happen on the GPU, and not only require valuable silicon real estate and on-chip caches, but also eat into memory bandwidth as data goes back and forth between the GPU and the main graphics RAM. ATI's solution of building that logic into the EDRAM, where the back, Z, and stencil buffers live, eliminates a lot of data transfer and saves time and silicon space on the GPU die itself. Because of the bandwidth savings and the absolutely massive bandwidth to EDRAM, the Xbox 360 should be able to perform frame buffer effects like motion blur, depth of field, or lens flare with incredible speed.

8)
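If you want to sanity-check the article's peak numbers against its own building blocks, the arithmetic is straightforward. The clocks and unit counts below are the article's figures; the bus-width line is an inference from those numbers, not a stated spec.

Code:
# Re-deriving the peak figures in the ExtremeTech quote from its own
# building blocks (all inputs are the article's numbers, not new data).

CLK = 500e6                  # Xenos core clock, Hz

alus            = 48
ops_per_alu     = 5          # floating-point shader ops per ALU per cycle
samples_per_clk = 16 * 2     # final shader pipe: 16 ALUs x 2 samples each

fillrate   = samples_per_clk * CLK      # 16 gigasamples/s
shader_ops = alus * ops_per_alu * CLK   # 120 billion shader ops/s

# 256 GB/s at a 2 GHz bus clock would imply a path 128 bytes (1024 bits)
# wide -- an inference from the article's figures, not a stated spec.
edram_bus_bytes = 256e9 / 2e9

print(f"{fillrate/1e9:.0f} Gsamples/s, {shader_ops/1e9:.0f} G shader ops/s, "
      f"{edram_bus_bytes:.0f}-byte EDRAM bus")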
 
TexT said:
Sony is at it again...

As an example of the advances being made, Pearson noted that Sony's new PlayStation 3 computer games console is 35 times as powerful as the model it replaced, and in terms of processing is "one percent as powerful as a human brain".

http://uk.news.yahoo.com/050522/323/fjiiv.html

Well if each generation is even just 20 times as powerful as the last one, it will only take about 3 or 4 generations for consoles to become as powerful as our brains, according to the guy... That's about 20-25 years...
And only for consoles.

Before we get there, Blue Gene will have taken over the world 5 times and a half and we'll all be slaves of the machines.
 
If there really are 256 GB/s between the GPU and the eDRAM, why did they under-design their ROPs such that the fill rate halves when rendering to 64-bit render targets?
I don't believe the 256 GB/s number is the real bandwidth between the GPU core and the eDRAM.
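One way to see the tension is a rough byte-budget check. The per-sample traffic mix below (colour read + write plus Z read + write) is purely an assumption for illustration, so take it as a sketch of the question rather than an answer.

Code:
# Rough byte-budget check on the 256 GB/s figure. The per-sample traffic
# mix is an assumption for illustration; whether bandwidth is really the
# limiter is exactly the open question.

BW     = 256e9       # claimed ROP <-> EDRAM bandwidth, bytes/s
CLK    = 500e6       # core clock, Hz
budget = BW / CLK    # 512 bytes available per clock

per_sample_32bit = 4 + 4 + 4 + 4   # colour r/w + Z r/w, 32-bit colour
per_sample_64bit = 8 + 8 + 4 + 4   # same mix with a 64-bit render target

print(budget / per_sample_32bit)   # 32 samples/clock fits exactly
print(budget / per_sample_64bit)   # ~21 samples/clock, so 32 no longer fits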
 
I don't believe there's 256GB/s between GPU and EDRAM, either.

I do believe there's 256GB/s between the ROPs and the back-buffer.

Jawed
 
Jawed said:
I don't believe there's 256GB/s between GPU and EDRAM, either.

I do believe there's 256GB/s between the ROPs and the back-buffer.

Jawed

Isn't that already sort of agreed upon? And since the ROPs and eDRAM are on the same die, can you really call it bandwidth at all?

Or am I missing something? :oops:
 
For a while people were saying that the 256GB/s figure inside the EDRAM was fictional...

It was originally described as "effective".

It would appear it's real, not effective.

Though we're still waiting to get hard facts, so I'll continue to "believe" rather than treat it as a hard fact (admittedly, hard to do).

Jawed
 