Second Gen Cell info

one said:
How about the overhead in thread context switches on Pentium4? Hyperthreading limits the effective size of a cache, hence less mem bandwidth...

Who says you'd need to context switch? You'd just batch process the current frame for all the streams on one thread. You have a whole 1/60 of a second before you need to display any of them, so as long as you can finish processing them all before then, you're fine.
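A rough sketch of that budget argument, in Python. The per-stream decode time is a made-up figure purely for illustration and the function names are hypothetical; nothing here is measured on real hardware.

```python
# Single-thread batch decode: one frame from every stream must be finished
# before the display deadline. No context switches are modelled.
FRAME_BUDGET_S = 1.0 / 60.0  # "a whole 1/60 of a second" from the post


def fits_in_frame(per_stream_decode_s, n_streams, budget_s=FRAME_BUDGET_S):
    """True if decoding one frame of each stream back to back meets the deadline."""
    return per_stream_decode_s * n_streams <= budget_s


def max_streams(per_stream_decode_s, budget_s=FRAME_BUDGET_S):
    """Largest number of streams whose frames fit in one display interval."""
    return int(budget_s // per_stream_decode_s)


if __name__ == "__main__":
    hypothetical_decode_s = 0.002  # assumption: 2 ms per frame per stream
    print(max_streams(hypothetical_decode_s))
    print(fits_in_frame(hypothetical_decode_s, 9))
```

With that assumed 2 ms cost, eight streams fit in the 1/60 s window and a ninth does not.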
 
Blazkowicz_ said:
The similarity is you still see 25 (it was maybe 20, or 30, whatever) moving images at 25fps (no 30fps NTSC junk :devilish: ), and I was replying about Tacitblue's idea, which is good but has been implemented more simply. Could you hear anything if you heard all that crap at the same time? It's already painful when Marty junior watches six channels at the same time in 2015 :LOL:
OK, a TV guide seems a bit of a verbose example; it's useful when Marty junior previews his personal pr0n vid library in 2020 :LOL:
 
aaaaa00 said:
one said:
How about the overhead in thread context switches on Pentium4? Hyperthreading limits the effective size of a cache, hence less mem bandwidth...

Who says you'd need to context switch? You'd just batch process the current frame for all the streams on one thread. You have a whole 1/60 of a second before you need to display any of them, so as long as you can finish processing them all before then, you're fine.
What if those 6 SPEs employed in the demo were not at 100% load? They might be 6 just to form a pipeline.
 
ERP said:
Not by my math, according to the slide those were SD streams not HD
Oops you're right, it was late last night when I posted and I read DTV there instead of SDTV :p Anyway still, as noted below, shouldn't that be 720x480/576? (That's what DVD movies are encoded at anyway - heck even our PS2 FMVs are 720xsomething.)

VNZ said:
It'd be more interesting to hear about the bitrate, though.
Well it depends on their implementation - but that may not make much difference for performance.

On PS2, the IPU is rated at 768 cycles per macroblock for decoding, and real-world measured performance precisely matches that (~95MPix/sec) regardless of what bitrate I tried (although that may also be because it's a fixed-hw solution).
I've tested I-Frames encoded as large as 200KB for 640x480, and as low as 20KB (which'd make a 10x bitrate difference in an I-Frame only stream) for the same dimensions, and I'd get the same speed in both cases.

Admittedly, I've only run speed tests for my area of interest - I-Frame only streams, as I needed explicit frames for in-game texture compression - but those kinds of streams are the worst case scenario for bitrate anyhow :p
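For anyone wanting to check the 768-cycles figure against the ~95MPix/sec number, here is a quick back-of-envelope in Python. The clock value is an assumption (the EE core clock of 294.912 MHz, which the post does not state), and the function names are made up for the sketch.

```python
CYCLES_PER_MACROBLOCK = 768       # IPU rating quoted in the post
PIXELS_PER_MACROBLOCK = 16 * 16   # an MPEG-2 macroblock covers 16x16 pixels
ASSUMED_CLOCK_HZ = 294_912_000    # assumption: rating is counted in EE core cycles


def ipu_pixels_per_second(clock_hz=ASSUMED_CLOCK_HZ):
    """Peak decode throughput implied by a fixed cycles-per-macroblock cost."""
    macroblocks_per_second = clock_hz / CYCLES_PER_MACROBLOCK
    return macroblocks_per_second * PIXELS_PER_MACROBLOCK


def frames_per_second(width, height, clock_hz=ASSUMED_CLOCK_HZ):
    """Frames per second at that throughput for a given frame size."""
    return ipu_pixels_per_second(clock_hz) / (width * height)


if __name__ == "__main__":
    print(ipu_pixels_per_second() / 1e6)   # MPix/s
    print(frames_per_second(640, 480))     # 640x480 frames per second
```

Under that clock assumption the rating works out to roughly 98 MPix/s, in the same ballpark as the measured ~95 MPix/s, and since the cost is per macroblock it doesn't depend on bitrate at all.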
 
You're right about the resolution of course.

(~95MPix/sec)

Shock Horror ----- PS3 only 5x faster than PS2 ------

Sorry couldn't resist.
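The arithmetic behind the 5x quip, sketched in Python. It assumes 48 streams at 720x480 and 30 frames per second (the frame rate is my assumption, the thread only gives the resolution); the ~95 MPix/s PS2 figure is the one quoted above.

```python
PS2_IPU_PIXELS_PER_S = 95e6  # ~95 MPix/s, as quoted in the thread


def aggregate_pixel_rate(streams, width, height, fps):
    """Total decoded pixels per second across all streams."""
    return streams * width * height * fps


if __name__ == "__main__":
    ntsc = aggregate_pixel_rate(48, 720, 480, 30)  # assumed NTSC-style SD
    pal = aggregate_pixel_rate(48, 720, 576, 25)   # assumed PAL-style SD
    print(ntsc, pal)
    print(ntsc / PS2_IPU_PIXELS_PER_S)
```

Both SD variants come to the same pixel rate, a bit over five times the PS2 number.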

I'd have guessed the real problem with decoding 48 MPEG-2 streams would be reading the source fast enough, unless they were interleaved in some fashion.
 
ERP beat me to it. I was going to comment that the most interesting thing about the demo was that the system can sustain that kind of data input - provided as ERP pointed out - the data is not interleaved. I think we can safely assume it is not - else this demo is useless.

The latest top-end CPUs *may* have the raw performance to do something similar, but I *suspect* they may choke long before they reach that number...

After all, it has been pointed out again and again that for CPUs today, it's not so much how fast they can run, but rather how quickly they can be fed.
 
Even at standard DVD resolution and quality that's still only 48MB/s. Are you saying a HDD can't sustain 48MB/s? Not only that, you can also buffer the streams into RAM. I would think a couple of Gigs would be enough.
 
It's more sustaining 48 distinct streams that's difficult.

Seek times on DVD are hideous, better on HD obviously, but it's still not simple. Just seeking between the streams eats up the bandwidth. Using a large enough RAM buffer would let you do it, but I suspect that would be a lot of RAM for the 48 buffers.

That's why I qualified it with "unless they are interleaved in some way". At that point you require no seeks and it's all down to bandwidth.

If I were a betting man I'd guess that the streams were in fact interleaved, but that's because the point of the demo seems to be the decoding speed, not the streaming capability.
 
ERP said:
Using a large enough RAM buffer would let you do it, but I suspect that would be a lot of RAM for the 48 buffers.

Not so sure I'd agree. At DVD bitrate, we're talking 1 MB/s per stream.
48 streams = 48 MB/s.

So a GB of RAM is enough to buffer 21 seconds. (I don't think you'll need anywhere near that much buffer though.)

If you read the streams in 1MB chunks from 2 hard disks, you should be able to meet the I/O throughput requirement pretty easily, even with the seeks -- 9ms seek time means you can do about 100 seeks a second... so with 48 streams/1MB chunks you'd spend half your time seeking, and the other half of the time you can be reading data.
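Here are those numbers as a small Python sketch. All inputs are the figures from the post (1 MB/s per stream, 48 streams, 1 MB chunks, 9 ms per seek); the function names are hypothetical and the disk model is deliberately crude (one seek per chunk, nothing else).

```python
def buffer_seconds(ram_mb, streams, mb_per_s_per_stream):
    """How long a RAM buffer of ram_mb lasts when feeding all streams."""
    return ram_mb / (streams * mb_per_s_per_stream)


def seek_fraction(streams, mb_per_s_per_stream, chunk_mb, seek_s, disks=1):
    """Fraction of each second a disk spends seeking, one seek per chunk."""
    seeks_per_s_per_disk = streams * mb_per_s_per_stream / chunk_mb / disks
    return seeks_per_s_per_disk * seek_s


def required_read_rate(streams, mb_per_s_per_stream, chunk_mb, seek_s, disks=1):
    """Sequential MB/s each disk must sustain in the time left after seeking."""
    busy = seek_fraction(streams, mb_per_s_per_stream, chunk_mb, seek_s, disks)
    return (streams * mb_per_s_per_stream / disks) / (1.0 - busy)


if __name__ == "__main__":
    print(buffer_seconds(1024, 48, 1.0))              # seconds per GB of RAM
    print(seek_fraction(48, 1.0, 1.0, 0.009))         # one disk
    print(seek_fraction(48, 1.0, 1.0, 0.009, 2))      # split over two disks
    print(required_read_rate(48, 1.0, 1.0, 0.009, 2))
```

On one disk the 48 seeks eat about 43% of every second, which is the "half your time seeking" case; split across two disks each one seeks for roughly 22% of the time and needs to read at around 31 MB/s for the rest.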

Not that this level of HW is practical to ship in a cheap consumer device, but for the purposes of the demo, it seems pretty straightforward to accomplish -- we haven't even gotten into anything semi-exotic (like FibreChannel or RAID) in terms of I/O hardware.
 
Jaws said:
Whether the NV5x|G7x custom GPU for PS3 is on a UMA or NUMA setup (with or without eDRAM), FlexIO will still provide ballpark bandwidth figures for 'data flow' to/from the R500 / Xenon UMA setup and the off-chip eDRAM module (16+32 read/write GB/s from 'leak').

So FlexIO is not really 'overkill' IMHO.

That's 10 GB/s more each way, compared to what you expected Xenon to be. And much more compared to what NV expected for PC, which they designed their next gen GPU around and only adapted to work with Cell.
 
V3 said:
Jaws said:
Whether the NV5x|G7x custom GPU for PS3 is on a UMA or NUMA setup (with or without eDRAM), FlexIO will still provide ballpark bandwidth figures for 'data flow' to/from the R500 / Xenon UMA setup and the off-chip eDRAM module (16+32 read/write GB/s from 'leak').

So FlexIO is not really 'overkill' IMHO.

That's 10 GB/s more each way, compared to what you expected Xenon to be.

Well that's nearly a 50% increase in bandwidth for a product supposedly shipping ~ 6 months later in 2006. Bandwidth growth has always been slower than other performance metrics. But that % increase doesn't sound too unreasonable IMHO.

V3 said:
And much more compared to what NV expected for PC, which they designed their next gen GPU around and only adapted to work with Cell.

The GPU has nothing to do with PC limitations like PCI-e. It only shares technology from NV's next gen GPU, not the bandwidth limitations of PCs.

After all, what consoles lack in total RAM, they always make up for with bandwidth...so in 2006, I still expect that to be true...

Also don't assume work started on the GPU 'only' last December 2004...
 
BTW the presentation by Toshiba is
A CELL Software Platform for Digital Media Application
Seiji Maeda, Shigehiro Asano, Tomofumi Shimada, Koichi Awazu, and Haruyuki Tago (Toshiba)
so it's more about the demonstration of Toshiba's software platform for Cell.
 
one said:
BTW the presentation by Toshiba is
A CELL Software Platform for Digital Media Application
Seiji Maeda, Shigehiro Asano, Tomofumi Shimada, Koichi Awazu, and Haruyuki Tago (Toshiba)
so it's more about the demonstration of Toshiba's software platform for Cell.

Any word on if there were other examples given, or was this the only demo?

But yeah, sounds more like it was demoing the software platform than Cell's raw power..
 
PC-Engine said:
How many streams would a hyperthreading 3.5GHz dual core P4 be able to decode?

Cross platform comparisons like that are more or less meaningless, wouldn't you say? The overheads of a PC operating system environment, aka software interrupts from background processes or other programs, mean you'll never really get a fair or decent comparison. Hell, if you ran SuperPi on the Cell it would undoubtedly monster anything out there, but again, apples and oranges. If you tried it though, you'd have to run tests with both software and hardware decoding.

I think it's easy to see the kind of power that a streaming-optimised processor like Cell has; bottom line, that's its strength, it was designed for that. A Pentium D with hyperthreading (isn't that the EE?) is not only prohibitively expensive (the processor by itself costs more than a couple of PS3s are expected to cost), it also carries more burden from the bloatware it runs to keep us happy at work......or frustrated, depends on the time of day and what your deadlines are. :LOL:
 
Tacitblue said:
PC-Engine said:
How many streams would a hyperthreading 3.5GHz dual core P4 be able to decode?

Cross platform comparisons like that are more or less meaningless, wouldn't you say? The overheads of a PC operating system environment, aka software interrupts from background processes or other programs, mean you'll never really get a fair or decent comparison. Hell, if you ran SuperPi on the Cell it would undoubtedly monster anything out there, but again, apples and oranges. If you tried it though, you'd have to run tests with both software and hardware decoding.

I think it's easy to see the kind of power that a streaming-optimised processor like Cell has; bottom line, that's its strength, it was designed for that. A Pentium D with hyperthreading (isn't that the EE?) is not only prohibitively expensive (the processor by itself costs more than a couple of PS3s are expected to cost), it also carries more burden from the bloatware it runs to keep us happy at work......or frustrated, depends on the time of day and what your deadlines are. :LOL:

The EE has 2 MB cache on the chip. Standard P4s (above a certain revision, dunno which one) have hyperthreading as well.
 
a688 said:
The EE has 2 MB cache on the chip. Standard P4s (above a certain revision, dunno which one) have hyperthreading as well.

Yeah but the Pentium D's (dual-core) do not have hyperthreading. Only the Extreme Edition dual core chip has hyperthreading in the 'D' arena.
 
xbdestroya said:
a688 said:
The EE has 2 MB cache on the chip. Standard P4s (above a certain revision, dunno which one) have hyperthreading as well.

Yeah but the Pentium D's (dual-core) do not have hyperthreading. Only the Extreme Edition dual core chip has hyperthreading in the 'D' arena.


Oh so when I go to the shop to buy one I have to tell them "Can I please have a Pentium D dual core Extreme Edition dual core with hyperthreading please? Dual core. Extreme. Please.".

Talk about a mouthful.
 
[Image: ppe.JPG]

2 multiply-add VMX units, minimum 4 instructions/cycle with 2 threads
 