randycat99
Veteran
We aren't any closer to an answer of "orders of magnitude" or not (unless you stand behind your numbers claim), so I guess you lose on both counts.
randycat99 said: We aren't any closer to an answer of "orders of magnitude" or not (unless you stand behind your numbers claim), so I guess you lose on both counts.
JF_Aidan_Pryde said: Why 640 x 480 x 2? What's the x2?
nAo said: Decoded streams must be written (+1) and read (+1), that's why we have a 2x factor.
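The write-plus-read factor can be put into a quick back-of-the-envelope calculation. This is only a sketch: the function name is hypothetical, and the bytes per pixel and frame rate are my assumptions, not figures given in the thread. With 2 bytes per pixel and 25 fps the total happens to land near the 1.47GB/s quoted in this discussion, but the thread does not confirm those inputs.

```python
def stream_bandwidth_bytes_per_s(width, height, bytes_per_pixel, fps, streams, passes=2):
    """Memory traffic for decoded video.

    passes=2 models the factor from the thread: every decoded frame
    is written to memory once (+1) and read back once (+1).
    """
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * streams * passes


# ASSUMED inputs (not stated in the thread): 2 bytes/pixel, 25 fps.
# 640x480 and 48 streams come from the discussion.
total = stream_bandwidth_bytes_per_s(640, 480, 2, 25, 48)
read_only = stream_bandwidth_bytes_per_s(640, 480, 2, 25, 48, passes=1)

print(f"write + read: {total / 1e9:.2f} GB/s")
print(f"read only:    {read_only / 1e9:.2f} GB/s")
```

Changing `bytes_per_pixel` or `fps` scales the result linearly, so the 2x factor matters regardless of which pixel format the demo really used.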
With resizing, don't you just have to read in the 48 streams again (1.47GB/s) and output at 1920x1024? With fancier filtering, the number of operations in the CPU will need to go up but bandwidth should still be no more than 1.47GB/s input, no?
Unfortunately this is not the case.
Every filter with a domain that is larger than a single sample will require multiple samples per pixel.
This means sample values are going to be reused multiple times, and since we don't have a cache that can hold a full decoded stream, no matter what we do some samples are going to be fetched from memory multiple times.
Obviously there are ways to improve this situation, such as storing a decoded stream in some hierarchical tiled fashion and filtering one tile at a time, following a special tiling order to maximize sample reuse between tiles.
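The reuse argument above can be illustrated with a toy simulation. It is only a sketch under loud assumptions: a fully associative LRU cache counted in individual samples (real caches work on lines and are not fully associative), a 2x2 bilinear footprint, and made-up image, cache, and tile sizes. All function names are hypothetical. It counts how many samples must be fetched from memory when output pixels are visited row by row versus tile by tile.

```python
import math
from collections import OrderedDict


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def bilinear_taps(ox, oy, sw, sh, dw, dh):
    """Source coordinates of the 2x2 samples used for output pixel (ox, oy)."""
    # Map the output pixel centre into source space, then take the
    # surrounding 2x2 block (clamped at the image border).
    fx = math.floor((ox + 0.5) * sw / dw - 0.5)
    fy = math.floor((oy + 0.5) * sh / dh - 0.5)
    xs = (clamp(fx, 0, sw - 1), clamp(fx + 1, 0, sw - 1))
    ys = (clamp(fy, 0, sh - 1), clamp(fy + 1, 0, sh - 1))
    return [(x, y) for y in ys for x in xs]


def count_fetches(order, sw, sh, dw, dh, cache_size):
    """Count cache misses (memory fetches) with an LRU cache of cache_size samples."""
    cache = OrderedDict()
    misses = 0
    for ox, oy in order:
        for sample in bilinear_taps(ox, oy, sw, sh, dw, dh):
            if sample in cache:
                cache.move_to_end(sample)      # mark as most recently used
            else:
                misses += 1
                cache[sample] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return misses


def row_major(dw, dh):
    """Visit output pixels one full scanline at a time."""
    return [(x, y) for y in range(dh) for x in range(dw)]


def tiled(dw, dh, tile):
    """Visit output pixels tile by tile (tile x tile blocks)."""
    return [(x, y)
            for ty in range(0, dh, tile)
            for tx in range(0, dw, tile)
            for y in range(ty, min(ty + tile, dh))
            for x in range(tx, min(tx + tile, dw))]


# ASSUMED toy sizes: 64x64 source upscaled 3x, cache of 32 samples, 4x4 tiles.
SW = SH = 64
DW = DH = 192
print("unique source samples:", SW * SH)
print("row-major fetches:", count_fetches(row_major(DW, DH), SW, SH, DW, DH, 32))
print("tiled fetches:    ", count_fetches(tiled(DW, DH, 4), SW, SH, DW, DH, 32))
```

With a cache large enough to hold the whole source image, every sample is fetched exactly once; as the cache shrinks, the visiting order starts to decide how often samples are fetched again.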
JF_Aidan_Pryde said: Wouldn't cache fix such problems?
Cache would alleviate such problems.
For example, if bilinear filtering is used, four samples are required for each output pixel. Surely the neighbouring pixels would be fetched into cache (like tiles as you said) and the values used multiple times to calculate the value of all resulting pixels.
This is true, but IF the cache can't store ALL the unique samples you need to filter the image, it has to drop some samples once you fetch new ones, because you can't visit all the tiles without dropping some edge. Sooner or later you would walk all the tiles, but since the cache can't hold all the samples, some tile would be lost and the hardware would have to fetch it another time.
Isn't this the reason why GPUs don't need 4x memory bandwidth to do bilinear filtering? Is there something special about CPU cache that makes this scenario no longer valid?
I believe GPU caches are built to maximize hits under the bilinear filtering access pattern, but the basic principle is the same.
PC-Engine said: Anyway...I made a hypothesis based on some assumptions to conclude it's not orders of magnitude faster, which I think is reasonable.
Save that your hypothesis assumes this Toshiba demo is showing the absolute limits of Cell's performance. For all we know a single SPE can process 48 MP2 streams and they could have had a couple of hundred little images on screen. It's like seeing a Ferrari beat a Fiat 500 in a drag race with the Ferrari travelling no faster than 60 miles an hour. It still beats the Fiat, but it doesn't mean the Ferrari can ONLY do 60 MPH. We have no idea what the code optimisation was like. We don't know why one SPE wasn't used. It's possible the Cell processor choked and couldn't handle it. It's also possible that for demonstration purposes showing 8x6 thumbnail videos was clearer than showing 24x18 so Toshiba limited how many videos to work with. The key point being we DON'T KNOW and therefore cannot derive any sensible benchmarks from this demo.
If people want to disagree then fine, I have no problem with that.
Why do you insult them then?
It's the people that think in a vacuum -> CELL = 300GFLOPS therefore CELL = orders of magnitude faster that's the issue.
What people? No-one here was saying that. We were just talking about what this demo does/doesn't show. Read through the first few pages of posts and the debate is polite and intelligent, with a jovial few smart-arse remarks. The first antagonist is you, as is the second. It then drops into a smiley-ridden slag-fest with you trying to prove you're right over something that is of no concern to anyone. So what if Cell is or isn't a 1 teraflop uber-processor? No-one's lives are at stake! It wouldn't be the first time promises/hype never came true! Why are you so insistent on trying to convince everyone not to have any faith in Cell?
Shifty Geezer said: pahcman: No-one's getting annoyed at Cell not being 10^n times more powerful than other processors. We're getting annoyed at PCEngine's attempts to flog some understanding of Cell's performance from facts that haven't been proven yet. If he wasn't harping on about how wrong everyone was to think Cell's a wonder chip (which I don't think anyone here is really bothered about) there wouldn't be this argument of people trying to correct his manic behaviour.
JF_Aidan_Pryde said: ...
Pentium 4 Dual Core 3.5GHz (250 Million transistors)
Single core FP performance: 3.5 x 4 (SSE) = 14 GFLOPS
Dual core = 28 GFLOPS
Cell (234 Million transistors)
Cell @ 3.5GHz = 3.5 x 8 (FMADD) x 8 SPEs = 224 GFLOPS
...
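The peak figures quoted above can be re-checked with a few lines of arithmetic. This is a sketch of theoretical peaks only, not measured throughput; the function name is hypothetical, and reading "8 (FMADD)" as 4 single precision lanes times 2 operations per fused multiply-add is my interpretation of the quoted numbers.

```python
import math


def peak_gflops(clock_ghz, flops_per_cycle, units=1):
    """Theoretical peak: clock (GHz) x flops per cycle per unit x number of units."""
    return clock_ghz * flops_per_cycle * units


# Pentium 4: 4 single precision flops per cycle from the SSE unit (as quoted).
p4_single = peak_gflops(3.5, 4)
p4_dual = peak_gflops(3.5, 4, units=2)

# Cell: 8 flops per cycle per SPE (assumed 4 lanes x 2 for FMADD), 8 SPEs, PPE excluded.
cell_spes = peak_gflops(3.5, 8, units=8)

ratio = cell_spes / p4_dual
print(f"P4 single core: {p4_single:.0f} GFLOPS")
print(f"P4 dual core:   {p4_dual:.0f} GFLOPS")
print(f"Cell (8 SPEs):  {cell_spes:.0f} GFLOPS")
print(f"ratio: {ratio:.0f}x, log10 = {math.log10(ratio):.2f}")
```

On these peak numbers alone the gap is 8x, which is a little under one order of magnitude; what either chip sustains on real code is a separate question.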
Save that your hypothesis assumes this Toshiba demo is showing the absolute limits of Cell's performance. For all we know a single SPE can process 48 MP2 streams and they could have had a couple of hundred little images on screen. It's like seeing a Ferrari beat a Fiat 500 in a drag race with the Ferrari travelling no faster than 60 miles an hour. It still beats the Fiat, but it doesn't mean the Ferrari can ONLY do 60 MPH. We have no idea what the code optimisation was like. We don't know why one SPE wasn't used. It's possible the Cell processor choked and couldn't handle it. It's also possible that for demonstration purposes showing 8x6 thumbnail videos was clearer than showing 24x18 so Toshiba limited how many videos to work with. The key point being we DON'T KNOW and therefore cannot derive any sensible benchmarks from this demo.
Do you disagree with this?
Why do you insult them then?
What people? No-one here was saying that. We were just talking about what this demo does/doesn't show. Read through the first few pages of posts and the debate is polite and intelligent, with a jovial few smart-arse remarks. The first antagonist is you, as is the second. It then drops into a smiley-ridden slag-fest with you trying to prove you're right over something that is of no concern to anyone. So what if Cell is or isn't a 1 teraflop uber-processor? No-one's lives are at stake! It wouldn't be the first time promises/hype never came true! Why are you so insistent on trying to convince everyone not to have any faith in Cell?
Jaws said: Unless I missed something?
JF_Aidan_Pryde's notes said: Notes:
* Pentium 4 calculation is for the SSE unit only (excludes regular FPU).
* Cell calculation is for SPEs only (excludes PPE).
* Assuming media applications, i.e. regular single precision FP instructions
rendezvous said: Jaws said: Unless I missed something?
The notes!
JF_Aidan_Pryde's notes said: Notes:
* Pentium 4 calculation is for the SSE unit only (excludes regular FPU).
* Cell calculation is for SPEs only (excludes PPE).
* Assuming media applications, i.e. regular single precision FP instructions
PC-Engine said: Jaws, do you know what the double precision GFLOPS figure is for that same P4?