First Cell demo (48 MPEG 2 Videos)

Discussion in 'Console Technology' started by mckmas8808, Apr 25, 2005.

Thread Status:
Not open for further replies.
  1. randycat99

    Veteran

    Joined:
    Jul 24, 2002
    Messages:
    1,772
    Likes Received:
    12
    Location:
    turn around...
    We aren't any closer to an answer of "orders of magnitude" or not (unless you stand behind your numbers claim), so I guess you lose on both counts.
     
  2. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Uh no you might want to look up the definition of a single order of magnitude, then do the same for the plural form. :lol:

    If you want to believe CELL was running at 100MHz and at 10% utilization to prove that it is orders of magnitude faster then go ahead, it's water under the bridge at this point.

    If CELL was orders of magnitude faster then you can be sure even Toshiba would be trumpeting it. :wink:

    BTW why do you suppose the clock speed was secret? :lol:

    If I had a processor that could do that and only required a couple hundred MHz, I wouldn't hide that fact. The competition already knows the GFLOPS rating of a CELL. :wink:
     
  3. randycat99

    Veteran

    Joined:
    Jul 24, 2002
    Messages:
    1,772
    Likes Received:
    12
    Location:
    turn around...
    So you made a strawman argument that someone here is making claims that Cell will be 100+ times the fastest PC of current day??? Who here has made such a claim? Where do you place a single P4, right now? 20 GFLOPs? 40? 15? So somewhere you see a rising belief that Cell will be 1.5/2.0/4 TFLOPs??? Wow! You broke some serious news! :roll:

    Hey, know what? If a P4 is 20-ish, and a single order of magnitude brings 200, then Cell may not be so "off-track", after all. Maybe that is where you went wrong with your argument?
     
  4. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    A 3.5GHz dual core HT P4 is not 20ish, besides GFLOPS isn't everything. Regardless, both ERP and aaaaa00 must be arguing with that same strawman too. We have enough information to reasonably declare it's not orders of magnitude faster in real-world situations. We haven't even talked about other types of apps that favor a P4, not to mention apps that require double precision. :wink:

    Really if you want to believe CELL is orders of magnitude faster by simply comparing theoretical SP GFLOPS numbers then feel free nobody is stopping you. I'm more interested in how apps run on these processors that's one of the reasons why I brought up the DVD streams comparison.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    randycat : Order of magnitude, though not particularly defined, is taken generally to mean 10x (at least when I was at uni.) Therefore a processor an order of magnitude more powerful than another is something like 10x more powerful.

    I would say PCEngine's observation, that if an HT DC P4 can manage 40 streams then Cell isn't an order of magnitude higher, is a valid statement.

    pahcman : No-one's getting annoyed at Cell not being 10^n times more powerful than other processors. We're getting annoyed at PCEngine's attempts to flog some understanding of Cell's performance from facts that haven't been proven yet. If he wasn't harping on about how wrong everyone was to think Cell's a wonder chip (which i don't think anyone here is really bothered about) there wouldn't be this argument of people trying to correct his manic behaviour.

    PCEngine : What's stupid with your statements is that you've said we can't tell Cell performance from this as we don't know how this demo taxed the Cell system (especially as it wasn't a hardware demo), and we haven't confirmed an HT DC P4 can handle 40 streams. The jury is out. The debate is ongoing. You've taken the comments of two programmers as fact and ignored other statements from other sources (including a programmer) who think otherwise.

    Add to that your a rude, arrogant twat who persists in insulting people but thinks that's okay, civilised, mature behaviour. :wink: :D

    You've all the intellectual capabilities of the fungus that grows on the rotting remains of a maggot :wink: :lol:
    You've the charm and social standing of a diarroeic camel's rectal discharges :wink: :D
    You've the reasoning capacity of a the wart on a pregnant baboon's unborn child :wink: :lol: :p
    You're a fat :lol: , retarded :wink: , spastic :lol: :lol: , nazi :wink: :lol: , m***** f***** gay nigger :wink: :p :D :lol: :cry: :lol: :p :p :lol:

    Mods ... can we have some rules that explain the point I illustrate above and PCEngine doesn't seem to get? This behaviour is totally out of place.
     
  6. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    :lol: Anyway...I made a hypothesis based on some assumptions to conclude it's not orders of magnitude faster which I think is reasonable. If people want to disagree then fine, I have no problem with that. It's the people that think in a vacuum ->CELL = 300GFLOPS therefore CELL = orders of magnitude faster that's the issue. To those people I'd ask how CELL compares in scientific computing apps that need DOUBLE PRECISION? :lol:
     
  7. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    Wouldn't cache fix such problems? For example, if bilinear filtering is used, four samples are required for each output pixel. Surely the neighbouring pixels would be fetched into cache (like tiles as you said) and the values used multiple times to calculate the value of all resulting pixels. Isn't this the reason why GPUs don't need 4x memory bandwidth to do bilinear filtering? Is there something special about CPU cache that makes this scenario no longer valid?
     
  8. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Cache would alleviate such problems.
    This is true, but IF the cache can't store ALL the unique samples you need to filter the image, it has to drop some samples as you fetch new ones, because you can't visit all the tiles without dropping some edge. Sooner or later you walk all the tiles, but since the cache can't hold all the samples, some tile gets evicted and the hardware has to fetch it a second time.

    I believe GPU caches are built in a way to maximize hits under bilinear filtering access patterns, but the basic principle is the same.
    GPUs don't need 4x the memory bandwidth to bilerp texels, but a cache can't just reduce this bandwidth requirement to 1x.
    That's why GPU caches are 'small': designers don't want to capture ALL of the texture, they just want to reuse samples under a certain walking order among texture tiles.

    EDIT: typos
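
    A toy sketch of the point above, under loud assumptions: a fully associative LRU cache, a plain raster walk, and one 2x2 texel footprint per output pixel. None of this describes how a real GPU or CPU cache is built, and `count_fetches` is a hypothetical helper made up for illustration. It only shows that the number of memory fetches falls somewhere between "every unique texel once" and "four fetches per pixel", depending on how much the cache can hold.

```python
from collections import OrderedDict

def count_fetches(width, height, cache_size):
    """Count texel fetches for bilinear filtering with a toy LRU cache.

    Output pixel (x, y) reads the 2x2 texel block whose top-left corner
    is (x, y); pixels are visited in raster order (an assumption).
    Returns (samples, unique_texels, misses).
    """
    cache = OrderedDict()
    samples = 0
    misses = 0
    for y in range(height):
        for x in range(width):
            for texel in ((x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)):
                samples += 1
                if texel in cache:
                    cache.move_to_end(texel)       # mark as most recently used
                else:
                    misses += 1                    # has to come from memory
                    cache[texel] = True
                    if len(cache) > cache_size:
                        cache.popitem(last=False)  # evict least recently used
    unique = (width + 1) * (height + 1)
    return samples, unique, misses

small = count_fetches(64, 64, cache_size=16)         # far smaller than one texel row
large = count_fetches(64, 64, cache_size=65 * 65)    # big enough to hold every texel
print("small cache:", small)
print("large cache:", large)
```

    With a cache smaller than one row of texels, the texels shared with the next row are evicted before they are reused, so the miss count lands between the unique-texel count and the raw sample count. A different walking order (tiles instead of full rows) would change the numbers, which is the reuse pattern the post is describing.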
     
  9. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    Save that your hypothesis assumes this Toshiba demo is showing the absolute limits of Cell's performance. For all we know a single SPE can process 48 MP2 streams and they could have had a couple of hundred little images on screen. It's like seeing a Ferrari beat a Fiat 500 in a drag race with the Ferrari travelling no faster than 60 miles an hour. It still beats the Fiat, but it doesn't mean the Ferrari can ONLY do 60 MPH. We have no idea what the code optimisation was like. We don't know why one SPE wasn't used. It's possible the Cell processor choked and couldn't handle it. It's also possible that for demonstration purposes showing 8x6 thumbnail videos was clearer than showing 24x18 so Toshiba limited how many videos to work with. The key point being we DON'T KNOW and therefore cannot derive any sensible benchmarks from this demo.

    Do you disagree with this?
    Why do you insult them then?

    What people? :? No-one here was saying that. We were just talking about what this demo does/doesn't show. Read through the first few pages of posts and the debate is polite and intelligent, with a jovial few smart-arse remarks. The first antagonist is you, as is the second. It then drops into a smiley-ridden slag-fest with you trying to prove you're right over something that is of no concern to anyone. So what if Cell is or isn't a 1 teraflop uber-processor? No-one's lives are at stake! It wouldn't be the first time promises/hype never came true! Why are you so insistent on trying to convince everyone not to have any faith in Cell?

    Honestly, I can't see why you're arguing your point.
     
  10. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    Thanks for the explanation nAo.
    I guess that's why Cell uses a programmer controlled local store, so that for something like this, the pixels required for whatever filtering is being done are already in the local store before the filtering gets underway. :)

    Using the stream method between SPEs, this would be very nice. The decoded stream data gets stored in the local store of the next SPE. The next SPE then just reads its local store as a perfect pixel cache to do filtering.

    ---

    Regarding order of magnitude speedups, it is true in terms of raw theoretical specs. It's a reasonable comparison since both are fabbed at 90nm, have similar die sizes and have roughly 200+ million transistors.

    Pentium 4 Dual Core 3.5GHz (250 Million transistors)
    Single core FP performance: 3.5 x 4 (SSE) = 14 GFLOPS
    Dual core = 28 GFLOPS

    Cell (234 Million transistors)
    Cell @ 3.5GHz = 3.5 x 8 (FMADD) x 8 SPEs = 224GFLOPS

    That's to say, Cell has roughly 10 times the floating point capability of a dual core Pentium 4 at the same clock speed.

    ---
    Notes: * Pentium 4 calculation is for the SSE unit only (excludes regular FPU).
    * Cell calculation is for SPEs only (excludes PPE).
    * Assuming media applications, ie. regular single precision FP instructions
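
    The arithmetic above can be checked with a minimal sketch. It uses only the figures quoted in the post (3.5GHz, 4 flops per cycle for SSE, 8 flops per cycle per SPE with FMADD counted as two operations); `peak_gflops` is a hypothetical helper name, and these are theoretical peaks, not measured throughput.

```python
def peak_gflops(clock_ghz, flops_per_cycle, units=1):
    """Theoretical peak: clock (GHz) x flops per cycle per unit x unit count."""
    return clock_ghz * flops_per_cycle * units

CLOCK_GHZ = 3.5

# Pentium 4: 4 single-precision flops per cycle from SSE, one or two cores.
p4_single = peak_gflops(CLOCK_GHZ, 4)
p4_dual = peak_gflops(CLOCK_GHZ, 4, units=2)

# Cell: 8 flops per cycle per SPE (4-wide FMADD counted as 2 flops), 8 SPEs.
cell = peak_gflops(CLOCK_GHZ, 8, units=8)

print(p4_single, p4_dual, cell)   # 14.0 28.0 224.0
print(cell / p4_dual)             # 8.0
```

    On these numbers the ratio works out to 8x, which is what "roughly 10 times" is rounding up from.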
     
  11. Tacitblue

    Newcomer

    Joined:
    Apr 23, 2005
    Messages:
    131
    Likes Received:
    1
    For the floating point talk, keep in mind that a: it doesn't strictly follow IEEE SP precision but b: it does do 26 GFLOPS DP. The Blue Gene/L chip does between 5 and 6. Granted, it's clocked much lower. Take that for what you will. Anyway, it's been an entertaining thread, but unlike some *cough cough* whose raison d'etre is turning every interesting conversation into a pissing contest, I've got a life: off to the Canadian superbike opener in Shannonville. Hope things settle down in here and I'll return to the sandbox later. Play nice kiddies.
     
  12. pahcman

    Regular

    Joined:
    Jul 1, 2004
    Messages:
    252
    Likes Received:
    0
    Yes, pce does seem to go overboard with his replies, but imho this thread took a dangerous turn when someone decided to trumpet this Cell demo as wiping out PC chips, with WMV HD instead of MPEG-2 examples to boot!

    From there on, the replies spiralled into misunderstanding. One side said it's an impressive demo but Pentiums can do fine; aaaaa00 even set up his own tests. The other side took that as implying Cell is no better than a P4... it all started, imho, because of biased perceptions toward members.

    Back and forth, I don't see any replies linking the final Cell in PS3 with this demo? In fact I see people pressing pce to give a number for Cell vs. P4.

    As is known, this demo is no indication of anything; it just shows 48 MPEG-2 streams can run well. Can we leave this as a nice demo from Toshiba?
     
  13. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    I get different figures,

    A dual core HT Pentium, 4-threads,

    1 core ~ 1 (FPU) + 4(SSE) ~ 5 Flops per cycle

    Dual core HT pentium ~ 5+5 ~10 Flops per cycle ~ 35 Gflops @ 3.5 GHz



    Cell with 1 PPE, 8 SPEs, (with FMADD), 10-threads,

    PPE ~ 2 (FPU) + 8 (VMX) ~ 10 Flops per cycle

    8 SPE ~ 8*8 (SPU) ~ 64 Flops per cycle

    CELL ~ 10+64 ~ 74 Flops per cycle ~ 259 GFlops @ 3.5 GHz

    Unless I missed something?
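
    A small sketch of where the two sets of figures diverge, using only the per-cycle numbers quoted in this post and the earlier one. The dictionary labels are made up for illustration, and the values are quoted theoretical peaks, not measurements. Counting every unit gives the totals in this post; counting only SSE and the SPEs gives the earlier ones.

```python
CLOCK_GHZ = 3.5

# Flops per cycle per unit, as quoted in the two posts (not measured values).
p4_core = {"x87 FPU": 1, "SSE": 4}
cell = {"PPE FPU": 2, "PPE VMX": 8, "8 SPEs": 8 * 8}

def gflops(flops_per_cycle):
    """Convert a flops-per-cycle figure to GFLOPS at the assumed clock."""
    return CLOCK_GHZ * flops_per_cycle

p4_all = gflops(2 * sum(p4_core.values()))   # both cores, FPU + SSE
p4_sse_only = gflops(2 * p4_core["SSE"])     # both cores, SSE only
cell_all = gflops(sum(cell.values()))        # PPE + SPEs
cell_spe_only = gflops(cell["8 SPEs"])       # SPEs only

print(p4_all, p4_sse_only)       # 35.0 28.0
print(cell_all, cell_spe_only)   # 259.0 224.0
```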
     
  14. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Uh wasn't the demo shown in a videoclip played in Windows? :lol:

    Take my advice and reread the thread and understand it otherwise you're just wasting my time dude.


    It's obvious you have a problem reading from page 1? It started on page 2 and it wasn't me just in case you're still asleep.

    And what does that have ANYTHING to do with the example I gave?? If aaaaa00 gave the example instead of me would it change the point??? And where did you get the idea that this was to convince everyone to not have faith in CELL??? You mean if some anonymous poster came in and posted this instead of me, it would also mean that person has an agenda??? Lay down the pipe man and stop wasting my time. BTW for the last time, I don't think it's me who needs to reread the thread. :wink:

    I'll put it this way for the technically challenged. On paper CELL is roughly an order of magnitude faster than the P4 in my example if strictly talking about GFLOPS. This demo shows 48 DVD streams + downsampling. You telling me CELL could actually do about 400 DVD streams without downsampling???? It doesn't matter if CELL could do more than 48 understand??? Anything below a certain number will make it less than an order of magnitude get it???? Have a nice day. :lol:
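
    For reference, the "order of magnitude" arithmetic in this argument can be sketched as a base-10 logarithm of the ratio. The 40-stream P4 figure is the unconfirmed number claimed in this thread, the 48 is only what the demo showed (a lower bound, not a limit), and `orders_of_magnitude` is a hypothetical helper name. The downsampling work in the demo is ignored here.

```python
import math

def orders_of_magnitude(a, b):
    """Base-10 orders of magnitude by which a exceeds b (1.0 means 10x)."""
    return math.log10(a / b)

P4_STREAMS = 40     # unconfirmed figure claimed in the thread for a dual-core HT P4
CELL_STREAMS = 48   # streams shown in the Toshiba demo (a lower bound, not a limit)

# What the demo alone demonstrates relative to the claimed P4 figure.
print(round(orders_of_magnitude(CELL_STREAMS, P4_STREAMS), 2))

# Streams needed for one full order of magnitude over the claimed P4 figure.
print(P4_STREAMS * 10)   # 400
```

    The first number only bounds what the demo showed; it says nothing about what the chip could do if pushed, which is the objection raised earlier in the thread.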
     
  15. rendezvous

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    347
    Likes Received:
    12
    Location:
    Lund, Sweden
    The notes!
     
  16. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Haha!...I never read the small print! :)

    Well, it makes sense now...I guess I've seen these figures so many times that it just looked odd to me when I always personally include those 'notes' for consistency... ;)
     
  17. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Jaws do you know what the double precision GFLOPS is for that same P4?
     
  18. DuckThor Evil

    Legend

    Joined:
    Jul 9, 2004
    Messages:
    5,996
    Likes Received:
    1,062
    Location:
    Finland
    Slightly off topic, but when comparing a 3.5GHz dual core HT P4 vs Cell, I think we should remember that only the Extreme Edition supports HT, and its max clock speed at the moment is only 3.2GHz, and it might take a while before they can crank it up. Also, the model mentioned above will cost billions, whereas Cell will be in a ~$300-400 console. Keeping that in mind, Intel doesn't look so good.
     
  19. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Not sure, I know PPC G5 > Pentium for DP flops per clock,

    1 core G5, using FMADD ~ 2 (*2 FPU) ~ 4 Flops per cycle

    Dual core G5 ~ 4+4~ 8 flops per cycle ~ 28 GFlops @ 3.5 GHz

    Because the Pentium can't do FMADDs,

    Dual core HT Pentium ~ 1(*2 FPU) ~ 2 Flops per cycle ~ 7 GFlops @ 3.5 GHz

    Note, SSE and VMX units can't do DP AFAIK...
     
  20. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Oh man I don't know if I should laugh or cry. Yes it costs Intel $1000 each to manufacture Itaniums, Xeons, and P4EEs. Thanks for that post Dr. Evil. Better luck next time. :lol:

    BTW the comparison has nothing to do with Intel vs another company, but of course you need to turn it into X company is better than Y company. :roll:

    But I can play that game too. Intel makes HUGE profits on their highend cpus. SONY will be taking losses on the hardware. :wink:

    Thanks Jaws. :)

    I guess that means CELL at 3.5GHz is orders of magnitude faster too when DP is used. :lol:
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.