Does DD2 signify a problem with Cell?

Back to the original question. Seeing that DD2 and DD3 (and now DD3.1) follow so closely, I would think that they were done to address a design issue rather than as a natural progression. To me it is reminiscent of nV's 128-bit bus decision on the nV30.
 
nelg said:
Back to the original question. Seeing that DD2 and DD3 (and now DD3.1) follow so closely, I would think that they were done to address a design issue rather than as a natural progression. To me it is reminiscent of nV's 128-bit bus decision on the nV30.

If I had to guess, I'd say that DD2 and possibly DD3 were designed before DD1 was back from the fab, using the same technique they mentioned in the XB360 core design: keep testing your model after you've submitted the mask, keep fixing the issues you find, and submit a second or third spin while still waiting for the first.
 
ADEX said:
That's why I put a question first!

That said IBM do have a new 4 issue core in the works which is designed for high frequencies and has already taped out.

The PPE to my knowledge has always been at its heart a conditionally 2-issue superscalar chip (2 instructions per cycle if there are two threads being run on the PPE, 1 if only one thread). I haven't seen the latest news on the most recent revision, but I haven't heard anything about 4-wide anywhere, and it would require re-engineering a huge chunk of the chip to fit it in.

I'd be interested in the name of the other chip you are talking about. I am aware of the POWER family, which has a weird 4+1 branch width, which might be what you are talking about. Given the size of those chips, it is understandable that Cell kept away from them.
 
The PPE to my knowledge has always been at its heart a conditionally 2-issue superscalar chip (2 instructions per cycle if there are two threads being run on the PPE, 1 if only one thread).

It can issue 2 (2 from one thread or 1 from each).

I'd be interested in the name of the other chip you are talking about. I am aware of the POWER family, which has a weird 4+1 branch width, which might be what you are talking about. Given the size of those chips, it is understandable that Cell kept away from them.

POWER6, it's a completely new core but should be smaller and a lot cooler than the previous versions. I'm wondering if they'll use one in a future Cell. Would be rather nice methinks.

http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634
 
ERP said:
If I had to guess, I'd say that DD2 and possibly DD3 were designed before DD1 was back from the fab, using the same technique they mentioned in the XB360 core design: keep testing your model after you've submitted the mask, keep fixing the issues you find, and submit a second or third spin while still waiting for the first.

Thanks! I could see the logic that different versions were planned at the outset, with the least ambitious, obviously, being made first. Especially so given that this is a new design with no evolutionary path to fall back on. What struck me, though, was the extent of the revision. A near doubling of the PPE size seems surprising.
 
ADEX said:
It can issue 2 (2 from one thread or 1 from each).

MPRonline article

Top of page 4:
"When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle."

This may be a misstatement or I misread it, but it sounds like issue width drops to 1 if one thread stalls.

POWER6, it's a completely new core but should be smaller and a lot cooler than the previous versions. I'm wondering if they'll use one in a future Cell. Would be rather nice methinks.

http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634

Thanks, I hadn't realized it had taped out already.
 
3dilettante said:
MPRonline article

Top of page 4:
"When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle."

This may be a misstatement or I misread it, but it sounds like issue width drops to 1 if one thread stalls.



Thanks, I hadn't realized it had taped out already.

Well, each thread fetches 2 instructions every other cycle and they pass through decoding phases ending in a top level Issue Queue that feeds the FXU, the LSU, the BPU and a second level Issue Queue used for VMX and FP (computations and LOAD/STORE) instructions.

The neat thing about having a decoupled Issue Queue for VMX/FP instructions is this: imagine one thread being very FP heavy and the other being Integer heavy, with the two not stalling each other or themselves much. Then you could raise the effective Issue width to 4 instructions per cycle, two per Issue Queue (think of some stalls that allowed both Issue Queues to get near full). Of course, the fact that the VMX/FP Queue is fed from the top level Issue Queue means that you cannot keep it always full while also issuing to the other execution units at peak rate.
 
I think the consensus is that designers only use sustainable issue width when discussing how wide a processor is.

If we go by internal peak instruction issue, then the Pentium 4 is a six instruction wide processor, even though it can't actually sustain that more than a few cycles, and can only reach that peak if there was a previous stall that allowed the instructions to build up in the buffer.

Though I wish various companies would agree on a common use of words. Every company seems to use them differently, so it gets harder to tell what they are talking about.
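
To make the peak versus sustained distinction concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: `run`, `fetch_width`, `peak_issue` and `buffer_cap` are hypothetical names, the numbers are illustrative rather than measured Pentium 4 figures, and dependencies and execution units are ignored entirely. It just shows that a back end wider than its front end can only hit its peak after a stall has let instructions pile up in the buffer.

```python
def run(fetch_width, peak_issue, stall_cycles, total_cycles, buffer_cap=64):
    """Toy model: return the number of instructions issued in each cycle.

    Assumptions (not from any vendor documentation):
    - the front end adds `fetch_width` instructions to a buffer every cycle
    - the back end issues nothing for the first `stall_cycles` cycles
    - afterwards it issues up to `peak_issue` per cycle from the buffer
    """
    buffered = 0
    per_cycle = []
    for cycle in range(total_cycles):
        # Front end keeps delivering, even while the back end is stalled.
        buffered = min(buffer_cap, buffered + fetch_width)
        if cycle < stall_cycles:
            issued = 0
        else:
            issued = min(peak_issue, buffered)
        buffered -= issued
        per_cycle.append(issued)
    return per_cycle


if __name__ == "__main__":
    trace = run(fetch_width=3, peak_issue=6, stall_cycles=4, total_cycles=10)
    print(trace)                    # issue count per cycle
    print(sum(trace) / len(trace))  # average over the whole run
```

In this model the peak of 6 only shows up for the few cycles it takes to drain the backlog, and the long-run average can never exceed `fetch_width`, which is the argument for quoting the sustainable number.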
 
3dilettante said:
Though I wish various companies would agree on a common use of words. Every company seems to use them differently, so it gets harder to tell what they are talking about.


I guess that kind of depends on the engineers. :p
 
3dilettante said:
"When one thread cannot issue a new instruction or is not active, the other active thread will be allowed to issue an instruction every cycle."

This may be a misstatement or I misread it, but it sounds like issue width drops to 1 if one thread stalls.
How can it "drop" to anything if one thread only ever issues one instruction per cycle? There's no point in stating that a thread that ordinarily issues one instruction/cycle is allowed to issue an instruction if another thread stalls; it's been issuing one instruction/cycle all along!

So there'd be no change at all other than one thread stalling while the other doesn't. Your quote looks like a poorly worded sentence to me, and what it probably means is that the thread that doesn't stall is allowed to issue an extra instruction in the other thread's stead. That's just my guess tho. :D
 
Guden Oden said:
How can it "drop" to anything if one thread only ever issues one instruction per cycle? There's no point in stating that a thread that ordinarily issues one instruction/cycle is allowed to issue an instruction if another thread stalls; it's been issuing one instruction/cycle all along!

There would be if the quote was trying to explain how the PPE's multithreading worked. Also, such a stall would result in the 2-wide chip functioning as a scalar processor.

So there'd be no change at all other than one thread stalling while the other doesn't. Your quote looks like a poorly worded sentence to me, and what it probably means is that the thread that doesn't stall is allowed to issue an extra instruction in the other thread's stead. That's just my guess tho. :D

I could see some good reasons for the PPE to have this issue restriction. For one thing, instructions in separate threads don't need dependency checks, which would simplify the front end even more. The back end is pretty narrow, so there's not a huge amount of pressure to avoid a few tiny hiccups in utilization.

On balance, I'd say such a core is 2-wide, just with the caveat that it be given adequately threaded code.
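
For what it's worth, the two readings of the PPE's issue rule argued over in this thread can be compared with a small toy simulation. This is a sketch, not a model of the real chip: `simulate`, the policy names and the stall probabilities are all hypothetical, and the "any_two" policy optimistically assumes a lone thread always has two independent instructions ready.

```python
import random


def simulate(policy, stall_prob_a, stall_prob_b, cycles=10_000, seed=0):
    """Return average instructions issued per cycle for a toy 2-thread core.

    policy "one_per_thread": each ready thread issues at most 1 instruction,
        so the core drops to 1 per cycle whenever one thread stalls.
    policy "any_two": up to 2 instructions per cycle, taken from whichever
        threads are ready (2 from one thread or 1 from each).
    Dependencies, fetch limits and execution units are ignored (assumption).
    """
    rng = random.Random(seed)
    issued = 0
    for _ in range(cycles):
        ready_a = rng.random() >= stall_prob_a
        ready_b = rng.random() >= stall_prob_b
        ready = ready_a + ready_b  # number of threads able to issue
        if policy == "one_per_thread":
            issued += ready
        elif policy == "any_two":
            issued += 2 if ready > 0 else 0
        else:
            raise ValueError(f"unknown policy: {policy}")
    return issued / cycles


if __name__ == "__main__":
    for policy in ("one_per_thread", "any_two"):
        print(policy, simulate(policy, stall_prob_a=0.2, stall_prob_b=0.2))
```

With both threads never stalling, the two policies are identical at 2 per cycle. They only diverge when one thread stalls, which is exactly the case the MPR quote is ambiguous about.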
 