HotChips 17 - More info upcoming on CELL & PS3

PC-Engine said:
You think so? Isn't the PPE in XCPU similar to a Power5? Isn't the Power5 designed for servers?

There are many types of server workloads. I suggest you read about Sun's plans for Niagara and Rock and how Sun envisions the future. Niagara is targeted at front-end servers that do lightweight transaction processing.

XCPU uses ideas from POWER, so similar if you use the word loosely, I guess. The POWER5 is a pretty different beast from the XCPU, though.
 
PC-Engine said:
The more cores on a die, the higher the chances of a core being bad. 8 is two times 4. If you took out the SPEs and replaced them with 3 PPEs, you would get better yields and would not need a 5th PPE for redundancy.

ahhhhh PC-Engine. Please stop talking! This is painful to read as you have no idea what you're talking about.
 
PC-Engine said:
Sure, but if you have more cores, 2 or MORE defects are not likely to hit the SAME core. With more cores you need more redundancy because 2 or MORE defects are likely to destroy more cores than a design with less cores. With fewer cores you don't NEED redundancy, you simply throw away the chip.

So in summary my point still stands: bigger die area AND more cores means you will need more redundancy. ;-)

You were talking about redundancy? Why waste words trying to prove that? It's like saying "I breathe oxygen."
We all know that...! And we all know redundancy is a good thing; it means the chip can be saved. A chip without any chance of redundancy is the scenario I would be worried about.
So why talk about it? lol... me confused, people trying to downplay redundancy...

So you are supposing that there are usually 2 or more defects on a die; well, either way, chances are it's gonna hit another small core. Chip lives! :)
 
Seriously, if I had not seen what those 2 vector units (from PS2) can do, I would have nothing to complain about when reading that.

Point of the story is, the past has shown otherwise.

The real question is: how many SPEs will work? Because the question of whether game code is gonna run OK on the Cell architecture has already been answered by the PS2. Everybody seems to think they are worlds apart (EE and CELL), and they are, but not in the subject of "architecture + game code". And keep in mind those 2 vector units are a pain in the ass to code. Still, games run on them... pretty well.
Those who coded for PS2 must be in heaven when looking at Cell... no more assembly and a full GPU to play with!
 
dskneo said:
We all know that...! And we all know redundancy is a good thing; it means the chip can be saved. A chip without any chance of redundancy is the scenario I would be worried about.

This is exactly how I understand it. Redundancy is a good thing, and it should help CELL yields since 1 defective core means the chip can still be validated, yes? Whereas 1 defective core in the Xbox360 CPU means it needs to be tossed away.

Then add another defect to the previous example. Yes, it's quite possible that the CELL chip would have two defective cores and thus need to be thrown away. But with *any* defective core, you need to toss away the Xbox360, right?

So how is CELL at a disadvantage here? At worst it will be just as bad as the Xbox360 but potentially quite a bit better in yields (when considering core redundancy anyhow).
 
I don't expect the Cell will be tossed even with 2 defective SPEs, or 4 for that matter. If Sony starts putting this chip in their CE lines, then an HDTV/BRD/digital video receiver with a 1x6 or 1x4 Cell would be more than enough to do what was needed, i.e. encode/decode digital video/audio streams.

This only leaves Cells with defective PPEs truly fubared.

If this becomes the case then the cost of the chips goes down as effective yields go up.
 
Am I going nuts here, or are the 256K local stores of each SPE to be dismissed entirely in this discussion? ...If so... why? (Just because they're LS... but there's more to it than that...)

SPE(s): 7 x 256K = 1792K

PPE: L2 cache 512K

-----------------

Further questions:

Why would the X360 CPU have an easier time with cache thrashing, assuming I'm not crazy in thinking that the X360 CPU has less available on-chip memory to work with?

With gaming... larger caches haven't really made all that big a difference with respect to performance in the PC realm. Have I missed something? (Please show me... benchmarks are what I've been referring to.)

Am I wrong in thinking that game data is a bit of an odd fit for the caching mechanism? Data is used so randomly that a cache rarely fills up with data that will be used over and over again; rather, only a fraction of it hits that mark while the rest flows through like a sieve. Am I just lost?

So if I'm not lost...would this not suggest you don't need especially large caches for good gaming performance?
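
For reference, a trivial Python sketch adding up the figures tallied above and comparing them with the XCPU's shared L2 (treating SPE local store as raw on-chip storage rather than a cache; all figures are the ones quoted in this thread, nothing beyond that):

Code:
# On-chip storage tallies using the numbers quoted in this thread.
# Note: SPE local store is software-managed, not a cache, so this is a
# capacity comparison only.
cell_local_store_kb = 7 * 256          # 7 usable SPEs x 256 KB each
cell_ppe_l2_kb      = 512              # PPE L2
xcpu_shared_l2_kb   = 1024             # 1 MB L2 shared by the 3 XCPU cores

print("CELL:", cell_local_store_kb + cell_ppe_l2_kb, "KB")   # 2304 KB
print("XCPU:", xcpu_shared_l2_kb, "KB")                      # 1024 KB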
 
scificube said:
With gaming... larger caches haven't really made all that big a difference with respect to performance in the PC realm. Have I missed something? (Please show me... benchmarks are what I've been referring to.)

While I'm not going to show it to you personally, look for Athlon 64 benchmarks. There can be two processors at the same rating but with different actual cache sizes and clock speeds (less cache, faster speed; more cache, slower speed). The useful size of a cache depends entirely on how well the CPU can use it and how well things are programmed to fit into that cache.
 
I understand.

It's not like I didn't see *some* better performance when I looked at the benches... but all in all I could not see significant gains.

It's the main reason I didn't go for an A64 with a bigger cache... I didn't deem it worth the money. I could have been wrong, but I did give it some good thought.
 
Why waste time and money on redundancy when self-healing chips (that's one of the features of the POWER family, isn't it?) can sweep mistakes under the carpet without us ever knowing about it? ;)

Computer, Heal Thyself!

Microchips are like potato chips. More of them come out of the oven broken than whole. And of the chips -- micro, not potato -- that make it to market, many have built-in weaknesses that eventually cause them to fail. Most people don't care. The useful lifespan of an electronic device is only about three years, and it's hard to consume just one. By the time your cell phone's processor melts down, you've already bought a newer model.

But if you're planning to send a computer on, say, a 10-year mission into deep space, then you will need more staying power. The best option used to be to send lots of spare processors and cross your fingers. As your probe flew silently through the night, you would dream about chips that could fix themselves.

It's not crazy. A type of processor called a field programmable gate array really can recover on the fly. Invented in 1984, FPGAs don't have hardwired patterns of circuits. Instead, their wiring runs through programmable intersections called logic blocks. They're slower than ordinary chips, and until recently their high cost limited their application to rapid prototyping of chip layouts. But advances in fabrication are finally lowering the price.

"There's little need for fault-tolerant chips in the market," says Jason Lohn, a computer scientist at the NASA Ames Research Center. "But for space applications, we need much longer lifetimes." His team is working on systems with two processors that are proprietary variations of FPGAs. If a fault occurs in one, the backup chip takes over, generating a new configuration using an evolutionary algorithm -- it tries different approaches until a layout emerges that gets the job done. Researchers at NASA's Jet Propulsion Laboratory exposed their self-healing chip to 250 kilorads of radiation, enough to kill a person (or give them super powers). After getting fried, the system started fixing itself, attempting up to 100 configurations per second until it found one that worked.

Ultimately, though, engineers hope that chips will do more than recover from a blast of cosmic radiation. "We want systems that can grow, self-repair, adapt, cope with environmental changes, and give us fault tolerance," says Andy Tyrell, electronics department chair at the UK's University of York. Tyrell is working on what he calls immunotronics, a digital immune system, complete with antibodies. He has designed an electronic circuit that can distinguish between self and other, just like a human being does -- though the machine uses strings of data instead of proteins. The system looks for "diseased" information (data with unexpected characteristics) and, if it finds some, reconfigures itself.

Microprocessors may not come into the world with finesse, but they're learning to grow old gracefully.

Source: WIRED Magazine, 09 | 2005
 
PC-Engine said:
Sure, but if you have more cores, 2 or MORE defects are not likely to hit the SAME core. With more cores you need more redundancy because 2 or MORE defects are likely to destroy more cores than a design with less cores. With fewer cores you don't NEED redundancy, you simply throw away the chip.

No, each defect is nearly independent of the other defects. A defect is as likely to be right next to another defect as it is to be across on the other side of the chip. A larger core means higher susceptibility (of that core) to defects, regardless of the number of cores.

Since defect occurrence is an essentially memoryless random event, defect density is basically low-lambda Poisson, so redundancy really helps yields. It's not the number of cores that matters, it's the chip size. Two chips of equal size built in the same process have equal chances of getting defects. More cores means the chip area disabled by a given defect decreases. The Titanic analogy is (to a first approximation) a good one (props to whoever came up with that one).
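
To make the low-lambda Poisson point concrete, here is a minimal closed-form sketch in Python. The defect density and the area split are made-up illustrative numbers, not STI or Microsoft figures; the point is only the shape of the math (yield of e^(-area x density), plus a binomial term when one spare core is allowed):

Code:
import math

D0 = 0.004                      # killer defects per mm^2 (illustrative)

def yield_no_redundancy(area_mm2):
    """Poisson yield: probability of zero killer defects on the whole die."""
    return math.exp(-area_mm2 * D0)

def yield_one_spare(core_area, n_cores, other_area):
    """Die survives if the shared logic is clean and at most one core is hit."""
    p_core = math.exp(-core_area * D0)          # a single core is defect-free
    p_other = math.exp(-other_area * D0)        # PPE, bus, I/O, etc. are clean
    p_cores = p_core**n_cores + n_cores * (1 - p_core) * p_core**(n_cores - 1)
    return p_other * p_cores

print(yield_no_redundancy(200))                 # ~0.45: no spare, any hit kills the chip
print(yield_one_spare(15, 8, 220 - 8 * 15))     # ~0.62: bigger die, but one spare small core

With these made-up numbers the larger die with a spare actually yields better; with a different area split it could go the other way, which is exactly the trade-off being argued in this thread.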
 
standing ovation said:
Why waste time and money on redundancy when self-healing chips (that's one of the features of the POWER family, isn't it?)
No, they are quite ordinary chips that die if they're hit by strong enough radiation.

FPGAs, such as the ones your quoted article talks about, are terribly slow by today's standards. They're clocked in the upper kilohertz range (i.e., not even 1MHz), making them essentially useless in all but fringe uses here on Earth. One is prototyping of production chips, as mentioned in the article; another is various kinds of research, including self-evolving or self-healing chips.

Still, it's an interesting idea, but if one were conspiratorial, one would consider the urban myth about the car engine that ran on water, which means this tech likely won't see the light of day in practical applications because then chip vendors couldn't push new hardware on us every couple of years. ;)

In reality though, physical limits such as clock speed or reliability may well be a much greater hindrance than conspiracies. Can you trust a chip that reconfigures itself? How do you know there aren't hidden hardware bugs in the new layout? Any CPU design meant for production and sale to consumers is rigorously tested for tens of thousands of hours. There's no way of doing that if the chip redesigns itself hundreds of times per second. :)
 
Guden Oden said:
FPGAs, such as the ones your quoted article talks about, are terribly slow by today's standards. They're clocked in the upper kilohertz range (i.e., not even 1MHz), making them essentially useless in all but fringe uses here on Earth.

No, not true...I've personally made FPGA-based logic clocked at over 100MHz, and the new generations can go much higher. It really depends on how you design it. FPGAs are used everywhere, which is why they exist. Not cost-competitive with ASICs for mass-production applications, but for everything else, they are quite useful.
 
An increase in execution resources ALWAYS sees benefits from an increase in cache relative to some base point. This is not something that applies specifically to PC CPUs or server CPUs... it's just a fundamental effect of the fact that memory will never run as fast as CPUs as long as the Earth revolves around the sun. 1 MB of L2 cache with a single XCPU core will run X fast; adding a second core without increasing the L2 WILL slow down per-core performance, because you're increasing the probability of cache misses by having two cores contend for the same cache lines. Adding a third core WILL slow things down further, though the impact will be smaller than for the second. Adding a 4th, 5th, 6th: same story.

And you're another person who's arguing a point that nobody was talking about. Of course more cache is better for performance. What I said, which you still haven't answered, is: how do you know adding a 4th core will DECREASE OVERALL performance compared to a 3 core while keeping the cache size the same? Answer: You don't. Do you even know why they chose 1MB instead of 800KB or 1.2MB? I didn't think so. BTW nobody is arguing about PER CORE performance. I'm talking about aggregate/total/overall performance of all cores.

If you were capable of reading, the point I made was not about how much cache PC CPUs need (since when is Itanium a PC CPU?), but about how the quantity of cache relates to the throughput that a CPU is capable of. Did you ever wonder why the POWER MCMs (given that you say POWER is so similar) have 144 MB of cache for their measly 16 hardware threads?

Simple answer: Because more cache is better, just like more memory is better. It's not rocket science, man. You keep saying the sky is blue... uh, yeah, we know that. My question to you is: what is the MINIMUM USABLE cache size for a 3 core XCPU? Can you answer that? I didn't think so. If you cannot answer that, then don't pretend you know a 4 core with 1MB L2 is going to perform worse OVERALL than a 3 core XCPU. You act like 1MB is the bare minimum for a 3 core XCPU or something, without knowing anything about its OVERALL performance. You saying a 4 core XCPU will have the same FP performance as a 3 core version because the 1MB L2 is the limiting factor? You know this for a fact?

cost of a cache miss is a lot more cycles when the clock is that much higher

Sure but how do you explain the increase in performance when overclocking a chip? Did the cache magically need to get larger to keep up with the higher clock? I think not.

So you are supposing that there are usually 2 or more defects on a die; well, either way, chances are it's gonna hit another small core. Chip lives!

Uh, not quite. The chance of a defect hitting the main PPE is the same as a defect hitting an SPE. When you have two dead SPEs you have a dead chip. When you have a dead PPE you have a dead chip. If STI has uses for CELLs with two dead SPEs then yes, it would still be alive, but I'm talking about the PS3, not an HDTV with a crippled 4 SPE CELL.

So how is CELL at a disadvantage here?

Uh maybe because it has a bigger die area?

a688 said:
While I'm not going to show it to you personally, look for Athlon 64 benchmarks. There can be two processors at the same rating but with different actual cache sizes and clock speeds (less cache, faster speed; more cache, slower speed). The useful size of a cache depends entirely on how well the CPU can use it and how well things are programmed to fit into that cache.

Yes, exactly. Some people are too hung up on cache sizes and think large caches are REQUIRED, which is a fallacy. It's a luxury, not a requirement.

No, each defect is nearly independent of the other defects. A defect is as likely to be right next to another defect as it is to be across on the other side of the chip.

I'm aware of that.

A larger core means higher susceptibility (of that core) to defects, regardless of the number of cores.

This is a contradictory statement, because at any given die size more cores means less space per core, which means the chance of more cores being bad is higher. In other words, the number of cores is directly related to the size of the individual cores for any given die size.

Take a piece of paper and divide it into 4 quadrants. It doesn't matter if you have one defect or 100 defects, since all it takes is 1 defect to disable a full core and ultimately the whole CPU.

Now take a piece of paper and divide it into 8 octants. Now it does matter how many defects you have. 1 defect will disable a PPE OR an SPE. 2 defects could disable 1 PPE, OR 1 SPE, OR 1 PPE + 1 SPE, OR 2 SPEs. Now, considering that a bigger die has a higher chance of defects, you can kinda see how the two compare.

Since defect occurrence is an essentially memoryless random event, defect density is basically low-lambda Poisson, so redundancy really helps yields. It's not the number of cores that matters, it's the chip size. Two chips of equal size built in the same process have equal chances of getting defects. More cores means the chip area disabled by a given defect decreases. The Titanic analogy is (to a first approximation) a good one (props to whoever came up with that one).

The problem is that the chip in this case (CELL) with more cores has a bigger die area, hence a higher chance of defects.
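
If the closed-form math earlier in the thread feels too abstract, the same question can be sanity-checked with a quick Monte Carlo sketch: scatter Poisson-distributed defects uniformly over two hypothetical floorplans, one roughly 200 mm^2 with four big cores and no spare, one roughly 220 mm^2 with eight small cores of which one may fail. Every area and the defect density below are illustrative assumptions taken loosely from the numbers thrown around in this thread, not real Cell or XCPU data:

Code:
import math, random

D0 = 0.004   # killer defects per mm^2, made up for illustration

def poisson(lam):
    """Knuth's method: draw a Poisson-distributed defect count (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def chip_survives(regions, spares):
    """regions: list of (name, area_mm2). spares: how many regions of each name may die."""
    total = sum(a for _, a in regions)
    dead = {}
    for _ in range(poisson(total * D0)):
        x = random.uniform(0.0, total)                  # each defect lands uniformly on the die
        for i, (name, area) in enumerate(regions):
            if x < area:
                dead.setdefault(name, set()).add(i)     # this particular block is now bad
                break
            x -= area
    return all(len(hit) <= spares.get(name, 0) for name, hit in dead.items())

def yield_estimate(regions, spares, trials=100_000):
    return sum(chip_survives(regions, spares) for _ in range(trials)) / trials

# ~200 mm^2 die: 4 big cores plus shared logic, no redundancy at all
small_die = [("core", 40)] * 4 + [("other", 40)]
# ~220 mm^2 die: 8 small cores (1 allowed to fail) plus a PPE and shared logic
big_die = [("spe", 15)] * 8 + [("ppe", 30)] + [("other", 70)]

print("no spare   :", yield_estimate(small_die, {}))
print("1 spare SPE:", yield_estimate(big_die, {"spe": 1}))

With these particular numbers the bigger die with a spare still comes out ahead, because disabling one small core costs far less than scrapping the whole chip; change the areas or the defect density and the balance shifts, which is the real argument here rather than core count by itself.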

seismologist said:
ahhhhh PC-Engine. Please stop talking! This is painful to read as you have no idea what you're talking about.

What a worthless post.
 
PC-Engine said:
The more cores on a die, the higher the chances of a core being bad. 8 is two times 4. If you took out the SPEs and replaced them with 3 PPEs, you would get better yields and would not need a 5th PPE for redundancy.

I should go to my managers with this newfound revelation. Fewer logic blocks within the same die space = higher yields. We'll save millions in manufacturing costs. This is fantastic! Thanks PC-Engine. YOU ARE RIGHT. *cough* Now can we please get back to the topic at hand??
 
seismologist said:
I should go to my managers with this newfound revelation. Fewer logic blocks within the same die space = higher yields. We'll save millions in manufacturing costs. This is fantastic! Thanks PC-Engine. YOU ARE RIGHT. *cough* Now can we please get back to the topic at hand??

Nice selective quoting without any context whatsoever. Try responding to the rest of my posts if you can.

What you should ask your manager is how removing 8 SPEs and replacing them with 3 PPEs will magically keep the die size the same. :LOL:
 
PC-Engine said:
Nice selective quoting without any context whatsoever. Try responding to the rest of my posts if you can.

It's hard to respond to four pages' worth of backpedaling over one wrong statement that you made. I guess it's only a few more pages until you'll have it reworded to the point of vaguely resembling something that makes sense.

What you should ask your manager is how removing 8 SPEs and replacing them with 3 PPEs will magically keep the die size the same. :LOL:

I think if you go back a page, that was the point you were trying to make, is it not? That the Xbox CPU could benefit from adding another core without sacrificing yield, due to its smaller die area relative to Cell. Or something like that?
Speaking of which, what exactly WAS the point you were trying to make?
 
how do you know adding a 4th core will DECREASE OVERALL performance compared to a 3 core while keeping the cache size the same? Answer: You don't. Do you even know why they chose 1MB instead of 800KB or 1.2MB? I didn't think so. BTW nobody is arguing about PER CORE performance. I'm talking about aggregate/total/overall performance of all cores.
Ummm... AFAIK, you can't have an 800K or 1.2 MB cache (though 768k and 1.25MB are possible), as they're not integer multiples of powers of two. The sets are direct-mapped by n-bits, and there are a number of lines within a set (which is the arbitrary part). I suppose you could put actual address division hardware into the mapping logic, but caches have enough latency already, and it stupidly affects the density.
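
As a side note on why the "possible" sizes come out the way they do, here is a minimal sketch assuming the usual set-associative organization: capacity = sets x ways x line size, where the set count is a power of two (it is indexed by address bits) and only the way count is arbitrary. The 128-byte line and 1024 sets are illustrative picks, not a statement about the actual XCPU cache:

Code:
LINE_BYTES = 128          # bytes per cache line (illustrative)
SETS = 1024               # 2**10 sets, selected by 10 address bits

for ways in range(1, 11):
    size_kb = SETS * ways * LINE_BYTES // 1024
    print(f"{ways:2d}-way -> {size_kb:5d} KB ({size_kb / 1024:.2f} MB)")

# 6 ways gives 768 KB and 10 ways gives 1.25 MB, but no whole number of ways
# lands on exactly 800 KB or 1.2 MB with power-of-two sets and lines.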

You're the only one who's talking about overall performance, and you seem to be of the impression that the chances of an overall performance drop with an increase in cores and no increase in cache are zero. That's fundamentally false, and the only reason you think so is because you're stupidly optimistic. The chances are actually very high -- high meaning something on the order of tens of percent -- that's enough from an engineering standpoint to see that you're on precarious ground. FWIS, everything everybody's been saying is about the efficiency loss of having 3 cores and 1 MB cache being bad enough as it is, and there's no sense in making it worse. The point was not that it WILL decrease performance, but that it very well COULD. You're the only one twisting words since you have no other means to argue and never will.

You saying a 4 core XCPU will have the same FP performance as a 3 core version because the 1MB L2 is the limiting factor? You know this for a fact?
Obviously, as you've said, I can't say anything about a processor that doesn't exist, but as for the 1 MB L2 being a limiting factor... Yes, I know that for a fact. And not just me, but any living human being with half a brain cell who's so much as laid eyes on the hardware or even the hardware docs. Or even so much as laid eyes on MS's own implementation tips from the support newsgroups, for crying out loud. Everything suggests that you should write code as if cache miss probability is 100% and branch mispredict probability is 100% (obviously they're not, but that kind of paranoia is rife throughout everything). A simple way to show it would be to run a game on 360 in debug mode -- in debug mode, every computation gets copied off to the stack (for debugging purposes, obviously)... speed? HA... care to see 99 Nights at 0.99 fps?

Sure but how do you explain the increase in performance when overclocking a chip? Did the cache magically need to get larger to keep up with the higher clock? I think not.
You're right... you don't think. Name one case where overclocking bought you a perfectly linear gain in performance that wasn't a completely trivial performance test. Name one case where you can overclock and not get diminishing returns. Name one case where you could overclock and never see a point where performance gain was negligible (assuming you can actually overclock significantly in the first place).
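
On the overclocking tangent, a tiny back-of-the-envelope model shows why both sides can be partly right: raising the clock does increase performance, but DRAM latency is fixed in nanoseconds, so each miss costs more cycles and the gain is sub-linear. The CPI, miss rate, and memory latency below are made-up illustrative values, not measurements of any real P4:

Code:
BASE_CPI  = 1.0        # cycles per instruction if everything hits in cache (illustrative)
MISS_RATE = 0.005      # cache misses per instruction (illustrative)
MEM_NS    = 100.0      # DRAM access time in nanoseconds, unchanged by overclocking

def ns_per_instruction(clock_ghz):
    cycle_ns = 1.0 / clock_ghz
    miss_penalty_cycles = MEM_NS / cycle_ns       # a fixed-time miss costs more cycles at higher clocks
    return (BASE_CPI + MISS_RATE * miss_penalty_cycles) * cycle_ns

base = ns_per_instruction(3.8)
for ghz in (3.8, 4.3, 5.0, 6.0):
    print(f"{ghz:.1f} GHz: {base / ns_per_instruction(ghz):.3f}x the 3.8 GHz speed")

In this toy model the 3.8 to 4.3 GHz bump really does buy extra performance without touching the cache, but a 13% clock increase returns only a few percent, which is the diminishing-returns effect being described above.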
 
Ummm... AFAIK, you can't have an 800K or 1.2 MB cache (though 768k and 1.25MB are possible), as they're not integer multiples of powers of two. The sets are direct-mapped by n-bits, and there are a number of lines within a set (which is the arbitrary part). I suppose you could put actual address division hardware into the mapping logic, but caches have enough latency already, and it stupidly affects the density.

Then pick any number you want; it doesn't change my point, does it? WHY didn't they choose 768K or 1.25MB??? Get back to me when you have a real answer instead of dodging with stupid details... that is, if you have an answer.

You're the only one who's talking about overall performance, and you seem to be of the impression that the chances of an overall performance drop with an increase in cores and no increase in cache are zero.

Typical "Uh you're the only one blah blah blah" retort. Answer the damn question. Will overall performance increase??? YES or NO??

That's fundamentally false, and the only reason you think so is because you're stupidly optimistic.

You call it stupidly optimistic but you can't answer a simple question. Doesn't that make you stupidly pessimistic? :LOL: I ask again.

Will aggregate/total/overall performance increase??? YES or NO??

The chances are actually very high -- high meaning something on the order of tens of percent -- that's enough from an engineering standpoint to see that you're on precarious ground. FWIS, everything everybody's been saying is about the efficiency loss of having 3 cores and 1 MB cache being bad enough as it is, and there's no sense in making it worse.

Uh, people are only talking about it simply because they've gotten used to the dev kit, which has more cache for each G5 core. It's not the end of the world, man. Besides, if 1MB isn't enough for YOU to do your work, then don't use the 4th core in a 4 core XCPU. I'm sure other programmers could manage if they wanted to use the extra processing power a 4th core provides. ;)

The point was not that it WILL decrease performance, but that it very well COULD.

Sure it could, just like I could win the lottery. Aggregate/total/overall performance is more likely to INCREASE by adding a 4th PPE.

You're the only one twisting words since you have no other means to argue and never will.

Answer the damn question....if you can.

You're right... you don't think. Name one case where overclocking bought you a perfectly linear gain in performance that wasn't a completely trivial performance test. Name one case where you can overclock and not get diminishing returns. Name one case where you could overclock and never see a point where performance gain was negligible (assuming you can actually overclock significantly in the first place).

First of all, NOBODY said the increase in performance would be linear, so stop making up strawman arguments to make yourself sound right. Second, I can overclock a 3.8GHz P4 to 4.3GHz with stock cooling and stock voltages. Now you tell me how the cache is magically keeping up without increasing in size??? What if I take that real-world example further and use liquid nitrogen to overclock it even more? Guess what? The performance will still magically increase further. Where's your argument now?

Overclocking increases performance without the need for increasing the cache size, end of discussion.

I think if you go back a page, that was the point you were trying to make, is it not? That the Xbox CPU could benefit from adding another core without sacrificing yield, due to its smaller die area relative to Cell. Or something like that?
Speaking of which, what exactly WAS the point you were trying to make?

Don't argue a point when you don't even know what the point was/is.
 
I asked: So how is CELL at a disadvantage here?

PC-Engine responded: Uh maybe because it has a bigger die area?

Well, 1> I mentioned "when considering core redundancy", as in: how would the number of cores negatively impact the defect rate?

2> Everyone else is saying that defects are related to die size, NOT # of cores. But that doesn't seem to be what you were saying earlier.

PC-Engine said:
When you have a huge chip that takes up over 220 sq mm of space that has 8 small cores, then it's obvious you might have problems with one of those cores. With 4 big cores that take up less than 200 sq mm, there's no need for redundant cores.

Here we can clearly see that you are linking die size (oddly enough, calling something 10% larger "huge" in comparison to the smaller one) AND the # of cores with the possibility of having a defect in one of those cores.

*My* understanding is that this does not seem to be the case as the others have been saying for the last couple of pages.
 