Is it possible for a multicore chip to act on a single thread?

ralexand

Regular
With all this hand-wringing about multi-threading etc., I don't understand why hardware designers can't design a multicore chip that acts as a single one, similar to how virtual memory works, where it's transparent to the programmer and compiler. Why can't a single parent core marshal other processor cores to do its task in a sort of built-in load-balancing way? It's strange to hear these new console developers saying they haven't touched the other cores of these new processors. Why can't the hardware handle this? We already have GPUs that can marshal pipe processing units automatically without developer intervention.
 
Because you can't easily extract parallelism from single-threaded code. This is what OOOE processors have been doing since they were introduced many years ago, and now we're seeing how OOOE logic dominates modern CPU designs with its complexity.
GPUs can exploit parallelism much better than CPUs simply because a modern 3D pipeline is already tailored to be 'easily' parallelized.
Just look at vertex or pixel shaders: when you're shading a pixel or a vertex you can't know anything about other vertices or other pixels. Why does it work that way? Try to guess.. ;)
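To illustrate that point, here's a minimal sketch (the function names are made up for illustration): a toy "pixel shader" where each output depends only on its own input, never on neighbouring pixels, which is exactly what makes the work trivially parallel.

```cpp
#include <cstddef>
#include <vector>

// Toy "pixel shader": a pure per-pixel function. Each result depends
// only on its own input pixel, never on any other pixel.
float shade(float input) {
    return input * 0.5f + 0.25f;
}

std::vector<float> shade_all(const std::vector<float>& pixels) {
    std::vector<float> out(pixels.size());
    // No loop-carried dependency: every iteration is independent,
    // so the iterations could run on any number of cores in any order.
    for (std::size_t i = 0; i < pixels.size(); ++i)
        out[i] = shade(pixels[i]);
    return out;
}
```

Because nothing in one iteration feeds into another, the hardware is free to shade thousands of pixels at once without any of the dependency analysis a CPU has to do.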
 
nAo said:
Because you can't easily extract parallelism from single-threaded code. This is what OOOE processors have been doing since they were introduced many years ago, and now we're seeing how OOOE logic dominates modern CPU designs with its complexity.
GPUs can exploit parallelism much better than CPUs simply because a modern 3D pipeline is already tailored to be 'easily' parallelized.
Just look at vertex or pixel shaders: when you're shading a pixel or a vertex you can't know anything about other vertices or other pixels. Why does it work that way? Try to guess.. ;)
I kind of understand the difficulties involved in such a solution, but I guess I'm saying that, with all the brilliant hardware guys out there, there should be a solution that exploits the built-in parallel nature of multicore processors to marshal processor resources, while still maintaining context, without the developer having to do the grunt work.
 
I'm saying that with all the brilliant hardware guys out there, there should be a solution...
It's not true that every problem has a solution.. if something can't be done.. well.. it can't be done :)
 
The problem is dependency. If the execution of an instruction depends on the result of another instruction, then they have to be done sequentially; they can't be done at the same time. What the original post describes sounds a lot like predication, which is one of the concepts behind IA64. The idea is that all possible branches, if/then/else or whatever, are solved, and once it's known which branch is the correct one, the other answers are just thrown out. In this way you can (theoretically) keep the processor completely busy all the time, but of course it's busy doing a lot of work that just gets thrown out, so it's not a panacea.

Basically, however you try to get around it, if a program has data dependencies it's not going to parallelize very well. This is why, for example, T&L is a perfect candidate for parallelization: geometry is a huge array of independent data whose parts can mostly be operated upon independently.
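The contrast is easy to see in code. A rough sketch (function names are just for illustration): the first loop has a loop-carried dependency, so its iterations must run one after another; the second is T&L-style independent data, where the work splits cleanly across cores.

```cpp
#include <vector>

// Loop-carried dependency: each step needs the previous result,
// so no amount of hardware can run the iterations side by side.
long long serial_chain(int n) {
    long long x = 1;
    for (int i = 0; i < n; ++i)
        x = x * 3 + 1;   // depends on the previous value of x
    return x;
}

// Independent data, like transforming a vertex array: each element
// is updated using only itself, so iterations can run in parallel.
std::vector<int> transform_all(std::vector<int> v) {
    for (int& e : v)
        e = e * 3 + 1;   // depends only on e itself
    return v;
}
```

Both loops do the same arithmetic per element, but only the second one can be handed out to multiple cores without changing the answer.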
 
BOOMEXPLODE said:
The problem is dependency. If the execution of an instruction depends on the result of another instruction, then they have to be done sequentially; they can't be done at the same time. What the original post describes sounds a lot like predication, which is one of the concepts behind IA64. The idea is that all possible branches, if/then/else or whatever, are solved, and once it's known which branch is the correct one, the other answers are just thrown out. In this way you can (theoretically) keep the processor completely busy all the time, but of course it's busy doing a lot of work that just gets thrown out, so it's not a panacea.

Basically, however you try to get around it, if a program has data dependencies it's not going to parallelize very well. This is why, for example, T&L is a perfect candidate for parallelization: geometry is a huge array of independent data whose parts can mostly be operated upon independently.
Could you have the main processor making the branch prediction decisions while your coprocessors are marshaled to do the serialized code? Would some type of latency be a problem with doing that?
 
ralexand said:
Could you have the main processor making the branch prediction decisions while your coprocessors are marshaled to do the serialized code? Would some type of latency be a problem with doing that?

Like nAo said, they already have that today. It's called having a "branch prediction unit" with "out-of-order execution", a large instruction-scheduling window, and lots of functional units on the CPU.

Such a design helps, but even a huge, heavily optimized, extremely wide and deep OOOE CPU can only extract so much parallelism from code that was never written to be parallel in the first place.

The only way you could create hardware that could do what you want, is if the hardware could read the mind of the programmer, understand his intentions, and replace his algorithm with a different more easily parallelized version -- and if you could do that, you wouldn't really need people to program computers any more now, would you? ;)
 
Squeak said:
I think ralexand's point is that the PPU is supposed to do the work of the BPU, among other things.

It simply can't work that way.

The PPU doesn't have the ability to monitor every branch instruction that an SPU executes, or schedule work items on the level of every possible branch in a program. That would defeat the point of having independent SPUs in the first place.

The only practical way to do what ralexand wants is to build CPUs that look like today's PC CPUs.
 
I guess there is an alternative, and that's to write code differently. If the code were written as one thread with independent processes inside... Loop 1, Loop 2, Loop 3, repeat... this could be converted into three threads for three cores.

However, the idea of transparently converting existing code written for OOO processors into multiple threads for multiple processors is an impossibility. The only way, really, is to write code differently.
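A minimal sketch of that "three loops, three cores" idea, using modern C++ threads purely for illustration (the threading API and function names here are my own invention, not any particular console's): three loops that share no data, so each can be handed to its own core.

```cpp
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// One independent "loop": touches only the array it is given.
void loop(std::vector<int>& data, int add) {
    for (int& e : data)
        e += add;
}

// Because the three loops share no data, a single thread running
// them back to back can instead hand each loop to its own thread.
int run_three_loops() {
    std::vector<int> a(4, 0), b(4, 0), c(4, 0);
    std::thread t1(loop, std::ref(a), 1);
    std::thread t2(loop, std::ref(b), 2);
    std::thread t3(loop, std::ref(c), 3);
    t1.join();
    t2.join();
    t3.join();
    return std::accumulate(a.begin(), a.end(), 0)
         + std::accumulate(b.begin(), b.end(), 0)
         + std::accumulate(c.begin(), c.end(), 0);
}
```

The catch, of course, is that the programmer had to structure the loops as independent in the first place; the hardware can't discover that split on its own.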

Note to non-programmers : The existing models for programming aren't the be all and end all. There were different models before then. Object Oriented coding hasn't always been around, and such code may be good for the software designers, but isn't always what runs best on the hardware. Modern CPUs are designed to support modern coding practises; the two growing up together hand in hand.

Before then, at the beginning, we had Assembly and even direct Hex editing (or binary punchcards!). This required programmers to think differently to how they do now. They had to think like the machine, and program what the machine wants to read, even if the code was very alien to how the human coder thinks. As computers became more powerful, software got bigger and managing large code using the 'old ways' became a nightmare, so new software systems were developed to be more human-friendly, which weren't CPU friendly. That's why CPUs were redesigned to support human-friendly code.

We come to the next evolutionary step. Progress of this style of CPU has reached an impasse, with little room for speed increases. CPUs need to go multicore if they are to provide more processing power, and to get lots of cores, which is the key to large-scale multiprocessing's advantage, these cores need to be simple. Once again coders will need to start thinking and writing in more machine-friendly ways, even if that's difficult for them at first. Though remember, humans can adapt the way they work, whereas static silicon cannot.

It's also not going to be as bad as the first days of computers. There are advances in software tools and compilers to ease the process, and these will have a lot of work done on them in the coming years as multithreading becomes mainstream. Also, the console space is where you'll find the most hardware-savvy developers, as pushing the system requires an intimate knowledge of the hardware. As the 'birthplace' of in-order multicore, multithreaded development, the proving ground where the first software-tools experience is gained, the console space will make more of the situation than the PC space.
 
Shifty Geezer said:
I guess there is an alternative, and that's to write code differently. If the code were written as one thread with independent processes inside... Loop 1, Loop 2, Loop 3, repeat... this could be converted into three threads for three cores.

That's exactly what OpenMP is supposed to help you do (but with a little more sophistication).

Which is why it happens to be supported in VS2005.
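For what it's worth, the OpenMP style looks roughly like this (a sketch, not taken from any real codebase): a single pragma asks the compiler to split a loop's independent iterations across cores, and if OpenMP isn't enabled the pragma is simply ignored and the loop runs on one thread with the same result.

```cpp
#include <vector>

// OpenMP-style loop parallelism: the pragma tells the compiler the
// iterations are independent and may be divided among threads.
// Compiled without OpenMP support, the pragma is ignored and the
// loop runs sequentially, producing the identical result.
std::vector<int> double_all(std::vector<int> v) {
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        v[i] *= 2;
    return v;
}
```

Note the programmer is still the one asserting independence; OpenMP just removes the boilerplate of creating and joining the threads by hand.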
 
ralexand said:
Could you have the main processor making the branch prediction decisions while your coprocessors are marshaled to do the serialized code?
Well, you have to understand, ral, that the reason MS and Sony went the multithreaded way in the first place is because the only method that realistically exists to do what you want to do (namely out-of-order execution) requires so much die space and so many transistors just to extract parallelization, that it becomes the MAJOR PART of the processor die! The actual execution units are less than 50% of the logic portion of the chip - not counting cache, in other words.

So if you want higher performance than today's microprocessors, you need to heap on even more of the already highly complicated buffering/prediction logic that is already a substantial amount of today's microprocessors, and that would become totally unwieldy to handle. Complicated logic means high risk of hardware bugs too, so you need lots and lots of testing, making things even more complicated and more expensive.

Instead, what was done was cutting away all of the prediction logic and inserting more raw execution hardware instead. This requires programmers to think differently and write different code. Naturally, as programmers don't want to have to work hard, they whine and bitch a lot, which is why you've heard the sky is falling a number of times already in regards to the xcpu and cell. :D
 
Thanks for the excellent responses, guys, and thanks for indulging my flights of fancy. I've always been a firm believer in making the programmer's life easier, since when that's done it's easier for him to abstract what his ultimate goals are and accomplish them. I guess for the immediate future the hardware manufacturers are out to get the maximum performance, even if it means making the programmer's job more difficult. There have always been those who want to code to the metal, and I guess we will reach the point in the future where higher-level multithreaded code approaches the performance of coding to the metal, as it did in the past.
 
Shifty Geezer said:
It's also not going to be as bad as the first days of computers. There are advances in software tools and compilers to ease the process, and these will have a lot of work done on them in the coming years as multithreading becomes the mainstream.

Isn't that what hardware abstraction layers and middleware are supposed to do?
 