Background:
I have had this question sitting in Wordpad for a couple of weeks, waiting for the time to ask it correctly. With the recent remarks from John Carmack I decided to hold off in the hope that this would not turn into a system flame war, so please, let's keep this on topic. This is a serious question for the programmers and developers on the forum.
Some prominent next-gen console developers, some of them on the B3D forums, have noted the difficulties of programming for--and maximizing the performance of--next-generation console CPUs. The nuts and bolts: getting good performance, let alone anything near peak performance, is difficult in a multi-core environment. Yet the Xbox 360, PS3, and even the PC (Intel Pentium D, AMD X2) are going this route due to process limitations. At this point there really is no other choice. Hate it or like it (some devs have shown some excitement), it is the present reality.
Xenon, the Pentium D, and the X2 have all gone the route of 2 or 3 symmetric cores (with 4-core CPUs in the near future for PC parts). The PS3, on the other hand, has gone with an asymmetric design with 8 total cores, one that can loosely be classified as a "stream processor" and uses techniques similar to GPUs. Because the CELL microprocessor has the most cores, I have selected it as the "poster boy" of this thread. Most of what holds true for CELL should apply roughly to processors with fewer cores.
This topic is not aimed at "hyper-threading" or similar techniques where one core can run more than one hardware thread. ERP noted in another thread that his theory is that on one core with two hardware threads you would want both threads doing similar tasks (i.e. not lumping AI on "core 0 thread 0" and physics on "core 0 thread 1" of a single core). This theory was put forward as a way to avoid dependency issues. Further, it is important to note that a thread != a core, so you won't get double the performance. Hyper-threading on Intel chips seems to yield at best around a 15% improvement in ideal situations (from what I have seen) and frequently no gain at all, because its goal is to "back fill" idle time on the primary thread. In-order chips like the PPC cores in the XeCPU and CELL may see more significant gains, but we have limited info on that, so for this discussion let's keep it simple and just discuss cores (unless a developer working on the machines thinks it is relevant, in which case go ahead!).
Focus:
CELL. Why? It has the most processing cores. More cores theoretically means more work getting your application to (a) synchronize and avoid stalls, and (b) more pieces into which you are required to divide your code. If a core goes unused and/or underutilized, the peak performance of the processor is significantly reduced: 8 processors running at 50% are no better than 4 processors running at 100%. CELL also has two distinct types of processing element, unlike the other processors, so focusing on CELL answers questions applicable to both camps.
In the grand scheme of things we all know that the next-gen console CPUs won't hit peak performance in any game, ever. That is a given. Just as relevant, though, is that for the XeCPU and CELL to reach their maximum potential they must have all their cores actively running at peak capacity--that means 3 and 8 cores, respectively, running full bore all the time with no hiccups. That means:
No idle cores with no assigned tasks.
No cores waiting for other cores to produce data.
No cores running a single non-intensive thread that leaves unused potential.
Since CELL has the most processors, it would appear to be the most prone to performance issues involving the other cores (it also happens to be the chip with the most potential!), so I would like to focus on the CELL microprocessor in my questions for developers:
Question: What is the developers' plan of attack for dealing with CELL? Some of my specific questions are:
1. How are developers attacking the problem of data dependency? If "core 0" is waiting for "core 1" to return data, then "core 0" is being inefficient and the theoretical peak performance of the processor takes a SIGNIFICANT hit. The more processors, the more potential for a mess. So how is this issue being addressed?
2. How do developers plan to divide their code onto 8 processors? The issues with breaking code apart have been noted here--it is not as simple as putting physics on one core, AI on another, the game code on another, and so on. So what are some current thoughts on how to tackle this? Unused processors mean unused potential! (One commonly discussed approach is sketched right after this list.)
3. Efficiency. Will it be difficult to find tasks for all 8 processors to work on at the same time? For example, if a processor is dedicated to particle effects and none are on screen at that moment, that core sits idle. The same could apply to physics, AI, whatever: cores dedicated to tasks a scene isn't using sit idle while another task completely saturates its own core. The load may even ping-pong back and forth. Can this be avoided?
4. How many threads is "enough"? It occurs to me that 8 threads may not be enough, for two reasons. One is threads that are "on/off" like the particle effects mentioned above; another is low-intensity tasks like some sound engines. So is there a need for more threads than cores?
5. SPEs. It seems to me one solution is to have more than one thread per core (e.g. two low-intensity threads that together make better use of the core, or two threads that alternate between idling and running hard and so play nicely with each other). Do the SPEs realistically have the memory to handle two or three threads? Another option seems to be swapping code and data in from main memory. Is that fast enough to avoid the dependency issues? Is the latency low enough for this kind of behavior without causing stalls (see #1)? (Lots of general questions here, but they all boil down to inefficiency: how do you plan to deal with it? The second sketch below shows the sort of double buffering I have in mind.)
6. Does the asymmetric nature of the processor raise any significant hurdles? The SPEs are full-featured cores, but as some developers here have noted they are not great performers in certain areas. Will you be forced to run code on the SPEs that keeps them from reaching their full potential, code you would rather run on an additional PPE if one were available? Does dealing with two distinct core types (and their different cache and local store sizes) cause serious problems with code portability, or is it a minor issue because both accept compiled code and performance appears to be acceptable on either?
7. Any other areas or issues of concern you are facing or have already overcome.
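To make questions 2 and 3 a little more concrete, the approach I keep reading about is to stop dedicating whole cores to subsystems and instead break each frame's work into many small, independent jobs that every core pulls from a shared queue, so no core sits idle while work remains. Below is a minimal sketch I put together to illustrate the idea--plain C with pthreads, and the names, worker count, and job count are all made up--so treat it as an illustration of the pattern, not how any actual console SDK does it.

[code]
/* Minimal shared job queue sketch: whichever worker thread is free grabs
   the next job, instead of pinning "physics" or "AI" to a fixed core.
   All names and counts here are made up for illustration. */

#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4      /* pretend these are our cores */
#define NUM_JOBS    32     /* many more jobs than cores, so nobody idles */

typedef struct {
    int id;                /* which chunk of work this job represents */
} Job;

static Job jobs[NUM_JOBS];
static int next_job = 0;                        /* next unclaimed job */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for real work (a batch of particles, a group of AI agents, ...) */
static void run_job(const Job *job, int worker)
{
    printf("worker %d ran job %d\n", worker, job->id);
}

static void *worker_main(void *arg)
{
    int worker = (int)(long)arg;
    for (;;) {
        Job *job = NULL;

        /* Pull the next job off the shared queue, if any remain. */
        pthread_mutex_lock(&queue_lock);
        if (next_job < NUM_JOBS)
            job = &jobs[next_job++];
        pthread_mutex_unlock(&queue_lock);

        if (!job)
            break;          /* queue drained: this frame's work is done */
        run_job(job, worker);
    }
    return NULL;
}

int main(void)
{
    pthread_t workers[NUM_WORKERS];
    int i;

    for (i = 0; i < NUM_JOBS; i++)
        jobs[i].id = i;

    /* One worker per core; each keeps pulling jobs until none are left. */
    for (i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker_main, (void *)(long)i);
    for (i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);

    return 0;
}
[/code]

With many more jobs than cores, the uneven "on/off" tasks from question 3 should average out across whichever cores happen to be free at the time.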
Obviously multi-core environments force programmers to look at problems in new ways. Scheduling, dependency, and efficiency are important pieces of the puzzle to bring together. I am interested to hear how this problem is being attacked in general, and any feedback on what we can realistically expect over the next couple of years.
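On the dependency and latency side (questions 1 and 5), the pattern that seems to come up for the SPEs' small local memories is double buffering: while the core works on one buffer, the next chunk of data is already being transferred in from main memory, so the wait is hidden behind useful work instead of stalling the core. Here is a rough sketch of the idea; start_transfer() and wait_transfer() are hypothetical stand-ins for an asynchronous DMA interface (implemented here with plain memcpy so the example compiles and runs anywhere), and the chunk sizes are arbitrary.

[code]
/* Double-buffered streaming sketch: while the core processes one buffer,
   the transfer of the next chunk is (conceptually) already in flight.
   start_transfer()/wait_transfer() are hypothetical stand-ins for a real
   async DMA interface; here they just memcpy synchronously. */

#include <stdio.h>
#include <string.h>

#define CHUNK   256                 /* elements moved per transfer */
#define CHUNKS  8                   /* total chunks to stream through */

static float main_memory[CHUNK * CHUNKS];   /* pretend "main memory"     */
static float local_buf[2][CHUNK];           /* pretend "SPE local store" */

/* In real code this would kick off a transfer and return immediately. */
static void start_transfer(float *dst, const float *src, int n)
{
    memcpy(dst, src, n * sizeof(float));
}

/* Would block until the DMA for this buffer completes; a no-op here
   because start_transfer() copied synchronously. */
static void wait_transfer(int buf_index) { (void)buf_index; }

static float process(const float *buf, int n)   /* stand-in for real work */
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += buf[i];
    return sum;
}

int main(void)
{
    float total = 0.0f;
    int cur = 0;

    for (int i = 0; i < CHUNK * CHUNKS; i++) main_memory[i] = 1.0f;

    /* Prime the pipeline: fetch chunk 0 into buffer 0. */
    start_transfer(local_buf[cur], &main_memory[0], CHUNK);

    for (int chunk = 0; chunk < CHUNKS; chunk++) {
        int nxt = cur ^ 1;

        /* Kick off the *next* chunk before touching the current one. */
        if (chunk + 1 < CHUNKS)
            start_transfer(local_buf[nxt], &main_memory[(chunk + 1) * CHUNK], CHUNK);

        /* Wait only for the current buffer, then work on it while the
           next transfer is (conceptually) still in flight. */
        wait_transfer(cur);
        total += process(local_buf[cur], CHUNK);

        cur = nxt;              /* swap buffers for the next iteration */
    }

    printf("total = %f\n", total);
    return 0;
}
[/code]

The key point is that the transfer for chunk N+1 is started before the work on chunk N begins, so in the real asynchronous case the core and the DMA engine stay busy at the same time.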
PS: For this discussion I want to avoid the following topics:
1. How the XeCPU, CELL, and PC multi-core solutions vary (unless it is directly related to how developers plan to overcome the issues of a multi-core environment, e.g. an approach that works on some platforms but not others). No "Microprocessor X sucks! My favorite CPU kicks its butt!!!111"
2. HDD
3. Killzone
4. Allard
Thanks