Be thankful for Cell ... *mitosis*

What do they teach now? Python?

Apart from learning to design your own simple assembly language, I thought it was typically whatever language made sense five years ago. In my day that was Modula 2, when it was clear to all of us it should have been Java. I imagine it is still Java? We also did Prolog btw.
 
Apart from learning to design your own simple assembly language

The PDP11 is an already-designed "simple assembly" language. That was the whole purpose of using it in CS courses.
I.e. I'm referring to "computer design" related courses and not to the "programming" ones, where obviously the PDP11 has no place.
I think that for a good CS degree the student should learn at least one assembly, one functional language, one strongly typed one and one loosely typed one.
 
I learned 68000 and 386 assembly, plus C++, C, Pascal, and the various Scheme-style and symbolic languages like Lisp and Prolog, and also a bit of Ada, Maple, and Matlab in the early 1990s.
 
That reminds me of my last Atari, the failing Falcon, with a 68030 and its 56001 DSP coprocessor. I tried to learn how to code for that but failed miserably.
 
My first class on assembly was on 8085 and i960 microcontrollers with serial LCD displays, and we had one overview lab with the 8086 and 68000. The whole class learned very quickly. It gave a nice overview of everything that is useful. We could just borrow a kit and work at home if we had a computer to connect the board and EEPROM programmer. Then we had a class on OrCAD and made our own protoboard.

A friend of mine was in CS and worked way too much on a VAX. They also did a crazy amount of COBOL. It seemed like a useless bunch of courses and a waste of time and money. This wasn't the '70s, it was the early '90s. Some places just suck at teaching or hold on to old crap like the PDP11 and VAX. There's no excuse.
 
Do you routinely program 8085 or i960 right now?
I will ignore your misdirection, and your use of the words "routinely" and "right now"...
My point was that we didn't have to put a VAX in our backpack, because a small 5"x6" board was equally if not more helpful. One with a really simple CISC as a stepping stone toward the 80x86, and the other with a nice orthogonal RISC.

But you'd be surprised how long the i960 instruction set stayed relevant in the 90's.

At my first job, we started a project with an 80296SA, very similar to the i960 instruction set. Then cost factors and procurement advantages made us switch to a crappy (but inexpensive) 16-bit Siemens/Infineon C166. Just another uC like most PIC, Atmel, etc... We saw Atmel in class too. All uCs are mostly the same.

One of my classmates got a job at [nameless military contractor] and worked extensively on real i960s.

I could have worked at a storage company where most RAID cards were built around an i960 variant.
 
Onion bus is a workaround. Around inherent GPU/CPU synchronization problem.
The Onion bus is what allows the GPU to work in the same memory space as the CPU. It exists because what the GPU does in its own domain is neither acceptable nor safe for the rest of the system without constraints.
Should the Onion bus ever go away, it will be because the GPU's comparatively primitive memory model has learned to operate up to the requirements that CPUs have met for decades.

In fact any modern game needs CPU just for one thing: reading the inputs. Everything else could and should be done on GPU.
Which GCN instruction initializes the microcode engines for OS control?
Which shader program constructs and maintains the page tables?
Can you give the general properties of a shader that saves your game?

The question here is not "which hardware has multithreading", but "which hardware forces you to utilize it".
I am not sure which architecture in this set of architectures made it impossible to write a highly serial application.
As noted, some opted to run on the PPE early on and Cell didn't line up the devs and shoot them.

But you can do that, and it will be done, in future titles.
The level of control is not as pervasive as assumed. The GPU's execution of its runlists is not under direct programmer control, nor under the kernel's.
Compute has context switching and preemption available for the current GCN generation.
Carrizo will be the first example of the next step, where the graphics subsystem is also fully preemptible.

The SPE model did not support preemption well, and that is something GCN's compute had to improve on. The "kernel runs uninterrupted to completion" methodology was not acceptable for the direction APUs are going in. That preemption and QoS exist in a platform means individual contexts can request, but not demand, that the system run their workloads as they desire.
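To make the scheduling difference concrete, here is a rough, entirely CPU-side sketch with made-up names (none of this is real GCN or Carrizo machinery): a run-to-completion worker gives the scheduler nothing to work with, while a cooperatively chunked one can at least be asked to step aside.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical illustration of "run to completion" vs. cooperative preemption.
std::atomic<bool> preempt_requested{false};

// SPE-style: once started, the rest of the system can only wait for it to finish.
void run_to_completion(std::vector<float>& data) {
    for (float& x : data)
        x *= 2.0f;                      // no point at which anyone can step in
}

// Preemption-friendly style: work in slices, check for a higher-priority request.
void run_preemptible(std::vector<float>& data) {
    constexpr std::size_t slice = 4096;
    for (std::size_t i = 0; i < data.size(); i += slice) {
        const std::size_t end = std::min(i + slice, data.size());
        for (std::size_t j = i; j < end; ++j)
            data[j] *= 2.0f;
        if (preempt_requested.load(std::memory_order_relaxed))
            return;                     // save progress and yield to the system
    }
}
```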

On the low level the fixed function pipeline can only do that. Everything else is done by a GPGPU part. Which in fact is a "modern CPU" (without all the x86 baggage).
There is more context to a GPU than the vertices of a triangle. How would it even know what to do with them without some kind of context telling it how they should be transformed?
There also seems to be some confusion between "baggage" and "not being unacceptably primitive", and the ISA cruft is as relevant as the fuzzy dice hanging from someone's mirror are to the operation of the transmission. What does this modern CPU with no branch prediction, interrupt handling, or privileged access do when it tries to read data that is not resident in RAM?


So, in some sense the current PS4 hardware is closer to PS2, than to PS3.
By those words, the PS4 walked back from the direction the PS3 took, in a sense.

So what? There is no need for millions of low level game programmers, but without them nothing would happen.
I think we do not have the same definition of practicality, to the point that I do not follow this.

So what? PDP11 is also a thing of the past, why universities teach that to CS students?
Is this a question of which machine is more dead? Do more schools teach how to program Cell than PDP11?
 
In fact any modern game needs CPU just for one thing: reading the inputs. Everything else could and should be done on GPU.
How on earth is that in any way a "fact"? GPUs are not suited for running general, or branchy code, including things like AI, or even some physics workloads.

Just because you can write and post something on a web forum doesn't make it so. :p
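For anyone wondering what "not suited for branchy code" means in practice, here is a simplified model (no real GPU ISA, just the idea): on a SIMD/SIMT machine, lanes that disagree on a branch force the whole group through both paths with masking, so the cost is roughly the sum of the two paths rather than whichever one was taken.

```cpp
#include <array>
#include <cmath>

// Simplified model of SIMT divergence: one "wavefront" of 8 lanes.
constexpr int kLanes = 8;

void divergent_kernel(std::array<float, kLanes>& v) {
    for (int lane = 0; lane < kLanes; ++lane) {
        if (v[lane] > 0.0f)
            v[lane] = std::sqrt(v[lane]);    // taken by some lanes...
        else
            v[lane] = -v[lane] * v[lane];    // ...while the rest take this one
        // On a CPU each element pays for one path. On SIMT hardware, whenever
        // the condition is mixed across the group, the group serializes through
        // both paths with inactive lanes masked off.
    }
}
```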
 
But you'd be surprised how long the i960 instruction set stayed relevant in the 90's.

I know that. Even the first "GPUs" (on SGI machines) used the i960.

GPU's comparatively primitive memory model

And why exactly is a more sophisticated memory model needed in a game console?
To switch into the menu faster?

Which GCN instruction initializes the microcode engines for OS control?
Which shader program constructs and maintains the page tables?
Can you give the general properties of a shader that saves your game?

All of these are one-time setup tasks, which can be considered baggage as well.
Right now you need to feed the CP the same commands every frame, which is a bigger problem. But that just means the drivers/APIs are bad, and we know that already.
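If the complaint is about re-feeding the command processor an identical stream every frame, the software-side answer looks roughly like recording once and replaying. A purely illustrative, API-agnostic sketch; none of these types correspond to a real driver interface.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical command-buffer reuse: record once, replay every frame.
struct Command { std::uint32_t opcode; std::uint32_t payload; };
using CommandList = std::vector<Command>;

CommandList record_static_scene() {
    CommandList cl;
    cl.push_back({0x01, 42});   // e.g. "set pipeline state"
    cl.push_back({0x02, 7});    // e.g. "draw batch 7"
    return cl;                  // built once, outside the frame loop
}

void submit(const CommandList& /*cl*/) {
    // stand-in for handing the buffer to the command processor
}

void run_frames(int frames) {
    const CommandList cl = record_static_scene();   // recorded once
    for (int f = 0; f < frames; ++f)
        submit(cl);                                 // replayed, not rebuilt
}
```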

made it impossible

Competitive image quality.

What does this modern CPU with no branch prediction, interrupt handling, or privileged access do when it tries to read data that is not resident in RAM?

Right now, or in future hardware? In the future it can DMA from a specific controller/CPU.
P.S. and continuing the car analogy: GPU is the engine, and CPU is the starter motor, something like that.

GPUs are not suited for running general, or branchy code, including things like AI, or even some physics workloads.

CPUs are equally not suited. But modern CPUs have a lot of tricky hardware around them to make it less painful to run "branchy code".
And the solution for both GPU or CPU is to stop writing branchy code (both will have better performance and the world will be a better place, because no, single-thread performance is not gonna improve anymore, it's a dead end).
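For what it's worth, "stop writing branchy code" usually cashes out as something like the following generic sketch: replace a data-dependent branch with a select so every iteration (or SIMD lane) does the same work.

```cpp
#include <algorithm>
#include <vector>

// Branchy version: which statement runs depends on the data, which is exactly
// what SIMD lanes and branch predictors dislike.
float sum_clamped_branchy(const std::vector<float>& v, float limit) {
    float sum = 0.0f;
    for (float x : v) {
        if (x > limit) sum += limit;
        else           sum += x;
    }
    return sum;
}

// Branchless version: same result, written as a select (min), which typically
// maps to min/cmov/vector instructions instead of a jump.
float sum_clamped_branchless(const std::vector<float>& v, float limit) {
    float sum = 0.0f;
    for (float x : v)
        sum += std::min(x, limit);
    return sum;
}
```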
 
@AlNets is B3D's gift to humanity. I might change my name again to Alnets's Bitch. Sorry Shifty, dear. Wait, is it Alnet's Bitch or Alnets' Bitch? Or The Bitch Previously Known As LB And Shifty's Bitch And Now Exclusively Belonging To Alnets?

:runaway: :runaway: :runaway: :runaway:

“Dont switch the blame to Cell, that's the oldest trick of all lazy devs”

“The sight of a platinum trophy was the greatest gift Kutaragi could offer to gamers.”

“Only the Gamer who does not need it, is fit to inherit DLC - the publisher who would make its own fortune no matter where it started. If a Gamer is equal to his money, it serves him; if not, it destroys his wallet.”

“To live, Gamers must hold three things as the supreme and ruling values of his life: Graphics - Fun - Sociability. Graphics, as his only tool of knowledge - Fun, as his choice of the happiness which that tool must proceed to achieve - Sociability, as his inviolate certainty that his mind is competent to think and his person is worthy of happiness, which means: is worthy of living."
 
CPUs are equally not suited.
Demonstrably not true! OoOE, branch prediction and just generally a much more robust, powerful caching system amongst other things make CPUs considerably faster at these types of workloads. This is not surprising or strange; they've had decades of practice getting better at it! ;) GPUs, as modern, programmable devices, are still comparatively immature and lacking in both features and capabilities.

...That doesn't mean they'll suddenly gain a lot of CPU capabilities though, as that would run counter to the purpose of a GPU as a lean, mean computing machine. Adding general purpose code execution hardware to GPUs would bog them down and reduce their peak performance capabilities (transistors spent on boosting sequential code capability means fewer transistors available for parallel workloads.)
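A concrete illustration of why those decades of practice matter is the classic sorted-versus-unsorted experiment: the work and the data are identical, only the predictability of the branch changes. A rough sketch; exact numbers depend on the machine, and an aggressive compiler may turn the branch into branchless/vector code and flatten the difference.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Same work, same data, same branch -- only its predictability differs.
// On a typical OoO core the sorted pass runs noticeably faster because the
// predictor learns the pattern.
long long count_big(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v)
        if (x >= 128) sum += x;       // the branch under test
    return sum;
}

int main() {
    std::vector<int> data(1 << 22);
    std::mt19937 rng(42);
    for (int& x : data) x = rng() % 256;

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        long long s = count_big(data);
        auto t1 = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%s: sum=%lld, %lld us\n", label, s, static_cast<long long>(us));
    };

    time_it("unsorted (unpredictable branch)");
    std::sort(data.begin(), data.end());
    time_it("sorted   (predictable branch)");
}
```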

And the solution for both GPU or CPU is to stop writing branchy code
Right! Just...handwave the problem away! And how do you propose to do that? Many problems have no known solution that isn't sequential/branchy. There's Amdahl's Law interfering too, and so on. Lots of really, really smart people have thought hard about these problems and not found solutions (yet), which is why CPUs are still essential for computing.

But you say, just stop writing branchy code. Absolutely! Everybody will now stop, immediately! Why didn't they ever consider this option until you came along? ;)
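Since Amdahl's Law came up above, the arithmetic is short enough to show: if a fraction p of the work parallelizes across N units, the speedup is 1 / ((1 - p) + p/N), so even a "95% parallel" program tops out at 20x no matter how wide the GPU is. A small worked example:

```cpp
#include <cstdio>

// Amdahl's Law: speedup(N) = 1 / ((1 - p) + p / N)
// p = parallel fraction of the work, N = number of processing units.
double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    const double widths[] = {4, 16, 64, 1024};
    for (double p : {0.50, 0.95, 0.99}) {
        for (double n : widths)
            std::printf("p=%.2f  N=%6.0f  speedup=%6.2f\n", p, n, amdahl(p, n));
        std::printf("limit as N->infinity: %.1fx\n\n", 1.0 / (1.0 - p));
    }
}
```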
 
Was the CELL processor something like an APU for its time, in that it could do CPU and GPU tasks well at the same time, with developers managing to do async compute on it?
 
OoOE, branch prediction and just generally a much more robust, powerful caching system

All of these are cruft. I don't know why it's so hard to understand: CPU and GPU ALUs suck equally at all these workloads. But CPUs have a lot of machinery that tries to improve on that, by spending a precious 80-90% of the die size on things that are as foreign to "processing" as it gets. And all these things do not make the CPU "run faster", they just make bad code suck less (you will get 50% utilization instead of 5% utilization, for example). And code that doesn't use 100% of the CPU's processing power is bad, by definition.

And how do you propose to do that?

First of all, admit that the problem exists, and that all code that isn't parallel is bad code and needs to be avoided at all costs.
When every programmer knows that, and it's taught in universities, then a paradigm shift will happen and there will be solutions (i.e. more and more parallel algorithms).
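In fairness, the "more and more parallel algorithms" part is already partly mainstream: C++17, for example, lets you ask the standard library for a parallel version of a loop-shaped computation. A minimal sketch; the parallel execution policy only does something on a toolchain that actually implements it (MSVC, or libstdc++ built against TBB).

```cpp
#include <execution>
#include <numeric>
#include <vector>

// Sequential and parallel versions of the same reduction.
double sum_sequential(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

double sum_parallel(const std::vector<double>& v) {
    // std::reduce may reorder the additions, which is what permits parallelism.
    return std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
}
```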

Why didn't they ever consider this option until you came along?

I don't know why, but from this thread it's pretty apparent to me that people do not consider it seriously even now.
Like you, above: "we can still extract performance, using all the crappy tricks". :)
 
And why exactly is a more sophisticated memory model needed in a game console?
The first answer is that it's what we've already gotten.
There is no tractable development target CPU architecture that has an incoherent memory hierarchy that barely maintains the order of reads from writes in the same thread.
Reasoning through the development, validation, and use of established operating systems and system architectures means getting a sophisticated memory system, whether the game knows it or not.
That's the hardware and software architecture the GPU gets to hook into.

Low-overhead communication and command submission means being able to interact with memory locations in that sophisticated protected portion, rather than have communications going through an intermediary system process that performs copies, patches addresses, and notifies the consumer of the communication. If you want the GPU to be able to handle that memory properly, it has to play by those rules.
Getting things wrong is a fast way to have the system shut everything down.
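To make "play by those rules" concrete: even the simplest low-overhead handshake between two agents sharing memory leans on the coherence and ordering guarantees that memory system provides. A generic CPU-side sketch (no GPU specifics assumed) of the producer/consumer flagging that only works because stores are kept visible and ordered:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Producer writes a payload, then publishes it with a release store.
// The consumer spins on an acquire load; once it sees 'ready', the payload
// written before it is guaranteed to be visible too. This is the kind of
// contract an agent has to honor to share memory directly instead of going
// through a copying intermediary.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                    // write the data first
    ready.store(true, std::memory_order_release);    // then publish the flag
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    std::printf("consumer saw payload = %d\n", payload);   // prints 42
}

int main() {
    std::thread c(consumer), p(producer);
    p.join();
    c.join();
}
```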

To switch into the menu faster?
It keeps bad code in one of those contexts from stomping on the data of the other, and it secures the OS(s) and hypervisor from erroneous or malicious code.
It maintains protections, simplifies linking and communication between modules and libraries, and keeps the code from having to be rewritten because some instruction sequence in one place bumped things around a little bit in the stream, unlike what a certain local store architecture had a habit of doing.

All of these are one-time setup tasks, which can be considered baggage as well.
Right now you need to feed the CP the same commands every frame, which is a bigger problem. But that just means the drivers/APIs are bad, and we know that already.
We've just gone to the effort of making sure that the GPU only has enough to process one triangle at a time, and nothing else.
And a digest of the system's carry-over state between frames.

Right now, or in future hardware? In the future it can DMA from a specific controller/CPU.
The future hardware is the fault mask used by Carrizo to detect when there's a fault and the GPU asks the grownup hardware to fix the booboo. After that is unclear.

P.S. and continuing the car analogy: GPU is the engine, and CPU is the starter motor, something like that.
If we want to torture the metaphor: In the cars I know of, the starter doesn't kick in whenever the driver taps the clutch, turns the wheel, or it starts raining.
Why not call the CPU the water pump of the GPU's gasoline engine? In both cases, the system can operate for about as long without it.

CPUs are equally not suited. But modern CPUs have a lot of tricky hardware around them to make it less painful to run "branchy code".
So they're equally not suited, just that one does better when it encounters a branch in real life. I'm reading this right, right?

And the solution for both GPU or CPU is to stop writing branchy code
I tried adding that to my process.

IF code.is.branchy then...

...then...
..
. .

.
 