Importance of assembler knowledge for console devs (and its education) *spawn

Crossbar

1) Engineers that know how to properly optimize for In-Order cores are very expensive and hard to find.
I really don't get this. You're talking about assembly programmers, right? In what way are these assembly programmers so special? Ordering instructions to avoid dependencies must be the objective of any assembly programmer to make the CPU fire on all (n-issue) cylinders, regardless of whether the CPU is in-order or OOO. You can't really depend on the re-ordering pipeline if you really want to optimise your code in assembly as I see it, or if you do, you'd better know in detail how it works, its length, etc.
Am I missing something?

2) Certain In-Order designs (such as the X360/PS3 PPU) have fundamental flaws that, no matter how clever the engineer or how much time is spent optimizing, will still stall all over the place.
Yeah, I remember this post of yours. That sounds like a really shitty design, and are you telling us that the compilers are still not helping in these situations by at least giving you a warning?

3) Out-of-Order cores tend to deal much better with compiler generated code - especially true for crappy compilers.
This is certainly true. In order execution puts more work on the compiler.

4) Out-of-Order cores tend to run legacy code better (and who doesn't have legacy code these days)
To be fair, there should be a considerable amount of legacy code optimised for the in-order PPE core in the PS3 and 360 by now, considering they've been on the market for 5 years.

So yeah, as a developer I'll take an Out-of-Order core any day.
True, but as a programmer I would also rather replace a 4-core CPU with a 4-times-faster single-core CPU. Whatever CPU design is chosen for the next-gen consoles, it will likely depend heavily on what gives the best IPC/die area within a reasonable power envelope and at a decent frequency. It will be really interesting to see how things turn out this time.
 
I really don't get this. You're talking about assembly programmers, right? In what way are these assembly programmers so special? Ordering instructions to avoid dependencies must be the objective of any assembly programmer to make the CPU fire on all (n-issue) cylinders, regardless of whether the CPU is in-order or OOO. You can't really depend on the re-ordering pipeline if you really want to optimise your code in assembly as I see it, or if you do, you'd better know in detail how it works, its length, etc.
Am I missing something?

I wasn't talking about assembly programming necessarily, but mostly about engineers who understand the low-level implications of their code: things like instructions, caches, data structures, concurrency and so on.
You might not believe it, but sadly it is very hard to find people who have the knowledge and skills for that kind of work.

That sounds like a really shitty design, and are you telling us that the compilers are still not helping in these situations by at least giving you a warning?

The compiler, no. There are performance tools that can tell you (PIX, SN Tuner, etc.), but again you'll need the skills to read the generated code and understand what the problem is.
And as I mentioned, with the PPU design you often can't do anything about it: from Load-Hit-Stores, to microcoded instructions, to branch stalls, to slow atomic operations and even slower cache misses.
Most of these can be taken care of with an OOO design.

To be fair, there should be a considerable amount of legacy code optimised for the in-order PPE core in the PS3 and 360 by now, considering they've been on the market for 5 years.

Yes, but that same code will still run faster on an Out-of-Order chip.
 
I wasn't talking about assembly programming necessarily, but mostly about engineers who understand the low-level implications of their code: things like instructions, caches, data structures, concurrency and so on.
You might not believe it, but sadly it is very hard to find people who have the knowledge and skills for that kind of work.
Yes, most C++ (or god forbid script) programmers do not have enough intimate hardware knowledge to optimize their code to circumvent all the various LHS stall cases. I have been trying to write some kind of simple guide covering the most common LHS cases. It's easy to tell programmers to avoid integer/float/vector casts (causing an LHS because the variable needs to be transferred through the memory subsystem), or to avoid updating class member variables in tight loops ("this" is a pointer, and all member variables are thus accessed through a pointer and not kept in registers -> lots of LHS possibilities)... but it's harder to explain things like function prologue/epilogue LHS stalls to C++ game programmers who have no assembler or CPU architecture knowledge.

I personally find caches easier, since good data structures (cache-line-aligned bucketed lists, for example) can automatically be made to prefetch properly when iterating through them. As long as the technology programmers provide all the data structures, the higher-level C++ game programmers do not have to interact so much with the low-level hardware details.

I really don't get this. You're talking about assembly programmers, right? In what way are these assembly programmers so special? Ordering instructions to avoid dependencies must be the objective of any assembly programmer to make the CPU fire on all (n-issue) cylinders, regardless of whether the CPU is in-order or OOO.
There's only a very limited pool of programmers capable of writing efficient assembly code, and only a (small) subset of these are interested in game programming and/or have any knowledge about console CPUs. For example, here in Finland we only have four (if I count correctly) game development companies that have released PS3 games. All are pretty small compared to big international companies, so each has maybe 5 programmers capable of writing efficient PS3 assembly code. They don't teach (mandatory) assembly programming at universities anymore, and most of the assembly programming taught at universities is targeted towards OS programming and embedded systems (microcontrollers). Most of the professionals who understand performance-critical assembly programming are self-taught.

All the high level languages (Java, C#, etc) that they now use to teach programming in schools/universities instead of good old C/ASM are not making things better. The industry needs programmers capable of writing efficient assembly (esp. vector instructions).
 
Last edited by a moderator:
They still teach a mandatory assembly class at the university I went to, or did 3 years ago or so when I took it, though it was MIPS. Unfortunately, they don't start teaching multithreading hazards, the benefits of OOE and caches until the masters-level courses. The only other exposure to multithreading in the mandatory classes was my OS class, and that was only about threading in general and spin locks, state changes, etc...
 
Crossbar said:
I really don´t get this. You are talking assembly programmers right?
The notion that any optimization problem in games is solved by throwing assembler and mad scientists at it is, frankly speaking, more than a little silly - even if we DID have armies of those types of guys out there.

Any sufficiently large codebase* will inevitably lead to mostly unoptimized code in production-critical paths (usually coupled with largely flat profiling graphs, because most of it will be equally slow).


*grown through "standard" production deadlines.
 
sebbbi said:
All the high level languages (Java, C#, etc) that they now use to teach programming in schools/universities instead of good old C/ASM are not making things better. The industry needs programmers capable of writing efficient assembly (esp. vector instructions).
Java is like opium, it makes people dumb.
The final exam of any programming education should include a part where the student has to port a randomly chosen sorting algorithm written in Java to a randomly chosen assembly language and make it work! They should complete it within 8 hours on a computer with an emulator.

The notion that any optimization problem in games is solved by throwing assembler and mad scientists at it is, frankly speaking, more than a little silly - even if we DID have armies of those types of guys out there.

Any sufficiently large codebase* will inevitably lead to mostly unoptimized code in production-critical paths (usually coupled with largely flat profiling graphs, because most of it will be equally slow).

My post was stated as a question concerning what Barbarian meant by "1) Engineers that know how to properly optimize for In-Order cores are very expensive and hard to find", because I thought assembly programmers for in-order and OOO cores are equally hard to find, and they probably are. He could probably have written that "programmers with low-level knowledge are hard to find, and the PPE cores of the PS3 and 360 benefit greatly from them".

If your profiling graphs are flat, good for you; you've probably picked the low-hanging fruit and optimised the critical loops. Sounds pretty normal to me. If you can avoid assembly, that is the preferred way, but in some cases it can really make a difference.

I myself avoid assembly like the plague; you can usually get pretty far with loop unrolling, inline functions and common sense. But I do know how to read the list files and can predict them pretty well, though the compilers do surprise me at times. ;)
 
Java is like opium, it makes people dumb.
The final exam of any programming education should include a part where the student has to port a randomly chosen sorting algorithm written in Java to a randomly chosen assembly language and make it work! They should complete it within 8 hours on a computer with an emulator.
Why would I want everyone to write some random algorithm in assembly? Most of us who don't write console games, and even most of the people who do, don't have to be able to write assembly, especially since most compiler optimizations are going to outperform most of the asm we can write by hand. Sure, there are places where the compiler can't guess your intention and you're the only one who can force behavior that you know is safe but the compiler doesn't.

What Barbarian most likely meant* was that you should be aware of what's going on underneath the C/C++ code you're looking at. Calling one method isn't equal to calling some other method if one of them is virtual. There are tons of little things like this that a good programmer should be aware of, and that has little to do with writing asm. Yeah, perhaps that's kinda related to reading and understanding asm, but it's more about knowing how the computer works. And that's not assembly, that's architecture.

*just a guess though
 
Why would I want everyone to write some random algorithm in assembly? Most of us who don't write console games, and even most of the people who do, don't have to be able to write assembly.

My point is that if you just once in your education had to write some assembly, you might actually find it easier to read a list file and make some wise decisions when trying to write efficient code if you later run into performance problems. Perhaps even better, you may attempt to write efficient code right from the start.

Some students leaving school are so up in the clouds and have little to no knowledge of what's really going on under the hood, and I think that is a bad thing per se. But that is just my opinion, feel free to disagree.
 
My point is that if you just once in your education had to write some assembly, you might actually find it easier to read a list file and make some wise decisions when trying to write efficient code if you later run into performance problems. Perhaps even better, you will attempt to write efficient code right from the start.

The problem is that you equate assembly programming to efficient programming for the general case. That is just wrong.

There are only two reasons for using assembler: 1) the scheduler of the compiler doesn't do a proper job, or 2) the CPU has some special functionality which isn't exposed in your high-level language (SIMD add/mul/permute).

In these cases it can make a big difference to use assembler. Reason 1 should be solved by compilers and micro-architecture (OOO); reason 2 should be solved by languages (proper vector support, OpenCL/CUDA).

Time is scarce at university; performance programming should focus on:
1. Data structures.
2. Picking the right programming language.
3. Optimizing for general hardware features - caches.

Some students leaving school are so up in the clouds and have little to no knowledge of what's really going on under the hood, and I think that is a bad thing per se. But that is just my opinion, feel free to disagree.

I agree, but not for performance reasons. People not understanding why 32-bit integers have limited range, or that you can't use floating-point values to accurately express decimal fractions, is worse (having worked on financial applications, the latter is a lot worse).

Cheers
 
The problem is that you equate assembly programming to efficient programming for the general case. That is just wrong.

There are only two reasons for using assembler: 1) the scheduler of the compiler doesn't do a proper job, or 2) the CPU has some special functionality which isn't exposed in your high-level language (SIMD add/mul/permute).

In these cases it can make a big difference to use assembler. Reason 1 should be solved by compilers and micro-architecture (OOO); reason 2 should be solved by languages (proper vector support, OpenCL/CUDA).

Time is scarce at university; performance programming should focus on:
1. Data structures.
2. Picking the right programming language.
3. Optimizing for general hardware features - caches.

I was considering writing a piece about how problems can be optimised on so many levels; in retrospect I obviously should have, but thanks for the complementary information.
I am not saying that assembly is the holy grail (I even wrote that I avoid it like the plague, can I be much clearer?), but understanding it helps a lot when you run into certain performance problems. If you understand assembly, the threshold for understanding cache prefetch instructions, memory alignment and such will certainly be lower as well. To me, understanding the implications of assembly-level code and CPU architecture go hand in hand.

To get back on topic, I can see one reason for keeping the existing CPU cores for the PS4 and nextBox and just increasing the number of cores: it would make it possible to re-use the future die shrinks (the core logic part) of the existing CPUs for the new CPUs, which could save Sony and MS some substantial money.
 
Time is scarce at university; performance programming should focus on:
1. Data structures.
2. Picking the right programming language.
3. Optimizing for general hardware features - caches.

I'd change point 3 slightly, to say "Optimize for the memory hierarchy." Closely connected to point 1 of course.

And somewhere you have to talk about algorithms. From what I have seen from my perch in scientific computation, I see a lot of work being spent on adapting to very restrictive architectures in high performance computing. Because the only way to get at big FLOPs is to conform to limitations in data sets, communication et cetera. Simply put, the tendency is to do tons of computational work on simplistic problem descriptions, because that's the only thing that lets itself map decently to the underlying parallel hardware. As soon as you try to take more of your knowledge of the problem into account, more heuristics, more conditionals, et cetera, in short as soon as you try to make more intelligent algorithms, your utilization of the underlying hardware drops off the proverbial cliff. (YMMV and all that.)

There are balances to be struck obviously, but with time I've become less impressed by throwing FLOPS at simplistic problem descriptions, and believe more in human ingenuity and programming methods and architectures that support it. Algorithms shouldn't be taken for granted, they spring from human creativity and the educational system should make that clear.
 
If you understand assembly, the threshold for understanding cache prefetch instructions, memory alignment and such will certainly be lower as well.

This I most certainly agree with.

I'd change point 3 slightly, to say "Optimize for the memory hierarchy." Closely connected to point 1 of course.

True.

And somewhere you have to talk about algorithms.

Mea culpa. Algorithms should be part of point 1 above: data structures and algorithms. Choosing one often dictates the other.

From what I have seen from my perch in scientific computation, I see a lot of work being spent on adapting to very restrictive architectures in high performance computing. Because the only way to get at big FLOPs is to conform to limitations in data sets, communication et cetera. Simply put, the tendency is to do tons of computational work on simplistic problem descriptions, because that's the only thing that lets itself map decently to the underlying parallel hardware. As soon as you try to take more of your knowledge of the problem into account, more heuristics, more conditionals, et cetera, in short as soon as you try to make more intelligent algorithms, your utilization of the underlying hardware drops off the proverbial cliff. (YMMV and all that.)
Isn't this a result of the ridiculous focus on Linpack for HPC? Computers are measured by peak mega-bollocks, not by how effective they are at solving real problems.

Cheers
 
I'd change point 3 slightly, to say "Optimize for the memory hierarchy." Closely connected to point 1 of course.
I'd dare to say the biggest thing missing from what is taught is common sense. I've seen tons of people doing e.g. binary search over arrays of a couple dozen elements and keeping them sorted, while a simple linear search would be faster even without considering the savings you get from not having to re-sort every time something changes.
 
I'd dare to say the biggest thing missing from what is taught is common sense. I've seen tons of people doing e.g. binary search over arrays of a couple dozen elements and keeping them sorted, while a simple linear search would be faster even without considering the savings you get from not having to re-sort every time something changes.

Common sense is a superpower, didn't you get the memo?

(I started writing a post on the 3 points, then assumed algorithms were implied in data structures (tightly coupled), and memory access patterns in memory hierarchy optimisation.)
 
There are only two reasons for using assembler. 1.) If the scheduler of the compiler doesn't do a proper job or 2.) if the CPU has some special functionality which isn't exposed in your high level language (SIMD add/mul/permute).

It's not uncommon to look at the disassembly when debugging optimized builds; otherwise it can be very difficult to understand what's actually going on. If you can't at least read and understand small amounts of compiler-emitted asm, then your usefulness as an engineer is frankly limited.
 
All the high level languages (Java, C#, etc) that they now use to teach programming in schools/universities instead of good old C/ASM are not making things better. The industry needs programmers capable of writing efficient assembly (esp. vector instructions).

Java is effectively becoming what VB was in the '90s. :( We tend to hire most graduates from specialized programs, like the ETC at CMU, where students get the majority of the skills they need to be productive in the game industry.
 
I am in the middle of my degree, and if I hadn't taught myself, I wouldn't have learned a whole lot besides Java. My professor actually demanded we program in C++ for our thesis, but we aren't really forced to do it, and "disagreeing" with your professor isn't really what you want to do^^ I must say, learning C++ was worth it: not just the different syntax compared to Java, but also a lot of the compiler nitpicks (I mainly use Linux/QtCreator at home and Windows/Visual Studio at uni). Although I can read assembler, I can't really comprehend it, at least not more complex stuff. I did have some assembly programming classes in my first semester, though (MIPS32), but nothing major.

My university did, however, completely change the lower semesters to get more diversity into their studies. Now they learn a semester of Java, one of Haskell and one of C. After that, I don't know what they'll have to do, if anything.
 
I am in the middle of my degree, and if I hadn't taught myself, I wouldn't have learned a whole lot besides Java. My professor actually demanded we program in C++ for our thesis, but we aren't really forced to do it, and "disagreeing" with your professor isn't really what you want to do^^ I must say, learning C++ was worth it: not just the different syntax compared to Java, but also a lot of the compiler nitpicks (I mainly use Linux/QtCreator at home and Windows/Visual Studio at uni). Although I can read assembler, I can't really comprehend it, at least not more complex stuff. I did have some assembly programming classes in my first semester, though (MIPS32), but nothing major.

My university did, however, completely change the lower semesters to get more diversity into their studies. Now they learn a semester of Java, one of Haskell and one of C. After that, I don't know what they'll have to do, if anything.

Assembly is neat because it's kind of the opposite of OOP. Normally you can tell exactly what a specific instruction does, but it's difficult to tell what a group of instructions does, whereas with OOP it may be hard to know the exact specifics of execution of a certain statement, but you can get the gist of a block of code.
 
It's not uncommon to look at the disassembly when debugging optimized builds, otherwise it can be very difficult to understand what's actually going on. If you can't at least read and understand small amounts of compiler emitted asm, then your usefulness as an engineer is frankly limited.

True for the games industry. It is simply not a skill required for software development in general.

Cheers
 
True for the games industry. It is simply not a skill required for software development in general.

Cheers

I suppose I'd have to ask, what's your definition of "software development in general?" I'm certain it is useful for engineers at Apple working on iOS devices, at Microsoft working on Office, at Google working on Search, at Oracle working on their DB engine, etc., etc.
 