This is kind of a follow-up to threads like "end of the GPU roadmap", the "software rendering thread", or quite a few of the "Larrabee" threads.
I'm opening this thread because, of late, after reading the aforementioned threads and a presentation from DICE/Repi, quite a few questions have been raised in my head.
I'm opening it here because it affects consoles too, as it seems that both the PS4 and the next Xbox are to use GCN-based GPUs (and I'm not that comfortable posting elsewhere).
To give some background to the thread I'll go through a few "points", both to explain my POV, so others can clear up what could be misunderstandings on my part, and to give a proper direction to the discussion I hope will follow between the top members here.
First, I'm not a believer in GPGPU. In the console forum especially there is a lot of enthusiasm about HSA, but looking at the whole thing (from the outside, that is) I can't help but think that outside of a few applications it is going nowhere, and going nowhere fast.
Starting with hardware, AMD is loud about HSA, but I don't see the same level of enthusiasm coming from the other actors.
For example, both Intel and ARM have GPUs that share the same memory space as the CPU. Neither is actively trying to turn that into a major "win".
There could be reasons for that (outside of technical merit); in both cases the existing APIs do not expose that kind of feature.
Intel is more interested in selling their CPUs than anything else.
Still, I find quite a paradox here: the "game changer" in GPGPU computing is kind of already here, but nobody is really rushing to turn that tech into "the win".
There is also the cost of GPGPU on our graphics processing units. We kind of forget, but the G in GPU stands for graphics, not "Gompute".
Throughout the years, a lot of things have been introduced in our GPUs for the sake of "compute".
So far I've not been able to find anybody who would give estimates of how much it "costs us" in terms of silicon (and, by extension, retail price).
I have a lot of questions here: what is the cost of supporting integer computation? Is that really needed for the sake of graphics alone? What is the cost of ever-improving compliance with the IEEE standard (wrt precision)?
I would guess one would say "not much" on a per-feature basis, but all together it may add up to something, something "we" (consumers of realtime graphics) don't really need.
Another thing that bothers me (and could very well be a misunderstanding) is the "way" GPUs are evolving. I was reading yesterday the links Pete gave wrt the access latency of the various "on-chip memory pools"; it looks like the results are still being discussed right now, but what struck me is just how many "on-chip memory pools" there are on a GPU.
To put the two previous points together and move forward, I feel like GPUs are getting "bloated". Worse, I wonder if the way they are "growing" is really "sane". x86 CPUs were criticized for a long time because they have to deal with the x87 legacy, and legacy overall was a burden on their design.
Now, when I look at GPUs, and it is quite possibly a misunderstanding of mine, I don't think the situation is any better. In fact I think it is worse.
If I look at the evolution of AMD GPUs: first the local and global data shares were added, and the texture cache acted (or tried to act) as a ~standard cache when required. That was not good enough, so with GCN AMD introduced a proper L1 cache.
I mean, CPUs are criticized, but that looks "worse" to me than supporting x87 (or what not) going forward. It is the whole data path that is turning into a "mess".
CPUs, despite the fact that they deal with backward compatibility too, look a lot cleaner in how they have integrated functional units and evolved throughout the years.
I mean the "path" is straightforward: you have your caches, the front end, the ALUs, etc.
If I look at Intel's latest processors, "everything is shared". I mean, no matter what the CPU is to do, it gets instructions from the I$ and data from the D$, and the CPU has only one pool of instructions to be dispatched to the different execution units.
Now GPUs... as I get it, you get some things from the L1 texture cache, others from the L1, the LDS, the GDS, the constant cache, and what not.
So I wonder. CPUs are not "completely clean" either; the integration of the various units has improved with time (it was not that clean early on), but it seems to me they have always been pretty clean when it comes to the data path. And the way the memory subsystem evolved looks pretty clean to me too: they had an L1 cache, then L2, L3, etc., so a pretty vertical integration. The L1 got split (in most architectures) in two, instruction and data, but it is still pretty lean. GPUs seem more horizontal and "inferior" in how the different memory pools are added on chip, and I wonder about the impact that could have on their evolution going forward.
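To make the "many on-chip memory pools" point concrete, here is a minimal sketch in CUDA terms (shared memory standing in for the LDS, constant memory for the constant cache); the kernel, names and sizes are purely illustrative, not a claim about any particular chip:

[code]
// Minimal sketch: one GPU kernel juggling several distinct on-chip pools.
// Assumes a launch with 256 threads per block; everything here is a placeholder.
__constant__ float coeffs[16];                    // constant cache: small, broadcast-friendly

__global__ void blend(const float* __restrict__ in,   // global loads go through the L1/texture path
                      float* out, int n)
{
    __shared__ float tile[256];                   // LDS-like scratchpad, explicitly managed
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // stage data in the scratchpad by hand
    __syncthreads();                              // programmer-managed coordination, not a cache

    if (i < n)                                    // each operand comes from a different pool
        out[i] = tile[threadIdx.x] * coeffs[threadIdx.x % 16];
}
[/code]

On a CPU every one of those accesses would simply be a load through the same D$, and the hardware, not the programmer, decides what stays on chip.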
Pretty much, I wonder if there is something that could hinder the "growth/evolution" of GPUs vs CPUs.
Could it be easier to grow something general purpose (more CPU-like) than a specialized piece of hardware?
The advantage of CPUs could be that ultimately they were always general purpose; that is their main purpose and it never really changed, adding stuff was adding "bonuses", and it shows in how lean the integration looks (at least to me...). On the other hand, GPUs were about graphics, but they are not so much adding things as evolving toward something else gradually, without much of a clear view of where they are heading.
I will now try to speak about software. I have read the threads I mentioned at the beginning of this post multiple times, and I also read the other day the presentation about the 5 (25) challenges in real-time rendering looking forward.
It seems to me that developers want more and more CPU-like GPUs. I think Andrew Lauritzen's posts are pretty enlightening on the matter. The question, to me, is to what extent something that was not general purpose by design can evolve into something that really is general purpose, especially within the constraints GPUs are designed to function under (you have to run old code; for general-purposeness you need the LDS/GDS and what not; next come proper caches, but you need to keep everything for BC purposes; etc.).
If I look again at how AMD GPUs evolved, I see that from VLIW4/5 to GCN, a SIMD/CU keeps track of 5 times more threads/wavefronts than the previous architecture, for the sake of "compute".
At the same time, what I understand from reading a thread like the one about software rendering (or one of the numerous ones about Larrabee) is that TLP only gets you so far; at some point you need to exploit pretty much everything: ILP, TLP, data-level parallelism, data locality (and you want caches).
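To illustrate with a toy sketch (mine, not code from any of those threads): even a plain sum already stacks several of those layers on top of each other before any threads get involved.

[code]
// Toy reduction showing the layers beyond "just add more threads".
// Plain C++; the unroll factor and chunking strategy are arbitrary.
#include <cstddef>

float sum(const float* data, std::size_t n)
{
    // Four independent accumulators: an out-of-order core can overlap their
    // dependency chains (ILP), and a vectorizing compiler can map the loop
    // body onto SIMD lanes (data-level parallelism).
    float a0 = 0.f, a1 = 0.f, a2 = 0.f, a3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i + 0];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    for (; i < n; ++i) a0 += data[i];   // leftover tail
    return a0 + a1 + a2 + a3;
}
// TLP comes on top of this: split 'data' into cache-sized chunks (locality),
// hand one chunk to each thread, then combine the partial sums.
[/code]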
Another thing (my rough understanding) is that having many threads does not help, as workloads, even for graphics, become less and less data parallel; quite the contrary.
Back to a concrete example: what did AMD, their latest effort being GCN, do with their GPUs in their quest for "general-purposeness"? You've named it: they multiplied by 5 the number of threads/wavefronts the GPU needs in flight to function. It has turned out well, with massive wins in today's (GPU) compute performance. But looking forward, does that make sense?
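To spell out why all those extra wavefronts in flight buy compute performance today, here is a back-of-envelope sketch; the numbers are round placeholders I made up, not real GCN figures:

[code]
// Back-of-envelope latency hiding. All numbers are placeholders chosen for
// round arithmetic, not measurements of any actual GPU.
#include <cstdio>

int main()
{
    const double mem_latency_cycles = 400.0;  // hypothetical miss latency
    const double issue_interval     = 4.0;    // cycles between issues per wavefront
    // To keep the ALUs fed while one wavefront waits on memory, the scheduler
    // needs roughly latency / issue_interval other wavefronts to pick from.
    const double wavefronts_needed  = mem_latency_cycles / issue_interval;  // = 100
    std::printf("~%.0f wavefronts in flight to hide %.0f cycles of latency\n",
                wavefronts_needed, mem_latency_cycles);
    return 0;
}
[/code]

The catch is that every resident wavefront needs its own slice of registers and LDS, so "more in flight" is paid for in on-chip storage, and it only pays off as long as the workload stays that wide and data parallel, which is exactly what the previous point questions.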
I might be wrong (I'm used to it), but it looks like in their quest for "general-purposeness" GPUs evolve with a pretty "short view" of where they are heading; today's win is tomorrow's headache.
So if I look at the next generation of consoles, putting R&D costs aside for the moment, I see that both Sony and MSFT went with GCN-based GPUs. That comes at quite a cost in silicon vs the previous AMD generation of products. One could argue that GCN does significantly better at graphics, but no one knows (AMD does, I guess) which of the 50% extra transistors are responsible for the increased 3D performance, and in what proportion.
(Please pass on examples like BF3 using "compute" to improve graphics; as I see it, the compute part in BF3 is more about doing graphics work outside of the usual "graphics pipeline", quite different from texture decompression in Civilization 5, for example, or true compute workloads. By the way, if the price the GPU pays is for decompressing textures, that is stupid; dedicated hardware would make a lot more sense.)
Juniper (or a hypothetical 10-SIMD VLIW4 design) weighs in at 1 billion transistors, while the matching GCN part weighs 1.5 billion; what those 500 million transistors could have bought in terms of graphics performance alone is unknown. The savings could be even greater if manufacturers also "sat out" full compliance with the IEEE standard (precision, rounding, etc.).
We are speaking here of almost the de facto lowest end of what gamers (consumers of realtime 3D graphics) would use; imagine the figures for mid-range and high-end cards. 8O
Overall, the point of all of this is: do you think GPUs can continue to evolve the way they are, or does something need to happen? (The evolution made sense, but in some respects it is becoming a hindrance.)
There could be a split between real GPUs and truly general purpose "compute" hardware.
The truly general purpose hardware would look pretty close to current GPUs, though leaner with regard to the data path and special units. Thing is, I think it is set to lose to CPUs, and pretty soon (outside of a few workloads, which may be enough of a market/niche for quite some time).
I would expect the heirs of Larrabee and the Power A2 to (mostly) seal the fate of GPGPU computing, maybe within 5-10 years, depending on how aggressive and successful the manufacturers of such products are.
As for the GPU, a real GPU at that point could consist of something significantly slimmed down, with the bulk of the computation being handled by CPU-type units.
Either way, I fail to see how GPUs can succeed at becoming really general purpose the way they are evolving now; at the same time, as gamers, I think we are paying what could be a high price for something we don't need, and on top of it, for the sake of a battle GPUs are not set to win (which doesn't mean that big cores à la Haswell are the answer to everything).
A pretty nasty side issue is the matching market dynamics: even here we can read posts from researchers like Andrew Lauritzen, who are researching solutions to please our eyes, who actually find solutions, but who have no proper hardware to run what they find.
What we have is that the evolution of graphics is locked down by both the market dynamics and the matching APIs, and the main driver pushing the evolution of the hardware and the APIs is something that could very well be set to fail (i.e. turning GPUs into general purpose devices; but into what, exactly? a "really" general vector machine?). One may wonder, and that could be the main reason why I think it is set to fail: it seems the workloads of tomorrow are less and less data parallel, and anything too wide (relying too heavily on wide vector processing) would be more of a bother than anything else.
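As a last, contrived illustration of why "too wide" bothers me once the work stops being uniform (the branch and the two paths below are entirely made up):

[code]
// Contrived CUDA kernel: hardware that runs threads in wide lockstep groups
// (32/64-wide warps/wavefronts) has to execute BOTH sides of a data-dependent
// branch for the whole group, masking lanes off. The less uniform the data,
// the more of that wide ALU does work that gets thrown away.
__device__ float expensive_path(float x)
{
    float r = x;
    for (int k = 0; k < 64; ++k)            // stand-in for a costly computation
        r = r * r * 0.999f + x;
    return r;
}

__device__ float cheap_path(float x) { return 2.0f * x; }

__global__ void classify(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] > 0.5f)                        // lanes in the same wavefront disagree
        out[i] = expensive_path(in[i]);
    else
        out[i] = cheap_path(in[i]);
}
[/code]

The narrower and more irregular the work gets, the less that width is worth, and that is the mismatch I have in mind when I say the current direction could be set to fail.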