Various questions about GPUs.
Hi everybody,
I have various questions about GPUs. I'm not going to ask how things work in every detail, but I would like to clear up some dark spots.
The first one must be pretty simple, but it's still obscure to me. I think it's related to the command processor and the thread dispatcher. I've read AlexV's latest reviews again and again, but sadly it's still unclear to me, so I'm going to try to explain what I don't get. It's about code and data: where they originate and where they are handled.
The CPU/host sends commands and data to the GPU. The command processor (CP) handles those commands and generates work/tasks for the thread dispatcher to dispatch and schedule.
First, it's unclear to me what the CPU really sends to the GPU, and how and where. Does the CPU send both data and orders/commands to the GPU, or only orders, with the GPU then reading the data from RAM?
For example, does the CPU send commands (i.e. what to do) to the GPU, say "manipulate these vertices", and the GPU then reads the vertex data the CPU wrote to RAM? Or does the CPU send everything together directly to the GPU?
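From what I've pieced together (treat this as my guess, not a statement of fact), the driver mostly writes small commands into a ring buffer in memory, and those commands carry pointers to the bulk data rather than the data itself. Here is a minimal C sketch of how I picture it; the command names, fields, and addresses are all invented for illustration:

```c
#include <stdint.h>

/* Hypothetical command encoding -- real GPUs use their own binary
 * formats, so everything here is made up to show the idea. */
enum cmd_op {
    CMD_SET_VERTEX_BUFFER,  /* "the vertices live at this address" */
    CMD_SET_SHADER,         /* "run this program for this stage"   */
    CMD_DRAW                /* "process N vertices"                */
};

struct cmd {
    enum cmd_op op;
    uint64_t    addr;   /* a pointer into memory, not the data itself */
    uint32_t    count;
};

/* The driver (CPU side) appends commands to a ring buffer in memory;
 * the command processor (GPU side) walks the same buffer and executes
 * each command in order. */
struct ring {
    struct cmd buf[1024];
    uint32_t   write;   /* advanced by the CPU */
    uint32_t   read;    /* advanced by the CP  */
};

static void emit(struct ring *r, struct cmd c)
{
    r->buf[r->write++ % 1024] = c;
    /* A real driver would then "kick" the GPU, e.g. by writing the new
     * write pointer to a doorbell register over the bus. */
}

int main(void)
{
    struct ring r = {0};
    /* Assume the bulk vertex data was already copied to 0x10000000 (a
     * made-up address); only these small packets go through the CP. */
    emit(&r, (struct cmd){ CMD_SET_VERTEX_BUFFER, 0x10000000u, 0 });
    emit(&r, (struct cmd){ CMD_DRAW, 0, 3000 });
    return 0;
}
```

If that picture is right, then what the CPU "sends" is mostly these small packets plus a doorbell write, and the heavy vertex data never travels through the CP at all. Please correct me if I'm wrong.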
Then it gets even messier for me once we pass vertex processing. Say vertex processing is done for some batches; the CP will be informed via the thread dispatcher, right? Then it has "a result", but how does it know what to do with it? I mean, by looking at the pipeline it's obvious what it will do, but I can't figure out how it knows.
Going further: the GPU figures out what to do and moves from vertices to primitives, generates tasks of x primitives for the "ALUs" to handle, and the thread dispatcher does its work. OK, so far so good. "Same player, shoot again": how do the "ALUs" know what to do with those primitives? I have an idea, but it's vague (or worse...). I've read multiple times that on a GPU "code is data"; does that mean the GPU will go read "commands" in RAM? At this stage the CPU no longer has anything to do with it, right? So this code (the shaders) has been put there by the developers. Is that the basic idea? In that case, how does the GPU know which data/commands (since they are memory objects) to load and execute? Is it due to some values carried initially by a vertex or a batch of vertices?
Actually, my question would be the same at each stage of the pipeline: how does the GPU know that it has to move to the next part of the pipeline, and where does it recover the code/commands (even if, to it, they are just more data)?
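My current guess, again only a sketch with invented names, is that the answer is "state": before the draw is issued, commands have already told the GPU where each stage's compiled program lives in memory, so nothing needs to be decided when a batch finishes a stage. The shaders were compiled by the driver from the developers' source and uploaded to memory like any other data, which I suppose is what "code is data" means:

```c
#include <stdint.h>

/* Hypothetical pipeline state, set by CMD_SET_SHADER-style commands
 * before any CMD_DRAW. Every field name is invented for illustration. */
struct pipeline_state {
    uint64_t vertex_shader_addr;  /* compiled vertex program in memory */
    uint64_t pixel_shader_addr;   /* compiled pixel program in memory  */
    /* ...rasterizer state, render target addresses, and so on... */
};

/* The dispatcher never asks "what now?": when it launches a batch for
 * a given stage, it just starts the SIMD units at that stage's program
 * address. When vertex work finishes, fixed-function hardware forwards
 * the outputs toward primitive assembly / rasterization, and the
 * resulting pixel batches start executing at pixel_shader_addr. */
static uint64_t start_pc_for_stage(const struct pipeline_state *ps, int is_pixel)
{
    return is_pixel ? ps->pixel_shader_addr : ps->vertex_shader_addr;
}

int main(void)
{
    /* Made-up addresses for the two compiled programs. */
    struct pipeline_state ps = { 0x20000000u, 0x20008000u };
    (void)start_pc_for_stage(&ps, 1);  /* a pixel batch -> 0x20008000 */
    return 0;
}
```

If that is roughly right, the CP's job between stages is mostly bookkeeping (tracking which batches are done), and the "decision" of what runs next was baked in when the state commands were parsed. Is that it?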
I would really appreciate it if these things were made clearer to me, as I feel like I have growing misconceptions about what is really going on in a GPU.
Something else has also bothered me. There is a lot of talk about how many cores a given GPU has, how to make them more "clever", etc. I tried to understand what one would call a core and why, and stuck with my super limited knowledge, I ended up thinking that from my POV a GPU is still one core, as the command processor is still the only "clever" (in a CPU way) part of a GPU.
Following that logic, I think that before moving to something like Larrabee, GPUs first have to become "multi-core", or at least be able to work in a multi-core fashion. Say that right now two GPUs work together far less efficiently than two single-core CPUs did back in the day. Larrabee is Larrabee (a bunch of CPUs augmented with SIMD), but I feel like current GPUs still have quite some road to travel to get there (if getting there is really that important a goal in the short term). Intel made that choice, but that doesn't mean there isn't another middle ground between current CPUs and GPUs, no?
Still following my logic, I wondered whether the GPU, viewed as a single core, has passed its optimal size and should become really "multi-core", i.e. the host/CPU sees multiple GPUs. From my POV/understanding, the command processor and thread dispatcher are the critical parts for making the GPU more flexible. While in volume they handle more than the CPU attached to the Larrabee SIMD does (or is it the other way around? :clown: ), I have the "feeling" (not based on facts, I mean I don't know) that the logic in a Larrabee core is more potent, in that it can do more things.
So when I say "the GPU has passed its optimal size", I mean that you could have a more "potent" command processor and thread dispatcher (not in volume, i.e. keeping track of that many threads, etc.) handling a reduced number of SIMDs. For example, it looks like for ATI the building block of the GPU is a SIMD array and its matching texture units; for Nvidia it seems (to me) to be the texture processor (to which a given number of SIMD arrays, 2?, are attached). My idea (and I don't state this in a pompous "I know better" manner; it's more that I want to understand where my reasoning is messed up) is: would it make sense to build a tinier but fully formed GPU as the building block?
For example, in the same die area as a Cypress you would end up with "five cores". That's still a lot less logic overhead than in a Larrabee, for example, and as those "cores" could end up sharing data, it would still be easier to address the "communication problems" than in a "sea of cores" design. Especially if GPU manufacturers move to fully coherent caches supporting both read and write operations, no?
So basically, those are my questions: "the code and data paths in a GPU, and the critical role of the command processor, which I feel is the key to my first questions", and "why don't GPUs really go multi-core first, when everybody expects them to become a 'sea of cores' in the near future?".
Thanks in advance for your answers.