Will Future CPUs Have On-Die Coprocessors, and Would They Have Any Impact on Graphics?

I think that what xxx was saying is that there is no way a PS3 or Xbox 360 can keep up with the power of a Core 2 Extreme (which can be overclocked), 4 GB of DDR2-1066, quad-SLI 7950GX2s, an SB X-Fi, and a physics card. And I think he is right. Neither console can deliver the same framerates at 2560x1600 that the PC system described can.
 
I think that what xxx was saying is that there is no way a PS3 or Xbox 360 can keep up with the power of a Core 2 Extreme (which can be overclocked), 4 GB of DDR2-1066, quad-SLI 7950GX2s, an SB X-Fi, and a physics card. And I think he is right. Neither console can deliver the same framerates at 2560x1600 that the PC system described can.
Kind of expected given the price difference :)
 
Average Joe doesn't know what is in the big box.


"You mean the CPU?"

;)


And just because I'm clueless: what are these supposed co-processors lacking that keeps them from being a full CPU? And if the CPU is just directing tasks to them, why not just have a chip with a processor that only directs tasks to these co-processors, instead of a "complete" CPU :?:

I guess I don't understand the difference between general purpose code and media processing exactly. Does GP code take up that much die space? Or is the whole point just running media-type apps on media processors instead of running them on "general purpose transistors" :?:
 
Consoles can deliver better games than an equally or somewhat better spec'd PC because of all the bottlenecks in PC architecture (something is only as strong as its weakest link), because PC games are rarely if ever built to take full advantage of the highest-end PC, because of the overhead from Windows, and probably a few other things I'm not thinking of at the moment.


With a $3,000-5,000 PC, much of the performance of its individual components is squandered or simply cannot be used.


It takes an overwhelming amount of higher-than-console specs in a PC to see better-looking games than the games on consoles.

The Dreamcast did a lot with just 26 MB of total RAM, and the PS2 did a lot with just 40 MB, as did the GameCube with its 43 MB and the Xbox with its 64 MB. The same will be true of the Xbox 360, Wii and PS3.
 
OK, those diagrams are nice, but how exactly are they impressive?

[attached image: crapga6.png]



I'd rather have a 20€ PCI card with a chip that encodes to XviD/MPEG-4. Where can I get one? :p
 
If the R600 is unifying the GS, VS, and PS programs into a single hardware execution unit, it seems like most of the graphics pipeline is being transformed into a CPU-like architecture.

It seems like the next logical step would be to move the fixed-function pipeline stages right into the processor datapath as execution units exposed through the ISA. For example, setup/hi-Z/rasterization would be executed via an instruction in the processor which generates pixel threads from a primitive and injects them into the thread pool for processing.

If we consider that the shaders have IEEE floating point and now a full 32-bit integer instruction set, it seems like this could be a general purpose processor with some extra functional units plus ISA changes to make it "3D enabled" (the setup/raster execution units, plus texture units for filtered memory loads). Somehow z/stencil/blend would have to be tied into the write-back stage.

For performance I guess you would need multicores + lots of ALU/FP per core, with SMT.
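
To make that a bit more concrete, here is a toy software model of what such a "rasterize" instruction might do. This is purely my own sketch: the names, the bounding-box scan, and the queue standing in for the GPU thread pool are all invented for illustration, not anything ATI or NVIDIA have described.

```cpp
// Toy model of a "rasterize" instruction: one primitive in, N pixel threads out.
// Assumes a counter-clockwise screen-space triangle; no hi-Z, clipping or
// perspective correction, just a bounding-box scan with edge functions.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <queue>

struct Vec2 { float x, y; };
struct Triangle { Vec2 v0, v1, v2; };
struct FragmentThread { int x, y; float w0, w1, w2; };   // one pixel "thread" + barycentrics

static float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// What the hypothetical RASTERIZE instruction would do in one shot:
// scan the triangle's bounding box and enqueue a thread per covered pixel.
void rasterize(const Triangle& t, std::queue<FragmentThread>& threadPool) {
    int x0 = (int)std::floor(std::min({t.v0.x, t.v1.x, t.v2.x}));
    int x1 = (int)std::ceil (std::max({t.v0.x, t.v1.x, t.v2.x}));
    int y0 = (int)std::floor(std::min({t.v0.y, t.v1.y, t.v2.y}));
    int y1 = (int)std::ceil (std::max({t.v0.y, t.v1.y, t.v2.y}));
    float area = edge(t.v0, t.v1, t.v2);
    if (area <= 0.0f) return;                             // degenerate or back-facing
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x) {
            Vec2 p = { x + 0.5f, y + 0.5f };
            float w0 = edge(t.v1, t.v2, p);
            float w1 = edge(t.v2, t.v0, p);
            float w2 = edge(t.v0, t.v1, p);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0)            // pixel covered by the triangle
                threadPool.push({ x, y, w0 / area, w1 / area, w2 / area });
        }
}

int main() {
    std::queue<FragmentThread> pool;                      // stand-in for the shared thread pool
    rasterize({ {2, 1}, {9, 3}, {4, 8} }, pool);          // one "instruction" issued
    std::printf("pixel threads generated: %zu\n", pool.size());
    // A unified shader core with SMT would now pull FragmentThreads from `pool`.
    return 0;
}
```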
 
If we consider that the shaders have IEEE floating point and now a full 32-bit integer instruction set, it seems like this could be a general purpose processor with some extra functional units plus ISA changes to make it "3D enabled" (the setup/raster execution units, plus texture units for filtered memory loads). Somehow z/stencil/blend would have to be tied into the write-back stage.

I thought D3D10 shaders weren't IEEE 754? Even the Cell SPEs aren't.
 
It seems like the next logical step would be to move the fixed-function pipeline stages right into the processor datapath as execution units exposed through the ISA. For example, setup/hi-Z/rasterization would be executed via an instruction in the processor which generates pixel threads from a primitive and injects them into the thread pool for processing.

If we consider that the shaders have IEEE floating point and now a full 32-bit integer instruction set, it seems like this could be a general purpose processor with some extra functional units plus ISA changes to make it "3D enabled" (the setup/raster execution units, plus texture units for filtered memory loads). Somehow z/stencil/blend would have to be tied into the write-back stage.

That's actually much along the lines of what I had been thinking would be the most likely path.

The only question I have is how much die space is taken up by these fixed-function units compared to the shader ALUs in modern GPUs like R580 or G70? And how much do they need to be expanded in order to keep the shader ALUs filled with work as the number of ALUs increases?

For performance I guess you would need multicores + lots of ALU/FP per core, with SMT.

I wouldn't be surprised if this is what Larrabee turned out to be similar to.
 
Alstrong said:
And just because I'm clueless: what are these supposed co-processors lacking that keeps them from being a full CPU? And if the CPU is just directing tasks to them, why not just have a chip with a processor that only directs tasks to these co-processors, instead of a "complete" CPU :?:

I guess I don't understand the difference between general purpose code and media processing exactly. Does GP code take up that much die space? Or is the whole point just running media-type apps on media processors instead of running them on "general purpose transistors" :?:

The main difference comes down to fixed-function hardware. Current general purpose processors tend to be designed to run single-threaded, branchy, random read/write code. That of course means lots of branch prediction hardware and lots of out-of-order scheduling hardware. A co-processor will be designed to run its subset of code very well, whether that is scan-line conversion, triangle setup, or something else. It pretty much just comes down to fixed-function hardware.
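
As a toy illustration of that split (my own made-up example, not tied to any particular chip): the first loop below is the kind of branchy, pointer-chasing code a big out-of-order core with branch prediction earns its keep on; the second is the kind of regular streaming work that maps naturally onto fixed-function or wide SIMD hardware.

```cpp
// Two flavours of work: "general purpose" vs "media" style code.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Node { int value; Node* next; };

// General purpose code: data-dependent branches, pointer chasing,
// unpredictable memory access. Branch prediction + OoO scheduling help here.
int count_odd(const Node* head) {
    int n = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        if (p->value % 2 != 0)                     // branch depends on loaded data
            ++n;
    return n;
}

// Media-style code: the same simple operation applied to every element in a
// stream, no real control flow, perfectly predictable reads and writes.
void scale_pixels(std::vector<uint8_t>& pixels, float gain) {
    for (auto& px : pixels)
        px = (uint8_t)std::min(px * gain, 255.0f); // clamp is a select, not a branch
}

int main() {
    Node c{3, nullptr}, b{2, &c}, a{1, &b};        // tiny linked list: 1 -> 2 -> 3
    std::vector<uint8_t> image(16, 100);           // tiny "image" of grey pixels
    scale_pixels(image, 1.5f);
    std::printf("odd values: %d, first pixel: %d\n", count_odd(&a), image[0]);
    return 0;
}
```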
 
The only question I have is how much die space is taken up by these fixed-function units compared to the shader ALUs in modern GPUs like R580 or G70? And how much do they need to be expanded in order to keep the shader ALUs filled with work as the number of ALUs increases?

Well, I think the trend is definitely more die space allocated to the shader cores, and if not the cores themselves, then the scheduling/dispatching logic that makes sure the cores are being well fed. I think this is obvious with R520, and with unified shaders in R600 even more so.

Expansion of the fixed-function units? What do you mean by expanded in order to keep the shader ALUs filled with work? Do you mean that as the number of ALUs scales, you will need more/bigger raster units to interpolate more samples per clock?

I wouldn't be surprised if this is what Larrabee turned out to be similar to.

Yeah, I think so too. From the Beyond3D news posting on the open source Intel G965 drivers, it looks like Intel is already starting in that direction:

Triangle setup and related operations are also done in the EUs. In traditional architectures, a special-purpose unit would exist for it.
 
Enos_Feedler said:
Expansion of the fixed-function units? What do you mean by expanded in order to keep the shader ALUs filled with work? Do you mean that as the number of ALUs scales, you will need more/bigger raster units to interpolate more samples per clock?

That's effectively what I was asking: just how does the ratio of fixed-function units have to change as the number of shader units increases? One example being the interpolators you listed.
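
Back of the envelope (my own hand-waving, not vendor numbers): if the setup/raster/interpolation hardware can emit R fragments per clock and the average pixel shader costs S ALU-cycles per fragment, it can keep roughly R x S ALUs busy. So quadrupling the ALU count means either roughly 4x the raster and interpolation rate or shaders that are about 4x longer; the fixed-function blocks themselves stay small, but their throughput has to scale with the shader array.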
 
There are some things that could make generic CPU/GPU integration attractive:

- Early access to an advanced fabrication process (anyone want a 65nm GPU right now?)
- GPU design using CPU design methodologies (longer design cycles, deeper pipelines, faster clocks)
- Faster CPU-GPU communication.
- Reduced memory traffic, by passing a pointer to a data structure instead of copying the whole structure, combined with multiple-heap memory management (we did that in the '90s inside IBM); see the rough sketch below.
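
To illustrate that last point, here's a minimal sketch, assuming the CPU and an on-die GPU share one address space and one memory manager. The names (submit_by_copy, submit_by_reference, GpuWorkItem) are invented for the example; this isn't any real driver API.

```cpp
// Pointer-passing vs. copying, assuming a shared CPU/GPU address space.
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float pos[3], normal[3], uv[2]; };

// Discrete-card style: the driver copies the whole structure into
// GPU-visible memory (across PCIe/AGP) before the GPU can touch it.
void submit_by_copy(const std::vector<Vertex>& mesh, void* gpu_staging) {
    std::memcpy(gpu_staging, mesh.data(), mesh.size() * sizeof(Vertex));
    // ...then point the GPU at the staging copy.
}

// Integrated, shared-memory style: hand the GPU a pointer and a size.
// No bulk copy; both sides allocate from the same (multi-heap) manager.
struct GpuWorkItem { const void* data; std::size_t bytes; };

GpuWorkItem submit_by_reference(const std::vector<Vertex>& mesh) {
    return { mesh.data(), mesh.size() * sizeof(Vertex) };  // just the pointer
}

int main() {
    std::vector<Vertex> mesh(1024);
    std::vector<Vertex> staging(mesh.size());      // stand-in for GPU-visible memory
    submit_by_copy(mesh, staging.data());          // bulk copy across the bus
    GpuWorkItem item = submit_by_reference(mesh);  // zero-copy: pointer + size only
    (void)item;
    return 0;
}
```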

Anyway, remember the i860 CPU.

I will try to elaborate more in a future post.
 
On-die integration will probably only ever be for the budget level for a long time to come. We already have system-level integration. I don't expect to see any HTX graphics cards any time soon, but you never know, ATI might go that route. Socket-level integration is interesting, as it will most likely be CPU-vendor specific; unlike with HTX slots, there is no competing product (HTX vs PCI Express), so it will be interesting to see what happens.

Anyway, who can tell me when I can plug a Cell chip into my motherboard alongside my AMD CPU?
 
deeper pipeline
I thought GPU pipelines were already way deeper than CPU pipelines :?:

overall system performance when (theoretically) fully optimized
Given that PCIe x16 is currently a major bottleneck in PCs, overall system performance should improve substantially if you provide a high-bandwidth, low-latency link between CPU and GPU (i.e. an external HTX connector at 20.8 GB/s instead of PCIe x16 at 4 GB/s).
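
Rough numbers to put that in perspective (my own back-of-envelope figures, ignoring protocol overhead): streaming a 256 MB working set between CPU and GPU would take about 64 ms at 4 GB/s but only around 12 ms at 20.8 GB/s; at 60 fps (a 16.7 ms frame budget) that's the difference between impossible and merely expensive.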
 
On-die integration will probably only ever be for the budget level for a long time to come. We already have system-level integration. I don't expect to see any HTX graphics cards any time soon, but you never know, ATI might go that route. Socket-level integration is interesting, as it will most likely be CPU-vendor specific; unlike with HTX slots, there is no competing product (HTX vs PCI Express), so it will be interesting to see what happens.

Anyway, who can tell me when I can plug a Cell chip into my motherboard alongside my AMD CPU?
Probably mainstream systems too.
Imagine you are Intel and you decide to make a new 65nm Conroe with only one core and only 512 KB of L2 cache. What would you do with that ~100 mm² of unused 65nm silicon? :)
The smaller L2 cache can be compensated for with an on-chip memory controller plus very fast soldered-on RAM.
 
There are some things that could make generic CPU/GPU integration attractive:

- Early access to an advanced fabrication process (anyone want a 65nm GPU right now?)
- GPU design using CPU design methodologies (longer design cycles, deeper pipelines, faster clocks)
- Faster CPU-GPU communication.

- Reduced memory traffic, by passing a pointer to a data structure instead of copying the whole structure, combined with multiple-heap memory management (we did that in the '90s inside IBM).

Anyway, remember the i860 CPU.

I will try to elaborate more in a future post.


yeah, that's what I was thinking, the parts in bold ^__^
 