will multicore take over 3d rendering?

I doubt the manufacturers are going to target an 80 mini-core CPU at the desktop.
The performance profile on today's code would be horrible; today's code will be tomorrow's legacy software, and today's legacy software will still be legacy tomorrow.

Future desktop applications aren't going to thread out 80 ways, and past about 4 CPUs the benefits of more cores drop off fast on most of the workloads the CPU must target.

An 80 mini-core desktop CPU would basically trash performance on tasks CPUs are traditionally good at, just to do a lousy job at catching up to the GPU or a passable job at replacing the IGP (so it can do badly at everything else).
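
To put rough numbers on the "drops off fast" point, here's a quick Amdahl's law illustration; the 80% parallel fraction is an assumption picked for the sake of the example, not a measurement:

```cpp
#include <cstdio>

// Amdahl's law: speedup on n cores when a fraction p of the work parallelises.
double speedup(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    const double p = 0.8;  // assumed 80% parallel fraction, purely illustrative
    std::printf("4 cores : %.2fx\n", speedup(p, 4));   // ~2.5x
    std::printf("80 cores: %.2fx\n", speedup(p, 80));  // ~4.8x, despite 20x the cores
}
```

On that kind of workload, twenty times the cores buys less than a doubling of the speedup, and each mini-core would be far slower on the serial part than a big core.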

I really couldn't agree more, but for some odd reason the CPU engineers I encounter don't seem to agree.

Now with that in mind, how many of these massively multi-core CPUs do Intel and AMD think they are going to sell? Obviously they will sell really well into supercomputers, render farms, server warehouses, etc...

And that leaves them offering what to the masses with decent ILP performance?
 
But to follow your logic, a dual core CPU isn't dual core as it only has one master scheduler?

What I think was meant was that although they are similar, they are not entirely the same. You can add/remove some quads from a GPU and still have the rest of it function as before, much like you can add/remove cores from a multi-core and not affect functionality. What is different is that there is shared control logic in a GPU that need not be duplicated per quad, while the control logic in a CPU needs to be duplicated per core.
 
But to follow your logic, a dual core CPU isn't dual core as it only has one master scheduler?
Dual core has everything doubled. That includes all the control logic (and schedulers)... You can take one core out and still get a functional processor, just as you could take one quad from a GPU and still get a functional part.
 
But to follow your logic, a dual core CPU isn't dual core as it only has one master scheduler?

A dual-core CPU has two CPU cores, each with their own instruction fetch units, scheduling hardware, and pipelines. There is no shared scheduler between them. (I'm not talking about the OS scheduler; that's not the level I'm discussing.)

They fetch their own data and instructions, and they execute instructions based on the data they have internally.
GPU quads don't go that far, they don't determine their own path in program execution. They are fed tasks given to them from a control unit that tells them what to do. Without that control unit, they would do nothing.

edit:
The newer GPUs allow a bit more variability in how instructions are executed, but overall control is maintained by the command processor.
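
As a software analogy of the distinction (a minimal sketch, not a description of how any real chip is wired): each CPU core owns its own front end and follows its own instruction stream, while GPU quads sit behind a shared command processor that hands them work.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// CPU analogy: each core has its own fetch/decode/schedule logic and follows
// its own instruction stream; remove one core and the other still works.
void cpu_core(int id) {
    for (int pc = 0; pc < 3; ++pc)  // private "program counter"
        std::printf("core %d fetches and executes its own instruction %d\n", id, pc);
}

// GPU analogy: quads have execution units but no front end of their own.
struct Quad { int id; };

int main() {
    // Two independent instruction streams, no shared scheduler between them.
    std::thread c0(cpu_core, 0), c1(cpu_core, 1);
    c0.join();
    c1.join();

    // One control unit fanning the same commands out to all quads; without it
    // the quads would do nothing.
    std::vector<Quad> quads{{0}, {1}, {2}, {3}};
    const char* commands[] = {"rasterise batch A", "rasterise batch B"};
    for (const char* cmd : commands)
        for (const Quad& q : quads)
            std::printf("quad %d is told to: %s\n", q.id, cmd);
}
```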

I really couldn't agree more, but for some odd reason the CPU engineers I encounter don't seem to agree.
That's interesting. Which desktop CPU manufacturer do they work for?

Now with that in mind, how many of these massively multi-core CPUs do Intel and AMD think they are going to sell? Obviously they will sell really well into supercomputers, render farms, server warehouses, etc...

And that leaves them offering what to the masses with decent ILP performance?
The overall idea shown so far is multi-core designs with a mixture of core types, not just a lot of mini-cores.
Regardless of how parallel things become, the demand for single-threaded performance improvement isn't going to disappear.


The 80 mini-core device looked limited in what it would target if it were ever brought to market. The teraflop model that was shown required an EDRAM module bonded directly to the die, which is something that sounds pricey for desktop processors.
 
The overall idea shown so far is multi-core designs with a mixture of core types, not just a lot of mini-cores.
Regardless of how parallel things become, the demand for single-threaded performance improvement isn't going to disappear.

Like I said before, I agree, and I sure hope that's how things go.

The 80 mini-core device looked limited in what it would target if it were ever brought to market. The teraflop model that was shown required an EDRAM module bonded directly to the die, which is something that sounds pricey for desktop processors.

If we take the Terascale chip at face value as something to be produced, it certainly would be quite pricey, but what's to say they couldn't decrease the number of cores slightly and attach a huge amount of cache instead? I imagine they could fit something like that into the transistor budget at 32nm or below.
 
The ArsTechnica article makes sense:

advantages (ARS):
- lower latency to the GPU in the AMD case
- lower power dissipation

now add:
- faster CPU-GPU communication
- GPU with early access to advanced fabrication process.
- single compiler to use or code for both CPU and GPU.
- flexibility in power processing allocation.
- Unified Memory Hierarchy (see the sketch after this list).
- help create a larger installed base of visualization-oriented PCs.
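
A minimal sketch of what the unified-memory and faster-communication points mean in practice; the function names here are made up for illustration and aren't any real driver API:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Stand-in for a PCIe transfer to a discrete card's local memory.
std::vector<float> gpu_local_memory;
void bus_copy_to_gpu(const std::vector<float>& host_buffer) {
    gpu_local_memory = host_buffer;  // extra copy plus bus latency
}

// Pretend kernel; imagine this running on the GPU's shader array.
void gpu_kernel(float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= 2.0f;
}

int main() {
    std::vector<float> vertices(1024, 1.0f);

    // Discrete GPU today: stage the data into the card's own memory first.
    bus_copy_to_gpu(vertices);
    gpu_kernel(gpu_local_memory.data(), gpu_local_memory.size());

    // Fused CPU+GPU with a unified memory hierarchy: the GPU side can be
    // handed the same pointer the CPU uses, with no staging copy in between.
    gpu_kernel(vertices.data(), vertices.size());

    std::printf("first vertex after the in-place pass: %f\n", vertices[0]);
}
```

Whether an on-die GPU actually exposes memory that cleanly is up to AMD, but that's the kind of copy the tighter integration could remove.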

I hope it will be at least twice as fast as the best IGP solutions.
And Nvidia will concentrate its efforts on Intel CPU mobos: http://www.beyond3d.com/forum/showthread.php?p=878673#post878673

And if you need more graphics power just add a PCIe card and/or new chip.
 
AMD said that Fusion GPUs will be many times faster than IGPs. What I hope is that they achieve their goal of a 3GHz, 48-pipe GPU with 8 flops/cycle; now that will get us some real kick-ass performance.
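
For what it's worth, reading "8 flops/cycle" as per pipe (my assumption about the wording), that target works out to:

```cpp
#include <cstdio>

int main() {
    // Assumes 8 flops per pipe per clock; the post doesn't spell that out.
    const double pipes = 48, flops_per_pipe_per_clock = 8, clock_ghz = 3.0;
    std::printf("peak: %.0f GFLOPS\n", pipes * flops_per_pipe_per_clock * clock_ghz);  // 1152
}
```

Call it roughly 1.15 TFLOPS of peak shader throughput, which would indeed be far beyond any current IGP.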
 
Hey, I just figured something out. If you take the Ars Technica article, specifically the part where the writer says:

'To support CPU/GPU integration at either level of complexity (i.e. the modular core level or something deeper), AMD has already stated that they'll need to add a graphics-specific extension to the x86 ISA. Indeed, a future GPU-oriented ISA extension may form part of the reason for the company's recently announced "close to metal" (CTM) initiative. By exposing the low-level hardware of its ATI GPUs to coders, AMD can accomplish two goals. First, they can get the low-level ISA out there and in use, thereby creating a "legacy" code base for it and moving it further toward being a de facto standard. Second, they can get feedback from the industry on what coders want to see in a graphics-specific ISA.

Both of these steps pave the way for the introduction of GPU-specific extensions to the x86 ISA, extensions that eventually will probably be modeled to some degree on the ISA for the existing ATI hardware. These extensions will start life as a handful of instructions that help keep the CPU and GPU in sync and aware of each other as they share a common socket, frontside bus, and memory controller. A later, stream-processor-oriented extension could turn x86 into a full-blown GPU ISA. '

and relate it to Phil Hester's comments along the same lines,


you will come to the conclusion that eventually gfx-specific instructions will be added to AMD's CPU cores, allowing them to process gfx. That's where CPUs with 4+ cores come into play, meaning that multicore shall take over 3D rendering.
 
Isn't there some Nvidia GPU planned that uses multiple cores? I think the idea is that manufacturing one giant die is unfeasible going forward. So the GPU uses separate smaller dies on the same package. Supposedly it works something like the old VSA-100 architecture; it can be scaled up by adding more cores.

NM, here is the Inquirer article I was thinking of:
Big GPUs are set to die
THE ATI R600 will represent the last of its breed, the monolithic GPU.

It will be replaced by a cluster of smaller GPUs with the R700 generation. This is the biggest change in the paradigm since 3Dfx came out with SLI, oh so many years ago.

Basically, if you look at the architecture of any modern GPU, R5xx/6xx or G80, it comprises pretty modular units connected by a big interconnect. Imagine if the interconnect was more distributed like say an Opteron and HT, you could have four small chips instead of one big one.

This would have massive advantages on design time, you need to make a chip of quarter the size or less, and just place many of them on the PCB. If you want a low-end board, use one, mid-range use four, pimped out edition, 16. You get the idea, Lego.

It takes a good bit of software magic to make this work, but word has it that ATI has figured out this secret sauce. What this means is R700 boards will be more modular, more scalable, more consistent top to bottom, and cheaper to fab. In fact, when they launch one SKU, they will have the capability to launch them all. It is a win/win for ATI.

There have been several code names floating around for weeks on this, and we hear it is pretty much a done deal. Less concrete is the rumour that G90 will take a similar path, but things are pointing in that direction.

G80 and R600, or most likely their descendants in the next half-generation step, will be the biggest GPUs ever. I am not sure this is something to be proud of, but the trigger has been pulled on the next big thing. The big GPU is dead, long live the swarm of little GPUs. µ

It was ATI, not Nvidia, but IMO they will both be forced to go this route in the future.
 
They'd only go the multi-chip route if they absolutely have to; having roughly double the number of transistors with each new process node is much better to look forward to than the guaranteed expense of multiple chips with few upsides not already partially realized by monolithic GPUs.

The concept may become more attractive as process generations take longer to transition, since competitive pressures might keep the time between product refreshes constant, even as transistor budget growth slows.

ATI's sidling up with AMD might be an interesting wrinkle in the long-term. AMD's process technologies might make the long drought between nodes more bearable, compared to a fabless competitor.
 
you will come to the conclusion that eventually gfx-specific instructions will be added to AMD's CPU cores, allowing them to process gfx. That's where CPUs with 4+ cores come into play, meaning that multicore shall take over 3D rendering.

So you're expecting these cores to be identical to each other? However many cores there are in your hypothetical CPU-that's-taken-over-graphics-as-well, they're all the same?

Edit: I should say I find this unlikely, personally. I think there will be enough special-function-type stuff going on as to not make this efficient. Things like Z tricks, video acceleration, etc... or even just all that stuff that's currently in NVIO (whatever that is). And that's before we even get to the parallelism arguments, and whether there will ever be enough CPU "cores" to effectively replace current levels (let alone future levels) of GPU parallelism...
 
I think the idea is that manufacturing one giant die is unfeasible going forward.
That's not strictly true. The dynamics are relatively simple: Each chip you design out of a base architecture is going to cost you millions of dollars in engineering and tape-out costs, plus all the related overhead (sales, marketing, phasing out old products, etc.) - and if that chip is aimed at too much of a niche audience for too short of a timeframe, it's not going to be worth the investment anymore, unless you feel it's necessary from a mindshare POV.

The other dynamic that has to be taken into consideration is that new process nodes (such as 65nm today) start with a higher number of defects per wafer. Even if you've got a fair bit of redundancy on your die, you'll still have more unusable chips, and you'll make less productive use of each mm² available to you. In the end, the chip might be more expensive on the new process than on the old one, or at least not cheaper. This is especially true for large chips. So the process isn't going to be mature enough for a 400mm² chip yet, but it might be for a 200mm² one. It's difficult to judge exactly how important this aspect is, however.
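
To get a feel for the die-size side of that, here's a back-of-the-envelope sketch using the standard Poisson yield model; the defect density is an assumed, illustrative number, not anything from the post above:

```cpp
#include <cmath>
#include <cstdio>

// Poisson yield model: probability that a die has zero killer defects.
// defect_density is in defects per cm^2, die_area in cm^2.
double yield(double defect_density, double die_area) {
    return std::exp(-defect_density * die_area);
}

int main() {
    const double d0 = 0.5;          // assumed early-node defect density (illustrative)
    const double small_die = 2.0;   // ~200 mm^2
    const double big_die   = 4.0;   // ~400 mm^2

    std::printf("200 mm^2 die yield: %.1f%%\n", 100.0 * yield(d0, small_die)); // ~36.8%
    std::printf("400 mm^2 die yield: %.1f%%\n", 100.0 * yield(d0, big_die));   // ~13.5%

    // The big die also gives you half as many candidates per wafer, so the
    // number of good chips per wafer drops by roughly this factor:
    double ratio = (yield(d0, small_die) / small_die) / (yield(d0, big_die) / big_die);
    std::printf("good small dies per wafer vs big dies: %.1fx\n", ratio);      // ~5.4x
}
```

The exact numbers don't matter; the point is just that yield falls off exponentially with die area, which is why an immature process can be viable for a 200mm² chip while still being a bad idea for a 400mm² one.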


Uttar
 
Edit: I should say I find this unlikely, personally. I think there will be enough special-function-type stuff going on as to not make this efficient. Things like Z tricks, video acceleration, etc... or even just all that stuff that's currently in NVIO (whatever that is). And that's before we even get to the parallelism arguments, and whether there will ever be enough CPU "cores" to effectively replace current levels (let alone future levels) of GPU parallelism...

I can imagine a true multi-core environment like the one described above, with off-die components. But how does one package all this? Surely not in a GX2 way. Extract the ROPs, scheduler and memory controller as unique elements of the package and add groups of shaders to your liking.

Of course, you'd have increased cost in developing the distribution to all these items, but some costs are won back by the fact that you're just creating a lot of simpler units.

It's either that or... 1 billion transistors at 45nm.
 
Well, not necessarily "off-die", though it could be, I suppose. NVIO certainly is. I was thinking more heterogeneous cores in the same package...
 
I'm imagining massive packages on a board, even a dual-slot solution where one board holds the processing and I/O parts and the obligatory power regulation and memory.

It's ugly, but it's the way the current trend is going.
 
Sounds kinda nifty; imagine a board that looks like it only has RAM on it, with one large flat heatsink for the works. I wouldn't want to have to deal with the routing and whatever special drivers would have to be made up to deal with it. They need the performance, but eating the cost of bad mammoth parts is something they must be looking at.
 
So you're expecting these cores to be identical to each other? However many cores there are in your hypothetical CPU-that's-taken-over-graphics-as-well, they're all the same?

Edit: I should say I find this unlikely, personally. I think there will be enough special-function-type stuff going on as to not make this efficient. Things like Z tricks, video acceleration, etc... or even just all that stuff that's currently in NVIO (whatever that is). And that's before we even get to the parallelism arguments, and whether there will ever be enough CPU "cores" to effectively replace current levels (let alone future levels) of GPU parallelism...

Didn't you read the phrase where Phil Hester said that GP CPU cores will be treatable as almost specialised hardware? And if you imagine a 48-core CPU with a GPU ISA, don't you agree with me that it will be good at gfx processing?
 
Didn't you read the phrase where Phil Hester said that GP CPU cores will be treatable as almost specialised hardware? And if you imagine a 48-core CPU with a GPU ISA, don't you agree with me that it will be good at gfx processing?

Not compared to a GPU that uses the same amount of silicon real estate.

Cheers
 