Is the writing on the wall? ATI and NV doomed!

What is going to happen to ATI and NV over the next 5-10 years?

  • No, they will figure out a way to survive!
  • They will probably merge with larger companies like AMD and Intel.
  • ATI and NV will merge together to stay competitive!
  • ATI and NV are the future!
  • Power VR is the future!
  • I don't care as long as more episodes of RED DWARF get made!

  • Total voters
    209
Pete said:
I'm not qualified to answer, but that hasn't stopped me before. :)

In terms of 3D rendering, or in general? Logically, dual-core can at most offer a 2x performance increase, but that's assuming you're not bandwidth-limited. Considering the laughably greater bandwidth available to GPUs, I don't see CPUs catching up in rendering power anytime soon.

Yes, dual-core can sidestep heat waste/production issues for a time (you just put two relatively efficient 2-3GHz CPUs together, rather than building a very inefficient 4-5GHz one), but Intel and AMD will still have to solve the energy efficiency problems at smaller processes and higher speeds if they want to maintain small die sizes. Otherwise, they'll hit clock speed limits, at which point they'll be forced to go multi-core. That, in turn, will necessitate a change in programming--a focus on multi-threaded, rather than single-threaded, apps--to capitalize on those multi-core CPUs. Otherwise, that second core will be wasted on most office users (and humans), who do one task at a time.

So dual cores aren't a panacea to gamers or rendering, at least not yet. They're still eminently desirable, though. :)

Thanks, that was one of the most straightforward answers I have read on the forum... :)
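To make the multi-threaded-programming point above concrete, here is a minimal sketch of splitting independent work across two cores (assuming a C++11 compiler and std::thread; the "work" is just a stand-in):

Code:
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Sums part of the data; each half is independent of the other.
double sum_range(const std::vector<double>& v, std::size_t begin, std::size_t end) {
    return std::accumulate(v.begin() + begin, v.begin() + end, 0.0);
}

int main() {
    std::vector<double> data(1 << 20, 1.0);
    double a = 0.0, b = 0.0;

    // A dual-core CPU can run these two threads at the same time;
    // a single-threaded version would leave the second core idle.
    std::thread t1([&] { a = sum_range(data, 0, data.size() / 2); });
    std::thread t2([&] { b = sum_range(data, data.size() / 2, data.size()); });
    t1.join();
    t2.join();

    std::cout << "total = " << a + b << "\n";
    return 0;
}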
 
kenneth9265_3 said:
Pete said:
I'm not qualified to answer, but that hasn't stopped me before. :)

In terms of 3D rendering, or in general? Logically, dual-core can at most offer a 2x performance increase, but that's assuming you're not bandwidth-limited. Considering the laughably greater bandwidth available to GPUs, I don't see CPUs catching up in rendering power anytime soon.

Yes, dual-core can sidestep heat waste/production issues for a time (you just put two relatively efficient 2-3GHz CPUs together, rather than building a very inefficient 4-5GHz one), but Intel and AMD will still have to solve the energy efficiency problems at smaller processes and higher speeds if they want to maintain small die sizes. Otherwise, they'll hit clock speed limits, at which point they'll be forced to go multi-core. That, in turn, will necessitate a change in programming--a focus on multi-threaded, rather than single-threaded, apps--to capitalize on those multi-core CPUs. Otherwise, that second core will be wasted on most office users (and humans), who do one task at a time.

So dual cores aren't a panacea to gamers or rendering, at least not yet. They're still eminently desirable, though. :)

Thanks, that was one of the most straightforward answers I have read on the forum... :)

But this is putting forward the notion that near-future multi-core designs from Intel will be based on a cache-designed core. Sure, the upcoming dual core will reuse the Pentium 4 core, but after that I think Intel will introduce a radically different core. A GPU from Nvidia is more like a stream processor. To me it seems Intel will ditch the complexity of a cache-based CPU and go with a stream-style architecture. In a stream architecture you don't need to spend an enormous number of transistors on cache.

When mainstream CPUs start looking like stream processors, that's when things will start to get very interesting.
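To illustrate the distinction loosely (a toy C++ sketch, not anything Intel or Nvidia actually does): a streaming kernel touches each element once, in order, so raw bandwidth is what limits it, while a cache-designed core earns its keep on code whose next memory access depends on the previous one.

Code:
// Two access patterns, illustrative only.
#include <cstddef>
#include <vector>

// Streaming: every element is read once, in order. A big cache buys little
// here; sustained memory bandwidth is what sets the throughput.
void scale_stream(const std::vector<float>& in, std::vector<float>& out, float k) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * k;
}

// Pointer chasing: the next address depends on the last load, so the core
// stalls on memory latency -- exactly what large caches exist to hide.
int chase(const std::vector<int>& next, int start, int steps) {
    int i = start;
    for (int s = 0; s < steps; ++s)
        i = next[i];
    return i;
}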
 
With the tremendous amount of software that's invested in x86 designs, I have a hard time believing that a major architectural change like that is in the works for CPUs. Such a change won't happen until it needs to.
 
Brimstone said:
Ailuros said:
Oh no not again :LOL:

We are developing a prototype processor, SCALE, which is an instantiation of the vector-thread architectural paradigm designed for low-power and high-performance embedded systems. As transistors have become cheaper and faster, embedded applications have evolved from simple control functions to cellphones that run multitasking networked operating systems with realtime video, three-dimensional graphics, and dynamic compilation of garbage-collected languages. Many other embedded applications require sophisticated high-performance information processing, including streaming media devices, network routers, and wireless base stations. The SCALE architecture is intended to replace ad-hoc collections of microprocessors, DSPs, FPGAs, and ASICs, with a single hardware substrate that supports a unified high-level programming environment and provides performance and energy-efficiency competitive with custom silicon.

Considering the embedded/cellphone-related stuff, doesn't it puzzle you one bit that INTEL itself is using a dedicated 3D chip, in the form of the 2700G, for that market?

There is an attempt from NeoMagic, the MiMagic I think, with their APA (associative processor array). It doesn't look like a big success so far.

But you have to ask the question: how much faster, and at what cost?

In the case above? Something like night and day, perhaps? And we're not talking about excessive gate counts, high power consumption, or anything even close to current PDA/mobile CPU core frequencies. More like 50-150MHz for the 3D cores versus 300-500MHz for the current CPUs.

Furthermore, I can see one of WGF's primary targets being to offload the CPU even more than ever before; we're looking at a possible 2006-or-later launch for that one and a lifetime of roughly four years for that API, just like DX9.0.


Intel isn't looking too smart these days with their recent track record. Much of Intel's recent hardship stems from the complexity of their architectures. Their CISC/RISC superscalar and VLIW-style CPUs are just too complex to be cost-efficient. Clearly they are running into problems with heat, and the memory wall isn't going away anytime soon.

There is one processing style that does solve these problems, though. Vector processors are a perfect fit for next-gen architectures. They naturally exploit parallelism, are scalable, and are relatively simple to design. They don't rely on expensive cache to be effective. Data streaming is the future, and you don't need much cache for that. VLIW and superscalar designs are wasteful with cache for multimedia applications. Multimedia applications are probably one of the few things that tax a computer these days for the majority of users. A person is physically limited in how fast they can type a document or read email.

A vector processor is going to need lots of bandwidth, however, so instead of cache, eDRAM gets used. The density of eDRAM compared to SRAM is much higher, eDRAM consumes less power, and the bandwidth provided by eDRAM clobbers SRAM.

Intel is slow to change, but I think a change will come soon. IBM, Sony, and Toshiba have been working on CELL for a few years now. To me, CELL comes across as a massive vector processor. The patents show a large amount of eDRAM and talk about using REYES for rendering. Both are big hints at a vector processor.


Intel 2700G = PowerVR MBX (tell me that it lacks bandwidth for what it's aimed at).

Question: what exactly would keep graphics IHVs from designing their future VPUs more in the stream-processor direction, if that's really the superior solution?

This whole debate has been ongoing since the advent of 3D. For years now, dedicated hardware has consistently proven itself more efficient than general-purpose hardware, and I don't expect that to change. I won't deny that VPUs will move steadily closer to CPUs as time goes by, yet they'll still keep their well-known advantages.
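For a concrete (if modest) taste of "exploiting parallelism the vector way", here is a sketch using x86's own SSE extension. SSE is only a four-wide short-vector unit bolted onto a cache-based CPU, not a true vector processor with eDRAM behind it, but the programming idea is the same: one instruction, many data elements.

Code:
// y[i] += a * x[i], four floats per instruction (assumes an SSE-capable x86 CPU).
#include <xmmintrin.h>

void saxpy_sse(float* y, const float* x, float a, int n) {
    __m128 va = _mm_set1_ps(a);                    // broadcast a into all four lanes
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));   // y[i..i+3] += a * x[i..i+3]
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                             // scalar tail for leftover elements
        y[i] += a * x[i];
}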
 
kenneth9265_3 said:
Thanks, that was one of the most straightforward answers I have read on the forum... :)
Straightforward, maybe. We'll have to wait for more proofreaders before we can call it correct, though the fact that Brimstone didn't find any glaring errors is promising. :)

Brimstone said:
But this is putting forward the notion that near-future multi-core designs from Intel will be based on a cache-designed core. Sure, the upcoming dual core will reuse the Pentium 4 core, but after that I think Intel will introduce a radically different core. A GPU from Nvidia is more like a stream processor. To me it seems Intel will ditch the complexity of a cache-based CPU and go with a stream-style architecture. In a stream architecture you don't need to spend an enormous number of transistors on cache.

When mainstream CPUs start looking like stream processors, that's when things will start to get very interesting.
Considering what TR reported of IDF, the near future multi-core designs would appear to be cache-designed, as you say (the notion of cache- and stream-designed CPUs is new to me).

TR said:
Turns out that Intel has gone crazy for dual cores across the board. The first chip it demoed this week was a dual-core Itanium called Montecito. With 1.72 billion transistors and 27MB of total onboard cache....
How's that for cache-designed? :D ;)

TR also reports the P4 and P-M architectures will each get dual-core versions. Do you consider the P4's "NetBurst" architecture still cache-designed, or stream-designed, or a sort of crossover?

And will stream-designed CPUs require as much bandwidth as nV affords its GPUs (which is to say, more than twice what current CPUs get)?
 
A couple of points from me:

Firstly, software rendering is always going to be way more interesting than hardware rendering from a programmer's perspective, since it gives you total freedom to do whatever you want. With software rendering you aren't limited in what sort of texture filtering you can use, what size your textures have to be, or the kind of antialiasing you want to apply; heck, you don't even need to use triangles or scanline rasterization at all. You could implement a REYES system or a raytracer or a combination thereof. Much more exciting than having to play by the rules of the graphics hardware.

On the issue of whether CPUs will ever be able to replace GPUs, I say it's entirely possible, but not the way CPUs are going at the moment. Remember that ten years in computing technology is a very long time. Unlike some people here, I can see a complete architectural shift happening, and it would make a lot of sense. Honestly, what sorts of problems actually require more computing power? Multimedia applications, image processing, rendering (offline and realtime), games, servers and scientific calculations. What do all of these have in common? They can very easily benefit from massive parallelism. There are very few computing problems left that require a lot of power and cannot be run in parallel. Few applications right now would benefit, and writing efficient programs on parallel architectures is more difficult, but if CPUs do go that way then the programmers will follow.

When I'm talking about a parallel architecture for CPUs, I'm not talking about multicore. It's a nice stop-gap solution for the short term, but the approach is rubbish in the long term. Once you get to having more than four cores, the power drain and heat dissipation are going to be a killer, I would imagine, and it just feels like a dirty hack. I think we need a completely new CPU architecture; perhaps something like CELL. x86 has to die sooner or later; I just hope that it's sooner. You could always keep an emulation layer somewhere for legacy software support, even if it's not very fast (since most critical legacy apps don't need much speed, and those that do are likely to be upgraded to the new architecture anyway; games will probably suffer the most, but they evolve so quickly that it shouldn't be a problem for long).

I'm not saying any of this will happen (predictions about technology have a habit of being very inaccurate), but it certainly can happen, and if it does it could make software rendering for games viable again. Maybe it's just wishful thinking, but I'm really hoping for the death of 3D hardware accelerators within the next ten years. When (if!) they go I'll be one of the first to say good riddance!
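As a toy illustration of why rendering (software or otherwise) parallelizes so naturally, here is a C++ sketch; the shading function is a made-up stand-in for whatever you like, a ray trace or a REYES-style micropolygon shader:

Code:
#include <vector>

struct Color { float r, g, b; };

// Stand-in for any per-pixel computation: a raytracer, a procedural pattern...
Color shade(int x, int y, int w, int h) {
    float u = float(x) / float(w);
    float v = float(y) / float(h);
    return Color{u, v, 1.0f - u * v};
}

std::vector<Color> render(int w, int h) {
    std::vector<Color> frame(w * h);
    for (int y = 0; y < h; ++y)                    // no pixel depends on any other, so
        for (int x = 0; x < w; ++x)                // this loop nest can be split across
            frame[y * w + x] = shade(x, y, w, h);  // however many cores or lanes exist
    return frame;
}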
 
Pete said:
kenneth9265_3 said:
Thanks, that was one of the most straightforward answers I have read on the forum... :)
Straightforward, maybe. We'll have to wait for more proofreaders before we can call it correct, though the fact that Brimstone didn't find any glaring errors is promising. :)

Brimstone said:
But this is putting forward the notion that near-future multi-core designs from Intel will be based on a cache-designed core. Sure, the upcoming dual core will reuse the Pentium 4 core, but after that I think Intel will introduce a radically different core. A GPU from Nvidia is more like a stream processor. To me it seems Intel will ditch the complexity of a cache-based CPU and go with a stream-style architecture. In a stream architecture you don't need to spend an enormous number of transistors on cache.

When mainstream CPUs start looking like stream processors, that's when things will start to get very interesting.
Considering what TR reported of IDF, the near future multi-core designs would appear to be cache-designed, as you say (the notion of cache- and stream-designed CPUs is new to me).

TR said:
Turns out that Intel has gone crazy for dual cores across the board. The first chip it demoed this week was a dual-core Itanium called Montecito. With 1.72 billion transistors and 27MB of total onboard cache....
How's that for cache-designed? :D ;)

TR also reports the P4 and P-M architectures will each get dual-core versions. Do you consider the P4's "NetBurst" architecture still cache-designed, or stream-designed, or a sort of crossover?

And will stream-designed CPUs require as much bandwidth as nV affords its GPUs (which is to say, more than twice what current CPUs get)?

The PS2 is really a stream-processor-based system. This article from Ars Technica will shed some light on the subject.

http://arstechnica.com/cpu/2q00/ps2/ps2vspc-1.html

The PS3 will probably continue down this stream-based path. Of course, this time they will have CELL powering the console, which should be even more stream-like than the Emotion Engine, since it has been built from the ground up to run in a multimedia environment.

NetBurst is cache-based. The design shift of going with eDRAM instead of SRAM has to do with bandwidth vs. latency. Just like graphics cards, stream processors aren't really concerned with latency but really need bandwidth. So by forsaking SRAM, you sacrifice latency to achieve insane levels of bandwidth for computation. The vector operations are fetched from the eDRAM each time. I think this is also known as a Processing In Memory (PIM) architecture. The Emotion Engine didn't go with this PIM-style approach for vector operations, but I think CELL will, and it should be much more powerful. Why else put on 64MB of eDRAM? I just wanted to note that Stanford's Imagine stream processor doesn't use eDRAM either.

The big problem with eDRAM is getting good yields.


Here is an interesting paper on OpenGL vs. REYES on a stream architecture.

http://graphics.stanford.edu/papers/reyes-vs-opengl/
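A rough feel for why bandwidth is the limiter for this style of machine (the numbers below are invented purely for illustration): if a vector/stream unit chews through a billion elements a second, and each element means reading 12 bytes and writing 4, that is already 16 GB/s of sustained traffic, which is why designers reach for eDRAM or wide memory buses rather than bigger caches.

Code:
#include <cstdio>

int main() {
    const double elements_per_second = 1.0e9;  // hypothetical throughput target
    const double bytes_read = 12.0;            // e.g. two 4-byte inputs + a coefficient
    const double bytes_written = 4.0;          // one 4-byte result
    const double gb_per_second =
        elements_per_second * (bytes_read + bytes_written) / 1.0e9;
    std::printf("sustained bandwidth needed: %.1f GB/s\n", gb_per_second);  // 16.0
    return 0;
}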
 
Goragoth said:
When I'm talking about a parallel architecture for CPUs, I'm not talking about multicore. It's a nice stop-gap solution for the short term, but the approach is rubbish in the long term. Once you get to having more than four cores, the power drain and heat dissipation are going to be a killer, I would imagine, and it just feels like a dirty hack. I think we need a completely new CPU architecture; perhaps something like CELL. x86 has to die sooner or later; I just hope that it's sooner.
The reason multicore is the future is that it's actually pretty efficient in terms of die space, is fully exposed to software, and is backwards-compatible with previous architectures. In the PC space, we're not going to get anything different any time soon in terms of parallelism, nor am I sure that it would be desirable.

Here's what multicore will allow CPU manufacturers to do:

1. Stop focusing so much on the efficiency of a single processing unit. Processors today are mostly serial, and most of their transistors are spent in making that one (well, okay, two: integer and floating point) unit operate at full efficiency. By moving towards a multicore system, once software starts to make use of the multiple processors adequately, it will start to make sense to back off a bit on some of the efficiency improvements that have been made over the last few years, and create instead smaller processing units that have more parallelism.

2. Less need for very high frequencies. Very high frequencies are one of the reasons why power consumption is so high in current designs. By making the CPUs more parallel than serial, CPU manufacturers will be better able to manage heat problems (CPUs will still be hot, of course, in the high-end...).

3. Backwards compatibility! This fact cannot be stressed enough. To put it simply, there is vastly too much money invested in x86 software. You just can't sell a product for the desktop market that doesn't run x86 software fast. But it's really not that big of a deal: multicore CPUs should be good enough.
 
In the PC space, we're not going to get anything different any time soon in terms of parallelism, nor am I sure that it would be desirable.
I pretty much agree, at least for the next five years, which is a long time in computing; after that I'm not sure. The transition to a true parallel architecture won't be radical if it comes at all; rather, it will evolve out of going multicore. Once programmers are used to doing things in parallel on the PC, it will be easier to continue moving in that direction.

Less need for very high frequencies. Very high frequencies are one of the reasons why power consumption is so high in current designs.
That's true, but how acceptable is it going to be to drop frequencies when introducing multiple cores? Older apps will run slower, which might make the processors look bad initially. It may be a rough transition, or maybe improved processes will cut the power/heat problem while allowing similar frequencies with multiple cores. Oh well, I trust that Intel/AMD know what they are doing.

Backwards compatibility! This fact cannot be stressed enough. To put it simply, there is vastly too much money invested in x86 software.
Any new architecture should be able to emulate x86 if some thought is put into making it reasonably efficient. Isn't this what IA-64, a.k.a. Itanium, does? A new architecture with x86 emulation? Not exactly a popular chip, that one, but that's what I'm suggesting could be done.

But it's really not that big of a deal: multicore CPUs should be good enough.
Good enough for what? I won't be happy until software rendering is feasible again, but heck, it might never be. Or maybe graphics chips will become entirely general-purpose processors with no fixed functionality that just happen to be suited for graphics (highly parallel, lots of high-bandwidth memory on board). I'll be fine with that too. I just think that the most exciting stuff in graphics happens when programmers have free rein.
 
Goragoth said:
I pretty much agree, at least for the next five years, which is a long time in computing; after that I'm not sure. The transition to a true parallel architecture won't be radical if it comes at all; rather, it will evolve out of going multicore. Once programmers are used to doing things in parallel on the PC, it will be easier to continue moving in that direction.
But all the code between now and then will be compiled for x86 processors, unless there's some amazing move to Linux or some other OS by then....

Less need for very high frequencies. Very high frequencies are one of the reasons why power consumption is so high in current designs.
That's true, but how acceptable is it going to be to drop frequencies when introducing multiple cores? Older apps will run slower, which might make the processors look bad initially.
Right, which is why the move to multicore will be modest at first. Once more and more apps make use of multiple processors, Intel and AMD will slowly scale up parallelism while holding serial performance steady, or even letting it slip.

Any new architecture should be able to emulate x86 if some thought is put into making it reasonably efficient. Isn't this what IA-64, a.k.a. Itanium, does? A new architecture with x86 emulation? Not exactly a popular chip, that one, but that's what I'm suggesting could be done.
The Itanium is ridiculously slow at emulating x86. This can be done with any architecture, of course, but it won't be nearly as fast as a dedicated x86 processor.

But it's really not that big of a deal: multicore CPUs should be good enough.
Good enough for what?
Parallelism. Basically, I don't see why another parallel computing system would be any more efficient than going multicore. All of the other potential benefits of instruction sets other than x86 would help a single-core CPU just as much as a multicore one, but as far as parallelism goes, I don't think you can do significantly better than multicore.

Or maybe graphics chips will become entirely general-purpose processors with no fixed functionality that just happen to be suited for graphics (highly parallel, lots of high-bandwidth memory on board). I'll be fine with that too. I just think that the most exciting stuff in graphics happens when programmers have free rein.
There will always be some dedicated hardware in GPUs, for the simple reason that there are some operations that are integral to the 3D pipeline and are also very amenable to acceleration (triangle setup and texture filtering, for example). Not only that, but apparently the actual number of transistors dedicated to fixed-function-type stuff these days is quite small; I believe I read a number around 10%. If this is the case, then there just wouldn't be much need to get rid of these units: they'll take proportionally less space as time goes on anyway, and so aren't a problem to keep in the chip.
 
Hmmm..... seems to me the whole universe would be better off with more Red Dwarf episodes.......

To smeg with all this PC talk....... :rolleyes:
 
There will always be some dedicated hardware in GPUs, for the simple reason that there are some operations that are integral to the 3D pipeline and are also very amenable to acceleration (triangle setup and texture filtering, for example). Not only that, but apparently the actual number of transistors dedicated to fixed-function-type stuff these days is quite small; I believe I read a number around 10%. If this is the case, then there just wouldn't be much need to get rid of these units: they'll take proportionally less space as time goes on anyway, and so aren't a problem to keep in the chip.
But as long as that stays it will pretty much rule out using more "exotic" rendering methods in realtime applications. You are stuck with the triangles, doing things the GPU way. I don't like it. I know it doesn't mean a damn thing, but I don't like it.

The Itanium is ridiculously slow at emulating x86. This can be done with any architecture, of course, but it won't be nearly as fast as a dedicated x86 processor.
Thing is, which really speed-critical apps wouldn't be ported to the new architecture very quickly, and don't live on dedicated systems that won't be upgraded anyway? I can't think of many. Most apps have new versions come out all the time anyway; take the move to OS X as an example. Old apps run, just not as well as true native apps, and most speed-sensitive programs have since been ported. Big deal.

Parallelism. Basically, I don't see why another parallel computing system would be any more efficient than going multicore. All of the other potential benefits of instruction sets other than x86 would help a single-core CPU just as much as a multicore one, but as far as parallelism goes, I don't think you can do significantly better than multicore.
Maybe, I don't know. I actually know very little about the inner workings of the CPU and am in no way qualified to make such a judgement. I've just heard a lot over the years about how evil and backwards x86 is, so I figure a change would be good. Maybe not. I don't know. :|
 
Goragoth said:
But as long as that stays it will pretty much rule out using more "exotic" rendering methods in realtime applications. You are stuck with the triangles, doing things the GPU way. I don't like it. I know it doesn't mean a damn thing, but I don't like it.
No, not necessarily. Just because the hardware is there doesn't mean you absolutely need to make use of it. In the future, it is likely that GPUs will move away from the "vertex processing, then pixel processing" nature of things and work more along the lines of, "Okay, I've got some data to work on now; oh, and here it asks me to call this function that I have implemented in hardware."
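Something like the following toy C++ sketch of that model (all names here are invented for illustration; real drivers and shader compilers look nothing like this): the bulk of the work is free-form code, which dips into a hardwired helper only where that's convenient.

Code:
struct Rgba { float r, g, b, a; };

// Software stand-in for what would be a fixed-function unit on a GPU
// (nearest-neighbour lookup, assuming u and v are in [0,1]; a real unit
// would do bilinear filtering and more).
Rgba sample_texture_hw(const Rgba* texels, int w, int h, float u, float v) {
    int x = static_cast<int>(u * (w - 1));
    int y = static_cast<int>(v * (h - 1));
    return texels[y * w + x];
}

// The "shader" itself is arbitrary code; it just happens to lean on the helper.
Rgba exotic_shader(const Rgba* texels, int w, int h, float u, float v) {
    Rgba c = sample_texture_hw(texels, w, h, u, v);
    float glow = (c.r + c.g + c.b) / 3.0f;
    return Rgba{c.r + glow, c.g + glow * 0.5f, c.b, c.a};
}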

Thing is, which really speed-critical apps wouldn't be ported to the new architecture very quickly, and don't live on dedicated systems that won't be upgraded anyway? I can't think of many. Most apps have new versions come out all the time anyway; take the move to OS X as an example. Old apps run, just not as well as true native apps, and most speed-sensitive programs have since been ported. Big deal.
1. Don't forget that Apple has a lot more control over their systems, and therefore has the option to break compatibility like this. PC companies don't.
2. Intel has already been trying to do this in the workstation space, but it's proven to be an uphill battle.

I expect it would be a lot more difficult than you think. First of all, you would need Windows to be ported to the new CPU architecture. Just look at how long it's been taking Microsoft to port to just the x86-64 architecture, for example. In the meantime, while Microsoft is working on the new architecture, you have to run all the old software quickly on the new chip, or else risk not selling very many of them.

I think that the main problem here is drivers. Low-level code tends to need to be rewritten for a new architecture, and drivers are all low-level. Windows ships with a hell of a lot of drivers.

Maybe, I don't know. I actually know very little about the inner workings of the CPU and am in no way qualified to make such a judgement. I've just heard a lot over the years about how evil and backwards x86 is, so I figure a change would be good. Maybe not. I don't know. :|
Right, x86 certainly isn't the best. Current architectures use a ton of transistors to work around the shortcomings of the instruction set. But that doesn't mean parallelism would be any better in any other system.
 
I think it's worth noting as well that today's "x86" CPUs scarcely resemble the x86 CPUs of a decade ago. The instruction set obviously isn't restricted to a specific hardware implementation and is separate from it, as x86 CPUs today much more closely resemble RISC architectures than their CISC "x86" antecedents. Thus the term "x86" today refers far more to backwards compatibility with the instruction set than it describes general CPU architecture. A decade ago it was widely and erroneously believed that "x86" was inherently CISC, but such was not actually the case, as Intel first proved with the original Pentium. Since then, so-called "x86" CPU architecture has moved further and further away from the traditional CISC x86 designs with each succeeding generation (especially the Athlon).

With respect to x86-64 support, M$ has two tasks: to seamlessly support IA-32 applications with little performance loss resulting from the mixed IA-32/x86-64 nature of the OS itself, and to deliver x86-64 driver support (which is critical to the success of WinXP64). My opinion is that M$ has delayed the formal introduction of XP64 chiefly for the sake of robust x86-64 driver development and hardware support.
 
The underlying architecture being different from the instruction set is the primary reason why x86 processors waste so many transistors (in the translation layer), and, all other things being equal, they also can't be as fast as a more efficient architecture: if the compiler has more information about the underlying architecture, it can obviously optimize for it better.
 
Considering the embedded/cellphone-related stuff, doesn't it puzzle you one bit that INTEL itself is using a dedicated 3D chip, in the form of the 2700G, for that market?

I think a more interesting question in the mobile space is whether programmable shaders will catch on (as the OpenGL ES 2.0 work is hoping), and if so, what the implementation path will be. Vertex shaders running on DSPs? Or following the GPU path?

There is an attempt from NeoMagic, the MiMagic I think, with their APA (associative processor array). It doesn't look like a big success so far.

Silicon Valley is littered with bad media-processor ideas...
 
I have three questions:
What is the difference between VLIW and vector processors, anyway? Don't both exploit parallelism by letting the software schedule SIMD instructions?

If a processor like Cell is sufficiently fast, why can't it emulate the x86 instruction set like Transmeta does and allow the industry to shift? If Cell can run Windows at half the speed of a 3GHz P4, it's still pretty usable.

Why is it that computation is benefiting from parallel architectures (relatively lower frequencies, more pipes) whereas bus technology is going the other way? Serial ATA and USB are both more serial but higher frequency. Why do parallel buses perform worse than serial ones at the frequencies they can achieve?
 
Wait a minute, aren't the first dual-core chips from Intel actually using Pentium M-type cores?
 
JF_Aidan_Pryde said:
I have three questions:
What is the difference between VLIW and vector processors, anyway? Don't both exploit parallelism by letting the software schedule SIMD instructions?

If a processor like Cell is sufficiently fast, why can't it emulate the x86 instruction set like Transmeta does and allow the industry to shift? If Cell can run Windows at half the speed of a 3GHz P4, it's still pretty usable.

Why is it that computation is benefiting from parallel architectures (relatively lower frequencies, more pipes) whereas bus technology is going the other way? Serial ATA and USB are both more serial but higher frequency. Why do parallel buses perform worse than serial ones at the frequencies they can achieve?

In a VLIW architecture, the instructions explicitly schedule the different execution units; a vector processor is just a processor that executes the same operation on multiple pieces of data (vectors). The two are not mutually exclusive.

IMO, from what I've seen of the Cell patents, it will be very fast at a certain subset of problems, but it will not be particularly fast or efficient for general-purpose computing. There are a lot of problems that require fast random access to large data structures, and unless there is a lot more to it than we're seeing in the patents, Cell is just not built to run these efficiently.
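On the earlier question of emulating x86 on something like Cell: the basic cost is easy to see in even a toy interpreter loop like the one below (a made-up instruction set, purely for illustration). Every guest instruction costs a fetch, a decode, a dispatch and some register bookkeeping on the host, so a naive emulator burns many host instructions per guest instruction. A dynamic translator in the Transmeta style recovers much of that by caching translated blocks, but the overhead never fully disappears.

Code:
#include <cstddef>
#include <cstdint>
#include <vector>

enum Op : std::uint8_t { LOAD_IMM, ADD, SUB, HALT };

struct Inst { Op op; std::uint8_t dst, src; std::int32_t imm; };

std::int32_t run(const std::vector<Inst>& code) {
    std::int32_t reg[8] = {0};
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        const Inst& in = code[pc];                      // fetch
        switch (in.op) {                                // decode + dispatch
            case LOAD_IMM: reg[in.dst] = in.imm;        break;
            case ADD:      reg[in.dst] += reg[in.src];  break;
            case SUB:      reg[in.dst] -= reg[in.src];  break;
            case HALT:     return reg[0];
        }
    }
    return reg[0];
}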
 