Anandtech: AMD-ATI Merger in the Works?

More Inq

Several very smart analysts I have talked to seem to think it is madness for AMD to hook up with their Canadian brethren, mainly because it would antagonize their closest partner, NVIDIA. They are right, and it would, but where does NV run, to the loving arms of Intel? Not a chance.

Good point.
 
Well, I think that a merger between ATI and AMD would basically kill off ATI's presence in the high-end GPU market.
 
Chalnoth said:
Well, I think that a merger between ATI and AMD would basically kill off ATI's presence in the high-end GPU market.

Like Micron buying out Rendition... :cry:
 
Chalnoth said:
Well, I think that a merger between ATI and AMD would basically kill off ATI's presence in the high-end GPU market.

I'd love to say you're wrong, but history suggests pretty strongly that you're not. Not in a month or even a year, but two or three years down the road. . .

Charlie D. at the Inq certainly sounds much more confident about this than he does in your average INQish rumor-mongering.
 
AMD is working with Chartered on 65nm SOI CPU manufacturing. AMD is also working with ISI to validate the Z-RAM technology. If Z-RAM works (even nearly) as advertised, a 65nm SOI Xenos with Z-RAM would probably be cheaper and have much lower power consumption than the current model. Actually, if Z-RAM works as advertised, any new high-end GPU design utilizing it would likely A) be very fast and B) have very low power consumption.
Curious, how would this directly help? Intel and AMD x86 CPUs devote a huge share of die area (~50% or more) to cache alone. I thought GPUs devote far less die area to cache (<30%). An embedded (on-die) RAM framebuffer doesn't fit as well in a Windows/PC environment, because the resolution target isn't fixed.

Doesn't fit as well - true. But you'd get to rip out all the Z/framebuffer compression logic, and you'd have very low-latency access to the Z/framebuffer. I have no clue whether this would enable any savings in latency-hiding logic, though. And you'd save many a watt: driving a 256-bit external memory bus (and the bus/compression logic) at ludicrous speeds takes a lot of juice. The resolution target isn't fixed, but one could try to match the maximum resolution somewhat sanely to the fillrate.

1 MB of Z-RAM should be less than 3 mm² at 90nm, and less still at 65nm. This is getting to the point where it would be possible to embed useful amounts of memory.
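To put that density figure in perspective, here's a rough back-of-envelope sketch. My own assumptions: 32-bit color plus 32-bit Z per pixel, no AA, and the ~3 mm² per MB (90nm) estimate above. Purely illustrative numbers:

```python
# Back-of-envelope: on-die framebuffer size vs. Z-RAM die area.
BYTES_PER_PIXEL = 4 + 4        # assumed: 32-bit color + 32-bit Z
MM2_PER_MB = 3.0               # Z-RAM density estimate quoted above (90nm)

for width, height in [(1024, 768), (1600, 1200), (2048, 1536)]:
    mbytes = width * height * BYTES_PER_PIXEL / (1024 ** 2)
    print(f"{width}x{height}: {mbytes:5.1f} MB -> ~{mbytes * MM2_PER_MB:3.0f} mm^2")
```

That works out to roughly 18 mm² at 1024x768 but about 72 mm² at 2048x1536, which is exactly why the non-fixed resolution target of the PC makes sizing an on-die framebuffer awkward.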
 
sonyps35 said:
Whichever company ends up with Intel is better off.
Better off for whom? You mean the company that gets eaten up by Intel? That company would essentially cease to exist. "Intel Inside" would have a new definition.
 
Depends on the type of merger, takeover, buyout, etc. They happen all the time, and not all of them fail. Some lead to greater synergies... OK, sorry. Been a day full of meetings.
 
Chalnoth said:
Amdahl's law is based upon an entirely wrong assumption. There is no upper limit on parallelism in software because when you are writing software for hardware that allows for more parallelism, you allow it to process more, instead of just attempting to execute the same code as you would on a single-threaded system.

You got it backwards.

There may not be an upper limit to parallelism, but there is a lower limit on the sequential part of solving a problem. That is what Amdahl's law defines, and it ultimately bounds the speedup you can achieve by throwing more resources at a problem.

Here's a classic:
One woman can have a baby in nine months from conception to birth.
Nine women can have nine babies in nine months.
Nine women cannot have one baby in one month.
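
For concreteness, here's Amdahl's law in a few lines of Python. The speedup is 1 / (s + (1 - s) / N), where s is the serial fraction; the 10% serial fraction below is just an illustrative number:

```python
# Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N),
# where s is the serial fraction and N is the processor count.

def speedup(s, n):
    return 1.0 / (s + (1.0 - s) / n)

s = 0.10  # even a modest 10% sequential part...
for n in (2, 4, 16, 256, 1_000_000):
    print(f"{n:>9} processors: {speedup(s, n):5.2f}x")
# ...caps the speedup at 1/s = 10x, no matter how many processors you add.
```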

Cheers
 
Perhaps, but it doesn't apply in reality, because you just don't write the same program for the parallel architecture as you write for the single-threaded one.

Once you have the capacity for parallelism, it opens up new ways of doing processing. So instead of tackling the same exact problem, you tackle a slightly different problem with the same overall goal, but one that is more suited to operation on parallel processors.
 
Chalnoth said:
Perhaps, but it doesn't apply in reality, because you just don't write the same program for the parallel architecture as you write for the single-threaded one.

True. But there will always be a lower limit. What normally happens is that you convert a compute bound problem into a communication bound problem.

Cheers
 
Chalnoth said:
Perhaps, but it doesn't apply in reality, because you just don't write the same program for the parallel architecture as you write for the single-threaded one.

Once you have the capacity for parallelism, it opens up new ways of doing processing. So instead of tackling the same exact problem, you tackle a slightly different problem with the same overall goal, but one that is more suited to operation on parallel processors.

That's fine to zeroth order, but it really does depend on the problem. As Gubbi said, sooner or later the need to communicate between threads jumps up and bites you in the bum and kills your scalability.
 
nutball said:
That's fine to zeroth order, but it really does depend on the problem. As Gubbi said, sooner or later the need to communicate between threads jumps up and bites you in the bum and kills your scalability.
And once that happens you change how you deal with the problem again, and get that scalability right back.

In an abstract sense, take computer gaming. There are a multitude of ways that gaming can go from where we are today to make use of future processing power. Which way is best is determined by what hardware is available and becomes available later. Once you start running into scalability issues with algorithm X, you start to switch over to algorithm Y, which scales better with the available hardware.

An obvious example of this is 3D graphics: as the hardware changes, so do the graphics processing algorithms. Going with multicore is no different.

If people running applications on supercomputers can make use of thousands of processors at a time, eventually there will be ways for PC software to do the same, if we ever get that many processors.
 
Chalnoth said:
And once that happens you change how you deal with the problem again, and get that scalability right back.

If it were that easy, supercomputers would be made from 2 million PDAs using MIPS or ARM processors running at 200MHz.

That is not the case. Supercomputers are either made from boutique hardware (vector computers) or from the fastest microprocessors around (Opterons, Itanium 2s, Xeons, or Power 5s).

Cheers
 
Gubbi said:
If it were that easy, supercomputers would be made from 2 million PDAs using MIPS or ARM processors running at 200MHz.
If it were that easy, PCs would be built from 2000 80386s!!!
 
Chalnoth said:
And once that happens you change how you deal with the problem again, and get that scalability right back.
Something about this statement is setting off my internal "perpetual motion machine" and "infinite lossless compression" alarms...

Maybe they're reporting a false positive.
 
Gubbi said:
If it were that easy, supercomputers would be made from 2 million PDAs using MIPS or ARM processors running at 200MHz.
Of course not. You don't move to parallelism because it's easy, you move to it because it's the only way to increase processing power after a certain point. It still takes more programmer work to find ways to get that extra performance out.

Communication between threads doesn't ever need to be a problem, because you can pretty much always just do more per thread before communicating with the others. After all, we're not really talking here about requiring infinite scalability. The amount will always be finite. And the market will require that single-threaded performance never drop with newer processors, as you don't want current applications to run slower.

So what you'll get is that when computers make the transition to a new level of parallelism, the software will adjust to make use of that extra parallelism. As an example, games may move to higher-quality algorithms for physics (fewer errors, more fluid/cloth-type effects, etc.) as parallelism increases, because just adding more physics objects would inevitably reduce scalability as it increases the amount of communication that needs to be done. But fewer objects with higher-quality effects may make better use of available processing.
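
A toy sketch of that "do more per thread before communicating" idea (my own example, nobody's real engine): each worker grinds through a large independent chunk, so the one synchronization point is negligible next to the compute.

```python
# Coarse-grained parallelism: each worker gets a big independent chunk,
# so communication is rare relative to computation.
from multiprocessing import Pool

def big_chunk(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))   # lots of purely local work

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(w * step, (w + 1) * step) for w in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(big_chunk, chunks))  # one synchronization point
    print(total)
```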
 
And in usual Inquirer fashion, they have to be on both sides of every damned issue. This way they're always right. Why do you people even bother to read the site again?
 
Chalnoth said:
It still takes more programmer work to find ways to get that extra performance out.
Right, which is why your bold statement that "oh well if this doesn't work then you trivially change to doing it another way" isn't founded on reality. The current parallelisation strategies employed for commonly-used algorithms in high-performance computing are the result of hundreds (probably thousands) of man-years of research and development over decades. If they stop working, it's really not just a case of sitting down with a pencil for twenty minutes and coming up with something better. It's very expensive and time-consuming to change your underlying parallelisation strategy. More often than not it entails a total re-write of your code which, if it's a couple of million lines of FORTRAN or whatever, is not a task you take on before breakfast.

Communication between threads doesn't ever need to be a problem, because you can pretty much always just do more per thread before communicating with the others.
That really depends on your problem! If your problem involves, for example, integrating the equations of motion for an ensemble of mutually interacting particles, you need to exchange state at the end of each timestep. You need to communicate, otherwise you get the wrong answer. You can't avoid the communication, otherwise you're not solving the problem correctly! And you can't make the timesteps longer, for numerical reasons. There are only two ways of giving each thread more work: add more particles, or do redundant work (i.e. use a numerically less efficient integration scheme). The first is only a viable option if more particles is what you want -- if you want the same number of particles but you want the results turned round faster, you're in trouble. The second is just plain perverse! That leaves a third option: suck it up and accept that you're communication-bound, not compute-bound!
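
To make the structural nature of that communication concrete, here's a bare-bones sketch (threads standing in for cluster nodes, toy "physics" of my own invention): every timestep ends in a mandatory exchange, because the next step's forces depend on everyone's new state.

```python
# N-body-style timestep loop: each worker integrates its own slice of
# particles, but needs everyone's updated positions before the next
# step, so one exchange per timestep is unavoidable.
from threading import Barrier, Thread

N_WORKERS, N_STEPS, N_PARTICLES = 4, 10, 16
positions = [float(i) for i in range(N_PARTICLES)]  # shared toy state
barrier = Barrier(N_WORKERS)

def worker(wid):
    lo = wid * N_PARTICLES // N_WORKERS
    hi = (wid + 1) * N_PARTICLES // N_WORKERS
    for _ in range(N_STEPS):
        # compute: this slice's update depends on *all* current positions
        mean = sum(positions) / N_PARTICLES
        new = [p + 0.01 * (mean - p) for p in positions[lo:hi]]
        barrier.wait()            # everyone done reading the old state
        positions[lo:hi] = new    # publish this worker's slice
        barrier.wait()            # exchange complete; next step may begin

threads = [Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(positions)
```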

So what you'll get is that when computers make the transition to a new level of parallelism, the software will adjust to make use of that extra parallelism. As an example, games may move to higher-quality algorithms for physics (fewer errors, more fluid/cloth-type effects, etc.) as parallelism increases, because just adding more physics objects would inevitably reduce scalability as it increases the amount of communication that needs to be done. But fewer objects with higher-quality effects may make better use of available processing.
Maybe. But I do wonder where we'll be in five years' time with 32-core CPUs. My gut feeling is that TLP will go the same way as ILP: stagnant for general-purpose computing.
 