AMD: R7xx Speculation

silent_guy · May 29, 2008

Lukfi said:
=>Pressure: You wanted to feel better than the rest of us, you wanted to feel special, so you bought a Mac. It was your choice. Now please shut up and suffer the consequences.

Just when I thought these R700/GT200 threads couldn't get more boring...

(Your way of stereotyping doesn't require me to own a Mac to feel better and special.)

ShaidarHaran · May 29, 2008

Pressure said:
Actually it is.

Mac OS X has support for RV670 (for example and many others) but there is no Radeon HD 3870 with an EFI rom. It is up to ATI or NVIDIA to supply an aftermarket card (or a Build to Order option).

ATI even makes the drivers for Apple, whereas NVIDIA gives them the code and makes them write the drivers themselves.

Macs are closed architecture. It's up to Apple to request ATi (or NV) create a card with EFI ROM to support their systems. It's not like there's an aftermarket for Mac parts (comparatively speaking) so ATi has no incentive to do it on their own.

Lukfi said:
=>ShaidarHaran: Arbitration logic? Why? If the R700 will be the same concept as R680, no other chips will be required - except the PLX PCIe bridge, but its main purpose is not communication *between* the chips.

I understand the function of the PLX chip, though some posters appear not to. The purpose of my previous post was to make it clear to these individuals that no bridge or arbiter is necessary for the sharing of data between the two GPUs.

Pressure · May 29, 2008

XMAN26 said:
And why is it ATI's and NV's fault that Apple decided to require EFI instead of a regular BIOS chip for video cards? I know: Apple's laziness about supporting more hardware than they want to.

The limitations of BIOS are really starting to show their ugly face, and even Vista was supposed to go EFI.

ShaidarHaran said:
Macs are closed architecture. It's up to Apple to request ATi (or NV) create a card with EFI ROM to support their systems. It's not like there's an aftermarket for Mac parts (comparatively speaking) so ATi has no incentive to do it on their own.

The .kext has support for several graphics cards, including the Radeon HD 3800 series. There was a rumor floating around earlier that ATI would release an EFI-based card. Otherwise it just makes no sense that the device ID is found in the graphics .kext.

Lukfi said:
=>Pressure: You wanted to feel better than the rest of us, you wanted to feel special, so you bought a Mac. It was your choice. Now please shut up and suffer the consequences.

I sure did hit the mature audience here I see.

No, I actually use my Mac Pro for professional work. I am earning money on this lovely platform and my creativity is through the roof. Did wish I had a bit more choice in the graphic card department though.

Back to topic I presume.

Kaotik · May 29, 2008

At least x64 Vista has EFI support, and apparently at least some motherboard manufacturers are now bringing motherboards with normal BIOS, but which can be software updated to EFI instead later this year

Razor1 · May 29, 2008

Hmm, I have a Mac (bootcamped), and Macs are cool. There are EFI emulators out there that get most graphics cards working in Mac OS.

ChronoReverse · May 29, 2008

Pressure said:
No, I actually use my Mac Pro for professional work. I am earning money on this lovely platform and my creativity is through the roof. Did wish I had a bit more choice in the graphic card department though.

So a Mac somehow increases your creativity? This is one of the reasons why there's such a backlash against some of the Mac folks.

XMAN26 · May 29, 2008

ChronoReverse said:
So a Mac somehow increases your creativity? This is one of the reasons why there's such a backlash against some of the Mac folks.

OT: apparently, Mac users think that somehow the software you can get for it is better than the same software on a PC.

Sound_Card · May 29, 2008

Wow, Mac bashing came out of nowhere. Back on topic, yes?

silent_guy · May 29, 2008

ChronoReverse said:
So a Mac somehow increases your creativity? This is one of the reasons why there's such a backlash against some of the Mac folks.

Companies spend millions on nice buildings, nice furniture and a pleasant working environment. I've worked in the worst kind of conditions and the best, and it does make a difference in the overall attitude one has about the work place.

Why would it be any different about the day to day in-your-face tool you're using?

I'm not saying it's a necessary condition, but even if it helps just a few percent in making people feel better and more productive, it's worth it.

Anarchist4000 · May 29, 2008

I'm not sure we're suggesting that an external arbiter is needed, but connecting what is effectively a 512-bit bus doesn't seem plausible. Fewer pins and higher speeds between the chips seem to be the only way to get a reasonable amount of bandwidth.

If both chips are going to share the same pool then I'd assume half of whatever bandwidth would be consumed by texture fetches would have to utilize that connection with minimal latency.

That's why we're suggesting some form of high speed interconnect and possibly offloading other features in the process. Ultimately it would come down to just how much space a feature consumed and whether or not it was even worth removing.

IbaneZ · May 29, 2008

Sound_Card said:
Back on topic yes?

No.

This is pre-release hysteria, let the weird times roll.

Just enjoy it. When we all have the benchmarks it'll be boring times again.

Jawed · May 29, 2008

What's the worst case scenario for texture bandwidth, assuming a 1GHz GPU with 16 TUs? Assuming fp16 texels, minified with no mipmap and bilinearly filtered, I think this comes out as 16 texels per pixel * 16 pixels per clock * 1GHz * 8 bytes per texel = 51.2GB/s.

So each GPU in a pair could read that out of its local memory. Or each GPU could, on average, read half of that from the other GPU's memory. So that would require 51GB/s connecting the two GPUs.

Is that reasonable as an upper bound on the bandwidth required to join two RV770s if they operate as a "shared memory" graphics card?

Jawed

ShaidarHaran · May 29, 2008

Jawed said:
What's the worst case scenario for texture bandwidth, assuming a 1GHz GPU with 16 TUs? Assuming fp16 texels, minified with no mipmap and bilinearly filtered, I think this comes out as 16 texels per pixel * 16 pixels per clock * 1GHz * 8 bytes per texel = 51.2GB/s.

So each GPU in a pair could read that out of its local memory. Or each GPU could, on average, read half of that from the other GPU's memory. So that would require 51GB/s connecting the two GPUs.

Is that reasonable as an upper bound on the bandwidth required to join two RV770s if they operate as a "shared memory" graphics card?

Jawed

This is what I'm talking about.

Why must a dual RV770 SKU have any sort of connection between its GPUs?

The DMA and memory interface infrastructure needed for a shared memory pool is already present on each GPU. No additional hardware or separate traces need be run (beyond what is necessary to enable dual GPUs on a PCB, that is).

Jawed · May 29, 2008

ShaidarHaran said:
Why must a dual RV770 SKU have any sort of connection between its GPUs?

Depends on the bandwidth required to get scaling out of a shared memory configuration, if they are, indeed, configured that way.

Jawed

ShaidarHaran · May 29, 2008

Jawed said:
Depends on the bandwidth required to get scaling out of a shared memory configuration, if they are, indeed, configured that way.

Jawed

Good thing they've switched to GDDR5 from GDDR3 then

I just see all these bits as separate pieces of the same puzzle. They just seem to fit together too well for the obvious case to be anything but true, but I've been wrong before...

trinibwoy · May 29, 2008

ShaidarHaran said:
I just see all these bits as separate pieces of the same puzzle. They just seem to fit together too well for the obvious case to be anything but true, but I've been wrong before...

All we know is that R700 is a dual-GPU card with GDDR5. What other bits are there?

ShaidarHaran · May 29, 2008

trinibwoy said:
All we know is that R700 is a dual-GPU card with GDDR5. What other bits are there?

I believe the slides showing "shared memory in the R700 generation" have been linked to several times; the most recent link is probably a couple of pages back by now.

Even if you believe this information to be outdated (as some have suggested), it is clear that ATi has the desire to simplify multi-GPU rendering, while also increasing efficiency resulting in greater performance.

Maybe I just want to believe so hard that 4870 X2 is "something more" than yet another CF on a card solution.

trinibwoy · May 29, 2008

Yeah I saw those but weren't they made up by some Chinese website? ATi is obviously moving in this direction but I haven't seen anything indicating that we will see it in R700.

There's one simple reason for that....such a high level of inter-die integration would probably require significant architectural change. R600 was definitely an attempt at single die supremacy so I'm not expecting anything along these lines until AMD's next architecture rolls out.

ShaidarHaran · May 29, 2008

trinibwoy said:
Yeah I saw those but weren't they made up by some Chinese website? ATi is obviously moving in this direction but I haven't seen anything indicating that we will see it in R700.

There's one simple reason for that....such a high level of inter-die integration would probably require significant architectural change. R600 was definitely an attempt at single die supremacy so I'm not expecting anything along these lines until AMD's next architecture rolls out.

Re: integration of components - I know the slides showed an MCM, but I don't think it's absolutely necessary to achieve the desired effects. It does make trace-routing all that much more difficult, and of course PCB costs rise accordingly, though.

I think the change to R7xx generation (as small as it may be) by no means precludes the possibility of the introduction of any necessary micro-architectural changes to facilitate the use of a shared memory pool for a dual GPU SKU.

mczak · May 29, 2008

Jawed said:
What's the worst case scenario for texture bandwidth, assuming a 1GHz GPU with 16 TUs? Assuming fp16 texels, minified with no mipmap and bilinearly filtered, I think this comes out as 16 texels per pixel * 16 pixels per clock * 1GHz * 8 bytes per texel = 51.2GB/s.

I don't get your math. If I multiply the stuff you mentioned, I come up with 2TB/s... That said, I don't understand the calculation either - why 16 texels per pixel? Shouldn't that be 4 for bilinear? In any case, I suspect even under somewhat bad conditions you'd usually only have 1 or so, since bilinear (with mipmaps) tends to be perfect for texture caches. That still gives 128GB/s - meaning the chip doesn't have enough bandwidth for this anyway. Though DXT1 textures would only use 8GB/s, and DXT5 only 16GB/s...

I suspect for really good performance you'd want half the memory bandwidth as aggregate link bandwidth, with all textures split up (with some tiling pattern) between the two chips - meaning each chip would still have the same memory bandwidth as a single chip configuration (aside from pathological cases where all texture accesses from a chip go to the memory of the other chip). Though if you assume texture fetch doesn't consume that much bandwidth (after all, your ROPs probably want some too, and as said, with compressed formats it should be much lower), maybe something like one fourth the bandwidth instead of half could be enough...
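
For anyone who wants to check the numbers in this exchange, here is a rough sketch in Python. It uses the thread's own assumptions (1GHz clock, 16 texture units, one pixel per unit per clock). The function names are made up for illustration, the DXT1 and DXT5 sizes of 0.5 and 1 byte per texel follow from their 8-byte and 16-byte 4x4 blocks, and the 100GB/s memory bandwidth at the end is only a placeholder, not a figure anyone in the thread quoted.

```python
GB = 1e9  # decimal gigabytes, as used in the thread


def texture_bandwidth_gbs(texels_per_pixel, pixels_per_clock, clock_hz, bytes_per_texel):
    """Texture fetch bandwidth in GB/s (hypothetical helper, not from the thread)."""
    return texels_per_pixel * pixels_per_clock * clock_hz * bytes_per_texel / GB


def link_bandwidth_gbs(memory_bw_gbs, fraction):
    """Inter-GPU link bandwidth as a fraction of one chip's memory bandwidth."""
    return memory_bw_gbs * fraction


CLOCK_HZ = 1e9        # 1GHz, from the thread
PIXELS_PER_CLOCK = 16  # one per texture unit, from the thread

# Multiplying out the original worst case: 16 texels/pixel, fp16 (8 bytes).
worst_case = texture_bandwidth_gbs(16, PIXELS_PER_CLOCK, CLOCK_HZ, 8)

# About one uncached texel per pixel, as suggested for mipmapped bilinear.
fp16_one_texel = texture_bandwidth_gbs(1, PIXELS_PER_CLOCK, CLOCK_HZ, 8)
dxt1_one_texel = texture_bandwidth_gbs(1, PIXELS_PER_CLOCK, CLOCK_HZ, 0.5)
dxt5_one_texel = texture_bandwidth_gbs(1, PIXELS_PER_CLOCK, CLOCK_HZ, 1.0)

# Placeholder memory bandwidth, purely an assumption for the example.
ASSUMED_MEMORY_BW_GBS = 100.0
half_link = link_bandwidth_gbs(ASSUMED_MEMORY_BW_GBS, 0.5)
quarter_link = link_bandwidth_gbs(ASSUMED_MEMORY_BW_GBS, 0.25)

if __name__ == "__main__":
    print(f"worst case product: {worst_case:.0f} GB/s")
    print(f"fp16, 1 texel/pixel: {fp16_one_texel:.0f} GB/s")
    print(f"DXT1, 1 texel/pixel: {dxt1_one_texel:.0f} GB/s")
    print(f"DXT5, 1 texel/pixel: {dxt5_one_texel:.0f} GB/s")
    print(f"link at 1/2 and 1/4 of assumed memory bandwidth: "
          f"{half_link:.0f} and {quarter_link:.0f} GB/s")
```

The first product comes out at 2048GB/s (roughly 2TB/s) rather than 51.2GB/s, and the one-texel-per-pixel cases match the 128, 8 and 16GB/s figures above. Swap in a real memory bandwidth number to see what the half and quarter link sizes would mean in practice.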
