AMD RV770 refresh -> RV790

Yeah, I actually assumed that was how OpenCL stuff was going to work on AMD hardware: one thread per lane. Otherwise they'd be completely dependent on devs doing what Jawed described above. But like you said, branch granularity would suck mightily.
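To put a rough number on that branch-granularity cost, here is a toy model. It is a sketch, not how any AMD tool measures anything: lanes in a group run in lockstep, and if any lane takes a path, the whole group pays for that path. The 64-lane width is an assumption matching RV770's wavefront size; `simd_utilization` and the path costs are hypothetical.

```python
def simd_utilization(taken: list[bool], cost_then: int, cost_else: int,
                     width: int = 64) -> float:
    """Fraction of issued lane-cycles that do useful work when groups of
    `width` lanes run an if/else in lockstep and serialize both paths
    whenever their lanes disagree (toy model)."""
    useful = 0
    issued = 0
    for start in range(0, len(taken), width):
        group = taken[start:start + width]
        n_then = sum(group)
        n_else = len(group) - n_then
        # The whole group pays for a path if at least one lane takes it.
        cycles = (cost_then if n_then else 0) + (cost_else if n_else else 0)
        issued += cycles * width  # masked and idle lanes still occupy the SIMD
        useful += n_then * cost_then + n_else * cost_else
    return useful / issued

# 128 threads, i.e. two 64-wide groups; both paths cost 10 cycles.
coherent = simd_utilization([True] * 64 + [False] * 64, 10, 10)
alternating = simd_utilization([i % 2 == 0 for i in range(128)], 10, 10)
print(coherent, alternating)
```

When each group agrees on the branch, utilization is 1.0; when lanes alternate, every group runs both paths and utilization halves to 0.5.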
 
Always coding for vec4 as the default case is hard. I know of very few apps in the HPC world where vec4 is a perfect match. Even in the graphics world it isn't the best fit, as G80 shows. Getting max perf from AMD hardware is going to be mighty hard (impossible?) if you want to write portable code and AMD's tools don't do some kind of vec4 packing.
 
Jawed said:
Why is .xyzw notation a part of D3D if the vec4 organisation of graphics and a lot of graphics hardware is irrelevant?
It's to make it easier for you (the human) to read / write shaders. I think you'll find that even on vectored hardware, the compiler is doing far more magic than you imply is going on.

It is, relatively speaking, easy to auto-vectorize languages that don't have pointers, nor side-effects, and where programs are ridiculously short.
 
Seems like you're assuming R600 is vec4, when it's VLIW (Jawed is also talking about Cell etc., which is different of course). At least for shaders, the compiler does a pretty good job of packing scalar code, as far as branching allows. A bit of unrolling will usually be a good idea, but it would be stupid to vectorize yourself.
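As a rough illustration of what "packing scalar code" means on a 5-wide VLIW like R600 (four general slots plus one transcendental slot), here is a toy greedy bundler. It is a sketch under simplifying assumptions, not AMD's shader compiler: every value is written exactly once (SSA form), every op takes one bundle to complete, and `Op`, `fits` and `pack` are hypothetical names.

```python
from dataclasses import dataclass

SLOTS_PER_BUNDLE = 5  # x, y, z, w + t (assumed R600-style layout)

@dataclass
class Op:
    dest: str                     # value written (each written exactly once)
    srcs: tuple[str, ...]         # values read
    transcendental: bool = False  # must go in the single t slot

def fits(bundle: list[Op], op: Op) -> bool:
    """True if `op` can be added to `bundle` without exceeding its slots."""
    if len(bundle) >= SLOTS_PER_BUNDLE:
        return False
    if op.transcendental:
        # Only one t slot; ordinary ops may also use it when it is free.
        return not any(o.transcendental for o in bundle)
    return True

def pack(ops: list[Op]) -> list[list[Op]]:
    """Greedily place each op in the earliest bundle after all of its inputs."""
    bundles: list[list[Op]] = []
    ready_at: dict[str, int] = {}  # value -> first bundle index that may read it
    for op in ops:
        b = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while b < len(bundles) and not fits(bundles[b], op):
            b += 1
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(op)
        ready_at[op.dest] = b + 1
    return bundles

# Scalar code: a = x*2; b = y+1; c = z*z; d = a+b; e = sin(c); f = d*e
ops = [
    Op("a", ("x",)),
    Op("b", ("y",)),
    Op("c", ("z",)),
    Op("d", ("a", "b")),
    Op("e", ("c",), transcendental=True),
    Op("f", ("d", "e")),
]
bundles = pack(ops)
for i, bundle in enumerate(bundles):
    print(i, [op.dest for op in bundle])
```

Six scalar ops land in three bundles instead of six, with no vec4 code written by hand. Real schedulers also have to deal with register ports, latencies and branches, none of which are modelled here.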
 
:LOL: That's parallel as occasionally fallen blocks not at the leading edge shift :LOL:
I knew I shouldn't have deleted the text I'd already typed. It said something about "quite serial in a transcendental, not technical way". ;) But as I said, let's please move this discussion to another thread.
 
Current word is that AMD/ATi has been putting 5GHz chips on other parts...
160GB/s vs 115GB/s, a 39% increase. Probably not needed and not going to happen, but it would be fun and does make a nice dream.
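Those figures fall out of bus width times per-pin data rate. A quick check, assuming the HD 4870's 256-bit bus carries over unchanged to RV790 (`peak_bandwidth_gb_s` is just an illustrative helper):

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins * per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

BUS_WIDTH = 256  # HD 4870 memory bus width in bits (assumed unchanged for RV790)

current = peak_bandwidth_gb_s(BUS_WIDTH, 3.6)  # HD 4870's 3.6Gbps GDDR5
dream = peak_bandwidth_gb_s(BUS_WIDTH, 5.0)    # rumoured 5.0Gbps parts
increase = (dream / current - 1) * 100

print(f"{current:.1f} GB/s -> {dream:.1f} GB/s (+{increase:.0f}%)")
```

This prints `115.2 GB/s -> 160.0 GB/s (+39%)`. The same helper gives 128GB/s for the 4.0Gbps parts mentioned below.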

When they announced the 4860, they also released board shots that showed "Qimonda IDGV1G-05A1F1C-50X" chips. The shots later turned up on the AMD site with the memory chip details photoshopped out. Sincerely hope nobody lost their job over that ;)

According to Qimonda's GDDR5 page, that's a 5.0Gbps part. They don't list the current 4870's GDDR5 (3.6Gbps), too slow obviously... In the PDF they also list 5.5Gbps.

The leak on Chiphell last week suggested 4.0Gbps, the lowest SKU Qimonda currently lists. I suppose it's also possible that the overclocked version at the end of April gets 5.0Gbps to differentiate it enough from the regular RV790. Also, I think Samsung is shipping 3.6 and 4.0Gbps and has announced 5.0Gbps, just in case Qimonda falls to pieces.

Edit: for completeness, Hynix lists 3.6, 4.0 and 4.5Gbps parts in mass production.
 
Can someone tell me why they would use the GDDR5 modules in the HD 4870 SKU at less than rated speeds?
 
About every board out there with GDDR3 or GDDR4 has memory clocks slower than rated speeds. Maybe it has something to do with reliability? I honestly don't know, but it's a common practice.
 
Partner customizability?


Since G80, though, ATI seems to have been way more conservative than nVidia in this respect.
 
About every board out there with GDDR3 or GDDR4 has memory clocks slower than rated speeds.
This is often the case, and there are multiple reasons why. It's less so with GDDR3 chips, because their characteristics are fairly well known these days and they run at a relatively low data rate compared with what the ASIC's memory interface can handle. In the case of GDDR5, remember that we are making a one-time jump from a memory interface running in the 2.4-2.8GHz range (GDDR4) to the 3.6-4.0GHz range, and it was never guaranteed that we would get even close to the maximum data rates the chips are capable of.

In fact, though, HD 4870 could have shipped at a slightly higher memory speed, but A12 silicon had nicer engine timing characteristics than A11, so we traded a little memory speed for a higher engine clock, staying within the same thermal/power envelope while giving more performance overall.
 