Intel G965 to support SM4.0?

asicnewbie said:
Nice catch. Somewhere in there (or in the 965 chipset family datasheet), Intel quotes
"400MHz GPU clock, 1.6GPixel/sec fillrate". (Elsewhere, they state 667MHz GPU clock.) I'm not up to date with 3D-architectures and terminology...Does the above statement imply 4 "pipelines"?

Since it's a USC, I'd rather say it has 4 TMUs with an as-yet-unknown number of multi-purpose ALUs.
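
As a quick sanity check on the quoted numbers: dividing fillrate by core clock gives pixels written per clock. A minimal sketch using only the figures quoted above. The function name is made up for illustration, and "pixels per clock" is treated here as a loose stand-in for what the question calls "pipelines".

```python
def pixels_per_clock(fillrate_pixels_per_s, clock_hz):
    """Pixels written per GPU clock, given a fillrate and a core clock."""
    return fillrate_pixels_per_s / clock_hz

# Figures quoted in the thread: 1.6 GPixel/sec, and two different clock claims.
FILLRATE = 1.6e9

for clock_mhz in (400, 667):
    ppc = pixels_per_clock(FILLRATE, clock_mhz * 1e6)
    print(f"{clock_mhz} MHz -> {ppc:.2f} pixels per clock")
```

Only the 400MHz figure divides into a whole number of pixels per clock, so the fillrate quote seems to go with that clock rather than the 667MHz one. It says nothing about how many ALUs sit behind those pixels.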
 
Ailuros said:
Since it's a USC, I'd rather say it has 4 TMUs with an as-yet-unknown number of multi-purpose ALUs.
Both their fancy-pants flash demo and the whitepaper show 8 "Programmable Execution Units", but I guess that could be arbitrary and purely for visual presentation purposes.
 
Zaphod said:
Both their fancy-pants flash demo and the whitepaper show 8 "Programmable Execution Units", but I guess that could be arbitrary and purely for visual presentation purposes.

8 ALUs don't sound that unlikely if we'd assume 4 TMUs as a given. It doesn't of course follow the 3:1 ALU<->TMU trend as seen in Xenos for instance, but then again for an IGP as the G965 it's good enough if it can render Aero Glass and handle multimedia functionalities.
 
Ailuros said:
8 ALUs don't sound that unlikely if we'd assume 4 TMUs as a given. It doesn't of course follow the 3:1 ALU<->TMU trend as seen in Xenos for instance
Also consider that they'll do double duty for vertex processing as well, and that the number of units doesn't say much about performance anyway (how much X, Y, or Z can they do per clock?). If Intel can achieve performance anywhere near 9600<=>X1300<=>7300 with this thing (while being 'full featured'), I'll chalk that up as a job well done.
 
Zaphod said:
Also consider that they'll do double duty for vertex processing as well, and that the number of units doesn't say much about performance anyway (how much X, Y, or Z can they do per clock?). If Intel can achieve performance anywhere near 9600<=>X1300<=>7300 with this thing (while being 'full featured'), I'll chalk that up as a job well done.

I think you're very, and I mean VERY, optimistic there.
 
Ailuros said:
8 ALUs don't sound that unlikely if we'd assume 4 TMUs as a given. It doesn't of course follow the 3:1 ALU<->TMU trend as seen in Xenos for instance, but then again for an IGP as the G965 it's good enough if it can render Aero Glass and handle multimedia functionalities.

GMA900 and 950 already have 4 TMUs and 8 FMACs (with poor latency hiding killing performance in many cases).
 
Zaphod said:
They posted a whitepaper too: ftp://download.intel.com/design/chipsets/applnots/31334301.pdf

If anyone spots any benchmarks, please share.

Disappointingly, there is no mention of h.264 (or MPEG-4 AVC) decode in that whitepaper. IIRC the information we previously had indicated that the G965 would provide hardware assistance for h.264 decoding, but it appears that this may not be the case. If so, it's a bad omission IMO.

I second the call for benchmarks. It'll be interesting to see just how much 'oomph' this new IGP has!
 
Tridam said:
GMA900 and 950 already have 4 TMUs and 8 FMACs (with poor latency hiding killing performance in many cases).

hmm. i'd say that claiming that GMA's TMUs have 'poor latency hiding' is like saying that whales have certain difficulties at atmospheric gliding. the fmacs perform reasonably, though, IMO.
 
darkblu said:
hmm. i'd say that claiming that GMA's TMUs have 'poor latency hiding' is like saying that whales have certain difficulties at atmospheric gliding. the fmacs perform reasonably, though, IMO.

Well, when the CPU is not eating all the available memory bandwidth, the TMUs' performance is OK with standard texture accesses. But as soon as dependent texture fetches are involved it's a completely different story, and that's definitely something Intel can improve by tweaking the architecture. I expect them to have worked on that with the GMA X3000.
 
Tridam said:
Well, when the CPU is not eating all the available memory bandwidth, the TMUs' performance is OK with standard texture accesses.

have you been able to hit the part's theoretical fillrate with the simplest possible texture accessing? because i've not been able to, no matter how simple a synthetic test i've tried to come up with; fillrate halves as soon as i try to sample from a texture, even if the CPU is twiddling its thumbs at that moment.
 
darkblu said:
have you been able to hit the part's theoretical fillrate with the simplest possible texture accessing? because i've not been able to, no matter how simple a synthetic test i've tried to come up with; fillrate halves as soon as i try to sample from a texture, even if the CPU is twiddling its thumbs at that moment.

Yes, I've been able to do that with my own tools, at least to get results showing that GMA can do 4 texture samples per cycle. I don't remember the details though. The last time I ran a benchmark on a GMA core was more than a year ago.
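
For anyone repeating this kind of test, here is a rough sketch of how a synthetic fill test result can be turned into texture samples per clock and compared against the TMU count. The function names, the 400 MHz clock, and the resolution, overdraw, and timing figures are all placeholders made up for illustration, not measurements from any GMA part.

```python
def samples_per_clock(width, height, overdraw, frames, elapsed_s, clock_hz):
    """Average texture samples per GPU clock over a whole fill test run.

    Assumes one texture sample per pixel drawn (single-texture test).
    """
    total_samples = width * height * overdraw * frames
    return total_samples / elapsed_s / clock_hz

def efficiency(measured_per_clock, tmus):
    """Fraction of the theoretical rate reached, assuming 1 sample/TMU/clock."""
    return measured_per_clock / tmus

# Hypothetical run: 1000x1000 target, 16 layers of overdraw, 100 frames.
CLOCK_HZ = 400e6   # assumed core clock, placeholder only
TMUS = 4           # TMU count discussed in the thread

for elapsed_s in (1.0, 2.0):  # made-up timings
    spc = samples_per_clock(1000, 1000, 16, 100, elapsed_s, CLOCK_HZ)
    print(f"{elapsed_s:.1f} s -> {spc:.2f} samples/clock, "
          f"{efficiency(spc, TMUS):.0%} of theoretical")
```

With these made-up numbers, doubling the elapsed time is what a halved fillrate would look like: the same work lands at half the samples per clock.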
 
Tridam said:
Yes, I've been able to do that with my own tools, at least to get results showing that GMA can do 4 texture samples per cycle. I don't remember the details though. The last time I ran a benchmark on a GMA core was more than a year ago.

i'll be damned! your post above made me run my synthetic test under winxp and... tadaa, full fill rate with one (miniature) texture! :oops:

it seems the root of my problem is in GMA's DRI drivers from the mesa CVS circa a couple of months ago - those do halve the chip's fillrate under the same test conditions. time for an update from CVS head followed by the eventual disturbing of some kind fellows..

thanks for the heads-up, Tridam!
 
darkblu said:
it seems the root of my problem is in GMA's DRI drivers from the mesa CVS circa a couple of months ago - those do halve the chip's fillrate under the same test conditions. time for an update from CVS head followed by the eventual disturbing of some kind fellows..
It's no big secret that the intel gma (i915) dri driver could be faster. For instance, it doesn't support zone based rendering, which alone is probably quite a performance killer. Feel free to improve it :) , though you can only get the graphics core docs under NDA. It might be possible to implement it without docs (the bandwidth-saving features of radeons (hyperz) have been implemented without docs too, for instance).
 
On July 27, when Intel announced its Core 2 Duo CPU lineup based on the Intel Core microarchitecture, the chip giant had not yet started shipments of its G965 chipset, which may later appear as its first chipset with a DirectX 10 integrated graphics processor (IGP), according to unspecified motherboard makers in Taiwan. The makers expected Intel's C-1 stepping of the G965 to be ready for volume shipments on the same day as the launch of Conroe processors. However, a defect in the integrated graphics subsystem was found, and Intel notified its customers that the next stepping, called C-2, which fixes the flaw, will likely not be shipped until the middle of August, the makers indicated.
Source: Digitimes.
 
http://www.intel.com/products/chipsets/G965/index.htm
Intel® Graphics Media Accelerator 3000
3D enhancements enable greater game compatibility with support for Hardware T&L, and improved realism with support for Microsoft DirectX* 9.0c Shader Model 3.0, OpenGL* 1.5, and floating point operations. Intel graphics technology also supports the highest levels of the Microsoft Vista* Aero experience.

PS. Oops, just saw a link to the technical paper...
 
geo said:
Arrgh. I hate hearing new graphics cores suck.
I was hoping for them to surpass Xpress 1150 performance, but it seems they probably won't. Though, considering they're delaying availability for a respin, I'd defer judgement until information on the new revision (retail product) is available.
 