View Full Version : NVidia Tegra ULP GeForce Speculation
TimothyFarrar
13-Feb-2009, 16:05
Anyone have any speculation as to NVidia Tegra ULP GeForce architecture and how this will compare in power and performance to PowerVR SGX?
Anyone have any speculation as to NVidia Tegra ULP GeForce architecture and how this will compare in power and performance to PowerVR SGX?
Ask me that on Wednesday/Thursday and hopefully I'll be able to answer a bit better than I can right now... :) I don't think it's possible to get truly objective *and* comparable power consumption data from handheld companies in practice, so for the sake of objectivity I don't think I'd ever want to comment on that though.
Here's what one of their presentation says architecture-wise:
Early-Z and fragment caching
- These are big computation and bandwidth savers
Ultra Efficient 5x Coverage Sampling Anti-Aliasing Scheme
- Mobile version of CSAA technology from GeForce
Not a tiling architecture
- Tiling works reasonably well for DX7-style content
- For DX9-style content the increased vertex and state traffic was a net loss
Not a unified architecture
- Unified hardware is a win for DX10 and compute
- For DX9-style graphics, however, non-unified is more efficient
And another page for performance:
Tegra APX can achieve:
- Over 40M triangles/sec
- Up to 600M pixels/sec
- Texture 240M pixels/sec
Run Quake 3 Arena
- 45+ fps WVGA (800 x 480)
- 8x Aniso Texture Filtering
- 5x Coverage Sampling AA
i.e. it's 2 TMUs @ 120MHz with single-cycle 5xCSAA (which is 2xMSAA with 3 extra coverage samples).
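For anyone who wants to check the numbers, here is a rough sketch of how the 240M texture figure falls out of 2 TMUs at 120MHz, and how much texturing headroom that leaves for the Quake 3 case on the slide. The function names are made up for illustration, and the per-pixel budget ignores overdraw and the extra taps that 8x aniso costs, so treat it as an upper bound rather than a measurement.

```python
def peak_texel_rate(tmus: int, clock_hz: float) -> float:
    """Peak texel rate: one texel per TMU per clock (idealised)."""
    return tmus * clock_hz


def texels_per_pixel_budget(texel_rate: float, width: int, height: int, fps: float) -> float:
    """Texel fetches available per displayed pixel per frame, ignoring overdraw."""
    return texel_rate / (width * height * fps)


# Figures taken from the thread: 2 TMUs @ 120MHz, Quake 3 at WVGA and 45 fps.
rate = peak_texel_rate(2, 120e6)
budget = texels_per_pixel_budget(rate, 800, 480, 45)

print(f"{rate / 1e6:.0f} Mtexels/s peak")
print(f"{budget:.1f} texel fetches per displayed pixel per frame")
```

Roughly 14 fetches per displayed pixel sounds generous, but overdraw, multitexturing and aniso taps all come out of that same budget.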
Ailuros
14-Feb-2009, 07:24
Is that highest end variant or will there be another one?
Anyway 5xCSAA is just fine for the screen sizes it's aimed for. The most interesting part is the 8x AF bit.
Before anyone says it, games on mobile devices hardly ever enable AA or AF in games (I think the latest q3a mobile version has an option for enabling AA though). I wonder if Tegra's TMUs are strong enough to handle AF or if it's just a bandwidth constrained scenario as Q3A on mobile devices typically is.
As for the critical comments in the first quote: I've heard better excuses in my lifetime than those LOL :D
CarstenS
14-Feb-2009, 11:55
Would 1x MSAA +4 Coverage-Samples save any significant portion of the ROPs - apart from being some kind of "lame"?
Is that highest end variant or will there be another one?
That's the APX 2500, but it looks like the SKUs for the chip are being shuffled around a little bit (type 'APX 2600' in Google, look at the cached entry, and go to 'Specifications' to see what I mean) and I have no idea what the clock speeds for all of them will turn out to be, especially not for the 3D part.
Chip-wise, this will likely be the highest-end 65nm chip before they go to 40nm. There was a lower-end chip in the pipeline according to what I heard several times, but I don't know what's happening to that. If it still exists, you'd expect it to have started sampling some time ago.
As for 1xMSAA + 4xCSAA, I don't think that makes theoretical sense given how CSAA works... :) The CSAA samples need to be able to choose between at least two colours, unless you're thinking of doing some weird semi-but-not-really-Quincunx stuff that uses the data of adjacent pixels? That'd seem very complex to me, and I hope nobody ever gets the idea of doing something so crazy...
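To illustrate the point about coverage samples needing at least two colours, here is a toy resolve, not NVIDIA's actual hardware scheme. It assumes each of the 5 sample positions simply records which stored colour slot it belongs to, and the pixel is the blend of the stored colours weighted by those references. The `resolve` function and the sample layout are hypothetical.

```python
from collections import Counter


def resolve(colors, sample_to_slot):
    """Blend stored colours, weighted by how many samples reference each slot."""
    counts = Counter(sample_to_slot)
    total = len(sample_to_slot)
    return tuple(
        sum(colors[slot][ch] * n for slot, n in counts.items()) / total
        for ch in range(3)
    )


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

# 2 stored colours, 5 sample positions: an edge can be weighted 3:2.
edge = resolve([red, blue], [0, 0, 0, 1, 1])

# 1 stored colour: every sample can only reference slot 0, so the extra
# coverage samples carry no usable information.
flat = resolve([red], [0, 0, 0, 0, 0])

print(edge, flat)
```

With a single stored colour the result is always that colour, whatever the coverage pattern says, which is why 1xMSAA plus coverage samples buys nothing in this simplified model unless neighbouring pixels are pulled in.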
Ailuros
16-Feb-2009, 10:16
Bite me:
http://v3.espacenet.com/publicationDetails/biblio?CC=EP&NR=1941447A1&KC=A1&FT=D&date=20080709&DB=EPODOC&locale=en_EP
TimothyFarrar
16-Feb-2009, 17:13
The AA and AF seems a little insane to me given the relatively ultra tiny pixels of these portable screens, unless that feature set is for those looking to run regular desktop sized displays from portable devices.
Ailuros
17-Feb-2009, 06:45
The AA and AF seems a little insane to me given the relatively ultra tiny pixels of these portable screens, unless that feature set is for those looking to run regular desktop sized displays from portable devices.
The primary moot point for current mobile phone games is that textures are usually of abysmally low quality due to memory footprint constraints. Anything over bilinear will improve things slightly, but it won't make a huge difference either.
I've played Q3A mobile on a 320*240 screen and I can't make out what the texture sizes are at in that one, yet they're definitely higher than in most other mobile games. I still noticed though the lack of AF.
With AA it gets a wee bit trickier: one might say that it's not absolutely necessary for games on a small form factor device (which is highly debatable IMO), but you will need it for stuff like advanced UIs, amongst others, and of course for things like OpenVG, where you'd need something around 16x sample AA to get the content to a decent AA'ed level. A menu with rolling icons is about enough to show you that AA can make a significant difference even on a < 3" screen.
So any idea how it will perform against PowerVR SGX?
Looking at that feature table, it looks like a stripped-down version of old-tech GeForce...
Ailuros
01-Mar-2009, 18:26
So any idea how it will perform against PowerVR SGX?
If their performance rates are close to reality, and since Arun says it's an APX 2500 and their largest core under 65nm, you could eventually compare it against SGX540.
SGX 530 in the OMAP3530/Pandora is a 110MHz 1xTMU GPU, GoForce in the 65nm Tegra is a 120MHz 2xTMU GPU. The 45nm OMAP36xx seems to have a 200MHz SGX 530, FWIW, while the OGL ES 2.0 Imageon in the Qualcomm & Freescale platforms seems to be a 133MHz 1xTMU GPU.
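A quick way to line those up is peak texel rate, i.e. TMUs times clock. The sketch below only uses the clocks and TMU counts quoted above (assuming the OMAP36xx part is still a 1xTMU SGX 530), and it says nothing about real-world efficiency, which is where the architectures differ.

```python
# (TMUs, clock in MHz) as quoted in the thread; labels are just for display.
gpus = {
    "SGX 530 (OMAP3530, 65nm)": (1, 110),
    "Tegra (65nm)": (2, 120),
    "SGX 530 (OMAP36xx, 45nm)": (1, 200),
    "Imageon (Qualcomm/Freescale)": (1, 133),
}

# Peak texel rate in Mtexels/s, assuming one texel per TMU per clock.
rates = {name: tmus * mhz for name, (tmus, mhz) in gpus.items()}

for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {rate:4d} Mtexels/s")
```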
Of course, in the lot, the SGX 530 should be the one with the highest real-world efficiency out of that fillrate since it's a TBDR, doesn't need a Z-Pass, etc. - in theory it's plausible that it might handle very small triangles a bit better too, but that's not a given.
Things get more complicated when you consider AA & AF, since you'd assume that a TBDR could also be more efficient at AA. But then again, NV has 5xCSAA (i.e. 2xMSAA + 3 coverage samples); I asked for a demo of Quake3 at MWC, and at WVGA on NV's prototype I really couldn't see much if any aliasing with 5xCSAA. The demo moved a tad too fast to have a very clear idea of the AF, although it seemed OK and didn't shimmer. Talking of AF, obviously that's one of the cases where Tegra should theoretically do best against an SGX 530. It isn't clear to me whether they'll increase the number of TMUs in the 40nm chip; if not, obviously SGX 540 is pretty much superior in every way.
As always for handheld architectures, we know basically nothing about ALU performance (besides the fact Tegra obviously isn't unified and it's scalable further downwards in both VS & PS)... So in fact talking about performance like I did above doesn't make a whole lot of sense, but you gotta do with what you have. And of course it'd be interesting if we could have a clear die size number, but there isn't a single '3D block' in the Tegra die shot; it's at least 3 blocks (maybe more), and the display pipeline might be included in one of the three. So pretty damn hard to get any comparable figure, although I did estimate it in a pretty credible way once.
BTW, semi-OT: I know we had some discussions in the past about the performance of the Imageon core in the STMicro STn8820, and well, I'm still not 100% sure but I can safely claim it doesn't matter because it will probably never sell a single unit... ST-Ericsson didn't even show it as part of their line-up/roadmap at MWC despite including the STn8815 in it. I was pretty happy about the little that ST-Ericsson did show though, FWIW. Definitely a big player in the handheld world and worth watching for...
Wishmaster
01-Mar-2009, 20:57
So in the end we have to wait for some real life testing on devices based on those h/w platforms and running the same OS. It will be the only fair way of telling how they stand against each other.
For now we only know about upcoming toshiba tg01 (snapdragon's imageon z430).
Palm Pre (SGX530) is running WebOS so comparing it against winmo will probably be impossible (lack of benchmarking tools).
And unfortunately not even 1 announced tegra based smartphone. It's not encouraging...
I think that those models that were supposed to be based on winmo7 weren't moved to winmo6.5 due to the older winCE (5 on wm6/6.1/6.5 and 6 on tegra prototypes and future wm7) version.
But I'd be happy to be proven wrong.
So in the end we have to wait for some real life testing on devices based on those h/w platforms and running the same OS. It will be the only fair way of telling how they stand against each other.
Don't hold your breath...
And unfortunately not even 1 announced tegra based smartphone. It's not encouraging...
Hmm? These are three ODM phones coming out in 2H09: http://www.engadget.com/photos/nvidias-tegra-in-the-flesh/1365027/
One slide mentioning those: http://www.engadget.com/2009/02/16/nvidias-tegra-jumps-on-the-android-bandwagon/
OEM phones are still slated for 2H09, although I guess at this point we're talking mid/late Q4 for availability for most (all?) of those. No idea if we'll see announcements at CTIA. MID/Netbooks are slated for Q3, should see announcements at Computex. WinMob7 delays probably hurt them, although they claim what hurt them the most was just the economic crisis in general reducing expenditures at their customers and project delays throughout the industry. I've heard Wolfson Micro (audio codecs & analogue etc.) basically say in a CC that the number of delays in the industry is substantially above average right now, so at least part of the problem isn't NV's fault.
Blazkowicz
02-Mar-2009, 00:34
Is that highest end variant or will there be another one?
Anyway 5xCSAA is just fine for the screen sizes it's aimed for. The most interesting part is the 8x AF bit.
Before anyone says it, games on mobile devices hardly ever enable AA or AF in games (I think the latest q3a mobile version has an option for enabling AA though). I wonder if Tegra's TMUs are strong enough to handle AF or if it's just a bandwidth constrained scenario as Q3A on mobile devices typically is.
As for the critical comments in the first quote: I've heard better excuses in my lifetime than those LOL :D
I remember the Tegra said to be based on geforce 6 architecture; filtering is cheap in transistors and fast on geforce 6/7.. though not 100% clean which annoys me for old games on my desktop PC; but that should look better on a mobile screen.
the texture rate disappoints me, 240Mtexels.. only 33% better than a voodoo2 :razz:
I'm not interested in smartphones though, obviously they would clock it higher on a netbook?
Of course, in the lot, the SGX 530 should be the one with the highest real-world efficiency out of that fillrate since it's a TBDR, doesn't need a Z-Pass, etc. - in theory it's plausible that it might handle very small triangles a bit better too, but that's not a given.
Why do you think it might have an advantage with small triangles?
Ailuros
02-Mar-2009, 08:57
I remember the Tegra said to be based on geforce 6 architecture; filtering is cheap in transistors and fast on geforce 6/7.. though not 100% clean which annoys me for old games on my desktop PC; but that should look better on a mobile screen.
Based is quite a vague term; your die area is as limited as it can be on chips for the handheld/mobile market and each IHV has to set its priorities. Filtering is anything but "cheap" for such minuscule cores, and if you look at any mobile chip's floorplan it's quite obvious how much die area 1 TMU alone can take up.
That said when you're bandwidth limited AF can come eventually for free.
the texture rate disappoints me, 240Mtexels.. only 33% better than a voodoo2 :razz:
I'm not interested in smartphones though, obviously they would clock it higher on a netbook?
They can scale frequencies as much as units for a higher end core. That said considering Ion already has a 9400 in it, I'd rather think that the next generation Ion2 would come rather from the future IGP corner than handheld chip area.
I remember the Tegra said to be based on geforce 6 architecture
Yup, as Ailuros said though, 'based' doesn't mean all that much in the handheld world.
the texture rate disappoints me, 240Mtexels.. only 33% better than a voodoo2 :razz:
But with Early-Z, and I'm not sure how much bigger than a Voodoo2 the die size actually is. Let's take a random number and say it's 8mm² (I estimated it once, but that's not even the right number because I forgot about it; at least this way it also applies to SGX) - that's on 65nm, and on 350nm that'd become 256mm². Voodoo2 was implemented as 3 chips on 350nm, each with a 64-bit memory bus IIRC. Surely that couldn't be less than 128mm², and probably more. Given the difference in programmability, I don't think it's all that surprising sadly.
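The 65nm to 350nm conversion is easy to sanity-check. The sketch below tries two readings, both assumptions on my part: ideal scaling with the square of the feature size, and "five full-node steps at 2x area each" (65, 90, 130, 180, 250, 350), which is one way of landing on exactly 256. The 8mm² input is the placeholder number from the post, not a measured die area, and the Voodoo2 texel rate assumes the commonly quoted 2 TMUs at 90MHz.

```python
def scale_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Ideal scaling: area grows with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


def scale_area_by_nodes(area_mm2: float, full_node_steps: int) -> float:
    """Rule of thumb: each full node step doubles (or halves) the area."""
    return area_mm2 * 2 ** full_node_steps


ideal = scale_area(8.0, 65, 350)         # a bit under 232 mm^2
by_nodes = scale_area_by_nodes(8.0, 5)   # 256 mm^2

# Texel-rate comparison from the quoted post (Voodoo2 figures are an assumption).
tegra_texels = 2 * 120e6
voodoo2_texels = 2 * 90e6
advantage = tegra_texels / voodoo2_texels - 1

print(f"ideal: {ideal:.0f} mm^2, by node steps: {by_nodes:.0f} mm^2")
print(f"texel rate advantage over Voodoo2: {advantage:.0%}")
```

Either way the ballpark is the same: a few mm² at 65nm corresponds to a couple of hundred mm² at 350nm, so the comparison with a three-chip Voodoo2 holds up as an order-of-magnitude argument.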
I'm not interested in smartphones though, obviously they would clock it higher on a netbook?
In ARM-based netbooks? Theoretically they could. I'm not sure it really matters though; 240MPixels/s is enough for a pretty 3D user interface even at 1280x1024, but overclocking it by 20% isn't magically going to give you enough performance to do anything useful gaming-wise at that resolution.
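To put a number on that, here is a back-of-the-envelope sketch of how many full-screen layers 240MPixels/s buys at 1280x1024. The 60 fps target is my assumption, not something stated in the post, and the function name is only for illustration.

```python
def full_screen_layers(fill_rate: float, width: int, height: int, fps: float) -> float:
    """How many times per frame the whole screen can be touched at a given fill rate."""
    return fill_rate / (width * height * fps)


base = full_screen_layers(240e6, 1280, 1024, 60)
overclocked = full_screen_layers(240e6 * 1.2, 1280, 1024, 60)

print(f"stock: {base:.2f} layers per frame")         # about 3.05
print(f"+20% clock: {overclocked:.2f} layers per frame")  # about 3.66
```

About three layers per frame is fine for a composited UI, and going from roughly 3.1 to 3.7 doesn't change what kind of content fits.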
Why do you think it might have an advantage with small triangles?
Well, TBH I was only thinking about pixel shading intensive cases (where you're probably going to be the most GPU-limited anyway) because of the shader's MIMD nature, but I couldn't get any info about whether they do anything interesting there or what, so I have no idea if they do anything interesting there. They keep most of their MIMD marketing centered around branching, obviously.
Wishmaster
02-Mar-2009, 12:07
Hmm? These are three ODM phones coming out in 2H09: http://www.engadget.com/photos/nvidias-tegra-in-the-flesh/1365027/
One slide mentioning those: http://www.engadget.com/2009/02/16/nvidias-tegra-jumps-on-the-android-bandwagon/
But none of them seems to be running WM. Instead they will probably use android and that means that we won't be able to test them thoroughly due to lack of benchmarking tools.
But none of them seems to be running WM. Instead they will probably use android and that means that we won't be able to test them thoroughly due to lack of benchmarking tools.
I think the Compal is probably WM, but I'm not sure. Ironically, I know which basebands are in them but I don't know the OS - oops? :D Either way you'll have WM/WinCE devices coming out in 2H09...
Well, TBH I was only thinking about pixel shading intensive cases (where you're probably going to be the most GPU-limited anyway) because of the shader's MIMD nature, but I couldn't get any info about whether they do anything interesting there or what, so I have no idea if they do anything interesting there. They keep most of their MIMD marketing centered around branching, obviously.
I'm not sure a MIMD advantage becomes any greater with small triangles. I guess it depends on the shader. A chip like the one Qualcomm bought from AMD probably branches on a quad granularity anyway so the point is probably moot. Though I don't know for sure what the branch width is.
If someone packs pixels from different quads they're likely to have an advantage.
TimothyFarrar
03-Mar-2009, 18:24
As for small triangles, I wonder what SGX is doing different than the MBX to improve peak tri rates per MHz?
I thought that the MBX was a 32x32 pixel tile size with a parallel 32x1 pixel tile line of parallel triangle equation and depth test per clock. Also I thought the peak rates on some white paper were in the order of 7 Mtri/sec at 200 MHz, which is about one triangle every 32 clocks. So perhaps every triangle regardless of size runs through the full 32x32 pixels per tile for triangle equation and depth test.
Isn't SGX supposed to be on the order of up to 16 or so times the peak triangle throughput of the MBX per MHz? So do they now process one triangle per clock for triangle equ/visibility, and likely a block of pixels rather than a tile scanline?
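The arithmetic behind that guess can be sketched as follows. It only reproduces the numbers in the post (7 Mtri/sec at 200 MHz, a 32x32 tile walked 32 pixels per clock, a 16x per-MHz gain for SGX); it doesn't confirm how MBX or SGX actually work, and the function names are hypothetical.

```python
def clocks_per_triangle(clock_hz: float, tris_per_sec: float) -> float:
    """Average clocks spent per triangle at the quoted peak rate."""
    return clock_hz / tris_per_sec


def clocks_per_tile(tile_w: int, tile_h: int, pixels_per_clock: int) -> float:
    """Clocks to walk a whole tile at a fixed number of pixels per clock."""
    return (tile_w * tile_h) / pixels_per_clock


mbx = clocks_per_triangle(200e6, 7e6)   # about 28.6 clocks per triangle
tile = clocks_per_tile(32, 32, 32)      # 32 clocks for a 32x32 tile, one line per clock
sgx = mbx / 16                          # about 1.8 clocks if SGX is 16x per MHz

print(f"MBX: {mbx:.1f} clocks/tri, full-tile walk: {tile:.0f} clocks, SGX at 16x: {sgx:.1f} clocks/tri")
```

So the quoted peak rate is close to, but not exactly, one triangle per full 32-line tile walk, and a 16x gain would put SGX near two clocks per triangle rather than one.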
Simon F
04-Mar-2009, 08:52
So perhaps every triangle regardless of size runs through the full 32x32 pixels per tile for triangle equation and depth test.
a) MBX doesn't do that and b) I'm not sure the tile size is that large on MBX.
Ailuros
04-Mar-2009, 10:30
Also I thought the peak rates on some white paper were in the order of 7 Mtri/sec at 200 MHz,
Yep. The problem is that they don't mention die sizes anymore for MBX.
I'd guess that MBX might work with 32*16 tiles.
Not 32x8?
I figured Series5XT was where the size was bumped to 32x16.
I thought the 543 was 32x32 and everything prior to that was 32x16
Ailuros
17-Oct-2009, 16:34
Not 32x8?
I figured Series5XT was where the size was bumped to 32x16.
Well it's definitely smaller than 32*32. Also are you sure you'd want fixed (macro or micro) tile sizes for something like Series5XT?
dynamic load balancing and on-demand task allocation at the pipeline level
no fixed allocation of given pixels to specific cores, enabling maximum processing power to be allocated to the areas of highest on-screen action
rpg.314
17-Oct-2009, 17:01
Does anyone here know anything about the GPU core in Tegra2? Any hints, info, rumours, speculation ....
Ailuros
18-Oct-2009, 07:16
Does anyone here know anything about the GPU core in Tegra2? Any hints, info, rumours, speculation ....
Well it's definitely neither a GeForce 6 nor GeForce 9 equivalent as BSN is mentioning.