Although now that Rys is at IMG/PowerVR, I might want to try and convince them that resistance is futile
No need to worry; there's not much room left to breathe with Rys in that cupboard ROFL
May I ask, what do you think could be a realistic number for running high-end 3D games (e.g. with the UE3) on the announced 1GHz dual-core Tegra2 (i.e. a real-world worst-case scenario)? The mentioned 500mW?

And how does Tegra2 compare to the OMAP4 spec-wise? As far as I can see they seem to be in the same ballpark (at least clock for clock).

500mW is a marketing number; it doesn't mean anything whatsoever. It's not a TDP per se; there is such a number (all subsystems activated at once), but nobody really cares about it since you can just down-throttle in that case, or just prevent it from happening completely. The 1080p decode logic is around 100mW IIRC, which is a very nice improvement (although expected; TSMC 40LP is better than most people seem to realize).

The 500mW doesn't mean anything: it's not a real TDP, it's not an average, and it's not a specific use case. Just forget it. It's about as useless as the "<1W" they once gave for Tegra1.

As for OMAP4: CPU-wise, Tegra2 is pretty much the same but presumably without NEON. Video-wise, it's arguably better (higher bitrates, but seemingly less support for the less common codecs, and limited to H.264 for encode). 3D-wise, it should be faster, though that depends a bit on efficiency (SGX540 has 2 TMUs; Tegra2 has 4 TMUs but at a slightly lower clock rate and without the TBDR efficiency benefits). In the Image Signal Processing department it's not as good as OMAP4, though I'm not sure how the two compare power-wise there (not that ISP power matters much in reality; the sensor itself draws a lot). Bandwidth-wise, OMAP4 supports 64-bit LPDDR2 versus 32-bit for NV, but I very much doubt the latter will ever be much of a bottleneck in practice.
I wouldn't mind encoding with x264 at 3~5 times the speed if it had NEON... Seriously, the ARM SoC environment needs more standardization if they want third-party developers to optimize their software for it. If it doesn't get it, then performance-critical applications will forever be slower on ARM than on x86.

AFAIK NV hasn't even licensed it, but I could be wrong. If it does have NEON, it would probably only be on one core (i.e. heterogeneous), and I very much doubt that. As I said in the past, I genuinely believe it's a pretty dumb piece of silicon in the current market environment, and NV believes the same AFAIK.
Libavcodec, x264 developers, etc. chose NEON in part because they can't code for dozens of different, ill-documented and exotic DSPs. Of course, right now there aren't many applications that use NEON, but the silicon must come first, and then the software follows.

I may be retarded, but who cares about Libavcodec or x264 on these platforms? You can encode at 1080p in real-time at okayish quality - I know you could do much better with a much slower encode, but what's the use case for that in smartphones or tablets?
They were thinking that encoding with x264 in real-time is required for devices to offer the video/film capabilities all portable devices are converging towards.
You're confusing H.264 and x264. Tegra2 can do 1080p30 H.264 encoding in real-time in hardware. Doing that in software on the ARM cores via x264 in real-time is completely insane and absolutely not an option, it's not going to happen for a long long time. Once again, it's very hard to find use cases for NEON that affect more than 1% of the potential userbase of these kinds of devices (and where a 1GHz A9 without NEON couldn't handle it just fine) - I'm not saying there aren't any, I'm sure there are, but it's easy to assume they are more frequent than they really are.
Of course, if ARM is to invade desktops in the long-term, then NEON would be extremely desirable. But this is not for this generation of hardware or even the next if ever, so I wouldn't get ahead of myself.
Libavcodec provides decode support for many more formats than one could dream of putting into fixed-function decoders. For example, in the Linux community many people use Theora to distribute video, since it can be shipped in Linux distros without breaking stupid USA laws, and Firefox will have it integrated as part of its HTML5 support. I've never heard of hardware Theora decoding, not even DSP-accelerated Theora decoding, but we already have some initial NEON optimizations for it. Without SIMD optimizations, software video decoders will be much slower, and become unusable.
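To make that concrete, here is a minimal sketch (plain C with ARM NEON intrinsics; the helper names are made up for illustration) of the sort of pixel-averaging inner loop a software decoder spends its time in - first scalar, then with NEON handling 16 pixels per instruction:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Illustrative sketch only. Scalar version: rounding average of two rows
     * of pixels, as used in half-pel motion compensation. One pixel per step. */
    void avg_row_c(uint8_t *dst, const uint8_t *a, const uint8_t *b, int width)
    {
        for (int i = 0; i < width; i++)
            dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
    }

    /* NEON version: vrhaddq_u8 does the same rounding average on 16 pixels
     * at once. Assumes width is a multiple of 16 to keep the sketch short. */
    void avg_row_neon(uint8_t *dst, const uint8_t *a, const uint8_t *b, int width)
    {
        for (int i = 0; i < width; i += 16) {
            uint8x16_t va = vld1q_u8(a + i);   /* load 16 pixels from each row */
            uint8x16_t vb = vld1q_u8(b + i);
            vst1q_u8(dst + i, vrhaddq_u8(va, vb));
        }
    }

Real decoders have many loops like this (interpolation, IDCT, deblocking), which is where the NEON-versus-plain-C gap comes from.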
The point, from NV's point of view, is that adding silicon for use cases that affect basically nobody doesn't make a whole lot of sense. It's much better to invest that money into exposing the acceleration hardware better, if you had to spend it at all. I mostly share that POV myself, although I still think it'd be nice to implement NEON on at least one of the two/four cores in the highest-end chips for compatibility's sake.
Software developers won't code for something that very few people's hardware supports, or that doesn't give any performance improvement for most people (only added for compatibility's sake, but too slow). Open source developers in general only start optimizing for a new instruction set when they buy new hardware with it. So the lag between when you start selling silicon with support for a new instruction set and when a large number of programs start using it is usually more than a year (see the adoption rate of new SSE instructions, new versions of DirectX, OpenCL, etc.). Till then you have an almost useless piece of silicon on your hands, but if you wait for it to be useful before you implement it, you have the chicken-and-egg problem.
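For what it's worth, one common way for a library to sidestep that problem is to pick the code path at runtime instead of requiring NEON outright. A minimal sketch (hypothetical function name, ARM Linux assumed, naive /proc/cpuinfo parsing) could look like this:

    #include <stdio.h>
    #include <string.h>

    /* Illustrative sketch only. Return 1 if the kernel reports NEON among the
     * CPU feature flags, else 0. A library can then install its NEON routines
     * or fall back to the plain C versions. */
    int cpu_has_neon(void)
    {
        char line[512];
        int found = 0;
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f)
            return 0;
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "Features", 8) == 0 && strstr(line, "neon")) {
                found = 1;
                break;
            }
        }
        fclose(f);
        return found;
    }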
I would be very surprised if there was a single video codec on the planet, except probably H.265, that you couldn't decode a VGA/D1-level stream of on a 1GHz Cortex-A9 *without* NEON. Would it be more power efficient with NEON? Yeah, obviously. But it'd be perfectly usable, and if you cared about power efficiency you'd still gain a lot from transcoding it to something else.
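To put a rough (purely illustrative) number on that: VGA at 30fps is 640 x 480 x 30 ≈ 9.2 million pixels per second, so a 1GHz core has a budget of about 10^9 / 9.2x10^6 ≈ 110 cycles per output pixel before it falls behind - which gives a sense of why a VGA/D1-class stream is plausible in plain C, even if NEON would do it in far fewer cycles and with less power.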
Also, the post-processing options in software video decoders like ffdshow or mplayer are also very nice, and dependent on SIMD optimizations to run in real-time.

I'm not familiar with those, so I can't really judge - are you sure? If so, I guess that's definitely a viable niche application.
I might simply want to encode something on the go on my Cortex-A9 based smartbook, and prefer to have a file 2~4 times smaller for the same quality than what the hardware encoder gives.

2-4x smaller than a HW encoder? This is not a super-naive encoder we're talking about most likely, so that's probably with very high quality settings on x264. And HW encoder quality will only improve with time. Do you really think it's worth it to waste 1+ hour of battery life to encode 20 minutes of video twice as efficiently? I'm skeptical - there certainly are cases where it is, but that's hardly a mainstream application.
High-end quad-core systems consume 100 times more power than a BeagleBoard while encoding, but are probably "only" around 10 times faster than an OMAP4.

Heh, that's an interesting/amusing point! I wouldn't do that on a netbook though, unless I could make it run entirely off the power plug rather than the battery, because batteries do degrade over time (i.e. decreased battery life).
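Taking those figures at face value, the energy math is simple: energy = power x time, so a machine that draws ~100x the power but finishes only ~10x sooner spends roughly 100 / 10 = 10x the energy per encode. The slow on-device encode costs about a tenth of the energy; it just spreads the job over ten times the wall-clock time.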
Well, in the case of NEON you don't have that chicken-and-egg problem, since the vast majority of companies are implementing NEON in nearly all their A9-based chips right now. But yes, I certainly agree with you in principle.
Other areas in which integer SIMD might be useful include image processing (GIMP on tablets, anyone?).

Ah yes, that's a nice application, hadn't thought of it...
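As a toy illustration of the same idea on the image-processing side (hypothetical function, not taken from GIMP or any real editor), here is a saturating brighten of a row of 8-bit pixels, 16 at a time:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Illustrative sketch only. Brighten a row of 8-bit pixels with saturation
     * (values clamp at 255). vqaddq_u8 handles 16 pixels per instruction;
     * width is assumed to be a multiple of 16 to keep the sketch short. */
    void brighten_row(uint8_t *pixels, int width, uint8_t amount)
    {
        uint8x16_t add = vdupq_n_u8(amount);   /* broadcast the offset */
        for (int i = 0; i < width; i += 16) {
            uint8x16_t px = vld1q_u8(pixels + i);
            vst1q_u8(pixels + i, vqaddq_u8(px, add));
        }
    }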
Those are some interesting points Manabu, thanks - there are some premises I don't agree with though.

Hum, ok, I understand. I do have some ~720p 60fps DivX/Xvid stuff: AMVs, older trailers, video game captures and things like that. Also, Theora is not the most optimized standard ever, so it is actually somewhat slow, and people make screen captures with it at 800x600+ resolutions. I think all of this would be difficult to handle without any SIMD or DSP support, but I'm not certain.
This is the key: a lot of applications would benefit from NEON, but not a lot of them actually need it to be fast enough. That makes people talk past each other quite frequently in my experience
As for whether those post-processing filters really need SIMD: I'm not sure. Many of them are ported from AviSynth, and many, but not all, AviSynth filters use SIMD to gain some speed - I don't know how much, actually. High-quality deinterlacers, sharpeners and denoisers seem to be the most CPU-intensive filters. I could not enable much post-processing in the days of my Celeron CPU, because almost all of it was struggling just to decode SD video. But this processing is being moved more and more into dedicated logic, and I think much of it could be sped up by the GPU if given a chance. SIMD is easier to program for, though, it seems.
I don't expect the HW encoder to be more advanced than Badaboom 1.2.1, which IIRC already does Main or High profile - and its quality is still horrible for a given bitrate.
I can take the battery out of most notebooks to avoid that wear. But I don't know how smartbooks, for example, will work... They seem like much more closed platforms - this is something I don't like about the direction we are currently heading. If I can't change the OS on my smartbook, or set up a dual-boot, I will think very hard about whether to get one, since I can do that on any netbook.
None, but if what you care about is a clean die shot, I just edited my first post
Assuming the 49mm² die size is correct (Anand isn't the best source, and it's suspiciously near 7x7, although it makes sense given the specs) then each individual Cortex-A9 core (including L1/FPU, excluding L2 I/F, PTM, NEON, etc.) takes only 1.3mm²! Let me repeat that again: 1.3mm². Including all that stuff (except NEON presumably ofc), the dual-core+L2 takes ~7.25mm² (also keep in mind the L2 is reused as a buffer for video etc.)
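Spelling that out with only the numbers above: two A9 cores at ~1.3mm² each is ~2.6mm², so the remaining ~4.65mm² of the ~7.25mm² CPU complex is the L2, L2 interface, PTM and so on - and the whole complex would be roughly 7.25 / 49 ≈ 15% of the die, if the 49mm² figure holds.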
HKEPC has a slide with an annotated die-shot: http://www.hkepc.com/4435

Intriguing. That shows the video decoder as much bigger than the encoder.
I'm more inclined to go with Arun's annotation than NVIDIA's
So basically, you'd need to do the equivalent of a brain scan - get an IR camera, set the chip to do different tasks, and see which parts get warm