Nvidia mobile Kepler more powerful than PS3 - new era of mobile games!

http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler

Nvidia's Tegra 5, or Logan, will have one SMX of Kepler cores, i.e. 192 cores, which according to Nvidia will be more powerful than any current-gen console and also the 8800 GTX.

 
I doubt this will stay here as on topic, but yeah that was very interesting!

They quote ~400 GFLOPS at 1 GHz, but Anand speculates that clocks in shipping products would be much lower (500 MHz, 200 GFLOPS).

It's a straight Kepler SIMD, 192 CUDA cores.
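
As a rough sanity check on those figures: peak single-precision throughput is usually counted as cores x 2 FLOPs per clock (one fused multiply-add) x clock speed. The sketch below assumes that convention. The function name `peak_gflops` is only for illustration, and the 1 GHz and 500 MHz clocks are the quoted and speculated numbers from this thread, not confirmed product specs.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int = 2) -> float:
    """Peak GFLOPS = cores * FLOPs per core per clock * clock in GHz.

    The default of 2 FLOPs per clock assumes one fused multiply-add
    per core per cycle (an assumption, not a figure from the thread).
    """
    return cores * flops_per_core_per_clock * clock_ghz


# One Kepler SMX (192 CUDA cores) at the quoted 1 GHz.
print(peak_gflops(192, 1.0))  # 384.0, in line with the quoted ~400 GFLOPS

# The same SMX at the speculated 500 MHz product clock.
print(peak_gflops(192, 0.5))  # 192.0, in line with the ~200 GFLOPS estimate
```

So the "~400" and "200" numbers look like rounded versions of 384 and 192.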

I'm trying to figure out how this will compare to next gen. I think the issue is that it will likely fall flat everywhere except FP, bandwidth for example. In pure FP at that 1 GHz clock it is about 1/3 as good as the rumored Xbone.

That jungle demo is very impressive, looks a bit like Crysis.

 
That's fairly impressive. One more year until I have a mobile refresh, hopefully something like this will be available then. Maybe PowerVR Series 6.

I do wonder how long it'll take for mobile SoCs to catch up with the PS4/Xbox One.
 
Mobile devices won't have a chance until they have comparable bandwidth. PS4 has ~176 GB/s iirc. However, that could be sooner than you think depending on your perspective.

Samsung is a member of the Hybrid Memory Cube consortium, and I would not be surprised to see them try to get it into their premier phone/tablet ASAP. I would guess possibly in 2016, maybe, if we are lucky, by 2015.

Even the lowest end HMC could provide ~160 GB/sec of bandwidth. That would be "competitive" with the newest consoles.

But the manufacturing process (and power consumption) for the SoCs will probably continue to be an issue. Still, you could easily see something "close" to XB One in 2016.
 
Their demo video looks inferior to 2004's Far Cry. I'd guess this is actually well below 360 and PS3 except in some raw numbers that are impractical.
 
I disagree. The lighting, shadows and water effects are much better than the original Far Cry. The HDR looks kinda off though. They also demoed UE4 running on Logan at SIGGRAPH.
 
I suggested several months ago that Mobile Kepler (i.e. Kepler.M for Project Logan) could have 192 CUDA "cores" operating at up to ~1 GHz. The surprising part is that Kepler.M reportedly has 3x better perf/watt than the iPad 4 GPU, even though it is reportedly fabricated on a 28nm process rather than a 20nm one.
 
I suggested several months ago that Mobile Kepler (i.e. Kepler.M for Project Logan) could have 192 CUDA "cores" operating at up to ~1 GHz. The surprising part is that Kepler.M reportedly has 3x better perf/watt than the iPad 4 GPU, even though it is reportedly fabricated on a 28nm process rather than a 20nm one.

It is? Where?
 
I disagree. The lighting, shadows and water effects are much better than the original Far Cry. The HDR looks kinda off though. They also demoed UE4 running on Logan at SIGGRAPH.
Rendering techniques have improved considerably and so have feature sets. But that jungle island is rather sparse don't you think? This will probably be a giant leap for ultra low power GPUs but compared to 8800GTX? 8800GTX has 86 GB/s memory bandwidth, 14 Gpix/sec, 37 Gtex/sec. 128 1.35 GHz G80 ALUs vs. 192 ~500MHz Kepler ALUs is an interesting question too.

The other question is power consumption. The low end 384 ALU Kepler notebook parts are ~30W. How will they get 192 ALUs to work in Tegra 5 at ~2W or whatever?
 
Rendering techniques have improved considerably and so have feature sets. But that jungle island is rather sparse don't you think? This will probably be a giant leap for ultra low power GPUs but compared to 8800GTX? 8800GTX has 86 GB/s memory bandwidth, 14 Gpix/sec, 37 Gtex/sec. 128 1.35 GHz G80 ALUs vs. 192 ~500MHz Kepler ALUs is an interesting question too.

I think it is achievable with improvements over the years in process technology and power efficiency. This is from the Kepler whitepaper:

Similar to GK104 SMX units, the cores within the new GK110 SMX units use the primary GPU clock rather than the 2x shader clock. Recall the 2x shader clock was introduced in the G80 Tesla-architecture GPU and used in all subsequent Tesla and Fermi-architecture GPUs. Running execution units at a higher clock rate allows a chip to achieve a given target throughput with fewer copies of the execution units, which is essentially an area optimization, but the clocking logic for the faster cores is more power-hungry. For Kepler, our priority was performance per watt. While we made many optimizations that benefitted both area and power, we chose to optimize for power even at the expense of some added area cost, with a larger number of processing cores running at the lower, less power-hungry GPU clock.

The other question is power consumption. The low end 384 ALU Kepler notebook parts are ~30W. How will they get 192 ALUs to work in Tegra 5 at ~2W or whatever?

One of the notable papers presented by Nvidia this year at the 2013 IEEE International Solid-State Circuits Conference (ISSCC) was a 20 Gbit/s serial die-to-die link made in 28-nm CMOS. The link runs on a 0.9 V supply and has a power efficiency of 0.54 pJ/b. This is probably the 'new low-power interconnect' they have been talking about.
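
For a sense of scale, the link's power draw follows from bit rate times energy per bit. A minimal sketch using only the two figures quoted above (the function name `link_power_watts` is hypothetical, and this counts a single link at full rate, nothing else on the rail):

```python
def link_power_watts(bit_rate_bps: float, energy_per_bit_joules: float) -> float:
    """Power (W) = bits per second * joules per bit."""
    return bit_rate_bps * energy_per_bit_joules


# 20 Gbit/s at 0.54 pJ/b, the figures from the ISSCC paper as quoted above.
power = link_power_watts(20e9, 0.54e-12)
print(f"{power * 1e3:.1f} mW")  # 10.8 mW
```

That is on the order of 10 mW per link, which is why it reads as a low-power interconnect.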
 
IIRC, this is GLBenchmark 2.7, where Logan achieved ~18 fps while the iPad 4 was ~17 fps.

The other crazy thing is that Kepler.M "Logan" reportedly has ~5x higher performance (frames per second) than the iPad 4 GPU (presumably while still being usable in a thin fanless tablet).

So Tegra 4 "Wayne" comes to market much later than expected, but Tegra 5 "Logan" will probably come to market much earlier than expected.
 
I'm suspicious about the power consumption, especially when it's pushing that 5x iPad 4 performance level.
 
Performance/Watt can be misleading when performance/mm2 is substantially different (i.e. if it's a much larger chip clocked much lower). Performance/watt/mm2 against PowerVR6 will be the big battle (assuming IMG and its partners manage to get it out in time to compete).

Still, mobile devices which can run relatively straight Xbox 360/PS3 ports will be interesting.
 
I'm suspicious about the power consumption, especially when it's pushing that 5x iPad 4 performance level.
nVidia is saying they can reach iPad 4 performance levels at 1/3rd the power and that Logan's peak theoretical performance is 5x the iPad 4's. Assuming linear performance/watt scaling, if Logan is going full tilt at 5x the iPad 4's performance, its power consumption will be about 67% higher than the iPad 4's GPU (5 x 1/3 = 5/3 of its power). A less-than-linear performance/watt scaling would make it worse, of course. The iPad 4 is considered big, thick, and heavy, so the direction of future tablets will be smaller. In such a case, is Logan's peak performance and accompanying power consumption and thermal load going to be achievable and sustainable within the confines of a reasonably thin tablet with reasonable power consumption?

nVidia does claim that the Ira demo was done at less than 3W. If this is Logan operating at less than full tilt, the graphics that are achievable at this performance and power level are still impressive and it bodes well for the graphics that can be achieved at peak performance in a thicker tablet like a hybrid/convertible or a Shield 2. Whether they'll lead performance/watt we'll have to wait for Rogue and Adreno 4xx.

Anand does point out that they're not sure what else might be hanging off the GPU power rail on the iPad 4 so it's best not to focus too much on the specific iPad 4 power number. Logan's 900 mW figure at iPad 4 performance levels seems like the only solid figure.
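
The linear-scaling estimate earlier in this post can be sketched as below. This assumes power scales linearly with performance, which is the optimistic case. The function name `relative_power` is made up for illustration, and the 5x extrapolation of the 900 mW figure is just what that assumption implies, not a number from nVidia or Anand.

```python
def relative_power(perf_ratio: float, power_at_parity: float = 1 / 3) -> float:
    """Power relative to the iPad 4 GPU, assuming linear perf/watt scaling.

    power_at_parity is Logan's power at iPad 4 performance, as a fraction
    of the iPad 4 GPU's power (nVidia's claimed 1/3).
    """
    return perf_ratio * power_at_parity


# Logan at its claimed 5x peak, relative to the iPad 4 GPU.
full_tilt = relative_power(5)
print(f"{(full_tilt - 1) * 100:.0f}% higher than the iPad 4 GPU")  # 67% higher

# Same assumption applied to the 900 mW figure at iPad 4 performance.
logan_parity_watts = 0.9
print(f"{logan_parity_watts * 5:.1f} W at 5x, if scaling were linear")  # 4.5 W
```

Any voltage increase needed to reach the peak clock would push the real number above this.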


Performance/Watt can be misleading when performance/mm2 is substantially different (i.e. if it's a much larger chip clocked much lower). Performance/watt/mm2 against PowerVR6 will be the big battle (assuming IMG and its partners manage to get it out in time to compete).
Imagination seems to be addressing both options with small area G6x00 designs and large area G6x30 designs. It'll be interesting to see which option is more popular.
 
...

The other question is power consumption. The low end 384 ALU Kepler notebook parts are ~30W. How will they get 192 ALUs to work in Tegra 5 at ~2W or whatever?

Perf/W doesn't scale linearly, due to the non-linear relationship between Vdd and power. 1 GHz probably requires 1.1V or higher on 28nm, while 400~500 MHz needs only 0.8V or lower. Simple voltage and frequency scaling gives about a 5x power difference. There are other differences that could push it to the 7.5x per-SMX gap (2 SMX @ 30W vs 1 SMX @ 2W), such as the removal of FP64, slightly different processes from TSMC (although both are 28nm), the notebook's board design, etc.
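
A quick sketch of that scaling argument, assuming the textbook dynamic power relation P ~ f * V^2 and ignoring leakage (both assumptions on my part; the function name is only for illustration). The voltages and clocks are the guesses from this post.

```python
def dynamic_power_ratio(f_hi_mhz: float, v_hi: float, f_lo_mhz: float, v_lo: float) -> float:
    """Ratio of dynamic power between two operating points, using P ~ f * V^2."""
    return (f_hi_mhz / f_lo_mhz) * (v_hi / v_lo) ** 2


# 1 GHz @ 1.1 V versus 400 MHz and 500 MHz @ 0.8 V.
for f_lo in (400, 500):
    ratio = dynamic_power_ratio(1000, 1.1, f_lo, 0.8)
    print(f"{f_lo} MHz: {ratio:.2f}x")  # 4.73x and 3.78x

# Per-SMX gap between the ~30 W, 2-SMX notebook part and ~2 W for 1 SMX.
notebook_watts_per_smx = 30 / 2
tegra_watts_per_smx = 2 / 1
print(notebook_watts_per_smx / tegra_watts_per_smx)  # 7.5
```

With 0.8V this gives roughly 4x to 5x; a voltage below 0.8V moves it closer to 5x, and the remaining gap to 7.5x would have to come from the other factors listed.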
 
I think it is achievable with improvements over the years in process technology and power efficiency. This is from the Kepler whitepaper:

One of the notable papers presented by Nvidia this year at the 2013 IEEE International Solid-State Circuits Conference (ISSCC) was a 20 Gbit/s serial die-to-die link made in 28-nm CMOS. The link runs on a 0.9 V supply and has a power efficiency of 0.54 pJ/b. This is probably the 'new low-power interconnect' they have been talking about.

That's a die-to-die (two chips) link.

Logan is one chip.
 