Broadcom GPU in Nokia N8

codedivine

Recently, Forum Nokia held a good webinar on optimizing code for this GPU. From it I gathered the following information, which most of you probably already know:

1. Unlike other handheld GPUs, the Broadcom GPU does not share memory with the CPU. It has 32MB of dedicated graphics RAM, of which about 20MB is usable by applications, so there is a lot of emphasis on using memory sparingly.

2. The driver actually runs on the GPU. The OpenGL ES calls on the CPU are nothing more than a shell making RPC calls to the actual driver running on the GPU. Even shader compilation happens on the GPU.

3. Sending or receiving data or commands between the CPU and GPU can be a bottleneck, so reduce communication as much as possible.
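Point 3 follows from point 2: if every GL ES call is an RPC to a driver on the GPU, each message pays a fixed round-trip cost on top of its payload. The toy model below is a sketch under loudly stated assumptions: the latency and per-kilobyte constants and the helper names are invented for illustration, not measured on the N8.

```python
import math

# Hypothetical constants for illustration only; NOT measured N8 figures.
LATENCY_US = 50.0   # assumed fixed cost of one CPU->GPU RPC message (microseconds)
US_PER_KB = 2.0     # assumed transfer cost per kilobyte of payload (microseconds)


def rpc_cost_us(messages: int, total_kb: float) -> float:
    """Time spent on `messages` RPC round trips carrying `total_kb` of payload."""
    return messages * LATENCY_US + total_kb * US_PER_KB


def unbatched_cost(draws: int, kb_per_draw: float) -> float:
    """One RPC message per draw call: the fixed latency is paid `draws` times."""
    return rpc_cost_us(draws, draws * kb_per_draw)


def batched_cost(draws: int, kb_per_draw: float, batch_size: int) -> float:
    """Draw commands packed `batch_size` to a message; same total payload."""
    messages = math.ceil(draws / batch_size)
    return rpc_cost_us(messages, draws * kb_per_draw)


if __name__ == "__main__":
    print(f"unbatched:    {unbatched_cost(200, 0.5):.0f} us")
    print(f"batched (50): {batched_cost(200, 0.5, 50):.0f} us")
```

The payload term is the same in both cases; batching only amortises the fixed per-message cost, which is why it matters most for many small calls (state changes, small uniform updates) rather than for large uploads.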

The webinar can be found here http://forumnokia.emea.acrobat.com/p80978161/ or here http://forumnokia.emea.acrobat.com/p12028868/
 
I was just about to start a thread about the Broadcom BCM2727.

Like you said (and it's in Broadcom's own spec sheet), it has 32MB of embedded memory, which makes it a bandwidth monster compared to all the other devices (Tegra 2, SGX540 and Adreno 205).
Looking at GLBenchmark 1.1 scores, it has 6x the swapbuffer speed of all the other top end devices. GLBenchmark 2.0 shows only a 2x performance advantage in swapbuffer, which could be because it's using more than 20MB and the GPU needs to resort to the main memory.

Plus, 20MB should be more than enough for that 640*360 resolution plus compressed textures.
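A quick back-of-the-envelope check of that claim, as a sketch with several assumptions not stated in the thread: 20MB is treated as 20 MiB, the app uses double-buffered 32-bit colour plus a 32-bit depth/stencil buffer, and textures are ETC1-compressed (4 bits per pixel, 8 bytes per 4x4 block) with full mipmap chains.

```python
MIB = 1024 * 1024


def framebuffer_bytes(width: int, height: int, color_buffers: int = 2,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Assumed layout: `color_buffers` colour buffers plus one depth/stencil buffer."""
    return width * height * (color_buffers * color_bytes + depth_bytes)


def etc1_bytes(width: int, height: int, mipmaps: bool = True) -> int:
    """ETC1 size: 8 bytes per 4x4 block, each mip level padded to whole blocks."""
    total = 0
    w, h = width, height
    while True:
        total += ((w + 3) // 4) * ((h + 3) // 4) * 8
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


if __name__ == "__main__":
    budget = 20 * MIB                      # assumption: "20MB" means 20 MiB
    fb = framebuffer_bytes(640, 360)
    tex = etc1_bytes(1024, 1024)
    print(f"framebuffers:                {fb / MIB:.2f} MiB")
    print(f"1024x1024 ETC1 with mipmaps: {tex / MIB:.2f} MiB")
    print(f"such textures in the rest:   {(budget - fb) // tex}")
```

On these assumptions the framebuffers take under 3 MiB and roughly two dozen 1024x1024 compressed textures fit in the remainder, so the screen itself is cheap; the open question is how many resources an app needs resident at once.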


Moreover, the "multimedia processor" also has two dedicated vector processors which, contrary to previous statements in this forum, are available to programmers and have even been used to accelerate Flash content (both Broadcom and Adobe made a big fuss about it back in 2009).
I wonder if these vector processors can be used in game development for physics or sound processing to offload the comparatively weak CPU.


Given that every Symbian^3 smartphone uses this same hardware (680MHz ARM11 + BCM2727), with some marketing effort and developer support we could see an N-Gage 3.0 gaming platform, one that could finally be successful. One thing it would certainly have is a huge user base.


The entire architecture (non-unified memory, separate CPU and GPU with kinda slow bus between them) seems a little bit like a step backwards, though. A 2-chip solution sounds a bit desktop-like. I wonder how energy-efficient this actually is.
 
Is the Broadcom GPU a scanline renderer?

Between the modern fixed-function processor of Pica, a TBDR (tile-based deferred renderer), and a few other interesting IMRs (immediate-mode renderers) with tiled memory access, the landscape of mobile graphics architectures is a diverse one.
 

Swapbuffer speed has nothing to do with graphics performance; it's just the rate at which you can swap between the front and back buffers. In the N8's case this is not vsync-locked, whereas on the other platforms it is (typically at a vertical refresh of 56 or 60Hz).

You'll find that the likes of id and Epic don't agree that 20MB is anywhere near enough memory!

John.
 
Like you said (and it's in Broadcom's own spec sheet), it has 32MB of embedded memory, which makes it a bandwidth monster compared to all the other devices (Tegra 2, SGX540 and Adreno 205).

It just says 'stacked SDRAM', which also accurately describes just about every other top-end SoC. If it were on-die eDRAM, that would be interesting, but there is no indication that this is the case.

Looking at GLBenchmark 1.1 scores, it has 6x the swapbuffer speed of all the other top end devices. GLBenchmark 2.0 shows only a 2x performance advantage in swapbuffer, which could be because it's using more than 20MB and the GPU needs to resort to the main memory.

The swapbuffer test can only tell you two things:
a) the device is vsynced
b) the device is not vsynced

(well, three: c) the device is unbelievably slow)
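The three outcomes can be written as a small classifier. This is a sketch: the 5% tolerance is an arbitrary assumption, and the 56/60Hz refresh values come from the earlier post in this thread.

```python
def classify_swap_rate(swaps_per_second: float, refresh_hz: float = 60.0,
                       tolerance: float = 0.05) -> str:
    """Interpret a swapbuffer benchmark score as one of the three outcomes.

    A vsynced device cannot swap faster than the display refreshes, so a rate
    close to refresh_hz points to vsync and a rate well above it rules vsync out.
    A rate well below refresh means the device is simply slow.
    """
    if swaps_per_second > refresh_hz * (1 + tolerance):
        return "not vsynced"
    if swaps_per_second >= refresh_hz * (1 - tolerance):
        return "vsynced"
    return "unbelievably slow"
```

Note what the function never looks at: memory bandwidth. Once a device is vsynced its score is pinned near the refresh rate however fast its memory is, so the test cannot rank devices by bandwidth.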

Plus, 20MB should be more than enough for that 640*360 resolution plus compressed textures.

20MB may be enough for the working set at this screen size, but it isn't enough to hold all of the resources an app will load concurrently. The current top-end iOS software consumes >100MB of memory, a healthy fraction of which is probably textures. The success of this design decision will depend on the ability to page data across that bus as the working set changes, without causing rendering hiccups.
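One way to picture that paging is an LRU residency cache: textures stay in the GPU pool until the budget is full, and uploading a new one evicts whichever was used least recently. This is a hypothetical sketch (`TextureCache` and the toy sizes are illustrative, not a Symbian or Broadcom API), and it ignores the hard part, hiding the upload latency so that paging does not cause hiccups.

```python
from collections import OrderedDict


class TextureCache:
    """Hypothetical LRU residency set for textures under a fixed byte budget."""

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.used = 0
        self.resident = OrderedDict()  # name -> size in bytes, least recent first
        self.uploaded = 0              # total bytes sent across the CPU->GPU bus

    def use(self, name: str, size: int) -> None:
        """Mark a texture as needed, uploading it and evicting others if required."""
        if name in self.resident:
            self.resident.move_to_end(name)  # hit: now the most recently used
            return
        if size > self.budget:
            raise ValueError(f"{name} does not fit in the budget at all")
        while self.used + size > self.budget:
            _, old_size = self.resident.popitem(last=False)  # evict least recent
            self.used -= old_size
        self.resident[name] = size
        self.used += size
        self.uploaded += size


if __name__ == "__main__":
    cache = TextureCache(budget_bytes=10)
    for name in ["a", "b", "a", "c"]:
        cache.use(name, 4)
    print(list(cache.resident), cache.used, cache.uploaded)
```

The `uploaded` counter is the quantity that matters on this architecture: every eviction that later has to be undone costs bus traffic, so a working set that keeps exceeding the budget turns directly into stalls.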

Virtual texturing approaches could also work very well for handling the 20MB limit.
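The core idea of virtual texturing is that only the fixed-size pages (tiles) of a large texture that are actually sampled need to be resident. The sketch below shows only the page-addressing step; the 16384-pixel virtual texture, the 128-pixel page size and the function names are assumptions for illustration, and a real system would also need a page table, a feedback pass and a streaming scheduler.

```python
def page_for(u: float, v: float, mip: int,
             texture_size: int = 16384, page_size: int = 128) -> tuple:
    """Return (mip, page_x, page_y) of the page covering normalized coords (u, v)."""
    level_size = max(1, texture_size >> mip)          # width of this mip level
    pages_per_side = max(1, level_size // page_size)
    # Clamp so that u == 1.0 (or slightly out-of-range values) stay on a valid page.
    px = min(max(int(u * pages_per_side), 0), pages_per_side - 1)
    py = min(max(int(v * pages_per_side), 0), pages_per_side - 1)
    return (mip, px, py)


def required_pages(samples) -> set:
    """Set of pages needed for (u, v, mip) samples, e.g. gathered by a feedback pass."""
    return {page_for(u, v, mip) for u, v, mip in samples}


if __name__ == "__main__":
    visible = [(0.10, 0.10, 0), (0.1001, 0.1001, 0), (0.9, 0.9, 3)]
    print(sorted(required_pages(visible)))
```

Memory use then scales with the number of distinct pages on screen rather than with the total size of the texture data, which is what makes the approach attractive for a small fixed pool.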
 
Sorry, I thought swapbuffer performance was an indicator of memory bandwidth.

So, does anyone know the bandwidth of that 32MB?
 
My guess is that it's a 32-bit LPDDR1 interface, which is nothing incredible, but since it's not shared with the CPU I can't imagine it ever being a real bottleneck. The limited amount of memory is a much bigger problem, as others have said.
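To put that guess into numbers: peak bandwidth is bus width in bytes times clock times transfers per clock. The thread does not give the clock, so the 200MHz figure below is purely an assumption (a plausible LPDDR1 speed, nothing more), as is the comparison with the traffic needed just to scan out a 32-bit 640x360 display at 60Hz.

```python
def peak_bandwidth_gb_s(bus_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Theoretical peak in GB/s (1 GB = 1e9 bytes); DDR moves data twice per clock."""
    return bus_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9


def scanout_gb_s(width: int, height: int, bytes_per_pixel: int, hz: float) -> float:
    """Bandwidth needed just to read the front buffer out to the display."""
    return width * height * bytes_per_pixel * hz / 1e9


if __name__ == "__main__":
    peak = peak_bandwidth_gb_s(32, 200)  # ASSUMED: 32-bit LPDDR1 at 200MHz
    scan = scanout_gb_s(640, 360, 4, 60)
    print(f"peak {peak:.2f} GB/s, scanout {scan:.3f} GB/s ({scan / peak:.1%})")
```

Peak figures are never reached in practice, and rendering traffic (texturing, overdraw, blending) comes on top of scanout, so this only bounds the problem rather than proving the bus is never a bottleneck.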

FWIW, the BCM2727 is built around two Vec16 DSPs with hardware accelerators. It's based on technology Broadcom acquired from the UK startup Alphamosaic (a spin-off of Cambridge Consultants). I found some more (limited) details here: http://www.curiouscat.org/Steve/
Launched in October 2007, Broadcom's third generation VideoCore product shattered three mobile multimedia records with a single device: 720p HD Video (H264 and MPEG4), 12 Megapixel Camera (144 Mpixels/s processing), and 3D Gaming (32 million triangles/s). In a break from the earlier designs, the BCM2727 makes extensive use of dedicated HW blocks to augment its programmable 2D vector cores (it contains two enhanced VideoCoreII engines). The HW blocks provide a pipelined data-flow structure to achieve industry leading power-performance figures, with the VideoCore engines providing the processing for the OpenGL 2.0 programmable shaders, the audio codecs, specialist ISP functions, and the middleware.

Role: With a bigger team, my main contribution was ensuring that things came together smoothly, the design met requirements, and that nothing fell through the gaps. My primary focus was software and system architecture, however I also looked after the validation and verification environment, running sims, and building FPGAs so that we had something to develop our software on.
 
The Wii has 64MB for the CPU and 24MB for the GPU (+3MB dedicated framebuffer) at 720*480.
How can 32MB be that low for a GPU that handles 640*360?
Besides, the synthetic tests have been putting the BCM2727 GPU pretty much above everything else that isn't CPU-dependent.

I know of Epic's demands for tremendous amounts of RAM everywhere, but from what I can tell, John Carmack mostly complains about not being able to use the mass memory as virtual memory. That's an OS limitation, though, and I don't know whether Symbian allows it.
 
Nokia N8

I'm not aware of the GPU in the Nokia N8, but my friend has an Alienware laptop and it's awesome. He can install every game and they all run fine.

My PC gets stuck sometimes; it's all down to the graphics card.
 