PCI-Express -> better CPU / GPU load balancing when?

g__day

Does anyone here have insights on when game developers may be able to better load balance a mismatched CPU / GPU combination, and what further s/w or h/w innovations such sophistication would require?

Looking at the latest 7800GTX benchmarks on an Athlon 64 4000, it's interesting to see how many games are CPU limited. So the obvious question is: if a game engine detects a mismatched CPU / GPU arrangement, what can it do today to minimise the issue - and how much smarter can this capability be in the future?

Obviously having a bi-directional bus and probably WGF 2.0 would be hugely beneficial here - but what else is needed, and when do folk expect to see it materialise?
 
I would imagine that the rumoured introduction of multithread-capable drivers might be a good start.
 
I am still trying to work out why multi-threaded video drivers would make any impact on a CPU-limited title - other than a negative impact: you'd have even more wait time at the GPU end if it could process its tasks quicker than it does today.

The logic behind this thread is:

1. CPUs are getting 50% more powerful every 18 months; GPUs more than double in power over the same period.

2. Today maybe 1/3 to 1/2 of all games are CPU limited.

3. Put the above two points together and GPU gains are set to be increasingly held back by CPU limits as time goes on (see the rough figures after this list).

4. Changing this situation requires a change to 3D graphics programming and hardware.

5. Load balancing asks how we distribute the workload of any game / shader set across the available CPU and GPU power within a target system.

6. A bi-directional bus and a future ability to send more of the CPU workload to the GPU - or vice versa - should lead to better utilisation of the detected hardware in any system.
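
To put rough numbers on point 3 (just back-of-envelope arithmetic, assuming point 1's rates hold and compound every 18 months, with "more than double" taken as exactly 2x):

```latex
% Relative gains after t 18-month periods (today's parts = 1):
%   CPU: 1.5^t      GPU: 2^t
\frac{\mathrm{GPU}_t}{\mathrm{CPU}_t} = \frac{2^{t}}{1.5^{t}} = \left(\tfrac{4}{3}\right)^{t}
\qquad
t = 2\ (\text{3 yrs}):\ \tfrac{4}{2.25} \approx 1.8
\qquad
t = 4\ (\text{6 yrs}):\ \tfrac{16}{5.1} \approx 3.2
```

So the GPU pulls away from the CPU by roughly a third every 18 months, and whatever share of the frame the CPU is responsible for becomes the ceiling sooner each generation.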

This is complex but a must-do for the future. I don't suggest a total change to the 3D pipeline, nor that GPUs become more like CPUs. It would be nice if the 3D pipeline could be more transparently mapped to the available CPU / GPU hardware resources, with sub-tasks scheduled dynamically and dynamic load balancing working out what goes where.

What do folk think?
 
If you're suggesting offloading work onto the GPU, it's not particularly likely. Number one, it could take a long time to research the algorithms, and potentially manually modify long shaders for multiple passes etc., to get something that is normally done serially on a CPU to run in parallel on a GPU. Number two is the latency problem: it takes a fair while to get the information back to the CPU.

GPUs will only be faster at operations that happen in parallel; GPUs are dramatically slower at solving problems that occur in series.

Ultimately the solution is either A) wait for new CPUs to come out or B) remove a whole bunch of fancy features and end up making the physics, AI etc. rather pathetic (hey, look at BF2).
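
To illustrate the latency point with a toy example - plain C++ threads standing in for the GPU, every name here is made up and it's nothing like real driver code: if the CPU kicks off work and then immediately blocks waiting for the answer, the round trip is fully exposed, and the offload only pays off when there is independent work to overlap it with.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Stand-in for work offloaded to a GPU: transfer + execute + readback latency.
int simulated_gpu_job(int input)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return input * 2;
}

// Stand-in for independent CPU-side work (AI, game logic, ...).
void other_cpu_work()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

int main()
{
    using clock = std::chrono::steady_clock;

    // Pattern 1: offload, then block straight away -- the latency is fully exposed.
    auto t0 = clock::now();
    int r1 = std::async(std::launch::async, simulated_gpu_job, 21).get();
    other_cpu_work();
    double serial_ms = std::chrono::duration<double, std::milli>(clock::now() - t0).count();

    // Pattern 2: offload, do independent CPU work, collect the result afterwards.
    auto t1 = clock::now();
    auto pending = std::async(std::launch::async, simulated_gpu_job, 21);
    other_cpu_work();
    int r2 = pending.get();
    double overlapped_ms = std::chrono::duration<double, std::milli>(clock::now() - t1).count();

    std::cout << "blocking readback: " << serial_ms << " ms (result " << r1 << ")\n"
              << "overlapped:        " << overlapped_ms << " ms (result " << r2 << ")\n";
}
```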
 
So it's not as simple as asking what tasks could be done on either a CPU or a GPU (e.g. possibly collision detection, certain physics interactions), then managing each as a queue, and whenever you go to schedule a job that could run on either, you schedule it to the least busy queue?
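
Something like this rough sketch is what I mean - all the types and cost numbers are invented for illustration, and a real engine would also need decent cost estimates plus a way to account for transfer latency:

```cpp
#include <cstdio>
#include <queue>
#include <string>

struct Job {
    std::string name;
    double cpu_cost;   // estimated ms if run on the CPU
    double gpu_cost;   // estimated ms if run on the GPU
};

struct WorkQueue {
    std::queue<Job> jobs;
    double backlog_ms = 0.0;   // estimated time to drain what is already queued

    void push(const Job& j, double cost) { jobs.push(j); backlog_ms += cost; }
};

// Route each job to whichever queue would finish it soonest, counting backlog.
void schedule(const Job& j, WorkQueue& cpu, WorkQueue& gpu)
{
    double finish_on_cpu = cpu.backlog_ms + j.cpu_cost;
    double finish_on_gpu = gpu.backlog_ms + j.gpu_cost;
    if (finish_on_cpu <= finish_on_gpu) {
        cpu.push(j, j.cpu_cost);
        std::printf("%s -> CPU queue\n", j.name.c_str());
    } else {
        gpu.push(j, j.gpu_cost);
        std::printf("%s -> GPU queue\n", j.name.c_str());
    }
}

int main()
{
    WorkQueue cpu, gpu;
    gpu.backlog_ms = 4.0;   // pretend the GPU already has rendering work queued

    schedule({"collision pass",  3.0, 1.0}, cpu, gpu);   // GPU is faster but busy
    schedule({"particle update", 2.0, 0.5}, cpu, gpu);
    schedule({"cloth sim",       6.0, 1.5}, cpu, gpu);
}
```

The interesting part is purely the decision rule; everything else is bookkeeping.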
 
g__day said:
I am still trying to work out why multi-threaded video drivers would make any impact on a CPU-limited title - other than a negative impact: you'd have even more wait time at the GPU end if it could process its tasks quicker than it does today.

Because a large share of CPU cycles is burned in the video driver (it can easily be 20-30%).
Multi-threading the video driver on multi-core CPUs can make the driver run in parallel with the (likely single-threaded) application logic.
So it has the potential to speed up existing applications.

Once applications start to become multi-threaded themselves, the MT video driver will provide no additional speed gain.
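
As a very rough sketch of that arrangement - nothing like a real driver internally, just the producer/consumer shape of it - the application thread records commands into a queue while a second thread drains it, so the driver-side CPU cost can overlap with game logic on another core:

```cpp
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

class CommandQueue {
public:
    // Called by the application thread: record a command and wake the worker.
    void push(std::function<void()> cmd)
    {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(cmd)); }
        cv_.notify_one();
    }
    void shutdown()
    {
        { std::lock_guard<std::mutex> lock(m_); done_ = true; }
        cv_.notify_one();
    }
    // Run on the "driver" thread: execute commands until shutdown and queue empty.
    void drain()
    {
        for (;;) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [&] { return done_ || !q_.empty(); });
            if (q_.empty()) return;
            auto cmd = std::move(q_.front());
            q_.pop();
            lock.unlock();
            cmd();   // the "driver work" happens here, off the application thread
        }
    }
private:
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main()
{
    CommandQueue driver;
    std::thread driver_thread([&] { driver.drain(); });

    // Application thread: game logic plus recording of draw commands.
    for (int frame = 0; frame < 3; ++frame) {
        // ... AI / physics / game logic would run here ...
        driver.push([frame] { std::cout << "driver: submitting frame " << frame << "\n"; });
    }

    driver.shutdown();
    driver_thread.join();
}
```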
 
I have an old IBM server with dual Pentium Pro, SCSI RAID and 160MB of EDO ECC RAM under Windows 2000 Pro; I put a Voodoo2 in it (and later, my Voodoo1).

I wanted to try Quake 3's SMP mode, but it's incompatible with 3dfx drivers.

So I found this : SMP MiniGL Layer Beta
http://www.denniskarlsson.com/smp/

How it works :

The basic idea is to divide work across two processors by letting the application (Quake3 in the case) run on one processor and letting the OpenGL driver run on a second processor. There is some overhead in queuing and synchronizing data, but if the benefit of running things in parallel outweighs the cost of the overhead, SMP rendering acceleration is possible. If the overhead is too high or if dividing the work at this point does not cause it to be divided very evenly, then it may cause lower performance. This means that this technology, by design, will only work properly in certain scenarios. Like Quake3's built in r_smp mode, this project cannot provide SMP rendering acceleration when video card fill rate rather than CPU speed becomes the limiting factor. Unlike Quake3's built in SMP mode, this layer doesn't require proper multiple thread/multiple rendering context support in the OpenGL driver (which some broken or incomplete OpenGL and MiniGL drivers do not provide).

I gained about 20% in timedemo (a Pentium Pro 200 with a single Voodoo2 is HEAVILY CPU limited). I was a bit disappointed as the framerate still sucked, yet I was pleased to see any performance gain at all :)

I guess you may still try it if you want.
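
A rough Amdahl-style reading of that 20%, assuming the overlap was near-perfect and ignoring the queuing/synchronisation overhead the page warns about: if a fraction p of the frame's CPU time was spent in the GL layer and now runs on the second Pentium Pro,

```latex
\text{speedup} = \frac{1}{1-p}
\quad\Rightarrow\quad
1.2 = \frac{1}{1-p}
\quad\Rightarrow\quad
p = 1 - \tfrac{1}{1.2} \approx 0.17
```

i.e. roughly a sixth of the CPU time was going into the GL layer, before counting the overhead.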
 
g__day said:
Say we ask my question another way: if GPUs get faster according to Moore's law cubed, whilst CPUs follow Moore's law, surely GPUs will become increasingly CPU bound unless something radical comes along to change the 3D pipeline paradigm - so what might that something be?
Better shader support is about the only thing that could change that. Rendering images to a screen is inherently a parallel problem, as each pixel is generally reasonably independent of the others.
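
A trivial made-up example of that independence - not real shader code, just the shape of the problem: every iteration below depends only on its own (x, y), so the pixels can be computed in any order or spread across any number of pipelines, which is exactly what a serial physics or AI step can't do.

```cpp
#include <cstdint>
#include <vector>

// Fill a framebuffer with a made-up per-pixel "shader": each output value is a
// function of its own coordinates only, so iterations are fully independent.
std::vector<uint32_t> shade_frame(int width, int height)
{
    std::vector<uint32_t> framebuffer(static_cast<size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint8_t r = static_cast<uint8_t>((x * 255) / width);
            uint8_t g = static_cast<uint8_t>((y * 255) / height);
            uint8_t b = 128;
            framebuffer[static_cast<size_t>(y) * width + x] =
                (uint32_t(r) << 16) | (uint32_t(g) << 8) | b;
        }
    }
    return framebuffer;
}

int main()
{
    auto frame = shade_frame(640, 480);
    (void)frame;   // in a real program this would be displayed or written out
}
```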
 
Your reference to "better shader support" just isn't clear to me. What do you mean, and how do you see better shader support solving what I understand is a CPU-bound data generation and transfer bottleneck?

Please explain!

Also, does anyone have a reasonable count of how many new titles are CPU vs GPU bound at high resolution? I'd like to know what the trend line would look like if you plotted the top 20 titles from each of the last 3 years as CPU vs GPU bound at high resolution!
 
g__day said:
Your reference to "better shader support" just isn't clear to me. What do you mean, and how do you see better shader support solving what I understand is a CPU-bound data generation and transfer bottleneck?
Dedicating more of the processing power to shaders, which tend to run as a far more serial process rather than a parallel one. If you build it they will come: developers will use the extra power, and we will stop ramping up screen resolution, which is a parallel operation. As such we move closer to a traditional serial workload, which of course is bound by clock speed and IPC, so more pipelines don't help.
 
g__day wrote:

Say we ask my question another way: if GPUs get faster according to Moore's law cubed, whilst CPUs follow Moore's law, surely GPUs will become increasingly CPU bound unless something radical comes along to change the 3D pipeline paradigm - so what might that something be?

This is not an answer but rather my POV on this.
First, we "have" the AMD64 X2, which at least in theory is (or could be) twice as fast in games, but the software is not here yet.
The second is again software: shader-heavy titles have to come along to put the GPU to work.
So by the time the new titles come out we get a "balanced" scenario, and meanwhile the dual-core CPUs get cheaper/faster and will have built up the market share of dual-core vs single-core.

I guess/hope that the new consoles, being based on this formula, will help kick-start that transition on the software side.

 