What custom hardware features could benefit a console?

It's interesting, what with all the talk about these specialized units on Durango. I would have thought we'd be moving more and more towards systems with enough general compute power to adapt to whatever new technique or rendering method a developer wants, instead of relying on fixed-function hardware, especially given the way Tim Sweeney and John Carmack are predicting a move to software rasterizers and the like. It seems that, because of the rising power draw of modern CPUs and GPUs, the console manufacturers - at least Microsoft - are designing these upcoming consoles with specialized units to help or accelerate graphics and other parts of a console's compute workload. I guess we just have to wait for the details of what these units are and what they can do.

Anyway, judging by the way they (Microsoft and ATi) designed the Xbox 360, especially the design of Xenos itself, I expect Durango to be a very interesting system, particularly given that the CPU and GPU are both designed by AMD in collaboration with Microsoft. I expect them to do the best they can given the power, space, size and financial budget they have to work with.
 
I don't understand why audio takes so much processing power, is that just due to decoding multiple audio tracks simultaneously?
Or perhaps applying echo, occlusion etc to sounds in different environments? (eg cave vs corridor vs open space)
 
I don't understand why audio takes so much processing power, is that just due to decoding multiple audio tracks simultaneously?
Or perhaps applying echo, occlusion etc to sounds in different environments? (eg cave vs corridor vs open space)
Decompressing 50+ compressed streams, mixing and EQing them in frequency space, applying various DSP effects, wet/dry mixes, etc., for 8 channels.
And that's without any 3D effects like occlusion.

HQ audio can cost quite a bit if done well.
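
To make that concrete, here's a very rough sketch of the inner mixing loop a software audio pipeline runs every block. The voice and channel counts, the Voice struct and the buffer sizes are all illustrative assumptions rather than figures from any real engine, and the per-voice decode/EQ/DSP stages are elided to a comment.

#include <array>
#include <vector>

// Illustrative sketch only: counts and the Voice layout are assumptions.
constexpr int kVoices   = 64;   // simultaneous compressed streams
constexpr int kChannels = 8;    // 7.1 output
constexpr int kBlock    = 256;  // samples per mix block

struct Voice {
    std::array<float, kBlock>    pcm;   // samples already decoded for this block
    std::array<float, kChannels> gain;  // per-channel pan/volume
};

// One mix block: every voice gets panned and accumulated into every output
// channel -- O(voices * channels * samples) multiply-adds per block, before
// the decode, EQ, reverb and any 3D/occlusion work is even counted.
void mixBlock(std::array<std::array<float, kBlock>, kChannels>& out,
              const std::vector<Voice>& voices)
{
    for (auto& channel : out)
        channel.fill(0.0f);

    for (const Voice& v : voices) {
        // (decompression, per-voice EQ and DSP would run here in a real mixer)
        for (int c = 0; c < kChannels; ++c)
            for (int s = 0; s < kBlock; ++s)
                out[c][s] += v.gain[c] * v.pcm[s];
    }
}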
 
A Copperlist? :LOL:

I like the idea of a Blitter. Maybe it could be useful to have a blitter with advanced frame-buffer functions instead of just the bitwise stuff: being aware of pixel formats, a nice bicubic scaler, the source could be a different resolution and pixel format than the destination, and it could even go through a 3D LUT, texture decompression, alpha blending, HDR stuff, or even anti-aliasing. Today a GPU can do all this anyway, but I think a separate blitter would have a very small footprint and let the GPU do what it's best at. All of these operations would be wasting the GPU otherwise, because they are too simple and would be limited by memory I/O.
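
Purely to illustrate what such a unit might accept as a command packet, here is a hypothetical descriptor. Every type and field name below is invented for the sketch and doesn't correspond to any real hardware.

#include <cstdint>

// Hypothetical command packet for the "smart blitter" idea above.
// All names are invented; this is a sketch, not a real interface.
enum class PixelFormat : uint8_t { R8G8B8A8, R10G10B10A2, FP16x4, BC1, BC5 };
enum class ScaleFilter : uint8_t { Point, Bilinear, Bicubic };

struct BlitCommand {
    uint64_t    srcAddress, dstAddress;
    uint32_t    srcWidth, srcHeight;   // source and destination may differ
    uint32_t    dstWidth, dstHeight;   //   in both resolution and format
    PixelFormat srcFormat, dstFormat;  // block-compressed src implies decompression
    ScaleFilter filter;                // e.g. bicubic rescale
    uint64_t    lut3dAddress;          // optional 3D LUT for colour grading (0 = none)
    bool        alphaBlend;            // blend over the existing destination
    bool        toneMap;               // HDR -> SDR conversion on the way out
};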

I'd love to see a shader blitter, for want of a better word: basically a systolic array accessed like a blitter that can process data as it's moved around. Then make the GPU just a fast triangle rasteriser/interpolator and let the shader blitter do all the magic. GPU designs are just too boring these days ;)

And yep a copper would still be high up my list :D
 
Hell, I'd be happy if all these "secret sauces" got somewhere close to 2 TF out of a 1 TF setup.
That's not the topic. I'll just explain, though, that counting flops is inaccurate. If it takes one processor 16 floating-point operations to achieve what another processor can manage in 2, then a processor that's 5x more powerful on paper would still deliver lower performance for that operation (it would need 8x the flops just to break even). If we count Xenon as 115 Gflops and audio as consuming one of its three cores, that makes the audio equivalent to roughly 115/3 ≈ 40 Gflops of resources it takes from the CPU. If a few-Gflop DSP can handle all the audio, it's worth ~40 Gflops to the system.

The overall performance of a system is its ability to do work. Flops is only one performance metric. Proper benchmarking evaluates many different aspects, and some are hard to measure (like audio).

This thread is looking at ways to increase the system work rate without increasing pure programmable flops.
 
I'd love to see a shader blitter, for want of a better word: basically a systolic array accessed like a blitter that can process data as it's moved around.
How is memory moved around in a modern engine? I'm completely clueless on this and imagine the RAM to be pretty static beyond CPU/GPU read/writes processing assets.
 
A special CPU core that would be optimized for extremely branchy code and bit operations. Something ideal for AI and physics?
 
How is memory moved around in a modern engine? I'm completely clueless on this and imagine the RAM to be pretty static beyond CPU/GPU read/writes processing assets.

In theory, moving chunks of memory around is a fairly rare thing; in practice it's massive, as it's often difficult to process things in-place (i.e. you combine two arrays into a third, then switch the src/dest pointers for the next frame) with standard, completely generic libraries.

So having a clever blitter/DMA-type unit might be useful. It could be programmed as a custom decompression unit (switching routine based on format), an updater (agent or particle logic), or an audio mixer, as well as the more traditional blitter and graphics unit.
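
A minimal sketch of the out-of-place-then-swap pattern described above, assuming the per-element "combine" step is just a placeholder for whatever the real workload does:

#include <cstddef>
#include <utility>
#include <vector>

// Placeholder transform: combine two source arrays into a third.
void combine(const std::vector<float>& a, const std::vector<float>& b,
             std::vector<float>& out)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + b[i];
}

// Because the work can't easily be done in-place, each frame writes to a
// separate destination buffer, then the src/dest pointers are swapped for
// the next frame -- the kind of bulk data movement a smarter blitter/DMA
// unit could fold into the copy itself.
void runFrames(std::vector<float>& bufferA, const std::vector<float>& bufferB,
               std::vector<float>& scratch, int frames)
{
    std::vector<float>* src  = &bufferA;
    std::vector<float>* dest = &scratch;
    for (int f = 0; f < frames; ++f) {
        combine(*src, bufferB, *dest);
        std::swap(src, dest);   // next frame reads what we just wrote
    }
}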
 
Thinking about assists for human interface input/output, the thought of speech recognition came up. How much computation is needed to achieve 99.9% recognition? Is that computation easily accelerated with fixed-function hardware? Is it even needed? It could be a nice gimmick.
 
Maybe we could see some new CPU instructions that allow execution flow to break out of the CPU pipeline and transfer to the compute units of the GPU, then go back to the CPU if required.

You can do something quite similar with mwait/monitor in a unified CPU-GPU memory space. On a console you might provide an API instead of allowing it to be executed in user mode.
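
Roughly the idea, sketched with a plain atomic doorbell in a shared address space; monitor/mwait (which are privileged, hence the suggestion of wrapping it in an API) or a real GPU doorbell would replace the busy-wait, and the field names are purely illustrative.

#include <atomic>
#include <cstdint>

// Sketch of a CPU -> GPU handoff over a unified address space.
struct SharedTask {
    std::atomic<uint32_t> doorbell{0};    // CPU rings this when work is ready
    std::atomic<uint32_t> done{0};        // GPU side sets this when finished
    uint64_t              argsAddress{0}; // where the task's arguments live
};

void dispatchToGpu(SharedTask& task, uint64_t args)
{
    task.argsAddress = args;
    task.done.store(0, std::memory_order_relaxed);
    task.doorbell.store(1, std::memory_order_release);  // hand the work over

    // A persistent GPU kernel polling 'doorbell' would pick the work up here.
    while (task.done.load(std::memory_order_acquire) == 0) {
        // ideally a monitor/mwait-style sleep until the cache line changes
    }
    // Execution resumes on the CPU with the GPU's results visible.
}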
 
Normal interpolation and filtering. Interpolation is already done with the unified shaders, but on an MS box you could add additional semantics to HLSL to request it, though it's quite complex (technically this could be quaternion interpolation to simplify things) and I think hardware would be faster. Filtering could also be done manually, but fixed-function blocks for that would be faster. I heard that on the Xbox, 3Dc (aka ATI2 aka BC5) has 16-bit interpolated filtering.
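
For the quaternion route, the per-sample operation such a block (or shader intrinsic) would perform is essentially a normalised lerp; a sketch, with full slerp left out for brevity:

#include <cmath>

struct Quat { float x, y, z, w; };

// Normalised lerp (nlerp) between two unit quaternions -- the kind of
// interpolation a fixed-function block or HLSL intrinsic could provide.
Quat nlerp(const Quat& a, const Quat& b, float t)
{
    // Flip one input if needed so we interpolate along the short arc.
    const float dot  = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    const float sign = (dot < 0.0f) ? -1.0f : 1.0f;

    Quat q{ a.x + t * (sign * b.x - a.x),
            a.y + t * (sign * b.y - a.y),
            a.z + t * (sign * b.z - a.z),
            a.w + t * (sign * b.w - a.w) };

    const float invLen =
        1.0f / std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    q.x *= invLen; q.y *= invLen; q.z *= invLen; q.w *= invLen;
    return q;
}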
 
Hardware scaler to 4K resolutions.
We already have 4K projectors and TVs, so this feature is almost guaranteed.
One thing I wonder is how much latency this type of chip would add, as input latency is quite important to me, with TVs already stupidly laggy even in game modes.

Oh and I like Blitter and Copper ideas, it seems Amiga did some things right :)
 
Every 4k display is going to have 4k upscaling. I don't see the point in adding hardware to 100% of machines that'll be used by only 2-5% of users.
 
Every 4k display is going to have 4k upscaling. I don't see the point in adding hardware to 100% of machines that'll be used by only 2-5% of users.

Like 1080p upscaling when the PS3/360 arrived: useful for 5% of users at launch, for 30% five years later, and now for more than 60%? ;)
 
When the 360 launched, HD displays weren't very standardised, so a scaler made some sense. Now every display upscales; AFAIK on PS3 virtually no games upscale to 1080p. The only advantage to including a scaler in hardware is to ensure a quality standard, where you could guarantee low latency and high quality. Alternatively, forget about it and let the 4K displays handle it. (4K adoption is likely a whole other discussion.)

There are certainly far more useful places to focus one's engineering talent on hardware optimisations.
 
A special CPU core that would be optimized for extremely branchy code and bit operations. Something ideal for AI and physics?
You mean something like a modern OoOE processor with good branch prediction, a pipeline that's not too long, and some Bit Manipulation Instruction extensions?
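
For what it's worth, this is the flavour of code such extensions help with; the examples below are illustrative only and use the GCC/Clang popcount builtin:

#include <cstdint>

// Counting set flags across agents: one POPCNT per 64 agents instead of a
// data-dependent branch per agent.
inline int countActiveAgents(const uint64_t* flagWords, int words)
{
    int active = 0;
    for (int i = 0; i < words; ++i)
        active += __builtin_popcountll(flagWords[i]);  // GCC/Clang builtin
    return active;
}

// Branch-free select: pick a or b with a mask (0 or ~0u) instead of a branch
// the predictor has to guess.
inline uint32_t select(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}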
 
When the 360 launched, HD displays weren't very standardised, so a scaler made some sense. Now every display upscales; AFAIK on PS3 virtually no games upscale to 1080p. The only advantage to including a scaler in hardware is to ensure a quality standard, where you could guarantee low latency and high quality. Alternatively, forget about it and let the 4K displays handle it. (4K adoption is likely a whole other discussion.)

There are certainly far more useful places to focus one's engineering talent on hardware optimisations.

If they intend to render to assorted buffers and then scale to OS screens (e.g., embedded in a web browser, sent to wireless screens, animated in window transitions), it may be used more often.

Perhaps not just scaling, but generalised to handle some of DeanoC's needs?
 
Well... all the games that won't run at native 1080p will need to be scaled. And most, if not all, TV scalers that don't cost you an arm and a leg suck. So yeah, a good upscaler should be included, if anything. Not to mention 4K.
 