What custom hardware features could benefit a console?

If they intend to render to assorted buffers and then scale to OS screens (e.g., embedded in a web browser, sent to wireless screens, animated in window transitions), it may get used more often.

Perhaps not just scaling, but generalized to handle some of DeanoC's needs?
Maybe. Could a next-gen blitter be the device of choice here? Copy this block of RAM (backbuffer, icon, soundwave) to this bit of RAM with such-and-such interpolation. Although a good video upscaler needs to be fairly complex with multiple samples around each pixel, where I'd expect a blitter type device to just use some linear interpolation.
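To make the "such-and-such interpolation" part concrete, here's a minimal software sketch (plain C++, everything about it hypothetical) of a blitter-style scaled copy over an 8-bit buffer using simple bilinear filtering. A proper video upscaler would sample a wider multi-tap neighbourhood per output pixel, which is exactly the complexity gap being pointed at.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model of a "blitter with interpolation":
// copy a w_src x h_src block of 8-bit pixels into a w_dst x h_dst block,
// sampling the source with simple bilinear (linear in x and y) filtering.
std::vector<uint8_t> blit_scaled(const std::vector<uint8_t>& src,
                                 int w_src, int h_src,
                                 int w_dst, int h_dst)
{
    std::vector<uint8_t> dst(static_cast<size_t>(w_dst) * h_dst);
    for (int y = 0; y < h_dst; ++y) {
        // Map the destination pixel centre back into source coordinates.
        float sy = (y + 0.5f) * h_src / h_dst - 0.5f;
        int y0 = std::clamp(static_cast<int>(sy), 0, h_src - 1);
        int y1 = std::min(y0 + 1, h_src - 1);
        float fy = std::clamp(sy - y0, 0.0f, 1.0f);
        for (int x = 0; x < w_dst; ++x) {
            float sx = (x + 0.5f) * w_src / w_dst - 0.5f;
            int x0 = std::clamp(static_cast<int>(sx), 0, w_src - 1);
            int x1 = std::min(x0 + 1, w_src - 1);
            float fx = std::clamp(sx - x0, 0.0f, 1.0f);
            // Two linear interpolations in x, then one in y.
            float top = src[y0 * w_src + x0] * (1 - fx) + src[y0 * w_src + x1] * fx;
            float bot = src[y1 * w_src + x0] * (1 - fx) + src[y1 * w_src + x1] * fx;
            dst[y * w_dst + x] = static_cast<uint8_t>(top * (1 - fy) + bot * fy + 0.5f);
        }
    }
    return dst;
}
```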
 
You mean something like a modern OoOE processor with good branch prediction, a pipeline that's not too long, and some Bit Manipulation Instruction extension?
With magic added so it doesn't look like I'm reinventing the wheel :???:

I guess modern processors are already good at it, but I was thinking that there has to be a way to help compute a truckload of logical inferences: a way to reduce the amount of opcode versus data for decision models. Maybe some very wide instructions specifically for AI, I don't know if that would be possible.
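For what it's worth, one way to read "reduce the amount of opcode versus data" is to pack the booleans a decision model tests into machine words, so a handful of bitwise instructions evaluates dozens of conditions at once; wider registers or dedicated bit-manipulation instructions just widen that further. A toy sketch (the encoding and names are invented, not any real ISA extension):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical encoding: each of up to 64 game-AI facts is one bit in a word.
// A rule fires when all of its required facts are set and none of its
// forbidden facts are. One AND/compare pair evaluates 64 conditions at once;
// wider registers (or dedicated instructions) would widen this further.
struct Rule {
    uint64_t required;
    uint64_t forbidden;
    uint64_t asserts;   // facts the rule adds when it fires
};

uint64_t infer_once(uint64_t facts, const std::vector<Rule>& rules)
{
    uint64_t out = facts;
    for (const Rule& r : rules) {
        bool fires = ((facts & r.required) == r.required) &&
                     ((facts & r.forbidden) == 0);
        out |= fires ? r.asserts : uint64_t{0};
    }
    return out;
}
```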
 

Blitter + basic functions (multi-tap fixed-point interpolator, packer/decoder, etc.) + decompression engine could be pretty useful for asset streaming. Make it easy and you've got a nice helper for streaming large worlds:
L1 = HW friendly format
L2 = In RAM Tightly packed format off disk
L3 = Disk
Disk controller + MMU HW page from L3 to L2 on demand, Blitter unpacks L2 to L1, HW uses L1
Could be done without much (any?) CPU interference in theory I guess...
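A rough CPU-side sketch of that flow, just to make the division of labour concrete (all names, and the on-demand trigger, are assumptions on my part rather than any real API):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Sketch of the three-tier flow described above.
// L3 = disk, L2 = tightly packed (compressed) pages in RAM,
// L1 = hardware-friendly (unpacked) copies the GPU/audio HW reads directly.
// The two transfer steps are the ones a DMA/blitter-style engine could own,
// leaving the CPU little more than bookkeeping.

enum class AssetState { OnDisk, PackedInRam, ResidentHwFormat };

struct Asset {
    uint64_t   diskOffset;            // where the packed blob lives in L3
    size_t     packedSize;
    size_t     unpackedSize;
    void*      l2Packed   = nullptr;  // filled by the disk-controller/MMU path
    void*      l1Resident = nullptr;  // filled by the blitter/decompressor
    AssetState state      = AssetState::OnDisk;
};

// Placeholder bodies: on real hardware these would be fire-and-forget
// commands queued to the disk controller and the blitter/decompression engine.
void* alloc_l2(size_t bytes) { return std::malloc(bytes); }
void* alloc_l1(size_t bytes) { return std::malloc(bytes); }
void dma_read_from_disk(uint64_t, size_t bytes, void* dst) { std::memset(dst, 0, bytes); }
void blitter_unpack(const void*, size_t, void* dst, size_t bytes) { std::memset(dst, 0, bytes); }

// Called when rendering (or an MMU-style fault on a missing page) demands data.
void request_asset(Asset& a)
{
    if (a.state == AssetState::OnDisk) {
        a.l2Packed = alloc_l2(a.packedSize);
        dma_read_from_disk(a.diskOffset, a.packedSize, a.l2Packed);   // L3 -> L2
        a.state = AssetState::PackedInRam;
    }
    if (a.state == AssetState::PackedInRam) {
        a.l1Resident = alloc_l1(a.unpackedSize);
        blitter_unpack(a.l2Packed, a.packedSize,
                       a.l1Resident, a.unpackedSize);                 // L2 -> L1
        a.state = AssetState::ResidentHwFormat;
    }
}
```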
 
So... anyone (someone) have thoughts on a ring bus, particularly for CPU/GPU (like Sandy Bridge etc.)? :p Wonder about cache coherency in such a setup w.r.t. the GPU. Make the eDRAM big enough to mitigate the number of fetches/accesses on the external bus whilst not worrying about framebuffer tiling anymore? Might have gotten some ideas crossed there...

>_>
<_<
 
One thing to note is that AMD is not a heavy user of high-performance ring bus interconnects, although this may not be wholly up to AMD.

Another thing is that one of the big arguments for a ring bus made by Intel is that it can cost-effectively scale up and down between various SKUs and architecture variants, whereas a crossbar that AMD tends to favor (or settle on) requires redesign and revalidation for every adjustment.
However, why would a console design care about that kind of scalability? This decision did mean Intel left some performance on the table, but it could save resources that could improve other parts of the system instead.

For Cell, IBM indicated that if they had more time or an opportunity to improve things, a different interconnect would have been used.
 
A Blitter sounds really, really interesting. Perhaps it'll serve well for a system with such a massive amount of relatively slow memory.
 
It would have to be called Super Ultra Fat Agnus :smile:

Would it be better instead to have a very powerful I/O processor? An ASIC for de/compression and de/encryption that could generate mipmaps on the fly when reading textures, manage data from Blu-ray, HDD, and network, and also manage a Flash cache, handling wear levelling for flash storage and remapping data to external storage.
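As a thought experiment, the command interface to such an I/O processor might look something like this: the CPU fills in a descriptor and the ASIC does the rest, including any decompression/decryption/mip generation on the way through. Everything here is invented for illustration, not any real hardware interface:

```cpp
#include <cstdint>

// Hypothetical command descriptor for the kind of I/O processor described
// above: the CPU fills one of these in and pushes it to a ring buffer; the
// ASIC moves the data and applies the requested transforms with no further
// CPU involvement.

enum class IoSource : uint8_t { Bluray, Hdd, FlashCache, Network, MainRam };
enum class IoDest   : uint8_t { MainRam, FlashCache, ExternalStorage };

struct IoTransforms {
    bool decompress   = false;
    bool decrypt      = false;
    bool generateMips = false;   // only meaningful for texture payloads
};

struct IoCommand {
    IoSource     source;
    IoDest       dest;
    uint64_t     srcOffset;
    uint64_t     dstAddress;
    uint32_t     sizeBytes;
    IoTransforms transforms;
    uint32_t     completionFenceId;  // CPU/GPU wait on this instead of polling
};

// In reality this would be an uncached/write-combined store plus a doorbell
// write; wear levelling and flash-cache placement stay entirely inside the
// I/O processor's firmware.
void submit_io_command(IoCommand* ringSlot, const IoCommand& cmd)
{
    *ringSlot = cmd;
}
```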
 
A fixed-function unit that can assist voxel-based rendering. For example, it could be a hardware raytracing accelerator that improves rendering times for Sparse Voxel Octree Global Illumination and cone tracing. CausticRT 2 is said to speed up raytracing on a CPU by 2-5x, http://www.geek.com/articles/games/...ce-card-is-anything-but-conventional-2009076/.

On Nvidia's slides about SVOGI they state that rendering times for 1080p on a GK104 are 62 ms, which on its own is only about 16 fps; a fixed-function unit could bring that within a 33 ms (30 fps) frame budget and make it playable.
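To illustrate the kind of per-ray work such a unit would take over: SVO ray casting is mostly compares, child-index math and dependent loads through octree nodes, which shader cores handle poorly per-thread but a narrow dedicated traversal pipeline could hide much better. A very simplified software sketch (node layout invented for illustration, not Crassin's or Epic's actual format):

```cpp
#include <cstdint>
#include <vector>

// Much-simplified sparse-voxel-octree lookup: each step is a few compares,
// a child-index computation and a dependent memory load -- branchy,
// latency-bound code that a small fixed-function traversal unit could
// pipeline far better than general shader cores.
struct SvoNode {
    int32_t child[8];   // index into the node pool, -1 = empty subtree
    float   occupancy;  // value stored at this node (leaf or mip level)
};

struct Svo {
    std::vector<SvoNode> nodes;  // nodes[0] is the root covering [0,1)^3
};

// Descend from the root to the node containing point p at the given depth.
float sample_svo(const Svo& svo, float px, float py, float pz, int maxDepth)
{
    int nodeIndex = 0;
    float size = 1.0f, ox = 0.0f, oy = 0.0f, oz = 0.0f;  // current cell
    for (int d = 0; d < maxDepth; ++d) {
        size *= 0.5f;
        int cx = px >= ox + size, cy = py >= oy + size, cz = pz >= oz + size;
        int childIdx = svo.nodes[nodeIndex].child[cx | (cy << 1) | (cz << 2)];
        if (childIdx < 0) break;              // empty space: stop early
        ox += cx * size; oy += cy * size; oz += cz * size;
        nodeIndex = childIdx;
    }
    return svo.nodes[nodeIndex].occupancy;
}

// Crude ray march accumulating occupancy along a ray -- one ray costs many
// such descents, hence the 62 ms figure quoted above for full-scene GI.
float march_ray(const Svo& svo, float ox, float oy, float oz,
                float dx, float dy, float dz, float step, int steps, int depth)
{
    float accum = 0.0f;
    for (int i = 0; i < steps && accum < 1.0f; ++i) {
        float t = i * step;
        accum += sample_svo(svo, ox + t * dx, oy + t * dy, oz + t * dz, depth)
                 * step;
    }
    return accum;
}
```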

This is what occurred to me. Carmack and Id have been talking about SVO techniques and possible inclusion in Id Tech 6. Could be interesting to have dedicated hardware for that going forward. It would certainly be interesting if Bethesda moved to Id's engines as well.
 


Look what I have found. It's an RPU, aka a Raytracing Processing Unit.
http://graphics.ethz.ch/teaching/former/seminar/handouts/Fierz_RPU.pdf
I'm speculating that Microsoft could have taken that design, improved it, standardized it for the next generation of the Direct3D API to make it more flexible, and integrated it in Durango.
A GPU+RPU for a future hybrid rendering pipeline.
 

Too many 'and-if's for me to buy that's what is happening. Is the industry ready for raytracing built into directX? Is the technique understood well enough to have a hardware implementation be relevant in 5 years? What if developers don't use it? How good is it at generic compute? Can it handle normal graphics workloads? Does the potential of it not being used still justify the die area?
 

That is the future of graphics! It is similar to what I was speculating. A specialized piece of hardware to render the lighting would leave the 1.2 TFLOPs free for the other effects, enough for the system to behave like a GTX 680 or better. I wish it was the real thing in Durango.
 
A raytracing chip wouldn't just be for graphics. However, there's a whole discussion here that's bigger than the intention of this thread. I point you here for further research and discourse.
 
Unreal Engine 4 uses an SVOGI algorithm (Sparse Voxel Octree Global Illumination). This technique was published in a thesis by Cyril Crassin, who is now at Nvidia I think. In his thesis presentation about this technique you can read:

Our solution is based on an adaptive hierarchical data representation depending on the current view and occlusion information, coupled to an efficient ray-casting rendering algorithm. We introduce a new GPU cache mechanism providing a very efficient paging of data in video memory and implemented as a very efficient data-parallel process. This cache is coupled with a data production pipeline able to dynamically load or produce voxel data directly on the GPU. One key element of our method is to guide data production and caching in video memory directly based on data requests and usage information emitted directly during rendering.

Sounds like something that could be put into hardware with ESRAM and a chunk of transistors.

More here:
http://maverick.inria.fr/Publications/2011/Cra11/
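The "guide data production from requests emitted during rendering" part could be sketched as a simple cache-service loop: rendering appends the IDs of voxel bricks it missed, a producer pass loads or generates them into a fixed pool, and least-recently-used slots get recycled. A rough CPU-side sketch, with all structures invented for illustration rather than taken from Crassin's implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Request-driven brick cache: the renderer appends IDs of bricks it needed
// but missed; this pass loads or produces them into a fixed-size pool
// (standing in for video memory), evicting least-recently-used slots.

using BrickId = uint64_t;

struct BrickSlot {
    BrickId  id;
    uint64_t lastUsedFrame;
};

struct BrickCache {
    std::vector<BrickSlot>              slots;     // fixed-size, pre-allocated pool
    std::unordered_map<BrickId, size_t> resident;  // id -> slot index
};

void produce_brick_into_slot(BrickId /*id*/, size_t /*slot*/)
{
    // Placeholder: in the scheme above this would kick off a DMA/blitter
    // upload from the packed data, or a GPU job generating the brick.
}

void service_requests(BrickCache& cache,
                      const std::vector<BrickId>& requests,  // emitted by rendering
                      uint64_t frame)
{
    for (BrickId id : requests) {
        auto hit = cache.resident.find(id);
        if (hit != cache.resident.end()) {          // already resident: just touch
            cache.slots[hit->second].lastUsedFrame = frame;
            continue;
        }
        // Pick the least-recently-used slot to recycle.
        size_t victim = 0;
        for (size_t i = 1; i < cache.slots.size(); ++i)
            if (cache.slots[i].lastUsedFrame < cache.slots[victim].lastUsedFrame)
                victim = i;
        cache.resident.erase(cache.slots[victim].id);
        produce_brick_into_slot(id, victim);
        cache.slots[victim] = { id, frame };
        cache.resident[id] = victim;
    }
}
```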
 