What custom hardware features could benefit a console?

Discussion in 'Console Technology' started by Shifty Geezer, Jan 12, 2013.

  1. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,382
    Likes Received:
    15,836
    Location:
    Under my bridge
    Maybe. Could a next-gen blitter be the device of choice here? Copy this block of RAM (backbuffer, icon, soundwave) to this bit of RAM with such-and-such interpolation. A good video upscaler needs to be fairly complex, though, taking multiple samples around each pixel, whereas I'd expect a blitter-type device to just use some linear interpolation.
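    To illustrate the difference, here's a rough, purely hypothetical sketch in plain C (8-bit greyscale, no real console API implied) of the kind of simple bilinear blit such a device might offer; a proper upscaler would take many more samples per output pixel:

    #include <stdint.h>

    /* Illustrative only: scale an 8-bit greyscale surface into a destination
     * buffer with bilinear filtering - the "linear interpolation" a simple
     * blitter might offer, versus the many-tap kernels of a real upscaler. */
    static void blit_scaled_bilinear(const uint8_t *src, int sw, int sh,
                                     uint8_t *dst, int dw, int dh)
    {
        for (int y = 0; y < dh; ++y) {
            for (int x = 0; x < dw; ++x) {
                /* Map the destination pixel back into source coordinates. */
                float fx = (float)x * sw / dw;
                float fy = (float)y * sh / dh;
                int x0 = (int)fx, y0 = (int)fy;
                int x1 = x0 + 1 < sw ? x0 + 1 : x0;
                int y1 = y0 + 1 < sh ? y0 + 1 : y0;
                float tx = fx - x0, ty = fy - y0;

                /* Blend the four neighbouring source texels. */
                float top = src[y0 * sw + x0] * (1 - tx) + src[y0 * sw + x1] * tx;
                float bot = src[y1 * sw + x0] * (1 - tx) + src[y1 * sw + x1] * tx;
                dst[y * dw + x] = (uint8_t)(top * (1 - ty) + bot * ty + 0.5f);
            }
        }
    }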
     
  2. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,419
    Likes Received:
    5,823
    With magic added so it doesn't look like I'm reinventing the wheel :???:

    I guess modern processors are already good at it, but I was thinking that there has to be a way to help compute a truckload of logical inferences. A way to reduce the amount of opcode relative to data for decision models. Maybe some very wide instructions specifically for AI; I don't know if that would be possible.
     
  3. almighty

    Banned

    Joined:
    Dec 17, 2006
    Messages:
    2,469
    Likes Received:
    5
    A small GPU for GPGPU physics?
     
  4. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,822
    Likes Received:
    3,004
    You mean a dedicated GPU alongside the graphics GPU?
     
  5. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    Just make the GPU bigger so devs can do what they want.
     
  6. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    The only exception to this is if you could get better GPGPU performance by adjusting the GPU design to optimize for it.
     
  7. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Blitter + basic functions (multi-tap fixed-point interpolator, packer/decoder, etc.) + decompression engine could be pretty useful for asset streaming. Make it easy and you've got a nice helper for streaming large worlds:
    L1 = HW-friendly format
    L2 = Tightly packed off-disk format, held in RAM
    L3 = Disk
    Disk controller + MMU HW page from L3 to L2 on demand, the blitter unpacks L2 to L1, and the HW uses L1.
    Could be done without much (any?) CPU interference in theory, I guess...
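    As a rough sketch of that flow (every function and type below is invented purely for illustration, not a real disk controller or blitter interface):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for the hardware blocks described above. */
    extern void disk_dma_read(uint64_t disk_offset, void *dst, size_t bytes);   /* L3 -> L2 */
    extern void blitter_unpack(const void *packed, void *native, size_t bytes); /* L2 -> L1 */

    typedef struct {
        uint64_t disk_offset;  /* where the packed asset lives on disk (L3)      */
        size_t   packed_bytes; /* size of the tightly packed on-disk format      */
        void    *packed_buf;   /* staging buffer holding the packed data (L2)    */
        void    *native_buf;   /* HW-friendly buffer the GPU/CPU reads from (L1) */
    } asset_request_t;

    /* On a page-miss style request, pull the asset through the three tiers.
     * In theory the CPU only kicks this off; the DMA engine and blitter do
     * the actual work. */
    void stream_asset(asset_request_t *req)
    {
        disk_dma_read(req->disk_offset, req->packed_buf, req->packed_bytes);  /* L3 -> L2 */
        blitter_unpack(req->packed_buf, req->native_buf, req->packed_bytes);  /* L2 -> L1 */
    }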
     
  8. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,575
    Likes Received:
    7,120
    Location:
    ಠ_ಠ
    So... anyone (someone) have thoughts on a ringbus, particularly for CPU/GPU (like Sandy Bridge etc.)? :p Wondering about cache coherency in such a setup w.r.t. the GPU. Make the eDRAM big enough to mitigate the number of fetches/accesses on the external bus whilst not worrying about framebuffer tiling anymore? Might have gotten some ideas crossed there...

    >_>
    <_<
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,361
    Likes Received:
    3,940
    Location:
    Well within 3d
    One thing to note is that AMD is not a heavy user of high-performance ring bus interconnects, although this may not be wholly up to AMD.

    Another thing is that one of the big arguments for a ring bus made by Intel is that it can cost-effectively scale up and down between various SKUs and architecture variants, whereas a crossbar that AMD tends to favor (or settle on) requires redesign and revalidation for every adjustment.
    However, why would a console design care about that kind of scalability? The decision did mean Intel left some performance on the table, but the resources it saved could go into improving other parts of the system instead.

    For Cell, IBM indicated that if they had more time or an opportunity to improve things, a different interconnect would have been used.
     
  10. Proelite

    Veteran Regular Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,446
    Likes Received:
    786
    Location:
    Redmond
    A Blitter sounds really, really interesting. Perhaps it'll serve well for a system with such a massive amount of relatively slow memory.
     
  11. nightshade

    nightshade Interwebz Hijacker !
    Veteran

    Joined:
    Mar 26, 2009
    Messages:
    3,391
    Likes Received:
    92
    Location:
    Liverpool
    But of course the PS4 needs to be built from secret alien tech that only Sony has. :p
     
  12. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    868
    Likes Received:
    275
    Is this for real? You want Agnus and Paula back?
     
  13. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,419
    Likes Received:
    5,823
    It would have to be called Super Ultra Fat Agnus :smile:

    Would it be better instead to have a very powerful I/O processor? An ASIC for de/compression and de/encryption that could generate mipmaps on the fly when reading textures. It could manage data from Blu-ray, HDD, and network; it could also manage a flash cache, wear levelling for flash storage, and remapping of data to external storage.
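    Roughly, the game side could drive such a chip with a little command packet; every name and field in this sketch is made up purely to illustrate the idea:

    #include <stdint.h>

    /* Hypothetical command packet for such an I/O processor. */
    enum io_source { IO_SRC_BLURAY, IO_SRC_HDD, IO_SRC_FLASH_CACHE, IO_SRC_NETWORK };

    enum io_flags {
        IO_DECOMPRESS     = 1u << 0,  /* undo the on-disc compression           */
        IO_DECRYPT        = 1u << 1,  /* content protection handled on the chip */
        IO_GEN_MIPMAPS    = 1u << 2,  /* build the mip chain while streaming in */
        IO_CACHE_TO_FLASH = 1u << 3   /* keep a copy in the flash cache         */
    };

    typedef struct {
        enum io_source source;
        uint64_t       offset;      /* where the data lives on the source device */
        uint64_t       length;      /* stored (compressed) length in bytes       */
        void          *destination; /* final resting place in main RAM           */
        uint32_t       flags;       /* combination of io_flags                   */
    } io_command_t;

    /* The game would just queue commands; wear levelling and remapping to
     * external storage would stay entirely inside the I/O processor's firmware. */
    extern void io_submit(const io_command_t *cmd);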
     
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,382
    Likes Received:
    15,836
    Location:
    Under my bridge
    Morbidly Obese Agnus.
     
  15. anexanhume

    Veteran Regular

    Joined:
    Dec 5, 2011
    Messages:
    1,926
    Likes Received:
    1,262
    This is what occurred to me. Carmack and id have been talking about SVO techniques and possible inclusion in id Tech 6. Could be interesting to have dedicated hardware for that going forward. It would also be notable if Bethesda moved to id's engines as well.
     
  16. MarkoIt

    Regular

    Joined:
    Mar 1, 2007
    Messages:
    392
    Likes Received:
    0

    Look what I have found. It's an RPU, aka a Raytracing Processing Unit.
    http://graphics.ethz.ch/teaching/former/seminar/handouts/Fierz_RPU.pdf
    I'm speculating that Microsoft could have taken that design, improved it, made it more flexible, standardized it for the next generation of the Direct3D API, and integrated it into Durango.
    A GPU+RPU for a future hybrid rendering pipeline.
     
  17. anexanhume

    Veteran Regular

    Joined:
    Dec 5, 2011
    Messages:
    1,926
    Likes Received:
    1,262
    Too many 'and-ifs' for me to buy that that's what is happening. Is the industry ready for raytracing built into DirectX? Is the technique understood well enough for a hardware implementation to still be relevant in 5 years? What if developers don't use it? How good is it at generic compute? Can it handle normal graphics workloads? Does the die area remain justified given the risk of it going unused?
     
  18. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,579
    Likes Received:
    197
    That is the future of graphics! It is similar to what I was speculating. A specialized piece of hardware to render the lighting would make 1.2 TFLOPs for the other effects enough for the system to behave like a GTX 680 or better. I wish it were the real thing in Durango.
     
  19. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,382
    Likes Received:
    15,836
    Location:
    Under my bridge
    A raytracing chip wouldn't just be for graphics. However, there's a whole discussion here that's bigger than the intention of this thread. I point you here for further research and discourse.
     
  20. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,579
    Likes Received:
    197
    Unreal Engine 4 uses an SVOGI algorithm (Sparse Voxel Octree Global Illumination). The technique was published in a thesis by Cyril Crassin, who is currently at Nvidia, I think. In his thesis presentation about the technique you can read:

    Our solution is based on an adaptive hierarchical data representation depending on the current view and occlusion information, coupled to an efficient ray-casting rendering algorithm. We introduce a new GPU cache mechanism providing a very efficient paging of data in video memory and implemented as a very efficient data-parallel process. This cache is coupled with a data production pipeline able to dynamically load or produce voxel data directly on the GPU. One key element of our method is to guide data production and caching in video memory directly based on data requests and usage information emitted directly during rendering.

    Sounds like something that could be hardwarized with ESRAM and a chunk of transistors.

    More here:
    http://maverick.inria.fr/Publications/2011/Cra11/
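    To make the quoted request-driven caching a little more concrete, a loose host-side sketch (hypothetical helper functions, loosely modelled on the scheme described above, not Crassin's actual code) might look like this:

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_REQUESTS 4096

    /* Hypothetical types/helpers: a "brick" is one block of voxel data and the
     * cache lives in fast memory (video RAM, or ESRAM in the speculation above). */
    typedef struct { uint32_t node_id; } brick_request_t;
    extern bool  cache_contains(uint32_t node_id);
    extern void *cache_evict_lru(void);                 /* free the least recently used slot */
    extern void  cache_insert(uint32_t node_id, void *brick);
    extern void *load_or_generate_brick(uint32_t node_id, void *slot); /* load or produce voxel data */

    /* Each frame the ray-caster writes the node IDs it needed but missed into a
     * request buffer; this pass services those misses so the data is resident
     * for the next frame. */
    void service_brick_requests(const brick_request_t *requests, uint32_t count)
    {
        for (uint32_t i = 0; i < count && i < MAX_REQUESTS; ++i) {
            uint32_t id = requests[i].node_id;
            if (cache_contains(id))
                continue;                       /* already resident, nothing to do */
            void *slot = cache_evict_lru();     /* reuse the stalest cache slot    */
            cache_insert(id, load_or_generate_brick(id, slot));
        }
    }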
     
    #60 Love_In_Rio, Jan 15, 2013
    Last edited by a moderator: Jan 17, 2013