What custom hardware features could benefit a console?

Discussion in 'Console Technology' started by Shifty Geezer, Jan 12, 2013.

  1. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    So when you have a sprite in RAM and want it in the framebuffer (in a different part of RAM), you...do...what exactly, if not copy the data across?

    TVs don't use scanlines any more, so how is the Copper relevant? I didn't bother including the Copper in my list of possible valuable features because it doesn't fit anywhere in the 3D pipeline, or even the 2D pipeline, now. We can easily generate rainbow backgrounds without loading a new value every scanline (scanlines don't exist any more).
     
  2. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    Hey! Do you realize the Amiga was the best custom hardware in history? And it also had the coolest names for its chips! Where are those engineers?
     
  3. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    948
    Likes Received:
    417
    I think it's very difficult to design a perfectly balanced chip. It would help, and you could make it faster, but there remains the question of whether the effort is worth it. If you take the DirectX pipeline as a rough model, this would basically cut the chip in two, between the traditional shader pipeline and compute shaders. I believe not many developers would be happy with a more restricted compute-only chip alongside a general-purpose chip without compute. That both functionalities sit so close to each other is what makes the combination attractive: both share the same resources, ALUs, caches, etc.
    I don't think the paradigm of using one chip for everything is really a problem.
    I implemented a software sphere-tracer a while ago, and its execution time was completely dominated by the z-buffer clear (30%). If I were in Microsoft's shoes I wouldn't look at general, broad performance issues; those go away with new generations, or with brute-force designs that scale everything up until it's enough. I'd look instead for hot-spots like the z-buffer clear, or things that don't scale well. Not beauty things like AA bandwidth or slow anisotropic filtering; you can live without those. But not without re-normalization of filtered normals, for example; that's a requirement. I don't have statistics on hot-spots across all game engines to make a good guess what they may be, but AMD surely has the data, and Microsoft could pick the hot-spots it doesn't want and have AMD put in special hardware to solve them.
    I also expect MS not to go with anything which strays from the DirectX schemes. A ray-tracing chip would surprise me a lot. A chip related to shadow-mapping not so much.

    You'd replace a pointer if you could.

    The Copper was an autonomous co-processor able to program another co-processor on the fly. On the Xbox 360 you can prepare execution "lists" and then let the GPU go over them, but you had to use the CPU for that, and the lists were in GPU code. A Copper equivalent would have its own dialect, dedicated to producing those lists itself and to initiating their dispatch ... with a completely dead CPU.
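    To make the idea concrete, a Copper-style command list can be modeled as a tiny interpreter that pokes registers and waits on events. A minimal sketch in C; the instruction names, struct layout, and event model are all invented for illustration (real Amiga Copper encodings differ):

```c
/* Hypothetical model of a Copper-style co-processor: it walks a command
   list on its own, poking registers and waiting on external events,
   with no CPU involvement. Instruction names, struct layout and the
   event model are invented; real Copper encodings differ. */
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_MOVE, OP_WAIT, OP_END } CopOp;

typedef struct {
    CopOp    op;
    uint32_t arg0;  /* OP_MOVE: register index; OP_WAIT: event threshold */
    uint32_t arg1;  /* OP_MOVE: value to write; OP_WAIT: unused          */
} CopInstr;

/* Walk the list against a register file. `event` models the external
   signal (e.g. a beam position) a WAIT blocks on; if the event hasn't
   arrived yet, this sketch simply returns and is re-run on the next
   event. */
void copper_run(const CopInstr *list, uint32_t *regs, size_t nregs,
                uint32_t event)
{
    for (const CopInstr *ip = list; ip->op != OP_END; ++ip) {
        if (ip->op == OP_MOVE) {
            if (ip->arg0 < nregs)
                regs[ip->arg0] = ip->arg1;  /* autonomous register poke */
        } else if (ip->op == OP_WAIT) {
            if (event < ip->arg0)           /* event not reached yet */
                return;
        }
    }
}
```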
     
    #83 Ethatron, Jan 17, 2013
    Last edited by a moderator: Jan 17, 2013
  4. ebola

    Newcomer

    Joined:
    Dec 13, 2006
    Messages:
    99
    Likes Received:
    0
    (as Ethatron says)
    The Copper was a *command list* processor. It just happened to be reset every video frame, hence the scanline effects. The Copper could be used to drive the blitter asynchronously.
    Of course, GPUs all have this already.

    Heh. A straight clear is something the Blitter could do :)
    Maybe they just mean they have the ability to transfer buffers around asynchronously, with better efficiency for clears, resolves, etc.


    Didn't the Nintendo 3DS have a custom GPU which was designed by looking at today's popular shaders and implementing them in hardware? I'd imagine an AA post-process unit doing just that: specific acceleration for MLAA, to encourage this bandwidth-saving technique.
    I know Microsoft originally wanted 4xMSAA to be mandatory ("this is how it's wired up, this is how to use the EDRAM to best effect...") so they were not averse to wiring up their hardware to suit a specific use case.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    The framebuffer's a 2D bitmap, not a list of pointers to objects. The only way to get a bitmap in there is to write the image data, and if that image data is based on a preloaded graphic, that constitutes a copy.
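    A minimal sketch of what that copy looks like in practice, as a hypothetical colour-keyed software blit (all names and the transparency scheme are illustrative):

```c
/* Minimal sketch of "getting a sprite into the framebuffer": the image
   data is written pixel by pixel, which is exactly a copy. The
   colour-key transparency and all names are illustrative. */
#include <stdint.h>

void blit_sprite(uint32_t *fb, int fb_w,
                 const uint32_t *sprite, int sp_w, int sp_h,
                 int dst_x, int dst_y, uint32_t colour_key)
{
    for (int y = 0; y < sp_h; ++y) {
        for (int x = 0; x < sp_w; ++x) {
            uint32_t px = sprite[y * sp_w + x];
            if (px != colour_key)                          /* skip holes */
                fb[(dst_y + y) * fb_w + (dst_x + x)] = px; /* the copy   */
        }
    }
}
```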

    Yeah, it was, but I still don't see what it brings to the table of a new console. It was mostly used for syncing with the scanline. I don't see the value in poking GPU instructions outside of the CPU, especially when GPU instructions consist of long shader programs and cached data, and sudden changes aren't good for them. We need preemptive GPU architectures for that sort of thing.
     
  6. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    What I liked about the Copper was that there wasn't any intermediate buffer; it would be like a GPU that calculates and renders each line on the fly and sends it immediately to the HDMI output without any frame buffer. The frame lag was basically 0 ms, which is impossible even today. You can feel it when comparing an Amiga emulator to the real thing on a CRT. There's something weirdly snappy about it which cannot be reproduced even on the latest PC, a million times faster.

    I don't know if they could implement at least the post-processing effects in such an "in-line" way, having GPU cores that can do the 2D effects at "wire-speed" while sending the data through HDMI. We'd save one frame of lag.
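    As a thought experiment, the "in-line" idea can be sketched as rendering one scanline at a time and handing it to the output stage immediately, so only a one-line buffer ever exists. Everything below is a hypothetical model, not real HDMI plumbing: the out[] array merely stands in for the output stage consuming lines, and the per-line "rainbow" value stands in for real rendering.

```c
/* Thought-experiment sketch of "racing the beam": each scanline is
   generated just before the output needs it, so only a one-line buffer
   exists, never a full frame. `out` stands in for the video output
   stage consuming lines as they are produced. */
#include <stdint.h>

void race_the_beam(uint32_t *out, int width, int height)
{
    uint32_t line_buf[1920];                 /* one scanline, not a frame */
    if (width > 1920)
        return;
    for (int line = 0; line < height; ++line) {
        for (int x = 0; x < width; ++x)
            line_buf[x] = (uint32_t)line;    /* new value per scanline    */
        for (int x = 0; x < width; ++x)
            out[line * width + x] = line_buf[x]; /* emitted immediately   */
    }
}
```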
     
  7. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    I don't know enough about how HDMI works to know if partial data could be streamed, but most (all?) games are using full-screen effects anyway (blur/bloom) that need a complete framebuffer to work on. There's no obvious purpose to direct video injection into the video out.
     
  8. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    Me too, nice videos btw, thanks for sharing.

    I was pretty disappointed about the fact that the PS3 and the X360 didn't have something like a DSP. Prior to them ALL the consoles had a dedicated chip just for sound!!

    I always loved the capabilities of specialized chips for audio. As bkilian already pointed out, audio is a serious overhead for a CPU, which could spend those cycles on more useful things.

    All I am saying is that I think it is unnecessary to use the CPU for such important tasks when you can use dedicated hardware, thus alleviating the CPU and decreasing processing times.
     
  9. bkilian

    Veteran

    Joined:
    Apr 22, 2006
    Messages:
    1,539
    Likes Received:
    3
    Especially since CPUs aren't really getting any faster lately, just more efficient. Good audio requires heavy use of FP math, so it can basically tie up the entire SSE unit, especially if you're doing anything interesting. One car-game maker wanted a hundred voices per vehicle, plus DSP effects, compression, 3D positioning, etc. It would basically have taken an entire 360 to pull off their wish list for audio alone.
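    A back-of-envelope sketch of why that adds up, assuming a plain per-sample mix loop (all names and numbers are illustrative): 100 voices in stereo at 48 kHz is already around 10 million multiply-adds per second per vehicle, before any effects.

```c
/* Back-of-envelope sketch of why audio mixing is FP-heavy: every voice
   costs one multiply-add per sample per channel before any DSP
   effects. 100 voices, stereo, 48 kHz is ~10M mul-adds per second per
   vehicle. All names are illustrative. */
void mix_voices(float *out_l, float *out_r, int nsamples,
                const float *const *voices,
                const float *gain_l, const float *gain_r, int nvoices)
{
    for (int s = 0; s < nsamples; ++s) {
        float l = 0.0f, r = 0.0f;
        for (int v = 0; v < nvoices; ++v) {
            l += voices[v][s] * gain_l[v];   /* one mul-add per voice  */
            r += voices[v][s] * gain_r[v];   /* per channel per sample */
        }
        out_l[s] = l;
        out_r[s] = r;
    }
}
```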
     
  10. rekator

    Regular

    Joined:
    Dec 21, 2006
    Messages:
    793
    Likes Received:
    30
    Location:
    France
    And audio needs very low latency and tight synchronization (from my old memories), so it's not really easy with a multi-threaded engine, I'm thinking? So a specific chip for audio solves those problems.
     
  11. Ethatron

    Regular Subscriber

    Well, you have half of it. The Copper could wait on external events to initiate something or to progress in its program. That's directly equivalent to the mwait/monitor mentioned earlier.
    The problem is that GPUs can (apparently) not dispatch code to themselves. That's why we have the strange constructs of compute-shader chains passing results of earlier stages to later stages via "passthrough" compute shaders: not only because a shader can't be started automatically on an event, but also because the communication channel isn't changeable by the GPU itself (the GPU can't rewrite a shader to pass a variable in a buffer instead of a constant, by itself).
    I say apparently because I've not yet seen or read anything which indicates a GPU can control itself, (re)write programs for itself, etc. Even though I could write a shader which writes out GPU ISA into a texture, I can't feed it as a program to the GPU from within that same shader.

    A "Copper" could rewrite camera matrices (manipulating a shader's constant buffer) based on listening to a USB port, without the CPU. I suspect a simple "Copper" would already be so capable that it could probably compile HLSL assembler to GPU ISA, or re-optimize GPU ISA when an external variable becomes a constant. It's all relative: today 100k transistors isn't very much, and it's cheap.
    It doesn't really need caches or a complex memory controller, as we are talking about a working set of maybe 500 kB. It only needs the appropriate connectivity to I/O, to the GPU and the event producers, and maybe to the L2/L3 cache of the CPU.
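    A toy model of that constant-buffer rewriting, with the event type and the matrix layout invented for illustration (column-major 4x4, translation in entries 12/13):

```c
/* Toy model of the example above: a "Copper" listens to an input
   stream and patches a shader constant buffer in place, no CPU
   involved. The event type and the matrix layout are invented for
   illustration. */
typedef struct { float dx, dy; } PadSample;  /* stand-in for a USB event */

void copper_patch_camera(float *cbuf16, const PadSample *events, int n)
{
    for (int i = 0; i < n; ++i) {
        cbuf16[12] += events[i].dx;  /* accumulate input deltas straight */
        cbuf16[13] += events[i].dy;  /* into the live constant buffer    */
    }
}
```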

    Well, no. The solution to this was (it has been solved long ago in hardware) not to clear at all. The z-buffer is often hierarchical, or at least has a minimum tile resolution, and each tile is represented by a bit in the GPU-internal z-buffer map (that map also holds the compressed z-buffer information). That bit indicates whether a memory region is cleared or not. The memory isn't even touched. :)
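    The trick can be sketched like this, with tile size and data layout invented for illustration (real GPUs keep this state in dedicated on-chip metadata):

```c
/* Sketch of the "don't actually clear" trick: one flag per depth tile
   marks it as cleared, and reads consult the flag before touching
   memory, so a clear costs O(tiles), not O(pixels). Tile size and
   layout are invented. */
#include <stdint.h>
#include <string.h>

#define TILE_PIX 64                 /* depth values covered by one flag */

typedef struct {
    float   *z;        /* backing memory; may hold stale data     */
    uint8_t *cleared;  /* one flag per tile: 1 = "reads as clear" */
    int      ntiles;
    float    clear_z;
} ZBuffer;

void z_fast_clear(ZBuffer *zb, float clear_z)
{
    zb->clear_z = clear_z;
    memset(zb->cleared, 1, (size_t)zb->ntiles);  /* flags only, no pixels */
}

float z_read(const ZBuffer *zb, int idx)
{
    int tile = idx / TILE_PIX;
    /* a real write to the tile would first materialize clear_z and
       drop the flag; that path is elided here */
    return zb->cleared[tile] ? zb->clear_z : zb->z[idx];
}
```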

    You agree that the fastest copy is not to copy, right? We also agree that in composing the Windows display surface we never actually copy (as in duplicate) anything, no fonts, no rectangles, no fills, etc., but that a source pixel (as a pointer) or an abstract description of a display element (most elements are procedural now) enters a transforming function and is then written, slightly changed, to the display surface. Correct?
    That's the thing I wanted to remind you of. A data-duplicating blitter is IMHO really useless; we have no UIs any more which consist only of identical repeated elements. A non-programmable blitter is also useless, because display composition is so complex now that you cannot gain anything by accelerating just a tiny fraction of the composition methods in use.
    A special data-transforming programmable "blitter" is unnecessary if a GPU is present.
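    The "transform, don't duplicate" point can be illustrated with a single-pixel blend transform (grayscale 8-bit and integer math chosen for brevity; real compositors are far more general):

```c
/* Single-pixel illustration of "transform, don't duplicate": the output
   is a function of source and destination, so nothing is byte-for-byte
   copied. Grayscale 8-bit alpha blend in integer form with rounding;
   purely illustrative. */
#include <stdint.h>

uint8_t compose_px(uint8_t dst, uint8_t src, uint8_t alpha)
{
    /* dst' = src*a + dst*(1 - a), scaled to 0..255 */
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}
```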
     
    #91 Ethatron, Jan 18, 2013
    Last edited by a moderator: Jan 18, 2013
  12. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    I don't know. That's why I was asking. It is still needed to copy 2D bitmaps from RAM to framebuffer in 2D games, but that's a job I've already discounted as being eminently doable on CPU and GPU. On a split RAM pool too, you'll need to copy data from one to t'other. AFAIK there's no other general moving of memory around, but a couple of the devs here have suggested otherwise. Without an understanding of the low-level functions within a game engine, I'm pretty clueless on this one! :D
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.