AMD: Volcanic Islands R1100/1200 (8***/9*** series) Speculation/Rumour Thread

Discussion in 'Architecture and Products' started by Nemo, May 7, 2013.

  1. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,296
    Likes Received:
    395
    Location:
    Australia
    I am hoping for a big memory OC, seeing that some R* GPUs have 6.5 Gbps memory.
     
  2. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    788
    Likes Received:
    74
    Location:
    'Zona
    One of the R9 290s listed on Newegg had a 6GHz memory speed in the specs...
    It was the Gigabyte.
     
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So is the pixel-to-shader mapping static for AMD? With the rasterizer feeding fragments to statically assigned shaders?
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    The L2 was scaled by 33%, matching the 33% increase in compute.

    We are not going to see more cache until compute workloads start to get a bit more complex. And even then, I am sceptical. LRB or even KC is too far from Kepler/GCN. The latter are still too optimized for massively parallel workloads.
     
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Absolutely. L2 and compute scaled by a middling 30%. Most of the work here seems to have gone into the frontend, geometry and ROPs. Frontend, probably because it is cheap and, after the consoles, they had the IP lying around. Geometry will help at 4K, and the ROPs seem made for 4K.
     
  6. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    So you think Nvidia doesn't tie its ROPs to specific pixels?

    What makes you think AMD requires more on chip space than Nvidia?

    I don't understand your second paragraph. Which wavefronts are you saying should be post-tessellated vertices? I assume you're referring to DS waves as that's what you're describing, but I'm not sure of the link between your last sentence and the rest of the paragraph.
     
  7. Shtal

    Veteran

    Joined:
    Jun 3, 2005
    Messages:
    1,344
    Likes Received:
    3
  8. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    How could it be the 11th when you posted your link on the 13th? :)
     
  9. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,802
    Likes Received:
    473
    Location:
    Torquay, UK
    Quantum mechanics at work? :)

    Two more days to go for those of us living in the normal world.
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    So that would be fine.
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    No, simple round-robin, as long as the compute unit has space. I'm not sure if that's what you're asking though. The mapping from pixel to wavefront index (work item ID within a wavefront) is static, because hierarchical-Z maps its tile hierarchy to render target pixels statically.
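
    A minimal sketch of what such a static mapping could look like, purely for illustration: assume an 8x8 screen tile whose 64 pixels map one-to-one onto the 64 lanes of a wavefront, with each 2x2 quad packed into adjacent lanes. The tile size and swizzle order here are assumptions, not a documented AMD layout.

        #include <cstdint>

        // Hypothetical static mapping from a pixel inside an 8x8 screen tile to a
        // lane (work-item ID) of a 64-wide wavefront. 2x2 quads occupy consecutive
        // lanes so derivative (ddx/ddy) neighbours stay adjacent.
        uint32_t PixelToLane(uint32_t x, uint32_t y)
        {
            uint32_t tx = x & 7;                            // position within the 8x8 tile
            uint32_t ty = y & 7;
            uint32_t quadIndex = (ty >> 1) * 4 + (tx >> 1); // which 2x2 quad (0..15)
            uint32_t inQuad    = (ty & 1) * 2 + (tx & 1);   // position within the quad (0..3)
            return quadIndex * 4 + inQuad;                  // lane 0..63, fixed per pixel
        }

    Because the lane is a pure function of the pixel coordinates, both the hierarchical-Z tile a pixel falls in and the lane that would shade it are fixed before rasterisation ever reaches it.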

    Yes, console compute is going to take a while to kick in. Consoles are "stuck at 1920x1080", so there'll probably be a relatively rapid climb in interest in graphics-compute, but games have development cycles of 1-5+ years... On the other hand, there will be compute on the console GPUs which may be left as CPU compute when those games are ported to PC.

    When you say frontend, are you referring to the increase in ACEs? That should be compute friendly (e.g. in "guaranteeing" response times for certain compute tasks), but again that's going to take a while.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Titan has 5 rasterisers and 48 ROPs.

    I've realised I've been sloppy and should have been referring to fragments. Fragments in AMD are locked to ROPs. I don't see anything like that in NVidia.

    I suppose it's possible NVidia has a fixed tiling of ROPs to render target pixels, but the hierarchy of rasterisation, render back end, L1, memory crossbar, L2 and memory channels doesn't seem to require that.

    NVidia's implementation of hierarchical-Z could be a factor here, fixing certain things. So, maybe I'm missing something there.
     
  13. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Rasterizers should be locked to specific pixels (a set of render target tiles), and ROPs are also locked to specific pixels (another set of render target tiles). ROPs are tied to memory channels in nV's architecture, so a ROP can only access a subset of the render target anyway; even on Tahiti the crossbar between ROPs and memory controllers is not complete, so the same should hold there to some extent. But nV doesn't have to use the same sets for front-end and back-end pixel processing (resulting in some interleaving scheme that distributes the load). ;)
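
    A rough sketch of the kind of fixed tiling this implies, assuming (hypothetically) 8x8 pixel tiles, eight memory channels with one ROP partition each, and a simple XOR swizzle. None of these numbers are the actual AMD or NVIDIA mapping; the point is only that the owning ROP falls out of the tile coordinates.

        #include <cstdint>

        // Illustrative only: if each ROP partition is tied to one memory channel and
        // render-target tiles interleave across channels, then the ROP that owns a
        // pixel is a fixed function of its tile coordinates. Tile size, channel
        // count and the XOR swizzle are assumptions.
        constexpr uint32_t kTileSize    = 8;
        constexpr uint32_t kNumChannels = 8;   // one ROP partition per channel (assumed)

        uint32_t TileToChannel(uint32_t px, uint32_t py, uint32_t pitchInTiles)
        {
            uint32_t tileX = px / kTileSize;
            uint32_t tileY = py / kTileSize;
            uint32_t linearTile = tileY * pitchInTiles + tileX;
            // Swizzle with the tile row so vertically adjacent tiles also spread
            // across different channels.
            return (linearTile ^ (tileY & (kNumChannels - 1))) % kNumChannels;
        }

    The front end is free to carve the screen into a different (or differently interleaved) set of tiles for rasterisation, which is the decoupling described above.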
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Well that's a given. You can't avoid that.

    By FF you mean fixed function? Actually, I'd be even happier with that, but I don't think AMD did FF. I think AMD made a few small tweaks to the geometry shader so that tessellation could be done with shader code.

    The ideal solution, AFAICS, is to have a FIFO with patch tessellation factors (one wavefront is plenty) and params, and then some FF-tessellator logic (which should be very tiny, given the limited data paths needed) generates barycentric coords at a rate of 1-4 per clock. Those coords stuff a new wavefront and the domain shader proceeds from there.

    Alternatively, and what I suggested in my previous post, there could be some FF logic that just generates simple indices for each vertex to be created by the tessellator, stuffs them into wavefronts, and a shader generates the barycentric coords (possibly even at the front of the domain shader itself).

    But when we see bad scaling with tessellation factor
    [graph: tessellation performance scaling with tessellation factor]
    and talk about off chip buffer support for high tessellation levels, it's clear to me that AMD is doing something wrong.

    There is no need for more than a token amount of buffering space. I think they are generating all triangles from a set of control points, and then doing the domain shader after they are generated. That's the only explanation I can think of for needing off-chip storage.

    See the graph above.

    Okay, I probably didn't describe it well.

    Suppose you have a hull shader wavefront with 64 tripatches (I don't know if this figure is correct, but let's assume so). I think AMD processes tessellation in parallel, so if they all had a factor of 4 (24 tris per patch, 19 verts), we'd get 1536 tris. For higher factors this number gets out of control, so AMD dumps the generated tessellation (uv pairs to become verts via the DS) to RAM.

    What I think they should be doing is just buffering the tessellation factors for the 64 tripatches, and then have a FF unit go through the patches one by one, i.e. patch #1 has 19 verts, patch #2 has 19 verts, etc., and then put together a domain shader wavefront of 64 verts (19 from each of patches #1-3, 7 from patch #4). This wavefront will either have barycentric coords calculated by a FF unit or simply indices so that the shader can calculate the coords. It will also generate an index buffer (alternatively, you could use more verts per patch and do implicit tristrips). Now you don't need to store 1536 tris.

    I believe NVidia is doing something of this sort. Their polygon throughput is constant with tessellation factor.
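
    A hedged sketch of the bookkeeping described above, in C++ for concreteness. The 64-wide wave and the vertex-count helper are assumptions for illustration; the real tessellator's counting rules (odd and fractional factors, transition edges) are ignored.

        #include <cstdint>
        #include <vector>

        // Keep only the per-patch tessellation factors, walk the patches in order,
        // and pack the vertices each patch will generate into 64-wide domain-shader
        // waves as (patch, vertex) pairs, instead of expanding and storing every
        // generated triangle up front.
        struct DsWorkItem { uint32_t patch; uint32_t vertexInPatch; };

        // Vertex count for a tri patch at a uniform *even* integer factor n, using
        // the concentric-ring pattern: 3n points on the outer ring, 3(n-2) on the
        // next, ..., plus a centre point. Factor 4 gives 12 + 6 + 1 = 19, as in the
        // example above. Odd and fractional factors are not handled.
        uint32_t VertexCountForFactor(int n)
        {
            uint32_t count = 0;
            for (int ring = n; ring > 0; ring -= 2)
                count += 3 * static_cast<uint32_t>(ring);
            return count + 1;
        }

        std::vector<std::vector<DsWorkItem>>
        PackDomainShaderWaves(const std::vector<int>& patchFactors)
        {
            std::vector<std::vector<DsWorkItem>> waves;
            std::vector<DsWorkItem> current;
            current.reserve(64);

            for (uint32_t p = 0; p < patchFactors.size(); ++p) {
                const uint32_t verts = VertexCountForFactor(patchFactors[p]);
                for (uint32_t v = 0; v < verts; ++v) {
                    // Each lane later derives its own barycentric coords (and the
                    // implicit connectivity) from its (patch, vertexInPatch) pair.
                    current.push_back({p, v});
                    if (current.size() == 64) {
                        waves.push_back(current);
                        current.clear();
                    }
                }
            }
            if (!current.empty())
                waves.push_back(current);   // final, partially filled wave
            return waves;
        }

    With 64 tripatches all at factor 4 this emits 64 x 19 = 1216 domain-shader invocations packed into 19 full waves, rather than materialising 1536 triangles first.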
     
    #1234 Mintmaster, Oct 13, 2013
    Last edited by a moderator: Oct 13, 2013
  15. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    Whatever picture you tried to post didn't work; you instead got a "no deeplinking please!" placeholder.
     
  16. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Thanks. I mirrored it.
     
  17. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    If the pixel to wavefront ID is mapped statically, then how do they fill the wavefronts fully? A triangle is quite likely not to fill a wavefront fully, and multiple triangles will generate a lot of superfluous fragments along the edges due to quad shading. Those superfluous fragments will have to fill the next wavefront, which won't have fragments from the bulk of the triangle to fill it up.
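
    As a back-of-the-envelope way to see that cost, here is a sketch assuming 2x2 quad granularity; the packing policy itself is exactly the unknown part.

        #include <cstdint>
        #include <unordered_set>
        #include <vector>

        // Lanes are spent per 2x2 quad touched, not per covered pixel, so edge and
        // sliver coverage drags helper lanes along with it. The pixel encoding here
        // is just for the example.
        struct Pixel { uint32_t x, y; };

        uint32_t LanesConsumedByQuads(const std::vector<Pixel>& coveredPixels)
        {
            std::unordered_set<uint64_t> quads;
            for (const Pixel& p : coveredPixels) {
                const uint64_t qx = p.x >> 1;   // which 2x2 quad the pixel lands in
                const uint64_t qy = p.y >> 1;
                quads.insert((qy << 32) | qx);
            }
            // Every touched quad occupies four lanes, whether the triangle covers
            // one of its pixels or all four.
            return static_cast<uint32_t>(quads.size()) * 4;
        }

    For example, a one-pixel-tall, 16-pixel-wide sliver touches 8 quads and so costs 32 lanes for 16 useful fragments; whether and how the hardware co-packs fragments from different triangles to recover that is the open question.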

    Well, tbh, with the single-threaded graphics dispatch, they aren't going to do any good. Maybe that's where Mantle will help: true parallel submission.
     
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    If the ROPs sit on the other side of the shader-memory crossbar, then they have to be statically tiled, at least when the ROP count is a multiple of the memory channel count. So I would think there is fixed tiling on NV but not on AMD, which has its ROPs on the shader side.
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I never got a full answer on this topic, but I think up to 4 triangles each of up to 16 fragments can share a wavefront, or combinations thereof (on the basis that the rasteriser has a granularity of 16 fragments, and they are derived from a single triangle, per clock).

    The edges are a question I don't know how to answer.

    Arguably the simplest solution is to say that my 1:1 mapping from earlier is wrong.
     
  20. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    Each pixel wave can be made up of as many as 16 triangles. The smallest granularity is a quad.
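
    Assuming the usual 64-lane wavefront, that follows from the quad granularity: 64 lanes / 4 lanes per quad = 16 quads, so at most 16 distinct triangles if each one contributes only a single quad.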
     