AMD: Volcanic Islands R1100/1200 (8***/9*** series) Speculation/ Rumour Thread

Discussion in 'Architecture and Products' started by Nemo, May 7, 2013.

  1. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    The 280X already does a fine job against the 770. It would be silly for AMD to think that 7xx prices wouldn't adjust to the new competitive reality.
     
  2. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I've only now started to read the R290X reviews, starting out with the one from Anandtech. Ryan Smith writes the following:
    I'm confused: shouldn't an SE of AMD correspond to a GPC of Nvidia? And CU with an SMX?

    Nvidia can cut down the number of SMX'en per GPC, while leaving all GPCs active. So this paragraph doesn't make sense at all, IMHO?

    Am I missing something?
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    An SMX comes with its own polymorph engine, so there are certain classes of logic that get subdivided differently.
    At a vector engine level, I would say the ALU resources of an SMX most resemble a group of CUs sharing the same scalar and instruction cache.
    However, the sharing of the register file, texturing, and shared memory doesn't make it a very strong pattern.

    I'm curious about whether someone will come out and explain if the "new" shader engine hierarchy of Hawaii differs from Tahiti's shader engines, which were discussed in this thread before the big reveal of the 290X.
     
  4. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Got it: I didn't realize that SMX also had this polymorph business going on.

    Is it fair to say that, for AMD and R290X, all geometry handling is split into 4 blocks, while for Nvidia and GK110, there's an additional level of hierarchy in that you have 5 GPCs and 15 polymorph engines?
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I'm thinking there's a rough split between primitive setup and generation and rasterization.

    The rasterizer portion subdivides screen space, and this leaves a GPC and SE similarly delineated.

    The portion that belongs in the geometry and primitive blocks is more numerous for Nvidia, but it doesn't conflict with the GPC arrangement.
    AMD has kept the same count, but the geometry processors have arrows that feed their output to rasterizers in other SEs, just as the polymorph engines need to distribute to other GPCs.

    So it seems to be a similar hierarchy, except that Nvidia has more units at the setup level. The raster portion has fewer blocks in Hawaii, but they seem to be heftier given the raw pixel throughput of the design.
     
  6. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    All the geometry processing done by NV is distributed at multiprocessor level. The primitive setup stage distribution is more coarse, grouping several multiprocessors together (GF100 - 4, GK104 - 2, GK110 - 3).

    AMD is still using the more conventional "monolithic" front-end approach, by replicating the whole block to distribute the workload.
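
    As a rough illustration of the grouping described above, here is a minimal sketch that counts polymorph engines per chip. The per-GPC multiprocessor counts are the ones listed in this post; the GPC counts for GF100 and GK104 are not stated in the thread and are filled in from the public full-chip specs as an assumption. The function and table names are made up for the example.

```python
# Multiprocessors (each with its own polymorph engine) per GPC,
# as listed in the post above.
SMS_PER_GPC = {"GF100": 4, "GK104": 2, "GK110": 3}

# GPCs per full chip. Only GK110's 5 GPCs are mentioned in the thread;
# the other two values are an assumption taken from public specs.
GPCS_PER_CHIP = {"GF100": 4, "GK104": 4, "GK110": 5}

# Hawaii replicates the whole front-end block: 4 geometry blocks,
# as discussed earlier in the thread.
HAWAII_GEOMETRY_BLOCKS = 4


def polymorph_engines(chip):
    """Total polymorph engines = GPCs x multiprocessors per GPC."""
    return GPCS_PER_CHIP[chip] * SMS_PER_GPC[chip]


for chip in SMS_PER_GPC:
    print(chip, polymorph_engines(chip))
```

    For GK110 this gives the 5 x 3 = 15 polymorph engines mentioned earlier, against 4 replicated front-end blocks on Hawaii.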
     
  7. flopper

    Newcomer

    Joined:
    Nov 10, 2006
    Messages:
    150
    Likes Received:
    6
    290 vs the 770?
    HardOCP says that the 7970 beats the 770,
    so I would say the 290 is meant to sit in between the 770 and 780.
    New drivers down the line are likely to increase performance a lot.

    My 7970 with BF4 sits at 100fps and runs better than in BF3, and Mantle isn't even out yet.
    For a card made so long ago, it's a beast.
     
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    It's easier for me to look at it as 1 CU = 1 SMX. Multiple CUs share an L1 cache but otherwise each CU is self-contained and indivisible, just like an SMX.

    Both the SMX and CU are the smallest units of computation on their respective architectures.
     
  9. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    745
    Likes Received:
    39
    Location:
    Copenhagen
    The official picture (actually a rendering) of 290 had 6+6 as the only visible difference to the 290x. (Can no longer find them on the redesigned amd.com, but they were there).
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I'm still disappointed in AMD's tessellation implementation.
    http://techreport.com/review/25509/amd-radeon-r9-290x-graphics-card-reviewed/6
    Multiply NVidia's score by the tests' tessellation factors, and you get a roughly constant number, i.e. tris/s is constant. AMD, OTOH, loses throughput with t-factor. Hawaii didn't fix anything.

    There's no need for off-chip buffering, no matter how high the t-factors. It's a really lazy algorithm.
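
    The check described above (score times tessellation factor should stay roughly flat if tris/s is constant) can be sketched as follows. This is only an illustration: the two score tables are synthetic placeholders, not TessMark results, and the 15% tolerance and the function names are arbitrary choices for the example.

```python
def throughput_proxy(scores_by_factor):
    """Multiply each score by its tessellation factor (a proxy for tris/s)."""
    return {factor: score * factor for factor, score in scores_by_factor.items()}


def is_roughly_constant(values, tolerance=0.15):
    """True if the spread between min and max is within `tolerance` of the max."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / highest <= tolerance


# Synthetic placeholder scores (NOT benchmark data), keyed by tessellation factor.
flat_scaling = {8: 4000, 16: 2000, 32: 1000, 64: 500}
falling_scaling = {8: 4000, 16: 1800, 32: 600, 64: 150}

for name, scores in (("flat", flat_scaling), ("falling", falling_scaling)):
    proxy = throughput_proxy(scores)
    print(name, proxy, is_roughly_constant(proxy.values()))
```

    Plugging in the scores from the linked review page would show which pattern each card follows.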
     
  11. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    A couple of potential explanations come to mind. One, TessMark uses OpenGL, and it's possible AMD hasn't updated its OpenGL drivers to take full advantage of Hawaii's quad geometry engines. Two, the drivers could be fine, and we could be seeing an architectural limitation of the Hawaii chip. As I noted earlier, large amounts of geometry amplification tend to cause data flow problems. It's possible the 290X is hitting some internal bandwidth barrier at the x32 and x64 tessellation levels that's common to GCN-based architectures. I've asked AMD to comment on these results but haven't heard back yet. I'll update this text if I find out more.

    :???::???::???: ???

    http://techreport.com/review/25509/amd-radeon-r9-290x-graphics-card-reviewed/6
     
  12. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    I think of an AMD SE as being closest to an Nvidia GPC. One difference with regard to geometry processing is that Nvidia has a pre-cull rate that's independent of the post-cull rate.

    You're getting caught up in the off-chip buffering statement and not understanding what it means. Granted, no one has explained it. I'm not at liberty to go into much detail, but replace "off-chip buffering" with "cached memory." If you process a HS on CU0 you don't want all of the DS verts for that patch to execute on the same CU, so the output data needs to go to a location up the memory hierarchy. On AMD hardware that's cached memory that could get flushed off chip.

    Nvidia's SMXs have more compute per LDS than an AMD CU, but I suspect they can write HS output to the L2 as well.
     
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    If you design the tessellation unit properly, there's no need to generate all the DS verts at once. The hardware tessellator should have access to patch data (maybe a few wavefronts so that the case of zero amplification doesn't cause excessive slowdown) and create wavefronts of DS verts as they're needed.

    Why should any DS verts have to go off chip? AMD doesn't even need the tessellator to do anything but put a couple of indices in each DS vert, and a shader program (inserted into the DS) can calculate the barycentric coords from there.
     
    #1733 Mintmaster, Oct 31, 2013
    Last edited by a moderator: Oct 31, 2013
  14. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    TessMark is a really special case. I don't know why, but for a long time they didn't even test their software on AMD GPUs. I don't know whether it comes from the driver or from TessMark not having been updated, but the results at 32x-64x are lower than Pitcairn's. If you look at the Unigine results, though, it is way different (even if tessellation is not the only part tested in Unigine's extreme tessellation mode).
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
  16. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
  17. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,153
    Likes Received:
    928
    Location:
    still camping with a mauler
    These results are weird. Techspot shows the GTX760 beating the 670 and Guru shows the 680 ahead of the 770.
     
  18. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    You generate enough DS verts to fill a wavefront/warp and launch it. One per clock per tessellator.

    Your last sentence is correct, but some of the previous comments are not.

    The hardware tessellator only needs tessellation factors. It outputs UV's and the DS does the rest. The DS needs access to the HS output and if you only have a few wavefronts in flight performance will suck. Hence you want DS waves from the same patch to execute on multiple CUs/SMXs in parallel.
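
    A toy sketch of the packing described above, for illustration only. It assumes 64 lanes per GCN wavefront and 32 per Nvidia warp; the vertex count is a hypothetical input, and the round-robin spread across CUs is just a stand-in for "execute on multiple CUs in parallel", not a description of the real hardware scheduler.

```python
import math

GCN_WAVEFRONT_LANES = 64   # lanes per AMD GCN wavefront
NVIDIA_WARP_LANES = 32     # lanes per Nvidia warp


def waves_needed(ds_vertex_count, lanes):
    """Number of wavefronts/warps needed to hold all DS verts of a patch."""
    return math.ceil(ds_vertex_count / lanes)


def spread_waves(num_waves, num_units):
    """Round-robin the waves over compute units (illustrative assumption)."""
    per_unit = [0] * num_units
    for wave in range(num_waves):
        per_unit[wave % num_units] += 1
    return per_unit


# Hypothetical patch that tessellates into 1000 DS verts.
verts = 1000
gcn_waves = waves_needed(verts, GCN_WAVEFRONT_LANES)
print(gcn_waves, spread_waves(gcn_waves, 4))
print(waves_needed(verts, NVIDIA_WARP_LANES))
```

    Because the waves of one patch land on different units, each of them needs to read the same HS output, which is why that output has to live somewhere above a single CU's local storage.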
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,779
    Likes Received:
    2,566
    Concerning Mantle and BF4 ..

    -BF3 used heavy object instancing to reduce the draw call count and improve performance; in one example the count dropped from 4000 to 900, less than a quarter. I don't know whether BF4 does the same, but it's highly likely.

    -In BF4 single player: HD 7850 achieves about 55fps, HD 7870 achieves about 65fps, both @1680x1050 and High preset.

    -PS4 GPU is running the game at 1600x900@60fps, also at the High preset. So results seem to be in between HD 7850 and 7870, as expected and consistent with the console's low clock speeds.

    -XB1 runs at 1280x720@60fps, High preset also. Again consistent with its much weaker GPU.

    -I don't know whether both consoles are running code that is close to the metal, or whether it's much closer than Mantle. That raises questions about how much performance can be extracted using the Mantle API, and whether it will affect the CPU more than the GPU or vice versa.
     
  20. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    The tested resolution is 22.5% more pixels than the PS4's resolution. And we've yet to see what advantage Mantle brings to the table. With those elements factored in it looks like the 7850 should be as fast as or faster than the PS4.
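
    For reference, the 22.5% figure follows directly from the two resolutions quoted in the previous post. A quick sketch of the arithmetic (the helper name is made up for the example):

```python
def extra_pixels(tested, reference):
    """Fraction by which `tested` (w, h) has more pixels than `reference` (w, h)."""
    tested_pixels = tested[0] * tested[1]
    reference_pixels = reference[0] * reference[1]
    return tested_pixels / reference_pixels - 1.0


pc_test = (1680, 1050)   # resolution of the HD 7850/7870 numbers: 1,764,000 pixels
ps4 = (1600, 900)        # PS4 rendering resolution: 1,440,000 pixels

print(f"{extra_pixels(pc_test, ps4):.1%}")
```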
     