AMD: Speculation, Rumors, and Discussion (Archive)

Discussion in 'Architecture and Products' started by iMacmatician, Mar 30, 2015.

Thread Status:
Not open for further replies.
  1. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    RecessionCone likes this.
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The die shots of Polaris show it was striving heavily to keep a lid on die size. That might have something to do with TrueAudio 2.0 opting to put functions on the CUs, and the ROP count.
    The FinFET transition seems to have gobbled up most of the attention for Polaris 10/11 (and apparently at least one 16nm TSMC APU with the Xbox One S and a PS4 Neo of indeterminate node and architecture).
    Hopefully some of what was learned from the teething pains with this pipe cleaner can carry over to the rumored architectural change with Vega. Polaris 10 brings density, but the nifty voltage tweaks are applied to cards whose voltages seem pretty stuck at higher levels.

    One interesting difference between the first Polaris die shot and the more polished one is that the little bright areas are less visible in the newer shots. It looks like those might be interconnect links?
     
  3. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Those could be part of the clock distribution tree.
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    That could be. It looks like it would be something with more localized metal to show up after grinding down to that level.
    It seemed like there was some correlation with portions of the chip that also would be sending/receiving a fair amount of data, but those would weigh on the clock network at the same time.
     
  5. New APU roadmaps leaked on the SemiAccurate forums:
    http://wccftech.com/amd-roadmap-2016-2017-leaked-zen/


    Big news here are the Raven Ridge specs.
    4-core/8-thread that starts at 4W on mobile and an iGPU with up to 12 CUs / 768 sp.
    I can't see anything regarding memory configuration, so it's probably just 128-bit DDR4, which will probably strangle the GPU a lot.
    I would love to see Raven Ridge carrying at least a single HBM2 stack, but when AMD's roadmaps are omitting stuff, I've learned to expect the least interesting stuff.

    I can't wait to see what those 4W-15W Raven Ridge APUs can do, compared to Intel's Y and U lines. I imagine not all CUs will be enabled on the lower power variants, but AMD has the potential to beat every GT2 implementation from Intel.
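    As a rough sanity check on the numbers above, here is a minimal sketch of the arithmetic. The 64 lanes per CU is standard for GCN; the DDR4-2400 speed grade and the 2.0 Gbit/s per pin HBM2 rate are assumptions for illustration, since the roadmap says nothing about memory.

```python
# Back-of-the-envelope figures for a 12 CU iGPU.
# Assumptions (not from the roadmap): DDR4-2400 on a 128-bit bus, and a
# single HBM2 stack running 2.0 Gbit/s per pin on its 1024-bit interface.

def shader_count(compute_units, lanes_per_cu=64):
    """Stream processors for a GCN part: 64 lanes per compute unit."""
    return compute_units * lanes_per_cu


def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for a given bus width and rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


sp = shader_count(12)                   # 12 CUs
ddr4 = peak_bandwidth_gbs(128, 2400)    # assumed dual-channel DDR4-2400
hbm2 = peak_bandwidth_gbs(1024, 2000)   # assumed single HBM2 stack

print(f"{sp} sp, DDR4 {ddr4:.1f} GB/s, one HBM2 stack {hbm2:.1f} GB/s")
```

    Under those assumptions a single stack would offer several times the bandwidth of the DDR4 setup, and the DDR4 figure is shared with the CPU cores on top of that.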
     
    Alexko likes this.
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    That's what the HPC-APU is for.
     
  7. I know, but Raven Ridge would be a lot more interesting with even just a single HBM stack. Hynix seemingly only makes 4-Hi stacks at the moment but I think even a 2-Hi stack with 2GB would be more than enough for that iGPU to blast through all Intel offerings with equivalent TDP.

    Maybe the mobile variant will support LPDDR4. Do any of the current AMD APUs support LPDDR3?
     
    #4747 Deleted member 13524, Aug 9, 2016
    Last edited by a moderator: Aug 9, 2016
  8. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    The only worthwhile improvement AMD have shown is getting similar performance with fewer ROPs. Polaris has half the ROPs of comparably performing cards, and that bodes well for Vega if you believe that ROPs held back Fury X performance.
     
  9. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    AMD could make a kickass "steambox" type APU next year with Zen + GCN + HBM. If the price is right it could really sell in Asian markets I think. I would definitely be interested in such a thing for the living room and OEMs would have a ball with it.
     
  10. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    The RX 460 doesn't have a fully enabled GPU (two CUs are disabled), so the comparison isn't correct.
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I don't think the Fury X was ROP bound... I don't remember any time we saw that, and this is why it tended to perform better at higher resolutions, since shader needs were greater than the ROP needs.
     
  12. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Given all the resources IHVs appear to be dumping into Linux drivers lately, I really wouldn't be surprised if something like this is in the works. The only caveat is that a bunch of games would need to be looking at Vulkan for that to really be practical. Sure, OpenGL works, but porting DX12 to Vulkan or using it natively makes far more sense than a DX11 to OpenGL port.
     
  13. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Well, it is more a question of "time". Instinctively, when the choice comes up at the start (do we port it to Vulkan or to OpenGL x.x?), I think it will be obvious. The good thing is that AMD has issued new Linux drivers that seem pretty good. And when developers build with OpenGL and AMD in mind, the results seem far better than what we are used to seeing from old OpenGL games and software implementations.
     
    #4753 lanek, Aug 10, 2016
    Last edited: Aug 10, 2016
  14. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Fury X has both high bandwidth and high compute performance. I would guess that Fury X compute units are most of the time underutilized in games. Geometry pipeline is likely a big bottleneck for it.

    I suggest reading this GDC presentation by Graham Wihlidal:
    http://www.frostbite.com/2016/03/optimizing-the-graphics-pipeline-with-compute/

    Page 12 shows a GCN occupancy graph. As soon as you start pushing more (smaller) triangles and more draws, GCN cannot keep the occupancy up. The geometry pipeline (including fixed function units and the vertex shader) is the bottleneck. As a result, not enough pixel shader waves are spawned to fill the GPU. This graph is from a console GPU with a reduced CU count and reduced bandwidth compared with bigger PC parts (but the same geometry throughput). This problem should be more severe on PC (especially on Fury X). Unfortunately there are no PC tools that record a runtime occupancy graph of the whole GPU. You can't see the geometry pipeline problems by doing static analysis of shaders.

    Fortunately Polaris improved the geometry pipeline a lot. It is able to quickly reject triangles that don't contribute to the image (strip degenerates, sub-pixel sized, etc). This results in higher vertex shader occupancy, which leads to higher pixel shader occupancy. Polaris also added instruction prefetch. Prefetch should reduce the stall when a new vertex shader starts execution (important when there are lots of small draws, as the stall cascades through the whole GPU).
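
    To make the two rejection cases concrete, here is a minimal sketch of a software filter for degenerate and sub-pixel triangles, in the spirit of the compute-side culling described in the linked presentation. It is not a description of what the Polaris hardware does internally; the function names and the convention that pixel centers sit at integer + 0.5 are assumptions for illustration.

```python
import math

# Triangles are three (x, y) screen-space vertices, in pixels.
# Assumption: pixel centers (the sample points) sit at integer + 0.5.


def signed_area_x2(a, b, c):
    """Twice the signed area of triangle abc; zero means degenerate."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def misses_pixel_centers(tri):
    """True if, on some axis, the bounding box (lo, hi] holds no pixel center."""
    for axis in (0, 1):
        lo = min(p[axis] for p in tri)
        hi = max(p[axis] for p in tri)
        # Both ends snap to the same pixel boundary: no center in between.
        if math.floor(lo + 0.5) == math.floor(hi + 0.5):
            return True
    return False


def should_cull(tri):
    """Reject triangles that cannot contribute any pixel to the image."""
    return signed_area_x2(*tri) == 0 or misses_pixel_centers(tri)


tris = [
    [(0, 0), (1, 1), (2, 2)],                 # degenerate: zero area
    [(0.1, 0.1), (0.3, 0.1), (0.1, 0.3)],     # sub-pixel, hits no center
    [(0.6, 0.0), (1.4, 0.0), (1.0, 5.0)],     # thin sliver between centers
    [(0, 0), (4, 0), (0, 4)],                 # covers several pixels
]
survivors = [t for t in tris if not should_cull(t)]
print(len(survivors), "of", len(tris), "triangles reach the rasterizer")
```

    The bounding-box test is conservative in one direction only: a triangle that passes it may still miss every sample, but a triangle it rejects cannot cover one (up to the boundary convention noted in the comment).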

    Fury X was most likely also occasionally ROP bound (big triangles close to the camera create bursts of occupancy). It has a quite low ROP : compute ratio. It has around 30% more bandwidth and compute than R9 390, but the same number of ROPs. And it has DCC as well (= Fury X is practically never memory bound). Hopefully Vega doubles the ROP count and further improves the geometry pipeline from Polaris. AMD would also benefit from more efficient rasterization. Nvidia added tiled rasterization in Maxwell, and got a big efficiency boost.
     
    Lightman, Kej, Heinrich04 and 12 others like this.
  15. AnomalousEntity

    Newcomer

    Joined:
    Jun 6, 2016
    Messages:
    38
    Likes Received:
    25
    Location:
    Silicon Valley
    Don't you run the vertex shader before a triangle can be culled out in the primitive discard stage? This shouldn't affect VS occupancy, although it should improve PS occupancy.
     
  16. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    Effective culling prior to being bottlenecked on the fixed function part prevents both the stall on the VS and the starvation on the PS, effectively increasing throughput and occupancy on both.
     
    Heinrich04, Razor1, Newguy and 4 others like this.
  17. xEx

    xEx
    Veteran

    Joined:
    Feb 2, 2012
    Messages:
    1,060
    Likes Received:
    543
    My biggest hope is that with the Vega launch I will be able to buy a 480/470... I'm actually tempted by a 1060 now, since it is in stock on Amazon, unlike AMD's cards. I don't know why AMD chose to position the 470 so close to the 480; in the market they are basically the same card, which looks ridiculous to me. Anyway, I've been patient, but they keep announcing cards and getting them reviewed, and then there is almost no stock. I'm still waiting for them to get stock on Amazon, but for some reason AMD and its partners don't want to be in the biggest store in the world.
     
  18. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380
    Since the 470 is a 480 with disabled CUs (so the same die), they want everybody to buy the 480 in this price segment.
     
  19. xEx

    xEx
    Veteran

    Joined:
    Feb 2, 2012
    Messages:
    1,060
    Likes Received:
    543
    The price is practically the same...
     
  20. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    Was looking at the GCN instructions and had a question about DS_PERMUTE_B32: what happens if two lanes write to the same address? The ISA documentation doesn't mention a defined behavior, so I assume it's just a race condition?
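
    To illustrate why the question matters, here is a minimal software model of a forward ("push") permute in which each lane writes its value to a destination lane. It is only a sketch: the processing order, the zero default for lanes nobody writes to, and the use of lane indices instead of byte addresses are all assumptions of this model, not documented hardware behavior.

```python
# Toy model of a forward permute: lane i pushes src[i] to lane dst_lane[i].
# Assumptions of this model (NOT taken from the ISA documentation):
#   - lanes are applied one after another in `lane_order`
#   - a destination that no lane writes to reads back as 0
#   - destinations are lane indices, not byte addresses

def forward_permute(src, dst_lane, lane_order=None):
    n = len(src)
    out = [0] * n
    order = range(n) if lane_order is None else lane_order
    for lane in order:
        # A later write to the same destination overwrites an earlier one.
        out[dst_lane[lane] % n] = src[lane]
    return out


src = [10, 11, 12, 13]
dst = [0, 2, 2, 3]            # lanes 1 and 2 both target lane 2

ascending = forward_permute(src, dst)
descending = forward_permute(src, dst, lane_order=[3, 2, 1, 0])

print("ascending order: ", ascending)
print("descending order:", descending)
```

    In this model the value that lands in the contested lane depends entirely on the order in which lanes are applied, which is exactly what would make the result unpredictable if the hardware gives no ordering guarantee.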
     