AMD: Speculation, Rumors, and Discussion (Archive)

Discussion in 'Architecture and Products' started by iMacmatician, Mar 30, 2015.

Thread Status:
Not open for further replies.
  1. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,284
    Likes Received:
    2,958
    Location:
    Germany
    Even though it's not explicitly stated there, it still needs to be read as an "up to", because, you know, corner cases and financial statements.


    They did. But then they also emphasized that (IIRC!!) half a dozen power-saving features were not even enabled in the prototype they are basing this figure on.
     
    #1241 CarstenS, Apr 22, 2016
    Last edited: Apr 22, 2016
  2. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    Agreed. That's how I always look at perf/W. But again, perf/W is probably one of the most fancy metrics...that tells you almost nothing.

    Compared to what exactly? Which SKU, in which benchmark, etc.? I would imagine it's far lower than 2x compared to Nano.
     
    #1242 SimBy, Apr 22, 2016
    Last edited: Apr 22, 2016
    CarstenS likes this.
  3. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    If AMD could do another Radeon HD 4890, that would be ridiculously welcome, IMO. That card was fuken awesome from a gaming perspective. Kickass performance, ridiculously price competitive!
     
    Razor1 and Lightman like this.
  4. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,284
    Likes Received:
    2,958
    Location:
    Germany
    You really want AMD to die, don't you? Look at what that ridiculous price war did to them in the aftermath.
     
  5. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Well, I think it's always a comparative amount. Not that they will want to cut down margins to gain an advantage, but if they have the ability to do it and keep their margins healthy, it will be good.
     
  6. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    It's not my job to set AMD's prices, that's AMD's job. GPU prices have climbed steadily, and are now at completely ridiculous levels. When the GTX 8800 Ultra hit $600 we thought we were at Peak Ridiculous, but we hadn't seen nothing yet. Nvidia hasn't launched a high-end gaming card at $600 for how long now? :p

    A high-performance GPU for a good price, who wouldn't want that? Even if a $300 Polaris can't beat a $600+ NV GPU, for that price you could buy two AMD cards and run DX12 multiadapter mode in future games and get superior FPSes too as a result...! :)
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,536
    Likes Received:
    4,635
    Location:
    Well within 3d
    I was meaning to return to this since putting a scalar unit downstream from whatever buffer or front end stage that splits the CU scalar pipe from the SIMD pipes provides an opportunity for reducing delays, or places the logic more closely to where it can track dependences.

    If the delays are reduced or eliminated due to the hardware being more integrated, or implementing a wavefront stall, then the software-exposed model for the architecture doesn't need to change. It might be more optimal if the padding were eliminated, but the compiler could lag behind hardware evolution without compromising correctness.
    Short of a local scoreboard, a more general set of per-wavefront flag registers that trigger a stall when a specific instruction hazard might occur would probably be incremental in impact.

    I'm not sure if it would need to share the vector file, since it might require more complex handling when instructions can source both a vector and scalar register. CPUs can handle the basic interlocks needed between an FP and integer-linked set of pipelines within far fewer cycles with incremental complexity increases, and the SIMD pipeline could handle things even more conservatively.

    That would be extra hardware, but after two process nodes the area taken by scalar resources dedicated to a SIMD would be where it is now.
     
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,159
    Likes Received:
    1,674
    Location:
    New York
    Was it the price war or the previous thrashing of the 29xx/38xx line?

    Granted, asking $300 for the 4870 was a bit too aggressive and unnecessary in hindsight.
     
  9. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    631
    Likes Received:
    323
    Samsung (whose process is licensed by GloFo) claims its defect density is < 0.2 (per cm^2, as usual) for 14nm FinFET. http://www.anandtech.com/show/10272/samsung-foundry-updates-7nm-euv-10lpp-and-14lpc

    For reference, 28nm was at a 0.05 defect rate by the beginning of last year. I don't know what TSMC's current 16nm defect rate is, but last year it was at 0.18. So my guess is that TSMC currently has a lower defect rate than Samsung, though since Samsung has a denser process I'm not sure how meaningful a per-transistor comparison would be, even if there were less murky numbers to go on.

    Still, it does show that there are yield problems on bigger chips, at least for AMD. Of course, with HBM gen 2 seemingly in such low supply, it's questionable how much that matters at the moment.
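
    To put the quoted defect densities in perspective, here is a rough sketch using the simple Poisson yield model, Y = exp(-A * D). The die areas are hypothetical and picked only to show the trend; they are not figures from the post or from any vendor, and the 0.20 value for Samsung is the quoted upper bound, not a measured number.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of dies with zero defects under the Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Defect densities as quoted in the post (defects per cm^2).
DENSITIES = {
    "28nm (early 2015)": 0.05,
    "TSMC 16nm (2015)": 0.18,
    "Samsung 14nm (upper bound)": 0.20,
}

# Hypothetical die areas in mm^2, for illustration only.
DIE_AREAS_MM2 = (100, 300, 600)

def yield_table() -> dict:
    """Map each process label to its modeled yields for DIE_AREAS_MM2."""
    return {
        name: [poisson_yield(area, density) for area in DIE_AREAS_MM2]
        for name, density in DENSITIES.items()
    }

if __name__ == "__main__":
    for name, yields in yield_table().items():
        print(name, " ".join(f"{y:.0%}" for y in yields))
```

    Under this model a 600 mm^2 die at 0.2 defects/cm^2 comes out at roughly 30% defect-free, versus roughly 74% at 0.05. Partially defective dies are often sold as cut-down parts, so the share of sellable dies would be higher than these numbers.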
     
    jbq.junior01 likes this.
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Some info about this:
    http://gpuopen.com/dcc-overview/

    This article didn't specify MSAA and depth decompression and direct read in detail, so I asked some extra questions in their Twitter post.

    Me:
    @TimothyLottes did I understand correctly: MSAA + custom resolve is a bad case for DCC? Can GCN 1.2 read MSAA color + Z without decompress?

    Timothy Lottes:
    "@SebAaltonen GCN 1.2 has DCC texture-read path without separate decompress pass. But that DCC mode doesn't compress as well as no-read case."

    So it seems that it can directly read both depth and MSAA without decompression. However the readable format compresses slightly worse. A huge improvement over GCN 1.0/1.1.
     
  11. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    Is it really denser though? 16/14 are marketing numbers that don't fully reflect reality.
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,598
    Likes Received:
    3,715
    Location:
    Finland
    It's slightly denser. Apple did chips on both, and the ones made by Samsung were a bit smaller.
     
  13. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,328
    Likes Received:
    438
    Location:
    Australia
    I figure this man's posts are worth sharing:

    http://forums.anandtech.com/search.php?searchid=2739829


     
  14. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,284
    Likes Received:
    2,958
    Location:
    Germany
    Seems to be locked if you're not a member of the AT forums. Who is this poster?
     
  15. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,328
    Likes Received:
    438
    Location:
    Australia
    zlatan, in my opinion one of the few people worth listening to on that forum. Claims to be a game dev; the quality of his posts seems to back it up.
     
    CarstenS likes this.
  16. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    I'm a member, and there just doesn't seem to be a post there anymore. Deleted, or typo in the link perhaps?
     
    CarstenS likes this.
  17. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,328
    Likes Received:
    438
    Location:
    Australia
    Odd, it loads for me just fine. It's the post history of zlatan.
     
  18. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I also read a French paper on this that was linked to me, covering APUs. The CPU was polling the L3 cache hit rate to determine how far ahead to run with prefetching. The L3-on-APUs part really makes me wonder if GPU memory will behave like an L3 cache for system memory going forward. AMD seems to have been making driver changes suggesting this, with the entire system memory pool showing up as available VRAM.

    I have still been pondering over this as well. Still reading over that hybrid architectures to depth imaging paper. That paper seems to suggest a high-performance CPU is required to do prefetching, but I'm not sure how it could keep up with a discrete GPU, especially if constrained by PCIe bandwidth, even with Onion.

    I recall reading that. Still thinking there was another limitation or that "slightly worse" was a bit more than slight. It just seemed like some devs were tripping over it more than would be expected if the fix was a simple creation option.

    It was a search for member posts so required a login.

    This should be it.
    http://forums.anandtech.com/showpost.php?p=38180442&postcount=46
     
    CarstenS likes this.
  19. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    As AMD has implemented this Primitive Discard Accelerator, would it also have made sense for them to support Conservative Raster with its Occlusion Culling in DX12?
    Wonder how well the Primitive Discard Accelerator will work with engines such as UE4/Unity/CryEngine with their own internal occlusion culling designs and what level of integration is needed.
    Cheers
     
  20. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    I would assume that the "Primitive Discard Accelerator" is just a marketing term for some additional early out tests for backfacing & smaller than pixel & out of the screen triangles (and maybe an early out test for small triangles vs HTILE). Currently Nvidia beats AMD badly in triangle rate benchmarks, especially in cases where the triangles result in zero visible pixels. Nvidia certainly has more advanced triangle processing hardware, but the interesting question is whether they just win by brute force (Nvidia has distributed geometry processing to parallelize the work and have better load balancing), or whether they also have better (early out) culling for triangles that are not visible.
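
    As a rough illustration of the kind of early-out tests guessed at above, here is a minimal sketch that works on triangles already transformed to screen space. It is not AMD's hardware logic; the function names, the positive-signed-area front-face convention, and the pixel-centers-at-half-integers rule are all assumptions made for the example.

```python
import math
from typing import Optional, Tuple

Vec2 = Tuple[float, float]

def signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Signed area of a 2D triangle; positive means front-facing here (assumed)."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def misses_pixel_centers(lo: float, hi: float) -> bool:
    """True if no pixel center (k + 0.5) lies inside the interval [lo, hi]."""
    first_center = math.ceil(lo - 0.5) + 0.5
    return first_center > hi

def cull_reason(a: Vec2, b: Vec2, c: Vec2,
                width: int, height: int) -> Optional[str]:
    """Return why a screen-space triangle can be dropped, or None to keep it."""
    # Backfacing or zero-area triangles produce no visible pixels.
    if signed_area(a, b, c) <= 0.0:
        return "backfacing"

    min_x, max_x = min(a[0], b[0], c[0]), max(a[0], b[0], c[0])
    min_y, max_y = min(a[1], b[1], c[1]), max(a[1], b[1], c[1])

    # Bounding box entirely outside the viewport.
    if max_x < 0 or min_x > width or max_y < 0 or min_y > height:
        return "offscreen"

    # Bounding box slips between pixel centers on either axis,
    # so the triangle cannot cover a sample.
    if misses_pixel_centers(min_x, max_x) or misses_pixel_centers(min_y, max_y):
        return "subpixel"

    return None
```

    Note that every test here needs the transformed vertex positions, so a scheme like this only saves the work after the vertex shader, not the vertex fetch and transform themselves.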

    Programmable vertex shaders make it almost impossible to do robust automatic coarse grained culling (by driver & hardware). Not all engines use vertex buffers anymore, and even if vertex buffers are used, the vertex position data might be bit packed in a custom format. The transform matrix might be 4x4, 4x3, 3x4, there might be two or three of them (separate world * view * projection), there might be separate position transform and 3x3 rotation, or quaternion (or dual quaternion) rotation instead of a matrix. So I would assume that the GPU has to run the vertex shader. Of course the driver could split the SV_Position related vertex shader code to reduce some math and data loads. Still this would require loading position data for each vertex and transforming each vertex before the culling decision could be made. This greatly reduces the potential savings.

    Coarse occlusion culling (object and/or sub-object granularity) is still highly beneficial, even if the GPU had hardware to occlusion cull at triangle granularity (and/or really high triangle rate). Coarse culling doesn't need to fetch any per vertex data (just bounding boxes/spheres), meaning that it accesses much less memory and does far fewer calculations per culled triangle. Coarse (software based) occlusion culling will still be highly relevant in the future.
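
    The point about coarse culling touching only bounds can be sketched as follows. To keep it short, this example rejects bounding spheres against a set of planes (plain frustum-style culling) rather than doing a real occlusion test, which would also have to compare the bounds against some conservative depth representation. The `Plane` and `CoarseObject` types are hypothetical and not taken from any engine.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Plane:
    normal: Vec3  # unit length, pointing into the visible half-space
    d: float      # plane equation: dot(normal, p) + d = 0

@dataclass(frozen=True)
class CoarseObject:
    center: Vec3         # bounding sphere center
    radius: float        # bounding sphere radius
    triangle_count: int  # triangles skipped if the object is culled

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sphere_outside(obj: CoarseObject, plane: Plane) -> bool:
    """True if the whole bounding sphere is on the invisible side of the plane."""
    return dot(plane.normal, obj.center) + plane.d < -obj.radius

def coarse_cull(objects: Sequence[CoarseObject],
                planes: Sequence[Plane]) -> Tuple[List[CoarseObject], int]:
    """Return the surviving objects and the number of triangles never touched."""
    visible: List[CoarseObject] = []
    culled_triangles = 0
    for obj in objects:
        if any(sphere_outside(obj, p) for p in planes):
            culled_triangles += obj.triangle_count
        else:
            visible.append(obj)
    return visible, culled_triangles
```

    Each rejected object costs one dot product per plane regardless of its triangle count, and no vertex data is read at all, which is why this kind of pass stays useful even with fast per-triangle culling in hardware.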
     
    chris1515, Pixel, CSI PC and 2 others like this.


    Beyond3D is proudly published by GPU Tools Ltd.