Recent content by keldor314

  1. Nvidia Volta Speculation Thread

    Do we even know that this is a hardware bug? Given that Volta significantly changed thread scheduling within a warp, perhaps this is a rare race condition in the code that cannot occur at all with the older scheduling? The CUDA programming guide makes it abundantly clear multiple times that warp...
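
    (The code being discussed isn't quoted in this excerpt. Purely as an illustration of the kind of warp-synchronous race Volta's scheduling change can expose, here is a hypothetical 32-thread reduction of my own - kernel name and shapes are made up - that only ever worked pre-Volta by accident, and that needs the __syncwarp() barriers shown once threads in a warp are no longer guaranteed to run in lockstep.)

        __global__ void warp_reduce(float *out, const float *in)
        {
            __shared__ float buf[32];
            unsigned lane = threadIdx.x;            // launched with 32-thread blocks

            buf[lane] = in[blockIdx.x * 32 + lane];
            __syncwarp();                           // make every lane's store visible

            for (unsigned offset = 16; offset > 0; offset >>= 1) {
                if (lane < offset)
                    buf[lane] += buf[lane + offset];
                __syncwarp();                       // without this, Volta's independent
                                                    // thread scheduling allows a race
            }
            if (lane == 0)
                out[blockIdx.x] = buf[0];
        }
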
  2. Nvidia Volta Speculation Thread

    Basically, each SM has a block of shared memory/L1 cache associated with it. In GP100, they doubled the L1/shared memory by having two blocks of it per what used to be an SM. Due to addressing/whatever, each half of the original SM can only see one of the blocks, so it behaves like 2 64-thread...
  3. Nvidia Volta Speculation Thread

    I'm not sure prefetch is meaningful in the context of Nvidia GPUs - memory instructions are already asynchronous, and are tied to barriers that instructions explicitly wait on when they need the data. Looked at that way, every load is already a prefetch. I believe that...
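
    (A sketch of my own, not from the thread, of how that view gets exploited in practice: since a load only stalls at the instruction that consumes its result, hoisting the next iteration's load above the current iteration's arithmetic - plain software pipelining - gives prefetch-like behavior with no dedicated prefetch instruction. Kernel and parameter names are invented.)

        __global__ void saxpy_pipelined(float *y, const float *x, float a, int n)
        {
            int i      = blockIdx.x * blockDim.x + threadIdx.x;
            int stride = blockDim.x * gridDim.x;
            if (i >= n) return;

            float xv = x[i];                    // already in flight for iteration 0
            int j = i;
            for (; j + stride < n; j += stride) {
                float xnext = x[j + stride];    // issued now, only needed next iteration,
                                                // so its latency hides behind the FMA below
                y[j] = a * xv + y[j];
                xv = xnext;
            }
            y[j] = a * xv + y[j];               // last element owned by this thread
        }
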
  4. Nvidia Volta Speculation Thread

    Normals and other directional vectors don't need that much precision. 23 bits of mantissa give you enough precision to point a laser pointer at a specific person's house... assuming you're an astronaut currently standing on the moon. To further put this in perspective, this beats out the...
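
    (A quick back-of-the-envelope check of that claim - my own rounded numbers, host-side C++ just for the arithmetic:)

        #include <cmath>
        #include <cstdio>

        int main()
        {
            const double moon_distance_m = 3.844e8;              // mean Earth-Moon distance
            const double fp32_step       = std::ldexp(1.0, -23); // spacing of a 23-bit mantissa

            // A one-step wobble in direction, carried over the full distance.
            std::printf("offset at the far end: ~%.0f m\n",
                        moon_distance_m * fp32_step);            // ~46 m: house-sized
            return 0;
        }
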
  5. Nvidia Volta Speculation Thread

    Just to make things interesting, the above code may or may not actually terminate - it all comes down to whether the compiler decides to schedule the then branch or the else branch first. If it schedules the else first, then the producer will release the lock and the consumer will happily go on to the next...
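
    (The code referred to is cut off above; as a stand-in, here is a generic producer/consumer sketch of my own with the same failure mode on pre-Volta hardware:)

        __global__ void handoff(volatile int *flag, volatile int *data)
        {
            if (threadIdx.x == 0) {
                *data = 42;                 // producer: publish, then raise the flag
                __threadfence();
                *flag = 1;
            } else if (threadIdx.x == 1) {
                while (*flag == 0) { }      // consumer: spin until the flag goes up
                int v = *data;              // safe to read once the flag is seen
                (void)v;
            }
            // Pre-Volta, the two sides of a divergent branch in a warp run one
            // after the other.  Producer side first: everything completes.
            // Consumer side first: the warp spins forever, because the producer
            // never gets a chance to run.  Volta's independent thread scheduling
            // guarantees forward progress for the producer either way.
        }
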
  6. GCN and mixed wavefronts

    If AMD hardware is anything like Nvidia (and I rather think it is), you can absolutely have multiple blocks resident on the same CU. I use this to allow wave-sized blocks, which avoid the need for syncthreads in some kernels. Allowing blocks from different kernels on one CU at the same time is...
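
    (A minimal sketch of what a wave/warp-sized block can look like - names, sizes, and the shuffle-based reduction are my own choices, written for the NVIDIA side; the same shape applies to 64-wide waves on GCN:)

        // One warp per block, so there is never anything to __syncthreads() against.
        __global__ void block_sum_one_warp(const float *in, float *out)
        {
            int   i = blockIdx.x * 32 + threadIdx.x;
            float v = in[i];

            // The reduction stays entirely in the warp's registers via shuffles,
            // so no shared memory and no block-wide barrier is needed.
            for (int offset = 16; offset > 0; offset >>= 1)
                v += __shfl_down_sync(0xffffffffu, v, offset);

            if (threadIdx.x == 0)
                out[blockIdx.x] = v;
        }

        // launched with warp-sized blocks:  block_sum_one_warp<<<n / 32, 32>>>(d_in, d_out);
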
  7. NVIDIA shows signs ... [2008 - 2017]

    Definitely under strain. Those bags of money are heavy, and there are still another half dozen out in the truck!
  8. AMD: Speculation, Rumors, and Discussion (Archive)

    It's not that hard to generate code that has a high degree of ILP in the arithmetic portions. Your big tool is loop unrolling combined with not reusing registers between iterations, which trades off higher register count for higher usable ILP. Here's an example of what I'm talking about. The...
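
    (The post's own example is truncated here. As a stand-in, a minimal sketch of the technique it describes - a grid-stride dot product unrolled 4x, with one accumulator per unrolled slot so the FMA chains stay independent; names are mine, and the caller is assumed to zero *out first:)

        __global__ void dot_unrolled(const float *a, const float *b, float *out, int n)
        {
            int tid    = blockIdx.x * blockDim.x + threadIdx.x;
            int stride = blockDim.x * gridDim.x;

            // Four independent accumulators = four independent dependency chains,
            // so the hardware can overlap their FMA latencies.  The cost is four
            // live registers where the naive loop needed one.
            float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
            int i = tid;
            for (; i + 3 * stride < n; i += 4 * stride) {
                acc0 += a[i             ] * b[i             ];
                acc1 += a[i +     stride] * b[i +     stride];
                acc2 += a[i + 2 * stride] * b[i + 2 * stride];
                acc3 += a[i + 3 * stride] * b[i + 3 * stride];
            }
            for (; i < n; i += stride)      // leftover elements
                acc0 += a[i] * b[i];

            atomicAdd(out, (acc0 + acc1) + (acc2 + acc3));
        }
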
  9. Will there be 300W Discrete GPUs in 5 years? 10?

    The real question is, if APUs are going to take over the world, where on earth are they? I mean, you have the PS4 and Xbox One, which have moderately powerful APUs, but you see nothing even remotely close in the PC space. Why is this? I suspect a large part of that answer has to do with...
  10. Will there be 300W Discrete GPUs in 5 years? 10?

    A more interesting question is whether there will be a discrete CPU in 5-10 years. CPUs stopped getting faster 5-10 years ago, and even mobile-class CPUs are beginning to get close to high-performance CPUs (within 2-5x, let's say). GPUs are still rapidly scaling, and there's still a 10-30x gap...
  11. NVIDIA Maxwell Speculation Thread

    Hrmm. Judging by the relative size... 3072 CUDA cores for the full chip?
  12. NVIDIA Maxwell Speculation Thread

    Large amounts of memory are used by high-resolution textures. Thus, if you're targeting PS4, you use 6GB of memory so you can have the sharpest textures possible.
  13. NVIDIA Maxwell Speculation Thread

    You have to remember that Kepler's ALUs can only be fully utilized at IPC > 1.5 in the code (each SMX has 4 warp schedulers that can dual-issue, but 6 warp-wide blocks of ALUs behind them, so single issue alone keeps only 4 of the 6 busy). This means that the 2560 case represents a best case for Kepler where the workload has lots of available ILP. In the real world, Kepler does worse...
  14. Native FP16 support in GPU architectures

    The thing that everyone's forgetting is that ALUs aren't the main driver of power consumption or die area anymore - it's communication across the chip and off it that really does it, as well as complexity inside each scheduler. One of the problems is that adding FP16 makes your scheduler more...
  15. NVIDIA Maxwell Speculation Thread

    It really does. I actually wouldn't be surprised if Denver internally used the same ISA as something like Maxwell, though with additional execution pipes.