Recent content by Bob

  1. Nvidia Pascal Announcement

    I agree; that would not be a very competitive product.
  2. How does a computer allocate memory in hardware? (pointers, stack, heap)

    This isn't a stupid question at all! To first order, they achieve the same effect: both leverage existing ILP to increase performance. But SIMD execution is a huge, huge simplification for the HW implementation of parallel instruction execution. Consider that in the non-SIMD case, the CPU must...
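
    (A rough CUDA sketch of the contrast; the names saxpy_simd and saxpy_scalar are mine, not from the post. In the SIMD/SIMT formulation one decoded instruction drives a whole warp of lanes, while the scalar loop leaves the CPU front-end to rediscover the independence of iterations on its own.)

      // SIMD/SIMT formulation: one instruction stream, executed in lockstep by
      // all 32 threads of a warp -- no cross-lane dependence checking needed.
      __global__ void saxpy_simd(int n, float a, const float* x, float* y) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n)
              y[i] = a * x[i] + y[i];   // one FMA, issued once, run on many lanes
      }

      // Equivalent scalar loop: to execute iterations in parallel, the CPU must
      // prove their independence itself (renaming, dependence checks) -- exactly
      // the hardware cost that SIMD execution avoids.
      void saxpy_scalar(int n, float a, const float* x, float* y) {
          for (int i = 0; i < n; ++i)
              y[i] = a * x[i] + y[i];
      }
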
  3. Is everything on one die a good idea?

    There's a lot wrong in your reply, but I want to address this point specifically. If branch prediction made a processor "out of order", then the majority of what are considered "in order" machines would need to be rebranded. I think you're confusing "out of order execution" with "speculative...
  4. NVIDIA Maxwell Speculation Thread

    On Maxwell sm_50 profiles, each block is HW-limited to 48 KB of shared memory, the same as on Kepler and Fermi. ptxas was misreporting the capability as 64 KB per block; that figure is wrong, and the real HW maximum remains 48 KB per block. SMM has a total shared memory...
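
    (To see both limits from code: a minimal CUDA sketch, assuming a toolkit recent enough to expose the sharedMemPerMultiprocessor field. On an SMM part it should report 48 KB per block but a larger per-SM total.)

      #include <cstdio>
      #include <cuda_runtime.h>

      int main() {
          cudaDeviceProp prop;
          cudaGetDeviceProperties(&prop, 0);   // query device 0
          // Per-block HW limit (48 KB on Fermi, Kepler, and Maxwell)...
          printf("shared memory per block: %zu KB\n",
                 prop.sharedMemPerBlock / 1024);
          // ...vs. the total an SM can hand out across all resident blocks.
          printf("shared memory per SM:    %zu KB\n",
                 prop.sharedMemPerMultiprocessor / 1024);
          return 0;
      }
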
  5. NVIDIA Maxwell Speculation Thread

    There is a bug in the RC version of the PTX assembler. The limit is indeed 48 KB of shared memory per block. This will be corrected in the official release.
  6. NVIDIA Maxwell Speculation Thread

    Shared memory per block is limited to a smaller value than the total shared memory capacity per SM.
  7. GPGPU running on FPGA

    Jeff B: Impressive!
  8. NVIDIA Kepler speculation thread

    NVIDIA GPUs have exposed a flat address space for generic pointers since Fermi. The same pointer can point to global memory, shared memory, or thread-local memory. Optionally, you can make a pointer's address space explicit and gain a performance advantage in certain situations.
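
    (A minimal CUDA sketch of what "generic" means here; sum4 and the kernel are hypothetical names of mine. The same function, taking a plain float*, reads through global memory in one call and shared memory in the other, with the hardware resolving the address space at run time.)

      // Takes a generic pointer: no address-space qualifier, works for both.
      __device__ float sum4(const float* p) {
          return p[0] + p[1] + p[2] + p[3];
      }

      __global__ void demo(const float* g_in, float* g_out) {
          __shared__ float s_buf[4];
          int t = threadIdx.x;
          if (t < 4) s_buf[t] = g_in[t];   // stage a few values in shared memory
          __syncthreads();
          if (t == 0)
              g_out[0] = sum4(g_in)        // generic pointer into global memory
                       + sum4(s_buf);      // same function, pointer into shared
      }

    (When the compiler can instead prove a pointer's address space, e.g. after inlining, it can emit the specific shared/global load instructions rather than generic ones, which is where the performance advantage mentioned above comes from.)
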
  9. Nvidia Volta Speculation Thread

    That was quick...
  10. Software/CPU-based 3D Rendering

    Genefer uses fp64 for its computation, and GK104's double-precision rate is a far smaller fraction of its fp32 rate than GF110's, so the GeForce 680 being slower than the 580 has little to do with "scheduling". Edit: I see the 560 Ti being faster than the 680. Now that's more worrisome. I will run this benchmark tomorrow, when I have access to a 680.
  11. Software/CPU-based 3D Rendering

    Do you have any non-OpenCL benchmarks to prove that point? Or do you believe there's something about OpenCL that reduces Kepler's performance, something that doesn't apply to CUDA?
  12. Software/CPU-based 3D Rendering

    Nick, you're setting yourself up so that, pretty much no matter what happens or when, you'll consider yourself to have been right. This kind of prediction is not very useful: save for the zombie apocalypse, most of us are already assuming that future humans will use computing devices, likely...
  13. NVIDIA Kepler speculation thread

    Sure they can, as long as they're independent instructions (and generate no resource conflict), but not at a sustainable rate. Due to a quirk of the architecture, you can issue a few instructions back-to-back from one warp before you take a small penalty, which can be completely covered by a...
  14. NVIDIA Kepler speculation thread

    I'm not sure I understand. The 680 isn't two 580s glued together, so its performance will vary with the application, and you should not expect exactly 2x the performance except on an artificial series of independent multiply-adds. For example, AES uses a lot of bit shifts. But bit...
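
    (For reference, the "artificial series of independent multiply-adds" would look something like this CUDA sketch; the names and constants are mine. Each accumulator forms its own dependence chain, so every FMA can issue without waiting on the previous result, which is the best case for peak-rate comparisons between GPUs.)

      __global__ void fma_peak(float* out, int iters) {
          float a0 = 1.0f, a1 = 2.0f, a2 = 3.0f, a3 = 4.0f;
          const float m = 1.0001f, c = 1e-4f;
          for (int i = 0; i < iters; ++i) {
              a0 = a0 * m + c;   // four independent multiply-add chains
              a1 = a1 * m + c;
              a2 = a2 * m + c;
              a3 = a3 * m + c;
          }
          // Write the results so the compiler cannot discard the loop.
          out[blockIdx.x * blockDim.x + threadIdx.x] = a0 + a1 + a2 + a3;
      }
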