Recent content by Bob

  1. Nvidia Pascal Announcement

    I agree; that would not be a very competitive product.
  2. How does a computer allocate memory in hardware? (pointers, stack, heap)

    This isn't a stupid question at all! To first order, they achieve the same effect: both leverage existing ILP to increase performance. But SIMD execution is a huge, huge simplification for the HW implementation of parallel instruction execution. Consider that in the non-SIMD case, the CPU must...
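
    (A rough CUDA sketch of the contrast; the names saxpy_simd and saxpy_scalar are mine, not from the post. In the SIMD/SIMT formulation one decoded instruction drives a whole warp of lanes, while the scalar loop leaves the CPU front-end to rediscover the independence of iterations on its own.)

      // SIMD/SIMT formulation: one instruction stream, executed in lockstep by
      // all 32 threads of a warp -- no cross-lane dependence checking needed.
      __global__ void saxpy_simd(int n, float a, const float* x, float* y) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n)
              y[i] = a * x[i] + y[i];   // one FMA, issued once, run on many lanes
      }

      // Equivalent scalar loop: to execute iterations in parallel, the CPU must
      // prove their independence itself (renaming, dependence checks) -- exactly
      // the hardware cost that SIMD execution avoids.
      void saxpy_scalar(int n, float a, const float* x, float* y) {
          for (int i = 0; i < n; ++i)
              y[i] = a * x[i] + y[i];
      }
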
  3. Is everything on one die a good idea?

    There's a lot wrong in your reply, but I want to address this point specifically. If branch prediction made a processor "out of order", then the majority of what are considered "in order" machines would need to be rebranded. I think you're confusing "out of order execution" with "speculative...
  4. NVIDIA Maxwell Speculation Thread

    On Maxwell sm_50 profiles, each block is HW-limited to 48 KB of shared memory, the same as on Kepler and Fermi. ptxas was misreporting the capability as 64 KB per block; that figure is wrong, and the real HW maximum remains 48 KB per block. SMM has a total shared memory...
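
    (To see both limits from code: a minimal CUDA sketch, assuming a toolkit recent enough to expose the sharedMemPerMultiprocessor field. On an SMM part it should report 48 KB per block but a larger per-SM total.)

      #include <cstdio>
      #include <cuda_runtime.h>

      int main() {
          cudaDeviceProp prop;
          cudaGetDeviceProperties(&prop, 0);   // query device 0
          // Per-block HW limit (48 KB on Fermi, Kepler, and Maxwell)...
          printf("shared memory per block: %zu KB\n",
                 prop.sharedMemPerBlock / 1024);
          // ...vs. the total an SM can hand out across all resident blocks.
          printf("shared memory per SM:    %zu KB\n",
                 prop.sharedMemPerMultiprocessor / 1024);
          return 0;
      }
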
  5. NVIDIA Maxwell Speculation Thread

    There is a bug in the RC version of the PTX assembler. The limit is indeed 48 KB of shared memory per block. This will be corrected in the official release.
  6. NVIDIA Maxwell Speculation Thread

    Shared memory per block is limited to a smaller value than the total shared memory capacity per SM.
  7. GPGPU running on FPGA

    Jeff B: Impressive!
  8. NVIDIA Kepler speculation thread

    NVIDIA GPUs have exposed a flat address space for generic pointers since Fermi. The same pointer can point to global memory, shared memory, or thread-local memory. Optionally, you can make a pointer's address space explicit and gain a performance advantage in certain situations.
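
    (A minimal CUDA sketch of what "generic" means here; sum4 and the kernel are hypothetical names of mine. The same function, taking a plain float*, reads through global memory in one call and shared memory in the other, with the hardware resolving the address space at run time.)

      // Takes a generic pointer: no address-space qualifier, works for both.
      __device__ float sum4(const float* p) {
          return p[0] + p[1] + p[2] + p[3];
      }

      __global__ void demo(const float* g_in, float* g_out) {
          __shared__ float s_buf[4];
          int t = threadIdx.x;
          if (t < 4) s_buf[t] = g_in[t];   // stage a few values in shared memory
          __syncthreads();
          if (t == 0)
              g_out[0] = sum4(g_in)        // generic pointer into global memory
                       + sum4(s_buf);      // same function, pointer into shared
      }

    (When the compiler can instead prove a pointer's address space, e.g. after inlining, it can emit the specific shared/global load instructions rather than generic ones, which is where the performance advantage mentioned above comes from.)
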
  9. Nvidia Volta Speculation Thread

    That was quick...
  10. Software/CPU-based 3D Rendering

    Genefer uses fp64 for its computation, and GK104's double-precision rate is a far smaller fraction of its fp32 rate than GF110's, so the GeForce 680 being slower than the 580 has little to do with "scheduling". Edit: I see the 560 Ti being faster than the 680. Now that's more worrisome. I will run this benchmark tomorrow, when I have access to a 680.
  11. Software/CPU-based 3D Rendering

    Do you have any non-OpenCL benchmarks to prove that point? Or do you believe there's something about OpenCL that reduces Kepler's performance, something that doesn't apply to CUDA?
  12. Software/CPU-based 3D Rendering

    Nick, you're setting yourself up so that, pretty much no matter what happens or when, you'll consider yourself to have been right. This kind of prediction is not very useful: save for the zombie apocalypse, most of us are already assuming that future humans will use computing devices, likely...
  13. NVIDIA Kepler speculation thread

    Sure they can, as long as they're independent instructions (and generate no resource conflict), but not at a sustainable rate. Due to a quirk of the architecture, you can issue a few instructions back-to-back from one warp before you take a small penalty, which can be completely covered by a...
  14. NVIDIA Kepler speculation thread

    I'm not sure I understand. The 680 isn't two 580s glued together, so its performance will vary with the application, and you should not expect exactly 2x the performance except on an artificial series of independent multiply-adds. For example, AES uses a lot of bit shifts. But bit...
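
    (For reference, the "artificial series of independent multiply-adds" would look something like this CUDA sketch; the names and constants are mine. Each accumulator forms its own dependence chain, so every FMA can issue without waiting on the previous result, which is the best case for peak-rate comparisons between GPUs.)

      __global__ void fma_peak(float* out, int iters) {
          float a0 = 1.0f, a1 = 2.0f, a2 = 3.0f, a3 = 4.0f;
          const float m = 1.0001f, c = 1e-4f;
          for (int i = 0; i < iters; ++i) {
              a0 = a0 * m + c;   // four independent multiply-add chains
              a1 = a1 * m + c;
              a2 = a2 * m + c;
              a3 = a3 * m + c;
          }
          // Write the results so the compiler cannot discard the loop.
          out[blockIdx.x * blockDim.x + threadIdx.x] = a0 + a1 + a2 + a3;
      }
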