Hi,
I'm writing a little piece of software for which I need to draw a lot of moving spheres (at least hundreds). They move independently of one another and may have different colors as well. I know very little about OpenGL, so I'm really just winging it here. I thought of three different possible ways to do this:
Option 1, Giant unified mesh, CPU-managed: The idea here would be to have a single Vertex Buffer Object containing all the vertices of all the spheres, and to move each vertex individually in the C++ code. Then I would just have to render a single object. This seemed too CPU-heavy and not very much in the spirit of OpenGL, so I didn't actually do this. Should I have?
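If I had gone that route, I imagine the per-frame update would look roughly like this (the buffer and vertex names are made up, and this assumes the VBO was created with GL_DYNAMIC_DRAW):

```cpp
// Hypothetical option 1: one big VBO holding every vertex of every sphere,
// recomputed on the CPU and re-uploaded in full each frame.
glBindBuffer(GL_ARRAY_BUFFER, bigVbo);
glBufferSubData(GL_ARRAY_BUFFER, 0,
                allVertices.size() * sizeof(Vertex),
                allVertices.data());
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)allVertices.size());
```

The upload alone would be hundreds of vertices per sphere times thousands of spheres, every frame, which is why it felt wrong to me.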
Option 2, Naive rendering of each sphere separately: Here I just upload the mesh of my sphere, compute the Model-View-Projection matrix + color vector of the first sphere in C++, upload them to the GPU, and draw. I upload the MVP+color of the second sphere, draw, upload the MVP+color of the third sphere, and so on. This works, and I get 60 FPS with up to 500 spheres, but after that performance starts to decline pretty sharply.
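For reference, my option-2 draw loop looks roughly like this (uniform names are just illustrative, and I'm using GLM for the matrix math):

```cpp
// Option 2: one uniform upload + one draw call per sphere.
glBindVertexArray(sphereVao);
for (const Sphere& s : spheres) {
    glm::mat4 mvp = projection * view
                  * glm::translate(glm::mat4(1.0f), s.position);
    glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, glm::value_ptr(mvp));
    glUniform4fv(colorLocation, 1, glm::value_ptr(s.color));
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
}
```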
Using two quad-core 2.4 GHz Haswell Xeons and a weak NVIDIA NVS 310 (GF119, i.e. Fermi with 48 CUDA cores), I get about 30 FPS with 1,000 spheres and a 4 FPS slideshow with 12,000 spheres. I guess that's probably normal given the number of draw calls. I'm rendering into a 1000×1000 window with 4× MSAA (enabled simply via glfwWindowHint(GLFW_SAMPLES, 4);).
Option 3, Instanced rendering: OK, so then I heard about instanced rendering, so I figured I ought to do that. I now do almost nothing on the CPU: I upload the mesh once, plus an array with all the positions and an array with all the colors of the spheres. Then, in the vertex shader, I fetch the correct position based on gl_InstanceID, from which I can build the MVP matrix in the shader itself; I fetch the color in a similar fashion and render the sphere. Thus, all the heavy lifting is done on the GPU.
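My option-3 vertex shader looks roughly like this (the array size of 500 is where I ended up once both arrays were in, as described below):

```glsl
#version 330 core
layout(location = 0) in vec3 vertexPosition;

// Per-instance data in plain uniform arrays -- this is what runs into
// the uniform storage limit.
uniform vec3 spherePositions[500];
uniform vec3 sphereColors[500];
uniform mat4 viewProjection;

out vec3 fragColor;

void main() {
    vec3 offset = spherePositions[gl_InstanceID];
    gl_Position = viewProjection * vec4(vertexPosition + offset, 1.0);
    fragColor   = sphereColors[gl_InstanceID];
}
```

and on the C++ side the whole thing is a single call:

```cpp
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                        nullptr, sphereCount);
```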
Problems: while this works, I seem to be limited to about 1000 (probably closer to 1024) elements in my array of sphere positions. If I try to go higher, I get an error telling me OpenGL couldn't locate a suitable resource to bind my variable. Worse yet, this limit appears to be global to all uniform variables, because when I added an array of colors, I had to go down to 500 spheres.
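For what it's worth, the limit I'm hitting seems to match the driver's uniform storage budget, which you can query like this (if I understand correctly, it's a pool of vec4 slots shared by all uniforms in the stage, and a vec3 array element typically still occupies a full vec4 slot):

```cpp
GLint maxComponents = 0;
// Core since GL 2.0; counts individual floats of uniform storage
// available to the vertex shader stage.
glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, &maxComponents);
printf("max vertex uniform components: %d (~%d vec4 slots)\n",
       maxComponents, maxComponents / 4);
```

With ~1024 vec4 slots total, one vec3 array maxing out around 1000 elements and two arrays forcing me down to 500 each would make sense.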
And, on top of this, there appears to be no measurable performance benefit over option 2, which really surprised me. As far as I can tell I only have one draw call per frame now, although I can't really be sure of what the driver is actually doing.
Apparently, I could get over the size limitation by using a one-dimensional texture, so perhaps I ought to try that. But still, shouldn't this be faster?
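From what I've read, the texture route would look something like this (untested sketch; GL_NEAREST filtering because I'd read texels by integer index with texelFetch, so no filtering should occur):

```cpp
// Per-instance positions packed into a 1D float texture.
// The ceiling becomes GL_MAX_TEXTURE_SIZE (thousands of texels)
// instead of the uniform storage limit.
GLuint posTexture;
glGenTextures(1, &posTexture);
glBindTexture(GL_TEXTURE_1D, posTexture);
glTexImage1D(GL_TEXTURE_1D, 0, GL_RGB32F, sphereCount, 0,
             GL_RGB, GL_FLOAT, positions.data());
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
```

with the shader sampling it by instance index:

```glsl
uniform sampler1D spherePositions;
// ...
vec3 offset = texelFetch(spherePositions, gl_InstanceID, 0).rgb;
```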
Thoughts? Did I overlook something that's causing such lackluster performance? Is instanced rendering even the right option in this case? Have I done anything here that wasn't completely stupid?
Thanks!