Real-Time Ray Tracing: Holy Grail or Fools’ Errand? *Partial Reconstruction*

Are you serious? MY bias? If anything, all of my previous articles on the topic have leaned towards ray tracing being the superior long-term option.
That was bleedingly obvious, yes :)
This all points to work that hardware and software developers have done over the years to overcome those inherent drawbacks that rasterization has for some rendering.
How do you go from the statement of fact that they both need and use hierarchies to that?
(To quote Dr. Kirk again: "Rasterization is blisteringly fast, but not well-suited to all visual effects.")
Yes, rasterization is fast for well-suited visual effects. Without qualification, not because of years of research and a head start on raytracing: a simple statement of fact. David Kirk was saying rasterization is inherently faster for some things; that is why he suggested hybrid rendering.

If you had said that was because of his bias and that actually he was misguided or lying his face off ... you would still have been wrong, but at least you wouldn't have put words in his mouth which basically contradict what he actually said.

Once again ...
NVIDIA's stance is that rasterization is not inherently worse for gaming than ray tracing, if only because of all the years of work and research that have gone into it up to today.
That's not what David Kirk said ... not in the fucking remotest sense. You are reading everything he says with blinders on ... blinders with teleprompters inside :) If you read what he said without bias there is only one simple conclusion: NVIDIA's stance is that rasterization is inherently better for some parts of game rendering.

PS. Of course that is what they would say as a company making rasterizers, but in this case their PR and the truth coincide, IMO. In the far future those parts of game rendering will make up such a small slice of the total rendering time that you can simply forget about rasterization, which is why ray tracing is the technology of the future ... but not of the near future.

PPS. When David Kirk says you are leaning the wrong way about 3D computer graphics, you should consider it a real possibility ...
 
I'm still not sure why I should believe that will be the case though? I agree it's likely, just not certain. Larrabee, because it's x86, has significant ISA overhead which can only be bypassed by creating a new vector-based API. Other architectures can be much more creative in terms of ISA to achieve other potentially more appealing trade-offs.

That implies that (in theory) a from-the-ground-up architecture can be (slightly?) more flexible than Larrabee for a given level of performance, not less. So the correct question to ask is not whether GPUs can be more flexible; they obviously can. The question is whether they want to be. Certainly if I was starting the design of a next-generation GPU today with the threat of Larrabee in mind, flexibility would be pretty high on my to-do list... Same for one year ago. Two years ago? That becomes a much more difficult (and interesting) question.

I'm 100% certain that ray tracing is NOT the path Intel is pushing towards. Let's take your shadow point further: Larrabee is 16-way SIMD, so the only efficient raytracing will be packet tracing with a multiple of 16 non-divergent rays at a time. Secondary rays simply don't have high enough computational locality to be worth doing; they are just too divergent. Re-sorting secondary rays into non-divergent ray packets is way too expensive (doesn't map well to SIMD). Almost all the truly impressive graphical phenomena require too many secondary rays: soft shadows, subsurface scattering, participating media, motion blur, transparency, depth of field, etc. The true strength in rendering is that most computations are kept with good 2D computational locality and you can use logarithmic reductions in complexity to simulate tough-to-compute phenomena.
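To make the divergence point concrete, here is a toy sketch, written in CUDA only because I can't write Larrabee code, with a 32-thread warp standing in for a 16-wide packet; the "BVH" is completely fake and every name is made up. The point is simply that once the per-ray branch decisions stop agreeing across the packet, the SIMD unit executes both sides of every branch with most lanes masked off, which is exactly why incoherent secondary rays are so expensive.

```
#include <cuda_runtime.h>

// Toy sketch only: one 32-thread CUDA warp stands in for one 16-wide
// Larrabee packet.  The branch direction and termination below depend on
// the individual ray, so once the rays in a packet disagree (typical for
// secondary rays) the hardware serializes both sides of every branch.
__global__ void tracePacket(const float3* dirs, float* tHit, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 d = dirs[i];
    float  t = 1e30f;

    // Stand-in for a BVH descent: which child we take is per-ray.
    int node = 0;
    for (int depth = 0; depth < 24 && node >= 0; ++depth) {
        bool goLeft = (d.x * depth - d.y) > 0.0f;            // fake traversal test
        if (goLeft) { t = fminf(t, d.z + depth); node = 2 * node + 1; }
        else        { t = fminf(t, d.z - depth); node = 2 * node + 2; }
        if (node > 4096 || d.z * depth > 16.0f) node = -1;   // fake leaf hit
    }
    tHit[i] = t;
}
```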

Getting back to Larrabee, I bet Intel is going to simply try and dominate the middle-end GPU market by getting Larrabee on all motherboards and getting Larrabee II into the next XBOX. I wouldn't be surprised (Intel has the Project Offset team, TomF, and others) if Intel plans to release Larrabee at the same time as their own optimized in-house graphics engine for developers. Something with built-in, very flexible virtual texturing support, something with a very good programmable post-processing pipeline, etc. Reducing the need for external developers to tackle the tough parallel programming problems and simply allowing them to maintain the lazy C++ fine-grained-locking style of parallel programming that most devs want for all the core game code (Larrabee is x86 after all). While x86 may be a crux for performance, it is great at making lazy programmers' code perform well.

Personally I would rather manage my own cache/localstore and my ideal CPU+GPU machine would be something very similar to a mix of the Cell and CUDA, but that kind of thinking is so very rare these days!

I would like to see five things from NVidia's future offerings to compete against Larrabee,

1.) Ability to run more than one shader program at a time on one of the cores (meaning the core would have to schedule from at least a second instruction pointer). This would allow interleaving of programs, so you could pair a highly TEX- or ROP-bound shader with a highly ALU-bound shader and better keep the various pipelines fully saturated. It would solve lots of problems, such as filtering operations always being TEX bound, g-buffer creation being ROP bound, etc. (but might not be worth the extra hardware complexity).

2.) Ability to use a shader-shared local store (like CUDA). Shipping hardware already has support; it's just an API issue (see the CUDA sketch after this list).

3.) Ability to use a programmable read/write surface cache (useful for a programmable ROP, etc.). Probably already planned for future hardware; probably just a question of the API exposing the functionality.

4.) Ability to do process control from the GPU instead of requiring the CPU to build a command buffer.

5.) Double precision. Probably here in 2008 or 2009.
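
To make item 2 concrete, here is a minimal CUDA sketch (everything in it is just illustrative, not anything shipping) of what I mean by a shader-shared local store: the threads of a block cooperatively load a tile plus its apron into __shared__ memory once, then every thread filters out of that tile instead of re-fetching its neighbours from DRAM or the texture units.

```
#include <cuda_runtime.h>

#define RADIUS 4
#define BLOCK  128   // launch with BLOCK threads per block

// 1D box filter that stages its inputs in the shared local store.
__global__ void boxFilter1D(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    // Cooperative load of the block's tile plus an apron on each side.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int l = gid - RADIUS;                        // left apron element
        int r = gid + BLOCK;                         // right apron element
        tile[threadIdx.x] = (l >= 0) ? in[l] : 0.0f;
        tile[lid + BLOCK] = (r <  n) ? in[r] : 0.0f;
    }
    __syncthreads();

    if (gid >= n) return;
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k)
        sum += tile[lid + k];                        // shared memory, not DRAM
    out[gid] = sum / (2 * RADIUS + 1);
}
```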
 
I'm 100% certain that ray tracing is NOT the path Intel is pushing towards.
I'm not sure I would say they're not pushing towards it, but I would definitely agree that I see no serious indication whatsoever that they are really pushing towards pure raytracing with the 2010 incarnation of the Larrabee architecture. As for when they really will be, who the hell knows. So overall I guess we pretty much agree here.

Let's take your shadow point further: Larrabee is 16-way SIMD, so the only efficient raytracing will be packet tracing with a multiple of 16 non-divergent rays at a time. Secondary rays simply don't have high enough computational locality to be worth doing; they are just too divergent. Re-sorting secondary rays into non-divergent ray packets is way too expensive (doesn't map well to SIMD).
I think I'd agree 100% with you if I agreed that Larrabee is 16-way SIMD. Remember Larrabee is a CPU, so it has scalar MIMD units in each core too. If they are smart enough, they can benefit from this hybrid MIMD-SIMD approach significantly for both raytracing and physics in general. Whether they are smart enough is another question completely, but if I were managing a strategy team at NVIDIA or AMD, I certainly wouldn't want to underestimate Intel too much in that respect.

Back before the G80 launched, I was a proponent of *not* going unified because I felt that flexibility (including GPGPU flexibility) would actually increase through a hybrid MIMD-SIMD solution rather than a plain SIMD one. Of course, ideally in terms of flexibility you'd go scalar MIMD all the way (with VLIW or other more exotic approaches), but that is another debate completely.

The true strength in rendering is that most computations are kept with good 2D computational locality and you can use logarithmic reductions in complexity to simulate tough-to-compute phenomena.
Yes, even if Larrabee can benefit from a hybrid MIMD-SIMD approach, it won't magically be able to fix the locality problem in terms of memory bandwidth burst size - there simply is no magical solution to that problem. Certainly you can cache the top levels of the acceleration structure, but that doesn't fix all of the problem.

Getting back to Larrabee, I bet Intel is going to simply try and dominate the middle-end GPU market by getting Larrabee on all motherboards and getting Larrabee II into the next XBOX.
Not a bad bet, I'd wager.
I wouldn't be surprised (Intel has the Project Offset team, TomF, and others) if Intel plans to release Larrabee at the same time as their own optimized in-house graphics engine for developers.
And that's an excellent bet, IMO. A few years ago, I started thinking that given the direction the console market was going and the rate at which development costs were going up, NVIDIA and ATI absolutely needed to create free or near-free AAA middleware to spur growth in the market. And now it looks like it's Intel which has adopted that strategy instead, at least up to a certain extent - I certainly can't help but congratulate them for doing what I thought their competitors should be doing all along!

Reducing the need for external developers to tackle the tough parallel programming problems and simply allowing them to maintain the lazy C++ fine-grained-locking style of parallel programming that most devs want for all the core game code (Larrabee is x86 after all). While x86 may be a crux for performance, it is great at making lazy programmers' code perform well.
Yeah, I agree with that. Although I'd like to point out that in terms of perf/mm²/(amount of effort), x86 is a complete and utter joke, and x86 many-core is even more of a joke. It's not the right direction for the industry to take, but this is a complex debate and I don't feel it's either the time or the place to have it.

Personally I would rather manage my own cache/localstore and my ideal CPU+GPU machine would be something very similar to a mix of the Cell and CUDA, but that kind of thinking is so very rare these days!
Yeah, that'd be pretty cool. Personally, my ideal processor is a mix of PowerVR's SGX and Ambric's technology. That's a complex debate once again though, so if you want to have it we can create a new thread for that. Alternatively, I was thinking about writing an article on that kind of stuff...

1.) Ability to run more than one shader program at a time on one of the cores (meaning the core would have to schedule from at least a second instruction pointer). This would allow interleaving of programs, so you could pair a highly TEX- or ROP-bound shader with a highly ALU-bound shader and better keep the various pipelines fully saturated. It would solve lots of problems, such as filtering operations always being TEX bound, g-buffer creation being ROP bound, etc. (but might not be worth the extra hardware complexity).
Uhm, isn't this the case already? It *is* limited, but it does happen as far as I can tell. You can certainly have one PS program and one VS program running on the same multiprocessor on G8x or at the same time on R6xx.

2.) Ability to use a shader-shared local store (like CUDA). Shipping hardware already has support; it's just an API issue.
Agreed.

3.) Ability to use a programmable read/write surface cache (useful for a programmable ROP, etc.). Probably already planned for future hardware; probably just a question of the API exposing the functionality.
There are problems with that, but I agree it's necessary, and I also agree part of the problem will be API exposure.

4.) Ability to do process control from the GPU instead of requiring the CPU to build a command buffer.
Agreed, having MIMD on the same chip as where you're doing graphics processing is very important in my book.

5.) Double precision. Probably here in 2008 or 2009.
That's in GT200, see the related thread... :)

Sorry if this reply might have felt a bit critical or negative, I do believe your post was very good - although I obviously have a different opinion of the MIMD capabilities of Larrabee. For Intel's sake, I hope I'm right, but that doesn't mean I am obviously.
 
Yeah, I agree with that. Although I'd like to point out that in terms of perf/mm²/(amount of effort), x86 is a complete and utter joke, and x86 many-core is even more of a joke. It's not the right direction for the industry to take, but this is a complex debate and I don't feel it's either the time or the place to have it.

100% agree with x86 perf/mm^2 being really really bad!

Yeah, that'd be pretty cool. Personally, my ideal processor is a mix of PowerVR's SGX and Ambric's technology. That's a complex debate once again though, so if you want to have it we can create a new thread for that. Alternatively, I was thinking about writing an article on that kind of stuff...

You definitely should do the article and thread. I hadn't heard about Ambric prior to this post and the technology looks rather interesting. Will have to dive into it later.

Uhm, isn't this the case already? It *is* limited, but it does happen as far as I can tell. You can certainly have one PS program and one VS program running on the same multiprocessor on G8x or at the same time on R6xx.

I don't know enough about the AMD HD cards to say, but I'm not sure if this happens on NVidia's 8 series. Obviously they overlap execution of the vertex and fragment shaders of a draw call to keep the FF hardware busy (tri setup, etc.); however I've been thinking this is done by running the vert and frag programs on different cores and load balancing that way. I'm thinking that each 8-wide SIMD core has only one instruction pointer shared across all threads/warps (probably to keep the hardware thread scheduler simple, fast, and area efficient). Of course this is all speculation on my part.

Sorry if this reply might have felt a bit critical or negative, I do believe your post was very good - although I obviously have a different opinion of the MIMD capabilities of Larrabee. For Intel's sake, I hope I'm right, but that doesn't mean I am obviously.

Not at all critical or negative.
 
100% agree with x86 perf/mm^2 being really really bad!
Yup, and the point I also wanted to emphasize was that perf/mm²/effort isn't good either. The only perf/effort advantage x86 has is in terms of development environment, and honestly it shouldn't be exaggerated. On the other hand, I'm quite convinced raw perf/effort and perf/mm²/effort can be noticeably higher with other architectures and ISAs even for general-purpose code.

You definitely should do the article and thread. I hadn't heard about Ambric prior to this post and the technology looks rather interesting. Will have to dive into it later.
Yes, Ambric is definitely interesting stuff. Bob Colwell (the P6 guy) is also on their advisory board. If you know where to look though, there are plenty of really smart things being thought of in the startup world. Computer architecture really should be about so much more than what is typically talked about... Anyhow, I'll ponder writing an article on that in the coming weeks, although if I want it to be really good it'd have to be quite ambitious.

however I've been thinking this is done by running the vert and frag programs on different cores and load balancing that way. I'm thinking that each 8-wide SIMD core has only one instruction pointer shared across all threads/warps (probably to keep the hardware thread scheduler simple, fast, and area efficient). Of course this is all speculation on my part.
Erik Lindholm's original patent on the G80 shader core clearly implies that certain embodiments of the invention can run multiple programs per multiprocessor:
From US7038686 (http://v3.espacenet.com/textdes?DB=EPODOC&IDX=US7038686&F=0&QPN=US7038686):
Alternatively, in an embodiment permitting multiple programs for two or more thread types, Thread Control Unit 320 also receives a program identifier specifying which one of the two or more programs the program counter is associated with. Specifically, in an embodiment permitting simultaneous execution of four programs for a thread type, two bits of thread state information are used to store the program identifier for a thread. Multithreaded execution of programs is possible because each thread may be executed independent of other threads, regardless of whether the other threads are executing the same program or a different program.
And in a later patent, also obviously related to G8x:
From WO2007111743 (http://v3.espacenet.com/textdes?DB=EPODOC&IDX=WO2007111743&F=0&QPN=WO2007111743):
[0054] Pixel controller 306 delivers the data to core interface 308, which loads the pixel data into a core 310, then instructs the core 310 to launch the pixel shader program. Where core 310 is multithreaded, pixel shader programs, geometry shader programs, and vertex shader programs can all be executed concurrently in the same core 310. Upon completion of the pixel shader program, core interface 308 delivers the processed pixel data to pixel controller 306, which forwards the pixel data PDATA to ROP unit 214 (FIG. 2).

[0055] It will be appreciated that the multithreaded core array described herein is illustrative and that variations and modifications are possible. Any number of processing clusters may be provided, and each processing cluster may include any number of cores. In some embodiments, shaders of certain types may be restricted to executing in certain processing clusters or in certain cores; for instance, geometry shaders might be restricted to executing in core 310(0) of each processing cluster. Such design choices may be driven by considerations of hardware size and complexity versus performance, as is known in the art. A shared texture pipeline is also optional; in some embodiments, each core might have its own texture pipeline or might leverage general-purpose functional units to perform texture computations.
That quote was a bit longer than it had to be, but I thought the second paragraph was interesting for a variety of reasons (not only related to your point) so I thought I'd post it anyway... :)
 
I'm thinking that each 8-wide SIMD core has only one instruction pointer shared across all threads/warps (probably to keep the hardware thread scheduler simple, fast, and area efficient).
That would seem to make sense given the control flow coherence requirements of G80. If the hardware could load balance/schedule independent programs at a smaller granularity than that, then there would be no reason why the control flow granularity needs to be that "high" (16ish). Single programs with control flow and different programs are really quite similar concepts as far as SIMD/scheduling/coherence goes, although of course they may differ in data paths, particularly for typical graphics pipelines. I'm skeptical that G80/92 can handle different control paths/programs individually within one SIMD unit (be that the 8-way or full 16-way SM... not sure of the limitations here). i.e. despite its "scalar" interface, G80 is still an 8/16-way SIMD machine at its core.
 
That would seem to make sense given the control flow coherence requirements of G80. If the hardware could load balance/schedule independent programs at a smaller granularity than that, then there would be no reason why the control flow granularity needs to be that "high" (16ish). Single programs with control flow and different programs are really quite similar concepts as far as SIMD/scheduling/coherence goes, although of course they may differ in data paths, particularly for typical graphics pipelines. I'm skeptical that G80/92 can handle different control paths/programs individually within one SIMD unit (be that the 8-way or full 16-way SM... not sure of the limitations here). i.e. despite its "scalar" interface, G80 is still an 8/16-way SIMD machine at its core.
There's a difference between load balancing at a smaller granularity and swapping warps. To swap warps you only need to save and restore a certain amount of state, including a program counter. So if one warp is busy waiting (in the warp buffer) on a texture fetch, other warps can run ALU instructions. So if you have a texture-heavy program and an ALU-heavy program, they should run mostly in parallel. I think the real difficulty is interleaving such programs when history has shown large batch sizes are beneficial.
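
As a rough illustration of the warp-swapping point (a CUDA sketch with made-up numbers, not a claim about any particular chip): nothing in the kernel below overlaps anything explicitly; the overlap between the long-latency fetch and the ALU loop comes entirely from having enough resident warps for the scheduler to swap in while fetches are outstanding.

```
#include <cuda_runtime.h>

// Each thread does one long-latency global fetch followed by dependent ALU
// work.  While one warp is parked waiting on its fetch, the multiprocessor
// issues the ALU loop of other resident warps.
__global__ void fetchThenMath(const float* src, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = src[i];               // long-latency load: this warp stalls
    float a = v;                    // at the first dependent use
    for (int k = 0; k < 64; ++k)    // ALU-heavy stretch that other warps
        a = a * 1.0001f + 0.25f;    // execute while loads are in flight
    dst[i] = a;
}
```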

The SIMD nature is on a thread basis. Note that I've used Nvidia terminology here. AMD terminology is different.
 
I think the real difficulty is interleaving such programs when history has shown large batch sizes are beneficial.

The SIMD nature is on a thread basis. Note that I've used Nvidia terminology here. AMD terminology is different.

BTW, Arun, thanks for the patent links. They are going to require another set of re-reads to fully digest.

3dcgi, I think you have brought up something very important here: even if the hardware can load balance between two draw calls, drivers might well be designed to load balance only between the vertex and fragment work of one draw call, and to interleave other draw calls only at the beginning and end of the draw call (i.e. serialize draw calls). Obviously interleaving two large (in number of pixels processed) TEX-bound pixel shaders would be very bad for the texture cache (it would perform worse), so the driver would have to be really smart about what programs to run simultaneously vs in series ... or have an API for the programmer to instruct the driver what to run in series vs in parallel (something I was hoping to get some day).

My comment was mostly about having the ability to run a full-screen-quad-based post-processing kernel (highly TEX bound) in parallel with other ALU-bound draw calls. While this might not be as important in this generation, I think having a fully programmable (somewhat film-CG-like) post-processing / compositing pipeline (TEX bound mostly) will indeed be an integral part of future rendering engines. This is something which still doesn't map well to GPUs and doesn't look like it will map well to Larrabee either.

Arun, looks like Ambric would be ideal for this (in area/performance as well), in that intermediate results mostly do NOT have to go to memory and can simply be directly used (read from hardware queues) in the next processing step.
 
There's a difference between load balancing at a smaller granularity and swapping warps. To swap warps you only need to save and restore a certain amount of state, including a program counter. So if one warp is busy waiting (in the warp buffer) on a texture fetch, other warps can run ALU instructions.
Right, but the fundamental hardware is still running operations in 8/16-wide SIMD. You can't split that up any further without having independent PCs and if you had that there would be no reason to disallow control flow divergence at a smaller granularity (at 100% efficiency).

"Multiple programs" runnings "simultaneously" can be implemented in terms of control flow, and the hardware will block that up to run at the very best different control flow paths on the different SMs (or half-SMs?) to my knowledge. As I mentioned, the real difference here is that fully "independent" programs would have different data paths (particularly ROP-level), although control flow in the vertex shader could certainly scatter data arbitrarily into memory. Thus what you're asking for could certainly be implemented in terms of control flow.

The remaining issue if you wanted the hardware to run things simultaneously over several program invocations is data dependencies. The driver would have to track and optimize for those in a fairly non-trivial way, something which is handled "automatically" (by the programmer) if you explicitly use control flow.
 
I think the main challenge is going to be in good data partitioning, unless you want to run all those cores through the same set of caches. So you need a different way to do texture lookups and such. Like, instead of fetching the texel and swapping the thread, you first make a list of all the texels you need to look up on that texture, and process all of those in one go on a single core.
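
Something like the following two-pass sketch is what I have in mind, written as CUDA purely for illustration; all the names are made up, and a real version would also bin or sort the requests by texture tile so each batch stays inside a single core's cache: first record every texel lookup into a list, then service the whole list for that texture in one go.

```
#include <cuda_runtime.h>

struct TexRequest { int pixel; float u, v; };

// Pass 1: don't fetch anything yet, just record which texel each pixel needs.
__global__ void emitRequests(const float2* uv, TexRequest* reqs, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    TexRequest r;
    r.pixel = i; r.u = uv[i].x; r.v = uv[i].y;
    reqs[i] = r;
}

// Pass 2: service every recorded lookup for this texture in one sweep
// (nearest-neighbour sampling, to keep the sketch short).
__global__ void serviceRequests(const TexRequest* reqs, const float* texels,
                                int texW, int texH, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    TexRequest r = reqs[i];
    int x = min(max((int)(r.u * texW), 0), texW - 1);
    int y = min(max((int)(r.v * texH), 0), texH - 1);
    out[r.pixel] = texels[y * texW + x];
}
```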
 
My comment was mostly in regards to having the ability to run a full screen quad based post processing kernel (highly TEX bound) in parallel with other ALU bound draw calls. While this might not be as important in this generation, I think having a fully programmable (somewhat film CG like) post processing / compositing pipeline (TEX bound mostly) will indeed be an integral part of future rendering engines. This is something which still doesn't map well to GPUs and doesn't look like will map well to Larrabee either.
That might be fairly parallelizable today. The first draw call will set up the pixels very fast and then they'll stall for a while in the pixel shaders. Immediately following, a more ALU-bound draw will start hitting the pixel shaders. If there is a decent amount of work in the post-processing kernel, many of the pixels will likely still be around. I guess how much the ALU work can fill in the gap depends on how many warps can be in flight at one time.

Right, but the fundamental hardware is still running operations in 8/16-wide SIMD. You can't split that up any further without having independent PCs and if you had that there would be no reason to disallow control flow divergence at a smaller granularity (at 100% efficiency).
I agree with that. I think I misinterpreted your initial comment and thought you said something you didn't. When you referred to SIMD I was thinking you meant that as an encompassing term for Nvidia's multiprocessor/cluster.
 
That might be fairly parallelizable today. The first draw call will set up the pixels very fast and then they'll stall for a while in the pixel shaders. Immediately following, a more ALU-bound draw will start hitting the pixel shaders. If there is a decent amount of work in the post-processing kernel, many of the pixels will likely still be around. I guess how much the ALU work can fill in the gap depends on how many warps can be in flight at one time.
Yeah, the interesting thing about these sorts of setups is the data paths. In particular, for the simple case of no killing/depth writes in the fragment shader (and no stencil, and a few other things) this sort of setup can work, but otherwise it gets a bit complicated. For instance, a triangle may be completely z-culled by one that you've "stalled", and it's hard to know how to handle these cases efficiently without knowing something about the triangles themselves. Now arguably a similar thing already occurs with different triangles in the same batch, but the situation becomes even more complicated when arbitrary state/code has changed for programs that are run in parallel. Now the "simple" solution is to only allow parallel execution of different draw calls if a number of conditions are met, which may already be done... it would certainly be something interesting to see, and probably not something that's terribly hard to test. Thus I expect to see it in the next B3D hardware review ;)

I agree with that. I think I misinterpreted your initial comment and thought you said something you didn't. When you referred to SIMD I was thinking you meant that as an encompassing term for Nvidia's multiprocessor/cluster.
Ah okay that makes sense. Sorry for being unclear - my thoughts were admittedly a bit jumbled and I'm not sure that I said what I meant to say (particularly the first time).
 
That might be fairly parallelizable today. The first draw call will set up the pixels very fast and then they'll stall for a while in the pixel shaders. Immediately following, a more ALU-bound draw will start hitting the pixel shaders. If there is a decent amount of work in the post-processing kernel, many of the pixels will likely still be around. I guess how much the ALU work can fill in the gap depends on how many warps can be in flight at one time.

After taking another really good re-read of the NVidia multiprocessor patent, it seems as if the patent covers too many different ways to formulate the hardware to really get a good idea of exactly what is in the 8- or 9-series hardware. One thing which seems only loosely implied (i.e. only by the diagrams) is that there is a thread controller unit per multiprocessor. I was thinking that there was one thread control unit for the entire board (which probably doesn't make sense). But it seems as if the patent definitely provides a hardware path to overlap execution of two draw calls.

Anyway, I have a few ideas on how to test whether the later 8-series hardware/driver can parallelize draw calls, but I will have to wait until the weekend to try them. If anyone else has ideas to add, I will try them and post the results. My first idea is to set up two full-screen pixel shaders: one TEX-bound only, with dependent texture reads, and another ALU-bound only, with a mix of parallel and dependent ALU operations. Then profile the following cases: time the shader draw calls individually (only issue one draw call, then sleep), then time the shaders together, in forward and reverse order, using the same render target (RT) and using different render targets. That should hopefully answer the question.
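
For what it's worth, here is roughly the shape of the two workloads in CUDA terms, not the actual D3D shaders I would write, just to pin down what I mean by "TEX-bound with dependent reads" and "ALU-bound only" (chain[] is assumed to hold indices in [0, n)).

```
#include <cuda_runtime.h>

// Fetch-bound kernel: every read depends on the previous one, so there is
// almost nothing for the ALUs to do while the loads are in flight.
__global__ void texBoundDependent(const int* chain, float* out, int n, int hops)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int idx = i;
    for (int k = 0; k < hops; ++k)
        idx = chain[idx];              // dependent "texture" reads
    out[i] = (float)idx;
}

// ALU-bound kernel: no memory traffic in the loop at all.
__global__ void aluBound(float* out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float a = i * 0.001f, b = 1.0f;
    for (int k = 0; k < iters; ++k) {
        a = a * 1.0001f + b;           // dependent chain
        b = b * 0.9999f + 0.5f;        // second chain to keep pipes busy
    }
    out[i] = a + b;
}
```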
 
Ryan has posted an interview with John Carmack discussing his thoughts on raytracing. The topic takes an unexpected turn when John details some of the work he is researching with a hybrid raster/raytrace method... but using raytracing with octrees for data storage and representation.

Fascinating stuff.

http://www.pcper.com/article.php?aid=532
 
Thanks for the link! Interestingly enough, he says many of the same things that we've been saying, although certainly more cleverly. Maybe now people will really try to figure out the best way to do rendering instead of just attaching themselves to dogmatic views... but I'm not bitter ;)

Seriously though, I'm glad that David Kirk, John Carmack and Matt Pharr have all presented somewhat more reasonable views on the whole situation. Like them, I'm excited about the utility of more complex data structures and algorithms (including raytracing), but I see nothing to indicate that there will be any pressing reason to get rid of rasterization any time soon.

I saw the quote from Intel about a hybrid approach making no sense, and I disagree with that.
He's so much more diplomatic than me though ;)
 
Heh, it's no leap to say that rasterization will not be leaving anytime soon. It is what developers know, and it is what the VAST majority of hardware supports. Still, I am always in support of doing things more intelligently, and there are compelling concepts behind each of the rendering architectures that certainly could be combined to make better pixels for our viewing (and fragging) pleasure. The idea of shooting rays at data structures to retrieve that data rather than just "create a pixel" is pretty novel. I guess it is all data in the end.
 
Well at least we know it's best that all graphics hardware becomes fully generalized and--

It’s interesting in that the algorithms would be something that, it’s almost unfortunate in the aspect that these algorithms would take great advantage of simpler bit-level operations in many cases and they would wind up being implemented on this 32-bit floating point operation-based hardware. Hardware designed specifically for sparse voxel ray casting would be much smaller and simpler and faster than a general purpose solution but nobody in their right mind would want to make a bet like that and want to build specific hardware for technology that no one has developed content for.

Heehee.

Not that this particular future should come to pass, but it would be another iteration of the great circle of generalization to specialization and back.

I haven't heard about voxels in a while now.
I recall vaguely there were some stumbling blocks with those, including some wicked patents (edit: non-technical obstacles) on some pretty basic algorithms (marching cubes, I think, is or was one of them).
 
Another injection of good old common sense... plus some teasing (now I want to know what he is working on... :) )
 
Specialization doesn't matter as much if you can generalize around the problem rather than around an arbitrary ISA. But in graphics, you have so many different problems that you likely can't achieve desirable characteristics for all of them on the same core, so fixed-function hardware becomes especially attractive (IMO). And how many times do I still need to point out that rudimentary calculations based on 'die' shots of the Raytracing FPGA clearly indicate that raytracing would also benefit from special-purpose hardware?

Anyhow, excellent interview! I agree with John on just about everything, although I can't help but ponder what he has in mind in terms of toolsets for creating unique static objects everywhere...
 