Writing a CPU Raytracer

Well, I decided to write to the file once at the end of the program; there's no point in writing to the output file after each pixel. I build the output in a string and use a string function to reserve space up front for the number of characters the file will contain, which is straightforward.
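Roughly what the reserve-then-write-once approach could look like in Swift. This is only a sketch, not the actual code from this project: the plain-text PPM format, the tiny image size, the `renderPPM` helper, and the temp-file path are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical image size; the real renderer's dimensions are not shown in the post.
let width = 4
let height = 2

// Assumed output format: plain-text PPM ("P3"). Each pixel line is at most
// 12 characters ("255 255 255\n"), so the final size can be estimated up front.
func renderPPM(width: Int, height: Int,
               pixel: (Int, Int) -> (UInt8, UInt8, UInt8)) -> String {
    var out = String()
    // Reserve once so appends don't repeatedly grow the buffer.
    out.reserveCapacity(32 + width * height * 12)
    out += "P3\n\(width) \(height)\n255\n"
    for y in 0..<height {
        for x in 0..<width {
            let (r, g, b) = pixel(x, y)
            out += "\(r) \(g) \(b)\n"
        }
    }
    return out
}

// Stand-in for the ray tracer: a simple gradient.
let ppm = renderPPM(width: width, height: height) { x, y in
    (UInt8(x * 60), UInt8(y * 120), 128)
}

// Single write at the end of the program, instead of one write per pixel.
let url = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("out.ppm")
try ppm.write(to: url, atomically: true, encoding: .utf8)
```

The capacity is just an upper-bound estimate; overshooting a little costs some memory, undershooting only means the string grows once or twice.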

Took a look at Instruments. Seems pretty neat. Did some time profiling. The program basically spends all of its time in drand48(). Played around with the Counters instrument. Looks like you can record any performance counters you want. There are a lot of different CPU counters. Not sure what I'd be looking for if I wanted to see L2 misses. Mostly just curious about how to use the tools.

Edit:
L2_RQSTS.MISS
Recorded about 30 seconds of run time and had almost 227 million L2 misses. So that's interesting. The function that has the vast majority of the misses is a deeply recursive function, so I guess that makes sense.

Another edit:
I guess what I'd really care about is an L3 miss? Doesn't seem like there are obvious L3 miss counters.
 
Last edited:
Haven't had a chance to work on it this week. I did actually end up buying a new PC for other projects, but will probably continue this one on my MacBook. I'd like to play around with Swift a bit more, do some performance testing, and try some alternate implementations of what I've done. I'd like to play with classes vs structs and see what the performance implications are in terms of passing parameters.
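For reference, a minimal sketch of the difference I'd be testing. The `Vec3Struct` and `Vec3Class` types and the two scaling functions are made-up names for illustration, not types from the raytracer. It only shows the semantics of passing each kind of value; actual timings would have to come from profiling.

```swift
// Hypothetical vector types; names are illustrative only.
struct Vec3Struct {
    var x: Double, y: Double, z: Double
}

final class Vec3Class {
    var x: Double, y: Double, z: Double
    init(x: Double, y: Double, z: Double) {
        self.x = x; self.y = y; self.z = z
    }
}

// A struct parameter has value semantics: the function works on its own
// copy, and the caller's value is left untouched.
func scaled(_ v: Vec3Struct, by s: Double) -> Vec3Struct {
    var r = v
    r.x *= s; r.y *= s; r.z *= s
    return r
}

// A class parameter is a reference: the function mutates the caller's object.
func scale(_ v: Vec3Class, by s: Double) {
    v.x *= s; v.y *= s; v.z *= s
}

let a = Vec3Struct(x: 1, y: 2, z: 3)
let b = scaled(a, by: 2)   // a is unchanged, b holds the scaled copy

let c = Vec3Class(x: 1, y: 2, z: 3)
scale(c, by: 2)            // c itself is modified

print(a.x, b.x, c.x)
```

Class instances are heap-allocated and reference-counted, while small structs can be stored inline (for example directly inside an array), so that's where I'd expect any performance difference to show up.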
 
I think one of the things that confused me about data-oriented design was not understanding how CPU prefetching works. It was maybe the missing piece. I do understand that accessing the heap is expensive because of cache misses, but I hadn't made the connection to prefetching data into the cache.
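A small sketch of the access patterns involved, under my own assumptions: the array size, `sumInOrder`, and `sumByIndex` are hypothetical, and a real test would use data much larger than the last-level cache and measure it in Instruments. Walking a contiguous array in order is a pattern the hardware prefetcher can predict, so upcoming cache lines are typically loaded before they're needed. Jumping around by shuffled index (similar to chasing references to scattered heap objects) gives it nothing to predict.

```swift
// Hypothetical size; increase it well beyond cache size for a real measurement.
let count = 1 << 16
let values = (0..<count).map { Double($0) }

// The same indices in random order, standing in for scattered heap accesses.
let scattered = (0..<count).shuffled()

// Sequential walk over contiguous memory: prefetcher-friendly.
func sumInOrder(_ data: [Double]) -> Double {
    var total = 0.0
    for v in data { total += v }
    return total
}

// Same work, but each load lands at an unpredictable address.
func sumByIndex(_ data: [Double], _ order: [Int]) -> Double {
    var total = 0.0
    for i in order { total += data[i] }
    return total
}

let sequentialSum = sumInOrder(values)
let scatteredSum = sumByIndex(values, scattered)
```

Both functions compute the same sum; only the order of memory accesses differs, which is the part data-oriented design is trying to control.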
 