Writing a CPU Raytracer

Discussion in 'Beginners Zone' started by Scott_Arm, Mar 25, 2018.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    12,876
    Likes Received:
    2,971
    Well, I decided to write to the file once, at the end of the program. There's no point in writing to the output file after each pixel. I just used a string function to reserve space for the number of characters the file will contain, which is straightforward.
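A minimal Swift sketch of the "build in memory, write once" approach. It assumes the output is an ASCII P3 PPM and that each pixel is already three 0-255 integers; the post doesn't name the format or show its code, and `makePPM` / `writePPM` are hypothetical names. The reserved capacity is an upper bound, since a pixel line is at most 12 characters ("255 255 255\n").

```swift
import Foundation

/// Builds an ASCII (P3) PPM image in memory (hypothetical helper).
/// `pixel(x, y)` returns (r, g, b) components in 0...255.
func makePPM(width: Int, height: Int,
             pixel: (Int, Int) -> (Int, Int, Int)) -> String {
    let header = "P3\n\(width) \(height)\n255\n"
    var out = header
    // Reserve once up front: "255 255 255\n" is 12 characters per pixel at most.
    out.reserveCapacity(header.utf8.count + width * height * 12)
    for y in 0..<height {          // rows top to bottom, as PPM expects
        for x in 0..<width {
            let p = pixel(x, y)
            out += "\(p.0) \(p.1) \(p.2)\n"
        }
    }
    return out
}

/// Writes the whole buffer in a single call at the end of the program.
func writePPM(_ image: String, to path: String) throws {
    try image.write(toFile: path, atomically: true, encoding: .utf8)
}
```

For example, `try writePPM(makePPM(width: 200, height: 100) { x, y in (x % 256, y % 256, 51) }, to: "out.ppm")` renders into one string and does a single file write.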

    Took a look at Instruments. Seems pretty neat. Did some time profiling: the program spends basically all of its time in drand48(). I also played around with the counters instrument. It looks like you can record any performance counters you want, and there are a lot of different CPU counters. I'm not sure what I'd be looking for if I wanted to see L2 misses; mostly I'm just curious about how to use the tools.
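If `drand48()` really dominates the profile, one common option is a small generator whose state you own, so each render thread can keep its own copy with no shared global state. The sketch below swaps in xorshift64* (Vigna's variant of Marsaglia's xorshift); it is not something the post used, it's fine for sampling jitter but not for anything cryptographic, and `XorShift64Star` / `randomUnit()` are hypothetical names.

```swift
/// xorshift64* PRNG (a swapped-in technique, not from the original post).
/// Conforms to the standard library's RandomNumberGenerator protocol.
struct XorShift64Star: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        // xorshift state must never be zero, or it stays zero forever.
        state = seed == 0 ? 0x9E37_79B9_7F4A_7C15 : seed
    }

    mutating func next() -> UInt64 {
        state ^= state >> 12
        state ^= state << 25
        state ^= state >> 27
        return state &* 0x2545_F491_4F6C_DD1D   // wrapping multiply
    }

    /// Uniform Double in [0, 1): top 53 bits scaled by 2^-53.
    mutating func randomUnit() -> Double {
        Double(next() >> 11) * 0x1.0p-53
    }
}
```

Because it conforms to `RandomNumberGenerator`, it also works with the standard APIs, e.g. `Double.random(in: 0..<1, using: &rng)`.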

    Edit:
    L2_RQSTS.MISS
    Recorded about 30 seconds of run time and had almost 227 million L2 misses. So that's interesting. The function that has the vast majority of the misses is a deeply recursive function, so I guess that makes sense.
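For scale, here is the rough rate that count implies, assuming each L2 miss pulls in one 64-byte cache line (a typical line size on current x86 CPUs; the post doesn't state one):

```latex
\frac{227\times10^{6}\ \text{misses}}{30\ \text{s}} \approx 7.6\times10^{6}\ \text{misses/s},
\qquad
7.6\times10^{6}\ \text{misses/s}\times 64\ \text{B} \approx 4.8\times10^{8}\ \text{B/s} \approx 0.48\ \text{GB/s}
```

An L2 miss that hits in L3 is much cheaper than one that goes out to DRAM, so this count alone doesn't say how much time is actually lost.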

    Another edit:
    I guess what I'd really care about is L3 misses? There don't seem to be any obvious L3 miss counters.
     
    #21 Scott_Arm, Apr 7, 2018
    Last edited: Apr 7, 2018
  2. Scott_Arm

    Haven't had a chance to work on it this week. I did end up buying a new PC for other projects, but I'll probably continue this one on my MacBook. I'd like to play around with Swift a bit more, do some performance testing, and try some alternate implementations of what I've done. In particular, I'd like to compare classes vs. structs and see what the performance implications are for passing parameters.
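A small Swift starting point for that class vs. struct comparison: the same 3-vector as a value type and as a reference type, plus a crude timer. `Vec3`, `Vec3Ref` and `timeIt` are hypothetical names. Small structs like this are typically passed in registers, while class instances are separate heap allocations whose references may incur ARC retain/release traffic; the optimizer can remove some of that (including stack-promoting short-lived objects), so numbers should come from an `-O` build and be read with care.

```swift
import Dispatch

/// Value type: copies are independent, no heap allocation per instance.
struct Vec3 {
    var x, y, z: Double
}

/// Reference type with the same fields: each instance is its own heap object.
final class Vec3Ref {
    var x, y, z: Double
    init(_ x: Double, _ y: Double, _ z: Double) { self.x = x; self.y = y; self.z = z }
}

func dot(_ a: Vec3, _ b: Vec3) -> Double { a.x * b.x + a.y * b.y + a.z * b.z }
func dot(_ a: Vec3Ref, _ b: Vec3Ref) -> Double { a.x * b.x + a.y * b.y + a.z * b.z }

/// Crude wall-clock timer in seconds; prints the result so the work isn't optimized away.
func timeIt(_ label: String, _ body: () -> Double) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = body()
    let elapsed = Double(DispatchTime.now().uptimeNanoseconds - start) / 1e9
    print("\(label): \(elapsed) s (result \(result))")
    return elapsed
}

let n = 1_000_000
_ = timeIt("struct") {
    var sum = 0.0
    for i in 0..<n { sum += dot(Vec3(x: Double(i), y: 1, z: 1), Vec3(x: 1, y: 2, z: 3)) }
    return sum
}
_ = timeIt("class") {
    var sum = 0.0
    for i in 0..<n { sum += dot(Vec3Ref(Double(i), 1, 1), Vec3Ref(1, 2, 3)) }
    return sum
}
```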
     
  4. Scott_Arm

    I think one of the things that confused me about data-oriented design was not understanding how CPU prefetching works. That was maybe the missing piece. I do understand how heap accesses get expensive when they cause cache misses, but I hadn't made the connection to prefetching data into the cache.
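In a ray tracer, a data-oriented layout might look like the Swift sketch below: sphere fields live in parallel contiguous `[Double]` arrays (structure of arrays) instead of an array of class instances, each of which is a separate heap allocation reached through a pointer. A linear scan over contiguous arrays reads memory sequentially, which is the pattern hardware prefetchers typically handle well. `SphereSoA` and `nearestHit` are hypothetical names, the intersection is the standard quadratic ray-sphere test, and whether it's faster for a given scene is something to confirm with the same counters.

```swift
/// Spheres stored as a structure of arrays (hypothetical layout).
struct SphereSoA {
    var cx: [Double] = [], cy: [Double] = [], cz: [Double] = [], radius: [Double] = []

    mutating func append(x: Double, y: Double, z: Double, r: Double) {
        cx.append(x); cy.append(y); cz.append(z); radius.append(r)
    }

    /// Nearest hit along origin + t * dir with tMin < t < tMax, or nil.
    func nearestHit(origin o: (Double, Double, Double),
                    dir d: (Double, Double, Double),
                    tMin: Double, tMax: Double) -> (index: Int, t: Double)? {
        var best: (index: Int, t: Double)? = nil
        var closest = tMax
        let a = d.0 * d.0 + d.1 * d.1 + d.2 * d.2
        for i in 0..<radius.count {            // sequential walk over each array
            let ox = o.0 - cx[i], oy = o.1 - cy[i], oz = o.2 - cz[i]   // origin - center
            let b = ox * d.0 + oy * d.1 + oz * d.2                     // half of the usual b
            let c = ox * ox + oy * oy + oz * oz - radius[i] * radius[i]
            let disc = b * b - a * c
            if disc <= 0 { continue }
            let root = disc.squareRoot()
            // Try the nearer root first, then the farther one.
            var t = (-b - root) / a
            if !(t > tMin && t < closest) {
                t = (-b + root) / a
                if !(t > tMin && t < closest) { continue }
            }
            closest = t
            best = (index: i, t: t)
        }
        return best
    }
}
```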
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.