Thanks, @JoeJ, for the in-depth explanation. I have an additional question here: path tracing is all about global illumination = direct light + indirect light. Now, if we ignore the shadow light and rely only on the indirect light, maybe the result will be pleasant, but can it still be called global illumination?
There is some confusion here: leaving out shadow rays (= next event estimation) does not mean missing direct lighting. Both variants converge to the same result; next event estimation just gets there a lot faster. I still recommend starting without optimizations, so you already have a reference image when you add the optimizations later. It helps to minimize uncertainty along the way and causes no extra work.
But I assume you have a test scene where all light comes from area lights, e.g. the Cornell Box with its box light. It's a good GI test scene because it shows color bleeding and area shadows, and we can render it in reasonable time even without next event estimation or Russian roulette.
Say we do five bounces. Our path hits the box, the left wall, the right wall, the other box, and finally it hits the light.
That's ideal, because the light then illuminates all our path segments and we get bounce lighting across the entire path.
Another path goes box, other box, light, wall, other wall.
Problem: The later hits on the walls receive no lighting and are thus wasted work.
Another path does not hit the light at all, just boxes and walls, which is the worst case. All the work spent on ray tracing was for nothing, and this worst case actually happens a lot.
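To make the three example paths concrete, here is a minimal sketch. It assumes purely diffuse surfaces sampled so that each bounce simply multiplies the path throughput by the albedo, and it uses scalars instead of RGB. The values (albedo 0.5, emission 10) and the names `path_radiance`, `WALL` and `LIGHT` are made up for illustration:

```python
# Each path vertex is (albedo, emission). Scalars stand in for RGB.
WALL = (0.5, 0.0)    # diffuse surface that emits nothing (assumed values)
LIGHT = (0.0, 10.0)  # area light, assumed to reflect nothing


def path_radiance(vertices):
    """Radiance one path carries back to the camera, without shadow rays."""
    radiance = 0.0
    throughput = 1.0  # fraction of light that survives the bounces so far
    for albedo, emission in vertices:
        radiance += throughput * emission  # nonzero only when we hit a light
        throughput *= albedo               # attenuate for the next bounce
    return radiance


path1 = [WALL, WALL, WALL, WALL, LIGHT]  # box, wall, wall, box, light
path2 = [WALL, WALL, LIGHT, WALL, WALL]  # light in the middle of the path
path3 = [WALL, WALL, WALL, WALL, WALL]   # never hits the light

print(path_radiance(path1))  # 0.625
print(path_radiance(path2))  # 2.5
print(path_radiance(path3))  # 0.0
```

In the second path the two hits after the light add nothing, and the third path returns zero even though five rays were traced for it.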
And that's the motivation to introduce shadow rays. It's not really about capturing direct lighting, which we do anyway for examples 1 and 2.
I would even say it's not helpful to make a mental distinction between direct and indirect lighting at all, since imo it only adds confusion once we want to get everything right and correct.
No - we simply observe that most of our work is spent for nothing, and we want to fix that.
Usually light sources are sparse in our scenes, and usually we know where those lights are, so we come up with the following optimization:
At each hit point, we trace a shadow ray to some light source. If the point is not shadowed, we can add the light's contribution, which also affects all the earlier vertices in our path.
Now the chances that an entire path is not lit at all are very small. Most work we spend on tracing will help to integrate towards the correct solution, and we're happy.
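As a rough check that both approaches integrate the same thing, here is a small sketch that estimates the light arriving at a single diffuse point from a square area light directly above it, once by bouncing randomly and hoping to hit the light, and once with a shadow ray to a random point on the light. The scene (unit height, half-size 0.5, no occluders), the constants and all function names are assumptions made up for this example:

```python
import math
import random

RHO = 1.0        # diffuse albedo of the shaded point (assumed)
LE = 1.0         # radiance emitted by the light (assumed)
H, A = 1.0, 0.5  # light is the square [-A, A]^2 at height H above the point


def brute_force_sample(rng):
    """Bounce in a cosine-weighted direction; it counts only if it hits the light."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    dx, dy, dz = r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1)
    t = H / dz  # ray parameter where the ray reaches the plane of the light
    hit = abs(dx * t) <= A and abs(dy * t) <= A
    # BRDF * cos / pdf = (RHO / pi) * cos / (cos / pi) = RHO
    return RHO * LE if hit else 0.0


def shadow_ray_sample(rng):
    """Connect to a uniformly chosen point on the light (nothing occludes here)."""
    x, y = rng.uniform(-A, A), rng.uniform(-A, A)
    dist2 = x * x + y * y + H * H
    cos_surface = H / math.sqrt(dist2)  # cosine at the shaded point
    cos_light = cos_surface             # the light faces straight down
    area = (2.0 * A) ** 2               # pdf of the sampled point is 1 / area
    return (RHO / math.pi) * LE * cos_surface * cos_light / dist2 * area


def mean_and_variance(sample_fn, n, seed):
    rng = random.Random(seed)
    values = [sample_fn(rng) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


brute_mean, brute_var = mean_and_variance(brute_force_sample, 20000, seed=1)
nee_mean, nee_var = mean_and_variance(shadow_ray_sample, 20000, seed=2)
print(brute_mean, brute_var)
print(nee_mean, nee_var)
```

Both means should land close to each other, while the per-sample variance of the shadow ray version should be much smaller. That lower noise per sample is what "faster" means here.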
That's all there is to it. It's not related to physics or optics at all. It is an optimization for technical reasons. At least I prefer to look at it this way.
But it raises questions like 'Is it really statistically correct to pick a random light? And can we perhaps improve this by making a better choice? Also, how should I distribute my random samples on the light surface, while still guaranteeing statistical correctness when integrating all samples?' etc. It becomes a lot more complicated.
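On the first two of those questions, a small sketch may help. Picking one light at random stays correct as long as the light's contribution is divided by the probability of picking it, and a better choice of probabilities reduces noise. The contribution values and the names `pick_light` and `expectation_and_variance` are hypothetical. In a real renderer the contributions are not known before tracing the shadow ray, so the probabilities can only be based on estimates such as emitted power:

```python
def pick_light(probs, u):
    """Return the index of the light selected by a uniform random number u in [0, 1)."""
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against rounding in the running sum


def expectation_and_variance(contribs, probs):
    """Exact mean and variance of the one-light estimator contribs[i] / probs[i]."""
    mean = sum(p * (c / p) for c, p in zip(contribs, probs))
    second = sum(p * (c / p) ** 2 for c, p in zip(contribs, probs))
    return mean, second - mean * mean


contribs = [1.0, 3.0, 6.0]  # hypothetical unshadowed contributions of three lights
uniform = [1.0 / 3.0] * 3   # every light equally likely
by_power = [c / sum(contribs) for c in contribs]  # brighter lights more likely

print(expectation_and_variance(contribs, uniform))   # mean is the full sum, variance is large
print(expectation_and_variance(contribs, by_power))  # same mean, variance near zero
```

Both strategies have the sum of all lights as their expected value. Only the noise differs.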
The same applies even more to a rich material model. If we use just Lambert diffuse for everything, importance sampling is obvious and simple. But if we want complex materials that implement some PBS standard, it becomes very difficult to generate random reflection rays which correctly integrate all those complex terms like roughness, Fresnel, etc. And you lose the option to compare your render with a reference generated by some other offline renderer, because everybody has their own PBS standard, varying in details. I've tried this quickly, and seemingly I got it to work. But I can't be sure if it's right or wrong. If I wanted to become an expert in PT, I would need to invest much more time to learn all that's needed to become certain.
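For the Lambert case, here is a minimal sketch of why importance sampling is simple. With cosine-weighted directions the pdf cancels the cosine term, so every sample has exactly the weight of the albedo. Uniform hemisphere directions give the same average but with noise. The albedo value and the function names are assumptions for this example, and only the polar angle is sampled because the weights do not depend on the azimuth:

```python
import math
import random

ALBEDO = 0.8  # assumed diffuse reflectance


def lambert_weight_cosine(u):
    """Cosine-weighted sampling: pdf = cos / pi cancels the cosine term."""
    cos_theta = math.sqrt(1.0 - u)
    pdf = cos_theta / math.pi
    return (ALBEDO / math.pi) * cos_theta / pdf


def lambert_weight_uniform(u):
    """Uniform hemisphere sampling: pdf = 1 / (2 pi), cos_theta uniform in [0, 1)."""
    cos_theta = u
    pdf = 1.0 / (2.0 * math.pi)
    return (ALBEDO / math.pi) * cos_theta / pdf


rng = random.Random(3)
n = 20000
mean_cosine = sum(lambert_weight_cosine(rng.random()) for _ in range(n)) / n
mean_uniform = sum(lambert_weight_uniform(rng.random()) for _ in range(n)) / n
print(mean_cosine, mean_uniform)
```

Both averages should come out near the albedo. For a material with roughness and Fresnel terms there is no such clean cancellation, which is where the difficulty described above starts.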
Unlike with hacky rasterization gfx, when working on PT you eventually get really obsessed with correctness. That's fun and interesting, but ideally you start with minimal complexity, so you can add more things one after another without piling up too many doubts.
In my experience this means: no optimizations; no analytical light sources, just area lights; no complex materials, just diffuse.
This way the basic PT algorithm is intuitive, and making reference images from other programs is easy. That's a good start without countless details raising doubts.