If anything, it's just an interesting experiment in whether gamers can actually recognize small variances in framerate on variable-refresh monitors, or particular graphics options, in real time. That depends on whether he keeps settings aligned across his experiment, I guess.
Once human variability enters the mix, there are a lot of potential gotchas, and it becomes difficult to produce reproducible or easily communicable results.
I don't know the methodology employed, but it sounds like a testing situation with limited time for analysis and little rigor in sampling and controls. The probable outcome also seems to have been predetermined by the tester's choices.
Tweaking things would be limited by how long it takes to redo observations and by how much the observers' perceptions drift over time.
It doesn't seem like AMD's events were double-blind, and it isn't clear how much data was gathered to determine if the outputs were consistent over time.
There could be ways to influence results, both consciously and unconsciously, and without a lot of transparency and attention to detail we may not get something that really improves our understanding.
It would be nice if there weren't the confounding factor of incompatible VRR methods, where neither the monitors nor the standards can be switched between setups.
One idea, if it were possible, would be a benchmark or playthrough recording with constant frame contents, coupled with something like a simple compute shader looping on a time check before the end of each frame, so that the cards present what appears to be essentially the same output.
Perhaps that timing mode could provide something more broadly reproducible if it were carried across sessions; a rough sketch of the idea follows.
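To make the spin idea concrete, here's a minimal sketch in CUDA, used only as a convenient stand-in for whatever compute shader language a real harness would use; the kernel, the 16.7 ms target, and the single-frame setup are all hypothetical. The kernel just loops on the GPU clock until a requested duration has elapsed, so every frame could be padded out to the same length regardless of which card rendered it.

```cuda
// Hypothetical sketch: pad a frame to a fixed duration with a GPU-side
// busy-wait, so frame pacing is set by the tester rather than the card.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void pad_frame(long long spin_cycles)
{
    long long start = clock64();
    // Loop on a time check until the requested number of SM cycles has passed.
    while (clock64() - start < spin_cycles) { /* spin */ }
}

int main()
{
    // SM clock rate in kHz equals cycles per millisecond; used to convert ms -> cycles.
    int clock_khz = 0;
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0);

    const double target_frame_ms = 16.7;  // e.g. pin everything to ~60 fps
    long long cycles = (long long)(target_frame_ms * clock_khz);

    // In a real harness this would run after the recorded frame's draw calls and
    // spin only for the time remaining; here it just demonstrates the timed loop.
    pad_frame<<<1, 1>>>(cycles);
    cudaDeviceSynchronize();

    printf("padded one frame by ~%.1f ms (%lld cycles)\n", target_frame_ms, cycles);
    return 0;
}
```

Boost clocks make the cycle count only approximate, and you'd really want to spin for whatever time remains after the frame's actual work, but it shows how a fixed timing target could be carried across sessions.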