Exactly right on panel variation.
A test under your conditions is a good way to see whether MPPT would benefit you. I am doing much the same thing for my specific panels. But it's not really a fair test of whether MPPT would benefit others who may run their batteries lower overnight or have more load during the day.
To make a more general test, I think you'd test the two boundary conditions: yours, and the 11 V case. That would set reasonable upper and lower bounds on the benefit for people with the two controllers tested, based on different usage models.
I am trying to take the effectiveness of any particular MPPT controller's algorithm out of the equation. I am looking for the upper bound, the performance of a perfect MPPT algorithm, by sweeping the voltage curve manually and finding what I know to be the Vmpp, with no controller mistakes. That is what I will compare to PWM output.
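For anyone who wants to do the same manual sweep, here is a rough sketch of the bookkeeping in Python. It assumes you have written down (volts, amps) pairs at each step of the sweep; the function name find_mpp and the readings in example_sweep are made-up placeholders for illustration, not measurements from my panels.

```python
def find_mpp(sweep):
    """Return (volts, amps, watts) of the highest-power point in a manual sweep.

    sweep is a list of (volts, amps) readings taken while stepping the
    panel's operating voltage by hand.
    """
    if not sweep:
        raise ValueError("sweep needs at least one (volts, amps) reading")
    # Power at each point is simply volts * amps; keep the largest.
    volts, amps = max(sweep, key=lambda reading: reading[0] * reading[1])
    return volts, amps, volts * amps


# Placeholder readings only (NOT real data), just to show the call.
example_sweep = [
    (12.0, 5.9),
    (14.0, 5.8),
    (16.0, 5.6),
    (17.0, 5.3),
    (18.0, 4.6),
    (19.0, 3.2),
]

vmpp, impp, pmpp = find_mpp(example_sweep)
print(f"Vmpp ~ {vmpp:.1f} V, Impp ~ {impp:.1f} A, Pmpp ~ {pmpp:.1f} W")
```

The finer the voltage steps, the closer the best reading gets to the true Vmpp; a coarse sweep will only bracket it.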
Unfortunately, my human limitation is that I cannot easily do this continuously, so my accumulated amp-hours for each controller will be off if Vmpp is moving around. But at a given point in time I am pretty happy with how close I can get to peak watts out of the panel, and I can compare that to the watts generated using PWM under the exact same conditions into the exact same load.
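The spot comparison itself is just a ratio. A minimal sketch, assuming you have one peak-watts reading from the manual sweep and one PWM watts reading taken under the same conditions (the function name mppt_gain_percent is mine, and the numbers in the call are placeholders, not results):

```python
def mppt_gain_percent(mpp_watts, pwm_watts):
    """Percent by which the peak-power reading exceeds the PWM reading.

    Both readings must be taken under the same sun, temperature and load,
    otherwise the comparison means nothing.
    """
    if pwm_watts <= 0:
        raise ValueError("pwm_watts must be positive")
    return (mpp_watts - pwm_watts) / pwm_watts * 100.0


# Placeholder values only (NOT measured results).
print(f"{mppt_gain_percent(110.0, 100.0):.1f}% more than PWM")
```

This is a snapshot at one instant and represents the best a perfect MPPT could do; it says nothing about accumulated amp-hours over a day.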
Jim