After the first two rounds of the Sudoku GP, I found that some puzzles' point values didn't reflect their real difficulty. Some puzzles were worth far more than their difficulty warranted, such as the Extra Regions Sudoku and the Irregular Sudoku in Sudoku GP2, resulting in 8 competitors scoring 1000+ points and 42 competitors scoring at or above the round's nominal total of 719 points.
In my opinion, the point values should reflect puzzles' real difficulty, but they should not be inflated so much that they intimidate competitors. The GP should be a test of brain power, not psychology.
I suggest the authors recruit more testers and estimate the puzzles' difficulty more precisely. The total points can vary from round to round, but the scale should stay fixed at 10 pts/min.
Efficiency should be considered
When determining the point values of puzzles, testers should consider not only the actual time they spend on each puzzle, but also their efficiency at earning points. Not all testers are top solvers, and some testers earn points at a rate below 10 pts/min; if this is ignored, some puzzles will end up with point values higher or lower than their actual difficulty.
For example, suppose a tester spends 8 minutes solving a puzzle. If he is a world-class solver who earns around 10 points per minute in the GP or WSC/WPC, his calculated point value is 80 points. If he only earns about 7 points per minute in competitions, his calculated point value should be just 56 points. The puzzle's final point value would then be the average of all testers' calculated values.
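To make the arithmetic concrete, here is a minimal sketch of that averaging step; the tester names, times and efficiency figures are invented purely for illustration.

```python
# Sketch of the efficiency-adjusted point value described above.
# All tester data here is hypothetical, purely for illustration.
testers = [
    {"name": "A", "solve_time_min": 8.0, "efficiency_ppm": 10.0},  # top solver -> 80 pts
    {"name": "B", "solve_time_min": 8.0, "efficiency_ppm": 7.0},   # slower solver -> 56 pts
    {"name": "C", "solve_time_min": 6.5, "efficiency_ppm": 9.0},   # -> 58.5 pts
]

# Each tester's calculated value: time spent * that tester's usual pts/min rate.
calculated = [t["solve_time_min"] * t["efficiency_ppm"] for t in testers]

# The puzzle's final value is the average of the testers' calculated values.
point_value = sum(calculated) / len(calculated)
print(round(point_value))  # 65 with these made-up numbers
```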
Trial and error should also be considered
Moreover, because the GP and WSC/WPC don't care how competitors reach the solution, a competitor may sometimes use trial and error to get the solution, which significantly reduces his/her solving time. To balance for this, a puzzle's point value should be reduced a little if it is easy via trial and error, even if it is hard via logic.
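If one wanted to operationalise that, one simple option (just a sketch; the blend weight is entirely made up) is to blend the logic-based and trial-based time estimates before converting to points:

```python
def adjusted_points(logic_minutes: float, trial_minutes: float,
                    scale: float = 10.0, trial_weight: float = 0.3) -> int:
    """Reduce a puzzle's value when trial and error is much faster than
    the intended logical path. The 0.3 weight is a made-up example."""
    blended_minutes = (1 - trial_weight) * logic_minutes + trial_weight * trial_minutes
    return round(blended_minutes * scale)

# A puzzle worth 80 pts by logic (8 min) but solvable in about 3 min by
# guessing drops a little, to 65 pts, with these weights.
print(adjusted_points(8.0, 3.0))  # 65
```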
Catch 22: Puzzle Difficulty vs. Tester's Solving Skill
Couldn't agree more with the point that the GP should be a test of brain power, not psychology. I've always interpreted that as meaning you want a round without too much "points inequality" - which you can measure by comparing the number of puzzles to their share of the total points. If 3-4 puzzles make up 50% of the points, that doesn't feel desirable to me. It's even worse when the points don't seem accurate. I haven't had a go at round 2, so I can't really comment more than that.
I'll add the observation that you can never really know how hard a puzzle is unless you have a good idea of your testers' solving ability. But then you can never really know how good your testers are unless you have a good idea of the difficulty of the puzzles you get them to solve. The same problem faces everyone taking the round. It's a catch-22.
The GP has a further problem to overcome: if it is to remain a credible competition, then the scores from one round to the next need to be directly comparable, given the best-6-from-8 mechanic. This round's scoring seems inflated compared to the last couple of years! Someone like me who has missed round 2 is presumably now at a big disadvantage for the rest of the series, assuming the scoring returns to a more stable level.
I don't think I've clearly understood how you would use solving efficiency directly in testing (rather than just the testers' times) - some testers will clearly have favourite types and other types they struggle with, so determining a definitive value for an individual's solving efficiency isn't easy. The current testing approach ought to work if you have a sufficient number of testers, apply a stable benchmark (I often use the median), and apply some expert judgement to take into account variance in testing times (which often arises where there is a quick way to guess or to use uniqueness).
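For concreteness, this is roughly what I mean by a median benchmark; the tester times below are made up, and the 10 pts/min scale is the one mentioned above:

```python
import statistics

# Hypothetical tester times (in minutes) for one puzzle; values invented.
tester_times = [6.5, 7.2, 7.8, 8.0, 12.0]

# The median is a stable benchmark: one unusually fast solve (a lucky
# guess, or spotting a uniqueness shortcut) doesn't drag the value down.
benchmark_minutes = statistics.median(tester_times)

# Convert to points at the fixed 10 pts/min scale.
point_value = round(benchmark_minutes * 10)
print(point_value)  # 78 with these made-up times
```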
For what it's worth, I think the puzzle GP has a harder problem with puzzle variance than the sudoku GP. I think it also has a bigger problem (at least as far as assigning point values goes) with the top solvers accelerating their points efficiency on the hardest puzzles compared to everyone else. Maybe this is where I need to think about solving efficiency (which has an inverse relationship to solving time) a bit more...