tito at talkchess.com wrote:
> ... I Know all that, but isn't possible to give partial results? The best would
> be to say how many games played before to post here! 100, 200, 300???
If you want to use tournament results to show that one engine is better
than another engine, then the table below gives the needed number of
games for 95% confidence.
(view with fixed width font)
minimum    difference in         win
games      engine Elo ratings    percent
-------    ------------------    -------
  4,000            10             51.4%
    900            20             52.9%
    400            30             54.3%
    150            50             57.1%
     65            75             60.6%
     36           100             64%
     16           150             70%
      8           200             76%
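
For anyone who wants to extend the table, the win percent column appears to match the standard logistic Elo expectation to within rounding. Here is a minimal sketch in Python, assuming that formula is what was used; the function name expected_score is just for illustration. It does not reproduce the "minimum games" column, since the post does not say how that was derived.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (draws counting as half a win) for the stronger engine,
    using the standard logistic Elo formula. Assumed, not confirmed by the post."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # Elo differences taken from the table above.
    for diff in (10, 20, 30, 50, 75, 100, 150, 200):
        print(f"{diff:>4} Elo -> {100 * expected_score(diff):.1f}%")
```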
My guess is that Rybka 1.1 should win more than 61% against Toga II 1.2
beta 2a. So if Rybka scored +15 against Toga after 65 games we could
say "Rybka 1.1 is stronger than Toga II 1.2beta2a" with 95% confidence.
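
To check that arithmetic, here is a small sketch, assuming "+15" means 15 more wins than losses and that a tie counts as half a point. Under those assumptions the score depends only on the margin and the number of games, not on how many ties there were. The name score_from_margin is made up for this example.

```python
def score_from_margin(games: int, margin: int) -> float:
    """Score fraction when (wins - losses) == margin over `games` games.

    With ties worth half a point, points = (games + margin) / 2,
    so the number of ties drops out of the result.
    """
    return (games + margin) / (2 * games)


if __name__ == "__main__":
    # +15 after 65 games, e.g. 40 wins and 25 losses with no ties.
    print(f"{100 * score_from_margin(65, 15):.1f}%")
```

That comes out a little above the 60.6% the table asks for at 65 games.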
So I guess what I am recommending is to run a 2 engine tournament until
the win percent exceeds the percentage shown in the table above for the
number of games played. For example, if after 8 games the score is
5 wins, 1 tie and 2 losses = 68.75%, that is below the 76% needed at
8 games, so continue. If at 16 games the score is 10 wins, 3 ties and
3 losses = 71.9%, that exceeds the 70% needed at 16 games, so you could
post the results.
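
The rule above can be sketched in Python roughly as follows. The thresholds are copied from the table. The names score_percent and can_post are hypothetical. How to treat game counts that fall between table rows is my assumption: the sketch uses the threshold of the largest row already reached, which is the stricter choice.

```python
# (minimum games, win percent) pairs copied from the table in the post.
THRESHOLDS = [(4000, 51.4), (900, 52.9), (400, 54.3), (150, 57.1),
              (65, 60.6), (36, 64.0), (16, 70.0), (8, 76.0)]


def score_percent(wins: int, ties: int, losses: int) -> float:
    """Win percent with a tie counted as half a win."""
    games = wins + ties + losses
    return 100.0 * (wins + 0.5 * ties) / games


def can_post(wins: int, ties: int, losses: int) -> bool:
    """True if the score exceeds the table threshold for the games played."""
    games = wins + ties + losses
    # Thresholds of all rows whose minimum game count has been reached.
    reached = [pct for min_games, pct in THRESHOLDS if games >= min_games]
    if not reached:
        return False  # fewer than 8 games: the table has no row for this
    # The largest row reached has the lowest threshold (assumption, see above).
    return score_percent(wins, ties, losses) > min(reached)


if __name__ == "__main__":
    print(score_percent(5, 1, 2), can_post(5, 1, 2))    # the 8 game example
    print(score_percent(10, 3, 3), can_post(10, 3, 3))  # the 16 game example
```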
Any comments?
Cheers,
Irchans