
The situation may be different than with chess. The question is: how large must the sample size be, in games between version A and version B, for a difference in results of size X to give us confidence Y that A is better than B (or vice versa)? Because go games are much longer than chess games (in number of moves), there may be a much larger chance that a situation where the improvement applies will show up in at least some of the games. And there is no equivalent of the part of a chess program that evaluates who is ahead at a given point in the game (at the limit of its look-ahead).

a) Trials at different time controls are probably not needed. These programs are not making mistakes about who won a finished game, and that is what MCTS uses to build its statistics. All of these programs use MCTS as their evaluator, and all will be over the threshold for that algorithm to work, even on obsolescent hardware at short time controls. Giving more time to each move will improve the chances of MCTS selecting the correct move, but that probability of improvement will be the same for both versions.

b) So you would need to argue that much more than a sample size of, say, 1000 games is needed. In any case, on up-to-date hardware (a current workstation-class machine), all of these programs will be nowhere near a "knee of the curve" even at controls of 10-20 seconds per move. Sure, a distributed testing system would matter if you needed 100,000 games, or even 10,000; but 1000 games could easily be played in a month on an up-to-date workstation, and you don't need a distributed testing system for that.

And sorry (ROFLOL), it isn't faster coding in the open-source environment; it takes us programmers a lot more than a few minutes to do anything worthwhile. These projects might be coding a new version over a month, testing/debugging another month, and then spending this month testing whether it is an improvement (and of course there could be several changes in development at the same time; maybe once a year, merge those that appear to be an improvement, retest, and make a release).

It would be good for the programmers of Pachi and Fuego, since they could test with accuracy any small changes made in their code. They could be sure, with a high degree of confidence, whether the changes are "bad" or "good" ones. I am wondering if there is anyone interested in setting this up for Pachi and/or Fuego.

P.S.: Other pages of interest are the testing framework of Stockfish, which currently shows that 44 different people are "giving" their computers to test different versions of Stockfish, and their Google Groups forum.
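To put rough numbers on the sample-size question raised above: under the standard Elo model, a normal approximation to the match score tells you how many games a given rating edge needs before it stands out from noise. A minimal sketch (the function names are mine; it assumes decisive win/loss games and a one-sided test, and draws only shrink the variance further):

```python
from math import ceil
from statistics import NormalDist

def expected_score(elo_diff: float) -> float:
    """Expected per-game score implied by an Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff: float, confidence: float = 0.95) -> int:
    """Games needed to detect a true edge of `elo_diff` Elo with the
    given one-sided confidence; the per-game standard deviation is
    taken as 0.5, the worst case for win/loss outcomes."""
    z = NormalDist().inv_cdf(confidence)   # one-sided critical value
    p = expected_score(elo_diff)           # e.g. 20 Elo -> about 0.529
    return ceil((z * 0.5 / (p - 0.5)) ** 2)

print(games_needed(20))   # about 800 games for a 20 Elo edge
print(games_needed(5))    # about 13,000 games for a 5 Elo edge
```

By this estimate, 1000 games is indeed plenty to confirm a 20-Elo improvement, while the few-Elo tweaks a fishtest-style platform hunts for need an order of magnitude more.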
The platform's goal is to let people let their computers run games between two different versions of, say, Pachi. The data (wins, losses, number of games, etc.) are sent automatically to the server, which calculates the rating difference between the two programs/versions being tested. If a change shows an improvement of at least X Elo points at a particular time control, it must still be run at a different time control; if the test passes at both time controls, there is a high likelihood that the change improves on the original code, and so the original code gets upgraded. As I am no programmer, I am not really sure how to create a fishtest platform (the entire code can be found there), but since everything needed is open source, I am sure a programmer could set it up in a few minutes.
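On the server side, the rating difference between the two versions follows from the pooled results alone. A rough sketch of that calculation (function name mine; the real Stockfish framework runs sequential statistical tests, while this fixed-sample version only shows the Elo arithmetic with delta-method error bars):

```python
from math import log, log10, sqrt
from statistics import NormalDist

def elo_estimate(wins: int, losses: int, draws: int):
    """Point estimate and 95% margin for the Elo difference between
    two versions, from pooled match results under the logistic Elo model."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n          # mean per-game score of version A
    elo = 400.0 * log10(s / (1.0 - s))    # invert the expected-score curve
    var = (wins * (1.0 - s) ** 2          # per-game variance of the score
           + draws * (0.5 - s) ** 2
           + losses * (0.0 - s) ** 2) / n
    se = sqrt(var / n)                    # standard error of the mean score
    z = NormalDist().inv_cdf(0.975)       # two-sided 95% critical value
    margin = 400.0 / log(10.0) * z * se / (s * (1.0 - s))
    return elo, margin

# 540 wins, 460 losses, 0 draws over 1000 games:
# roughly +28 Elo with a margin of about +/-22.
print(elo_estimate(540, 460, 0))
```

With intervals like these, the server can report not just which version scored better but whether the edge is statistically meaningful yet.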

In the chess programming world there is currently a revolution under way: an open-source testing platform used to improve a particular chess engine (Stockfish) made it gain enough strength within a few months to become number 2 in the world, very close to overtaking the number 1 (a commercial engine, Houdini 3). This brings to mind the impact that AlphaZero had on the chess community back in 2017. The Stockfish team constantly updates their chess engine, making it stronger in terms of its rating. With Stockfish 12, which was released in August 2020, the team announced that it is stronger than the previous version by almost 100 Elo points, which is very significant.
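For scale, the Elo model turns a 100-point gap directly into an expected head-to-head score, so the announcement can be read as a concrete prediction (a quick illustrative check, not code from Stockfish):

```python
# Expected per-game score implied by a 100-point Elo difference
# (standard logistic Elo curve).
elo_diff = 100
expected = 1 / (1 + 10 ** (-elo_diff / 400))
print(f"{expected:.0%}")  # about 64% of the points in a head-to-head match
```

That is a landslide by engine-testing standards, which is why a jump of this size in a single release is so notable.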

The big change in this release is the addition of an efficiently updatable neural network (NNUE). Until NNUE, all the chess concepts programmed into Stockfish were weighed by humans, in a very limited manner. With NNUE, the same chess concepts are integrated into the neural network, which weighs them by itself. What is it like in terms of playing style? The development team at DecodeChess reports that Stockfish NNUE is an aggressive, sacrifice-oriented playing machine. They also noticed that many of its moves are less intuitive. This makes it a very interesting chess program, as it might introduce new ideas which so far weren't adopted by the chess community.
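The "efficiently updatable" part refers to how the first layer of the network is evaluated during search: a move changes only a few of the board-feature inputs, so the engine patches a running first-layer sum instead of recomputing it from scratch. A minimal sketch of that idea (feature and layer sizes made up for illustration; this is not Stockfish's actual architecture or code):

```python
import numpy as np

N_FEATURES, HIDDEN = 40960, 256   # illustrative sizes only
rng = np.random.default_rng(0)
W = rng.standard_normal((N_FEATURES, HIDDEN)).astype(np.float32)
b = np.zeros(HIDDEN, dtype=np.float32)

def full_accumulator(active):
    """From-scratch first-layer sum over all currently active features."""
    return b + W[active].sum(axis=0)

def incremental_update(acc, removed, added):
    """A move toggles only a few features, so patch the accumulator:
    the cost scales with the changed features, not the whole position."""
    return acc - W[removed].sum(axis=0) + W[added].sum(axis=0)

acc = full_accumulator([10, 999, 20480])                 # initial position
acc = incremental_update(acc, removed=[10], added=[11])  # after one move
```

Because the accumulator is reused from node to node along the search tree, the network stays cheap enough to evaluate constantly, which is what makes pairing a neural evaluation with a classical alpha-beta search practical.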
