When you submit your evaluation function, it will be put through a "depth test". The test runs my alpha-beta minimax solution, using your evaluation function, on a few preselected boards at increasing depths until the average time per board exceeds a certain time limit. The greatest depth completed within the limit is the depth at which your evaluation function will always be run in the tournament.
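To make the procedure concrete, here is a sketch of how such a depth test could work. The names `search` and `boards` are illustrative stand-ins for the grader's actual alpha-beta driver and board set, not the real grading code:

```python
import time

def depth_test(search, boards, time_limit=10.0, max_depth=25):
    """Return the greatest depth whose average CPU time per board stays
    within time_limit. `search(board, depth)` is a placeholder for the
    grader's alpha-beta minimax search using your evaluation function."""
    passed = 0
    for depth in range(1, max_depth + 1):
        start = time.process_time()          # CPU time, as the grader measures it
        for board in boards:
            search(board, depth)
        avg = (time.process_time() - start) / max(len(boards), 1)
        if avg > time_limit:
            break                            # this depth is too slow; stop here
        passed = depth                       # this depth finished in time
    return passed
```

The returned value is the last depth that finished within the limit, which mirrors how your tournament depth would be fixed.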
The time limit is 10 seconds of CPU time (including garbage collection time). FYI, the first example evaluation function was able to run to depth 8 within this time.
You can submit an evaluation function as many times as you want to this web tester. There are no late penalties associated with this problem --- only a hard deadline (as explained at the top of the page).
This oracle will run your evaluation function with my alpha-beta minimax solution and print a transcript of the run. This may be useful for developing your evaluation function: if you want to know why your evaluation function made a certain move, you can feed the game state to this oracle and read the transcript of the alpha-beta minimax search.
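For reference, the search that drives your evaluation function follows the standard alpha-beta minimax scheme. Below is a minimal, game-agnostic sketch (not my actual solution); `evaluate`, `moves`, and `apply_move` are placeholders for your evaluation function and the game rules, and the tiny nested-list "game tree" in the usage line is purely illustrative:

```python
def alphabeta(state, depth, alpha, beta, maximizing, evaluate, moves, apply_move):
    """Alpha-beta minimax sketch. Your evaluation function is called at
    the depth limit (or at terminal states with no legal moves)."""
    legal = list(moves(state))
    if depth == 0 or not legal:
        return evaluate(state)               # evaluation function applied here
    if maximizing:
        value = float("-inf")
        for m in legal:
            value = max(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, False,
                                         evaluate, moves, apply_move))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # beta cutoff: opponent avoids this line
        return value
    else:
        value = float("inf")
        for m in legal:
            value = min(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, True,
                                         evaluate, moves, apply_move))
            beta = min(beta, value)
            if alpha >= beta:
                break                        # alpha cutoff
        return value

# Toy usage: states are nested lists, leaves are scores.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = alphabeta(tree, 2, float("-inf"), float("inf"), True,
                  lambda s: s,                                        # leaf score
                  lambda s: range(len(s)) if isinstance(s, list) else [],
                  lambda s, m: s[m])
# value -> 3 (max over the min of each branch)
```

The transcript you get from the oracle corresponds to a trace of calls like these, so comparing a move you find surprising against the printed values at each depth is usually the quickest way to debug an evaluation function.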
Unless one of the players uses some random element in their evaluation function, a match between two players will always come out the same, so it doesn't make sense to request a repeat match against the same version of another student's player.
The standings are reset when you upload a new evaluation function. By the way, it is possible to manipulate the standings by requesting lots of matches against weaker players. I suggest to everyone that we not start down this path... This is part of the reason that the actual Connect 4 tournament (for grading purposes) will be run offline after the final deadline for submitting an evaluation function.