https://github.com/AppliedDataSciencePartners/DeepReinforcementLearning
Main components
- Pits two bots of different versions against each other
- Logger that records sequences of actions and their scores (logging is actually the most interesting part of this codebase)
- Uses graphviz and pydot to output a graph of the neural network architectures
- Model is a CNN followed by batch normalization, then ReLU, then a dense output. The model has value and policy heads over the same trunk, but for some reason the value head has 2 layers while the policy head has 1 (see the model sketch after this list)
- Has a separate memory implementation with long- and short-term memory of the same size. Logs the board state, its id, the chosen action, the action value, and the player turn (also sketched after this list)
- Many global variables are pulled from a central config file that is plain Python code (no separate configuration language). This includes the neural-net architecture. The config holds ints, floats, and dictionaries (see the config sketch after this list)
- Game interface (sketched after this list) which
- Initializes the game
- Reset, step, identities (identities seems to return symmetrically equivalent states, presumably for data augmentation)
- Enumerates the allowed actions
- Checks if game has ended
- Gets the current value of a state
- Renders the board as a text file
- Implementation of the Connect4 game
- Agent
- Simulates MCTS by moving to a leaf, logging it, evaluating the leaf, and then backfilling the value (I think this is something the MCTS tree should handle automatically instead)
- Performs an action with some selection strategy: act takes as input a function that finds the best move by running MCTS simulations, then applies the action to the world and triggers logging code around which action was taken
- Agent also converts the state to something the model file can actually consume
- Replay, which retrains the model from memories, using each memory's past q value as the training target (see the agent sketch after this list)
- It's useful to have a logger for the runs so you can reinspect all the values. The logger also has a central config that enables or disables it at different points
- Loss function sits in its own separate file for some reason (this should arguably be part of the config as well)
- Monte Carlo Tree Search implementation
- Node data structure that keeps track of its neighbors, whether it is a leaf, the player turn, and the board state
- Edge stores the statistics N, W, Q, P: visit count, total action value, mean action value (Q = W/N), and the prior probability from the policy head
- Most of the code is in moveToLeaf
- Q and U values determine which edge to follow (the PUCT selection rule; see the MCTS sketch after this list)
- addNode is one line of code
- Backfill goes back up the tree and updates edge values
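A minimal Keras sketch of the model shape described above, assuming a Connect4-sized board; the layer counts, sizes, and names here are illustrative guesses, not the repo's actual hyperparameters:

```python
# Sketch of a two-headed AlphaZero-style model: conv -> batch norm -> ReLU
# trunk, a 2-layer value head, and a 1-layer policy head.
# All sizes are illustrative, not the repo's real config.
from tensorflow.keras import Input, Model, layers

def conv_bn_relu(x, filters=75, kernel=(4, 4)):
    x = layers.Conv2D(filters, kernel, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_model(input_shape=(6, 7, 2), n_actions=42):
    board = Input(shape=input_shape)       # Connect4: 6x7 board, 2 planes
    x = conv_bn_relu(conv_bn_relu(board))
    flat = layers.Flatten()(x)

    # Value head: two dense layers, ending in a scalar in [-1, 1]
    v = layers.Dense(20, activation="relu")(flat)
    value = layers.Dense(1, activation="tanh", name="value_head")(v)

    # Policy head: a single dense layer of action logits
    policy = layers.Dense(n_actions, name="policy_head")(flat)

    return Model(inputs=board, outputs=[value, policy])
```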
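The long/short-term memory split might look like this; a sketch assuming equal-capacity deques, with field names guessed from the notes above:

```python
from collections import deque

class Memory:
    """Sketch of the two-stage replay memory (field names are guesses)."""

    def __init__(self, size):
        # Long- and short-term memory share the same capacity
        self.ltmemory = deque(maxlen=size)
        self.stmemory = deque(maxlen=size)

    def commit_stmemory(self, state, state_id, action, action_value, player_turn):
        # Log one position: board state, id, chosen action, its value, whose turn
        self.stmemory.append({
            "state": state, "id": state_id, "action": action,
            "value": action_value, "playerTurn": player_turn,
        })

    def commit_ltmemory(self):
        # At game end, promote short-term entries to long-term and reset
        self.ltmemory.extend(self.stmemory)
        self.stmemory.clear()
```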
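Because the config is plain Python, there is no parsing step: modules just import it and read attributes. A sketch with made-up values:

```python
# config.py -- plain-Python config module (all values illustrative)
EPISODES = 30        # int: self-play games per iteration
MCTS_SIMS = 50       # int: MCTS simulations per move
CPUCT = 1.0          # float: exploration constant in the PUCT formula

# Even the network architecture is data in the config: one dict per layer
HIDDEN_CNN_LAYERS = [
    {"filters": 75, "kernel_size": (4, 4)},
    {"filters": 75, "kernel_size": (4, 4)},
]
```

Consumers are just `import config` followed by `config.MCTS_SIMS`; the tradeoff is that loading the config executes arbitrary code.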
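The game interface bullets above might translate to an abstract class like this; the method names are guesses, not the repo's exact API:

```python
class Game:
    """Sketch of the game interface (method names are guesses)."""

    def reset(self):
        # Initialize/reset the board and return the starting state
        ...

    def step(self, action):
        # Apply an action; return (next_state, value, done, info)
        ...

    def identities(self, state, action_values):
        # Presumably: symmetric equivalents of (state, action_values)
        # for data augmentation, e.g. the mirrored Connect4 board
        ...

    def allowed_actions(self, state):
        # Enumerate the legal moves in this state
        ...

    def game_ended(self, state):
        ...

    def value(self, state):
        # Value of the state from the current player's perspective
        ...

    def render(self, path):
        # Write a text rendering of the board to a file
        ...
```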
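And the agent's act/replay flow, roughly; everything here (helper names, memory fields, the one-hot policy target) is a hypothetical reconstruction from the bullets above, not the repo's code:

```python
import random
import numpy as np

def to_model_input(state):
    # Hypothetical converter: the Agent turns a game state into the
    # array format the model consumes
    return np.asarray(state, dtype=np.float32)

def act(agent, state, n_sims, tau):
    # Run MCTS simulations from the current state, then choose an action
    # from the visit-count distribution (tau is the temperature)
    for _ in range(n_sims):
        agent.simulate(state)        # move to leaf, evaluate, backfill
    pi = agent.action_probabilities(state, tau)   # hypothetical helper
    action = int(np.random.choice(len(pi), p=pi))
    agent.log_action(state, action)               # hypothetical logging hook
    return action

def replay(model, ltmemory, n_actions=42, batch_size=256):
    # Retrain from sampled memories: each entry's stored past value is the
    # value-head target; a one-hot of the chosen action stands in for the
    # policy target (an assumption, not necessarily what the repo does)
    batch = random.sample(list(ltmemory), min(batch_size, len(ltmemory)))
    states = np.array([to_model_input(m["state"]) for m in batch])
    value_targets = np.array([m["value"] for m in batch])
    policy_targets = np.zeros((len(batch), n_actions), dtype=np.float32)
    for i, m in enumerate(batch):
        policy_targets[i, m["action"]] = 1.0
    model.fit(states, [value_targets, policy_targets], epochs=1, verbose=0)
```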
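Finally, a sketch of the edge statistics and the two core MCTS operations, selection (the Q + U rule) and backfill; the node/edge attribute names and `cpuct` default are assumptions, not the repo's exact code:

```python
import math

class Edge:
    """AlphaZero-style edge stats: N visits, W total value,
    Q = W/N mean value, P prior from the policy head."""

    def __init__(self, prior):
        self.N, self.W, self.Q, self.P = 0, 0.0, 0.0, prior

def select_edge(node, cpuct=1.0):
    # moveToLeaf follows the edge maximizing Q + U at every node, where U
    # is an exploration bonus scaled by the prior and parent visit count
    total_n = sum(edge.N for _, edge in node.edges)
    def puct(edge):
        u = cpuct * edge.P * math.sqrt(total_n) / (1 + edge.N)
        return edge.Q + u
    return max(node.edges, key=lambda child_edge: puct(child_edge[1]))

def backfill(path, leaf_value, leaf_player):
    # Walk back up the visited (node, edge) path, updating every edge;
    # the value's sign flips depending on whose turn it was at each node
    for node, edge in reversed(path):
        direction = 1 if node.player_turn == leaf_player else -1
        edge.N += 1
        edge.W += leaf_value * direction
        edge.Q = edge.W / edge.N
```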
https://github.com/tejank10/AlphaGo.jl
Main components
- README shows how to train, with the number of layers and num_games as the adjustable parameters
- Play takes in an environment, the model net, the number of MCTS readouts (num_readouts), and the player turn
- Replay batch manager
- UX rendering takes in a config of a bunch of CSS files; for player-vs-player mode only
- Has board featurizations using pre-AlphaGo Zero techniques, e.g. the number of liberties, among others
- Has bindings for GTP, the Go Text Protocol used to talk to Go engines
- Has helpers for KGS coordinates
- Interface for Web IO
- A resnet stack implementation
- MCTS
- Board represented as an n x N^2 tensor of value estimates, where n is the number of turns and N is the board size (see the sketch at the end)
- The longest code in the repo
- MCTS player that loads a Monte Carlo tree, can pick moves, and manages updates to the tree
- Main loads all the net params and, if they don't exist, trains an MCTS player
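For the n x N^2 board representation above, a quick illustration (in Python/numpy rather than Julia, and with arbitrary sizes):

```python
import numpy as np

N = 19        # the board is N x N points
n_turns = 8   # number of turns kept as rows

# One row per retained turn, one column per board point: flattening each
# N x N board of value estimates into a length-N**2 vector yields the
# n x N^2 tensor described above.
estimates = np.zeros((n_turns, N * N), dtype=np.float32)

# e.g. record this turn's value estimate for the point at row r, col c
r, c = 3, 15
estimates[0, r * N + c] = 0.25
```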