Go (Weiqi) Bar — followers: 351,266, posts: 10,729,376
  • 11 replies, page 1 of 1

Minigo: A minimalist Go engine modeled after AlphaGo Zero, b


https://github.com/tensorflow/minigo
This is a pure Python implementation of a neural-network based Go AI, using TensorFlow. While inspired by DeepMind's AlphaGo algorithm, this project is not a DeepMind project nor is it affiliated with the official AlphaGo project.
This is NOT an official version of AlphaGo
Repeat, this is not the official AlphaGo program by DeepMind. This is an independent effort by Go enthusiasts to replicate the results of the AlphaGo Zero paper ("Mastering the Game of Go without Human Knowledge," Nature), with some resources generously made available by Google.
Minigo is based on Brian Lee's "MuGo" -- a pure Python implementation of the first AlphaGo paper, "Mastering the Game of Go with Deep Neural Networks and Tree Search," published in Nature. This implementation adds features and architecture changes present in the more recent AlphaGo Zero paper, "Mastering the Game of Go without Human Knowledge". More recently, this architecture was extended to chess and shogi in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". These papers will often be abridged in Minigo documentation as AG (for AlphaGo), AGZ (for AlphaGo Zero), and AZ (for AlphaZero) respectively.
Goals of the Project
Provide a clear set of learning examples using TensorFlow, Kubernetes, and Google Cloud Platform for establishing Reinforcement Learning pipelines on various hardware accelerators.
Reproduce the methods of the original DeepMind AlphaGo papers as faithfully as possible, through an open-source implementation and open-source pipeline tools.
Provide our data, results, and discoveries in the open to benefit the Go, machine learning, and Kubernetes communities.
An explicit non-goal of the project is to produce a competitive Go program that establishes itself as the top Go AI. Instead, we strive for a readable, understandable implementation that can benefit the community, even if that means our implementation is not as fast or efficient as possible.
While this project might produce such a strong model, we hope to focus on the process. Remember, getting there is half the fun. :)
We hope this project gives interested developers access to a strong Go model, with an easy-to-understand platform of Python code available for extension, adaptation, etc.


Floor 1 · 2018-01-30 20:50
Good job, but it seems that a lot of resources are needed to train a strong enough agent. And Python is relatively slow, which makes the process even harder. BTW, there is already an open-source project, Leela Zero, on GitHub that also aims to reproduce the AlphaGo Zero results, and Leela Zero is now a 7-dan player on the Fox Go server.


IP: Zhejiang · via mobile Tieba · Floor 4 · 2018-01-30 21:29
Can someone translate this?


IP: Jilin · via Android client · Floor 5 · 2018-01-30 22:14
        ……


Via Android client · Floor 6 · 2018-01-31 12:01
Yesterday I saw the Leela Zero author and the Minigo authors comparing notes, which was quite interesting; they were discussing whether the two projects could make their network weights mutually compatible. Minigo is trained on Google Cloud machines. They first trained a 9x9 small-board network, using 1,000 CPU cores and no GPUs. The 19x19 network they trained later uses a 20-block network with 128 filters, considerably larger than Leela Zero's current 6x128, and it was tested on CGOS under the "somebot" series of names, where at the time it was noticeably stronger than the contemporaneous 5-block Leela Zero weights. The funniest part is that this somebot opens every game with double 3-3 points. They said they've been trying to get it to randomly explore more openings but just can't manage it; it stubbornly sticks to the double 3-3 opening. If you're interested, you can look up the somebot game records on CGOS. Its real strength could be higher, too: the CGOS tests ran single-threaded, and Python isn't as efficient as Leela Zero's C++, so they could only manage 300 to 500 searches per move, whereas Leela Zero uses 1,600.
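As a back-of-the-envelope comparison (my own sketch, not code from either project), here is roughly how a 20 block x 128 filter tower compares with a 6x128 one in parameter count, assuming the AlphaGo Zero residual-block layout (one 3x3 input convolution, then two 3x3 convolutions per block, each followed by batch-norm scale/offset); the 17 input feature planes are the AGZ figure, and policy/value heads are ignored:

```python
def residual_tower_params(blocks, filters, input_planes=17):
    """Rough parameter count for an AGZ-style residual tower."""
    # 3x3 convolution weights plus batch-norm scale/offset per channel.
    conv = lambda cin, cout: 3 * 3 * cin * cout + 2 * cout
    total = conv(input_planes, filters)           # input convolution
    total += blocks * 2 * conv(filters, filters)  # two convs per residual block
    return total

minigo = residual_tower_params(20, 128)  # ~5.93M parameters
leela = residual_tower_params(6, 128)    # ~1.79M parameters
print(minigo, leela, minigo / leela)
```

By this crude count the 20x128 tower is over 3x the size of the 6x128 one, which lines up with it being both stronger per evaluation and slower per move.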


IP: United States · Floor 9 · 2018-02-01 22:15
It'll be slow, though, using Python...
The real question is where they got Google Cloud TPU resources. Not long ago access was still application-only, and there were just a thousand of them.
DeepMind still deserves the blame for this: there's no AlphaZero available even to professional institutions.


IP: Beijing · Floor 10 · 2018-02-02 00:17