Releases: LeelaChessZero/lc0
v0.32.1
In this version:
- Strict timing is applied only if `isready` was seen, for more accurate timing.
- Better onnx-trt installation script that will download everything needed without user intervention.
- Improved transposition table memory use calculation for the memory limit.
- A small speed improvement for dag-preview search.
- Some important bug fixes:
- Two en-passant related bugs
- Guard against infinite fp16 input in cuda Softmax kernel.
- Changed the way the WDL draw value is calculated in some backends to avoid underflows.
- The onnx backend WDL Softmax calculation was moved to the cpu for improved accuracy (like other backends already do).
- Correct onnx moves left head final activation.
- Fix for cudnn attention policy with convolutional nets.
- A few more minor fixes.
- Assorted build system improvements.
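The fp16 Softmax guard and the move of the WDL Softmax to the cpu both address the same numerical hazard: computing softmax directly in half precision can overflow to infinity or underflow to zero. A minimal sketch (not lc0 code; the logit values are invented for illustration) of the standard fix, subtracting the max and accumulating in float32:

```python
import numpy as np

def naive_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax computed directly in the input dtype; in fp16 the
    exponentials overflow easily, producing inf/nan."""
    e = np.exp(logits)
    return e / e.sum()

def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Subtract the max and accumulate in float32, then convert
    back, avoiding fp16 overflow and underflow."""
    x = logits.astype(np.float32)
    x -= x.max()
    e = np.exp(x)
    return (e / e.sum()).astype(logits.dtype)

wdl_logits = np.array([12.0, -4.0, 30.0], dtype=np.float16)
print(naive_softmax(wdl_logits))   # fp16 exp overflows -> nan entries
print(stable_softmax(wdl_logits))  # finite, sums to ~1
```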
v0.32.0
In this release, the code has been reorganized and has undergone major changes. Therefore this changelog is less detailed and describes the changes in major groups.
- We have a new search API that allows search algorithms to co-exist. Currently available are `classic` (the default), `dag-preview` (more later), `valuehead` and `policyhead`. The default algorithm can be changed either at build time by the `default_search` option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
- We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
- The source code is reorganized, with a more logical directory structure.
- The original search was ported to the new search and backend interfaces and is renamed to `classic`. This has allowed some streamlining and simplifications.
- The `dag-preview` search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as well tested, which is why it has "preview" in its name for now, but it lives in the `src/search/dag-classic` directory.
- The `valuehead` search replaces `ValueOnly` mode and selects the move with the best value head evaluation.
- The `policyhead` search is equivalent to a single node search, selecting the best move using just the policy head.
- The new `default_backend` build option allows overriding the fixed priority for the backend used by default.
- The new `native_arch` build option overrides the `-march=native` compiler default for linux release builds, to help with distribution package creation.
- We have a new `sycl` backend that will work with amd, intel and nvidia gpus.
- There is also a new `onnx-trt` backend, using tensorrt on nvidia gpus.
- The metal backend received several improvements.
- Support simple/normal/pro mode in options was cleaned up, using a common mechanism.
- Added the `wait` uci extension command to allow running simple tests from the command line.
- Removed the `fen` uci extension command as it was unnecessarily complicating things.
- Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
- Several build system changes and improvements.
- We now generate binaries for cuda 12, onnx-trt and macos.
- The onnx-trt package has a readme with instructions and an install script.
- Support for using lc0 with openbench.
- New `bench` mode for a quicker benchmark.
- RPE nets are now detected and give an error instead of bad results.
- The rescorer code and training data header were refactored to make them usable by external tools.
- Assorted small fixes and improvements.
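The executable-renaming mechanism above (e.g. lc0-valuehead selecting the `valuehead` algorithm) can be sketched as a simple lookup on the program name. This is an illustrative Python sketch, not lc0's actual implementation; the registry and helper names are invented:

```python
import os

# Hypothetical registry of the search algorithms named above.
SEARCH_ALGORITHMS = ("classic", "dag-preview", "valuehead", "policyhead")

# Stand-in for the default_search build option.
DEFAULT_SEARCH = "classic"

def select_search(argv0: str) -> str:
    """Pick the search algorithm whose name appears in the
    executable name (e.g. 'lc0-valuehead'); fall back to the
    build-time default when no name matches."""
    exe = os.path.basename(argv0).lower()
    for name in SEARCH_ALGORITHMS:
        if name != DEFAULT_SEARCH and name in exe:
            return name
    return DEFAULT_SEARCH

print(select_search("lc0"))            # -> classic
print(select_search("lc0-valuehead"))  # -> valuehead
```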
v0.32.0-rc2
In this version:
- Fix for onnx-trt bug, where the wrong network could be used from the cache.
- Added code to detect RPE nets and give an error instead of bad results.
- Better instructions in the readme and install script for onnx-trt.
- Made `UCI_ShowWDL` off by default again as some GUIs have issues.
- Fixed a long standing issue when compiled with `-ffast-math` (or `icx -O3`).
- Several improvements to the sycl backend.
- Several improvements to the metal backend.
- Refactored the rescorer code and training data header to make them usable by external tools.
- Relaxed cuda/cudnn version checks so that no warnings are shown for mismatched versions that are supported.
- Several build system updates.
- Assorted small fixes and improvements.
v0.32.0-rc1
In this release, the code has been reorganized and has undergone major changes. Therefore this changelog is less detailed and describes the changes in major groups.
- We have a new search API that allows search algorithms to co-exist. Currently available are `classic` (the default), `dag-preview` (more later), `valuehead` and `policyhead`. The default algorithm can be changed either at build time by the `default_search` option or by renaming the executable to include the algorithm name (e.g. lc0-valuehead).
- We also have a new backend interface that is chess oriented and not tied to the network architecture. The existing backends still use the old interface through a wrapper.
- The source code is reorganized, with a more logical directory structure.
- The original search was ported to the new search and backend interfaces and is renamed to `classic`. This has allowed some streamlining and simplifications.
- The `dag-preview` search is the DAG algorithm that lived in a separate branch up to now. It hasn't been as well tested, which is why it has "preview" in its name for now, but it lives in the `src/search/dag-classic` directory.
- The `valuehead` search replaces `ValueOnly` mode and selects the move with the best value head evaluation.
- The `policyhead` search is equivalent to a single node search, selecting the best move using just the policy head.
- The new `default_backend` build option allows overriding the fixed priority for the backend used by default.
- The new `native_arch` build option overrides the `-march=native` compiler default for linux release builds, to help with distribution package creation.
- We have a new `sycl` backend that will work with amd, intel and nvidia gpus.
- There is also a new `onnx-trt` backend, using tensorrt on nvidia gpus.
- Support simple/normal/pro mode in options was cleaned up, using a common mechanism.
- Added the `wait` uci extension command to allow running simple tests from the command line.
- Removed the `fen` uci extension command as it was unnecessarily complicating things.
- Some preliminary fp8 support was added for onnx and xla. This is not functional, just there to make experimentation easier.
- Several build system changes and improvements.
- We now generate binaries for cuda 12, onnx-trt and macos.
- Support for using lc0 with openbench.
- New `bench` mode for a quicker benchmark.
- Assorted small fixes and improvements.
v0.31.2
v0.31.1
v0.31.0
In this version:
- The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
- Updated the internal Elo model to better align with regular Elo for human players.
- There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
- There is a vastly simplified lc0 interface available by renaming the executable to `lc0simple`.
- The backends can now suggest a minibatch size to the search; this is enabled by `--minibatch-size=0` (the new default).
- If the cudnn backend detects an unsupported network architecture, it will switch to the cuda backend.
- Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
- While it is easy to get a single node policy evaluation (`go nodes 1` using uci), there was no simple way to get the effect of a value only evaluation, so the `--value-only` option was added.
- Button uci options were implemented and a button to clear the tree was added (as a hidden option).
- Support for the uci `go mate` option was added.
- The rescorer can now be built from the lc0 code base instead of a separate branch.
- A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml. This has some overhead, so it is only enabled for onnx-dml and can be switched off with the `alt_layernorm=false` backend option.
- The `--onnx2pytorch` option was added to leela2onnx to generate pytorch compatible models.
- There is a cuda `min_batch` backend option to reduce non-determinism with small batches.
- New options were added to onnx2leela to fix tf exported onnx models.
- The onnx backend can now be built for amd's rocm.
- Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
- Made the WDL Rescale sharpness limit configurable via the `--wdl-max-s` hidden option.
- The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by `--task-workers=-1` (the new default).
- Changed cuda compilation options to use `-arch=native` or `-arch=all-major` if no specific version is requested, with a fallback for older cuda versions that don't support those options.
- Updated android builds to use openblas 0.3.27.
- The `WDLDrawRateTarget` option now accepts the value 0 (new default) to retain raw WDL values if `WDLCalibrationElo` is set to 0 (default).
- Improvements to the verbose move stats if `WDLEvalObjectivity` is used.
- The centipawn score is displayed by default for old nets without WDL output.
- Several assorted fixes and code cleanups.
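The policy and value tournament modes above boil down to two different one-ply selection rules. A toy sketch of the distinction, with invented numbers standing in for a real net's outputs (not lc0 code):

```python
# Hypothetical net outputs for one position (values invented).
policy = {"e2e4": 0.45, "d2d4": 0.40, "g1f3": 0.15}        # policy head priors
child_value = {"e2e4": 0.02, "d2d4": 0.05, "g1f3": -0.01}  # q after each move

def policy_move(priors: dict) -> str:
    """Policy tournament: a single node, pick the highest prior."""
    return max(priors, key=priors.get)

def value_move(q_by_move: dict) -> str:
    """Value tournament: evaluate every legal move at depth 1 and
    pick the one with the best q for the side to move."""
    return max(q_by_move, key=q_by_move.get)

print(policy_move(policy))      # -> e2e4
print(value_move(child_value))  # -> d2d4 (policy and value can disagree)
```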
v0.31.0-rc3
In this version:
- The `WDLDrawRateTarget` option now accepts the value 0 (new default) to retain raw WDL values if `WDLCalibrationElo` is set to 0 (default).
- Improvements to the verbose move stats if `WDLEvalObjectivity` is used.
- The centipawn score is displayed by default for old nets without WDL output.
- Some build system improvements.
v0.31.0-rc2
In this version:
- Changed cuda compilation options to use `-arch=native` or `-arch=all-major` if no specific version is requested, with a fallback for older cuda versions that don't support those options.
- Updated android builds to use openblas 0.3.27.
- A few small fixes.
v0.31.0-rc1
In this version:
- The blas, cuda, eigen, metal and onnx backends now have support for multihead network architecture and can run BT3/BT4 nets.
- Updated the internal Elo model to better align with regular Elo for human players.
- There is a new XLA backend that uses OpenXLA compiler to produce code to execute the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related are new leela2onnx options to output the HLO format that XLA understands.
- There is a vastly simplified lc0 interface available by renaming the executable to `lc0simple`.
- The backends can now suggest a minibatch size to the search; this is enabled by `--minibatch-size=0` (the new default).
- If the cudnn backend detects an unsupported network architecture, it will switch to the cuda backend.
- Two new selfplay options enable value and policy tournaments. A policy tournament uses a single node policy to select the move to play, while a value tournament searches all possible moves at depth 1 to select the one with the best q.
- While it is easy to get a single node policy evaluation (`go nodes 1` using uci), there was no simple way to get the effect of a value only evaluation, so the `--value-only` option was added.
- Button uci options were implemented and a button to clear the tree was added (as a hidden option).
- Support for the uci `go mate` option was added.
- The rescorer can now be built from the lc0 code base instead of a separate branch.
- A discrete onnx layernorm implementation was added to get around an onnxruntime bug with directml. This has some overhead, so it is only enabled for onnx-dml and can be switched off with the `alt_layernorm=false` backend option.
- The `--onnx2pytorch` option was added to leela2onnx to generate pytorch compatible models.
- There is a cuda `min_batch` backend option to reduce non-determinism with small batches.
- New options were added to onnx2leela to fix tf exported onnx models.
- The onnx backend can now be built for amd's rocm.
- Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
- Made the WDL Rescale sharpness limit configurable via the `--wdl-max-s` hidden option.
- The search task workers can be set automatically, to either 0 for cpu backends or up to 4 depending on the number of cpu cores. This is enabled by `--task-workers=-1` (the new default).
- Several assorted fixes and code cleanups.