## Description
From @chewxy on August 13, 2017 02:09

Sketch space for describing how to create a `chan int` of negative length, and how to reproduce it.
## Background/Context of the Issue

Gorgonia is a library for representing and executing mathematical equations, and for performing automatic differentiation. It's like TensorFlow and PyTorch for Go. It's currently undergoing a major internal refactor (one that will not affect the public APIs much).
I was improving the backend `tensor` package by splitting the data structure into a data structure plus a pluggable execution engine, instead of having built-in methods (see also #128). The reason is to make it easier to swap out execution backends (CPU, GPU, even a networked CPU; an actual experiment I ran had a small neural network on a Raspberry Pi offshore all of its computation to my workstation, and vice versa, which turned out to be a supremely bad idea).

Another reason was that I wanted to run some experiments at work using algorithms that involve sparse tensors (see also #127) for matrix factorization tasks.
Lastly, I wanted to clean up the generics support of the `tensor` package. The current master branch of the `tensor` package has a lot of code to support arbitrary tensor types. With the split between data structure and execution engine, more of this support can be offloaded to the execution engine instead. The package provides a default execution engine (`type StdEng struct{}`: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/defaultengine.go), which can be extended (example: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/example_extension_test.go). The idea was to have an `internal/execution` package which holds all the code for the default execution engine.
## Data Structures
The most fundamental data structure is `storage.Header`, which is an analogue of a Go slice: a three-word structure. It was chosen because it is a ridiculously simple structure that can store Go-allocated memory, C-allocated memory, and device-allocated memory (like CUDA).
On top of `storage.Header` sits `tensor.array`. It's essentially a `storage.Header` with an additional field for the type. The `v` field will eventually be phased out once the refactor is complete.
On top of `tensor.array` are the various implementations of `tensor.Tensor`. Chief amongst these is the `tensor.Dense` struct. Essentially it's a `tensor.array` coupled with some access patterns and meta information.
Access to the data in a `tensor.Tensor` is achieved by means of `Iterator`s. An `Iterator` basically assumes that the data is held in a flat slice, and returns the next index into that slice. There are auxiliary methods like `NextValidity` to handle special-case tensors such as masked tensors, where some elements are masked out of operations.
The bug happens in the `Chan` method of the `FlatIterator` type.
## How to reproduce
- The branch on which the bug is known to exist is the `debugrace` branch, which can be found here: 1dee6d2. Check it out with `git checkout debugrace`.
- Run the tests with various `GOMAXPROCS` settings, like so: `GOMAXPROCS=1 go test -run=.`. Try several values of `GOMAXPROCS`; one of them is bound to trigger the issue.
- The test won't panic, because I have added a `recover` here: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/dense_viewstack_specializations.go#L636. Removing the deferred function causes an index-out-of-bounds panic.
- All the tests must be run to trigger the issue. The issue is found in the test for the `Stack` function: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/dense_matop_test.go#L768. If only the stack test is run (for example, `GOMAXPROCS=1 go test -run=Stack`), it is unlikely the problem will show up (I wrote a tiny Python script to run it as many times as possible with many `GOMAXPROCS` configurations, and none of those runs caused an error).
You should get something like this:
## Environments
I've managed to reproduce the issue on OS X with Go 1.8, and on Ubuntu 16.10 with Go 1.8.2 and Go tip (whatever gvm thinks is Go tip). I have no access to Go on a Windows box, so I can't test it on Windows.
## Magic and Unsafe Use
As part of the refactoring, there are a few magic bits in use. Here I attempt to list them all (the list may not be exhaustive):

- The Go slice structure is re-implemented in https://github.com/chewxy/gorgonia/blob/debugrace/tensor/internal/storage/header.go. Note that an `unsafe.Pointer` is used here instead of the `uintptr` stored by the standard `reflect.SliceHeader`. This is because I want Go to keep a reference to the actual slice. This may affect the runtime and memory allocation; I'm not too sure.
- `//go:linkname` is used in some internal packages (specific example: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/internal/execution/generic_arith_vv.go). It's basically just a renaming of functions in github.com/chewxy/vecf32 and github.com/chewxy/vecf64. Those packages contain optional SSE/AVX-related vector operations, such as arithmetic, but these have to be manually enabled via a build tag. By default, pure Go algorithms are used, not SSE/AVX operations.
- `//go:linkname` is also used in unsafe.go: https://github.com/chewxy/gorgonia/blob/debugrace/tensor/unsafe.go#L105. It should be noted, however, that `memmove` is never called, as after some tests I decided it would be too unsafe to use (which also explains why there are comments that say `TODO: implement memmove`).
- There are several naughty pointer arithmetics at play:
## What I suspect
I suspect that there may be some naughty things happening in memory (because the bug only happens when all the tests are run). The problem is that I don't know exactly where to start looking.
Copied from original issue: gorgonia/gorgonia#135