Describe the bug
`ScalarNdarray` implements `__del__()` to let Python release the allocated device memory, but the other subclasses of `Ndarray` (such as `VectorNdarray` and `MatrixNdarray`) don't implement the same behaviour, so their allocations are never freed.
It seems to me the `__del__()` method should be moved up to the `Ndarray` base class.
I tested this simple change and it does fix the memory leak.
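The reasoning can be illustrated with a minimal, self-contained sketch (the `FakeRuntime` class and its counters are hypothetical stand-ins, not taichi's actual internals): when `__del__` lives on the base class, every subclass releases its allocation on garbage collection.

```python
class FakeRuntime:
    """Hypothetical stand-in for the runtime that owns device memory."""
    allocated = 0

    @classmethod
    def alloc(cls):
        cls.allocated += 1

    @classmethod
    def free(cls):
        cls.allocated -= 1


class Ndarray:
    def __init__(self):
        FakeRuntime.alloc()

    # With __del__ defined on the base class, every subclass
    # frees its memory when the Python object is collected.
    def __del__(self):
        FakeRuntime.free()


class ScalarNdarray(Ndarray):
    pass


class VectorNdarray(Ndarray):
    pass


for _ in range(100):
    # Rebinding `arr` drops the previous instance, so __del__ runs each time.
    arr = VectorNdarray()

arr = None  # release the last instance
print(FakeRuntime.allocated)  # expect 0: nothing leaked
```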
To Reproduce
```python
import taichi as ti
import taichi.types as tt

ti.init(ti.gpu)
while True:
    arr = ti.ndarray(tt.vector(3, ti.f32), (1000, 1000))
    arr = arr.to_numpy()
    # watch GPU memory skyrocket
```
If, instead, a scalar ndarray of the same total size is used:
```python
import taichi as ti
import taichi.types as tt

ti.init(ti.gpu)
while True:
    arr = ti.ndarray(ti.f32, (1000, 1000, 3))
    arr = arr.to_numpy()
    # watch GPU memory stay basically flat
```
Log/Screenshots
Additional comments