
Commit d817cc2

Merge pull request #99 from libffcv/main
Update no_jit_assert branch with bug fixes
2 parents eb06acf + 1d94d23 commit d817cc2

File tree

7 files changed (+51, -10 lines)


docker/Dockerfile

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+FROM pytorch/pytorch:latest
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        software-properties-common \
+        build-essential \
+        curl \
+        git \
+        ffmpeg
+
+RUN conda create -n ffcv python=3.9 \
+        cupy \
+        pkg-config \
+        compilers \
+        libjpeg-turbo \
+        opencv \
+        pytorch \
+        torchvision \
+        cudatoolkit=11.3 \
+        numba -c pytorch -c conda-forge
+
+RUN echo "source activate" >> ~/.bashrc
+RUN echo "conda activate ffcv" >> ~/.bashrc
+
+RUN git clone https://github.com/libffcv/ffcv.git
+
+RUN conda run -n ffcv pip install ffcv
+
+# To test:
+# 1- build the Dockerfile (e.g. docker build -t ffcv .)
+# 2- login to the docker container (e.g. docker run -it --gpus all ffcv bash)
+# 3- cd ffcv/examples/cifar
+# 4- bash train_cifar.sh
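As a quick sanity check for step 2 above, the following is a minimal smoke-test sketch (not part of the commit); it assumes you are inside the running container with the ``ffcv`` conda environment active.

    # Hypothetical smoke test: run inside the container after `conda activate ffcv`.
    # It only checks that the packages the Dockerfile installs import cleanly
    # and that PyTorch can see the GPU passed in with `--gpus all`.
    import ffcv    # installed via `conda run -n ffcv pip install ffcv`
    import numba   # pulled in by the `conda create` step
    import torch   # from the pytorch/pytorch base image

    print("ffcv loaded from:", ffcv.__file__)
    print("numba version:", numba.__version__)
    print("CUDA available:", torch.cuda.is_available())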

docs/ffcv_examples/cifar10.rst

Lines changed: 2 additions & 2 deletions
@@ -106,8 +106,8 @@ For the model, we use a custom ResNet-9 architecture from `KakaoBrain <https://g

 class Mul(ch.nn.Module):
     def __init__(self, weight):
-        super(Mul, self).__init__()
-        self.weight = weight
+        super(Mul, self).__init__()
+        self.weight = weight
     def forward(self, x): return x * self.weight

 class Flatten(ch.nn.Module):
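For context, here is a self-contained sketch of the two helper modules around this change; the ``Mul`` body matches the corrected lines above, while the ``Flatten`` body and the ``ch`` alias for ``torch`` are assumptions for illustration.

    import torch as ch  # assumed alias, matching the `ch.` prefix in the snippet

    class Mul(ch.nn.Module):
        def __init__(self, weight):
            super(Mul, self).__init__()   # the two lines touched by this commit
            self.weight = weight
        def forward(self, x): return x * self.weight

    class Flatten(ch.nn.Module):
        # Body assumed: flatten all dimensions after the batch dimension.
        def forward(self, x): return x.view(x.size(0), -1)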

docs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@ Install ``ffcv``:
     conda activate ffcv
     pip install ffcv

+We also provide a `Dockerfile <https://github.com/libffcv/ffcv/blob/main/docker/Dockerfile>`_ that installs ``ffcv`` in a few steps.
+

 Introduction
 ------------

docs/making_dataloaders.rst

Lines changed: 9 additions & 2 deletions
@@ -49,6 +49,13 @@ takes an ``enum`` provided by :class:`ffcv.loader.OrderOption`:
     # Memory-efficient but not truly random loading
     # Speeds up loading over RANDOM when the whole dataset does not fit in RAM!
     ORDERING = OrderOption.QUASI_RANDOM
+
+.. note::
+    The ``order`` options require different amounts of RAM, so choose one based on how much RAM is available on your machine.
+
+    - ``RANDOM`` requires the most RAM, since it has to cache the entire dataset in order to sample perfectly at random. If the available RAM is not enough, it throws an exception.
+    - ``QUASI_RANDOM`` requires much less RAM than ``RANDOM``, but a bit more than ``SEQUENTIAL``, since it caches only part of the samples. Use it when the entire dataset cannot fit in RAM.
+    - ``SEQUENTIAL`` requires the least RAM. It only keeps a few samples loaded ahead of time for the upcoming training iterations.

 Pipelines
 '''''''''
@@ -165,12 +172,12 @@ Other options

 You can also specify the following additional options when constructing an :class:`ffcv.loader.Loader`:

-- ``os_cache``: If True, the entire dataset is cached
+- ``os_cache``: If ``True``, the OS automatically determines whether the dataset is held in memory, depending on available RAM. If ``False``, FFCV manages the caching, and the amount of RAM needed depends on the ``order`` option.
 - ``distributed``: For training on :ref:`multiple GPUs<Scenario: Multi-GPU training (1 model, multiple GPUs)>`
 - ``seed``: Specify the random seed for batch ordering
 - ``indices``: Provide indices to load a subset of the dataset
 - ``custom_fields``: For specifying decoders for fields with custom encoders
-- ``drop_last``: If True, drops the last non-full batch from each iteration
+- ``drop_last``: If ``True``, drops the last non-full batch from each iteration
 - ``batches_ahead``: Set the number of batches prepared in advance. Increasing it absorbs variation in processing time to make sure the training loop does not stall for too long to process batches. Decreasing it reduces RAM usage.
 - ``recompile``: Recompile every iteration. Useful if you have transforms that change their behavior from epoch to epoch, for instance code that uses the shape as a compile-time param. (But if they just change their memory usage, e.g., the resolution changes, it's not necessary.)

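To make these options concrete, here is a minimal sketch of constructing a loader with an explicit ``order`` and ``os_cache`` setting; the dataset path, batch size, and worker count are placeholder values, and the ``pipelines`` argument is omitted.

    from ffcv.loader import Loader, OrderOption

    loader = Loader(
        '/path/to/dataset.beton',        # placeholder path to a written dataset
        batch_size=256,                  # placeholder
        num_workers=8,                   # placeholder
        order=OrderOption.QUASI_RANDOM,  # good shuffling without caching the whole dataset
        os_cache=False,                  # let FFCV manage caching; RAM use then follows `order`
        drop_last=True,
    )

    for batch in loader:
        ...  # training step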

docs/parameter_tuning.rst

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ Scenario: Large scale datasets
 If your dataset is too large to be cached on the machine, we recommend:

 - Use ``os_cache=False``. Since the data can't be cached, FFCV will have to read it over and over. Having FFCV take over caching from the operating system is beneficial, as it knows in advance which samples will be needed in the future and can load them ahead of time.
-- For ``order``, we recommend using the ``QUASI_RANDOM`` traversal order if you need randomness but perfect uniform sampling isn't mission critical. This will optimize the order to minimize the reads on the underlying storage while maintaining very good randomness properties. If you have experience with the ``shuffle()`` function of ``webdataset`` and the quality of the randomness wasn't sufficient, we still suggest you give ``QUASI_RANDOM`` a try as it should be significantly better.
+- For ``order``, we recommend using the ``QUASI_RANDOM`` traversal order if you need randomness but perfect uniform sampling isn't mission critical. This will optimize the order to minimize the reads on the underlying storage while maintaining very good randomness properties. If you have experience with the ``shuffle()`` function of ``webdataset`` and the quality of the randomness wasn't sufficient, we still suggest you give ``QUASI_RANDOM`` a try as it should be significantly better. Using ``RANDOM`` is infeasible in this situation because it needs to load the entire dataset in RAM, causing an out-of-memory exception.


 Scenario: Multi-GPU training (1 model, multiple GPUs)

docs/quickstart.rst

Lines changed: 4 additions & 4 deletions
@@ -18,15 +18,15 @@ PyTorch datasets and `WebDatasets <https://github.com/webdataset/webdataset>`_):
 # Pass a type for each data field
 writer = DatasetWriter(write_path, {
     # Tune options to optimize dataset size, throughput at train-time
-    'image': RGBImageField({
+    'image': RGBImageField(
         max_resolution=256,
         jpeg_quality=jpeg_quality
-    }),
+    ),
     'label': IntField()
 })

 # Write dataset
-writer.from_indexed_dataset(ds)
+writer.from_indexed_dataset(my_dataset)

 Then replace your old loader with the `ffcv` loader at train time (in PyTorch,
 no other changes required!):
@@ -58,4 +58,4 @@ no other changes required!):
 for epoch in range(epochs):
     ...

-See :ref:`here <Getting started>` for a more detailed guide to deploying `ffcv` for your dataset.
+See :ref:`here <Getting started>` for a more detailed guide to deploying `ffcv` for your dataset.
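Putting the corrected snippet together, here is a hedged end-to-end sketch of the writer call; ``write_path``, ``jpeg_quality``, and ``my_dataset`` are placeholders standing in for whatever the surrounding tutorial defines.

    from ffcv.writer import DatasetWriter
    from ffcv.fields import RGBImageField, IntField

    write_path = '/path/to/output.beton'  # placeholder output path
    jpeg_quality = 90                     # placeholder compression setting
    my_dataset = ...                      # any indexed (map-style) dataset of (image, label) pairs

    # Pass a type for each data field; the field options are keyword arguments, not a dict.
    writer = DatasetWriter(write_path, {
        'image': RGBImageField(
            max_resolution=256,
            jpeg_quality=jpeg_quality,
        ),
        'label': IntField(),
    })

    # Write dataset
    writer.from_indexed_dataset(my_dataset)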

docs/writing_datasets.rst

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ returns an input vector and its corresponding label:
         self.Y = np.randn(N)

     def __getitem__(self, idx):
-        return (self.X[idx], self.Y[idx])
+        return (self.X[idx].astype('float32'), self.Y[idx])

     def __len__(self):
         return len(self.X)
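For completeness, here is a runnable sketch of the toy dataset around the changed line; the class name and the constructor body are assumptions (note ``np.random.randn`` rather than the ``np.randn`` shorthand in the context line above), and the ``float32`` cast is the change this diff introduces.

    import numpy as np

    class LinearRegressionDataset:  # hypothetical name for the docs' toy dataset
        def __init__(self, N, d):
            # Assumed constructor: random inputs X and targets Y.
            self.X = np.random.randn(N, d)
            self.Y = np.random.randn(N)

        def __getitem__(self, idx):
            # Cast the input to float32 so its dtype matches the field declared at write time.
            return (self.X[idx].astype('float32'), self.Y[idx])

        def __len__(self):
            return len(self.X)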
