
Commit b26a22d

DOC/ENH: Better AutoEncoder examples & support (#200)
1 parent 302dd7c commit b26a22d

File tree

7 files changed: +170 -86 lines changed

  docs/source/notebooks/AutoEncoders.md
  docs/source/notebooks/Basic_Usage.md
  docs/source/notebooks/Benchmarks.md
  docs/source/notebooks/DataTransformers.md
  docs/source/notebooks/MLPClassifier_MLPRegressor.md
  docs/source/notebooks/Meta_Estimators.md
  scikeras/wrappers.py

docs/source/notebooks/AutoEncoders.md

Lines changed: 153 additions & 70 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python
@@ -19,7 +19,9 @@ jupyter:
 
 # Autoencoders in SciKeras
 
-Autencoders are an approach to use nearual networks to distill data into it's most important features, thereby compressing the data. We will be following the [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html) on the topic, which goes much more in depth and breadth than we will here. You are highly encouraged to check out that tutorial if you want to learn about autoencoders in the general sense.
+Autoencoders are an approach that uses neural networks to distill data into its most important features, thereby compressing the data.
+We will be following the [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html) on the topic, which goes into much more depth and breadth than we will here.
+You are highly encouraged to check out that tutorial if you want to learn about autoencoders in the general sense.
 
 ## Table of contents
 
@@ -28,6 +30,7 @@ Autencoders are an approach to use nearual networks to distill data into it's mo
 * [3. Define Keras Model](#3.-Define-Keras-Model)
 * [4. Training](#4.-Training)
 * [5. Explore Results](#5.-Explore-Results)
+* [6. Deep AutoEncoder](#6.-Deep-AutoEncoder)
 
 ## 1. Setup
 
@@ -73,99 +76,110 @@ print(x_test.shape)
 
 ## 3. Define Keras Model
 
-We will be defining a very simple autencoder. We define _three_ model building methods:
+We will be defining a very simple autoencoder. We define _three_ model architectures:
 
-1. One to build a full end-to-end autoencoder.
-2. One to create a model that includes only the encoder portion.
-3. One that creates a model that includes only the decoder portion.
+1. An encoder: a series of densely connected layers culminating in an "output" layer that determines the encoding dimensions.
+2. A decoder: takes the output of the encoder as its input and reconstructs the original data.
+3. An autoencoder: a chain of the encoder and decoder that directly connects them for training purposes.
 
 The only variable we give our model is the encoding dimension, which will be a hyperparameter of our final transformer.
 
-```python
-from tensorflow import keras
+The encoder and decoder are views onto the first/last layers of the autoencoder model.
+They'll be directly used in `transform` and `inverse_transform`, so we'll create some SciKeras models with those layers
+and save them as `encoder_model_` and `decoder_model_`. All three models are created within `_keras_build_fn`.
 
+For background on chaining Functional Models like this, see [All models are callable](https://keras.io/guides/functional_api/#all-models-are-callable-just-like-layers) in the Keras docs.
 
-def get_fit_model(encoding_dim: int) -> keras.Model:
-    """Get an autoencoder.
+```python
+from typing import Dict, Any
 
-    This autoencoder compresses a 28x28 image (784 pixels) down to a feature of length
-    `encoding_dim`, and tries to reconstruct the input image from that vector.
-    """
-    input_img = keras.Input(shape=(784,), name="input")
-    encoded = keras.layers.Dense(encoding_dim, activation='relu', name="encoded")(input_img)
-    decoded = keras.layers.Dense(784, activation='sigmoid', name="output")(encoded)
-    autoencoder_model = keras.Model(input_img, decoded)
-    return autoencoder_model
+from sklearn.base import TransformerMixin
+from sklearn.metrics import mean_squared_error
+from scikeras.wrappers import BaseWrapper
 
-def get_tf_model(fit_model: keras.Model) -> keras.Model:
-    """Get an encoder model.
 
-    We do this by extracting the encoding layer from the fitted autoencoder model.
+class AutoEncoder(BaseWrapper, TransformerMixin):
+    """A class that enables transform and fit_transform.
     """
-    return keras.Model(fit_model.get_layer("input").input, fit_model.get_layer("encoded").output)
 
-def get_inverse_tf_model(fit_model: keras.Model, encoding_dim: int) -> keras.Model:
-    """Get an deencoder model.
+    encoder_model_: BaseWrapper
+    decoder_model_: BaseWrapper
+
+    def _keras_build_fn(self, encoding_dim: int, meta: Dict[str, Any]):
+        n_features_in = meta["n_features_in_"]
 
-    We do this by extracting the deencoding layer from the fitted autoencoder model
-    and adding a new Keras input layer.
-    """
-    encoded_input = keras.Input(shape=(encoding_dim,))
-    output = fit_model.get_layer("output")(encoded_input)
-    return keras.Model(encoded_input, output)
-```
+        encoder_input = keras.Input(shape=(n_features_in,))
+        encoder_output = keras.layers.Dense(encoding_dim, activation='relu')(encoder_input)
+        encoder_model = keras.Model(encoder_input, encoder_output)
 
-Next we create a class that that will enable the `transform` and `fit_transform` methods, as well as integrating all three of our models into a single estimator.
+        decoder_input = keras.Input(shape=(encoding_dim,))
+        decoder_output = keras.layers.Dense(n_features_in, activation='sigmoid', name="decoder")(decoder_input)
+        decoder_model = keras.Model(decoder_input, decoder_output)
+
+        autoencoder_input = keras.Input(shape=(n_features_in,))
+        encoded_img = encoder_model(autoencoder_input)
+        reconstructed_img = decoder_model(encoded_img)
 
-```python
-from sklearn.base import TransformerMixin, clone
-from scikeras.wrappers import BaseWrapper
+        autoencoder_model = keras.Model(autoencoder_input, reconstructed_img)
 
+        self.encoder_model_ = BaseWrapper(encoder_model, verbose=self.verbose)
+        self.decoder_model_ = BaseWrapper(decoder_model, verbose=self.verbose)
 
-class KerasTransformer(BaseWrapper, TransformerMixin):
-    """A class that enables transform and fit_transform.
-    """
-
-    def __init__(self, *args, tf_est: BaseWrapper = None, inv_tf_est: BaseWrapper = None, **kwargs) -> None:
-        super().__init__(*args, **kwargs)
-        self.tf_est = tf_est
-        self.inv_tf_est = inv_tf_est
-
+        return autoencoder_model
+
+    def _initialize(self, X, y=None):
+        X, _ = super()._initialize(X=X, y=y)
+        # since encoder_model_ and decoder_model_ share layers (and their weights),
+        # X_tf here comes from random weights, but we only use it to initialize our models
+        X_tf = self.encoder_model_.initialize(X).predict(X)
+        self.decoder_model_.initialize(X_tf)
+        return X, X
+
+    def initialize(self, X):
+        self._initialize(X=X, y=X)
+        return self
 
-    def fit(self, X, sample_weight=None):
+    def fit(self, X, *, sample_weight=None) -> "AutoEncoder":
         super().fit(X=X, y=X, sample_weight=sample_weight)
-        self.tf_est_ = clone(self.tf_est)
-        self.inv_tf_est_ = clone(self.inv_tf_est)
-        self.tf_est_.set_params(fit_model=self.model_)
-        self.inv_tf_est_.set_params(fit_model=self.model_, encoding_dim=self.encoding_dim)
-        X = self.feature_encoder_.transform(X)
-        self.tf_est_.initialize(X=X)
-        X_tf = self.tf_est_.predict(X=X)
-        self.inv_tf_est_.initialize(X_tf)
+        # at this point, encoder_model_ and decoder_model_
+        # are both "fitted" because they share layers w/ model_,
+        # which is fit in the above call
         return self
 
-    def transform(self, X):
-        X = self.feature_encoder_.transform(X)
-        X_tf = self.tf_est_.predict(X)
-        return X_tf
-
-    def inverse_transform(self, X_tf):
-        X = self.inv_tf_est_.predict(X_tf)
-        X = self.feature_encoder_.inverse_transform(X)
-        return X
+    def score(self, X) -> float:
+        # Note: we use 1 - MSE as the score.
+        # With MSE, smaller is better, but Scikit-Learn
+        # always maximizes the score (e.g. in GridSearch)
+        return 1 - mean_squared_error(self.predict(X), X)
+
+    def transform(self, X) -> np.ndarray:
+        X: np.ndarray = self.feature_encoder_.transform(X)
+        return self.encoder_model_.predict(X)
+
+    def inverse_transform(self, X_tf: np.ndarray):
+        X: np.ndarray = self.decoder_model_.predict(X_tf)
+        return self.feature_encoder_.inverse_transform(X)
 ```
 
-Next, we wrap the Keras Model with Scikeras. Note that for our encoder/decoder estimators, we do not need to provide a loss function since no training will be done. We do however need to have the `fit_model` and `encoding_dim` so that these will be settable by `BaseWrapper.set_params`.
+Next, we wrap the Keras Model with SciKeras. Note that for our encoder/decoder estimators, we do not need to provide a loss function, since no training will be done.
+We do, however, need `encoding_dim` to be a constructor parameter so that it is settable by `BaseWrapper.set_params`.
 
 ```python
-tf_est = BaseWrapper(model=get_tf_model, fit_model=None, verbose=0)
-inv_tf_est = BaseWrapper(model=get_inverse_tf_model, fit_model=None, encoding_dim=None, verbose=0)
-autoencoder = KerasTransformer(model=get_fit_model, tf_est=tf_est, inv_tf_est=inv_tf_est, loss="binary_crossentropy", encoding_dim=32, epochs=5)
+autoencoder = AutoEncoder(
+    loss="binary_crossentropy",
+    encoding_dim=32,
+    random_state=0,
+    epochs=5,
+    verbose=False,
+    optimizer="adam",
+)
 ```
 
 ## 4. Training
 
-To train the model, we pass the input images as both the features and the target. This will train the layers to compress the data as accurately as possible between the encoder and decoder. Note that we only pass the `X` parameter, since we defined the mapping `y=X` in `KerasTransformer.fit` above.
+To train the model, we pass the input images as both the features and the target.
+This will train the layers to compress the data as accurately as possible between the encoder and decoder.
+Note that we only pass the `X` parameter, since we defined the mapping `y=X` in `AutoEncoder.fit` above.
 
 ```python
 _ = autoencoder.fit(X=x_train)
@@ -208,8 +222,77 @@ What about the compression? Let's check the sizes of the arrays.
 
 ```python
 encoded_imgs = autoencoder.transform(x_test)
-print(f"x_test.shape[1]: {x_test.shape[1]}")
-print(f"encoded_imgs.shape[1]: {encoded_imgs.shape[1]}")
+print(f"x_test size (in MB): {x_test.nbytes/1024**2:.2f}")
+print(f"encoded_imgs size (in MB): {encoded_imgs.nbytes/1024**2:.2f}")
 cr = round((encoded_imgs.nbytes/x_test.nbytes), 2)
 print(f"Compression ratio: 1/{1/cr:.0f}")
 ```
+
+## 6. Deep AutoEncoder
+
+We can easily expand our model into a deep autoencoder by adding some hidden layers. All we have to do is add a parameter `hidden_layer_sizes` and use it in `_keras_build_fn` to build the hidden layers.
+For simplicity, we use a single `hidden_layer_sizes` parameter and mirror it across the encoding layers and decoding layers, but there is nothing forcing us to build symmetrical models.
+
+```python
+from typing import List
+
+
+class DeepAutoEncoder(AutoEncoder):
+    """A class that enables transform and fit_transform.
+    """
+
+    def _keras_build_fn(self, encoding_dim: int, hidden_layer_sizes: List[int], meta: Dict[str, Any]):
+        n_features_in = meta["n_features_in_"]
+
+        encoder_input = keras.Input(shape=(n_features_in,))
+        x = encoder_input
+        for layer_size in hidden_layer_sizes:
+            x = keras.layers.Dense(layer_size, activation='relu')(x)
+        encoder_output = keras.layers.Dense(encoding_dim, activation='relu')(x)
+        encoder_model = keras.Model(encoder_input, encoder_output)
+
+        decoder_input = keras.Input(shape=(encoding_dim,))
+        x = decoder_input
+        for layer_size in reversed(hidden_layer_sizes):
+            x = keras.layers.Dense(layer_size, activation='relu')(x)
+        decoder_output = keras.layers.Dense(n_features_in, activation='sigmoid', name="decoder")(x)
+        decoder_model = keras.Model(decoder_input, decoder_output)
+
+        autoencoder_input = keras.Input(shape=(n_features_in,))
+        encoded_img = encoder_model(autoencoder_input)
+        reconstructed_img = decoder_model(encoded_img)
+
+        autoencoder_model = keras.Model(autoencoder_input, reconstructed_img)
+
+        self.encoder_model_ = BaseWrapper(encoder_model, verbose=self.verbose)
+        self.decoder_model_ = BaseWrapper(decoder_model, verbose=self.verbose)
+
+        return autoencoder_model
+```
+
+```python
+deep = DeepAutoEncoder(
+    loss="binary_crossentropy",
+    encoding_dim=32,
+    hidden_layer_sizes=[128],
+    random_state=0,
+    epochs=5,
+    verbose=False,
+    optimizer="adam",
+)
+_ = deep.fit(X=x_train)
+```
+
+```python
+print("1-MSE for the test set (higher is better)\n")
+score = autoencoder.score(X=x_test)
+print(f"AutoEncoder: {score:.4f}")
+
+score = deep.score(X=x_test)
+print(f"Deep AutoEncoder: {score:.4f}")
+```
+
+Surprisingly, our score got worse. It's possible that, because of the extra trainable variables, our deep model trains more slowly than our simple model.
+
+Check out the [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html) to see the difference after 100 epochs of training, as well as more architectures and applications for AutoEncoders!
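
As a quick sanity check of the new API, here is a minimal round-trip sketch (not part of the commit): it assumes the `AutoEncoder` class from the diff above is in scope, and uses a random array as a hypothetical stand-in for the notebook's flattened MNIST data.

```python
import numpy as np

# Hypothetical stand-in for the notebook's x_train: flattened 28x28 images in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((256, 784)).astype("float32")

autoencoder = AutoEncoder(
    loss="binary_crossentropy",
    encoding_dim=32,
    random_state=0,
    epochs=1,
    verbose=False,
    optimizer="adam",
)

# TransformerMixin derives fit_transform from fit + transform.
encoded = autoencoder.fit_transform(X)                  # shape (256, 32)
reconstructed = autoencoder.inverse_transform(encoded)  # shape (256, 784)
assert reconstructed.shape == X.shape
```

Since `fit` internally maps `y=X`, the estimator drops into scikit-learn pipelines and searches like any other transformer.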

docs/source/notebooks/Basic_Usage.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python

docs/source/notebooks/Benchmarks.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python

docs/source/notebooks/DataTransformers.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python

docs/source/notebooks/MLPClassifier_MLPRegressor.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
      extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python

docs/source/notebooks/Meta_Estimators.md

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ jupyter:
     text_representation:
       extension: .md
       format_name: markdown
-      format_version: '1.2'
-      jupytext_version: 1.9.1
+      format_version: '1.3'
+      jupytext_version: 1.10.2
   kernelspec:
     display_name: Python 3
     language: python

scikeras/wrappers.py

Lines changed: 7 additions & 6 deletions
@@ -403,14 +403,13 @@ def _build_keras_model(self):
         else:
             model = final_build_fn(**build_params)
 
-        # compile model if user gave us an un-compiled model
-        if not (hasattr(model, "loss") and hasattr(model, "optimizer")):
-            if compile_kwargs is None:
-                compile_kwargs = self._get_compile_kwargs()
-            model.compile(**compile_kwargs)
-
         return model
 
+    def _ensure_compiled_model(self) -> None:
+        # compile model if user gave us an un-compiled model
+        if not (hasattr(self.model_, "loss") and hasattr(self.model_, "optimizer")):
+            self.model_.compile(**self._get_compile_kwargs())
+
     def _fit_keras_model(
         self,
         X: Union[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]],
@@ -447,6 +446,7 @@ def _fit_keras_model(
             A reference to the instance that can be chain called
             (ex: instance.fit(X,y).transform(X) )
         """
+
         # Make sure model has a loss function
         loss = self.model_.loss
         no_loss = False
@@ -828,6 +828,7 @@ def _fit(
             X, y = self._initialize(X, y)
         else:
             X, y = self._validate_data(X, y)
+        self._ensure_compiled_model()
 
         if sample_weight is not None:
            X, sample_weight = self._validate_sample_weight(X, sample_weight)
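
The net effect of the wrappers.py change is that compilation moves from model build time to fit time: `_build_keras_model` returns the model as-is, and `_fit` invokes the new `_ensure_compiled_model` hook after initialization/validation. Below is a minimal sketch of the behavior this preserves, assuming current SciKeras semantics; the build function and data are illustrative, not from the commit.

```python
import numpy as np
from tensorflow import keras
from scikeras.wrappers import BaseWrapper

def build_uncompiled(meta):
    # SciKeras passes `meta` (shape info, etc.) to build functions that request it.
    inp = keras.Input(shape=(meta["n_features_in_"],))
    out = keras.layers.Dense(meta["n_features_in_"], activation="sigmoid")(inp)
    return keras.Model(inp, out)  # intentionally *not* compiled

est = BaseWrapper(model=build_uncompiled, loss="mse", optimizer="adam", verbose=0)
X = np.random.default_rng(0).random((8, 4)).astype("float32")
# fit() now compiles the model (via _ensure_compiled_model) using the
# wrapper-level loss/optimizer before training begins.
est.fit(X, X)
```

This is also what lets the notebook's `AutoEncoder` build and stash its encoder/decoder sub-models inside `_keras_build_fn` before anything has been compiled.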

0 commit comments