hackingmaterials
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 15 additions & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 43 additions & 23 deletions
diff --git a/automatminer/__init__.py b/automatminer/__init__.py
Lines changed: 8 additions & 8 deletions
diff --git a/automatminer/base.py b/automatminer/base.py
Lines changed: 7 additions & 3 deletions
diff --git a/automatminer/pipeline.py b/automatminer/pipeline.py
Lines changed: 48 additions & 33 deletions
diff --git a/automatminer/preprocessing/__init__.py b/automatminer/preprocessing/__init__.py
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,15 @@
+repos:
+  - repo: https://github.com/pre-commit/mirrors-isort
+    rev: v4.3.21
+    hooks:
+      - id: isort
+        language_version: python3.7
+  - repo: https://github.com/ambv/black
+    rev: stable
+    hooks:
+      - id: black
+        language_version: python3.7
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: flake8
@@ -1,44 +1,64 @@
 # Contributing to automatminer
+
 We love your input! We want to make contributing to automatminer as easy and transparent as possible, whether it's:
-*   Reporting a bug
-*   Discussing the current state of the code
-*   Submitting a fix
-*   Proposing or implementing new features
-*   Becoming a maintainer
+
+- Reporting a bug
+- Discussing the current state of the code
+- Submitting a fix
+- Proposing or implementing new features
+- Becoming a maintainer
 
 ## Reporting bugs, getting help, and discussion
+
 At any time, feel free to start a thread on the automatminer [Discourse forum](https://hackingmaterials.discourse.group/c/matminer/automatminer).
 
 If you are making a bug report, incorporate as many elements of the following as possible to ensure a timely response and avoid the need for followups:
-*   A quick summary and/or background
-*   Steps to reproduce - be specific! **Provide sample code.**
-*   What you expected would happen, compared to what actually happens
-*   The full stack trace of any errors you encounter
-*   Notes (possibly including why you think this might be happening, or steps you tried that didn't work)
+
+- A quick summary and/or background
+- Steps to reproduce - be specific! **Provide sample code.**
+- What you expected would happen, compared to what actually happens
+- The full stack trace of any errors you encounter
+- Notes (possibly including why you think this might be happening, or steps you tried that didn't work)
 
 We love thorough bug reports as this means the development team can make quick and meaningful fixes. When we confirm your bug report, we'll move it to the GitHub issues where its progress can be further tracked.
 
-## Contributing code modifications or additions through Github
-We use github to host code, to track issues and feature requests, as well as accept pull requests.
+## Contributing code modifications or additions through GitHub
+
+We use GitHub to host code, to track issues and feature requests, as well as accept pull requests.
 
-Pull requests are the best way to propose changes to the codebase. Follow the [Github flow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow) for more information on this procedure.
+Pull requests are the best way to propose changes to the codebase. Follow the [GitHub flow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow) for more information on this procedure.
 
 The basic procedure for making a PR is:
-*   Fork the repo and create your branch from master.
-*   Commit your improvements to your branch and push to your Github fork (repo).
-*   When you're finished, go to your fork and make a Pull Request. It will automatically update if you need to make further changes.
+
+- Fork the repo on GitHub and clone it to your machine.
+
+  ```sh
+  git clone https://github.com/<your_github_name>/automatminer
+  ```
+
+- Install both regular and development dependencies and setup the `git` pre-commit hook.
+  
+  ```sh
+  pip install -r requirements.txt requirement && pre-commit install
+  ```
+
+  This step is important as your changes may otherwise contain style violations that will throw errors when running our CI on your pull request.
+- Commit your improvements and push to your GitHub fork.
+- When you're finished, go to your fork and make a pull request. It will automatically update if you need to make further changes.
 
 ### How to Make a **Great** Pull Request
+
 We have a few tips for writing good PRs that are accepted into the main repo:
 
-*   Use the Google Code style for all of your code. Find an example [here.](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
-*   Your code should have (4) spaces instead of tabs.
-*   If needed, update the documentation.
-*   **Write tests** for new features! Good tests are 100%, absolutely necessary for good code. We use the python `unittest` framework -- see some of the other tests in this repo for examples, or review the [Hitchhiker's guide to python](https://docs.python-guide.org/writing/tests/) for some good resources on writing good tests.
-*   Understand your contributions will fall under the same license as this repo. 
+- Use the Google Code style for all of your code. Find an example [here.](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
+- Your code should have (4) spaces instead of tabs.
+- If needed, update the documentation.
+- **Write tests** for new features! Good tests are 100%, absolutely necessary for good code. We use the python `unittest` framework -- see some of the other tests in this repo for examples, or review the [Hitchhiker's guide to python](https://docs.python-guide.org/writing/tests/) for some good resources on writing good tests.
+- Understand your contributions will fall under the same license as this repo.
 
-When you submit your PR, our CI service will automatically run your tests. 
+When you submit your PR, our CI service will automatically run your tests.
 We welcome good discussion on the best ways to write your code, and the comments on your PR are an excellent area for discussion.
 
 #### References
-This document was adapted from the open-source contribution guidelines for Facebook's Draft, as well as briandk's [contribution template](https://gist.github.com/briandk/3d2e8b3ec8daf5a27a62). 
+
+This document was adapted from the open-source contribution guidelines for Facebook's Draft, as well as briandk's [contribution template](https://gist.github.com/briandk/3d2e8b3ec8daf5a27a62).
@@ -1,10 +1,10 @@
-from automatminer.preprocessing import DataCleaner, FeatureReducer
-from automatminer.automl import TPOTAdaptor, SinglePipelineAdaptor
-from automatminer.featurization import AutoFeaturizer
-from automatminer.pipeline import MatPipe
-from automatminer.presets import get_preset_config
+from automatminer.automl import SinglePipelineAdaptor, TPOTAdaptor  # noqa
+from automatminer.featurization import AutoFeaturizer  # noqa
+from automatminer.pipeline import MatPipe  # noqa
+from automatminer.preprocessing import DataCleaner, FeatureReducer  # noqa
+from automatminer.presets import get_preset_config  # noqa
 
-__author__ = 'Alex Dunn, Qi Wang, Alex Ganose, Alireza Faghaninia, Anubhav Jain'
-__author_email__ = '[email protected]'
-__license__ = 'Modified BSD'
+__author__ = "Alex Dunn, Qi Wang, Alex Ganose, Alireza Faghaninia, Anubhav Jain"
+__author_email__ = "[email protected]"
+__license__ = "Modified BSD"
 __version__ = "2019.10.14"
@@ -3,9 +3,13 @@
 """
 import abc
 import logging
+
+from automatminer.utils.log import (
+    AMM_LOGGER_BASENAME,
+    initialize_logger,
+    initialize_null_logger,
+)
 from sklearn.base import BaseEstimator
-from automatminer.utils.log import initialize_logger, \
-    initialize_null_logger, AMM_LOGGER_BASENAME
 
 __authors__ = ["Alex Dunn <[email protected]>", "Alex Ganose <[email protected]>"]
 
@@ -24,7 +28,7 @@ def logger(self):
     @logger.setter
     def logger(self, new_logger):
         """Set a new logger.
-        
+
         Args:
             new_logger (Logger, bool): A boolean or custom logger object to use
             for logging. Alternatively, if set to True, the default automatminer
 
@@ -1,20 +1,25 @@
 """
 The highest level classes for pipelines.
 """
-import os
 import copy
+import os
 import pickle
 from typing import Dict
 
 import pandas as pd
-
-from automatminer.base import LoggableMixin, DFTransformer
+from automatminer.base import DFTransformer, LoggableMixin
 from automatminer.presets import get_preset_config
-from automatminer.utils.ml import regression_or_classification
-from automatminer.utils.pkg import check_fitted, set_fitted, \
-    return_attrs_recursively, AutomatminerError, VersionError, get_version, \
-    save_dict_to_file
 from automatminer.utils.log import AMM_DEFAULT_LOGGER
+from automatminer.utils.ml import regression_or_classification
+from automatminer.utils.pkg import (
+    AutomatminerError,
+    VersionError,
+    check_fitted,
+    get_version,
+    return_attrs_recursively,
+    save_dict_to_file,
+    set_fitted,
+)
 
 
 class MatPipe(DFTransformer, LoggableMixin):
@@ -88,15 +93,23 @@ class MatPipe(DFTransformer, LoggableMixin):
         target (str): The name of the column where target values are held.
     """
 
-    def __init__(self, autofeaturizer=None, cleaner=None, reducer=None,
-                 learner=None, logger=AMM_DEFAULT_LOGGER):
+    def __init__(
+        self,
+        autofeaturizer=None,
+        cleaner=None,
+        reducer=None,
+        learner=None,
+        logger=AMM_DEFAULT_LOGGER,
+    ):
         transformers = [autofeaturizer, cleaner, reducer, learner]
         if not all(transformers):
             if any(transformers):
-                raise AutomatminerError("Please specify all dataframe"
-                                        "transformers (autofeaturizer, learner,"
-                                        "reducer, and cleaner), or none (to use"
-                                        "default).")
+                raise AutomatminerError(
+                    "Please specify all dataframe"
+                    "transformers (autofeaturizer, learner,"
+                    "reducer, and cleaner), or none (to use"
+                    "default)."
+                )
             else:
                 config = get_preset_config("express")
                 autofeaturizer = config["autofeaturizer"]
@@ -117,7 +130,7 @@ def __init__(self, autofeaturizer=None, cleaner=None, reducer=None,
         super(MatPipe, self).__init__()
 
     @staticmethod
-    def from_preset(preset: str = 'express', **powerups):
+    def from_preset(preset: str = "express", **powerups):
         """
         Get a preset MatPipe from a string using
         automatminer.presets.get_preset_config
@@ -238,8 +251,7 @@ def predict(self, df, ignore=None):
         return merged_df
 
     @set_fitted
-    def benchmark(self, df, target, kfold, fold_subset=None, cache=False,
-                  ignore=None):
+    def benchmark(self, df, target, kfold, fold_subset=None, cache=False, ignore=None):
         """
         If the target property is known for all data, perform an ML benchmark
         using MatPipe. Used for getting an idea of how well AutoML can predict
@@ -292,22 +304,26 @@ def benchmark(self, df, target, kfold, fold_subset=None, cache=False,
             if os.path.exists(cache_src):
                 self.logger.warning(
                     "Cache src {} already found! Ensure this featurized data "
-                    "matches the df being benchmarked.".format(cache_src))
+                    "matches the df being benchmarked.".format(cache_src)
+                )
             self.logger.warning("Running pre-featurization for caching.")
             self.autofeaturizer.fit_transform(df, target)
         elif cache_src and not cache:
             raise AutomatminerError(
                 "Caching was enabled in AutoFeaturizer but not in benchmark. "
                 "Either disable caching in AutoFeaturizer or enable it by "
-                "passing cache=True to benchmark.")
+                "passing cache=True to benchmark."
+            )
         elif cache and not cache_src:
             raise AutomatminerError(
                 "MatPipe cache is enabled, but no cache_src was defined in "
                 "autofeaturizer. Pass the cache_src argument to AutoFeaturizer "
-                "or use the cache_src get_preset_config powerup.")
+                "or use the cache_src get_preset_config powerup."
+            )
         else:
-            self.logger.debug("No caching being used in AutoFeaturizer or "
-                              "benchmark.")
+            self.logger.debug(
+                "No caching being used in AutoFeaturizer or " "benchmark."
+            )
 
         if not fold_subset:
             fold_subset = list(range(kfold.n_splits))
@@ -372,25 +388,20 @@ def summarize(self, filename=None) -> Dict[str, str]:
             "drop_na_targets",
         ]
         cleaner_data = {
-            attr: str(getattr(self.cleaner, attr))
-            for attr in cleaner_attrs
+            attr: str(getattr(self.cleaner, attr)) for attr in cleaner_attrs
         }
 
-        reducer_attrs = [
-            "reducers",
-            "reducer_params",
-        ]
+        reducer_attrs = ["reducers", "reducer_params"]
         reducer_data = {
-            attr: str(getattr(self.reducer, attr))
-            for attr in reducer_attrs
+            attr: str(getattr(self.reducer, attr)) for attr in reducer_attrs
         }
 
         attrs = {
             "featurizers": self.autofeaturizer.featurizers,
             "ml_model": str(self.learner.best_pipeline),
             "feature_reduction": reducer_data,
             "data_cleaning": cleaner_data,
-            "features": self.learner.features
+            "features": self.learner.features,
         }
         if filename:
             save_dict_to_file(attrs, filename)
@@ -416,12 +427,16 @@ def save(self, filename="mat.pipe"):
 
         temp_logger = copy.deepcopy(self._logger)
         loggables = [
-            self, self.learner, self.reducer, self.cleaner, self.autofeaturizer
+            self,
+            self.learner,
+            self.reducer,
+            self.cleaner,
+            self.autofeaturizer,
         ]
         for loggable in loggables:
             loggable._logger = AMM_DEFAULT_LOGGER
 
-        with open(filename, 'wb') as f:
+        with open(filename, "wb") as f:
             pickle.dump(self, f)
 
         # Reassign live memory objects for further use in this object
@@ -446,7 +461,7 @@ def load(filename, logger=True, supress_version_mismatch=False):
         Returns:
             pipe (MatPipe): A MatPipe object.
         """
-        with open(filename, 'rb') as f:
+        with open(filename, "rb") as f:
             pipe = pickle.load(f)
 
         if pipe.version != get_version() and not supress_version_mismatch:
 
@@ -1 +1 @@
-from .core import DataCleaner, FeatureReducer
+from .core import DataCleaner, FeatureReducer  # noqa
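
Review note: the "all transformers or none" check in `MatPipe.__init__` can be sketched in isolation as below. This is a minimal illustration, not the project's code: `TransformerConfigError` and `all_or_none` are hypothetical names standing in for `AutomatminerError` and the constructor logic. The sketch uses truthiness via `all`/`any`, as the diff does. Python joins adjacent string literals with no separator, so each message fragment here carries its own trailing space.

```python
class TransformerConfigError(ValueError):
    """Hypothetical stand-in for automatminer's AutomatminerError."""


def all_or_none(autofeaturizer=None, cleaner=None, reducer=None, learner=None):
    """Return True if every transformer was given, False if none was.

    Raises TransformerConfigError when only some were given.
    """
    transformers = [autofeaturizer, cleaner, reducer, learner]
    if all(transformers):
        return True
    if any(transformers):
        # Adjacent string literals are concatenated without a separator,
        # so the trailing space is written inside the first fragment.
        raise TransformerConfigError(
            "Please specify all dataframe transformers (autofeaturizer, "
            "learner, reducer, and cleaner), or none (to use default)."
        )
    # Nothing was supplied: the caller would fall back to a preset here.
    return False
```

A caller could use the `False` return value as the signal to load a default configuration, which is the role `get_preset_config("express")` plays in the hunk above.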