diff --git a/solution/SOLUTIONS.md b/solution/SOLUTIONS.md
new file mode 100644
index 0000000..9c2d99c
--- /dev/null
+++ b/solution/SOLUTIONS.md
@@ -0,0 +1,170 @@
+## Challenge 1 - Refactor DEV code
+
+The refactoring was done with three main objectives in mind:
+
+- Time optimization
+- Increase code maintainability
+- Make the code testable at different stages
+
+### Time optimization
+
+To increase the time efficiency of the code, I analyzed the different fragments of code in the notebooks.
+There were four fragments that could potentially be optimized, mainly pandas `apply` operations. For each function to be refactored,
+I created a function that applies the original and the refactored code to data inputs of different sizes, measures the time, and creates a plot.
+
+Three of the refactored code snippets achieved a reduction in time, and the difference increases linearly with the data size. To reproduce
+this test, I created a Python script called `generate_plots.py`; the plots are stored in the folder `plots`. Here are the plots for each
+function. In all the plots, the blue line corresponds to the original code.
+
+#### 1. Parse bathroom text to integer
+![parse_bathroom](/solution/plots/bathroom.png)
+
+#### 2. Extract amenities
+This function extracts the different amenities. While its runtime is close to that of the original implementation, the original code
+could easily lead to errors due to the copy/paste of the same code for different columns; my implementation only relies on
+a list with the different amenities to extract. In the tests this code only sometimes outperformed the original; either way, I
+think that the new implementation is better for code maintenance.
+
+![Amenities](/solution/plots/cat_encoder.png)
+
+#### 3. Pandas cut function
+
+The numpy implementation is similar in code complexity, but `pandas.cut` is easier to understand, so I kept this part as it is.
+
+![Pandas_cut](/solution/plots/pd_cut.png)
+
+#### 4. Parse the string price to int
+![Parse_price](/solution/plots/price.png)
+
+Note: Some of these conclusions may vary slightly depending on whether the code is executed in Docker or locally.
+
+In addition, there are two scripts dedicated to comparing the original implementation with mine, one for each notebook. These tests can
+be found in the path `code/test/develope_tests`: `test_eda.py` compares against the notebook `01-experatory-data-analysis.ipynb` and
+`test_explore_classifier.py` compares against the notebook `02-explore-classifier-model.ipynb`. The results of this code can be seen in
+the logs folder, in `test_eda.log` and `test_explore.log` respectively.
+
+The result for the first one is always a bit worse than the original code. This is due to the implementation of the different steps
+of the cleaning and processing stages via sklearn `ColumnTransformer` and `Pipeline` objects: the fitting method adds some extra time to
+compute the result. Nevertheless, I think that this delay is worth it because it allows the same preprocessing steps to be applied to
+unseen data, which is usually desired when running the generated model on new data.
+
+In regards to the second script, the execution time is slightly better in the refactored code, but the difference between the implementations is very small.
+
+### Maintainability
+
+As mentioned before, to improve maintainability, I decided to implement the various steps as separate custom column transformers using
+`sklearn`. This approach allows for easier modification of the process and the addition of new steps. The different transformers are
+saved in `code/src/transformer.py`.
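+
+As an illustration of the pattern these transformers follow, here is a minimal sketch (the `ColumnDoubler` name and behaviour are hypothetical, not part of the actual codebase):
+
+```python
+from sklearn.base import BaseEstimator, TransformerMixin
+
+class ColumnDoubler(BaseEstimator, TransformerMixin):
+    """Toy transformer: doubles the values of a single numeric column."""
+    def __init__(self, column: str):
+        self.column = column
+
+    def fit(self, X, y=None):
+        # Stateless transformer, nothing to learn
+        return self
+
+    def transform(self, X):
+        X_copy = X.copy()
+        X_copy[self.column] = X_copy[self.column] * 2
+        return X_copy
+```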
+Unit tests are also included to ensure that the code behaves as expected. By using pipelines, the
+entire process can be summarized as follows:
+
+```python
+preprocessing_pipeline = Pipeline(steps=[
+    ('col_selector', ColumnSelector(COLUMNS)),
+    ('bathroom_processing', StringToFloatTransformer({'bathrooms_text': 'bathrooms'})),
+    ('cast_price', StringToInt('price', r"(\d+)")),
+    ('filter_rows', QueryFilter("price >= 10")),
+    ('drop_na', DropNan(axis=0)),
+    ('bin_price', DiscretizerTransformer('price', 'category', bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])),
+    ('array_to_cat', ArrayOneHotEncoder('amenities', CAT_COLS)),
+    ('col_renamer_conditioning', ColumnRenamer(columns={'Air conditioning': 'Air_conditioning', 'neighbourhood_group_cleansed': 'neighbourhood'})),
+    ('drop_cols', DropColumns('amenities'))
+])
+
+ct = ColumnTransformer(
+    [
+        ('ordinal_encoder', CustomOrdinalEncoder(categories=[list(MAP_NEIGHB.keys()), list(MAP_ROOM_TYPE.keys())], start_category=1), ["neighbourhood", "room_type"])
+    ],
+    remainder="passthrough",
+    verbose_feature_names_out=False
+)
+
+processing_pipeline = Pipeline(steps=[
+    ('drop_na', DropNan(axis=0)),
+    ('categorical', ct),
+    ('col_selector', ColumnSelector(FEATURE_NAMES + [TARGET_VARIABLE]))
+])
+
+data_pipeline = Pipeline(steps=[
+    ('data_preprocessing', preprocessing_pipeline),
+    ('data_processing', processing_pipeline)
+])
+```
+
+To apply all the transformations at once, it is only necessary to call `data_pipeline`. The process is divided in order to facilitate
+the testing of the different transformations. This could also be implemented by creating different regular Python functions, but, in my opinion, this
+approach is easier to understand, easier to export to other environments, and allows the trained transformers to be applied to new data, avoiding data leakage.
+
+The different transformers could probably be improved or even merged for a cleaner implementation of the transformations. However, I tried to focus more
+on the whole solution rather than aiming for perfect transformation code, as that part is easier to fix.
+
+### Testable code
+
+To make the code testable, I separated the different stages of development into different scripts, as already explained above. I also
+added unit tests for the transformers to ensure that the results remain correct after changes. The tests against the results from the
+original code are useful to check deviations in the global result.
+
+To facilitate the use of the code in different stages within CI, I divided the cleaning process into different pipelines according
+to the notebooks. These pipelines are saved using joblib to make them reusable. Additionally, I deployed an `MLflow` instance to
+save the model and the pipelines, using the `mlflow.pyfunc` model class for the entire pipeline, the processing pipeline, and the trained
+model. This makes it easier to use this code in the API, avoiding issues with the environment, code changes, or updates in the
+models themselves.
+
+## Challenge 2 - Build an API
+
+To implement the API, I used the `FastAPI` framework along with Pydantic for validation of input/output data. The API is hosted
+locally on `localhost:8000`. FastAPI includes an automatically generated documentation interface at `http://localhost:8000/`, where
+example calls can be tested interactively.
+
+The primary endpoint for this API can be accessed programmatically at `http://localhost:8000/model-inference`. The expected input
+and output match the format in the README file.
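+
+For a single record, a request looks like the following sketch (the payload mirrors the schema example in `app/models.py`; the exact predicted category depends on the trained model):
+
+```python
+import requests
+
+payload = {
+    "id": 1001,
+    "accommodates": 4,
+    "room_type": "Entire home/apt",
+    "beds": 2,
+    "bedrooms": 1,
+    "bathrooms": 2,
+    "neighbourhood": "Brooklyn",
+    "tv": 1,
+    "elevator": 1,
+    "internet": 0,
+    "latitude": 40.71383,
+    "longitude": -73.9658
+}
+response = requests.post("http://localhost:8000/model-inference", json=payload)
+response.json()
+# e.g. {'id': 1001, 'price_category': 'High'}
+```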
+Additionally, the endpoint also supports an array of elements, provided all
+fields have the same length and adhere to the defined input schema. Here is an example of calling the endpoint programmatically:
+
+```python
+import requests
+
+payload = {
+    "accommodates": [4, 4],
+    "bathrooms": [2, 2],
+    "bedrooms": [1, 1],
+    "beds": [2, 2],
+    "elevator": [1, 1],
+    "id": [1001, 1001],
+    "internet": [0, 0],
+    "latitude": [40.71383, 456],
+    "longitude": [-73.9658, 56],
+    "neighbourhood": ["Brooklyn", "Brooklyn"],
+    "room_type": ["Entire home/apt", "Entire home/apt"],
+    "tv": [1, 1]
+}
+response = requests.post("http://localhost:8000/model-inference", json=payload)
+response.json()
+# expected output
+{'id': [1001, 1001], 'price_category': ['High', 'High']}
+```
+
+## Challenge 3 - Dockerize your solution
+
+To dockerize the solution I used Docker Compose with three Docker images:
+- **App**: Creates the endpoint for the API to get the predictions.
+- **Mlflow**: Creates a server to save and load the model without copying the environment from one place to another.
+- **Pipeline**: This image contains all the code explained before; it saves the models, the logs and the plots. This image takes a while to build because the plots are generated from larger data samples.
+
+There is also a `.env` file that stores the MLflow endpoint for the other images, to grant connectivity to the MLflow server. To deploy
+the solution it is only necessary to run `docker compose up --build` in the docker directory and wait around one minute to have
+everything ready.
+
+
+Note: To run the different scripts locally, execute the code from the solution folder as follows:
+```bash
+PYTHONPATH="${PYTHONPATH}:../" python code/generate_plots.py
+PYTHONPATH="${PYTHONPATH}:../" python code/pipeline.py
+PYTHONPATH="${PYTHONPATH}:../" python code/test/test_transformers.py
+PYTHONPATH="${PYTHONPATH}:../" python code/test/develope_tests/test_eda.py
+PYTHONPATH="${PYTHONPATH}:../" python code/test/develope_tests/test_explore_classifier.py
+```
\ No newline at end of file
diff --git a/solution/app/main.py b/solution/app/main.py
new file mode 100644
index 0000000..16f8fa0
--- /dev/null
+++ b/solution/app/main.py
@@ -0,0 +1,46 @@
+from fastapi import FastAPI, HTTPException
+import pandas as pd
+import numpy as np
+from app.models import ModelInput, ModelOutput
+from app.utils import load_model, load_transformer
+
+FEATURE_NAMES = ['neighbourhood', 'room_type', 'accommodates', 'bathrooms', 'bedrooms']
+OUT_CLASSES = np.array(['Low', 'Mid', 'High', 'Lux'])
+
+app = FastAPI(
+    title="Building Category prediction",
+    description="API to infer the price category of a building from its characteristics",
+    version="1.0.0",
+    docs_url="/"
+)
+
+
+@app.post("/model-inference")
+async def infer_price_category(input: ModelInput):
+
+    model = load_model()
+    transformer = load_transformer()
+
+    model_input = dict(input)
+
+    if model:
+        try:
+            # Build a data frame with the input for the transformer.
+            # If the fields do not all have the same length this will raise an error.
+            if isinstance(model_input['id'], int):
+                input_data = pd.DataFrame(model_input, index=[0])
+            else:
+                input_data = pd.DataFrame(model_input, index=list(range(len(model_input['id']))))
+
+            # preprocess the data
+            data = transformer.predict(input_data)
+            data = data[FEATURE_NAMES].dropna(axis=0)
+            category = model.predict(data)
+            # parse the numerical output to the corresponding classes
+            category_str = OUT_CLASSES[category]
+
+            return ModelOutput(id=input.id, price_category=category_str[0] if len(category) == 1
+                               else category_str)
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Error during the prediction: {str(e)}")
+    else:
+        raise HTTPException(status_code=500, detail="Model or pipeline not ready")
diff --git a/solution/app/models.py b/solution/app/models.py
new file mode 100644
index 0000000..962af50
--- /dev/null
+++ b/solution/app/models.py
@@ -0,0 +1,73 @@
+from pydantic import BaseModel, field_validator, conint, confloat, Field
+from pydantic.functional_validators import AfterValidator
+from enum import Enum
+from typing import List, Union
+from typing_extensions import Annotated
+
+
+def validate_one_hot(value: Union[int, List[int]]) -> Union[int, List[int]]:
+
+    if isinstance(value, int):
+        if value not in [0, 1]:
+            raise ValueError("The input should be either 1 or 0")
+    if isinstance(value, list):
+        if not all(map(lambda x: x in [0, 1], value)):
+            raise ValueError("All inputs in the list should be either 1 or 0")
+    return value
+
+OneZero = Annotated[Union[int, List[int]], AfterValidator(validate_one_hot)]
+
+
+class RoomTypeEnum(str, Enum):
+    shared_room = "Shared room"
+    private_room = "Private room"
+    entire_home_apt = "Entire home/apt"
+    hotel_room = "Hotel room"
+
+class NeighbourhoodEnum(str, Enum):
+    bronx = "Bronx"
+    queens = "Queens"
+    staten_island = "Staten Island"
+    brooklyn = "Brooklyn"
+    manhattan = "Manhattan"
+
+
+class ModelInput(BaseModel):
+    id: Union[int, List[int]]
+    accommodates: Union[conint(ge=0), List[conint(ge=0)]]
+    room_type: Union[RoomTypeEnum, list[RoomTypeEnum]]
+    beds: Union[conint(ge=0), List[conint(ge=0)]]
+    bedrooms: Union[conint(ge=0), List[conint(ge=0)]]
+    bathrooms: Union[conint(ge=0), List[conint(ge=0)], confloat(ge=0), List[confloat(ge=0)]]
+    neighbourhood: Union[NeighbourhoodEnum, list[NeighbourhoodEnum]]
+    tv: OneZero
+    elevator: OneZero
+    internet: OneZero
+    latitude: Union[float, List[float]]
+    longitude: Union[float, List[float]]
+
+
+    class Config:
+        json_schema_extra = {
+            "examples": [
+                {
+                    "id": 1001,
+                    "accommodates": 4,
+                    "room_type": "Entire home/apt",
+                    "beds": 2,
+                    "bedrooms": 1,
+                    "bathrooms": 2,
+                    "neighbourhood": "Brooklyn",
+                    "tv": 1,
+                    "elevator": 1,
+                    "internet": 0,
+                    "latitude": 40.71383,
+                    "longitude": -73.9658
+                }
+            ]
+        }
+
+
+class ModelOutput(BaseModel):
+    id: Union[int, List[int]]
+    price_category: Union[str, List[str]]
diff --git a/solution/app/utils.py b/solution/app/utils.py
new file mode 100644
index 0000000..3cba0b4
--- /dev/null
+++ b/solution/app/utils.py
@@ -0,0 +1,31 @@
+import os
+from pathlib import Path
+import mlflow
+
+
+mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
+
+def load_model():
+
+    try:
+        return mlflow.sklearn.load_model("models:/price_category_clf@prod")
+    except Exception as e:
+        print(f"Error loading the model: {e}")
+        return None
+
+def load_pipeline():
+
+    try:
+        return mlflow.pyfunc.load_model("models:/processing_pipeline@prod")
+    except Exception as e:
+        print(f"Error loading the pipeline: {e}")
+        return None
+
+
+def load_transformer():
+
+    try:
+        return mlflow.pyfunc.load_model("models:/mapping_transformer@prod")
+    except Exception as e:
+        print(f"Error loading the transformer: {e}")
+        return None
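+
+# Usage sketch (hypothetical caller, not part of the API code): all three
+# loaders return None on failure, so callers should check the result before
+# predicting, e.g.
+#
+#   model = load_model()
+#   transformer = load_transformer()
+#   if model is not None and transformer is not None:
+#       features = transformer.predict(raw_dataframe)
+#       predictions = model.predict(features)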
diff --git a/solution/code/__init__.py b/solution/code/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/solution/code/generate_plots.py b/solution/code/generate_plots.py
new file mode 100644
index 0000000..052178c
--- /dev/null
+++ b/solution/code/generate_plots.py
@@ -0,0 +1,39 @@
+from pathlib import Path
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+from code.src.plots import *
+
+DIR_REPO = Path.cwd().parent
+DIR_DATA_RAW = Path(DIR_REPO) / "data" / "raw"
+FILEPATH_DATA = DIR_DATA_RAW / "listings.csv"
+FILEPATH_PLOTS = Path(DIR_REPO) / "solution" / "plots"
+CAT_COLS = ['TV', 'Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Wifi', 'Elevator', 'Breakfast']
+
+
+df_raw = pd.read_csv(FILEPATH_DATA, low_memory=False)
+
+print("Generating bathroom func time test plot")
+plot = plot_num_bathroom_from_text_time_test(df_raw)
+fig1 = sns.lineplot(plot, x='n', y='time', hue='method').set_title('Time optimizer function: num_bathroom_from_text_time')
+plt.savefig(FILEPATH_PLOTS / "bathroom.png")
+plt.close()
+
+print("Generating price func time test plot")
+plot = plot_price_to_test_time_test(df_raw)
+fig2 = sns.lineplot(plot, x='n', y='time', hue='method').set_title('Time optimizer function: price_text')
+plt.savefig(FILEPATH_PLOTS / "price.png")
+plt.close()
+
+print("Generating pd.cut func time test plot")
+plot = plot_pd_cut_time_test(df_raw)
+fig3 = sns.lineplot(plot, x='n', y='time', hue='method').set_title('Time optimizer function: pd_cut')
+plt.savefig(FILEPATH_PLOTS / "pd_cut.png")
+plt.close()
+
+print("Generating category encoder func time test plot")
+plot = plot_category_encoder_time_test(df_raw, CAT_COLS)
+fig4 = sns.lineplot(plot, x='n', y='time', hue='method').set_title('Time optimizer function: preprocess_amenities_column')
+plt.savefig(FILEPATH_PLOTS / "cat_encoder.png")
+plt.close()
diff --git a/solution/code/pipeline.py b/solution/code/pipeline.py
new file mode 100644
index 0000000..56260e3
--- /dev/null
+++ b/solution/code/pipeline.py
@@ -0,0 +1,286 @@
+import os
+from pathlib import Path
+import joblib
+import numpy as np
+import pandas as pd
+from sklearn.pipeline import Pipeline
+from sklearn.compose import ColumnTransformer
+from sklearn.model_selection import train_test_split
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.metrics import accuracy_score, roc_auc_score, classification_report, confusion_matrix
+import mlflow
+from mlflow.models import infer_signature
+
+# Import custom functions
+from code.src.transformer import (
+    ArrayOneHotEncoder,
+    ColumnRenamer,
+    ColumnSelector,
+    CustomOrdinalEncoder,
+    DropColumns,
+    DropNan,
+    DiscretizerTransformer,
+    QueryFilter,
+    StringToFloatTransformer,
+    StringToInt,
+)
+
+# Global variables
+DIR_REPO = Path.cwd().parent
+DIR_DATA_RAW = Path(DIR_REPO) / "data" / "raw"
+FILEPATH_DATA = DIR_DATA_RAW / "listings.csv"
+FILEPATH_PLOTS = Path(DIR_REPO) / "solution" / "plots"
+MODEL_PATH = DIR_REPO / "solution" / "models"
+
+COLUMNS = ['id', 'neighbourhood_group_cleansed', 'property_type', 'room_type', 'latitude', 'longitude', 'accommodates', 'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price']
+CAT_COLS = ['TV', 'Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Wifi', 'Elevator', 'Breakfast']
+
+MAP_ROOM_TYPE = {"Shared room": 1, "Private room": 2, "Entire home/apt": 3, "Hotel room": 4}
+MAP_NEIGHB = {"Bronx": 1, "Queens": 2, "Staten Island": 3, "Brooklyn": 4, "Manhattan": 5}
+
+FEATURE_NAMES = ['neighbourhood', 'room_type', 'accommodates', 'bathrooms', 'bedrooms']
+TARGET_VARIABLE = "category"
+
+
+mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
+
+if __name__ == "__main__":
+
+    print("Reading data")
+    df_raw = pd.read_csv(FILEPATH_DATA)
+
+    print("Building preprocessing pipeline")
+    preprocessing_pipeline = Pipeline(steps=[
+        ('col_selector', ColumnSelector(COLUMNS)),
+        ('bathroom_processing', StringToFloatTransformer({'bathrooms_text': 'bathrooms'})),
+        ('cast_price', StringToInt('price', r"(\d+)")),
+        ('filter_rows', QueryFilter("price >= 10")),
+        ('drop_na', DropNan(axis=0)),
+        ('bin_price', DiscretizerTransformer('price', 'category', bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])),
+        ('array_to_cat', ArrayOneHotEncoder('amenities', CAT_COLS)),
+        ('col_renamer_conditioning', ColumnRenamer(columns={'Air conditioning': 'Air_conditioning', 'neighbourhood_group_cleansed': 'neighbourhood'})),
+        ('drop_cols', DropColumns('amenities'))
+    ])
+    preprocessing_pipeline.set_output(transform='pandas')
+
+    print("Building processing pipeline")
+
+    ct = ColumnTransformer(
+        [
+            ('ordinal_encoder', CustomOrdinalEncoder(categories=[list(MAP_NEIGHB.keys()), list(MAP_ROOM_TYPE.keys())], start_category=1), ["neighbourhood", "room_type"])
+        ],
+        remainder="passthrough",
+        verbose_feature_names_out=False
+    )
+    ct.set_output(transform='pandas')
+
+    processing_pipeline = Pipeline(steps=[
+        ('drop_na', DropNan(axis=0)),
+        ('categorical', ct),
+        ('col_selector', ColumnSelector(FEATURE_NAMES + [TARGET_VARIABLE]))
+    ])
+    processing_pipeline.set_output(transform='pandas')
+
+    data_pipeline = Pipeline(steps=[
+        ('data_preprocessing', preprocessing_pipeline),
+        ('data_processing', processing_pipeline)
+    ])
+
+    # fit the pipeline only with the training data
+    print("Fitting pipeline")
+    data_pipeline.fit(df_raw)
+
+    # make sure the target directory exists before dumping the artifacts
+    os.makedirs(MODEL_PATH, exist_ok=True)
+
+    print("Saving col transformer")
+    joblib.dump(ct, MODEL_PATH / "col_transformer.joblib")
+
+    print("Saving preprocessing pipeline")
+    joblib.dump(preprocessing_pipeline, MODEL_PATH / "preprocessing_pipeline.joblib")
+
+    print("Saving processing pipeline")
+    joblib.dump(processing_pipeline, MODEL_PATH / "processing_pipeline.joblib")
+
+    print("Saving pipeline")
+    joblib.dump(data_pipeline, MODEL_PATH / "pipeline.joblib")
+
+    print("Saving pipeline artifacts to mlflow")
+
+    class ProcessingPipeline(mlflow.pyfunc.PythonModel):
+
+        """Class that applies the fitted processing pipeline to new data"""
+
+        def __init__(self):
+            self.whole_pipeline = None
+            self.preprocessing_pipeline = None
+            self.processing_pipeline = None
+            self.column_transformer = None
+
+        def load_context(self, context):
+
+            self.whole_pipeline = joblib.load(context.artifacts['whole_pipe'])
+            self.preprocessing_pipeline = joblib.load(context.artifacts['prepro_pipe'])
+            self.processing_pipeline = joblib.load(context.artifacts['proc_pipe'])
+            self.column_transformer = joblib.load(context.artifacts['col_trans'])
+
+        def predict(self, context, model_input):
+
+            if self.whole_pipeline:
+                return self.whole_pipeline.transform(model_input)
+            else:
+                raise ValueError("The model has not been loaded")
+
+    class MappingTransformer(mlflow.pyfunc.PythonModel):
+
+        """Class that applies the category mapping to new data"""
+
+        def __init__(self):
+            pass
+
+        def load_context(self, context):
+            self.column_transformer = joblib.load(context.artifacts['col_trans'])
+
+        def predict(self, context, model_input):
+
+            columns_to_apply = ["neighbourhood", "room_type"]
+            if self.column_transformer:
+                try:
+                    return self.column_transformer.transform(model_input)
+                except Exception:
+                    try:
+                        model_input[columns_to_apply] = self.column_transformer['ordinal_encoder'].transform(model_input[columns_to_apply])
+                        return model_input
+                    except Exception:
+                        raise ValueError(f"Necessary columns not present: {columns_to_apply}")
+            else:
+                raise ValueError("The model has not been loaded")
+
+
+    print("Applying pipeline")
+    df_processed = data_pipeline.transform(df_raw)
+    X = df_processed[FEATURE_NAMES]
+    y = df_processed[TARGET_VARIABLE]
+
+    print("Splitting data")
+    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1)
+
+    print("Processed train data")
+    print(f"Dataset shape: {X_train.shape}")
+
+    print("Processed test data")
+    print(f"Dataset shape: {X_test.shape}")
+
+    print("Training Random Forest Model")
+    clf = RandomForestClassifier(n_estimators=500, random_state=0, class_weight='balanced', n_jobs=4)
+    clf.fit(X_train, y_train)
+
+    print("Model evaluation")
+    y_pred = clf.predict(X_test)
+    print(f"Accuracy: {accuracy_score(y_test, y_pred):1.4f}")
+
+    y_proba = clf.predict_proba(X_test)
+    print(f"ROC score: {roc_auc_score(y_test, y_proba, multi_class='ovr'):1.4f}")
+
+    print("Saving model")
+    joblib.dump(clf, MODEL_PATH / "classifier.joblib")
+
+
+    mlflow.set_experiment(experiment_name="price_category_predictor")
+    client = mlflow.client.MlflowClient()
+    with mlflow.start_run() as run:
+
+        print("Logging pipeline to mlflow")
+        # log prod pipeline model
+        mlflow.pyfunc.log_model(
+            artifact_path="processing_pipeline",
+            python_model=ProcessingPipeline(),
+            registered_model_name="processing_pipeline",
+            artifacts={
+                'whole_pipe': str(MODEL_PATH / "pipeline.joblib"),
+                'prepro_pipe': str(MODEL_PATH / "preprocessing_pipeline.joblib"),
+                'proc_pipe': str(MODEL_PATH / "processing_pipeline.joblib"),
+                'col_trans': str(MODEL_PATH / "col_transformer.joblib")
+            },
+            pip_requirements=open(DIR_REPO / 'solution' / "requirements.txt", 'r').read().split('\n'),
+            code_paths=[str(DIR_REPO / 'solution' / "code")]
+        )
+        latest_version = client.search_registered_models(filter_string="name = 'processing_pipeline'")[0].latest_versions[0].version
+        client.set_registered_model_alias('processing_pipeline', 'prod', latest_version)
+
+
+        print("Logging transformer to mlflow")
+        # save column transformer
+        mlflow.pyfunc.log_model(
+            artifact_path="transformer",
+            python_model=MappingTransformer(),
registered_model_name="mapping_transformer", + artifacts = { + 'col_trans': str(MODEL_PATH / "col_transformer.joblib") + }, + pip_requirements = ['pandas', 'scikit-learn', 'numpy'], + code_paths = [ str(DIR_REPO / 'solution' / "code")] + ) + + latest_version = client.search_registered_models(filter_string="name = 'mapping_transformer'")[0].latest_versions[0].version + client.set_registered_model_alias('mapping_transformer', 'prod', latest_version) + + print("Logging model to mlflow") + # log prod training model + signature = infer_signature(X_test, y_train) + + mlflow.sklearn.log_model( + clf, + artifact_path = "artifacts", + signature = signature, + registered_model_name="price_category_clf", + input_example = X_test[:1] + ) + + latest_version = client.search_registered_models(filter_string="name = 'price_category_clf'")[0].latest_versions[0].version + client.set_registered_model_alias('price_category_clf', 'prod', latest_version) \ No newline at end of file diff --git a/solution/code/src/__init__.py b/solution/code/src/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/solution/code/src/plots.py b/solution/code/src/plots.py new file mode 100644 index 0000000..706363b --- /dev/null +++ b/solution/code/src/plots.py @@ -0,0 +1,206 @@ +from pathlib import Path +import datetime +import pandas as pd +import numpy as np +import seaborn as sns +import re +import src.transformer as tr + + + +def plot_num_bathroom_from_text_time_test(df_raw): + + plot_data = [] + for n in range(4,8): + sample_data = np.resize(df_raw.bathrooms_text, 10**n) + sample = pd.Series(sample_data) + t1 = datetime.datetime.now() + _ = sample.apply(tr.num_bathroom_from_text) + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'apply', + + } + ) + + t1 = datetime.datetime.now() + _ = list(map(tr.num_bathroom_from_text, sample_data)) + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'map', + + } + ) + + t1 = datetime.datetime.now() + _ = [tr.num_bathroom_from_text(text) for text in sample_data] + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'inline_for_loop', + + } + ) + + return pd.DataFrame(plot_data) + + +def plot_price_to_test_time_test(df_raw): + + plot_data = [] + for n in range(4,8): + sample_data = np.resize(df_raw.price, 10**n) + sample = pd.Series(sample_data) + t1 = datetime.datetime.now() + _ = sample.str.extract(r"(\d+).").astype(int) + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'apply', + + } + ) + + compiled_pattern = re.compile(r'\d+') + t1 = datetime.datetime.now() + _ = list(map(lambda x: int(tr.apply_regex(x, compiled_pattern)), sample_data)) + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'map', + + } + ) + + t1 = datetime.datetime.now() + _ = [int(tr.apply_regex(text, compiled_pattern)) for text in sample_data] + t2 = datetime.datetime.now() + + plot_data.append( + { + 'n': 10**n, + 'time': (t2-t1).total_seconds(), + 'method': 'inline_for_loop', + + } + ) + + return pd.DataFrame(plot_data) + + +def plot_pd_cut_time_test(df_raw): + + plot_data = [] + for n in range(4,8): + sample_data = np.resize(df_raw.price, 10**n) + sample = pd.Series(sample_data).str.extract(r"(\d+).").astype(int).to_numpy().flatten() + t1 = datetime.datetime.now() + _ = 
diff --git a/solution/code/src/__init__.py b/solution/code/src/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/solution/code/src/plots.py b/solution/code/src/plots.py
new file mode 100644
index 0000000..706363b
--- /dev/null
+++ b/solution/code/src/plots.py
@@ -0,0 +1,206 @@
+from pathlib import Path
+import datetime
+import pandas as pd
+import numpy as np
+import seaborn as sns
+import re
+import src.transformer as tr
+
+
+def plot_num_bathroom_from_text_time_test(df_raw):
+
+    plot_data = []
+    for n in range(4, 8):
+        sample_data = np.resize(df_raw.bathrooms_text, 10**n)
+        sample = pd.Series(sample_data)
+
+        t1 = datetime.datetime.now()
+        _ = sample.apply(tr.num_bathroom_from_text)
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'apply'})
+
+        t1 = datetime.datetime.now()
+        _ = list(map(tr.num_bathroom_from_text, sample_data))
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'map'})
+
+        t1 = datetime.datetime.now()
+        _ = [tr.num_bathroom_from_text(text) for text in sample_data]
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'inline_for_loop'})
+
+    return pd.DataFrame(plot_data)
+
+
+def plot_price_to_test_time_test(df_raw):
+
+    plot_data = []
+    for n in range(4, 8):
+        sample_data = np.resize(df_raw.price, 10**n)
+        sample = pd.Series(sample_data)
+
+        t1 = datetime.datetime.now()
+        _ = sample.str.extract(r"(\d+).").astype(int)
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'apply'})
+
+        compiled_pattern = re.compile(r'\d+')
+        t1 = datetime.datetime.now()
+        _ = list(map(lambda x: int(tr.apply_regex(x, compiled_pattern)), sample_data))
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'map'})
+
+        t1 = datetime.datetime.now()
+        _ = [int(tr.apply_regex(text, compiled_pattern)) for text in sample_data]
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'inline_for_loop'})
+
+    return pd.DataFrame(plot_data)
+
+
+def plot_pd_cut_time_test(df_raw):
+
+    plot_data = []
+    for n in range(4, 8):
+        sample_data = np.resize(df_raw.price, 10**n)
+        sample = pd.Series(sample_data).str.extract(r"(\d+).").astype(int).to_numpy().flatten()
+
+        t1 = datetime.datetime.now()
+        _ = pd.cut(sample, bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'pd.cut'})
+
+        t1 = datetime.datetime.now()
+        _ = tr.array_binding(sample, bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'numpy'})
+
+    return pd.DataFrame(plot_data)
+
+
+def plot_category_encoder_time_test(df_raw, cols):
+
+    plot_data = []
+    for n in range(2, 6):
+        sample_data = pd.Series(np.resize(df_raw.amenities, 10**n), name='amenities')
+        sample = sample_data.reset_index()
+
+        t1 = datetime.datetime.now()
+        _ = tr.preprocess_amenities_column(sample)
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'custom_function'})
+
+        t1 = datetime.datetime.now()
+        _ = sample_data.apply(lambda x: tr.find_categories(x, cols))
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'pd.apply'})
+
+        t1 = datetime.datetime.now()
+        _ = [tr.find_categories(x, cols) for x in sample_data]
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'loop'})
+
+        t1 = datetime.datetime.now()
+        _ = list(map(lambda x: tr.find_categories(x, cols), sample_data))
+        t2 = datetime.datetime.now()
+        plot_data.append({'n': 10**n, 'time': (t2 - t1).total_seconds(), 'method': 'map'})
+
+    return pd.DataFrame(plot_data)
diff --git a/solution/code/src/transformer.py b/solution/code/src/transformer.py
new file mode 100644
index 0000000..b99dd5c
--- /dev/null
+++ b/solution/code/src/transformer.py
@@ -0,0 +1,426 @@
+from sklearn.base import BaseEstimator, TransformerMixin
+from sklearn.preprocessing import OrdinalEncoder
+from typing import Any, Dict, List
+import re
+
+import numpy as np
+import numpy.typing as npt
+import pandas as pd
+from pandas import DataFrame
+
+
+# Get number of bathrooms from `bathrooms_text`
+def num_bathroom_from_text(text):
+    try:
+        if isinstance(text, str):
+            bath_num = text.split(" ")[0]
+            return float(bath_num)
+        else:
+            return np.nan
+    except ValueError:
+        return np.nan
+
+def array_binding(array: List[int | float], bins: List[int | float], labels: List[Any]):
+    """
+    Replicate the behaviour of the pandas.cut() function with numpy.
+
+    Parameters
+    ----------
+    array : array-like of length n_samples
+        The input data to be binned.
+
+    bins : array-like of length n_bins + 1
+        The bin edges, representing the intervals for binning.
+
+    labels : list or array-like
+        Labels corresponding to the bins.
+
+    Returns
+    -------
+    An array of the same shape as `array`, where each element corresponds to
+    the label of the bin in which that element falls.
+    """
+
+    bin_indices = np.digitize(array, bins)
+    # shift to 0-based label indices and clamp to the valid label range
+    bin_indices = np.clip(bin_indices - 1, 0, len(labels) - 1)
+
+    return np.array(labels)[bin_indices]
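+
+# Worked example (illustrative values): np.digitize([15, 95, 200, 500],
+# [10, 90, 180, 400, np.inf]) returns [1, 2, 3, 4]; after the -1 shift the
+# label indices are [0, 1, 2, 3], so
+#
+#   array_binding([15, 95, 200, 500], [10, 90, 180, 400, np.inf], [0, 1, 2, 3])
+#   # -> array([0, 1, 2, 3])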
+ + """ + + bin_indices = np.digitize(array, bins) + + bin_indices = np.clip(bin_indices - 1, 0, len(bins) - 1) + + + return np.array(labels)[bin_indices] + +def preprocess_amenities_column(df: DataFrame) -> DataFrame: + + df['TV'] = df['amenities'].str.contains('TV') + df['TV'] = df['TV'].astype(int) + df['Internet'] = df['amenities'].str.contains('Internet') + df['Internet'] = df['Internet'].astype(int) + df['Air_conditioning'] = df['amenities'].str.contains('Air conditioning') + df['Air_conditioning'] = df['Air_conditioning'].astype(int) + df['Kitchen'] = df['amenities'].str.contains('Kitchen') + df['Kitchen'] = df['Kitchen'].astype(int) + df['Heating'] = df['amenities'].str.contains('Heating') + df['Heating'] = df['Heating'].astype(int) + df['Wifi'] = df['amenities'].str.contains('Wifi') + df['Wifi'] = df['Wifi'].astype(int) + df['Elevator'] = df['amenities'].str.contains('Elevator') + df['Elevator'] = df['Elevator'].astype(int) + df['Breakfast'] = df['amenities'].str.contains('Breakfast') + df['Breakfast'] = df['Breakfast'].astype(int) + + df.drop('amenities', axis=1, inplace=True) + + return df + + +def find_categories(array, categories)-> Dict[str, int]: + string = str(array) + return {category: int(category in string) for category in categories} + +# Same regex function +def apply_regex(text, pattern): + match = pattern.search(text) + return match.group(0) if match else None + +# Custom transformer split a string by spaces and cast to float the first element +class StringToFloatTransformer(BaseEstimator, TransformerMixin): + def __init__(self, columns: Dict[str, str] | str): + # Specify which columns to apply the transformation to and the new name (optional) + self.columns = columns + + def fit(self, X, y=None): + # No fitting needed + return self + + def transform(self, X: DataFrame | np.ndarray): + X_copy = X.copy() + if self.columns: + if isinstance(X, DataFrame): + if isinstance(self.columns, dict): + for old_name, new_name in self.columns.items(): + X_copy[new_name] =list(map(num_bathroom_from_text, X_copy[old_name])) + else: + for col in self.columns: + X_copy[col] =list(map(num_bathroom_from_text, X_copy[col])) + self.out_cols = list(X_copy.columns) + if isinstance(X, np.ndarray): + if isinstance(self.columns, dict): + self.out_cols = list(self.columns.values()) + for i in range(len(self.columns)): + X_copy[i] =list(map(num_bathroom_from_text, X_copy[i])) + else: + self.out_cols = self.columns + for i in self.columns: + X_copy[i] =list(map(num_bathroom_from_text, X_copy[i])) + + return X_copy + + def get_feature_names_out(self, columns): + return self.out_cols + +# Custom transformer to parse to numeric the number of bathrooms +class ColumnSelector(BaseEstimator, TransformerMixin): + def __init__(self, columns: List[str] | str): + # Specify which columns to select + self.columns = columns + + def fit(self, X, y=None): + # No fitting needed + return self + + def transform(self, X: DataFrame): + X_copy = X.copy() + if self.columns: + if isinstance(X_copy, DataFrame): + X_copy = X_copy[self.columns if isinstance(self.columns, list) else [self.columns]] + self.out_cols = X_copy.columns + return X_copy + else: + raise ValueError("The data provided must be a pandas.Dataframe") + self.out_cols = X_copy.columns + return X_copy + + def get_feature_names_out(self, columns): + return self.columns + +# Custom transformer to rename columns +class ColumnRenamer(BaseEstimator, TransformerMixin): + def __init__(self, columns: Dict[str, str]): + if not isinstance(columns, dict): + raise 
ValueError("The columns must be passed as a dict with the format {'old_key':'new_key'}") + # Specify which columns to rename + self.columns = columns + + def fit(self, X, y=None): + # No fitting needed for this transformer + return self + + def transform(self, X: DataFrame): + X_copy = X.copy() + if self.columns: + if isinstance(X_copy, DataFrame): + X_copy.rename(columns=self.columns, inplace=True) + self.out_cols = X_copy.columns + return X_copy + else: + raise ValueError("The data provided must be a pandas.Dataframe") + self.out_cols = X_copy.columns + return X_copy + def get_feature_names_out(self, columns): + return self.out_cols + +# Custom transformer to drop NAs in columns or rows +class DropNan(BaseEstimator, TransformerMixin): + def __init__(self, axis: int=0): + # Specify which axis to evaluate + self.axis = axis + + def fit(self, X, y=None): + return self + + def transform(self, X: DataFrame | np.ndarray): + + if isinstance(X, DataFrame): + X_copy = X.copy() + X_copy = X_copy.dropna(axis=self.axis) + return X_copy + elif isinstance(X, np.ndarray): + X_copy = X.copy() + X_copy = X_copy[~np.isnan(X_copy).any(axis=self.axis)] + return X_copy + else: + raise ValueError("The data provided must be a pandas.Dataframe or np.array") + + def get_feature_names_out(self, columns): + return columns + +# Custom transformer to drop NAs in columns or rows +class DropColumns(BaseEstimator, TransformerMixin): + def __init__(self, columns: str | List[str]): + # Specify which axis to evaluate + self.columns = columns + + def fit(self, X, y=None): + # No fitting needed for this transformer + return self + + def transform(self, X: DataFrame): + + if isinstance(X, DataFrame) and isinstance(self.columns, (str, list)): + X_copy = X.copy() + X_copy.drop(columns=self.columns, inplace=True) + else: + raise ValueError("The data provided must be a pandas.Dataframe and columns must be a string or list of strings") + self.cols_out = X_copy.columns + return X_copy + + def get_feature_names_out(self, columns): + return self.cols_out + +# Custom transformer to cast string to int applying regexpatter +class StringToInt(BaseEstimator, TransformerMixin): + def __init__(self, columns: List[str] | str, patterns: List[str] | str): + + if type(columns) != type(patterns): + raise ValueError("The columns and patters must have the same data type") + elif isinstance(columns, list): + if len(columns) != len(patterns): + raise ValueError("columns and patterns list must have the same leght") + elif not isinstance(columns, str): + raise ValueError("columnas and patters must be or a single string or a list of strings") + + self.columns = columns + self.patterns = patterns + + def fit(self, X, y=None): + # No fitting needed for this transformer + return self + + # Function that applies a regex pattern to a string and returns the match + def _apply_regex(self, text, pattern): + match = pattern.search(text) + return match.group(0) if match else None + + def transform(self, X: DataFrame | npt.ArrayLike): + + X_copy = X.copy() + if isinstance(X_copy, DataFrame): + if isinstance(self.columns, str): + comp_patter = re.compile(self.patterns) + X_copy[self.columns] = list(map(lambda x: int(self._apply_regex(x, comp_patter)), X_copy[self.columns])) + else: + for col, pattern in zip(self.columns, self.patterns): + comp_patter = re.compile(pattern) + X_copy[col] = list(map(lambda x: int(self._apply_regex(x, comp_patter)), X_copy[col])) + self.out_cols = X_copy.columns + return X_copy + elif isinstance(X_copy, np.ndarray): + self.out_cols = 
+            if isinstance(self.columns, str):
+                comp_pattern = re.compile(self.patterns)
+                # np.ndarray has no .apply, so map over the flat array instead
+                return np.array([int(self._apply_regex(x, comp_pattern)) for x in X_copy])
+            else:
+                for i, pattern in enumerate(self.patterns):
+                    comp_pattern = re.compile(pattern)
+                    X_copy[:, i] = list(map(lambda x: int(self._apply_regex(x, comp_pattern)), X_copy[:, i]))
+                return X_copy
+        elif isinstance(X_copy, list):
+            self.out_cols = self.columns
+            if isinstance(self.columns, str):
+                comp_pattern = re.compile(self.patterns)
+                return list(map(lambda x: int(self._apply_regex(x, comp_pattern)), X_copy))
+            else:
+                for i, pattern in enumerate(self.patterns):
+                    comp_pattern = re.compile(pattern)
+                    X_copy[i] = list(map(lambda x: int(self._apply_regex(x, comp_pattern)), X_copy[i]))
+                return X_copy
+
+    def get_feature_names_out(self, columns):
+        return self.out_cols
+
+# Custom transformer to filter rows of a pandas.DataFrame
+class QueryFilter(BaseEstimator, TransformerMixin):
+    def __init__(self, query_string: str):
+        # Specify the filter expression to apply
+        self.query_string = query_string
+
+    def fit(self, X, y=None):
+        # No fitting needed for this transformer
+        return self
+
+    def transform(self, X: DataFrame):
+
+        if self.query_string:
+            X_copy = X.copy()
+            if isinstance(X_copy, DataFrame):
+                try:
+                    X_copy.query(self.query_string, inplace=True)
+                    self.out_cols = X_copy.columns
+                    return X_copy
+                except Exception as e:
+                    raise ValueError(f"Error applying the query string: {str(e)}")
+            else:
+                raise ValueError("The data provided must be a pandas.DataFrame")
+        self.out_cols = X.columns
+        return X
+
+    def get_feature_names_out(self, columns):
+        return self.out_cols
+
+# Custom transformer to apply pandas cut
+class DiscretizerTransformer(BaseEstimator, TransformerMixin):
+    def __init__(
+            self,
+            columns: str | List[str],
+            new_colnames: str | List[str],
+            bins: List[float | int] | List[List[float | int]],
+            labels: List[float | int] | List[List[float | int]]
+    ):
+
+        # Validate the input parameters
+        if isinstance(columns, list):
+            if not isinstance(bins, list) or not isinstance(labels, list):
+                raise ValueError("If 'columns' is a list, 'bins' and 'labels' must also be lists.")
+
+            for bin, label in zip(bins, labels):
+                if len(bin) != (len(label) + 1):
+                    raise ValueError("'bins' must have the same length as 'labels' + 1.")
+
+            if len(columns) != len(bins) or len(columns) != len(labels) or len(columns) != len(new_colnames):
+                raise ValueError("'columns', 'bins', 'labels' and 'new_colnames' must have the same length when 'columns' is a list.")
+
+        if isinstance(columns, str):
+            if not isinstance(bins, list) or not isinstance(labels, list):
+                raise ValueError("If 'columns' is a string, 'bins' and 'labels' must be lists.")
+
+            if len(bins) != (len(labels) + 1):
+                raise ValueError("'bins' must have the same length as 'labels' + 1.")
+
+        self.columns = columns
+        self.bins = bins
+        self.labels = labels
+        self.new_colnames = new_colnames
+
+
+    def fit(self, X, y=None):
+        # No fitting needed for this transformer
+        return self
+
+    def transform(self, X: DataFrame):
+        X_copy = X.copy()
+        if isinstance(X_copy, DataFrame):
+            if isinstance(self.columns, str):
+                X_copy[self.new_colnames] = pd.cut(X_copy[self.columns], bins=self.bins, labels=self.labels)
+
+            if isinstance(self.columns, list):
+                for col, new_name, bin, label in zip(self.columns, self.new_colnames, self.bins, self.labels):
+                    X_copy[new_name] = pd.cut(X_copy[col], bins=bin, labels=label)
+            self.out_cols = X_copy.columns
+            return X_copy
+        else:
+            self.out_cols = self.new_colnames
+            if isinstance(self.columns, str):
+                return pd.cut(X_copy, bins=self.bins, labels=self.labels).to_numpy()
+
+            if isinstance(self.columns, list):
+                for i, (bin, label) in enumerate(zip(self.bins, self.labels)):
+                    X_copy[i] = pd.cut(X_copy[i], bins=bin, labels=label).to_numpy()
+                return X_copy
+
+    def get_feature_names_out(self, columns):
+        return self.out_cols
+
+# Custom transformer to one-hot encode the categories contained in an array-like string column
+class ArrayOneHotEncoder(BaseEstimator, TransformerMixin):
+    def __init__(self, column: str, categories: List[str]):
+        # Specify which column to expand and which categories to extract
+        self.column = column
+        self.categories = categories
+
+    def fit(self, X, y=None):
+        # No fitting needed
+        return self
+
+    def transform(self, X: DataFrame | np.ndarray):
+        X_copy = X.copy()
+        if isinstance(X_copy, DataFrame):
+            if isinstance(self.column, str):
+                cat_df = pd.DataFrame(X_copy[self.column].apply(lambda x: find_categories(x, self.categories)).to_list(), index=X_copy.index)
+                X_copy = pd.concat([X_copy, cat_df], axis=1, ignore_index=False)
+                self.out_cols = X_copy.columns
+                return X_copy
+        if isinstance(X_copy, np.ndarray | pd.Series):
+            self.out_cols = self.categories
+            return pd.DataFrame(list(map(lambda x: find_categories(x, self.categories), X_copy))).to_numpy()
+
+    def get_feature_names_out(self, columns):
+        return self.out_cols
+
+
+class CustomOrdinalEncoder(BaseEstimator, TransformerMixin):
+    def __init__(self, categories: List[List[str]], start_category: int = 0):
+        if not isinstance(categories, list):
+            raise ValueError("The categories must be passed as a list of lists, one list for each column")
+        if not isinstance(categories[0], list):
+            raise ValueError("The categories must be passed as a list of lists, one list for each column")
+
+        if not isinstance(start_category, int):
+            raise ValueError("The starting category must be an integer")
+
+        self.categories = categories
+        self.start_category = start_category
+        self.encoder = OrdinalEncoder(categories=categories)
+
+    def fit(self, X, y=None):
+        self.encoder.fit(X)
+        return self
+
+    def transform(self, X):
+        return self.encoder.transform(X) + self.start_category
+
+    def get_feature_names_out(self, columns):
+        return self.encoder.get_feature_names_out(columns)
\ No newline at end of file
diff --git a/solution/code/test/__init__.py b/solution/code/test/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/solution/code/test/develope_tests/__init__.py b/solution/code/test/develope_tests/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/solution/code/test/develope_tests/test_eda.py b/solution/code/test/develope_tests/test_eda.py
new file mode 100644
index 0000000..43b209c
--- /dev/null
+++ b/solution/code/test/develope_tests/test_eda.py
@@ -0,0 +1,168 @@
+import os
+import sys
+import logging
+from pathlib import Path
+import datetime
+import numpy as np
+import pandas as pd
+from pandas import DataFrame
+from sklearn.pipeline import Pipeline
+from sklearn.compose import ColumnTransformer
+
+DIR_REPO = Path(__file__).parent.parent.parent.parent.parent
+os.chdir(DIR_REPO)
+
+
+# Import custom functions
+from code.src.transformer import (
+    ArrayOneHotEncoder,
+    ColumnRenamer,
+    ColumnSelector,
+    DropColumns,
+    DropNan,
+    DiscretizerTransformer,
+    QueryFilter,
+    StringToFloatTransformer,
+    StringToInt,
+)
+
+
+LOG_DIR = DIR_REPO / 'solution' / 'logs'
+os.makedirs(LOG_DIR, exist_ok=True)
+
+log_file = os.path.join(LOG_DIR, "test_eda.log")
+
+# Configure logging
+logging.basicConfig(
+    filename=log_file,
+    level=logging.DEBUG,
+    filemode='w+'
+)
+
+logger = logging.getLogger(__name__)
+
+
+pd.set_option('display.max_columns', 150)
+
+DIR_DATA_RAW = Path(DIR_REPO) / "data" / "raw"
+FILEPATH_DATA = DIR_DATA_RAW / "listings.csv"
+
+COLUMNS = ['id', 'neighbourhood_group_cleansed', 'property_type', 'room_type', 'latitude', 'longitude', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'amenities', 'price']
+COLUMNS_PIPE = ['id', 'neighbourhood_group_cleansed', 'property_type', 'room_type', 'latitude', 'longitude', 'accommodates', 'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price']
+CAT_COLS = ['TV', 'Internet', 'Air conditioning', 'Kitchen', 'Heating', 'Wifi', 'Elevator', 'Breakfast']
+
+MAP_ROOM_TYPE = {"Shared room": 1, "Private room": 2, "Entire home/apt": 3, "Hotel room": 4}
+MAP_NEIGHB = {"Bronx": 1, "Queens": 2, "Staten Island": 3, "Brooklyn": 4, "Manhattan": 5}
+
+
+df_raw = pd.read_csv(FILEPATH_DATA, low_memory=False)
+
+logger.info("Applying old code")
+df_raw_old = df_raw.copy()
+t1_old_code = datetime.datetime.now()
+
+df_raw_old_code = df_raw_old.drop(columns=['bathrooms'])
+
+# Get number of bathrooms from `bathrooms_text`
+def num_bathroom_from_text(text):
+    try:
+        if isinstance(text, str):
+            bath_num = text.split(" ")[0]
+            return float(bath_num)
+        else:
+            return np.nan
+    except ValueError:
+        return np.nan
+
+df_raw_old_code['bathrooms'] = df_raw_old_code['bathrooms_text'].apply(num_bathroom_from_text)
+df = df_raw_old_code[COLUMNS].copy()
+df.rename(columns={'neighbourhood_group_cleansed': 'neighbourhood'}, inplace=True)
+df = df.dropna(axis=0)
+
+# Convert string to numeric
+df['price'] = df['price'].str.extract(r"(\d+).")
+df['price'] = df['price'].astype(int)
+
+df = df[df['price'] >= 10]
+
+df['category'] = pd.cut(df['price'], bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])
+
+
+def preprocess_amenities_column(df: DataFrame) -> DataFrame:
+
+    df['TV'] = df['amenities'].str.contains('TV')
+    df['TV'] = df['TV'].astype(int)
+    df['Internet'] = df['amenities'].str.contains('Internet')
+    df['Internet'] = df['Internet'].astype(int)
+    df['Air_conditioning'] = df['amenities'].str.contains('Air conditioning')
+    df['Air_conditioning'] = df['Air_conditioning'].astype(int)
+    df['Kitchen'] = df['amenities'].str.contains('Kitchen')
+    df['Kitchen'] = df['Kitchen'].astype(int)
+    df['Heating'] = df['amenities'].str.contains('Heating')
+    df['Heating'] = df['Heating'].astype(int)
+    df['Wifi'] = df['amenities'].str.contains('Wifi')
+    df['Wifi'] = df['Wifi'].astype(int)
+    df['Elevator'] = df['amenities'].str.contains('Elevator')
+    df['Elevator'] = df['Elevator'].astype(int)
+    df['Breakfast'] = df['amenities'].str.contains('Breakfast')
+    df['Breakfast'] = df['Breakfast'].astype(int)
+
+    df.drop('amenities', axis=1, inplace=True)
+
+    return df
+
+
+df = preprocess_amenities_column(df)
+
+t2_old_code = datetime.datetime.now()
+
+
+logger.info("Applying new code")
+t1_new_code = datetime.datetime.now()
+
+# Alternative pipeline layout, kept for reference:
+# ct = ColumnTransformer(
+#     transformers=[
+#         ('bathroom_processing', StringToFloatTransformer({'bathrooms_text': 'bathrooms'}), ['bathrooms_text']),
+#         ('array_to_cat', ArrayOneHotEncoder('amenities', CAT_COLS), 'amenities')
+#     ],
+#     remainder='passthrough',
+#     n_jobs=-1,
+#     verbose_feature_names_out=False,
+# )
+#
+# ct.set_output(transform='pandas')
+#
+# preprocessing_pipeline = Pipeline(steps=[
+#     ('col_selector', ColumnSelector(COLUMNS_PIPE)),
+#     ('column_transformer', ct),
+#     ('drop_na', DropNan(axis=0)),
+#     ('cast_price', StringToInt('price', r"(\d+)")),
r"(\d+)")), +# ('bin_price', DiscretizerTransformer('price', 'category', bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])), +# ('filter_rows', QueryFilter("price >= 10")), +# ('col_renamer_conditioning', ColumnRenamer(columns={'Air conditioning': 'Air_conditioning', 'neighbourhood_group_cleansed': 'neighbourhood'})), +# ('drop_cols', DropColumns('bathrooms_text')) +# ] +# ) + +preprocessing_pipeline = Pipeline(steps=[ + ('col_selector', ColumnSelector(COLUMNS_PIPE)), + ('bathroom_processing', StringToFloatTransformer({'bathrooms_text': 'bathrooms'})), + ('cast_price', StringToInt('price', r"(\d+)")), + ('filter_rows', QueryFilter("price >= 10")), + ('drop_na', DropNan(axis=0)), + ('bin_price', DiscretizerTransformer('price', 'category', bins=[10, 90, 180, 400, np.inf], labels=[0, 1, 2, 3])), + ('array_to_cat', ArrayOneHotEncoder('amenities', CAT_COLS)), + ('col_renamer_conditioning', ColumnRenamer(columns={'Air conditioning': 'Air_conditioning', 'neighbourhood_group_cleansed': 'neighbourhood'})), + ('drop_cols', DropColumns('amenities')) +]) + +preprocessing_pipeline.set_output(transform='pandas') + +df_processed = preprocessing_pipeline.fit_transform(df_raw) + +t2_new_code = datetime.datetime.now() + +logger.info(f""" +Old code time: {t2_old_code - t1_old_code} +New code time: {t2_new_code - t1_new_code} +Same result: {all(df == df_processed[df.columns])} +""") \ No newline at end of file diff --git a/solution/code/test/develope_tests/test_explore_classifier.py b/solution/code/test/develope_tests/test_explore_classifier.py new file mode 100644 index 0000000..8217ee1 --- /dev/null +++ b/solution/code/test/develope_tests/test_explore_classifier.py @@ -0,0 +1,143 @@ +import os +import sys +from pathlib import Path +import logging +import datetime +import numpy as np +import pandas as pd +from pandas import DataFrame +from sklearn.pipeline import Pipeline +from sklearn.compose import ColumnTransformer +from sklearn.ensemble import RandomForestClassifier +from sklearn.metrics import accuracy_score, roc_auc_score, classification_report, confusion_matrix +from sklearn.model_selection import train_test_split + +DIR_REPO = Path(__file__).parent.parent.parent.parent.parent +os.chdir(DIR_REPO) + +# Import custom functions +from code.src.transformer import ( + ColumnSelector, + CustomOrdinalEncoder, + DropNan +) + +LOG_DIR = DIR_REPO / 'solution' / 'logs' +os.makedirs(LOG_DIR, exist_ok=True) + +log_file = os.path.join(LOG_DIR, "test_explore.log") + +# Configure logging +logging.basicConfig( + filename=log_file, + level=logging.DEBUG, + filemode='w+' +) + +logger = logging.getLogger(__name__) + + +DIR_DATA_PROCESSED = Path(DIR_REPO) / "data" / "processed" +DIR_MODELS = Path(DIR_REPO) / "models" +FILEPATH_PROCESSED = DIR_DATA_PROCESSED / "preprocessed_listings.csv" + +MAP_ROOM_TYPE = {"Shared room": 1, "Private room": 2, "Entire home/apt": 3, "Hotel room": 4} +MAP_NEIGHB = {"Bronx": 1, "Queens": 2, "Staten Island": 3, "Brooklyn": 4, "Manhattan": 5} +FEATURE_NAMES = ['neighbourhood', 'room_type', 'accommodates', 'bathrooms', 'bedrooms'] +TARGET_VARIABLE = "category" + +df = pd.read_csv(FILEPATH_PROCESSED, index_col=0) + +logging.info("Aplying old code") +df_old = df.copy() + +t1_old_code = datetime.datetime.now() + +df_old = df_old.dropna(axis=0) +# Map categorical features +df_old["neighbourhood"] = df_old["neighbourhood"].map(MAP_NEIGHB) +df_old["room_type"] = df_old["room_type"].map(MAP_ROOM_TYPE) +X_old = df_old[FEATURE_NAMES] +y_old = df_old['category'] + + +X_train_old, X_test_old, 
+clf = RandomForestClassifier(n_estimators=500, random_state=0, class_weight='balanced', n_jobs=4)
+clf.fit(X_train_old, y_train_old)
+
+y_pred_old = clf.predict(X_test_old)
+
+acc_old = accuracy_score(y_test_old, y_pred_old)
+
+y_proba_old = clf.predict_proba(X_test_old)
+roc_old = roc_auc_score(y_test_old, y_proba_old, multi_class='ovr')
+
+maps = {'0.0': 'low', '1.0': 'mid', '2.0': 'high', '3.0': 'lux'}
+
+report = classification_report(y_test_old, y_pred_old, output_dict=True)
+df_report = pd.DataFrame.from_dict(report).T[:-3]
+df_report.index = [maps[i] for i in df_report.index]
+df_report_old = df_report.copy()
+
+t2_old_code = datetime.datetime.now()
+
+logging.info("Applying new code")
+t1_new_code = datetime.datetime.now()
+
+
+ct = ColumnTransformer(
+    [
+        ('ordinal_encoder', CustomOrdinalEncoder(categories=[list(MAP_NEIGHB.keys()), list(MAP_ROOM_TYPE.keys())], start_category=1), ["neighbourhood", "room_type"])
+    ],
+    remainder="passthrough",
+    verbose_feature_names_out=False
+)
+
+processing_pipeline = Pipeline(steps=[
+    ('drop_na', DropNan(axis=0)),
+    ('categorical', ct),
+    ('col_selector', ColumnSelector(FEATURE_NAMES + [TARGET_VARIABLE]))
+])
+
+processing_pipeline.set_output(transform='pandas')
+
+df_processed = processing_pipeline.fit_transform(df)
+
+X_new = df_processed[FEATURE_NAMES]
+y_new = df_processed['category']
+
+
+X_train_new, X_test_new, y_train_new, y_test_new = train_test_split(X_new, y_new, test_size=0.15, random_state=1)
+
+clf = RandomForestClassifier(n_estimators=500, random_state=0, class_weight='balanced', n_jobs=4)
+clf.fit(X_train_new, y_train_new)
+
+y_pred_new = clf.predict(X_test_new)
+
+acc_new = accuracy_score(y_test_new, y_pred_new)
+
+y_proba_new = clf.predict_proba(X_test_new)
+roc_new = roc_auc_score(y_test_new, y_proba_new, multi_class='ovr')
+
+maps = {'0.0': 'low', '1.0': 'mid', '2.0': 'high', '3.0': 'lux'}
+
+# compare the report built from the NEW predictions against the old one
+report = classification_report(y_test_new, y_pred_new, output_dict=True)
+df_report = pd.DataFrame.from_dict(report).T[:-3]
+df_report.index = [maps[i] for i in df_report.index]
+df_report_new = df_report.copy()
+
+t2_new_code = datetime.datetime.now()
+
+df_old = pd.concat([X_old, y_old], axis=1)
+
+
+logging.info(f"""
+Old code time: {t2_old_code - t1_old_code}
+New code time: {t2_new_code - t1_new_code}
+Same accuracy: {acc_old == acc_new}
+Same roc: {roc_old == roc_new}
+Same result: {all(df_processed == df_old)}
+Same report: {all(df_report_new == df_report_old)}
+""")
\ No newline at end of file
diff --git a/solution/code/test/test_transformers.py b/solution/code/test/test_transformers.py
new file mode 100644
index 0000000..e0c81b6
--- /dev/null
+++ b/solution/code/test/test_transformers.py
@@ -0,0 +1,427 @@
+import os
+from pathlib import Path
+import unittest
+import numpy as np
+import pandas as pd
+import re
+from code.src.transformer import (
+    ArrayOneHotEncoder,
+    ColumnRenamer,
+    ColumnSelector,
+    CustomOrdinalEncoder,
+    DropColumns,
+    DropNan,
+    DiscretizerTransformer,
+    QueryFilter,
+    StringToFloatTransformer,
+    StringToInt
+)
+
+DIR_REPO = Path(__file__).parent.parent.parent.parent
+log_dir = os.path.join(DIR_REPO, "solution", "logs")
+os.makedirs(log_dir, exist_ok=True)
+log_file = os.path.join(log_dir, "unittests.log")
+
+
+class TestStringToFloatTransformer(unittest.TestCase):
+
+    def setUp(self):
+        # Sample data for testing
+        self.data = pd.DataFrame({
+            'bathrooms_text': ['1 private bath', '1 bath', 'NaN', '1.5 baths'],
+        })
+
+    def test_transform_with_dict_column(self):
+        transformer = StringToFloatTransformer(columns={"bathrooms_text": "bathrooms"})
+        transformed_data = transformer.transform(self.data)
+        expected_data = self.data.copy()
+        expected_data["bathrooms"] = pd.Series([1, 1, np.nan, 1.5])
+        pd.testing.assert_frame_equal(transformed_data, expected_data)
+
+    def test_transform_with_list_column(self):
+        transformer = StringToFloatTransformer(columns=["bathrooms_text"])
+        transformed_data = transformer.transform(self.data)
+        expected_data = self.data.copy()
+        expected_data["bathrooms_text"] = pd.Series([1, 1, np.nan, 1.5])
+        pd.testing.assert_frame_equal(transformed_data, expected_data)
+
+    def test_get_feature_names_out(self):
+        transformer = StringToFloatTransformer(columns={"bathrooms_text": "bathrooms"})
+        transformer.transform(self.data)
+        self.assertListEqual(transformer.get_feature_names_out(None), list(transformer.out_cols))
+
+# Test for ColumnSelector
+class TestColumnSelector(unittest.TestCase):
+
+    def setUp(self):
+        self.data = pd.DataFrame({
+            'price': [10, 20, 30],
+            'quantity': [1, 2, 3],
+            'description': ['A', 'B', 'C']
+        })
+
+
+    def test_transform_single_column(self):
+        transformer = ColumnSelector(columns="price")
+        transformed_data = transformer.transform(self.data)
+        expected_output = pd.DataFrame({'price': [10, 20, 30]})
+        pd.testing.assert_frame_equal(transformed_data, expected_output)
+
+    def test_transform_multiple_columns(self):
+        transformer = ColumnSelector(columns=["price", "quantity"])
+        transformed_data = transformer.transform(self.data)
+        expected_output = self.data[["price", "quantity"]]
+        pd.testing.assert_frame_equal(transformed_data, expected_output)
+
+    def test_get_feature_names_out(self):
+        transformer = ColumnSelector(columns=["price", "quantity"])
+        transformer.transform(self.data)
+        self.assertListEqual(list(transformer.get_feature_names_out(None)), transformer.columns)
+
+
+class TestColumnRenamer(unittest.TestCase):
+
+    def setUp(self):
+        self.data = pd.DataFrame({
+            'old_price': [10, 20, 30],
+            'quantity': [1, 2, 3]
+        })
+
+    def test_transform_column_renaming(self):
+        transformer = ColumnRenamer(columns={"old_price": "price", "old_description": "description"})
+        transformed_data = transformer.transform(self.data)
+        expected_output = self.data.rename(columns={"old_price": "price", "old_description": "description"})
+        pd.testing.assert_frame_equal(transformed_data, expected_output)
+
+    def test_transform_invalid_data_type(self):
+        transformer = ColumnRenamer(columns={"old_price": "price"})
+        with self.assertRaises(ValueError):
+            transformer.transform(["Not a DataFrame"])
+
+    def test_get_feature_names_out(self):
+        transformer = ColumnRenamer(columns={"old_price": "price", "old_description": "description"})
+        transformer.transform(self.data)
+        self.assertListEqual(list(transformer.get_feature_names_out(None)), list(transformer.out_cols))
+
+
+class TestDropNan(unittest.TestCase):
+
+    def setUp(self):
+        self.data = pd.DataFrame({
+            'price': [10, np.nan, 30, 40, 50],
+            'quantity': [1, 2, 3, 4, 5]
+        },
+            dtype=np.float32
+        )
+
+    def test_transform_rows(self):
+        transformer = DropNan(axis=0)
+        transformed_data = transformer.transform(self.data)
+        expected_output = pd.DataFrame({
+            'price': [10.0, 30.0, 40.0, 50.0],
+            'quantity': [1.0, 3.0, 4.0, 5.0]
+        },
+            dtype=np.float32
+        )
+        pd.testing.assert_frame_equal(transformed_data.reset_index(drop=True), expected_output)
+
+    def test_transform_columns(self):
+        transformer = DropNan(axis=1)
+        transformed_data = transformer.transform(self.data)
transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'quantity': [1.0, 2.0, 3.0, 4.0, 5.0] + }, + dtype= np.float32 + ) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_transform_invalid_data_type(self): + transformer = DropNan(axis=0) + with self.assertRaises(ValueError): + transformer.transform("This is not valid") + + def test_get_feature_names_out(self): + transformer = DropNan(axis=0) + transformed_data = transformer.transform(self.data) + feature_names = transformer.get_feature_names_out(transformed_data.columns) + self.assertListEqual(list(feature_names), list(self.data.columns)) + + +class TestDropColumns(unittest.TestCase): + + def setUp(self): + self.data = pd.DataFrame({ + 'price': [10, 20, 30, 40, 50], + 'quantity': [1, 2, 3, 4, 5], + 'description': ['A', 'B', 'C', 'D', 'E'] + }) + + def test_transform_single_column(self): + transformer = DropColumns(columns="price") + transformed_data = transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'quantity': [1, 2, 3, 4, 5], + 'description': ['A', 'B', 'C', 'D', 'E'] + }) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_transform_multiple_columns(self): + transformer = DropColumns(columns=["price", "quantity"]) + transformed_data = transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'description': ['A', 'B', 'C', 'D', 'E'] + }) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_transform_invalid_data_type(self): + transformer = DropColumns(columns="price") + with self.assertRaises(ValueError): + transformer.transform("This is not valid") + + def test_get_feature_names_out(self): + transformer = DropColumns(columns="price") + transformed_data = transformer.transform(self.data) + feature_names = transformer.get_feature_names_out(transformed_data.columns) + self.assertListEqual(list(feature_names), list(transformed_data.columns)) + + +class TestStringToInt(unittest.TestCase): + + def setUp(self): + # Sample data for testing + self.data = pd.DataFrame({ + 'price': ["$10", "$20", "$30", "$40", "$50"], + 'quantity': ["1 unit", "2 units", "3 units", "4 units", "5 units"] + }) + + self.columns = ['price', 'quantity'] + self.patterns = [r'\d+', r'\d+'] + + def test_initialization_mismatched_types(self): + with self.assertRaises(ValueError): + StringToInt(columns=['price', 'quantity'], patterns=r'\d+') + + def test_initialization_mismatched_list_lengths(self): + with self.assertRaises(ValueError): + StringToInt(columns=['price', 'quantity'], patterns=[r'\d+']) + + def test_apply_regex(self): + transformer = StringToInt(columns=self.columns, patterns=self.patterns) + match = transformer._apply_regex("$10", re.compile(r'\d+')) + self.assertEqual(match, "10") + + def test_transform_single_column(self): + transformer = StringToInt(columns=self.columns[0], patterns=self.patterns[0]) + transformed_data = transformer.transform(self.data.drop(columns='quantity')) + expected_output = pd.DataFrame({'price': [10, 20, 30, 40, 50]}) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_transform_multiple_columns(self): + transformer = StringToInt(columns=self.columns, patterns=self.patterns) + transformed_data = transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'price': [10, 20, 30, 40, 50], + 'quantity': [1, 2, 3, 4, 5] + }) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + + def test_get_feature_names_out(self): + transformer = 
StringToInt(columns=self.columns[0], patterns=self.patterns[0]) + transformer.transform(self.data.drop(columns='quantity')) + feature_names = transformer.get_feature_names_out(self.data.columns[0]) + self.assertListEqual(list(feature_names), [self.data.columns[0]]) + + +class TestQueryFilter(unittest.TestCase): + + def setUp(self): + # Sample data for testing + self.data = pd.DataFrame({ + 'price': [10, 20, 30, 40, 50], + 'quantity': [1, 2, 3, 4, 5] + }) + + def test_transform_valid_query(self): + filter_transformer = QueryFilter(query_string="price > 20") + transformed_data = filter_transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'price': [30, 40, 50], + 'quantity': [3, 4, 5] + }) + pd.testing.assert_frame_equal(transformed_data.reset_index(drop=True), expected_output) + + + def test_get_feature_names_out(self): + filter_transformer = QueryFilter(query_string="price > 20") + filter_transformer.transform(self.data) + feature_names = filter_transformer.get_feature_names_out(self.data.columns) + self.assertListEqual(list(feature_names), list(self.data.columns)) + + +class TestDiscretizerTransformer(unittest.TestCase): + + def setUp(self): + self.data = pd.DataFrame({ + 'price': [10, 30, 50, 70, 100], + 'age': [20, 40, 60, 80, 100] + }) + + self.bins = [[0, 25, 50, 75, np.inf], [0, 35, 70, np.inf]] + self.labels = [[0, 1, 2, 3], [0, 1, 2]] + self.columns = ['price', 'age'] + self.new_colnames = ['price_category', 'age_category'] + + def test_initialization_invalid_params(self): + with self.assertRaises(ValueError): + DiscretizerTransformer( + columns=['price', 'age'], + new_colnames=['price_category'], + bins=[self.bins[0]], + labels=[self.labels] + ) + + with self.assertRaises(ValueError): + DiscretizerTransformer( + columns='price', + new_colnames='price_category', + bins=self.bins, + labels=[0, 1] + ) + + def test_transform_single_column(self): + transformer = DiscretizerTransformer( + columns='price', + new_colnames='price_category', + bins=self.bins[0], + labels=self.labels[0] + ) + + transformed_data = transformer.fit_transform(self.data.drop(columns=['age'])) + expected_output = pd.DataFrame({ + 'price': [10, 30, 50, 70, 100], + 'price_category': pd.Categorical([0, 1, 1, 2, 3], categories=self.labels[0], ordered=True) + }) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_transform_multiple_columns(self): + transformer = DiscretizerTransformer( + columns=self.columns, + new_colnames=self.new_colnames, + bins=self.bins, + labels=self.labels + ) + transformed_data = transformer.transform(self.data) + expected_output = pd.DataFrame({ + 'price': [10, 30, 50, 70, 100], + 'age': [20, 40, 60, 80, 100], + 'price_category': pd.Categorical([0, 1, 1, 2, 3], categories=self.labels[0], ordered=True), + 'age_category': pd.Categorical([0, 1, 1, 2, 2], categories=self.labels[1], ordered=True) + }) + pd.testing.assert_frame_equal(transformed_data, expected_output) + + def test_get_feature_names_out(self): + transformer = DiscretizerTransformer( + columns=self.columns, + new_colnames=self.new_colnames, + bins=self.bins, + labels=self.labels + ) + + transformer.fit_transform(self.data) + feature_names = transformer.get_feature_names_out(self.columns) + expected_output = list(self.data.columns) + self.new_colnames + self.assertListEqual(list(feature_names), expected_output) + + +class TestArrayOneHotEncoder(unittest.TestCase): + + def setUp(self): + self.data_df = pd.DataFrame({ + 'amenities': [ + '["Extra pillows and blankets", "Baking sheet", "Wifi", 
"Heating", "Dishes and silverware", "Essentials", ]', + '["Extra pillows and blankets", "Luggage dropoff allowed", "Free parking on premises", "Wifi", "Heating"]', + '["Kitchen", "Long term stays allowed", "Heating", "Air conditioning", "Pool"]' + ] + }) + self.categories = ['Wifi', 'Parking', 'Pool', 'Heating'] + self.data_ndarray = self.data_df.amenities.to_numpy() + self.encoder = ArrayOneHotEncoder(column='amenities', categories=self.categories) + + def test_initialization(self): + self.assertEqual(self.encoder.column, 'amenities') + self.assertEqual(self.encoder.categories, self.categories) + + + def test_transform_with_dataframe(self): + transformed_df = self.encoder.transform(self.data_df) + expected_columns = list(self.data_df.columns) + self.categories + self.assertTrue(all(col in transformed_df.columns for col in expected_columns)) + expected_output = pd.DataFrame({ + 'amenities': self.data_df.amenities.tolist(), + 'Wifi': [1, 1, 0], + 'Parking': [0, 0, 0], + 'Pool': [0, 0, 1], + 'Heating': [1, 1, 1] + }) + pd.testing.assert_frame_equal(transformed_df, expected_output) + + def test_transform_with_ndarray(self): + transformed_array = self.encoder.transform(self.data_ndarray) + expected_output = [[1, 0, 0, 1], [1, 0, 0, 1], [0, 0, 1, 1]] + np.testing.assert_array_equal(transformed_array, expected_output) + + def test_get_feature_names_out(self): + self.encoder.transform(self.data_df) + feature_names = self.encoder.get_feature_names_out(['amenities']) + self.assertListEqual(list(feature_names), list(self.data_df.columns) + self.categories) + + +class TestCustomOrdinalEncoder(unittest.TestCase): + + def setUp(self): + self.categories = [['low', 'medium', 'high']] + self.data = pd.DataFrame({'quality': ['low', 'medium', 'high', 'low']}) + + def test_initialization_invalid_categories_type(self): + with self.assertRaises(ValueError) as context: + CustomOrdinalEncoder(categories="not list") + self.assertIn("The categories must be passed as a list os list", str(context.exception)) + + def test_initialization_invalid_categories_format(self): + with self.assertRaises(ValueError) as context: + CustomOrdinalEncoder(categories=["not nested list"]) + self.assertIn("The categories must be passed as a list os list", str(context.exception)) + + def test_initialization_invalid_start_category(self): + with self.assertRaises(ValueError) as context: + CustomOrdinalEncoder(categories=self.categories, start_category="not int") + self.assertIn("The starting category must be a integer", str(context.exception)) + + def test_fit_transform_with_start_category(self): + encoder = CustomOrdinalEncoder(categories=self.categories, start_category=1) + transformed_data = encoder.fit_transform(self.data) + expected_output = np.array([[1], [2], [3], [1]]) + np.testing.assert_array_equal(transformed_data, expected_output) + + def test_fit_transform_no_start_category(self): + encoder = CustomOrdinalEncoder(categories=self.categories) + transformed_data = encoder.fit_transform(self.data) + expected_output = np.array([[0], [1], [2], [0]]) + np.testing.assert_array_equal(transformed_data, expected_output) + + def test_get_feature_names_out(self): + encoder = CustomOrdinalEncoder(categories=self.categories) + encoder.fit(self.data) + feature_names = encoder.get_feature_names_out(['quality']) + self.assertEqual(feature_names, ['quality']) + + + + + +if __name__ == '__main__': + # Open the log file in write mode and run tests with custom TextTestRunner + with open(log_file, "w") as f: + runner = 
unittest.TextTestRunner(stream=f) + unittest.main(testRunner=runner, exit=False) diff --git a/solution/docker/.env b/solution/docker/.env new file mode 100644 index 0000000..0dc84bf --- /dev/null +++ b/solution/docker/.env @@ -0,0 +1 @@ + MLFLOW_TRACKING_URI="http://mlflow:5000" \ No newline at end of file diff --git a/solution/docker/Dockerfile.app b/solution/docker/Dockerfile.app new file mode 100644 index 0000000..bdc1d1d --- /dev/null +++ b/solution/docker/Dockerfile.app @@ -0,0 +1,16 @@ +FROM python:3.12-slim + +COPY solution/app /the-real-mle-challenge/solution/app +COPY solution/code /the-real-mle-challenge/solution/code +COPY solution/requirements.txt requirements.txt + +RUN pip install -r requirements.txt + +ENV PYTHONPATH="${PYTHONPATH}:/the-real-mle-challenge/solution/app:/the-real-mle-challenge/solution\ +:/the-real-mle-challenge/solution/code:/the-real-mle-challenge/solution/code/src:/the-real-mle-challenge/solution" + +WORKDIR /the-real-mle-challenge/solution/app + +EXPOSE 8080 + +CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"] \ No newline at end of file diff --git a/solution/docker/Dockerfile.mlflow b/solution/docker/Dockerfile.mlflow new file mode 100644 index 0000000..2354505 --- /dev/null +++ b/solution/docker/Dockerfile.mlflow @@ -0,0 +1,7 @@ +FROM python:3.12-slim + +RUN pip install mlflow + +EXPOSE 5000 + +CMD mlflow server --host 0.0.0.0 --port 5000 \ No newline at end of file diff --git a/solution/docker/Dockerfile.pipe b/solution/docker/Dockerfile.pipe new file mode 100644 index 0000000..e775cd1 --- /dev/null +++ b/solution/docker/Dockerfile.pipe @@ -0,0 +1,24 @@ +FROM python:3.12-slim + +RUN apt-get update && \ + apt-get upgrade -y && \ + apt-get install -y git + +COPY solution/plots /the-real-mle-challenge/solution/plots +COPY solution/code /the-real-mle-challenge/solution/code +COPY data /the-real-mle-challenge/data +COPY solution/requirements.txt /the-real-mle-challenge/solution/requirements.txt + +RUN pip install -r /the-real-mle-challenge/solution/requirements.txt + +ENV PYTHONPATH="${PYTHONPATH}:/the-real-mle-challenge/solution/" + +WORKDIR /the-real-mle-challenge/solution/ + +CMD sh -c "\ +python code/test/develope_tests/test_eda.py && \ +python code/test/develope_tests/test_explore_classifier.py && \ +python code/test/test_transformers.py && \ +python code/generate_plots.py && \ +python code/pipeline.py" + diff --git a/solution/docker/docker-compose.yml b/solution/docker/docker-compose.yml new file mode 100644 index 0000000..a117085 --- /dev/null +++ b/solution/docker/docker-compose.yml @@ -0,0 +1,32 @@ +services: + + training-pipeline: + container_name: training-pipeline + build: + context: ../../ + dockerfile: ./solution/docker/Dockerfile.pipe + volumes: + - ../models:/the-real-mle-challenge/solution/models + - ../plots:/the-real-mle-challenge/solution/plots + - ../logs:/the-real-mle-challenge/solution/logs + env_file: .env + + app: + container_name: app + build: + context: ../../ + dockerfile: ./solution/docker/Dockerfile.app + volumes: + - ../models:/the-real-mle-challenge/solution/models + ports: + - 8080:8080 + env_file: .env + + mlflow: + container_name: mlflow + build: + context: ../../ + dockerfile: ./solution/docker/Dockerfile.mlflow + ports: + - "5000:5000" + restart: unless-stopped diff --git a/solution/logs/test_eda.log b/solution/logs/test_eda.log new file mode 100644 index 0000000..dd2919b --- /dev/null +++ b/solution/logs/test_eda.log @@ -0,0 +1,7 @@ +INFO:__main__:Aplying old code +INFO:__main__:Aplying new code 
+INFO:__main__: +Old code time: 0:00:00.131001 +New code time: 0:00:00.168337 +Same result: True + diff --git a/solution/logs/test_explore.log b/solution/logs/test_explore.log new file mode 100644 index 0000000..785564a --- /dev/null +++ b/solution/logs/test_explore.log @@ -0,0 +1,10 @@ +INFO:root:Aplying old code +INFO:root:Aplying new code +INFO:root: +Old code time: 0:00:01.193487 +New code time: 0:00:01.176504 +Same accuracy: True +Same roc: True +Same result: True +Same report: True + diff --git a/solution/logs/unittests.log b/solution/logs/unittests.log new file mode 100644 index 0000000..b26bb2b --- /dev/null +++ b/solution/logs/unittests.log @@ -0,0 +1,5 @@ +....................................... +---------------------------------------------------------------------- +Ran 39 tests in 0.016s + +OK diff --git a/solution/models/classifier.joblib b/solution/models/classifier.joblib new file mode 100644 index 0000000..95b3f06 Binary files /dev/null and b/solution/models/classifier.joblib differ diff --git a/solution/models/col_transformer.joblib b/solution/models/col_transformer.joblib new file mode 100644 index 0000000..084167c Binary files /dev/null and b/solution/models/col_transformer.joblib differ diff --git a/solution/models/pipeline.joblib b/solution/models/pipeline.joblib new file mode 100644 index 0000000..cfecdb6 Binary files /dev/null and b/solution/models/pipeline.joblib differ diff --git a/solution/models/preprocessing_pipeline.joblib b/solution/models/preprocessing_pipeline.joblib new file mode 100644 index 0000000..cb65910 Binary files /dev/null and b/solution/models/preprocessing_pipeline.joblib differ diff --git a/solution/models/processing_pipeline.joblib b/solution/models/processing_pipeline.joblib new file mode 100644 index 0000000..8184c50 Binary files /dev/null and b/solution/models/processing_pipeline.joblib differ diff --git a/solution/plots/bathroom.png b/solution/plots/bathroom.png new file mode 100644 index 0000000..18b4bb4 Binary files /dev/null and b/solution/plots/bathroom.png differ diff --git a/solution/plots/cat_encoder.png b/solution/plots/cat_encoder.png new file mode 100644 index 0000000..5c17a2f Binary files /dev/null and b/solution/plots/cat_encoder.png differ diff --git a/solution/plots/pd_cut.png b/solution/plots/pd_cut.png new file mode 100644 index 0000000..ad338d2 Binary files /dev/null and b/solution/plots/pd_cut.png differ diff --git a/solution/plots/price.png b/solution/plots/price.png new file mode 100644 index 0000000..173b9af Binary files /dev/null and b/solution/plots/price.png differ diff --git a/solution/requirements.txt b/solution/requirements.txt new file mode 100644 index 0000000..b3eff92 --- /dev/null +++ b/solution/requirements.txt @@ -0,0 +1,99 @@ +alembic==1.13.3 +annotated-types==0.7.0 +anyio==4.6.2.post1 +appnope==0.1.4 +asttokens==2.4.1 +blinker==1.8.2 +cachetools==5.5.0 +certifi==2024.8.30 +charset-normalizer==3.4.0 +click==8.1.7 +cloudpickle==3.1.0 +comm==0.2.2 +contourpy==1.3.0 +cycler==0.12.1 +databricks-sdk==0.36.0 +debugpy==1.8.7 +decorator==5.1.1 +Deprecated==1.2.14 +docker==7.1.0 +executing==2.1.0 +fastapi==0.115.3 +Flask==3.0.3 +fonttools==4.54.1 +gitdb==4.0.11 +GitPython==3.1.43 +google-auth==2.35.0 +graphene==3.4.1 +graphql-core==3.2.5 +graphql-relay==3.2.0 +gunicorn==23.0.0 +h11==0.14.0 +idna==3.10 +importlib_metadata==8.4.0 +ipykernel==6.29.5 +ipython==8.29.0 +itsdangerous==2.2.0 +jedi==0.19.1 +Jinja2==3.1.4 +joblib==1.4.2 +jupyter_client==8.6.3 +jupyter_core==5.7.2 +kiwisolver==1.4.7 +Mako==1.3.6 +Markdown==3.7 
+MarkupSafe==3.0.2 +matplotlib==3.9.2 +matplotlib-inline==0.1.7 +mlflow==2.17.1 +mlflow-skinny==2.17.1 +nest-asyncio==1.6.0 +numpy==2.1.2 +opentelemetry-api==1.27.0 +opentelemetry-sdk==1.27.0 +opentelemetry-semantic-conventions==0.48b0 +packaging==24.1 +pandas==2.2.3 +parso==0.8.4 +pexpect==4.9.0 +pillow==11.0.0 +platformdirs==4.3.6 +prompt_toolkit==3.0.48 +protobuf==5.28.3 +psutil==6.1.0 +ptyprocess==0.7.0 +pure_eval==0.2.3 +pyarrow==17.0.0 +pyasn1==0.6.1 +pyasn1_modules==0.4.1 +pydantic==2.9.2 +pydantic_core==2.23.4 +Pygments==2.18.0 +pyparsing==3.2.0 +python-dateutil==2.9.0.post0 +pytz==2024.2 +PyYAML==6.0.2 +pyzmq==26.2.0 +requests==2.32.3 +rsa==4.9 +scikit-learn==1.5.2 +scipy==1.14.1 +seaborn==0.13.2 +six==1.16.0 +smmap==5.0.1 +sniffio==1.3.1 +SQLAlchemy==2.0.36 +sqlparse==0.5.1 +stack-data==0.6.3 +starlette==0.41.2 +threadpoolctl==3.5.0 +tornado==6.4.1 +traitlets==5.14.3 +typing_extensions==4.12.2 +tzdata==2024.2 +urllib3==2.2.3 +uvicorn==0.32.0 +wcwidth==0.2.13 +Werkzeug==3.0.6 +wrapt==1.16.0 +zipp==3.20.2