Poverty example #66
base: poverty
Conversation
```python
transform (torchvision.transforms): transform to apply to the data"""

super().__init__()
dataframe = pd.read_csv(dataset_path + labels_name)
```
Consider encapsulating your strings in pathlib.Path objects to avoid '/' (slash) sensitivity, e.g.:

```python
from pathlib import Path

dataframe = pd.read_csv(Path(dataset_path) / Path(labels_name))
```
As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
```python
self.dataframe_train = dataframe[dataframe['subset'] == 'train']
self.dataframe_val = dataframe[dataframe['subset'] == 'val']
self.dataframe_test = dataframe[dataframe['subset'] == 'test']
self.tif_dir = dataset_path + tif_dir
```
Consider encapsulating your strings in pathlib.Path objects to avoid '/' (slash) sensitivity, e.g.:

```python
from pathlib import Path

self.tif_dir = Path(dataset_path) / Path(tif_dir)
```
As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
```python
self.inference_batch_size = inference_batch_size
self.dict_normalize = json.load(open('examples/poverty/mean_std_normalize.json', 'r'))
self.num_workers = num_workers
self.task = 'regression'
```
The task argument is expected to be registered in an example config file; however, here it is hardcoded.
Consider either adding a 'task' argument as input to your __init__() method, or checking whether such an argument exists in **kwargs (taking priority over the default value 'regression').
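A minimal sketch of the **kwargs variant (the class name and other arguments are assumptions, not the PR's actual signature):

```python
class PovertyDataModule:  # hypothetical class name
    def __init__(self, dataset_path, labels_name, **kwargs):
        # A 'task' entry passed through from the config file takes
        # priority over the hardcoded default.
        self.task = kwargs.get('task', 'regression')
```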
As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
```python
dataset = MSDataset(self.dataframe_test, self.tif_dir, transform=self.test_transform())
return dataset

def train_dataloader(self):
```
As your code stands, re-defining train/val/test_dataloader is only useful for setting persistent_workers=True, which is relevant if your epochs are fast and you want to speed up training by avoiding re-creating workers at each epoch. This comes at the cost of memory.
If this is the intended use, ignore this comment; otherwise, you can remove the re-defined train/val/test_dataloader methods altogether.
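For reference, a sketch of the persistent_workers use case (the dataset attribute and batch-size name are assumptions):

```python
from torch.utils.data import DataLoader

def train_dataloader(self):
    # persistent_workers keeps worker processes alive across epochs,
    # trading memory for faster epoch start-up; requires num_workers > 0.
    return DataLoader(
        self.train_dataset,          # hypothetical attribute
        batch_size=self.batch_size,  # hypothetical attribute
        num_workers=self.num_workers,
        shuffle=True,
        persistent_workers=True,
    )
```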
As we discussed, I removed the dataset from the PR.
I did make the change though, thanks for the advice!
```diff
  loss = self.loss(y_hat, self._cast_type_to_loss(y))  # Shape mismatch for binary: need to 'y = y.unsqueeze(1)' (or use .reshape(2)) to cast from [2] to [2,1] and cast y to float with .float()
- self.log(f"loss/{split}", loss, **log_kwargs)
+ self.log(f"loss_{split}", loss, **log_kwargs)
```
Changing that character would also mean updating every example
I didn't think of that. I undid it
```diff
  else:
      score = metric_func(y_hat, y)
- self.log(f"{metric_name}/{split}", score, **log_kwargs)
+ self.log(f"{metric_name}_{split}", score, **log_kwargs)
```
done
```python
for metric_name, metric_func in self.metrics.items():
    if isinstance(metric_func, dict):
        score = metric_func['callable'](y_hat, y, **metric_func['kwargs'])
        if metric_func['kwargs']:
```
Indeed, metric kwargs are expected to be added in the config file; however, when kwargs is empty, calling metric_func with it will trigger a None-type error.
I suggest you replace that if condition with the following line instead:

```python
score = metric_func['callable'](y_hat, y, **metric_func.get('kwargs', {}))
```

replacing the old line 167.
This is a good addition, thanks!
did the suggestion
| """ | ||
| device = 'cuda' if torch.cuda.is_available() else 'cpu' | ||
| self.model.to(device) | ||
| data = data.to(device) |
data isn't guaranteed to be a Tensor, since this method doesn't require it to be (intentional behavior: the method should be callable by hand on any data type).
If data isn't a Tensor, the call to .to() will fail and crash.
The data (or its elements) are cast to the device anyway in the following lines.
=> Recommendation: delete this line
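If the cast were kept instead, a defensive variant (a sketch, not the reviewer's recommendation) would guard on the type:

```python
import torch

# Only move data when it actually is a Tensor, so the method stays
# callable on arbitrary input types.
if isinstance(data, torch.Tensor):
    data = data.to(device)
```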
```python
                                  lr=lr,
                                  weight_decay=weight_decay)

print(optimizer)
```
Remove debug print
done
```python
print(optimizer)

if 'regression' in task:
```
No default value for loss: risk of error if task doesn't contain the 'regression' string, as loss will be None. The error is caught by malpolon.models.utils.check_loss(), but I advise handling a 'default loss value' case.
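A sketch of the 'default loss value' idea (the concrete loss choices are assumptions for illustration):

```python
import torch

# Start from a default so loss is never None, then specialise per task.
loss = torch.nn.MSELoss()  # assumed fallback default
if 'classification' in task:  # hypothetical other branch
    loss = torch.nn.CrossEntropyLoss()
```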
removed the condition since the task is filtered
```python
model = check_model(model)

optimizer = torch.optim.AdamW(model.parameters(),
```
INFO: optimizer instantiation has changed behavior since update v2.1.0 and is now a user choice (via the config file).
This is still 100% compatible with the update, but consider adding an optimizer input argument and nesting your optimizer instantiation inside the condition if optimizer is None:
(see malpolon.models.standard_prediction_systems.ClassificationSystem)
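A sketch of that pattern (the signature and defaults are assumptions, not copied from ClassificationSystem):

```python
import torch

class RegressionSystem(GenericPredictionSystem):  # base class as in the PR
    def __init__(self, model, loss, lr=1e-3, weight_decay=0.0,
                 optimizer=None, metrics=None):
        # Build the default optimizer only when none was supplied
        # through the experiment config file.
        if optimizer is None:
            optimizer = torch.optim.AdamW(model.parameters(),
                                          lr=lr, weight_decay=weight_decay)
        super().__init__(model, loss, optimizer, metrics=metrics)
```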
done
```python
super().__init__(model, loss, optimizer, metrics=metrics)

def configure_optimizers(self):
```
INFO: since the v2.1.0 update, this method doesn't need to be re-defined unless you want to impose a fixed optimizer/scheduler, as both of these objects can be defined by the user in their experiment config file.
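For the 'fixed optimizer' case only, a sketch of what the re-definition could look like (the optimizer and its hyperparameters are illustrative assumptions):

```python
import torch

def configure_optimizers(self):
    # Imposes a fixed optimizer, bypassing the user's config file choice.
    return torch.optim.AdamW(self.model.parameters(), lr=1e-3)
```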
withdrawn
malpolon/models/utils.py
```python
from pathlib import Path
from typing import Mapping, Union

import torchmetrics
```
Import not used?
withdrawn
tlarcher left a comment:
Reviewed the engine side of your PR as requested. A couple of things to address. I'll let you remove the example/dataset part for now.
When everything is addressed, I'll merge this into a new branch, which will wait for your examples when they are ready.
If you want, you can adapt the (optimizer + config file) parts in accordance with the v2.1.0 update, but that isn't necessary.
```python
class RegressionSystem(GenericPredictionSystem):
    """Regression task class."""
    def __init__(
```
Some arguments (optimizer kwargs) are not used because they are incompatible with the proposed default optimizer. I suggest replacing them with the default optimizer's kwargs, or simply removing them.
Please update the docstring accordingly too.
removed the kwargs
Added Malpolon paper shield badge



📝 Changelog
✅ Checklist