Skip to content

Conversation

@Alexandr-Solovev
Copy link
Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@Alexandr-Solovev Alexandr-Solovev added the dependencies Pull requests that update a dependency file label Jan 6, 2026
@david-cortes-intel
Copy link
Contributor

I think moving the data to a folder named 'dev' makes it harder to find.

@Alexandr-Solovev
Copy link
Contributor Author

I think moving the data to a folder named 'dev' makes it harder to find.

I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.

@david-cortes-intel
Copy link
Contributor

I think moving the data to a folder named 'dev' makes it harder to find.

I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.

What's wrong with the current location?

@Alexandr-Solovev
Copy link
Contributor Author

I think moving the data to a folder named 'dev' makes it harder to find.

I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.

What's wrong with the current location?

~100MB of datasets are duplications and I would like to merge/unify them in the same folder

@david-cortes-intel
Copy link
Contributor

I think moving the data to a folder named 'dev' makes it harder to find.

I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.

What's wrong with the current location?

~100MB of datasets are duplications and I would like to merge/unify them in the same folder

But why would that require moving them elsewhere? Perhaps they could be grouped by dataset instead of by algorithm, or something like that.

@Alexandr-Solovev
Copy link
Contributor Author

I think moving the data to a folder named 'dev' makes it harder to find.

I’m open to moving the data to a different folder. My first suggestion was oneDAL/data, but I’m not sure. Open to other suggestions, @Vika-F.

What's wrong with the current location?

~100MB of datasets are duplications and I would like to merge/unify them in the same folder

But why would that require moving them elsewhere? Perhaps they could be grouped by dataset instead of by algorithm, or something like that.

Okey, the problem is:
For now we have 4 places of datasets duplications:
examples
-daal
--datasets
-oneapi
--datasets

samples
-daal
--datasets
-oneapi
--datasets

And I want to remove datasets duplications here. For me it makes sense to move them in a separate folder with common acces for all samples and examples

@david-cortes-intel
Copy link
Contributor

For me it makes sense to move them in a separate folder with common acces for all samples and examples

Could it be under a similar folder as where the CSV reader is?

@ethanglaser
Copy link
Contributor

having a data/ folder in the root directory makes sense to me

@Alexandr-Solovev
Copy link
Contributor Author

For me it makes sense to move them in a separate folder with common acces for all samples and examples

Could it be under a similar folder as where the CSV reader is?

Don't think so, we have separate csv readers for oneDAL and DAAL and its pretty deep inside
cpp/oneapi/dal/io/csv and cpp/daal/include/data_management/data_source/... and could be complicated for user to find

@Vika-F
Copy link
Contributor

Vika-F commented Jan 8, 2026

I am Ok with the placement of all the data files in the ./data folder, but need to check that the BOMs generation is Ok.
Because the changes in the _release* catalogues structure might affect it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants