feat(file-mode-api: add filename extractor component #453

aldogonzalez8
Contributor

@aldogonzalez8 aldogonzalez8 self-assigned this Mar 31, 2025
@github-actions github-actions bot added the enhancement New feature or request label Mar 31, 2025
@aldogonzalez8 aldogonzalez8 changed the title from "feat(file-mode-api: aa filename extractor component" to "feat(file-mode-api: add filename extractor component" Mar 31, 2025
Contributor

@maxi297 maxi297 left a comment

I have a concern about file naming. Can you add more context here? I would like to make sure we avoid collisions.


relative_path = self._filename_extractor.eval(self.config, record=record)
relative_path = relative_path.lstrip("/")
file_relative_path = Path(relative_path)

full_path = files_directory / file_relative_path
Contributor

Should we have the stream name somewhere in there? It feels like multiple streams could have a file with the same name.

Even more than this, should we have a unique ID per file? It feels like there could even be two files in the same stream with the same name...
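
To make the concern concrete, here is a minimal sketch of how the path-building lines above can map two different files to the same location. The base directory, stream names, and file name are hypothetical, and the optional `stream_name` argument only illustrates one possible prefix; it is not the CDK's actual behavior.

```python
from pathlib import Path
from typing import Optional

# Hypothetical base directory; the real value is supplied by the CDK.
files_directory = Path("/tmp/airbyte-files")


def build_path(relative_path: str, stream_name: Optional[str] = None) -> Path:
    """Strip leading slashes and join to the base directory, as in the reviewed snippet.

    The stream_name prefix is an assumption added for illustration.
    """
    relative_path = relative_path.lstrip("/")
    if stream_name is not None:
        return files_directory / stream_name / relative_path
    return files_directory / relative_path


# Two hypothetical streams that both expose a file called "report.pdf".
a = build_path("/report.pdf")
b = build_path("report.pdf")
collides = a == b  # both resolve to the same full path

# With a stream-name prefix, the two files land in different directories.
a_prefixed = build_path("/report.pdf", stream_name="article_attachments")
b_prefixed = build_path("report.pdf", stream_name="ticket_attachments")
separated = a_prefixed != b_prefixed
```

A stream-name prefix alone does not help when two files in the same stream share a name, which is why a per-file unique ID is also raised here.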

Contributor Author

@aldogonzalez8 aldogonzalez8 Mar 31, 2025

Well, for Zendesk Support, we actually do, e.g.:

filename_extractor: "{{ record.relative_path }}/{{ record.file_name }}/"

Interpolates as:

hc/article_attachments/"attachments_id"/"name_of_the_file.extension"
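
As a rough illustration of that interpolation, the sketch below uses a small regex-based stand-in for the CDK's template evaluation and then applies the same `lstrip("/")` and `Path` steps as the snippet under review. The `render` helper and the record values are assumptions made for this example, not the real API or real Zendesk data.

```python
import re
from pathlib import Path

TEMPLATE = "{{ record.relative_path }}/{{ record.file_name }}/"


def render(template: str, record: dict) -> str:
    """Replace each `{{ record.<key> }}` placeholder with the matching record value.

    A simplified stand-in for the real interpolation, for illustration only.
    """
    return re.sub(
        r"\{\{\s*record\.(\w+)\s*\}\}",
        lambda match: str(record[match.group(1)]),
        template,
    )


# Hypothetical record; field values are made up for this example.
record = {
    "relative_path": "/hc/article_attachments/12345",
    "file_name": "guide.pdf",
}

rendered = render(TEMPLATE, record)
# Same post-processing as the reviewed code: drop leading slashes, wrap in Path.
file_relative_path = Path(rendered.lstrip("/"))
```

Note that `Path` drops the trailing slash from the template, so the last path component is the file name.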

This works for this specific endpoint in Zendesk, but I can see it is not guaranteed for every connector in the future. So I guess we can let the user add any extra path, but have the component prefix the path with the stream name and the attachment/file ID.

Contributor

OK, so what we are saying is that it is the developer's responsibility to make sure there are no clashes. Could we remove this concern from the developer and do it ourselves?

Regarding timing: I'm not 100% sure we need this right now, and maybe we can make filename_extractor optional in the future once we find a solution to this. Off the top of my head, I can only see one way: relying on the stream declaring a PK. That seemed to be common when I checked Confluence, Jira and Salesforce, so maybe this is viable in the future.
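
One way the primary-key idea could look is sketched below. The function name, the `<stream>/<pk values>/<file name>` layout, and the sample records are all hypothetical; nothing here is taken from the CDK.

```python
from pathlib import Path
from typing import Any, Mapping, Sequence


def path_from_primary_key(
    stream_name: str,
    primary_key: Sequence[str],
    record: Mapping[str, Any],
    file_name: str,
) -> Path:
    """Build <stream>/<pk values...>/<file name> (an assumed layout)."""
    pk_values = [str(record[field]) for field in primary_key]
    return Path(stream_name, *pk_values, file_name)


# Two hypothetical records in the same stream that share a file name.
first = path_from_primary_key("attachments", ["id"], {"id": 101}, "notes.txt")
second = path_from_primary_key("attachments", ["id"], {"id": 102}, "notes.txt")
```

Because the primary key differs per record, the two files end up in different directories even though their names match. This only works for streams that declare a PK.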

Contributor Author

so what we are saying is that it is the developer's responsibility to make sure there are no clashes. Could we remove this concern from the developer and do it ourselves?

No, I didn't make myself clear. I'm sorry about that. To reduce the risk of collisions, I will add the stream name + unique ID on the backend (CDK).

Contributor

What is the logic for the unique ID? Autogenerated UUID?

Contributor Author

I think we can:

  • Add the stream name to the path ourselves, reducing collision risk.
  • Make filename_extractor optional so the developer can include a unique ID. There is a risk that they could get it wrong, but we can add some documentation to the component.
  • Use an autogenerated UUID if filename_extractor is not present.
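
The three points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `resolve_file_path`, the callable-based extractor, the base directory, and the sample record are hypothetical and are not the merged implementation.

```python
import uuid
from pathlib import Path
from typing import Any, Callable, Mapping, Optional

# Hypothetical extractor type: takes a record, returns a relative path string.
Extractor = Callable[[Mapping[str, Any]], str]


def resolve_file_path(
    files_directory: Path,
    stream_name: str,
    record: Mapping[str, Any],
    filename_extractor: Optional[Extractor] = None,
) -> Path:
    """Prefix the stream name; use the extractor if given, else a random UUID."""
    if filename_extractor is not None:
        relative_path = filename_extractor(record).lstrip("/")
    else:
        relative_path = str(uuid.uuid4())
    return files_directory / stream_name / relative_path


base = Path("/tmp/airbyte-files")  # assumed base directory
record = {"id": 7, "file_name": "invoice.pdf"}  # made-up record

with_extractor = resolve_file_path(
    base, "attachments", record, lambda r: f"/{r['id']}/{r['file_name']}"
)
without_extractor = resolve_file_path(base, "attachments", record)
```

With an extractor, uniqueness still depends on what the developer interpolates; without one, the UUID fallback yields a fresh name on every call.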

Contributor

@maxi297 maxi297 left a comment

Accepting on the premise that we are fine with the connector developer ensuring there are no file collisions for now.

Base automatically changed from aldogonzalez8/move-file-uploader-to-record-selector to aldogonzalez8/poc-emit-file-reference-record March 31, 2025 16:54
@aldogonzalez8
Contributor Author

Accepting on the premise that we are fine with the connector developer ensuring there are no file collisions for now.

@maxi297 This is still true with slight modifications that we can refine in the future:

  • We added the stream name to the path, reducing collision risk.
  • We made filename_extractor optional so the developer can include a unique ID. There is a risk that the developer could get the interpolation wrong, but we added documentation to the component.
  • We use an autogenerated UUID if filename_extractor is not present.

@aldogonzalez8 aldogonzalez8 merged commit 68480b7 into aldogonzalez8/poc-emit-file-reference-record Mar 31, 2025
8 of 22 checks passed
@aldogonzalez8 aldogonzalez8 deleted the aldogonzalez8/add-filename-extractor-component branch March 31, 2025 17:50