
@JonasIsensee (Collaborator) commented Sep 24, 2025

Add Soft Links and External Links Support

This PR implements soft links and external links in JLD2. This improves compatibility with HDF5 files that use these link types and allows a dataset entry to reference a dataset in another file.

Changes

Core Link System

  • Introduced unified Link type supporting three link modes: hard links (direct offsets), soft links (path-based references
    within the same file), and external links (references to datasets in other files)
  • Added type predicates is_hard_link(), is_soft_link(), and is_external_link() for zero-cost link type checking
  • Implemented getoffset() function to transparently resolve links, with recursive soft link resolution
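The following is a minimal, self-contained Julia sketch of the idea behind the unified Link type. It is not the code in this PR. The field layout, the convention that an empty path marks a hard link, the flat Dict standing in for group traversal, and the name resolve_offset (standing in for getoffset()) are all assumptions made for illustration. The depth limit is an extra guard against soft-link cycles that the description above does not mention.

```julia
# Hypothetical sketch of a single concrete link type (not JLD2's actual definition).
struct Link
    offset::UInt64               # hard links: object offset in the file (0 otherwise, assumed)
    path::String                 # soft/external links: target path ("" marks a hard link, assumed)
    file::Union{Nothing,String}  # external links: target file name; `nothing` otherwise
end

Link(offset::UInt64) = Link(offset, "", nothing)                # hard link
Link(path::String; file=nothing) = Link(UInt64(0), path, file)  # soft or external link

is_hard_link(l::Link) = isempty(l.path)
is_soft_link(l::Link) = !isempty(l.path) && l.file === nothing
is_external_link(l::Link) = l.file !== nothing

# Follow soft links until a hard link is reached. `links` is a flat path => Link
# table standing in for real group traversal; `maxdepth` guards against cycles.
function resolve_offset(links::Dict{String,Link}, l::Link; maxdepth::Int=16)
    depth = 0
    while is_soft_link(l)
        depth += 1
        depth > maxdepth && error("soft link chain deeper than $maxdepth (cycle?) at $(l.path)")
        haskey(links, l.path) || error("dangling soft link: $(l.path)")
        l = links[l.path]
    end
    # External links point into another file, which this sketch does not open.
    is_external_link(l) && error("external link to $(l.file):$(l.path) needs the other file")
    return l.offset
end
```

Bounding the loop keeps a cyclic pair of soft links (/a → /b → /a) from spinning forever; external links are left to whatever code opens the other file.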

API

  • New user-facing constructor: JLD2.Link(path; file=nothing) for creating soft and external links
  • Link is not exported: the name is very generic, and few users are expected to need it.
  • Links are resolved on access, making them work seamlessly with existing getindex operations

Internal Changes

  • Updated Group to store Link objects instead of raw RelOffset values in both unwritten_links and written_links
  • Modified link message parsing to handle HDF5 link types (0=hard, 1=soft, 64=external)
  • Enhanced pathize() and haskey() to properly handle all link types during path traversal
  • Updated group display to show appropriate icons for different link types
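To illustrate the link-type mapping listed above, here is a small hypothetical Julia helper. The codes 0, 1 and 64 come from the list; treating a missing link-type field as a hard link follows our reading of the HDF5 link message format, where that field is only present for non-hard links. The LinkKind enum and the link_kind function are invented for this example and are not part of JLD2.

```julia
# Hypothetical helper: map the HDF5 link-type byte of a link message to a link kind.
@enum LinkKind HardLink SoftLink ExternalLink

link_kind(::Nothing) = HardLink  # link-type field absent: hard link (as we read the HDF5 format spec)

function link_kind(code::UInt8)
    code == 0x00 && return HardLink
    code == 0x01 && return SoftLink
    code == 0x40 && return ExternalLink  # 0x40 == 64
    throw(ArgumentError("unsupported link type $(Int(code))"))
end
```

Any other code is rejected rather than silently treated as a hard link.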

Documentation

  • Added docs/src/external_links.md with usage examples and link type descriptions

Usage Example

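  using JLD2  # assumes a JLD2 version that includes this PR

  jldopen("file.jld2", "w") do f
      f["data"] = [1, 2, 3, 4, 5]
      f["alias"] = JLD2.Link("/data")                        # Soft link to /data in this file
      f["remote"] = JLD2.Link("/dataset"; file="other.jld2") # External link, resolved only on access
  end

  jldopen(f -> f["alias"], "file.jld2", "r")  # resolved on access; should return [1, 2, 3, 4, 5]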

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 89.03226% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.53%. Comparing base (f8fefb6) to head (8f6e5c0).

Files with missing lines     Patch %   Lines
src/groups.jl                 87.50%   11 Missing ⚠️
src/object_headers.jl          0.00%    4 Missing ⚠️
src/explicit_datasets.jl      95.45%    1 Missing ⚠️
src/links.jl                  96.96%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #686      +/-   ##
==========================================
+ Coverage   87.40%   87.53%   +0.13%     
==========================================
  Files          37       38       +1     
  Lines        4588     4676      +88     
==========================================
+ Hits         4010     4093      +83     
- Misses        578      583       +5     


@nhz2 (Member) commented Oct 3, 2025

This PR is quite big, and I think it is important to review this code very carefully since it is AI-generated.

Is it possible to get the AI to split the PR up into smaller pieces?

For example, can soft-links and external-links be added separately?

Caching is, in general, really difficult to get right. Can this be removed from the basic feature PRs and added afterwards as a performance optimization?

@JonasIsensee (Collaborator, Author)

Hi @nhz2 ,

Yeah, no worries.
I have no intention of merging it like this.
The AI code is way too verbose for my liking.

I like the fact that I got a working implementation without much effort on my part.

It allows us to add regression tests and then improve the code from there.

I agree that the caching logic is probably BS and should be removed.

@nhz2 (Member) commented Oct 3, 2025

Yes, it's also a very cool proof of concept: it shows that both this and chunks can be added without breaking changes.

@JonasIsensee (Collaborator, Author)

@nhz2 I've done another major overhaul of this PR. It improves performance and reduces the number of lines the PR adds.

Splitting the PR into separate pieces does not seem useful to me at this point.
The code specific to external or soft links is minimal. Most of the changes are plumbing: RelOffset used to be passed around everywhere, and now Link is passed instead.
There is now a single Link struct that can represent hard links (the default, and the only kind available before, stored as an offset in the file), soft links, and external links.

Joining these in a single struct makes sense, because there will never be more than these three types and it allows for type stability.
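A hypothetical Julia sketch of the type-stability argument: with an abstract type hierarchy, a collection holding mixed links has an abstract element type, while a single concrete struct keeps it concrete. The type names AbstractLink, HardL, SoftL and ExternalL are made up for this comparison, and the Link fields are assumed; neither is taken from the PR.

```julia
# Alternative design (hypothetical): one subtype per link kind.
abstract type AbstractLink end
struct HardL <: AbstractLink; offset::UInt64; end
struct SoftL <: AbstractLink; path::String; end
struct ExternalL <: AbstractLink; file::String; path::String; end

# Unified design (hypothetical fields): one concrete struct for all three kinds.
struct Link
    offset::UInt64
    path::String
    file::Union{Nothing,String}
end

mixed_abstract = AbstractLink[HardL(0x10), SoftL("/data"), ExternalL("other.jld2", "/dataset")]
mixed_unified  = [Link(0x10, "", nothing), Link(0, "/data", nothing), Link(0, "/dataset", "other.jld2")]

isconcretetype(eltype(mixed_abstract))  # false: calls on elements generally need dynamic dispatch
isconcretetype(eltype(mixed_unified))   # true: the compiler knows the exact element type
```

The trade-off is a few unused fields per link, in exchange for containers such as a group's link table staying concretely typed.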
