
Add RStudio addin to compare development vs installed package file sizes (directory or tar.gz), including totals report #6

Draft
Copilot wants to merge 29 commits into main from copilot/create-addin-to-compare-package-sizes

Conversation


Copilot AI commented Apr 21, 2026

admiral is nearing CRAN package size limits, and repeatedly comparing dev vs installed package contents was manual and error-prone. This PR adds an addin workflow that generates consistent file-level size reports for both versions.

  • New addin: package size comparison

    • Added run_compare_package_sizes() as an RStudio addin entry point.
    • Captures:
      • development package path
      • installed package path
      • output directory (defaults to ~)
    • Inputs can now be either:
      • package directories, or
      • source package .tar.gz files
    • If a provided directory contains exactly one .tar.gz (and is not a package root), it is automatically resolved to that tarball for comparison.
    • Registered in inst/rstudio/addins.dcf as Compare package sizes.
  • Report generation

    • Added internal helper compare_package_sizes() to generate three CSV outputs in the user-selected directory:
      • File size inventory: all files from dev + installed versions, sorted descending by size, including root/relative/full paths.
      • Cross-version comparison: joined by relative path with:
        • filename
        • dev path / installed path
        • file sizes (bytes and KB)
        • size diffs (bytes and KB)
        • existence flags (dev_exists, installed_exists) for files present in only one version.
      • Totals report with explicit input-vs-content size metrics:
        • Input artifact metadata:
          • input_type (tar.gz or directory; difference for diff row)
          • input_size_bytes
          • input_size_kb
          • input_size_mb
        • Summed package content totals (from compared files):
          • total_size_bytes
          • total_size_kb
          • total_size_mb
        • Includes development, installed, and diff_development_minus_installed rows.
    • For .tar.gz inputs, archives are unpacked to temporary directories and compared by internal file paths (not archive file size only), with normalization to avoid versioned top-level folder mismatches.
  • Package surface/docs updates

    • Exported addin function in NAMESPACE.
    • Added man/run_compare_package_sizes.Rd.
    • Updated README scope to include the new addin.
# internal report shape (comparison report)
data.frame(
  filename,
  relative_path,
  file_path_dev,
  file_path_installed,
  dev_exists,
  installed_exists,
  file_size_bytes_dev,
  file_size_kb_dev,
  file_size_bytes_installed,
  file_size_kb_installed,
  size_diff_bytes,
  size_diff_kb
)
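The cross-version comparison described above can be sketched in base R. This is only an illustration under stated assumptions, not the code in this PR: the helper names `list_file_sizes()` and `compare_file_sizes()` are hypothetical, and counting a file that is missing on one side as 0 bytes in the diff is an assumption.

```r
# Hypothetical helper (not from the PR): one row per file under `root`.
list_file_sizes <- function(root) {
  rel <- list.files(root, recursive = TRUE, all.files = TRUE)
  data.frame(
    relative_path = rel,
    file_path = file.path(root, rel),
    file_size_bytes = file.size(file.path(root, rel)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: full outer join of both inventories on relative path.
compare_file_sizes <- function(dev_root, installed_root) {
  out <- merge(
    list_file_sizes(dev_root),
    list_file_sizes(installed_root),
    by = "relative_path",
    all = TRUE,
    suffixes = c("_dev", "_installed")
  )
  out$filename <- basename(out$relative_path)
  # A missing path on one side means the file exists only in the other version.
  out$dev_exists <- !is.na(out$file_path_dev)
  out$installed_exists <- !is.na(out$file_path_installed)
  # Assumption: a missing file counts as 0 bytes when computing the diff.
  dev_bytes <- ifelse(out$dev_exists, out$file_size_bytes_dev, 0)
  installed_bytes <- ifelse(out$installed_exists, out$file_size_bytes_installed, 0)
  out$size_diff_bytes <- dev_bytes - installed_bytes
  out$size_diff_kb <- round(out$size_diff_bytes / 1024, 2)
  out
}
```

Calling `compare_file_sizes("path/to/dev", "path/to/installed")` on two unpacked package directories would return one row per relative path, with the existence flags marking files found in only one version.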

Copilot AI linked an issue Apr 21, 2026 that may be closed by this pull request
6 tasks
Copilot AI and others added 8 commits April 21, 2026 18:31
Copilot AI changed the title from "[WIP] Add addin to compare package sizes between development and CRAN" to "Add RStudio addin to compare development vs installed package file sizes" Apr 21, 2026
Copilot AI requested a review from bms63 April 21, 2026 18:38

bms63 commented Apr 21, 2026

@copilot

I want to supply the tar.gz files for comparison. The tool should be able to unpack these tar.gz files and compare the files inside.

"package_version","root_path","relative_path","file_path","file_size_bytes"
"development","/cloud/project/dev","admiral_1.4.1.9035.tar.gz","/cloud/project/dev/admiral_1.4.1.9035.tar.gz",1506659
"installed","/cloud/project/prod","admiral_1.4.1.tar.gz","/cloud/project/prod/admiral_1.4.1.tar.gz",1474502

currently it looks like the tool is just comparing the total size of the package

Copilot AI and others added 11 commits April 21, 2026 19:32

Copilot AI commented Apr 21, 2026

@copilot

I want to supply the tar.gz files for comparison. The tool should be able to unpack these tar.gz files and compare the files inside.
...

Implemented in commit 14ee662.

The addin now accepts either directories or .tar.gz files for both dev and installed inputs. For tarballs, it unpacks to temp folders and compares internal file contents (not just the archive file size), normalizing paths so versioned top-level archive folders do not block file-to-file comparison.

No UI changes were made in this update.
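A minimal sketch of the unpack-and-normalize step, assuming the archive holds a single top-level folder. The helper name `unpack_tarball_root()` is hypothetical and the PR's actual normalization may differ; the idea is that returning the folder inside the archive as the root keeps relative paths free of the top-level folder name.

```r
# Hypothetical helper (not from the PR): unpack a source tarball into a
# temporary directory and return the directory holding the package files.
unpack_tarball_root <- function(tarball) {
  exdir <- tempfile("pkg_unpack_")
  dir.create(exdir)
  utils::untar(tarball, exdir = exdir)
  top <- list.files(exdir, all.files = TRUE, no.. = TRUE, full.names = TRUE)
  # Assumption: if the archive has exactly one top-level folder, that folder
  # is the package root; otherwise fall back to the extraction directory.
  if (length(top) == 1 && dir.exists(top)) top else exdir
}
```

Listing files relative to the returned root for both inputs means `R/foo.R` in one archive lines up with `R/foo.R` in the other, whatever the top-level folders are called.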


Copilot AI commented Apr 21, 2026

@copilot I am only getting this data from the addin

"filename","relative_path","file_path_dev","file_path_installed","dev_exists","installed_exists","file_size_bytes_dev","file_size_kb_dev","file_siz...

Implemented in commit 1af5fff (core fix in fb364c9).

The addin now handles the case where a directory is provided that contains a single .tar.gz: it resolves that tarball and compares extracted internal files, so the output is the full per-file comparison report (e.g., .R, .Rd, .html) rather than only tarball rows.

No UI changes were made in this update.
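The directory-to-tarball resolution could look roughly like the sketch below. The function name `resolve_package_input()` is hypothetical, and treating "has a DESCRIPTION file" as the test for a package root is an assumption; the PR may detect package roots differently.

```r
# Hypothetical helper (not from the PR): decide what a user-supplied path
# should be compared as.
resolve_package_input <- function(path) {
  # A file path (e.g. a .tar.gz) is used as given.
  if (!dir.exists(path)) {
    return(path)
  }
  # Assumption: a directory containing DESCRIPTION is a package root.
  is_pkg_root <- file.exists(file.path(path, "DESCRIPTION"))
  tarballs <- list.files(path, pattern = "\\.tar\\.gz$", full.names = TRUE)
  # Only resolve to the tarball when the choice is unambiguous.
  if (!is_pkg_root && length(tarballs) == 1) tarballs else path
}
```

With this rule, a folder such as `/cloud/project/dev` holding only `admiral_1.4.1.9035.tar.gz` resolves to that archive, while a package source directory or a folder with several archives is left unchanged.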

Copilot AI requested a review from bms63 April 21, 2026 21:05

bms63 commented Apr 21, 2026

@bundfussr, @manciniedoardo, @adcascone, @jeffreyad, @millerg23: I was messing around with this old pkg to maybe offer a way to quickly compare the package sizes. Installed is 1.4.1 and dev is whatever is in main right now.

Looks promising. The report is just two sad-looking CSV files, but we could spruce it up if wanted.


bms63 commented Apr 21, 2026

Just thinking as we are doing this: it would be really nice to have a way to quickly do the compare, get a report to look at, and throw it into a PR for reducing size.

manciniedoardo commented

Just thinking as we are doing this: it would be really nice to have a way to quickly do the compare, get a report to look at, and throw it into a PR for reducing size.

Thinking out loud here, but if this is likely to be a concern for the foreseeable future, then it could be an idea to have an action that creates this report for every PR, just so we keep on top of things.


bms63 commented Apr 22, 2026

Just thinking as we are doing this: it would be really nice to have a way to quickly do the compare, get a report to look at, and throw it into a PR for reducing size.

Thinking out loud here, but if this is likely to be a concern for the foreseeable future, then it could be an idea to have an action that creates this report for every PR, just so we keep on top of things.

I am really hesitant about adding another action. Maybe it is something to put in the checklist: use the addin if you created a new vignette, new datasets, a lot of new files, etc.?

But maybe we could get this working and get happy with the report first, and then think about the action?

bundfussr commented

Could we replace the installed_package_path argument with a ref_version argument? I think this would make the usage easier and clearer. Otherwise, the user needs to install the reference version manually.

manciniedoardo commented

But maybe we could get this working and get happy with the report first, and then think about the action?

Yes, agree


bms63 commented Apr 22, 2026

Could we replace the installed_package_path argument with a ref_version argument? I think this would make the usage easier and clearer. Otherwise, the user needs to install the reference version manually.

So I was downloading the tar.gz for 1.4.1 from CRAN, building the dev pkg locally, and storing both in these folders.

image


bms63 commented Apr 23, 2026

@copilot can you also create a third csv file with the total size of the two packages being compared


Copilot AI commented Apr 23, 2026

@copilot can you also create a third csv file with the total size of the two packages being compared

Implemented in commit b4ed561 (feature added in 881ae20).

A third CSV is now generated: package_size_totals_<timestamp>.csv, containing total size for development and installed packages plus a diff_development_minus_installed row.

Copilot AI changed the title from "Add RStudio addin to compare development vs installed package file sizes (directory or tar.gz)" to "Add RStudio addin to compare development vs installed package file sizes (directory or tar.gz), including totals report" Apr 23, 2026

bms63 commented Apr 23, 2026

@copilot can you also provide a column for total MB in the total csv


Copilot AI commented Apr 23, 2026

@copilot can you also provide a column for total MB in the total csv

Implemented in commit 170c77e.

The totals CSV now includes total_size_mb (rounded to 2 decimals) for development, installed, and diff_development_minus_installed rows.


bms63 commented Apr 23, 2026

@copilot

|   | package_version | total_size_bytes | total_size_kb | total_size_mb |
|---|---|---|---|---|
| 1 | development | 6746832 | 6588.70 | 6.43 |
| 2 | installed | 6739130 | 6581.18 | 6.43 |
| 3 | diff_development_minus_installed | 7702 | 7.52 | 0.01 |

I feel like some of these calculations are off as I believe the current size of the development version of the package is 5 MB, but this says it is 6.43
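As a quick check on the arithmetic, the KB and MB figures in the table above are what you get by dividing the byte totals by 1024 and 1024^2 and rounding to two decimals. The sketch below reproduces them; `build_totals()` is a hypothetical name, not the function used in the PR.

```r
# Hypothetical helper (not from the PR): build the three totals rows from
# two byte counts, using 1024-based KB and MB rounded to 2 decimals.
build_totals <- function(dev_bytes, installed_bytes) {
  bytes <- c(dev_bytes, installed_bytes, dev_bytes - installed_bytes)
  data.frame(
    package_version = c(
      "development",
      "installed",
      "diff_development_minus_installed"
    ),
    total_size_bytes = bytes,
    total_size_kb = round(bytes / 1024, 2),
    total_size_mb = round(bytes / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}

# Byte totals taken from the table above.
totals <- build_totals(6746832, 6739130)
print(totals)
```

So the unit conversion itself is internally consistent. Any gap between 6.43 MB and an expected size would have to come from which bytes are being summed, not from the conversion.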


Copilot AI commented Apr 23, 2026

@copilot

...

Implemented in commit 1cb0d6a.

You’re right — this was a compressed-vs-unpacked interpretation issue for .tar.gz inputs. I updated the totals CSV to include:

  • input_type
  • input_size_bytes, input_size_kb, input_size_mb (actual input size; for tar.gz this is the compressed archive size)

The existing total_size_* columns remain as summed unpacked file-content size used for per-file comparison.
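The distinction between input size and content size can be sketched as follows. The helper name `describe_input()` is hypothetical, and defining a directory's input size as the sum of the files it contains is an assumption; the comment above only specifies the tar.gz case.

```r
# Hypothetical helper (not from the PR): describe the artifact the user
# supplied, as opposed to the summed size of its unpacked files.
describe_input <- function(path) {
  is_tarball <- grepl("\\.tar\\.gz$", path) && file.exists(path) && !dir.exists(path)
  if (is_tarball) {
    # For an archive, the input size is the compressed file on disk.
    type <- "tar.gz"
    bytes <- file.size(path)
  } else {
    # Assumption: for a directory, the input size is the sum of its files.
    type <- "directory"
    files <- list.files(path, recursive = TRUE, all.files = TRUE, full.names = TRUE)
    bytes <- sum(file.size(files))
  }
  data.frame(
    input_type = type,
    input_size_bytes = bytes,
    input_size_kb = round(bytes / 1024, 2),
    input_size_mb = round(bytes / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}
```

For a tar.gz input, `input_size_bytes` will normally be smaller than the matching `total_size_bytes`, because the latter sums the files after extraction.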



Development

Successfully merging this pull request may close these issues.

Compare pacakge sizes

4 participants