Releases · linea-it/hipscatalog_gen

13 Feb 22:03

luigilcsilva

v0.3.0

e7d0b52

Release 0.3.0 Latest

Fix densmap scalability for large catalogs by replacing dense per-partition aggregation with sparse histogram reduction in a bounded fan-in tree, preventing oversized gather tasks at high depths.
Compute only the finest densmap from source data and derive lower orders by exact NESTED parent-child aggregation (4 children -> 1 parent), reducing repeated catalog scans; keep per-depth progress logs (Computing/Derived/Wrote densmap_o*.fits).
Optimize score_density_hybrid stage-1 per-tile top-k with an exact two-stage strategy (local prune + global reduce), reducing shuffle volume and improving runtime on large catalogs.
Add score_density_hybrid.density_up_to_depth (default 4) to control how far stage-1 density selection runs before switching to score-based stage-2.
Update output TSV column ordering for columns.keep: preserve the original input order when keep is omitted or null; honor the explicit keep order when it already lists every required column; otherwise prepend the missing required columns (RA/DEC first when absent); and put RA/DEC first when keep=[].
Make stage-2 depth writing (no Allsky) streaming-based with bucketed temporary fragments, avoiding depth_ddf.compute() materialization on the driver and reducing distributed-filesystem metadata pressure.
Run stage-2 bucket processing on distributed workers (Client.submit) so compute/IO stay on workers and the driver remains orchestration-only.
Require an active dask.distributed client for streamed stage-2 writes; fail fast when absent instead of silently degrading to local execution.
Auto-tune merge fan-in per worker task using RLIMIT_NOFILE and worker concurrency, and bound fan-in rounds to prevent EMFILE (Too many open files) during high-depth bucket merges.
Keep stage-2 k-way merge on a single bounded fan-in safety path, simplifying behavior while preserving robustness under high fragment fan-out.
Reuse selection-stage per-depth write stats for final output counts (telemetry/properties) and remove the slow full-TSV recount fallback; the pipeline now fails fast if required intermediate stats are missing or invalid.
Add startup observability logs for cluster runtime (local/SLURM resources + directives) and stage-2 streaming execution (worker count, bucket count, fan-in reduction summary).
Fix a distributed compatibility warning by reading worker concurrency from Worker.state.nthreads (with a fallback for older versions), avoiding a FutureWarning on newer distributed releases.
Remove pandas FutureWarning in local top-k pruning by avoiding partition-level DataFrameGroupBy.apply.
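The densmap change above (compute only the finest order, then derive coarser orders by NESTED aggregation) can be illustrated with a minimal sketch. It assumes a sparse histogram stored as a plain mapping from pixel index to count; the function names and sample counts are hypothetical and are not taken from the package. It relies only on the NESTED property that the parent of child pixel c is c // 4.

```python
from collections import Counter


def derive_parent_counts(child_counts):
    """Aggregate a sparse NESTED HEALPix histogram to the next coarser order.

    In NESTED ordering, pixel p at order k has children 4p..4p+3 at
    order k+1, so the parent of child pixel c is c >> 2 (c // 4).
    """
    parent_counts = Counter()
    for pixel, count in child_counts.items():
        parent_counts[pixel >> 2] += count
    return parent_counts


def derive_all_orders(finest_counts, finest_order):
    """Return {order: sparse histogram} from finest_order down to order 0."""
    maps = {finest_order: Counter(finest_counts)}
    for order in range(finest_order - 1, -1, -1):
        # Each coarser map is built from the previous one, not from the catalog.
        maps[order] = derive_parent_counts(maps[order + 1])
    return maps


# Illustrative order-2 counts (made-up numbers, not pipeline output).
finest = {0: 3, 1: 1, 5: 2, 16: 4}
maps = derive_all_orders(finest, finest_order=2)
print(dict(maps[1]))  # {0: 4, 1: 2, 4: 4}
print(dict(maps[0]))  # {0: 6, 1: 4}
```

Because only non-empty pixels are stored, the cost of each derived order scales with the number of occupied pixels rather than with the full pixel count at that order, and the total count is preserved at every order.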
Detailed run benchmarks for these optimizations are tracked in:
- benchmarks/records/2026-02-10_des_dr2_score_density_hybrid_topk_two_stage.md
- benchmarks/records/2026-02-10_des_dr2_densmaps_finest_derive.md
- benchmarks/records/2026-02-12_des_dr2_score_density_hybrid_dask_workers_fanin.md
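The fan-in auto-tuning and bounded merge rounds mentioned in this release can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual formula: the `reserved` headroom, the `floor` and `cap` values, and all function names are hypothetical, and in-memory sorted lists stand in for on-disk fragments. The only real API used is the standard-library `resource.getrlimit(resource.RLIMIT_NOFILE)` call (Unix only).

```python
import heapq
import math
import resource  # Unix-only standard-library module


def merge_fan_in(nthreads, soft_limit=None, reserved=64, floor=2, cap=256):
    """Pick how many fragments one merge task may open at once.

    The descriptor budget is split across the threads of a worker, after
    setting aside `reserved` descriptors for everything else (assumed value).
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return cap
    per_task = (soft_limit - reserved) // max(1, nthreads)
    return max(floor, min(cap, per_task))


def merge_rounds(n_fragments, fan_in):
    """Number of passes needed when each pass merges at most fan_in inputs."""
    rounds = 0
    while n_fragments > 1:
        n_fragments = math.ceil(n_fragments / fan_in)
        rounds += 1
    return rounds


def bounded_merge(sorted_runs, fan_in):
    """K-way merge that never combines more than fan_in runs in one step."""
    runs = [list(run) for run in sorted_runs]
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
    return runs[0] if runs else []


fan_in = merge_fan_in(nthreads=4, soft_limit=1024)
print(fan_in)                      # 240
print(merge_rounds(1000, fan_in))  # 2
print(bounded_merge([[1, 4], [2, 5], [3, 6]], fan_in=2))  # [1, 2, 3, 4, 5, 6]
```

Keeping the per-task fan-in below the descriptor budget means that a merge over many fragments takes extra rounds instead of failing with EMFILE.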

06 Feb 21:28

luigilcsilva

v0.2.0

bcf68f3

Release 0.2.0

Merge Dependabot updates (GitHub workflows and dependency version limits).
Add score/magnitude column details to process.log.
Add output row count to process.log.
Improve generated properties file content.
Improve generated arguments file content.
Add index.html preview file generation in outputs.
Fix MOC order generation bug (Moc.fits/Moc.json) for compatibility with current mocpy signatures.

07 Jan 15:41

luigilcsilva

v0.1.1

8271841

Release 0.1.1

Fix score_density_hybrid stage-1 de-duplication for LSDB catalogs by deriving unique IDs from pixel metadata and partition context.
Add tests for unique ID generation in Dask and LSDB paths.
Pin sphinx-rtd-theme>=3.0,<4 to avoid Sphinx 7+ theme incompatibility; update docs for mag_global hist_peak clipping.

30 Dec 22:58

luigilcsilva

v0.1.0

c9e3d00

Release 0.1.0

First publishable release of hipscatalog-gen.
Three selection modes: mag_global, score_global, score_density_hybrid, each with normalize/prepare/run stages via a mode registry.
Structured pipeline with immutable context, per-stage telemetry (telemetry.json), and optional JSON logs (process.jsonl).
CLI: --config to run, plus --list-modes, --check-config, --telemetry (summary of telemetry.json), and --json-logs.
Outputs: HiPS tiles/Allsky, density maps, MOC, metadata, logs, and consolidated counts in telemetry.json (no separate input/output counts files).
Config validation (common + per-mode), with the telemetry schema bundled in the package.
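The mode registry with normalize/prepare/run stages noted above follows a common pattern that can be sketched as below. This is a generic illustration only: the `Mode` dataclass, `register_mode`, `list_modes`, `run_mode`, and the placeholder stage bodies are all hypothetical and do not reflect the package's internal API. Only the mode name `mag_global` and the three stage names come from the release notes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Config = Dict[str, Any]


@dataclass(frozen=True)
class Mode:
    """A selection mode described by its three stages."""
    name: str
    normalize: Callable[[Config], Config]  # fill defaults, validate options
    prepare: Callable[[Config], Any]       # build state needed by run
    run: Callable[[Config, Any], Any]      # perform the selection


_REGISTRY: Dict[str, Mode] = {}


def register_mode(mode: Mode) -> None:
    if mode.name in _REGISTRY:
        raise ValueError(f"mode already registered: {mode.name}")
    _REGISTRY[mode.name] = mode


def list_modes():
    return sorted(_REGISTRY)


def run_mode(name: str, config: Config) -> Any:
    try:
        mode = _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"unknown mode {name!r}; available: {list_modes()}"
        ) from None
    cfg = mode.normalize(config)
    state = mode.prepare(cfg)
    return mode.run(cfg, state)


# Placeholder stages for illustration; the real stages do far more.
register_mode(Mode(
    name="mag_global",
    normalize=lambda cfg: {"mag_column": "mag", **cfg},
    prepare=lambda cfg: {"column": cfg["mag_column"]},
    run=lambda cfg, state: f"selected by {state['column']}",
))

print(list_modes())                # ['mag_global']
print(run_mode("mag_global", {}))  # selected by mag
```

With a registry of this shape, listing the available modes and dispatching a configured mode both reduce to dictionary lookups, and adding a mode does not require touching the dispatch code.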
