Release 0.3.0 · linea-it/hipscatalog_gen

Fix densmap scalability for large catalogs by replacing dense per-partition aggregation with sparse histogram reduction in a bounded fan-in tree, preventing oversized gather tasks at high depths.
Compute only the finest densmap from source data and derive lower orders by exact NESTED parent-child aggregation (4 children -> 1 parent), reducing repeated catalog scans; keep per-depth progress logs (Computing/Derived/Wrote densmap_o*.fits).
Optimize score_density_hybrid stage-1 per-tile top-k with an exact two-stage strategy (local prune + global reduce), reducing shuffle volume and improving runtime on large catalogs.
Add score_density_hybrid.density_up_to_depth (default 4) to control how far stage-1 density selection runs before switching to score-based stage-2.
Update output TSV column ordering semantics for columns.keep: preserve original input order when omitted/null; honor explicit keep order when complete; otherwise prepend missing required columns (with RA/DEC first when absent); and keep RA/DEC first when keep=[].
Make stage-2 depth writing (no Allsky) streaming-based with bucketed temporary fragments, avoiding depth_ddf.compute() materialization on the driver and reducing distributed-filesystem metadata pressure.
Run stage-2 bucket processing on distributed workers (Client.submit) so compute/IO stay on workers and the driver remains orchestration-only.
Require an active dask.distributed client for streamed stage-2 writes; fail fast when absent instead of silently degrading to local execution.
Auto-tune merge fan-in per worker task using RLIMIT_NOFILE and worker concurrency, and bound fan-in rounds to prevent EMFILE (Too many open files) during high-depth bucket merges.
Keep stage-2 k-way merge on a single bounded fan-in safety path, simplifying behavior while preserving robustness under high fragment fan-out.
Reuse selection-stage per-depth write stats for final output counts (telemetry/properties) and remove the slow full-TSV recount fallback; the pipeline now fails fast if required intermediate stats are missing or invalid.
Add startup observability logs for cluster runtime (local/SLURM resources + directives) and stage-2 streaming execution (worker count, bucket count, fan-in reduction summary).
Fix a distributed compatibility warning by reading worker concurrency from Worker.state.nthreads (with a fallback for older versions), avoiding a FutureWarning on newer distributed releases.
Remove pandas FutureWarning in local top-k pruning by avoiding partition-level DataFrameGroupBy.apply.
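
The densmap derivation described above (finest order computed from source data, lower orders derived by NESTED parent-child aggregation) can be illustrated with a minimal sketch. It assumes a sparse representation of a densmap as a dict mapping NESTED pixel index to count; the function names `derive_parent_counts` and `derive_all_orders` are hypothetical and are not taken from the package. The only HEALPix fact relied on is that, in NESTED ordering, children `4p` to `4p+3` share parent `p`, so the parent index is `ipix >> 2`.

```python
from collections import Counter


def derive_parent_counts(child_counts):
    """Aggregate sparse NESTED counts at order k into order k-1.

    Hypothetical helper: child_counts maps NESTED pixel index -> count.
    """
    parent_counts = Counter()
    for ipix, count in child_counts.items():
        # In NESTED ordering, children 4p..4p+3 all map to parent p.
        parent_counts[ipix >> 2] += count
    return dict(parent_counts)


def derive_all_orders(finest_counts, finest_order):
    """Build every order from finest_order down to 0 without rescanning data."""
    maps = {finest_order: dict(finest_counts)}
    for order in range(finest_order - 1, -1, -1):
        maps[order] = derive_parent_counts(maps[order + 1])
    return maps


# Toy sparse densmap at order 2 (illustrative values only).
finest = {0: 3, 1: 1, 2: 0, 3: 2, 17: 5}
maps = derive_all_orders(finest, 2)
print(maps[1])  # {0: 6, 4: 5}
print(maps[0])  # {0: 6, 1: 5}
```

Because each step is a plain sum over four children, the total count is conserved at every order, which is what makes the derivation exact rather than approximate.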
Detailed run benchmarks for these optimizations are tracked in:
- benchmarks/records/2026-02-10_des_dr2_score_density_hybrid_topk_two_stage.md
- benchmarks/records/2026-02-10_des_dr2_densmaps_finest_derive.md
- benchmarks/records/2026-02-12_des_dr2_score_density_hybrid_dask_workers_fanin.md
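
For readers unfamiliar with the fan-in auto-tuning mentioned in these notes, the idea can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `pick_fan_in` and `bounded_merge` are hypothetical names, the `reserved`, `floor`, and `cap` values are arbitrary, and in-memory sorted lists stand in for the on-disk fragments. The `resource` module is standard library but Unix-only.

```python
import heapq
import resource


def pick_fan_in(soft_nofile, worker_threads, reserved=64, floor=2, cap=256):
    """Estimate how many inputs one merge task may open at once.

    Hypothetical heuristic: split the descriptor budget (minus a reserve
    for sockets, logs, etc.) evenly across concurrent worker threads.
    """
    per_task = (soft_nofile - reserved) // max(worker_threads, 1)
    return max(floor, min(cap, per_task))


def bounded_merge(sorted_runs, fan_in):
    """Merge sorted runs in rounds, combining at most fan_in runs per step."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = [list(run) for run in sorted_runs]
    rounds = 0
    while len(runs) > 1:
        # Each group of up to fan_in runs becomes one longer sorted run.
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
        rounds += 1
    return (runs[0] if runs else []), rounds


soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft_limit == resource.RLIM_INFINITY:
    soft_limit = 1 << 20  # arbitrary stand-in when the limit is unlimited

fan_in = pick_fan_in(soft_limit, worker_threads=4)
merged, rounds = bounded_merge([[1, 4], [2, 5], [3, 6], [0, 7], [8]], fan_in=2)
print(merged, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 3
```

With a fan-in of `f` and `n` fragments, the merge needs roughly `ceil(log_f(n))` rounds, so a smaller fan-in trades extra passes for a hard ceiling on simultaneously open files.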
