
maximum of column with missing #50

Open
@sprmnt21


While following some examples from the tutorial, I got outputs that differ from the expected ones (those shown in the documentation).

julia> ds = Dataset(g = [2, 1, 1, 2, 2],
                    x1_int = [0, 0, 1, missing, 2],
                    x2_int = [3, 2, 1, 3, -2],
                    x1_float = [1.2, missing, -1.0, 2.3, 10],
                    x2_float = [missing, missing, 3.0, missing, missing],
                    x3_float = [missing, missing, -1.4, 3.0, -100.0])
5×6 Dataset
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float
     │ identity  identity  identity  identity   identity   identity
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?
─────┼───────────────────────────────────────────────────────────────
   1 │        2         0         3        1.2  missing    missing
   2 │        1         0         2  missing    missing    missing
   3 │        1         1         1       -1.0        3.0       -1.4
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> groupby!(ds, 1)
5×6 Grouped Dataset with 2 groups
Grouped by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity  
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?  
─────┼───────────────────────────────────────────────────────────────
   1 │        1         0         2  missing    missing    missing   
   2 │        1         1         1       -1.0        3.0       -1.4
   3 │        2         0         3        1.2  missing    missing   
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> modify(ds, r"int" => x -> x .- maximum(x))
5×6 Grouped Dataset with 2 groups
Grouped by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity  
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?  
─────┼───────────────────────────────────────────────────────────────
   1 │        1        -1         0  missing    missing    missing   
   2 │        1         0        -1       -1.0        3.0       -1.4
   3 │        2   missing         0        1.2  missing    missing
   4 │        2   missing         0        2.3  missing          3.0
   5 │        2   missing        -5       10.0  missing       -100.0
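
For comparison, the per-group arithmetic can be reproduced with plain Base Julia vectors, with no InMemoryDatasets involved. This is a sketch that assumes the tutorial's expected output comes from a `maximum` that skips `missing`. The variable and function names are illustrative only. With `Base.maximum`, the single `missing` in group 2 of `x1_int` makes the group maximum `missing`, which turns every value in that group into `missing`, matching the output above.

```julia
# Plain Base Julia, no InMemoryDatasets. Values copied from the grouped table above.
g1_x1 = [0, 1]              # x1_int, rows with g == 1
g2_x1 = [0, missing, 2]     # x1_int, rows with g == 2
g2_x2 = [3, 3, -2]          # x2_int, rows with g == 2

# Base.maximum propagates missing: one missing value makes the result missing.
shift_base(x) = x .- maximum(x)

# Hypothetical missing-skipping variant (an assumption about what the tutorial intends).
shift_skip(x) = x .- maximum(skipmissing(x))

base_g1_x1 = shift_base(g1_x1)   # no missing in this group, so the result is [-1, 0]
base_g2_x1 = shift_base(g2_x1)   # maximum is missing, so every entry becomes missing
base_g2_x2 = shift_base(g2_x2)   # [0, 0, -5], as in the output above
skip_g2_x1 = shift_skip(g2_x1)   # [-2, missing, 0]: only the missing row stays missing

println(base_g2_x1)
println(skip_g2_x1)
```

Note that `maximum(skipmissing(x))` throws an error when a group contains only `missing` values, so any missing-skipping reduction has to decide what to return in that case.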

julia> combine(ds, :x1_int => x -> maximum(x))
2×2 Dataset
 Row │ g         function_x1_int 
     │ identity  identity
     │ Int64?    Int64?
─────┼───────────────────────────
   1 │        1                1
   2 │        2          missing

The behavior does not appear to be specific to grouped datasets; the same thing happens after ungrouping:

julia> ungroup!(ds)
5×6 Sorted Dataset
 Sorted by: g
 Row │ g         x1_int    x2_int    x1_float   x2_float   x3_float  
     │ identity  identity  identity  identity   identity   identity
     │ Int64?    Int64?    Int64?    Float64?   Float64?   Float64?
─────┼───────────────────────────────────────────────────────────────
   1 │        1         0         2  missing    missing    missing
   2 │        1         1         1       -1.0        3.0       -1.4
   3 │        2         0         3        1.2  missing    missing
   4 │        2   missing         3        2.3  missing          3.0
   5 │        2         2        -2       10.0  missing       -100.0

julia> combine(ds, :x1_int => x -> maximum(x))
1×1 Dataset
 Row │ function_x1_int 
     │ identity
     │ Int64?
─────┼─────────────────
   1 │         missing
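
The ungrouped result is consistent with the same Base behavior. A minimal check on the raw `x1_int` column, again in plain Base Julia and assuming the expected answer is the largest non-missing value:

```julia
# x1_int in its original row order; plain Base Julia.
x1_int = [0, 0, 1, missing, 2]

# Base.maximum returns missing because the column contains a missing value.
base_max = maximum(x1_int)

# Skipping missing values returns the largest observed value instead.
skip_max = maximum(skipmissing(x1_int))

println(base_max)
println(skip_max)
```

Inside `combine`, the analogous workaround would presumably be `:x1_int => x -> maximum(skipmissing(x))`, which only uses Base functions.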

My package status and Julia version:

(v1.7) pkg> status
      Status `C:\Users\sprmn\.julia\v1.7\Project.toml`
  [8be319e6] Chain v0.4.10
  [35d6a980] ColorSchemes v3.17.1
  [5ae59095] Colors v0.12.8
  [f7bf1975] Impute v0.6.8
  [5c01b14b] InMemoryDatasets v0.6.10
  [8197267c] IntervalSets v0.6.0
  [c8e1da08] IterTools v1.4.0
  [08abe8d2] PrettyTables v1.3.1
  [2913bbd2] StatsBase v0.33.16
  [bd369af6] Tables v1.7.0

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =
