
Getting a segmentation fault when training with the viewer #26

@cduguet


Hello,
I'm running a remote EC2 instance, accessed through a remote desktop client called NICE DCV (an enterprise competitor to VNC, free on EC2). The instance has 24 GB of VRAM and 64 GB of RAM.
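Because the viewer needs an OpenGL context delivered through the NICE DCV session, one thing worth checking from inside the Docker container is which OpenGL renderer it actually gets. The sketch below is an illustration, not part of permuto_sdf or easy_pbr. It assumes `glxinfo` (from the mesa-utils package) is installed in the container and that `DISPLAY` points at the DCV session; `parse_glxinfo` and `query_opengl` are hypothetical helper names.

```python
import subprocess

# Lines of `glxinfo -B` output that describe the OpenGL context we get.
WANTED_KEYS = (
    "direct rendering",
    "OpenGL vendor string",
    "OpenGL renderer string",
    "OpenGL core profile version string",
    "OpenGL version string",
)


def parse_glxinfo(text):
    """Return a dict of the interesting 'key: value' lines from glxinfo output."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.strip().partition(":")
        if sep and key in WANTED_KEYS:
            info[key] = value.strip()
    return info


def query_opengl():
    """Run `glxinfo -B` on the current DISPLAY.

    Returns the parsed dict, or None if glxinfo is missing or cannot
    open the display / create a context (non-zero exit status).
    """
    try:
        result = subprocess.run(
            ["glxinfo", "-B"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_glxinfo(result.stdout)
```

If the renderer string reports a software rasterizer such as llvmpipe instead of the NVIDIA GPU, or `query_opengl()` returns `None`, the container may not have hardware OpenGL access through the DCV session, which would be one candidate cause to rule out.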

I can train without the viewer with no problems. However, when I try to run it with the viewer, I get a segmentation fault: the app window opens, but it crashes before anything loads.

I have tried both the experimental and the normal Docker builds (I have only tried Docker). I have also checked out multiple versions of the repo (783c41f and e72ae5b) to see whether the problem was introduced recently. Nothing has worked so far. The problem I get looks like this:

/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default 
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer True
has_apex True
[    D96CB740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode train
[    D96CB740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    D96CB740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode test
[    D96CB740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    D96CB740]    Mesh.cxx:3390     1| read obj with path /workspace/easy_pbr/data/sphere.obj
Segmentation fault (core dumped)

In contrast, when I train without a viewer, it looks like this:

/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default --no_viewer
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer False
has_apex True
[    2A5FF740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode train
[    2A5FF740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
[    2A5FF740]DataLoaderDTU.cxx:173      1| loaded nr of scenes 1 for mode test
[    2A5FF740]DataLoaderDTU.cxx:432      1| reading poses and intrinsics for scene "dtu_scan24"
phase.iter_nr 1000 loss  1.3530950546264648
phase.iter_nr 2000 loss  0.15609805285930634
phase.iter_nr 3000 loss  0.10311679542064667
...

How should I best troubleshoot this?
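A generic first step for a crash like this is to find out which call triggers it. CPython's standard-library `faulthandler` module (enabled here with the `-X faulthandler` interpreter option) dumps the Python traceback of every thread to stderr when the process receives a fatal signal such as SIGSEGV. This is a minimal sketch, assuming the script is run with the same interpreter as `sys.executable`, which may differ from what the script's shebang resolves to inside the container. `run_with_faulthandler` and `describe_exit` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` and return (returncode, stderr).

    stderr is captured so the faulthandler traceback can be inspected;
    it is only available once the child process has exited.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # that signal number (e.g. -11 is SIGSEGV on Linux).
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit code {returncode}"
```

For the failing run above, the call would be `run_with_faulthandler(["./permuto_sdf_py/train_permuto_sdf.py", "--dataset", "dtu", "--scene", "dtu_scan24", "--comp_name", "comp_3", "--exp_info", "default"])`. The dump only shows Python frames; to see the native C++ frames inside the viewer, the same command can be run under `gdb --args` and inspected with `bt` after the crash.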
