
Out of Memory Issue With Segmented Output #37

@Z-Richard

Description

Example inference_config.yaml:

experiment_dir: /workspace_hdd/haoyu/ACE2-inference-parent/ACE2-inference-2020_2025_init_feb01_segmented_output
n_forward_steps: 146
forward_steps_in_memory: 50
checkpoint_path: /workspace_hdd/haoyu/ACE2-ERA5/ace2_era5_ckpt.tar
logging:
  log_to_screen: true
  log_to_wandb: false
  log_to_file: true
  project: ace
initial_condition:
  path: /workspace_hdd/haoyu/ACE2-ERA5/initial_conditions/ic_2020.nc
  start_indices:
    times:
      - "2020-02-01T00:00:00"
forcing_loader:
  dataset:
    data_path: /workspace_hdd/haoyu/ACE2-ERA5/forcing_data
  num_data_workers: 4
data_writer:
  save_prediction_files: true
  save_monthly_files: false
  names: ["PRATEsfc"]

Command: nohup python -m fme.ace.inference inference_config.yaml --segments 51 >& ../log_files/log.ace.inference.2020.2025.txt &
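
For reference, here is a small sketch of the step bookkeeping implied by the config and command above. It assumes two things that are not confirmed against the fme source. First, with --segments, each segment advances n_forward_steps steps, starting from where the previous segment ended. Second, each segment is processed in windows of at most forward_steps_in_memory steps. segmented_run_summary is a hypothetical helper, not part of fme.

```python
import math


def segmented_run_summary(n_forward_steps: int, forward_steps_in_memory: int, segments: int) -> dict:
    """Rough step bookkeeping for a segmented inference run (assumed semantics)."""
    # Assumption: every segment advances n_forward_steps, starting where the previous one ended.
    total_steps = n_forward_steps * segments
    # Assumption: within a segment, steps are processed in windows of at most
    # forward_steps_in_memory steps, so the last window may be shorter.
    windows_per_segment = math.ceil(n_forward_steps / forward_steps_in_memory)
    return {"total_steps": total_steps, "windows_per_segment": windows_per_segment}


# Values taken from inference_config.yaml and the --segments flag above.
summary = segmented_run_summary(n_forward_steps=146, forward_steps_in_memory=50, segments=51)
print(summary)  # {'total_steps': 7446, 'windows_per_segment': 3}
```

Under those assumptions, the segmented run covers 51 × 146 = 7446 forward steps. That is close to, but not the same as, the 7543-step single run it is compared with.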

On an RTX 5880 Ada (48 GB), this segmented run fails with an out-of-memory error by segment 13. I do not hit this error when I run all inference steps at once (n_forward_steps=7543, no --segments flag).
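
One generic way to narrow this down is to record memory after each segment and check whether it keeps growing. Below is a minimal sketch using Python's standard tracemalloc. run_segment, leaky_segment and clean_segment are hypothetical stand-ins, not fme code, and the 100 kB growth threshold is an arbitrary assumption. tracemalloc only sees Python heap allocations on the host. For GPU memory, swapping in torch.cuda.memory_allocated() would be the analogous reading, but that adaptation is an untested assumption.

```python
import tracemalloc
from typing import Callable, List


def memory_after_each_segment(run_segment: Callable[[int], None], n_segments: int) -> List[int]:
    """Run n_segments segments and record traced Python heap usage (bytes) after each one."""
    tracemalloc.start()
    readings: List[int] = []
    try:
        for i in range(n_segments):
            run_segment(i)
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


def grows_every_segment(readings: List[int], min_growth_bytes: int = 100_000) -> bool:
    """True if memory rises by more than min_growth_bytes from every segment to the next."""
    return len(readings) > 1 and all(b - a > min_growth_bytes for a, b in zip(readings, readings[1:]))


# Hypothetical stand-ins for one inference segment (not fme code).
_kept = []


def leaky_segment(i: int) -> None:
    _kept.append(bytearray(1_000_000))  # keeps a reference alive after the segment ends


def clean_segment(i: int) -> None:
    _ = bytearray(1_000_000)  # freed when the function returns


print(grows_every_segment(memory_after_each_segment(leaky_segment, 5)))  # True
print(grows_every_segment(memory_after_each_segment(clean_segment, 5)))  # False
```

The threshold keeps small bookkeeping allocations, such as the readings list itself, from being mistaken for a per-segment leak.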
