shape module: inference fails with a ckpt produced by training #161

@linweijiang

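After training the shape module, running inference on the resulting ckpt file with the code below raises an exception. The exception occurs both with a ckpt trained on the minimal training dataset provided in the project and with a ckpt trained on my own dataset.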

(1) Training configuration:
Training was based on the hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml config, with no parameter changes.

(2) Inference code, modified from minimal_demo_with_ckpt.py:

from PIL import Image
from hy3dshape.rembg import BackgroundRemover
from hy3dshape.pipelines import Hunyuan3DDiTFlowMatchingPipeline

model_path = '/data/weijiang/model/Hunyuan3D-2.1'
pipeline_shapegen = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(model_path)

import torch
import yaml
from hy3dshape.utils import instantiate_from_config

ckpt_cfg_path = 'output_folder5/dit/overfitting_depth_16_token_4096_lr1e4/hunyuandit-mini-overfitting-flowmatching-dinol518-bf16-lr1e4-4096.yaml'
ckpt_path = 'output_folder5/dit/overfitting_depth_16_token_4096_lr1e4/ckpt/ckpt-step=00275000.ckpt'

# Build the denoiser from the training config.
with open(ckpt_cfg_path, 'r') as f:
    config = yaml.safe_load(f)
model = instantiate_from_config(config['model']['params']['denoiser_cfg'])

# Load the checkpoint and keep only the denoiser weights.
ckpt = torch.load(ckpt_path)
sd = ckpt['state_dict']
sd = {k: v for k, v in sd.items() if not k.startswith('cond_stage')}
sd = {k: v for k, v in sd.items() if not k.startswith('first_stage')}
sd = {k.replace('model.', ''): v for k, v in sd.items()}
msg = model.load_state_dict(sd)

print(msg)
model = model.cuda().half()

pipeline_shapegen.model = model

image = 'demos/demo.png'

# image = Image.open(image_path).convert("RGBA")
# if image.mode == 'RGB':
#     rembg = BackgroundRemover()
#     image = rembg(image)

# mesh = pipeline_shapegen(image=image, guidance_scale=1.0)[0]
mesh = pipeline_shapegen(image=image)[0]
mesh.export('demo-ckpt.glb')
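
One thing that may be worth ruling out in the key handling above: `str.replace('model.', '')` removes every occurrence of `model.`, not only a leading prefix, so a key that contains `model.` further along would be renamed as well. Whether any key in this checkpoint is affected is an assumption I have not verified. The sketch below uses made-up key names and a hypothetical helper `strip_prefix` only to show the difference.

```python
def strip_prefix(state_dict, prefix):
    """Return a copy whose keys have only a leading `prefix` removed."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Made-up keys for illustration; real checkpoint keys will differ.
sd = {
    "model.blocks.0.attn.weight": 1,
    "model.final_model.proj.weight": 2,
}

# Variant used in the script: removes every "model." substring.
replaced = {k.replace("model.", ""): v for k, v in sd.items()}

# Variant that only touches the leading prefix.
stripped = strip_prefix(sd, "model.")

print(sorted(replaced))
print(sorted(stripped))
```

If the two variants produce the same key set for the real checkpoint, the key handling is probably not the cause and the problem is more likely elsewhere.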

(3) Exception raised during inference:

Traceback (most recent call last):
  File "/data/weijiang/project/Hunyuan3D-2.1/hy3dshape/hy3dshape/models/autoencoders/surface_extractors.py", line 88, in __call__
    vertices, faces = self.run(grid_logits[i], **kwargs)
  File "/data/weijiang/project/Hunyuan3D-2.1/hy3dshape/hy3dshape/models/autoencoders/surface_extractors.py", line 119, in run
    vertices, faces, normals, _ = measure.marching_cubes(grid_logit.cpu().numpy(),
  File "/data/weijiang/miniforge3/envs/hy3d-2_1/lib/python3.10/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 139, in marching_cubes
    return _marching_cubes_lewiner(
  File "/data/weijiang/miniforge3/envs/hy3d-2_1/lib/python3.10/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 206, in _marching_cubes_lewiner
    raise RuntimeError('No surface found at the given iso value.')
RuntimeError: No surface found at the given iso value.
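
To narrow down why no surface is found, it may help to look at the value range of the grid that reaches `measure.marching_cubes`. An isosurface can only exist where the grid has finite values on both sides of the iso level. The sketch below is a diagnostic under stated assumptions, not part of hy3dshape: `describe_grid` is a hypothetical helper, the grids are synthetic, and the level of `0.0` is a placeholder for whatever level the extractor actually passes.

```python
import numpy as np


def describe_grid(grid, level=0.0):
    """Summarize a scalar grid relative to an iso level (hypothetical helper)."""
    grid = np.asarray(grid, dtype=np.float32)
    finite = np.isfinite(grid)
    values = grid[finite]
    stats = {
        "shape": grid.shape,
        "non_finite": int(grid.size - finite.sum()),
        "min": float(values.min()) if values.size else None,
        "max": float(values.max()) if values.size else None,
    }
    # Heuristic: look for finite values strictly on both sides of `level`.
    stats["crosses_level"] = bool(
        values.size and values.min() < level < values.max()
    )
    return stats


# Synthetic grids for illustration only.
all_negative = np.full((4, 4, 4), -1.0)
with_nan = np.full((4, 4, 4), np.nan)
sphere_like = np.fromfunction(
    lambda x, y, z: 2.0 - np.sqrt((x - 2) ** 2 + (y - 2) ** 2 + (z - 2) ** 2),
    (5, 5, 5),
)

for name, g in [
    ("all_negative", all_negative),
    ("with_nan", with_nan),
    ("sphere_like", sphere_like),
]:
    print(name, describe_grid(g))
```

In the inference script this could be called on `grid_logits[i].float().cpu().numpy()` just before the extractor runs. If the grid turns out to be entirely NaN or entirely on one side of the level, the cause is probably upstream of marching cubes (for example in the loaded weights or the dtype used), not in the surface extraction itself.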

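(4) The training log shows that the same exception also occurs during the intermediate validation runs in training. The log is as follows: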

Volume Decoding:  98%|█████████▊| 6965/7134 [00:11<00:00, 597.83it/s]
Volume Decoding:  98%|█████████▊| 7025/7134 [00:11<00:00, 597.79it/s]
Volume Decoding:  99%|█████████▉| 7085/7134 [00:11<00:00, 597.86it/s]
Volume Decoding: 100%|██████████| 7134/7134 [00:11<00:00, 598.25it/s]
Traceback (most recent call last):
  File "/data/weijiang/project/Hunyuan3D-2.1/hy3dshape/hy3dshape/models/autoencoders/surface_extractors.py", line 88, in __call__
    vertices, faces = self.run(grid_logits[i], **kwargs)
  File "/data/weijiang/project/Hunyuan3D-2.1/hy3dshape/hy3dshape/models/autoencoders/surface_extractors.py", line 119, in run
    grid_data = grid_logit.cpu().numpy()
  File "/data/weijiang/miniforge3/envs/hy3d-2_1/lib/python3.10/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 139, in marching_cubes
    return _marching_cubes_lewiner(
  File "/data/weijiang/miniforge3/envs/hy3d-2_1/lib/python3.10/site-packages/skimage/measure/_marching_cubes_lewiner.py", line 206, in _marching_cubes_lewiner
    raise RuntimeError('No surface found at the given iso value.')
RuntimeError: No surface found at the given iso value.

Validation DataLoader 0:   6%|▋         | 1/16 [00:27<06:51, 27.47s/it]
Epoch 0: : 257025it [101:52:16,  1.43s/it, loss=1.54, v_num=0, train/simple=1.460, train/total_loss=1.460, train/lr_abs=0.0001, val/simple=1.510, val/total_loss=1.510, val/lr_abs=0.0001]
Validation DataLoader 0:  12%|█▎        | 2/16 [00:28<03:18, 14.17s/it]
Epoch 0: : 257026it [101:52:17,  1.43s/it, loss=1.54, v_num=0, train/simple=1.460, train/total_loss=1.460, train/lr_abs=0.0001, val/simple=1.510, val/total_loss=1.510, val/lr_abs=0.0001]
Validation DataLoader 0:  19%|█▉        | 3/16 [00:29<02:06,  9.74s/it]
Epoch 0: : 257027it [101:52:18,  1.43s/it, loss=1.54, v_num=0, train/simple=1.460, trr
