@Dmovic Dmovic commented May 24, 2024

related issues:

closed #10523
closed #10522
closed #10521

closed #10519
closed #10518

closed #10512
closed #10511
closed #10510
closed #10509
closed #10508
closed #10507
closed #10506
closed #10505
closed #10504
closed #10503
closed #10502
closed #10501
closed #10500

@Dmovic Dmovic requested a review from mosout May 24, 2024 02:43
CLAassistant commented May 24, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@Dmovic Dmovic requested a review from hjchen2 as a code owner May 28, 2024 08:15
@Dmovic Dmovic requested review from hjchen2 and removed request for hjchen2 May 28, 2024 08:16
@Oneflow-Inc Oneflow-Inc deleted a comment from github-actions bot May 29, 2024
@Dmovic Dmovic force-pushed the support_input_check branch from 414a637 to b0881a8 Compare May 29, 2024 08:58
@github-actions

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.3ms (= 4328.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 5731.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.3ms / 43.3ms)

OneFlow resnet50 time: 26.5ms (= 2651.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.6ms (= 3755.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.42 (= 37.6ms / 26.5ms)

OneFlow resnet50 time: 18.7ms (= 3735.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.7ms (= 6949.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.86 (= 34.7ms / 18.7ms)

OneFlow resnet50 time: 17.4ms (= 3474.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.3ms (= 6050.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.74 (= 30.3ms / 17.4ms)

OneFlow resnet50 time: 16.7ms (= 3334.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 32.2ms (= 6437.5ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.93 (= 32.2ms / 16.7ms)

OneFlow swin dataloader time: 0.200s (= 39.979s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.612s / 200, num_workers=1)
Relative speed: 0.641 (= 0.128s / 0.200s)

OneFlow swin dataloader time: 0.056s (= 11.218s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.494s / 200, num_workers=4)
Relative speed: 0.579 (= 0.032s / 0.056s)

OneFlow swin dataloader time: 0.031s (= 6.238s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.338s / 200, num_workers=8)
Relative speed: 0.535 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 49.3ms (= 4928.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 63.9ms (= 6389.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 63.9ms / 49.3ms)

OneFlow resnet50 time: 37.3ms (= 3734.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.2ms (= 4622.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 46.2ms / 37.3ms)

OneFlow resnet50 time: 27.7ms (= 5539.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.7ms (= 7941.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 39.7ms / 27.7ms)

OneFlow resnet50 time: 25.2ms (= 5030.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.5ms (= 7699.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 38.5ms / 25.2ms)

OneFlow resnet50 time: 25.2ms (= 5034.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.2ms (= 7239.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 36.2ms / 25.2ms)
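
For reference, the figures in the report above appear to follow simple arithmetic: the per-iteration time is the total divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time, so a value above 1 means OneFlow was faster. A minimal Python sketch under that reading; the function names are illustrative and are not taken from the CI script:

```python
def per_iteration(total: float, iterations: int) -> float:
    """Average time of one iteration, in the same unit as `total`."""
    return total / iterations


def relative_speed(oneflow_total: float, pytorch_total: float, iterations: int) -> float:
    """PyTorch time over OneFlow time; a value above 1 means OneFlow is faster."""
    oneflow = per_iteration(oneflow_total, iterations)
    pytorch = per_iteration(pytorch_total, iterations)
    return pytorch / oneflow


# resnet50, input_shape=[16, 3, 224, 224]: totals in ms over 100 iterations
resnet = relative_speed(4328.7, 5731.2, 100)
# swin dataloader, num_workers=4: totals in seconds over 200 iterations
loader = relative_speed(11.218, 6.494, 200)

print(f"{resnet:.2f} {loader:.3f}")  # prints "1.32 0.579"
```

The dataloader ratio matches the reported 0.579 only when it is computed from the unrounded totals (0.032 / 0.056 would give about 0.571), which suggests the report divides before rounding.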

const std::shared_ptr<Tensor>& x_cast);
Maybe<void> CheckInplaceShapeCanExpandTo(const Shape& shape, const Shape& expand_shape);

inline Maybe<void> CheckSizeNonNegative(const Shape& shape) {
Rename this to CheckShapeNonNegative.
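
The body of the C++ helper is not shown in this excerpt. Going by its name alone, it presumably rejects shapes that contain a negative dimension. A rough Python sketch of that idea, using the suggested name in snake case; the function, its signature, and the error message are assumptions for illustration, not OneFlow's implementation:

```python
from typing import Sequence


def check_shape_non_negative(shape: Sequence[int]) -> None:
    """Raise ValueError if any dimension of `shape` is negative (assumed behavior)."""
    for axis, size in enumerate(shape):
        if size < 0:
            raise ValueError(
                f"negative dimension {size} at axis {axis} in shape {list(shape)}"
            )


# A zero-sized dimension is still a valid shape; only negatives are rejected.
check_shape_non_negative([2, 0, 3])
```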

        self.padding = padding

    def forward(self, x, indices, output_size=None):
        kernel_size = _single(self.kernel_size)
Extract the repeated logic into a function.
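
One way to act on this comment is to move the argument normalization and validation that each `forward` repeats into a single helper. The sketch below is plain Python with no framework dependency. `normalize_unpool_args` and `_ntuple` are hypothetical names, and the specific checks are assumptions, since the duplicated code is not visible in this excerpt:

```python
from itertools import repeat
from typing import Iterable, Optional, Tuple, Union

IntOrSeq = Union[int, Iterable[int]]


def _ntuple(value: IntOrSeq, n: int) -> Tuple[int, ...]:
    """Expand a scalar to an n-tuple, or check the length of a sequence."""
    if isinstance(value, int):
        return tuple(repeat(value, n))
    result = tuple(value)
    if len(result) != n:
        raise ValueError(f"expected {n} values, got {len(result)}")
    return result


def normalize_unpool_args(
    kernel_size: IntOrSeq,
    stride: Optional[IntOrSeq],
    padding: IntOrSeq,
    n: int,
) -> Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...]]:
    """Hypothetical shared helper: normalize and validate unpooling arguments."""
    kernel = _ntuple(kernel_size, n)
    # Assumption: stride falls back to the kernel size when it is not given.
    strides = kernel if stride is None else _ntuple(stride, n)
    pads = _ntuple(padding, n)
    if any(k <= 0 for k in kernel):
        raise ValueError(f"kernel_size must be positive, got {kernel}")
    if any(s <= 0 for s in strides):
        raise ValueError(f"stride must be positive, got {strides}")
    if any(p < 0 for p in pads):
        raise ValueError(f"padding must be non-negative, got {pads}")
    return kernel, strides, pads


# The 1-D, 2-D, and 3-D variants would differ only in the value of n.
kernel, strides, pads = normalize_unpool_args(2, None, 0, n=1)
```

With a helper like this, each module's `forward` reduces to one call followed by the actual unpooling, and any new input check only has to be added in one place.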

@github-actions

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.3ms (= 4327.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.2ms (= 5722.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.2ms / 43.3ms)

OneFlow resnet50 time: 26.1ms (= 2612.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 38.0ms (= 3795.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.45 (= 38.0ms / 26.1ms)

OneFlow resnet50 time: 18.3ms (= 3658.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.1ms (= 7019.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.92 (= 35.1ms / 18.3ms)

OneFlow resnet50 time: 17.2ms (= 3447.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 30.5ms (= 6092.3ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.77 (= 30.5ms / 17.2ms)

OneFlow resnet50 time: 17.0ms (= 3396.4ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.6ms (= 5910.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.74 (= 29.6ms / 17.0ms)

OneFlow swin dataloader time: 0.202s (= 40.388s / 200, num_workers=1)
PyTorch swin dataloader time: 0.130s (= 25.952s / 200, num_workers=1)
Relative speed: 0.643 (= 0.130s / 0.202s)

OneFlow swin dataloader time: 0.054s (= 10.861s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.453s / 200, num_workers=4)
Relative speed: 0.594 (= 0.032s / 0.054s)

OneFlow swin dataloader time: 0.030s (= 6.054s / 200, num_workers=8)
PyTorch swin dataloader time: 0.016s (= 3.269s / 200, num_workers=8)
Relative speed: 0.540 (= 0.016s / 0.030s)

❌ OneFlow resnet50 time: 49.3ms (= 4927.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.5ms (= 6452.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 64.5ms / 49.3ms)

OneFlow resnet50 time: 36.5ms (= 3648.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.8ms (= 4675.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 46.8ms / 36.5ms)

OneFlow resnet50 time: 27.8ms (= 5554.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 42.0ms (= 8407.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 42.0ms / 27.8ms)

OneFlow resnet50 time: 25.2ms (= 5042.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.7ms (= 7734.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 38.7ms / 25.2ms)

OneFlow resnet50 time: 25.3ms (= 5051.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.3ms (= 7255.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 36.3ms / 25.3ms)

github-actions bot commented Jun 2, 2024

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 44.0ms (= 4399.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 5729.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.30 (= 57.3ms / 44.0ms)

OneFlow resnet50 time: 26.6ms (= 2659.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.6ms (= 3758.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.41 (= 37.6ms / 26.6ms)

OneFlow resnet50 time: 18.5ms (= 3703.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 34.8ms (= 6956.0ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.88 (= 34.8ms / 18.5ms)

OneFlow resnet50 time: 17.2ms (= 3440.9ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.1ms (= 6414.7ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.86 (= 32.1ms / 17.2ms)

OneFlow resnet50 time: 17.0ms (= 3402.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.8ms (= 5950.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.75 (= 29.8ms / 17.0ms)

OneFlow swin dataloader time: 0.200s (= 39.978s / 200, num_workers=1)
PyTorch swin dataloader time: 0.128s (= 25.624s / 200, num_workers=1)
Relative speed: 0.641 (= 0.128s / 0.200s)

OneFlow swin dataloader time: 0.055s (= 11.015s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.459s / 200, num_workers=4)
Relative speed: 0.586 (= 0.032s / 0.055s)

OneFlow swin dataloader time: 0.031s (= 6.118s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.378s / 200, num_workers=8)
Relative speed: 0.552 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 49.5ms (= 4952.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.4ms (= 6538.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 65.4ms / 49.5ms)

OneFlow resnet50 time: 37.0ms (= 3703.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 47.3ms (= 4729.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 47.3ms / 37.0ms)

OneFlow resnet50 time: 28.0ms (= 5598.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.7ms (= 7940.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.42 (= 39.7ms / 28.0ms)

OneFlow resnet50 time: 25.3ms (= 5055.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.7ms (= 7734.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 38.7ms / 25.3ms)

OneFlow resnet50 time: 24.9ms (= 4989.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.0ms (= 7198.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.44 (= 36.0ms / 24.9ms)

@Dmovic Dmovic removed the request for review from oneflow-ci-bot June 3, 2024 04:19
@Dmovic Dmovic requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 10, 2024 02:08
@github-actions

Speed stats:
GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.6ms (= 4355.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.3ms (= 5732.2ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.3ms / 43.6ms)

OneFlow resnet50 time: 26.1ms (= 2609.7ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.4ms (= 3741.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.4ms / 26.1ms)

OneFlow resnet50 time: 18.5ms (= 3709.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.7ms (= 7149.9ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.93 (= 35.7ms / 18.5ms)

OneFlow resnet50 time: 16.8ms (= 3356.2ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.6ms (= 6514.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.94 (= 32.6ms / 16.8ms)

OneFlow resnet50 time: 17.2ms (= 3441.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 29.0ms (= 5792.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.68 (= 29.0ms / 17.2ms)

OneFlow swin dataloader time: 0.201s (= 40.291s / 200, num_workers=1)
PyTorch swin dataloader time: 0.127s (= 25.486s / 200, num_workers=1)
Relative speed: 0.633 (= 0.127s / 0.201s)

OneFlow swin dataloader time: 0.058s (= 11.585s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.479s / 200, num_workers=4)
Relative speed: 0.559 (= 0.032s / 0.058s)

OneFlow swin dataloader time: 0.030s (= 6.085s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.348s / 200, num_workers=8)
Relative speed: 0.550 (= 0.017s / 0.030s)

❌ OneFlow resnet50 time: 49.2ms (= 4920.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.7ms (= 6465.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 64.7ms / 49.2ms)

OneFlow resnet50 time: 37.3ms (= 3729.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.1ms (= 4614.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 46.1ms / 37.3ms)

OneFlow resnet50 time: 27.7ms (= 5530.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.8ms (= 8156.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 40.8ms / 27.7ms)

OneFlow resnet50 time: 25.4ms (= 5071.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.2ms (= 7845.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 39.2ms / 25.4ms)

OneFlow resnet50 time: 24.9ms (= 4972.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.6ms (= 7115.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 35.6ms / 24.9ms)
