Embedded interpreter is 10% slower #270

Open
@vitrun

Description

Playing around with multipy, I found that the same program runs noticeably slower in the embedded interpreter. The original code is echo.py:

import time

def echo(_):
    # time a 1,000,000-iteration accumulation and print the cost in milliseconds
    start = time.time_ns()
    total = 0
    for i in range(1_000_000):
        total += i
    cost = (time.time_ns() - start) / 1_000_000
    print(cost)
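
For reference, a minimal driver for the baseline runs (a sketch; the report does not show how echo was invoked under system python):

# hypothetical driver used to collect the system-python timings
from echo import echo

for _ in range(9):
    echo(None)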

The cost is around 40 ms when executed by system python. It stays the same when bytecode generation (.pyc files under __pycache__) is disabled.

40.741847
40.761805
40.764069
40.746884
40.773777
40.747697
40.765722
40.761535
40.782256
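
Regarding the bytecode note above, one way to disable generation (an assumption; the report does not say how it was done):

import sys

# stop CPython from writing .pyc files to __pycache__;
# equivalent to running `python -B` or setting PYTHONDONTWRITEBYTECODE=1
sys.dont_write_bytecode = True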

It is packaged with torch.package:

from torch.package import PackageExporter
from echo import echo

with PackageExporter("echo.zip") as ex:
    ex.intern("echo")  # bundle the echo module's source into the archive
    ex.save_pickle("model", "model.pkl", echo)  # pickle the function itself
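
As a cross-check, the archive can be loaded back in system python with torch.package's PackageImporter (a sketch, not part of the original runs); timing echo there would separate torch.package overhead from embedded-interpreter overhead:

from torch.package import PackageImporter

# unpickle the function from the archive, still on the system interpreter
importer = PackageImporter("echo.zip")
echo = importer.load_pickle("model", "model.pkl")
echo(None)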

The package is then executed through multipy as follows:

#include <iostream>
#include <string>

#include <multipy/runtime/deploy.h>

int main(int argc, const char *argv[])
{
    if (argc != 3) {
        std::cerr << "usage: example-app <path-to-package> <iterations>\n";
        return -1;
    }

    // pool of 4 embedded CPython interpreters
    torch::deploy::InterpreterManager manager(4);
    torch::deploy::ReplicatedObj model;
    try {
        torch::deploy::Package package = manager.loadPackage(argv[1]);
        model = package.loadPickle("model", "model.pkl");
        int n = std::stoi(argv[2]);
        for (int i = 0; i < n; i++) {
            // take one interpreter from the pool and call echo on it
            auto I = manager.acquireOne();
            auto echo = I.fromMovable(model);
            echo({1});
        }
        return 0;
    } catch (const c10::Error &e) {
        std::cerr << "error loading the model\n";
        std::cerr << e.msg();
        return -1;
    }
}

which prints out:

44.830043
45.339226
44.659302
44.789638
44.924977
44.725203
44.823946
44.660343
44.622977

Why is it about 10% slower?
