Playing around with multipy, I found that the same program executes noticeably slower in the embedded interpreter. The original code is echo.py:
import time

def echo(_):
    start = time.time_ns()
    sum = 0
    for i in range(1000_000):
        sum += i
    cost = (time.time_ns() - start) / 1000_000  # elapsed time in milliseconds
    print(cost)
The cost is around 40 ms when executed by the system Python. It remains the same when bytecode generation (.pyc files and __pycache__ directories) is disabled:
40.741847
40.761805
40.764069
40.746884
40.773777
40.747697
40.765722
40.761535
40.782256
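For reference, these numbers came from repeated calls to echo; a minimal driver along the following lines would reproduce them (the exact invocation is an assumption, since the issue does not show it):

# Assumed system-python driver; echo prints its own per-call cost in ms.
from echo import echo

for _ in range(9):
    echo(None)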
The function is packaged as follows:
from torch.package import PackageExporter
from echo import echo

with PackageExporter("echo.zip") as ex:
    ex.intern("echo")
    ex.save_pickle("model", "model.pkl", echo)
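As a sanity check (not shown in the issue), the package can be loaded back under the system interpreter with torch.package.PackageImporter, to confirm that packaging itself does not add the overhead:

# Hypothetical sanity check: run the packaged function under system Python.
from torch.package import PackageImporter

imp = PackageImporter("echo.zip")
echo = imp.load_pickle("model", "model.pkl")
echo(None)  # should print ~40 ms if packaging is not the cause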
and executed by multipy as follows:
#include <multipy/runtime/deploy.h>  // torch::deploy API (include path may vary by install)
#include <iostream>
#include <string>

int main(int argc, const char *argv[]) {
    if (argc != 3) {
        std::cerr << "usage: example-app <path-to-exported-package> <iteration-count>\n";
        return -1;
    }
    torch::deploy::InterpreterManager manager(4);  // pool of 4 embedded interpreters
    torch::deploy::ReplicatedObj model;
    try {
        torch::deploy::Package package = manager.loadPackage(argv[1]);
        model = package.loadPickle("model", "model.pkl");
        int n = std::stoi(argv[2]);  // number of calls to make
        for (int i = 0; i < n; i++) {
            auto I = manager.acquireOne();     // grab a free interpreter from the pool
            auto echo = I.fromMovable(model);  // materialize the pickled function in it
            echo({1});                         // each call prints its own cost in ms
        }
        return 0;
    } catch (const c10::Error &e) {
        std::cerr << "error loading the model\n";
        std::cerr << e.msg();
        return -1;
    }
}
which prints out:
44.830043
45.339226
44.659302
44.789638
44.924977
44.725203
44.823946
44.660343
44.622977
Why is the embedded interpreter about 10% slower?
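One way to narrow this down is to compare the two interpreters themselves, since multipy embeds its own CPython build whose version and compile flags may differ from the system one. A small diagnostic (my own sketch, not from the issue) that could be added to echo.py:

# Hypothetical diagnostic: print interpreter details from both environments.
# sysconfig may return None for CFLAGS inside the embedded interpreter.
import sys
import sysconfig

def report(_):
    print(sys.version)                         # interpreter version and build info
    print(sysconfig.get_config_var("CFLAGS"))  # compile flags recorded at build time

If the embedded build reports different optimization flags than the system Python, that alone could plausibly account for a ~10% difference in a tight bytecode loop.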