Parsing can be 20x slower w/ pygccxml vs. in-memory solutions? (e.g. clang.cindex) #129

Open
@EricCousineau-TRI

This might be closable as "Not a Problem", but I figured I'd post it here anyway.

WARNING: These benchmarks are still relatively shallow. More work would be necessary to draw meaningful conclusions for more general usage / scalability.

New Setup: pygccxml vs. clang.cindex

Tinkering more: if I point this at a more complex project, like CastXML itself, and want to see CastXML's own symbols, it takes ~70s to load a parsed file (from scratch) with pygccxml, vs. ~3.5s with clang.cindex (roughly 20x).

Example:
https://github.com/EricCousineau-TRI/repro/blob/3c2fbae3cb0afd623a2d7909e3f77f14fd67da52/python/bindings/pygccxml_sandbox/test_castxml_scan.ipynb
Uses:

  • Ubuntu Bionic apt, libclang-9-dev (9-2~ubuntu18.04.2)
  • CastXML@3e9bc94, from superbuild download

Speculations for newer setup:

  • Obviously, I could filter the symbols from the XML side. But CastXML's filtering mechanisms seem simple (and it seems like it should kinda stay that way?)
  • I am not querying as much information with clang.cindex at present.
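
To make the "filter from the XML side" idea concrete, here is a minimal sketch using only the standard library. The sample XML is a hand-written, heavily simplified stand-in for CastXML's output (real output has many more element types and attributes), and `names_under_namespace` is a hypothetical helper, not part of pygccxml or CastXML. It relies on the assumption that each declaration points at its enclosing scope through a `context` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for CastXML output (assumption: each
# declaration references its enclosing scope via a `context` attribute).
SAMPLE_XML = """\
<CastXML>
  <Namespace id="_1" name="::"/>
  <Namespace id="_2" name="std" context="_1"/>
  <Namespace id="_3" name="ns" context="_1"/>
  <Class id="_4" name="vector&lt;int&gt;" context="_2"/>
  <Class id="_5" name="ExampleClass&lt;int&gt;" context="_3"/>
  <Method id="_6" name="make_std_vector" context="_5"/>
</CastXML>
"""


def names_under_namespace(xml_text, namespace):
    """Return names of non-namespace declarations nested under `namespace`."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root if el.get("id")}

    def is_inside(el):
        # Follow the chain of enclosing scopes until it runs out.
        seen = set()
        while el is not None and el.get("id") not in seen:
            seen.add(el.get("id"))
            if el.tag == "Namespace" and el.get("name") == namespace:
                return True
            el = by_id.get(el.get("context"))
        return False

    return [
        el.get("name")
        for el in root
        if el.tag != "Namespace" and is_inside(el)
    ]
```

Note that this only trims what gets handed downstream. The full XML is still generated and read, so on its own it would not tell us how much of the ~70s is spent before filtering.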

Old Setup: pygccxml vs. cppyy

With some simple code like this:

#include <vector>

#include <Eigen/Dense>

namespace ns {

template <typename T, typename U = int>
class ExampleClass {
public:
    std::vector<T> make_std_vector() const;
    Eigen::Matrix<U, 3, 3> make_matrix3();
};

// Analyze concrete instantiations of the given class.
extern template class ExampleClass<int>;
extern template class ExampleClass<float, float>;

}  // namespace ns

It takes about 0.60s on my machine for cppyy to parse this and allow me to print out a namespace object, whereas pygccxml (with castxml == 0.3.4) takes about 4.3s. (This is across 10 trials, only timing the parsing + retrieval routine)
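
For reference, the kind of timing loop meant here can be sketched as follows. This is not the actual benchmark; `time_trials`, `summarize`, and `slowdown` are hypothetical helper names, and `fn` stands in for whichever "parse + retrieve" routine is being measured.

```python
import statistics
import time


def time_trials(fn, trials=10):
    """Call fn() `trials` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to mean / min / max."""
    return {
        "mean": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


def slowdown(slow_seconds, fast_seconds):
    """How many times slower the first measurement is than the second."""
    return slow_seconds / fast_seconds
```

Plugging in the numbers quoted in this issue, `slowdown(4.3, 0.60)` comes out to roughly 7x for the small example, and `slowdown(70, 3.5)` to 20x for the CastXML case. Reporting min and max alongside the mean should help show whether the first trial is an outlier, e.g. because of warm-up or caching.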

Will post benchmark shortly.

Speculations:

  • I'm guessing the overhead comes from disk I/O (e.g. XML serialization / deserialization, mapping the XML to pygccxml declarations, etc.)
  • I should tell castxml to ignore std and Eigen to see if that saves any time.
  • I'm not sure if cppyy does any aggressive "crawling" through the namespace; perhaps it only does reflection on-demand?
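
One way to check the first speculation would be to time each stage separately rather than the whole call. Below is a small sketch of an accumulator for that; `StageTimer` and the stage labels are hypothetical, and where exactly the stage boundaries fall inside pygccxml is an assumption that would need to be confirmed against its code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to the running total so repeated stages accumulate.
            self.totals[label] += time.perf_counter() - start

    def shares(self):
        """Fraction of the total measured time spent in each stage."""
        total = sum(self.totals.values())
        return {
            label: (seconds / total if total else 0.0)
            for label, seconds in self.totals.items()
        }
```

Usage would look like `with timer.stage("run castxml"): ...`, then `with timer.stage("load XML"): ...`, and so on, followed by `timer.shares()` to see which stage dominates.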
