|
| 1 | +=========================== |
| 2 | +Telemetry framework in LLVM |
| 3 | +=========================== |
| 4 | + |
| 5 | +.. contents:: |
| 6 | + :local: |
| 7 | + |
| 8 | +.. toctree:: |
| 9 | + :hidden: |
| 10 | + |
| 11 | +Objective |
| 12 | +========= |
| 13 | + |
| 14 | +The Telemetry framework provides a common API in LLVM for collecting usage and
| 15 | +performance metrics.
| 16 | +Its interface is declared in ``llvm/Telemetry/Telemetry.h``.
| 17 | + |
| 18 | +Characteristics |
| 19 | +--------------- |
| 20 | +* Configurable and extensible by: |
| 21 | + |
| 22 | + * Tools: any tool that wants to use Telemetry can extend and customize it. |
| 23 | + * Vendors: Toolchain vendors can also provide a custom implementation of the
| 24 | + library, which could either override or extend the given tool's upstream
| 25 | + implementation, to best fit their organization's usage and privacy models.
| 26 | + * End users: users of such a tool can also configure Telemetry (as allowed by
| 27 | + their vendor).
| 28 | + |
| 29 | +Important notes |
| 30 | +--------------- |
| 31 | + |
| 32 | +* There is no concrete implementation of a Telemetry library in upstream LLVM. |
| 33 | + We only provide the abstract API here. Any tool that wants telemetry will |
| 34 | + implement one. |
| 35 | + |
| 36 | + The rationale for this is that all the tools in LLVM are very different in |
| 37 | + what they care about (what/where/when to instrument data). Hence, it might not |
| 38 | + be practical to have a single implementation. |
| 39 | + However, in the future, if we see enough common patterns, we can extract them
| 40 | + into a shared place. This is TBD - contributions are welcome.
| 41 | + |
| 42 | +* No implementation of Telemetry in upstream LLVM shall store any of the |
| 43 | + collected data due to privacy and security reasons: |
| 44 | + |
| 45 | + * Different organizations have different privacy models: |
| 46 | + |
| 47 | + * Which data is sensitive, which is not? |
| 48 | + * Is it acceptable for instrumented data to be stored anywhere
| 49 | + (for example, in a local file)?
| 50 | + |
| 51 | + * Data ownership and data collection consents are hard to accommodate from |
| 52 | + LLVM developers' point of view: |
| 53 | + |
| 54 | + * E.g., data collected by Telemetry is not necessarily owned by the user |
| 55 | + of an LLVM tool with Telemetry enabled, hence the user's consent to data |
| 56 | + collection is not meaningful. On the other hand, LLVM developers have no |
| 57 | + reasonable ways to request consent from the "real" owners. |
| 58 | + |
| 59 | + |
| 60 | +High-level design |
| 61 | +================= |
| 62 | + |
| 63 | +Key components |
| 64 | +-------------- |
| 65 | + |
| 66 | +The framework consists of four important classes: |
| 67 | + |
| 68 | +* ``llvm::telemetry::Manager``: The class responsible for collecting and |
| 69 | + transmitting telemetry data. This is the main point of interaction between the |
| 70 | + framework and any tool that wants to enable telemetry. |
| 71 | +* ``llvm::telemetry::TelemetryInfo``: Data courier |
| 72 | +* ``llvm::telemetry::Destination``: Data sink to which the Telemetry framework |
| 73 | + sends data. |
| 74 | + Its implementation is transparent to the framework. |
| 75 | + It is up to the vendor to decide which pieces of data to forward and where |
| 76 | + to forward them to for their final storage. |
| 77 | +* ``llvm::telemetry::Config``: Configurations for the ``Manager``. |
| 78 | + |
| 79 | +.. image:: llvm_telemetry_design.png |
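
The way these components cooperate can be illustrated with a small, self-contained sketch. It does not use the LLVM headers: ``Info``, ``Sink``, ``Dispatcher`` and ``RecordingSink`` are hypothetical stand-ins for ``TelemetryInfo``, ``Destination``, ``Manager`` and a vendor's destination. It assumes that dispatching runs a pre-dispatch hook first and then forwards the entry to every registered sink.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the llvm::telemetry classes.
struct Info {
  std::string SessionId;
  std::string Msg;
};

// Data sink: receives each entry and decides what to do with it.
class Sink {
public:
  virtual ~Sink() = default;
  // Returns true on success.
  virtual bool receive(const Info &Entry) = 0;
};

// Collects entries and forwards them to every registered sink.
class Dispatcher {
public:
  explicit Dispatcher(std::string Id) : SessionId(std::move(Id)) {}
  virtual ~Dispatcher() = default;

  void addSink(std::unique_ptr<Sink> S) { Sinks.push_back(std::move(S)); }

  // Runs the pre-dispatch hook, then hands the entry to each sink in turn.
  bool dispatch(Info &Entry) {
    if (!preDispatch(Entry))
      return false;
    bool AllOk = true;
    for (auto &S : Sinks)
      AllOk = S->receive(Entry) && AllOk;
    return AllOk;
  }

protected:
  // Hook for attaching data common to all entries (here, the session id).
  virtual bool preDispatch(Info &Entry) {
    Entry.SessionId = SessionId;
    return true;
  }

private:
  std::string SessionId;
  std::vector<std::unique_ptr<Sink>> Sinks;
};

// A sink that only records what it saw, for demonstration purposes.
class RecordingSink : public Sink {
public:
  explicit RecordingSink(std::vector<std::string> &Log) : Log(Log) {}
  bool receive(const Info &Entry) override {
    Log.push_back(Entry.SessionId + ":" + Entry.Msg);
    return true;
  }

private:
  std::vector<std::string> &Log;
};

// Dispatches one entry to two sinks and returns what they recorded.
std::vector<std::string> demoDispatch() {
  std::vector<std::string> Log;
  Dispatcher D("session-1");
  D.addSink(std::make_unique<RecordingSink>(Log));
  D.addSink(std::make_unique<RecordingSink>(Log));
  Info Entry;
  Entry.Msg = "startup";
  D.dispatch(Entry);
  return Log;
}
```

In this sketch the dispatcher never inspects how a sink stores or forwards data, which mirrors the point that a destination's implementation is transparent to the framework.
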
| 80 | + |
| 81 | +How to implement and interact with the API |
| 82 | +------------------------------------------ |
| 83 | + |
| 84 | +To use Telemetry in your tool, you need to provide concrete implementations of the ``Manager`` and ``Destination`` classes.
| 85 | + |
| 86 | +1) Define a custom ``Serializer``, ``Manager``, ``Destination`` and optionally a subclass of ``TelemetryInfo`` |
| 87 | + |
| 88 | +.. code-block:: c++ |
| 89 | + |
| 90 | + class JsonSerializer : public Serializer { |
| 91 | + public: |
| 92 | + json::Object *getOutputObject() { return Out.get(); } |
| 93 | +
|
| 94 | + Error init() override { |
| 95 | + if (Started) |
| 96 | + return createStringError("Serializer already in use"); |
| 97 | + Started = true;
| 98 | + Out = std::make_unique<json::Object>(); |
| 99 | + return Error::success(); |
| 100 | + } |
| 101 | + |
| 102 | + // Serialize the given value. |
| 103 | + void write(StringRef KeyName, bool Value) override { |
| 104 | + writeHelper(KeyName, Value); |
| 105 | + } |
| 106 | + |
| 107 | + void write(StringRef KeyName, int Value) override { |
| 108 | + writeHelper(KeyName, Value); |
| 109 | + } |
| 110 | + |
| 111 | + void write(StringRef KeyName, long Value) override { |
| 112 | + writeHelper(KeyName, Value); |
| 113 | + } |
| 114 | + |
| 115 | + void write(StringRef KeyName, long long Value) override {
| 116 | + writeHelper(KeyName, Value); |
| 117 | + } |
| 118 | + |
| 119 | + void write(StringRef KeyName, unsigned int Value) override { |
| 120 | + writeHelper(KeyName, Value); |
| 121 | + } |
| 122 | + |
| 123 | + void write(StringRef KeyName, unsigned long Value) override { |
| 124 | + writeHelper(KeyName, Value); |
| 125 | + } |
| 126 | + |
| 127 | + void write(StringRef KeyName, unsigned long long Value) override { |
| 128 | + writeHelper(KeyName, Value); |
| 129 | + } |
| 130 | + |
| 131 | + void write(StringRef KeyName, StringRef Value) override { |
| 132 | + writeHelper(KeyName, Value); |
| 133 | + } |
| 134 | + |
| 135 | + void beginObject(StringRef KeyName) override { |
| 136 | + Children.push_back(json::Object()); |
| 137 | + ChildrenNames.push_back(KeyName.str()); |
| 138 | + } |
| 139 | + |
| 140 | + void endObject() override { |
| 141 | + assert(!Children.empty() && !ChildrenNames.empty()); |
| 142 | + json::Value Val = json::Value(std::move(Children.back())); |
| 143 | + std::string Name = ChildrenNames.back(); |
| 144 | + |
| 145 | + Children.pop_back(); |
| 146 | + ChildrenNames.pop_back(); |
| 147 | + writeHelper(Name, std::move(Val)); |
| 148 | + } |
| 149 | + |
| 150 | + Error finalize() override { |
| 151 | + if (!Started) |
| 152 | + return createStringError("Serializer not currently in use"); |
| 153 | + Started = false; |
| 154 | + return Error::success(); |
| 155 | + } |
| 156 | + |
| 157 | + private: |
| 158 | + template <typename T> void writeHelper(StringRef Name, T Value) { |
| 159 | + assert(Started && "serializer not started"); |
| 160 | + if (Children.empty()) |
| 161 | + Out->try_emplace(Name, Value); |
| 162 | + else |
| 163 | + Children.back().try_emplace(Name, Value); |
| 164 | + } |
| 165 | + bool Started = false; |
| 166 | + std::unique_ptr<json::Object> Out; |
| 167 | + std::vector<json::Object> Children; |
| 168 | + std::vector<std::string> ChildrenNames; |
| 169 | + }; |
| 170 | + |
| 171 | + class MyManager : public telemetry::Manager {
| 172 | + public:
| 173 | + static std::unique_ptr<MyManager> createInstance(telemetry::Config *Config) {
| 174 | + // If Telemetry is not enabled, then just return null; |
| 175 | + if (!Config->EnableTelemetry) |
| 176 | + return nullptr; |
| 177 | + return std::make_unique<MyManager>(); |
| 178 | + } |
| 179 | + MyManager() = default; |
| 180 | +
|
| 181 | + Error preDispatch(TelemetryInfo *Entry) override { |
| 182 | + Entry->SessionId = SessionId; |
| 183 | + return Error::success(); |
| 184 | + } |
| 185 | + |
| 186 | + // You can also define additional instrumentation points. |
| 187 | + void logStartup(TelemetryInfo *Entry) { |
| 188 | + // Add some additional data to entry. |
| 189 | + Entry->Msg = "Some message"; |
| 190 | + dispatch(Entry); |
| 191 | + } |
| 192 | + |
| 193 | + void logAdditionalPoint(TelemetryInfo *Entry) { |
| 194 | + // .... code here |
| 195 | + } |
| 196 | + |
| 197 | + private: |
| 198 | + const std::string SessionId; |
| 199 | + }; |
| 200 | + |
| 201 | + class MyDestination : public telemetry::Destination { |
| 202 | + public: |
| 203 | + Error receiveEntry(const TelemetryInfo *Entry) override { |
| 204 | + if (Error Err = Serializer.init()) |
| 205 | + return Err; |
| 206 | + |
| 207 | + Entry->serialize(Serializer); |
| 208 | + if (Error Err = Serializer.finalize()) |
| 209 | + return Err; |
| 210 | + |
| 211 | + json::Object Copied = *Serializer.getOutputObject(); |
| 212 | + // Send the `Copied` object to wherever. |
| 213 | + return Error::success(); |
| 214 | + } |
| 215 | +
|
| 216 | + private: |
| 217 | + JsonSerializer Serializer; |
| 218 | + }; |
| 219 | + |
| 220 | + // This defines a custom TelemetryInfo that has an additional Msg field. |
| 221 | + struct MyTelemetryInfo : public telemetry::TelemetryInfo { |
| 222 | + std::string Msg; |
| 223 | + |
| 224 | + void serialize(Serializer &Serializer) const override {
| 225 | + TelemetryInfo::serialize(Serializer);
| 226 | + Serializer.write("MyMsg", Msg);
| 227 | + }
| 228 | + |
| 229 | + // Note: implement getKind() and classof() to support dyn_cast operations. |
| 230 | + }; |
| 231 | + |
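
The note about ``getKind()`` and ``classof()`` can be illustrated with a self-contained sketch of the pattern. It does not use the LLVM headers: ``BaseInfo``, ``StartupInfo``, the ``Kind`` constants and the ``castOrNull`` helper are hypothetical names that stand in for ``TelemetryInfo``, a tool-specific subclass, and ``dyn_cast``. A real implementation would use the LLVM casting utilities rather than the hand-written helper shown here.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the llvm::telemetry classes.
using KindType = unsigned;

struct BaseInfo {
  static constexpr KindType Kind = 0;
  virtual ~BaseInfo() = default;
  // Each class reports its own kind so callers can identify it at runtime.
  virtual KindType getKind() const { return Kind; }
  // Every entry is a BaseInfo.
  static bool classof(const BaseInfo *) { return true; }
};

struct StartupInfo : BaseInfo {
  static constexpr KindType Kind = 1;
  int ElapsedMs = 0;
  KindType getKind() const override { return Kind; }
  // An entry is a StartupInfo only if it reports the matching kind.
  static bool classof(const BaseInfo *T) { return T->getKind() == Kind; }
};

// Simplified checked downcast: returns nullptr when the kinds do not match.
template <typename To> const To *castOrNull(const BaseInfo *From) {
  return (From && To::classof(From)) ? static_cast<const To *>(From) : nullptr;
}

// Reads a subclass-only field when the entry is a StartupInfo, else -1.
int elapsedOrMinusOne(const BaseInfo *Entry) {
  if (const auto *S = castOrNull<StartupInfo>(Entry))
    return S->ElapsedMs;
  return -1;
}
```

A destination that receives a base-class pointer could use this kind of check to decide whether an entry carries the extra fields it knows how to handle.
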
| 232 | + |
| 233 | +2) Use the library in your tool. |
| 234 | + |
| 235 | +Logging the tool init-process: |
| 236 | + |
| 237 | +.. code-block:: c++ |
| 238 | + |
| 239 | + // In tool's initialization code. |
| 240 | + auto StartTime = std::chrono::steady_clock::now();
| 241 | + telemetry::Config MyConfig = makeConfig(); // Build up the appropriate Config struct here. |
| 242 | + auto Manager = MyManager::createInstance(&MyConfig); |
| 243 | + |
| 244 | + |
| 245 | + // Any other tool's init code can go here. |
| 246 | + // ... |
| 247 | + |
| 248 | + // Finally, take a snapshot of the time now so we know how long it took the |
| 249 | + // init process to finish. |
| 250 | + auto EndTime = std::chrono::steady_clock::now();
| 251 | + MyTelemetryInfo Entry; |
| 252 | + |
| 253 | + Entry.Start = StartTime; |
| 254 | + Entry.End = EndTime; |
| 255 | + Manager->logStartup(&Entry); |
| 256 | + |
| 257 | +Similar code can be used for logging the tool's exit. |
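
For exit logging, one possible approach (not taken from the framework itself) is a scope guard that records the start time when it is constructed and reports the entry when it is destroyed. The sketch below is self-contained and does not use the LLVM headers: ``ExitInfo``, ``ExitLogger`` and the reporting callback are hypothetical names. In a real tool the callback would presumably forward the entry to the tool's manager.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical stand-in for a TelemetryInfo subclass describing tool exit.
struct ExitInfo {
  std::chrono::steady_clock::time_point Start;
  std::chrono::steady_clock::time_point End;
  int ExitCode = 0;
};

// Records the start time on construction and reports the entry on destruction.
class ExitLogger {
public:
  using Callback = std::function<void(const ExitInfo &)>;

  explicit ExitLogger(Callback Report) : Report(std::move(Report)) {
    Entry.Start = std::chrono::steady_clock::now();
  }
  ExitLogger(const ExitLogger &) = delete;
  ExitLogger &operator=(const ExitLogger &) = delete;

  ~ExitLogger() {
    Entry.End = std::chrono::steady_clock::now();
    if (Report)
      Report(Entry);
  }

  void setExitCode(int Code) { Entry.ExitCode = Code; }

private:
  Callback Report;
  ExitInfo Entry;
};

// Runs a placeholder tool body and returns the entry reported at scope exit.
ExitInfo demoExit() {
  ExitInfo Seen;
  {
    ExitLogger Logger([&Seen](const ExitInfo &E) { Seen = E; });
    // ... the tool's real work would happen here ...
    Logger.setExitCode(3);
  } // Logger is destroyed here and reports the entry.
  return Seen;
}
```

The guard reports on every path out of the scope, including early returns, which is the main reason to prefer it over an explicit call at the end of ``main``.
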