|
| 1 | +## GCSFuse Tracing (Experimental) |
| 2 | + |
| 3 | +### Introduction |
| 4 | + |
| 5 | +GCSFuse traces each FUSE file system operation, along with the HTTP/1 and gRPC client calls made inside that operation. Traces can be exported locally to the process output stream or to Google Cloud Trace. For information on permissions, requirements for exporting traces to Google Cloud Trace, and basic tracing concepts, refer to the [Google Cloud Trace Docs](https://docs.cloud.google.com/trace/docs/overview). |
| 6 | + |
| 7 | +### Enabling tracing |
| 8 | + |
| 9 | +To enable tracing in GCSFuse and visualize the data as a waterfall/Gantt chart in the Google Cloud Trace Explorer, add the following configuration to your GCSFuse YAML file: |
| 10 | + |
| 11 | +``` |
| 12 | +monitoring: |
| 13 | + experimental-tracing-mode: gcptrace |
| 14 | + experimental-tracing-sampling-ratio: 0.1 |
| 15 | + experimental-tracing-project-id: <google-cloud-project-id> |
| 16 | +``` |
| 17 | + |
| 18 | +| Key | Value | Function | |
| 19 | +| :--- | :--- | :--- | |
| 20 | +| **`experimental-tracing-mode`** | `gcptrace` | Specifies the **exporter**. Supported values: `gcptrace` and `stdout`. The value `gcptrace` sends the collected traces to **Google Cloud Trace**. The value `stdout` exports them to the GCSFuse process output stream. | |
| 21 | +| **`experimental-tracing-sampling-ratio`** | `0.1` | Sets the **sampling rate**. This means **10%** (0.1 out of 1.0) of all incoming requests or operations will have a trace generated and exported. This helps manage cost and overhead in high-traffic applications. | |
| 22 | +| **`experimental-tracing-project-id`** | `<google-cloud-project-id>` | If exporting to `gcptrace`, this specifies the destination Google Cloud Project ID. By default, traces are sent to the project where GCSFuse is running. Replace the placeholder with your target project ID if you want to send traces to a different project. | |
| 23 | + |
| 24 | +### Accessing and Viewing Trace Exports |
| 25 | + |
| 26 | +For the best visualization, export traces to Google Cloud Trace using the `gcptrace` option. You can then find them in the Trace Explorer in the Google Cloud console: |
| 27 | + |
| 28 | +[Trace Explorer Link](https://console.cloud.google.com/traces/explorer) |
| 29 | + |
| 30 | +To show only the traces from the currently running GCSFuse mount, enter the following attribute filter in the filter input: |
| 31 | + |
| 32 | +``` |
| 33 | +Key = service.instance.id |
| 34 | +Value = <your-unique-mount-id> |
| 35 | +``` |
| 36 | + |
| 37 | +You can find your unique mount instance ID in any GCSFuse log line from the beginning of the mount, for example: `mount-id=<your-unique-mount-id>`. |
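
As a rough illustration, the mount ID could be pulled out of a log line with a small script. The sketch below assumes only that the ID appears as a `mount-id=<value>` token, as in the example above. The sample log line, its other fields, and the helper name `extract_mount_id` are hypothetical and do not represent actual GCSFuse output.

```python
import re
from typing import Optional

# Assumption: the ID appears as "mount-id=<value>" and the value runs until
# whitespace, a double quote, or a comma. Adjust to your real log format.
MOUNT_ID_PATTERN = re.compile(r'mount-id=([^\s",]+)')


def extract_mount_id(log_line: str) -> Optional[str]:
    """Return the mount ID found in a log line, or None if there is none."""
    match = MOUNT_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical log line; real GCSFuse log lines may look different.
    sample = 'severity=INFO message="mounting" mount-id=example-bucket-1a2b3c'
    print(extract_mount_id(sample))
```

The extracted value is what you would paste as the `service.instance.id` filter value in the Trace Explorer.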
| 38 | + |
| 39 | +The Trace Explorer displays the spans generated by GCSFuse, which you can filter by attributes such as the GCSFuse instance ID or the host's machine type. This lets you isolate traces for a specific mount and gain performance insights, for example by comparing performance across different machine types. |
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +Trace Explorer view filtering by mount instance ID |
| 44 | + |
| 45 | +#### Filtering Traces in Trace Explorer |
| 46 | + |
| 47 | +You can filter by several attributes, or narrow the view to specific spans, to get a timeline of all the calls that belong to a single trace. Once you have a trace ID, you can also search for its spans directly by that ID. |
| 48 | + |
| 49 | + |
| 50 | + |
| 51 | +Waterfall/Gantt chart view of a single trace |
| 52 | + |
| 53 | +### Interpreting GCSFuse Spans |
| 54 | + |
| 55 | +The following table lists commonly recorded spans and what each of them signifies. |
| 56 | + |
| 57 | +| Span Name (as visible in trace explorer) | Description (of what the underlying span traces) | |
| 58 | +| :---- | :---- | |
| 59 | +| **FUSE Operations** | | |
| 60 | +| **StatFS** | Retrieves file system-wide statistics, such as total blocks, free blocks, and block size. | |
| 61 | +| **LookUpInode** | Looks up a directory entry by name within a parent directory to find the corresponding inode. This is fundamental for path resolution. | |
| 62 | +| **GetInodeAttributes** | Retrieves the attributes of an inode, such as its size, permissions, and modification times. | |
| 63 | +| **SetInodeAttributes** | Modifies the attributes of an inode, for example, changing its size (**truncate**), permissions (**chmod**), or owner (**chown**). | |
| 64 | +| **ForgetInode** | Informs the file system that the kernel is no longer referencing a particular inode, allowing the file system to reclaim resources associated with it. | |
| 65 | +| **BatchForget** | A batch version of **ForgetInode** that allows the kernel to inform the file system about multiple inodes that are no longer in use. | |
| 66 | +| **MkDir** | Creates a new directory. | |
| 67 | +| **MkNode** | Creates a new file system node, which can be a regular file, a device file, or a named pipe. | |
| 68 | +| **CreateFile** | Creates and opens a new regular file. | |
| 69 | +| **CreateLink** | Creates a **hard link** to an existing file. | |
| 70 | +| **CreateSymlink** | Creates a **symbolic link**. | |
| 71 | +| **Rename** | Renames a file or directory, potentially moving it to a different directory. | |
| 72 | +| **RmDir** | Removes an empty directory. | |
| 73 | +| **Unlink** | Removes a file (deletes a name from the file system). If that name was the last link to a file and no processes have the file open, the file is deleted and the space it was using is made available for reuse. | |
| 74 | +| **OpenDir** | Opens a directory for reading its contents. | |
| 75 | +| **ReadDir** | Reads entries from an open directory. | |
| 76 | +| **ReadDirPlus** | Similar to **ReadDir**, but it can also return the attributes of the entries, which can be more efficient than calling **LookUpInode** and **GetInodeAttributes** for each entry. | |
| 77 | +| **ReleaseDirHandle** | Releases an open directory handle, called when a process is done reading a directory. | |
| 78 | +| **OpenFile** | Opens a file for reading or writing. | |
| 79 | +| **ReadFile** | Reads data from an open file. | |
| 80 | +| **WriteFile** | Writes data to an open file. | |
| 81 | +| **SyncFile** | Requests that any cached data for an open file be written to the underlying storage. | |
| 82 | +| **FlushFile** | Called when a file handle is being closed. This is an opportunity to flush any cached data. | |
| 83 | +| **ReleaseFileHandle** | Releases an open file handle, called when a process closes a file. | |
| 84 | +| **ReadSymlink** | Reads the target of a symbolic link. | |
| 85 | +| **GCS Operations (gRPC)** | | |
| 86 | +| **google.storage.v2.Storage/ListObjects** | The gRPC call to list a collection of objects (like a directory listing) within a Google Cloud Storage bucket. | |
| 87 | +| **cloud.google.com/go/storage.grpcStorageClient.ObjectsListCall** | The gRPC client-side function call within the Go library that initiates the object listing operation. | |
| 88 | +| **google.storage.v2.Storage/ReadObject** | The gRPC call to stream the content (data) of a specific GCS object. | |
| 89 | +| **google.storage.v2.Storage/GetObject** | The gRPC call to retrieve the **metadata/attributes** (not the data content) of a single GCS object. | |
| 90 | +| **google.storage.control.v2.StorageControl/GetFolder** | A specific gRPC call to retrieve metadata for a GCS Folder (using the Storage Control API). | |
| 91 | +| **GCS Operations (HTTP)** | | |
| 92 | +| **HTTP GET** | The end-to-end span for a client's GET request. | |
| 93 | +| **HTTP POST** | The end-to-end span for a client's POST request. | |
| 94 | +| **cloud.google.com/go/storage.httpStorageClient.ObjectsListCall** | A specific operation to list objects within a Google Cloud Storage (GCS) bucket. | |
| 95 | +| **cloud.google.com/go/storage.Object.Attrs** | A specific operation to get the metadata/attributes of a single GCS object. | |
| 96 | +| **Low-level HTTP Transport** | | |
| 97 | +| **http.dns** | Time spent resolving the domain name to an IP address. | |
| 98 | +| **http.getconn** | Time spent waiting for an idle connection from the connection pool or establishing a new one. | |
| 99 | +| **http.tls** | Time spent performing the TLS/SSL handshake (key exchange and certificate verification). | |
| 100 | +| **http.headers** | Time spent waiting for the first byte of the response headers after sending the request. | |
| 101 | +| **http.send** | Time spent sending the entire request (headers and body) to the server. | |
| 102 | +| **http.receive** | Time spent receiving the entire response body from the server. | |
| 103 | + |
| 104 | +### Differentiating gRPC from HTTP Spans |
| 105 | + |
| 106 | +You can differentiate gRPC traces and spans from traditional HTTP ones based on two key characteristics: **Naming Convention** and **Trace Content/Attributes**. |
| 107 | + |
| 108 | +#### Naming Convention |
| 109 | + |
| 110 | +| Trace Type | Naming Pattern | Example | |
| 111 | +| :---- | :---- | :---- | |
| 112 | +| **gRPC** | Uses the full **Service/Method** format, often starting with the API version and service name. | `google.storage.v2.Storage/ListObjects` | |
| 113 | +| **HTTP** | Uses the **HTTP method** or lower-level network phases. | `HTTP GET`, `http.dns`, `http.send` | |
| 114 | + |
| 115 | +#### Trace Content and Attributes |
| 116 | + |
| 117 | +The attributes (tags) attached to the span clearly indicate the protocol: |
| 118 | + |
| 119 | +* **gRPC Spans** will contain OpenTelemetry attributes starting with `rpc.`: |
| 120 | + * `rpc.system`: Typically `"grpc"` |
| 121 | + * `rpc.method`: The name of the method being called (e.g., `ListObjects`) |
| 122 | + * `rpc.grpc.status_code`: The gRPC status code (e.g., `0` for OK) |
| 123 | +* **HTTP Spans** will contain attributes starting with `http.`: |
| 124 | + * `http.method`: The HTTP verb (`GET`, `POST`) |
| 125 | + * `http.url`: The full resource URL |
| 126 | + * `http.status_code`: The three-digit HTTP status code (e.g., `200`, `404`) |
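
The two rules above can be combined into a small classifier, for example when post-processing spans exported in `stdout` mode. This is only a sketch: the function name `classify_span`, the plain-dictionary representation of span attributes, and the exact name pattern are assumptions made for illustration, not part of GCSFuse or any tracing library.

```python
import re
from typing import Mapping

# Assumption: gRPC span names look like "<dotted.package.Service>/<Method>".
GRPC_NAME_PATTERN = re.compile(r"[\w.]+/\w+")


def classify_span(name: str, attributes: Mapping[str, object]) -> str:
    """Return "grpc", "http", or "unknown" for a span."""
    # Attributes are the stronger signal, so check them first.
    if any(key.startswith("rpc.") for key in attributes):
        return "grpc"
    if any(key.startswith("http.") for key in attributes):
        return "http"
    # Fall back to the naming convention.
    if name.startswith("HTTP ") or name.startswith("http."):
        return "http"
    if GRPC_NAME_PATTERN.fullmatch(name):
        return "grpc"
    return "unknown"


if __name__ == "__main__":
    print(classify_span("google.storage.v2.Storage/ListObjects", {}))
    print(classify_span("HTTP GET", {"http.method": "GET"}))
    print(classify_span("ReadFile", {}))
```

FUSE operation spans such as `ReadFile` carry neither naming pattern, so the sketch reports them as `unknown` rather than guessing.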
| 127 | + |
| 128 | +### Best practices |
| 129 | + |
| 130 | +#### Controlling Trace Volume with Sampling |
| 131 | + |
| 132 | +Trace sampling is a critical mechanism for managing the operational overhead and costs associated with tracing in high-throughput environments. |
| 133 | + |
| 134 | +**Sampling Ratio:** The `experimental-tracing-sampling-ratio` flag controls the fraction of GCSFuse operations that are traced and exported. |
| 135 | + |
| 136 | +This ratio is a floating-point number between 0.0 (no traces exported) and 1.0 (all traces exported). |
| 137 | + |
| 138 | +**Crucially:** The sampling decision is made once per trace, at its root operation. Once a trace is selected for export, all of its associated spans (sub-operations) are captured. This ensures that every exported trace is complete and useful for analysis. |
| 139 | + |
| 140 | +| Sampling Ratio | Effect | |
| 141 | +| :---- | :---- | |
| 142 | +| **1.0** | Exports **100%** of all GCSFuse operations (Highest detail, highest cost/overhead). | |
| 143 | +| **0.1** | Exports **10%** of all GCSFuse operations (Good for production monitoring, balances detail and cost). | |
| 144 | +| **0.01** | Exports **1%** of all GCSFuse operations (Used for high-volume traffic/low-cost scenarios). | |
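
To make the all-or-nothing behavior concrete, here is a minimal sketch of ratio-based sampling decided once per trace. It is loosely modeled on the trace-ID ratio samplers found in OpenTelemetry SDKs; this document does not describe how GCSFuse implements its sampler, so the bit arithmetic and the names `should_sample` and `export_trace` are assumptions for illustration only.

```python
import random
from typing import Iterable, List


def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide once per trace, using only the trace ID and the ratio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 63 bits of the trace ID as a uniform value in [0, 2**63).
    value = trace_id & ((1 << 63) - 1)
    return value < int(ratio * (1 << 63))


def export_trace(trace_id: int, span_names: Iterable[str], ratio: float) -> List[str]:
    """Return the spans that would be exported: all of them or none."""
    return list(span_names) if should_sample(trace_id, ratio) else []


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    ratio = 0.1
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, ratio) for t in trace_ids)
    print(f"sampled {kept / len(trace_ids):.1%} of traces")
```

Because the decision depends only on the trace ID, every span in a trace gets the same answer, which is why a sampled trace is never exported with missing child spans.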