Skip to content

Multiple memory safety violations in this crate #8

@parasyte

Description

@parasyte

Don't take this as a negative criticism. There is a lot of content in this issue, which is not meant as an attack on code quality. I am merely attempting to point out how difficult it is to use unsafe correctly. And more importantly, I want to eventually use this crate in my own project.

I have identified several issues. I'll try to enumerate them, but be aware that this is not an exhaustive audit. Also note that I am not 100% confident in this analysis, since I don't have a great way to rigorously provide proof for each problem. That said, what I have found is convincing on its own merit.

Connection::close()

As a starting point, let's look at the lifetime relationship in iracing::telemetry::Connection::close():

iracing.rs/src/telemetry.rs

Lines 781 to 791 in 5977462

pub fn close(&self) -> IOResult<()> {
let succ = unsafe { CloseHandle(self.location) };
if succ != 0 {
Ok(())
} else {
let errno: i32 = unsafe { GetLastError() as i32 };
Err(std::io::Error::from_raw_os_error(errno))
}
}

This is a public method which borrows self with an anonymous lifetime. Immediately after the call to CloseHandle, the Connection can no longer be used, because the file handle that it owns has been invalidated. The anonymous lifetime allows us to do just that in safe Rust, however. Here's the dump_sample.rs example with a single line added to close the connection before getting telemetry from it:

use iracing::telemetry::Connection;
use serde_yaml::to_string;

pub fn main() {
    let conn = Connection::new().expect("Unable to open telemetry");
    conn.close().unwrap();
    let telem = conn.telemetry().expect("Telem");

    print!("{}", to_string(&telem.all()).unwrap());
}

This compiles successfully, which is scary. Does this mean that the API allows for undefined behavior? Well, if we run this code, this happens:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 6, kind: Uncategorized, message: "The handle is invalid." }', examples\safety.rs:6:18

And this error occurs because CloseHandle is given a pointer to shared memory, not a handle! Connection needs to maintain a handle reference so that it can close it. The handle is created here in Connection::new:

iracing.rs/src/telemetry.rs

Lines 660 to 667 in 5977462

unsafe { mapping = OpenFileMappingW(FILE_MAP_READ, 0, path.as_ptr()); };
if null() == mapping {
unsafe { errno = GetLastError() as i32; }
return Err(std::io::Error::from_raw_os_error(errno));
}

Ok, so let's fix that:

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..15a7b05 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -644,6 +644,7 @@ impl Blocking {
 /// ```
 pub struct Connection {
     mux: Mutex<()>,
+    mapping: HANDLE,
     location: *mut c_void,
     header: Header
 }
@@ -680,7 +681,7 @@ impl Connection {
 
         let header = unsafe { Self::read_header(view) };
 
-        return Ok(Connection {mux: Mutex::default(), location: view, header: header});
+        return Ok(Connection {mux: Mutex::default(), mapping, location: view, header: header});
     }
 
     ///
@@ -779,7 +780,7 @@ impl Connection {
     }
 
     pub fn close(&self) -> IOResult<()> {
-        let succ = unsafe { CloseHandle(self.location) };
+        let succ = unsafe { CloseHandle(self.mapping) };
 
         if succ != 0 {
             Ok(())

Now calling conn.close() returns Ok(()) as expected. But, oh dear, the code prints out a bunch of info, and that's not at all expected. Or is it?

We just closed the HANDLE, but it looks like we didn't invalidate the shared memory. Honestly I have no idea how Windows feels about retaining access to shared memory after closing the handle to the file that owns it. This is most likely undefined behavior at this point, and the fact that the documentation for OpenFileMappingW, MapViewOfFile, and CloseHandle doesn't specify the behavior in this state strongly suggests that it is UB.

That said, the close method is missing an operation; unmapping the file view. So let's properly break this! The modified example code cannot possibly be correct if we do that, right?

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..8aa64d8 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -29,7 +29,7 @@ use std::sync::Mutex;
 use encoding::{Encoding, DecoderTrap};
 use encoding::all::ISO_8859_1;
 use winapi::shared::minwindef::LPVOID;
-use winapi::um::memoryapi::{OpenFileMappingW, FILE_MAP_READ, MapViewOfFile};
+use winapi::um::memoryapi::{OpenFileMappingW, FILE_MAP_READ, MapViewOfFile, UnmapViewOfFile};
 
 
 /// System path where the shared memory map is located.
@@ -644,6 +644,7 @@ impl Blocking {
 /// ```
 pub struct Connection {
     mux: Mutex<()>,
+    mapping: HANDLE,
     location: *mut c_void,
     header: Header
 }
@@ -680,7 +681,7 @@ impl Connection {
 
         let header = unsafe { Self::read_header(view) };
 
-        return Ok(Connection {mux: Mutex::default(), location: view, header: header});
+        return Ok(Connection {mux: Mutex::default(), mapping, location: view, header: header});
     }
 
     ///
@@ -779,15 +780,22 @@ impl Connection {
     }
 
     pub fn close(&self) -> IOResult<()> {
-        let succ = unsafe { CloseHandle(self.location) };
+        let succ = unsafe { CloseHandle(self.mapping) };
 
-        if succ != 0 {
-            Ok(())
-        } else {
+        if succ == 0 {
             let errno: i32 = unsafe { GetLastError() as i32 };
+            UnmapViewOfFile(self.location); // XXX: Ignore return value; we are already handling an error.
+            return Err(std::io::Error::from_raw_os_error(errno));
+        }
 
-            Err(std::io::Error::from_raw_os_error(errno))
+        let succ = unsafe { UnmapViewOfFile(self.location) };
+
+        if succ == 0 {
+            let errno: i32 = unsafe { GetLastError() as i32 };
+            return Err(std::io::Error::from_raw_os_error(errno));
         }
+
+        Ok(())
     }
 }
 

Now when we run the modified example, we get something horrifying:

error: process didn't exit successfully: `target\debug\examples\safety.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Segmentation fault

Remember, this modified example is all safe code! What happened? The API is in fact not valid. The close method cannot safely free kernel resources and allow the caller to continue using Connection. There's really only one way to encode a lifetime in the API that means "this struct can never be used again" and that's for the close method to consume the self argument. Essentially you just change the API to:

pub fn close(self) -> IOResult<()>

And you're done. Now the modified example does not compile:

error[E0382]: borrow of moved value: `conn`
   --> examples\safety.rs:7:17
    |
5   |     let conn = Connection::new().expect("Unable to open telemetry");
    |         ---- move occurs because `conn` has type `Connection`, which does not implement the `Copy` trait
6   |     conn.close().unwrap();
    |          ------- `conn` moved due to this method call
7   |     let telem = conn.telemetry().expect("Telem");
    |                 ^^^^ value borrowed here after move
    |
note: this function takes ownership of the receiver `self`, which moves `conn`
   --> C:\Users\jay\other-projects\iracing.rs\src\telemetry.rs:782:18
    |
782 |     pub fn close(self) -> IOResult<()> {
    |                  ^^^^

There's just one thing about this that is ugly. The fact that calling close at all is optional. We could move this code into a Drop impl so Rust will free the resources when Connection goes out of scope. Then we don't need the explicit close method, because drop(conn); would be identical to conn.close();

This does not fix all possible ways to leak memory (which is now a well known problem, e.g. the leakpocalypse) but it is a much better API than one which expects callers to remember to call the Connection::close() method when it's done with the connection.

Blocking::close()

This has the same lifetime problems described above and can also be fixed with impl Drop, but it also has its own bugs. Let's look at those here.

iracing.rs/src/telemetry.rs

Lines 562 to 588 in 5977462

pub fn close(&self) -> std::io::Result<()> {
if self.event_handle.is_null() {
return Ok(());
}
let succ = unsafe { CloseHandle(self.event_handle) };
if succ == 0 {
let err: i32 = unsafe { GetLastError() as i32 };
return Err(std::io::Error::from_raw_os_error(err));
}
if self.origin.is_null() {
return Ok(());
}
let succ = unsafe { CloseHandle(self.origin as HANDLE) };
if succ == 0 {
let err: i32 = unsafe { GetLastError() as i32 };
Err(std::io::Error::from_raw_os_error(err))
} else {
Ok(())
}
}

This first thing to notice is that self.event_handle cannot be NULL here. Ever. This was verified when the handle was created:

iracing.rs/src/telemetry.rs

Lines 545 to 551 in 5977462

let handle: HANDLE = unsafe { CreateEventW(sc, 0, 0, event_name.as_ptr()) };
if null() == handle {
let errno: i32 = unsafe { GetLastError() as i32 };
return Err(std::io::Error::from_raw_os_error(errno));
}

And because Blocking contains private fields and it doesn't implement Clone or Default, it is only possible to create one with the constructor: Blocking::new().

The second thing to point out is that there is also a NULL check on self.origin, but no corresponding NULL check in the constructor. The constructor even uses the pointer without a NULL check:

iracing.rs/src/telemetry.rs

Lines 537 to 538 in 5977462

pub fn new(location: *const c_void, head: Header) -> std::io::Result<Self> {
let values = head.get_var_header(location).to_vec();

FWIW, Header::get_var_header() also doesn't do a NULL check. For a quick detour, the Blocking constructor is public, it's safe to call, and it accepts a raw pointer. That can't be good. Except there is currently no safe way to construct a Header, so making the constructor for Blocking public is useless. 🤷 In short, all of these NULL checks are pointless and should just be removed.

The third thing to notice about this close method is this line right here:

let succ = unsafe { CloseHandle(self.origin as HANDLE) };

self.origin is a raw pointer. It is not a handle. What happens when we call this function?

use iracing::telemetry::Connection;

pub fn main() {
    let conn = Connection::new().expect("Unable to open telemetry");
    let blocking = conn.blocking().expect("Couldn't create telem handle");
    blocking.close().unwrap();
}

Let's find out!

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 6, kind: Uncategorized, message: "The handle is invalid." }', examples\safety.rs:6:22

Well, that explains a lot. But we're allowed to just ignore the Result and continue calling methods on Blocking. In this case, the event handle will be closed and if we try to call blocking.sample(), it will just return an error that similarly says the handle is closed. So not a great user experience, but I can't find any memory safety issues with Blocking::close().

However ...

Lifetime violations

Blocking owns a raw pointer to data owned by Connection. If we resolve the problems described above by implementing Drop, we fall into this trap where Blocking is allowed to outlive Connection. For instance:

let conn = Connection::new().expect("Unable to open telemetry");
let blocking = conn.blocking().expect("Couldn't create telem handle");
drop(conn);
blocking.sample(Duration::from_millis(500)).unwrap(); // THIS ACCESSES FREED MEMORY

Rust provides a nice way to tie the lifetimes of these structs together. We can make the lifetime of Blocking depend on the lifetime of Connection. Thus Connection must always outlive Blocking.

diff --git a/src/telemetry.rs b/src/telemetry.rs
index 2bc1dde..269f865 100644
--- a/src/telemetry.rs
+++ b/src/telemetry.rs
@@ -7,6 +7,7 @@ use std::ffi::CStr;
 use std::fmt::Display;
 use std::fmt;
 use std::os::windows::raw::HANDLE;
+use std::marker::PhantomData;
 use std::ptr::null;
 use std::slice::from_raw_parts;
 use std::time::Duration;
@@ -67,11 +68,12 @@ pub struct Header {
 ///
 /// Calling `sample()` on a Blocking interface will block until a new telemetry sample is made available.
 ///
-pub struct Blocking {
+pub struct Blocking<'conn> {
     origin: *const c_void,
     values: Vec<ValueHeader>,
     header: Header,
-    event_handle: HANDLE
+    event_handle: HANDLE,
+    phantom: PhantomData<&'conn Connection>,
 }
 
 #[derive(Copy, Clone, Debug)]
@@ -533,7 +535,7 @@ impl Display for TelemetryError {
 impl Error for TelemetryError {
 }
 
-impl Blocking {
+impl<'conn> Blocking<'conn> {
     pub fn new(location: *const c_void, head: Header) -> std::io::Result<Self> {
         let values = head.get_var_header(location).to_vec();
 
@@ -555,7 +557,8 @@ impl Blocking {
            origin: location,
            header: head,
            values: values,
-           event_handle: handle
+           event_handle: handle,
+           phantom: PhantomData,
         })
     }
 
@@ -774,7 +777,7 @@ impl Connection {
     /// let sampler = Connection::new()?.blocking()?;
     /// let sample = sample.sample(Duration::from_millis(50))?;
     /// ```
-    pub fn blocking(&self) -> IOResult<Blocking> {
+    pub fn blocking(&self) -> IOResult<Blocking<'_>> {
         Blocking::new(self.location, unsafe { Self::read_header(self.location) })
     }
 

With this patch, we now get a compile error when Connection is dropped before Blocking:

error[E0505]: cannot move out of `conn` because it is borrowed
 --> examples\safety.rs:7:10
  |
6 |     let blocking = conn.blocking().expect("Couldn't create telem handle");
  |                    ---- borrow of `conn` occurs here
7 |     drop(conn);
  |          ^^^^ move out of `conn` occurs here
8 |     blocking.sample(Duration::from_millis(500)).unwrap();
  |     -------- borrow later used here

Connection::read_header()

This is an associated function, so it doesn't need a Connection constructed to call it, but it is unsafe. So there is no way to abuse this function in safe code. However, this is still a memory safety violation with this function.

iracing.rs/src/telemetry.rs

Lines 702 to 707 in 5977462

pub unsafe fn read_header(from: *const c_void) -> Header {
let raw_header: *const Header = transmute(from);
let h: Header = *raw_header;
h.clone()
}

Fun fact: the transmute doesn't do anything here.

We can ignore the fact that this function takes a raw pointer, and the fact that it is not possible to get a valid shared memory pointer from the API in its present state. The latter fact just means that making this function public is useless. 🤷 But the important detail to consider is the h.clone() line. This line violate memory safety because Header is not Copy (meaning it cannot be read atomically) and there is no synchronization around the read. This function can read shared memory while another process (iRacing simulator) is writing to it.

Some potential outcomes of calling this code are:

  1. It returns a valid Header.
  2. It returns a valid Header that contains "half updated" values. E.g. maybe some fields are updated while others are old values.
    • There are a lot of concerns here.
    • What if Header::header_offset was changed, but the data it points to was not? Or vice versa?
    • What if Header::buffers was partially updated?
    • Etc.
  3. It returns an invalid Header containing impossible values. This is unlikely because Header is #[repr(C)] and only contains integral data and arrays of integral data.

The second point is the most concerning. It is also likely because the shared memory is updated every 16.67ms and callers are allowed to call this function (indirectly) whenever they like. If we look at how the iRacing SDK solves this, it will just make us cringe. The SDK is written in C, so it's more forgiving than Rust when it comes to unsynchronized memory access. But this is essentially how it works:

  1. Read the current tick from the selected buffer, remember the value.
  2. Copy the buffer from shared memory to its final location.
  3. Read the current tick from the selected buffer and compare with the remembered value.
    • If the values are equivalent, the copied data is considered valid.
    • Otherwise the data is discarded and the whole process is tried again, up to a total of 2 attempts.

This means that we can be reasonably sure the contents of the buffer are valid. But it says nothing about the validity of the data pointed by the offset within the buffer. I guess they assume clients will always read the data within 3 frames (about 50ms) of the write being completed. I also assume that all of the shared memory updates by the simulator occur rapidly, say entirely within a few microseconds before the event is set. But this synchronization based only on fuzzy timing is awful. (All it takes is some blocking operation like I/O to stall your process for more than 50ms and you are well in UB territory, even with the buffering and read-twice strategy.)

It is probably worth pointing out that this particular issue is a design issue with iRacing, and your crate can do little to improve the situation.

When it comes to Rust, it doesn't allow memory to change out from underneath it. See how unsafe is mmap? on the Rust users forum for a discussion of exactly this problem. TL;DR is that all kinds of bad things can happen with these ingredients, and we know how that will turn out.

Probably the best way to avoid this issue, without getting the iRacing simulator to actually lock memory regions, is removing all code paths that allow arbitrary calls to reading shared memory. In more concrete terms, Connection::session_info() and Connection::telemetry() need to be removed. That leaves Connection::Blocking as the only interface to shared memory. This provides, at a minimum, using the event as the only way to synchronize access to shared memory. And with careful cloning of all data as soon as possible, the caller can be given a large chunk of memory that will never be changed underneath it.

In terms of Connection::session_info(), it looks like this can also be synchronized with the event. So at least you don't have to remove a useful feature.

General concerns

Public methods should be cleaned up. Methods and fields should be made public only where necessary.

This safe public method accepts a raw pointer:

iracing.rs/src/telemetry.rs

Lines 369 to 372 in 5977462

pub fn telemetry(
&self,
from_loc: *const c_void,
) -> Result<Sample, Box<dyn std::error::Error>> {

It doesn't appear to be possible to gain access to a Header in safe code (as established earlier) so making this method public seems to be useless. 🤷

Blocking::sample() calls this method on an owned clone of Header, which is the cause of #3. Lots of undefined behavior potential here.

Connection::session_info() does not need to borrow self mutably or exclusively.

This early return leaks file handles from the OpenFileMappingW call:

return Err(std::io::Error::from_raw_os_error(errno))

The event should not be created. Switch to OpenEventW instead:

let handle: HANDLE = unsafe { CreateEventW(sc, 0, 0, event_name.as_ptr()) };

And finally, instances of the unsafe keyword can be reduced, especially by wrapping more code surface area in a single unsafe block instead of e.g. having 4 unsafe blocks in a single method body.


That's most of the issues I have discovered so far with how unsafe is used. I don't recommend committing the patches in this issue just yet, because it would create more work for me with #6 reformatting the entire code base, #7 cleaning up a lot of lint (see parasyte/iracing.rs@fix/cargo-fmt...parasyte:fix/check-and-clippy for a comparison diff between the two), and 3 extra as-yet-unpublished branches because they are blocked on those two PRs.

And by the way, thank you for this work, so far. It will save me a lot of time reinventing the wheel. And I was able to learn a lot about the IPC interface provided by the iRacing simulator from going through these motions. I wish they would redesign it, but it's unlikely given the number of tools that read telemetry from shared memory. They would probably scoff at the overhead required by locking memory regions, but that's just wild speculation on my part. And it's a rant for another day.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions