Commit bd0906b

Merge pull request #8 from geosmall/atomic_mem
Add ordered atomic memory operations
2 parents ebacd93 + 0626026 commit bd0906b

4 files changed: +172 -22 lines changed

src/README.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# Research

A variable that can be read or written with a single load/store instruction gets some level of atomicity from the processor for free, but there are a few additional considerations:

> Atomicity vs. Memory Ordering

**Atomicity:** Ensures that a read or write operation completes without interruption. For naturally aligned variables up to the size of `uint32_t`, this is generally true on ARM Cortex-M processors.

**Memory Ordering:** Ensures the correct sequence of operations across multiple threads. This is where `memory_order` (`std::memory_order` in C++) comes into play.

**C Sequence Points:** Define points in the code where all side effects of previous evaluations are complete, and no side effects of subsequent evaluations have started.

While sequence points help with ordering within a single thread, they don't provide guarantees for memory visibility across multiple threads.
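
The atomicity claim above can be checked at build time rather than taken on trust. The following is a minimal sketch, assuming a C11 toolchain with `<stdatomic.h>`; the function name `example_atomics_always_lock_free` is hypothetical and not part of `ring_buf.c`.

```c
#include <stdatomic.h>

/* A value of 2 means "always lock-free": the atomic type is implemented
 * without a hidden lock, so it is usable between an ISR and the main loop. */
_Static_assert(ATOMIC_SHORT_LOCK_FREE == 2,
               "16-bit atomics are not always lock-free on this target");
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
               "int-sized atomics are not always lock-free on this target");

/* The same fact as a run-time value, e.g. for a start-up self-test
 * (hypothetical helper, for illustration only). */
int example_atomics_always_lock_free(void) {
    return (ATOMIC_SHORT_LOCK_FREE == 2) && (ATOMIC_INT_LOCK_FREE == 2);
}
```

If either assertion fails, the build stops, which is preferable to discovering at run time that an index update is protected by a lock the ISR cannot take.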

> Key reasons why memory ordering is considered relevant:

**Reordering by the Compiler:**
The C and C++ standards allow the compiler to reorder memory accesses for optimization unless explicitly told not to. For example, without memory barriers or memory ordering, the compiler could:

- Write the head pointer (indicating that new data is available) before the data itself is written to the buffer.
- Read the data before reading the head pointer, leading to the consumer seeing stale or partially written data.

Even if `write_index` and `buffer[]` are `uint32_t`, this reordering could result in the producer marking the data as ready (by updating `write_index`) before the data itself has been fully written.
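
To make the hazard concrete, here is a small sketch contrasting an unordered publication with an ordered one. All names (`example_buf`, `publish_plain`, `publish_ordered`, `EXAMPLE_LEN`) are hypothetical and chosen for illustration; they do not appear in `ring_buf.c`.

```c
#include <stdatomic.h>
#include <stdint.h>

#define EXAMPLE_LEN 8U                        /* hypothetical buffer length */

uint32_t example_buf[EXAMPLE_LEN];
uint32_t plain_write_index;                   /* no ordering guarantees */
_Atomic uint32_t ordered_write_index;         /* ordered publication    */

/* Two independent plain stores: the compiler is free to emit the index
 * update before the data store. */
void publish_plain(uint32_t value) {
    example_buf[plain_write_index % EXAMPLE_LEN] = value;
    plain_write_index = plain_write_index + 1U;
}

/* The release store may not be moved ahead of the data store that
 * precedes it in program order. */
void publish_ordered(uint32_t value) {
    uint32_t idx = atomic_load_explicit(&ordered_write_index,
                                        memory_order_relaxed);
    example_buf[idx % EXAMPLE_LEN] = value;
    atomic_store_explicit(&ordered_write_index, idx + 1U,
                          memory_order_release);
}
```

Both functions behave identically when called from a single context; the difference only shows up when another context reads the index and then the data.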

**Reordering by the Processor:**
While a Cortex-M4 has a simpler memory model than modern multi-core processors, it's still possible for the memory subsystem to perform some reordering, particularly with peripheral memory accesses.

**Interrupts:**
An ISR can preempt the main thread at any point, including in the middle of a read-modify-write sequence, leading to inconsistent state if proper synchronization isn't enforced. If `write_index` is updated by the ISR while the main thread is partway through using it, the main thread can end up acting on a stale or inconsistent value.
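
The read-modify-write case can be sketched with a shared event counter. This is an illustration only, with hypothetical names (`event_count`, `Example_IRQHandler`, `take_events`); the ring buffer itself avoids the problem differently, by giving each index exactly one writer.

```c
#include <stdatomic.h>
#include <stdint.h>

_Atomic uint32_t event_count;      /* hypothetical counter shared with an ISR */

/* Called from the ISR (hypothetical handler name). */
void Example_IRQHandler(void) {
    atomic_fetch_add_explicit(&event_count, 1U, memory_order_relaxed);
}

/* Called from the main loop: an indivisible read-and-clear, so an interrupt
 * cannot slip in between the read and the write and lose an event. */
uint32_t take_events(void) {
    return atomic_exchange_explicit(&event_count, 0U, memory_order_relaxed);
}
```

A plain `n = event_count; event_count = 0;` in the main loop would lose any event that arrives between the two statements.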

**Sequence Points in C:**
C sequence points (or in C++, sequencing rules) provide guarantees about when side effects (like memory writes) occur relative to other operations within the same thread. However, these guarantees do not extend across multiple threads or between an interrupt handler and the main thread. This is why memory ordering is required to ensure that changes made by one thread (or ISR) are visible to another in the correct order.

Even though variables up to `uint32_t` are accessed with a single load/store instruction, without explicit memory ordering there's no guarantee that:
- The producer writes data to the buffer before updating `write_index`.
- The consumer reads the buffer data only after it sees the updated `write_index`.

Use of acquire-release semantics ensures that:
- **Release semantics (`memory_order_release`)** in the `put()` function guarantee that any prior writes (to the buffer) are visible before `write_index` is updated.
- **Acquire semantics (`memory_order_acquire`)** in the `get()` function ensure that after reading `write_index`, the consumer sees the correct data in the buffer.
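
The release/acquire pairing described above can be reduced to its smallest form, a single payload guarded by a flag. This is a sketch with hypothetical names (`payload`, `payload_ready`, `producer_publish`, `consumer_try_read`), not code from the repository.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

uint32_t payload;              /* ordinary, non-atomic data */
atomic_bool payload_ready;     /* synchronization flag      */

void producer_publish(uint32_t value) {
    payload = value;                                   /* 1: write the data */
    atomic_store_explicit(&payload_ready, true,
                          memory_order_release);       /* 2: publish it     */
}

bool consumer_try_read(uint32_t *out) {
    if (!atomic_load_explicit(&payload_ready, memory_order_acquire)) {
        return false;                                  /* nothing published */
    }
    *out = payload;   /* observes the value stored in step 1 */
    return true;
}
```

The ring buffer applies the same idea twice: `head` plays the role of the flag for data flowing to the consumer, and `tail` plays it for free slots flowing back to the producer.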

With all this, I concluded that there is indeed an issue that needs consideration for my use case, at least on the ARM Cortex-M4 and Cortex-M7 that I am targeting.

# C Ring Buffer Implementation

The rationale behind the specific atomic access types in the `ring_buf.c` implementation is based on the following principles of atomicity and memory ordering:

### Key Points of Atomic Operations:
1. **Relaxed Memory Order** (`memory_order_relaxed`): The operation must be atomic but does not need to synchronize with other threads. This is typically used when reading or updating a value whose access does not have to order any other memory accesses.

2. **Acquire Memory Order** (`memory_order_acquire`): Ensures that all subsequent memory reads and writes happen **after** the acquire operation. Used when reading a shared variable, so that the reader also sees every side effect the other thread produced before its release store to that variable.

3. **Release Memory Order** (`memory_order_release`): Ensures that all previous memory reads and writes are completed **before** the release operation. Used when writing a shared variable, so that any data written before this store is visible to other threads that subsequently acquire the variable.

### Ring Buffer Scenario:
The ring buffer operates in a single-producer, single-consumer (SPSC) context, where:
- **Producer** (`put()`) is adding elements.
- **Consumer** (`get()`) is removing elements.
- **Producer and consumer** run in different contexts (e.g., main thread vs. interrupt handler).

In this context, the atomicity and memory ordering serve two purposes:
- Ensure **safe concurrent access** (i.e., no data corruption or race conditions).
- Ensure **correct memory visibility** (i.e., the consumer always sees the correct state of the buffer after the producer has written data, and vice versa).

### Rationale for Specific Atomic Operations in `ring_buf.c`:

#### 1. `RingBuf_put()`
```c
RingBufCtr head = atomic_load_explicit(&me->head, memory_order_relaxed);
```
- **Why `memory_order_relaxed`?**
  - We are only reading the `head` index to calculate where to write the next element. `head` is written only by the producer (the consumer only reads it), so this load cannot race with another writer and needs no synchronization. The synchronization comes later, when `head` is updated. Therefore, `memory_order_relaxed` is sufficient for this read.

```c
RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_acquire);
```
- **Why `memory_order_acquire`?**
  - This is a critical read. Before we write a new element into a slot, we need to be sure the consumer has finished reading the element that previously occupied it. The consumer publishes `tail` with `memory_order_release` after copying an element out, so loading `tail` with `memory_order_acquire` guarantees that every consumer access made before that store has completed by the time the producer sees the new `tail`. Acquire does not make the load return a newer value; it orders the producer's later buffer write after the load.

```c
me->buf[me->head] = el;
```
- **Why no atomic operation on `buf[]`?**
  - The buffer itself (`buf[]`) does not need to be accessed atomically. Synchronization is achieved through the atomic accesses to `head` and `tail`: a slot is only ever touched by one side at a time, the producer before it publishes `head` and the consumer before it publishes `tail`.

```c
atomic_store_explicit(&me->head, head, memory_order_release);
```
- **Why `memory_order_release`?**
  - This is the key update in `put()`. By storing `head` with `memory_order_release`, we ensure that all prior memory writes (i.e., the write to `buf[me->head]`) are completed before `head` is updated. This guarantees that once the consumer acquires the new `head`, it will see all the writes to the buffer that happened before this update, and therefore the correct data.
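
Putting the four steps together gives the following sketch of the producer side. It follows the structure of `RingBuf_put()` in this commit but loads `head` only once; the `RingBufCtr` typedef is an assumption (the real one lives in `ring_buf.h`), and the function is named `RingBuf_put_sketch` to make clear it is not the repository's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed typedefs for illustration; see ring_buf.h for the real ones. */
typedef uint16_t RingBufCtr;
typedef uint8_t  RingBufElement;

typedef struct {
    RingBufElement *buf;
    RingBufCtr end;
    _Atomic(RingBufCtr) head;
    _Atomic(RingBufCtr) tail;
} RingBuf;

bool RingBuf_put_sketch(RingBuf * const me, RingBufElement const el) {
    /* head is written only by the producer, so a relaxed load suffices */
    RingBufCtr const head = atomic_load_explicit(&me->head,
                                                 memory_order_relaxed);
    RingBufCtr next = (RingBufCtr)(head + 1U);
    if (next == me->end) {
        next = 0U;                      /* wrap around */
    }
    /* acquire pairs with the consumer's release store to tail */
    if (next == atomic_load_explicit(&me->tail, memory_order_acquire)) {
        return false;                   /* buffer full */
    }
    me->buf[head] = el;                 /* plain write into the free slot */
    /* release publishes the slot write together with the new head */
    atomic_store_explicit(&me->head, next, memory_order_release);
    return true;
}
```

With `end == 4` the buffer holds at most three elements, because one slot is always kept free to distinguish "full" from "empty".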

#### 2. `RingBuf_get()`
```c
RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
```
- **Why `memory_order_relaxed`?**
  - This is similar to the `head` read in `put()`. `tail` is written only by the consumer, so reading it here is internal bookkeeping and does not require synchronization with the producer. The critical synchronization point is when `head` is read, which comes next.

```c
RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire);
```
- **Why `memory_order_acquire`?**
  - The consumer needs to ensure that any writes performed by the producer (to both `head` and the buffer) are fully visible before the consumer reads the data. By using `memory_order_acquire`, we guarantee that the consumer will see all the memory writes performed by the producer up to the point where `head` was updated. This ensures that the consumer reads the correct data from the buffer.

```c
atomic_store_explicit(&me->tail, tail, memory_order_release);
```
- **Why `memory_order_release`?**
  - This update to `tail` indicates that the consumer has processed an element from the buffer. We use `memory_order_release` to ensure that all preceding memory accesses (in particular the read of `buf[tail]` that copies the element into `*pel`) are completed before this update. This guarantees that when the producer acquires the new `tail`, the slot really is free to be overwritten.
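
The consumer side can be sketched the same way. As before, the `RingBufCtr` typedef is an assumption and the function name `RingBuf_get_sketch` is hypothetical; the body mirrors the `RingBuf_get()` shown in this commit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed typedefs for illustration; see ring_buf.h for the real ones. */
typedef uint16_t RingBufCtr;
typedef uint8_t  RingBufElement;

typedef struct {
    RingBufElement *buf;
    RingBufCtr end;
    _Atomic(RingBufCtr) head;
    _Atomic(RingBufCtr) tail;
} RingBuf;

bool RingBuf_get_sketch(RingBuf * const me, RingBufElement * const pel) {
    /* tail is written only by the consumer, so a relaxed load suffices */
    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
    /* acquire pairs with the producer's release store to head */
    if (tail == atomic_load_explicit(&me->head, memory_order_acquire)) {
        return false;                   /* buffer empty */
    }
    *pel = me->buf[tail];               /* copy the element out first ... */
    ++tail;
    if (tail == me->end) {
        tail = 0U;                      /* wrap around */
    }
    /* ... then release the slot back to the producer */
    atomic_store_explicit(&me->tail, tail, memory_order_release);
    return true;
}
```

The order of the last two steps is the point: the slot is read before `tail` is published, so the producer can never overwrite an element that is still being copied.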

#### 3. `RingBuf_num_free()`
```c
RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire);
```
- **Why `memory_order_acquire`?**
  - The purpose of `RingBuf_num_free()` is to provide a count of the available space in the buffer. It loads `head` with `memory_order_acquire`, which pairs with the producer's release store, so when the function is called from the consumer side everything the producer wrote before publishing that `head` value is visible. The count is a snapshot: the other side may add or remove elements immediately afterwards.

```c
RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
```
- **Why `memory_order_relaxed`?**
  - This is a non-critical read. We are simply using it for calculating available space, and there's no need for synchronization at this point. We've already synchronized with `head`, which is the key variable for space calculation.
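
Once `head` and `tail` have been loaded, the free-space calculation is plain arithmetic. The diff in this commit only shows the empty case (`end - 1`); the other two branches below are an assumption about how such a count is usually computed, written as a standalone hypothetical helper `example_num_free` rather than as the repository's code.

```c
#include <stdint.h>

typedef uint16_t ExampleCtr;   /* hypothetical index type */

/* One slot is always kept unused, so at most end - 1 elements fit. */
ExampleCtr example_num_free(ExampleCtr head, ExampleCtr tail, ExampleCtr end) {
    if (head == tail) {
        return (ExampleCtr)(end - 1U);           /* empty buffer */
    }
    if (head < tail) {
        return (ExampleCtr)(tail - head - 1U);   /* free run between head and tail */
    }
    return (ExampleCtr)(end + tail - head - 1U); /* free run wraps past the end */
}
```

For example, with `end == 8`, `head == 3` and `tail == 0`, three elements are stored and `8 + 0 - 3 - 1 = 4` slots are free.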

#### 4. `RingBuf_process_all()`
```c
RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire);
```
- **Why `memory_order_acquire`?**
  - This function processes elements from the buffer and needs to see the producer's writes behind the `head` value it reads. Loading `head` with `memory_order_acquire` guarantees that any data the producer wrote to the buffer before publishing that `head` is visible to the consumer.

```c
atomic_store_explicit(&me->tail, tail, memory_order_release);
```
- **Why `memory_order_release`?**
  - After processing each element, the consumer updates `tail` to indicate that it has finished with that slot. Using `memory_order_release` ensures that all memory accesses performed by the consumer (e.g., processing the buffer elements) are complete and visible to the producer when it next acquires `tail`.
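
A sketch of the drain loop makes one consequence of this design visible: `head` is loaded once, so elements added while the loop runs are picked up by the next call. The `RingBufCtr` and `RingBufHandler` typedefs are assumptions (they are not shown in this diff), and `example_sum_handler` is a hypothetical handler used only to demonstrate a call.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed typedefs for illustration; see ring_buf.h for the real ones. */
typedef uint16_t RingBufCtr;
typedef uint8_t  RingBufElement;
typedef void (*RingBufHandler)(RingBufElement const el);

typedef struct {
    RingBufElement *buf;
    RingBufCtr end;
    _Atomic(RingBufCtr) head;
    _Atomic(RingBufCtr) tail;
} RingBuf;

void RingBuf_process_all_sketch(RingBuf * const me, RingBufHandler handler) {
    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
    /* snapshot of head: later additions wait for the next call */
    RingBufCtr const head = atomic_load_explicit(&me->head,
                                                 memory_order_acquire);
    while (head != tail) {
        (*handler)(me->buf[tail]);
        ++tail;
        if (tail == me->end) {
            tail = 0U;                  /* wrap around */
        }
        /* free each slot as soon as its element has been handled */
        atomic_store_explicit(&me->tail, tail, memory_order_release);
    }
}

/* Hypothetical handler that accumulates the processed elements. */
uint32_t example_sum;
void example_sum_handler(RingBufElement const el) {
    example_sum += el;
}
```

Publishing `tail` inside the loop, rather than once at the end, lets the producer reuse slots while a long drain is still in progress.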

---

### Summary of Memory Ordering:

- **`memory_order_acquire`**: Used when reading the head or tail to ensure visibility of previous operations performed by the other thread.
- **`memory_order_release`**: Used when updating the head or tail to ensure that all preceding memory writes are visible to the other thread.
- **`memory_order_relaxed`**: Used for non-synchronization reads or internal logic, where memory ordering is not a concern.

These atomic operations ensure that both the producer and consumer have consistent views of the shared buffer and indexes (`head` and `tail`), preventing race conditions and ensuring correct data flow in the multithreaded environment.

src/ring_buf.c

Lines changed: 20 additions & 17 deletions
@@ -38,44 +38,46 @@ void RingBuf_ctor(RingBuf * const me,
                   RingBufElement sto[], RingBufCtr sto_len) {
     me->buf = &sto[0];
     me->end = sto_len;
-    me->head = 0U;
-    me->tail = 0U;
+    atomic_store(&me->head, 0U); /* initialize head atomically */
+    atomic_store(&me->tail, 0U); /* initialize tail atomically */
 }
 /*..........................................................................*/
 bool RingBuf_put(RingBuf * const me, RingBufElement const el) {
-    RingBufCtr head = me->head + 1U;
+    RingBufCtr head = atomic_load_explicit(&me->head, memory_order_relaxed) + 1U;
     if (head == me->end) {
         head = 0U;
     }
-    if (head != me->tail) { /* buffer NOT full? */
-        me->buf[me->head] = el; /* copy the element into the buffer */
-        me->head = head; /* update the head to a *valid* index */
-        return true; /* element placed in the buffer */
+    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_acquire); /* acquire before checking tail */
+    if (head != tail) { /* buffer NOT full? */
+        me->buf[atomic_load_explicit(&me->head, memory_order_relaxed)] = el; /* write to buffer */
+        atomic_store_explicit(&me->head, head, memory_order_release); /* update head with release */
+        return true;
     }
     else {
-        return false; /* element NOT placed in the buffer */
+        return false; /* buffer full */
     }
 }
 /*..........................................................................*/
 bool RingBuf_get(RingBuf * const me, RingBufElement *pel) {
-    RingBufCtr tail = me->tail;
-    if (me->head != tail) { /* ring buffer NOT empty? */
+    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
+    RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire); /* acquire before accessing head */
+    if (head != tail) { /* buffer NOT empty? */
         *pel = me->buf[tail];
         ++tail;
         if (tail == me->end) {
             tail = 0U;
         }
-        me->tail = tail; /* update the tail to a *valid* index */
+        atomic_store_explicit(&me->tail, tail, memory_order_release); /* update tail with release */
         return true;
     }
     else {
-        return false;
+        return false; /* buffer empty */
     }
 }
 /*..........................................................................*/
 RingBufCtr RingBuf_num_free(RingBuf * const me) {
-    RingBufCtr head = me->head;
-    RingBufCtr tail = me->tail;
+    RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire); /* acquire for consistency */
+    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
     if (head == tail) { /* buffer empty? */
         return (RingBufCtr)(me->end - 1U);
     }
@@ -89,13 +91,14 @@ RingBufCtr RingBuf_num_free(RingBuf * const me) {
 
 /*..........................................................................*/
 void RingBuf_process_all(RingBuf * const me, RingBufHandler handler) {
-    RingBufCtr tail = me->tail;
-    while (me->head != tail) { /* ring buffer NOT empty? */
+    RingBufCtr tail = atomic_load_explicit(&me->tail, memory_order_relaxed);
+    RingBufCtr head = atomic_load_explicit(&me->head, memory_order_acquire); /* acquire for processing */
+    while (head != tail) { /* buffer NOT empty? */
         (*handler)(me->buf[tail]);
         ++tail;
         if (tail == me->end) {
             tail = 0U;
         }
-        me->tail = tail; /* update the tail to a *valid* index */
+        atomic_store_explicit(&me->tail, tail, memory_order_release); /* update tail */
     }
 }

src/ring_buf.h

Lines changed: 7 additions & 4 deletions
@@ -31,6 +31,10 @@
 #ifndef RING_BUF_H
 #define RING_BUF_H
 
+#include <stdatomic.h>
+#include <stdint.h>
+#include <stdbool.h>
+
 /*! Ring buffer counter/index
  *
  * @attention
@@ -66,12 +70,11 @@ typedef uint8_t RingBufElement;
 typedef struct {
     RingBufElement *buf; /*!< pointer to the start of the ring buffer */
     RingBufCtr end; /*!< offset of the end of the ring buffer */
-    RingBufCtr head; /*!< offset to where next el. will be inserted */
-    RingBufCtr tail; /*!< offset of where next el. will be removed */
+    _Atomic(RingBufCtr) head; /*!< atomic offset to where next element will be inserted */
+    _Atomic(RingBufCtr) tail; /*!< atomic offset of where next element will be removed */
 } RingBuf;
 
-void RingBuf_ctor(RingBuf * const me,
-                  RingBufElement sto[], RingBufCtr sto_len);
+void RingBuf_ctor(RingBuf * const me, RingBufElement sto[], RingBufCtr sto_len);
 RingBufCtr RingBuf_num_free(RingBuf * const me);
 bool RingBuf_put(RingBuf * const me, RingBufElement const el);
 bool RingBuf_get(RingBuf * const me, RingBufElement *pel);

test/Makefile

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ endif
 endif
 
 clean :
-	-$(RM) $(BIN_DIR)/*.*
+	-$(RM) -rf $(BIN_DIR)
 
 show :
 	@echo PROJECT = $(PROJECT)
