11# Architecture
22
3+ All drawings were created with draw.io; the source document is stored in the `doc` folder.
4+
35## Interfaces & Communication Protocol
46
57The core connects its modules internally following the AMBA philosophy. AMBA proposes a simple way to connect
@@ -136,6 +138,7 @@ Features:
136138- Direct-mapped placement policy
137139- Parametrizable cache depth
138140- Parametrizable cache line width
141+ - Parametrizable number of outstanding requests
139142- Software-based flush control with FENCE.i instruction
140143- Transparent operation for user, no need of any kind of management
141144- Cache prefetch can be activated in the internal memory controller to enhance efficiency
@@ -174,6 +177,75 @@ addr = | tag | index | offset |
174177- tag: the remaining MSBs, the part helping to determine a cache hit/miss
175178
176179
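
The address split above can be sketched in a few lines. This is only an illustration for a direct-mapped cache, assuming byte addresses and power-of-two sizes; `split_address`, `line_bytes` and `depth` are hypothetical names and values, not parameters of the core.

```python
def split_address(addr, line_bytes=16, depth=512):
    """Split a byte address into (tag, index, offset).

    line_bytes (cache line width in bytes) and depth (number of lines)
    are illustrative values; both are assumed to be powers of two.
    """
    offset_bits = line_bytes.bit_length() - 1
    index_bits = depth.bit_length() - 1
    offset = addr & (line_bytes - 1)               # position inside the line
    index = (addr >> offset_bits) & (depth - 1)    # selects the cache line
    tag = addr >> (offset_bits + index_bits)       # remaining MSBs
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
```

A hit occurs when the tag stored at `index` matches `tag`; concatenating the three fields gives back the original address.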
180+ #### Data Cache
181+
182+ <p align="center"><img src="./assets/dCache-top.png"></p>
183+
184+ The data cache (dCache) relies on the same read flow as the iCache. The differences are that the dCache
185+ implements a write flow and manages read reordering.
186+
187+ Features:
188+
189+ - Direct-mapped placement policy
190+ - Write-through policy for write management
191+ - Parametrizable cache depth
192+ - Parametrizable cache line width
193+ - Parametrizable number of outstanding requests
194+ - IO Region configurable to manage uncachable requests
195+ - Cache prefetch can be activated in the internal memory controller to enhance efficiency
196+ - AXI4-lite slave interface to receive the read/write requests of the application
197+ - AXI4 master interface to read/write the system memory
198+
199+ ##### Write Path
200+
201+ <p align="center"><img src="./assets/dCache-pusher.png"></p>
202+
203+ The Pusher stage manages the write path, updating the cache blocks if the address to write is cached
204+ and issuing the write request to the memory. It can buffer a number of write requests to sustain
205+ performance, this number being configurable with a parameter. If a write request targets an IO
206+ region, the application indicates with AWCACHE that the request is not cachable and needs to be
207+ written directly to the system memory and not to the cache blocks.
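
The write flow described above can be modelled roughly as follows. This is a behavioral sketch, not the RTL: it assumes one word per cache line, no allocation on a write miss, and a `cachable` flag standing in for the AWCACHE attribute. All names are hypothetical.

```python
class WriteThroughModel:
    """Toy model of a direct-mapped, write-through cache (one word per line)."""

    def __init__(self, depth=4, line_bytes=4):
        self.depth = depth
        self.line_bytes = line_bytes
        self.tags = [None] * depth   # tag stored in each cache line
        self.data = [0] * depth      # word stored in each cache line
        self.memory = {}             # stands in for the system memory

    def _locate(self, addr):
        line = addr // self.line_bytes
        return line // self.depth, line % self.depth   # (tag, index)

    def fill(self, addr):
        """Emulate an earlier read miss that loaded the line."""
        tag, index = self._locate(addr)
        self.tags[index] = tag
        self.data[index] = self.memory.get(addr, 0)

    def write(self, addr, value, cachable=True):
        tag, index = self._locate(addr)
        if cachable and self.tags[index] == tag:
            self.data[index] = value   # keep the cached copy coherent
        self.memory[addr] = value      # write-through: memory is always written


cache = WriteThroughModel()
cache.fill(0x10)
cache.write(0x10, 0xAB)                  # hit: cache and memory updated
cache.write(0x20, 0xCD)                  # miss: memory only
cache.write(0x10, 0xEF, cachable=False)  # IO write: bypasses the cache blocks
```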
208+
209+ ##### Read Path
210+
211+ <p align="center"><img src="./assets/dCache-read-path.png"></p>
212+
213+ When the read path needs to manage IO region (uncachable) reads, it multiplexes the Block-Fetcher
214+ and IO-Fetcher modules based on the ARCACHE attribute. The IO-Fetcher is always serviced first when
215+ issuing requests to the memory controller.
216+
217+
218+ ##### Read Out-Of-Order Management
219+
220+ <p align="center"><img src="./assets/dCache-ooo.png"></p>
221+
222+ A read request can target either an IO region or a cachable region; the application needs to
223+ indicate this information with ARCACHE. The Block-Fetcher stage (same module as in the iCache) manages
224+ the read requests in the cache blocks, while the IO-Fetcher routes the IO requests directly to the
225+ memory through the memory controller. Because read completions can come back out of order, the latency
226+ of the cache blocks and of the memory being different, the dCache uses one more module. The OoO Manager
227+ module substitutes ARID to make it unique for each read request and uses these IDs to reorder the read
228+ data completions returned to the application. This stage can be deactivated if not necessary, i.e. if
229+ the application can manage the reordering by itself or if it doesn't target an IO region (the
230+ Block-Fetcher always completes requests in order).
231+
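
The reordering principle can be illustrated with a small software model. It is a sketch under assumptions, not the module's implementation: tags are unbounded integers here (hardware would use a bounded, wrapping ID), and `ReorderManager`, `issue` and `complete` are hypothetical names.

```python
from collections import deque


class ReorderManager:
    """Substitute each request ID with a unique tag and release data in issue order."""

    def __init__(self):
        self.next_tag = 0
        self.pending = deque()   # (tag, original_id), oldest request first
        self.completed = {}      # tag -> data received but not yet released

    def issue(self, original_id):
        """Register a read request and return the unique tag sent downstream."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending.append((tag, original_id))
        return tag

    def complete(self, tag, data):
        """Record a completion; return the completions that can now be released."""
        self.completed[tag] = data
        released = []
        while self.pending and self.pending[0][0] in self.completed:
            oldest_tag, original_id = self.pending.popleft()
            released.append((original_id, self.completed.pop(oldest_tag)))
        return released


manager = ReorderManager()
io_tag = manager.issue(original_id=7)      # slow IO read, issued first
block_tag = manager.issue(original_id=7)   # fast cached read, issued second
early = manager.complete(block_tag, "cached data")   # held back
late = manager.complete(io_tag, "io data")           # releases both, in order
```

The completion of the cached read arrives first but is only released once the older IO read has completed, and each completion is returned with the ID the application originally used.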
232+ The module also manages the data interface resizing, the cache block and memory interface being
233+ always wider than XLEN (32 or 64 bits).
234+
235+ ##### AXI4 Ordering Rules
236+
237+ AXI doesn't provide advanced ordering rules and instructs the user to first issue a sequence of
238+ writes, then a sequence of reads only once all write completions have been received (and vice versa).
239+ Internally, the cache could still be processing or waiting for write requests while the application is
240+ already able to issue a new series of R/W requests. The cache manages that situation by monitoring all
241+ read and write modules and blocking any situation that could lead to a read / write collision and
242+ data corruption.
243+
244+ However, the read and write paths always buffer requests in FIFOs, which avoids slowing down the
245+ application. Only the processing of the requests is stopped; the communication with
246+ the cache remains active as long as the FIFOs are not full.
247+
248+
177249### CSR Unit
178250
179251The core implements in a dedicated module the supported registers described in the ISA manual volume
@@ -192,6 +264,9 @@ The core implements the following CSR registers into the dedicated module:
192264- mepc (RW)
193265- mcause (RW)
194266- mtval (RW)
267+ - rdcycle (RO)
268+ - rdtime (RO)
269+ - rdinstret (RO)
195270
196271The following CSRs are available as a memory-mapped peripheral:
197272
@@ -220,25 +295,25 @@ The core always handles in its clock domain the interrupts by synchronizing them
220295FFDs.
221296
222297
223- ##### Register 0: MSIP Output [RW] - Address 0x0
298+ #### MSIP Output [RW] - Address 0x0
224299
225300Output software interrupt MSIP to trigger another core (1 bit)
226301
227- ##### Register 1: MTIME LSB [RW] - Address 0x4
302+ #### MTIME LSB [RW] - Address 0x4
228303
229- MTIME CSR, bits 0 to 31
304+ MTIME CSR, `bit 31:0`
230305
231- ##### Register 1: MTIME MSB [RW] - Address 0x8
306+ #### MTIME MSB [RW] - Address 0x8
232307
233- MTIME CSR, bits 32 to 63
308+ MTIME CSR, `bit 63:32`
234309
235- ##### Register 1: MTIMECMP LSB [RW] - Address 0xC
310+ #### MTIMECMP LSB [RW] - Address 0xC
236311
237- MTIMECMP CSR, bits 0 to 31
312+ MTIMECMP CSR, `bit 31:0`
238313
239- ##### Register 1: MTIMECMP MSB [RW] - Address 0x10
314+ #### MTIMECMP MSB [RW] - Address 0x10
240315
241- MTIMECMP CSR, bits 32 to 63
316+ MTIMECMP CSR, `bit 63:32`
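
Since MTIME is split across two 32-bit registers, software reading it on a 32-bit bus may observe a carry between the two accesses. A common way to handle this, sketched below, is to re-read the MSB and retry if it changed. `read32` is a hypothetical bus accessor and `FakeTimer` only stands in for the hardware; the offsets 0x4 and 0x8 are the ones listed above.

```python
MTIME_LSB = 0x4
MTIME_MSB = 0x8


def read_mtime(read32, base=0x0):
    """Return a consistent 64-bit MTIME value from two 32-bit reads."""
    while True:
        high = read32(base + MTIME_MSB)
        low = read32(base + MTIME_LSB)
        if read32(base + MTIME_MSB) == high:   # no carry occurred in between
            return (high << 32) | low


class FakeTimer:
    """Stand-in for the peripheral: time advances on every bus access."""

    def __init__(self, start):
        self.value = start

    def read32(self, addr):
        word = self.value >> 32 if addr == MTIME_MSB else self.value
        self.value += 1
        return word & 0xFFFFFFFF


timer = FakeTimer(start=0x1_FFFF_FFFF)   # the LSB is about to roll over
now = read_mtime(timer.read32)
```

Without the retry, the first pair of reads in this scenario would combine an old MSB with a new LSB and return a value roughly 2^32 cycles in the past.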
242317
243318
244319### IO Peripherals
@@ -251,11 +326,11 @@ an APB interconnect
251326
252327The GPIOs are exposed through two registers:
253328
254- ##### Register 0: Outputs [RW] - Address 0x0
329+ ##### OUTPUTS [RW] - Address 0x0
255330
256331XLEN wide general purpose outputs
257332
258- ##### Register 1: Inputs [RW] - Address 0x4
333+ ##### INPUTS [RW] - Address 0x4
259334
260335XLEN wide general purpose inputs
261336
@@ -266,70 +341,70 @@ Reading and writing a GPIOs' register is never blocking.
266341
267342The UART uses a few IOs:
268343
269- - rx : serial input, data from an external transmitter
270- - tx : serial output, data to an external receiver
271- - rts: back-pressure flag to indicate the core can't receive any more data
272- - cts: back-pressure flag to indicate the external receiver can't receive any more data
344+ - `rx`: serial input, data from an external transmitter
345+ - `tx`: serial output, data to an external receiver
346+ - `rts`: back-pressure flag to indicate the core can't receive any more data
347+ - `cts`: back-pressure flag to indicate the external receiver can't receive any more data
273348
274349The UART uses one FIFO to store data to transmit and another to store received data. If the FIFOs are
275350full, the UART can't receive any more data and raises the RTS flag, or can't transmit any more and
276351blocks the APB bus until the receiver deasserts its CTS flag.
277352
278- The UART owns a few registers. Any attempt to write to a read-only (RO) register or a reserved field
353+ The UART owns a few registers. Any attempt to write to a read-only (`RO`) register or a reserved field
279354has no effect and changes neither the register content nor the engine behavior. Read-write
280- (RW) registers can be written partially by properly setting the WSTRB signal. A read from a write-only
281- (WO) register is not guaranteed to return a value written previously.
355+ (`RW`) registers can be written partially by properly setting the WSTRB signal. A read from a write-only
356+ (`WO`) register is not guaranteed to return a value written previously.
282357
283358If a transfer (RX or TX) is active and the enable bit is set back to 0, the transfer will
284359terminate only after the complete frame transmission.
285360
286361
287- ##### Register 0: Control and Status [RW/RO] - Address 0x0
362+ ##### CONTROL AND STATUS [RW/RO] - Address 0x0
288363
289- - bit 0 : Enable the UART engine (both RX and TX) [RW]
290- - bit 1 : Loopback mode, all received data is stored in the RX FIFO and forwarded back to TX [RW]
291- - bit 2 : Enable parity bit [RW]
292- - bit 3 : 0 for even parity, 1 for odd parity [RW]
293- - bit 4 : 0 for one stop bit, 1 for two stop bits [RW]
294- - bit 7:5 : Reserved
295- - bit 8 : Busy flag, the UART engine is processing (RX or TX) [RO]
296- - bit 9 : TX FIFO is empty [RO]
297- - bit 10 : TX FIFO is full [RO]
298- - bit 11 : RX FIFO is empty [RO]
299- - bit 12 : RX FIFO is full [RO]
300- - bit 13 : UART RTS, flagging it can't receive any more data [RO]
301- - bit 14 : UART CTS, flagging it can't send any more data [RO]
302- - bit 15 : Parity error of the last RX transaction [RO]
303- - bit 31:16 : Reserved
364+ - `Bit 0`: Enable the UART engine (both RX and TX) [RW]
365+ - `Bit 1`: Loopback mode, all received data is stored in the RX FIFO and forwarded back to TX [RW]
366+ - `Bit 2`: Enable parity bit [RW]
367+ - `Bit 3`: 0 for even parity, 1 for odd parity [RW]
368+ - `Bit 4`: 0 for one stop bit, 1 for two stop bits [RW]
369+ - `Bit 7:5`: Reserved
370+ - `Bit 8`: Busy flag, the UART engine is processing (RX or TX) [RO]
371+ - `Bit 9`: TX FIFO is empty [RO]
372+ - `Bit 10`: TX FIFO is full [RO]
373+ - `Bit 11`: RX FIFO is empty [RO]
374+ - `Bit 12`: RX FIFO is full [RO]
375+ - `Bit 13`: UART RTS, flagging it can't receive any more data [RO]
376+ - `Bit 14`: UART CTS, flagging it can't send any more data [RO]
377+ - `Bit 15`: Parity error of the last RX transaction [RO]
378+ - `Bit 31:16`: Reserved
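
As an illustration of the bit layout above, the helper below decodes a raw value read from the register into named flags. The field names are invented for this sketch; only the bit positions come from the list.

```python
# Bit positions taken from the CONTROL AND STATUS description; names are illustrative.
STATUS_FLAGS = {
    "enable": 0,
    "loopback": 1,
    "parity_enable": 2,
    "parity_odd": 3,
    "two_stop_bits": 4,
    "busy": 8,
    "tx_fifo_empty": 9,
    "tx_fifo_full": 10,
    "rx_fifo_empty": 11,
    "rx_fifo_full": 12,
    "rts": 13,
    "cts": 14,
    "parity_error": 15,
}


def decode_status(value):
    """Map each named flag to True/False for a 32-bit register value."""
    return {name: bool((value >> bit) & 1) for name, bit in STATUS_FLAGS.items()}


flags = decode_status(0x0A01)   # engine enabled, both FIFOs empty
```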
304379
305380
306- ##### Register 1: UART Clock Divider [RW] - Address 0x4
381+ ##### UART CLOCK DIVIDER [RW] - Address 0x4
307382
308383The number of CPU core cycles to divide down to get the UART data bit rate (baud rate).
309384
310- - Bit 15:0 : Clock divider
311- - Bit 31:16 : Reserved
385+ - `Bit 15:0`: Clock divider
386+ - `Bit 31:16`: Reserved
312387
313388Updating the divider during an ongoing operation will very likely compromise the transfer integrity
314389and possibly make the UART engine unstable. The user is advised to configure the baud rate during
315390start-up and to make sure the engine is disabled before changing this value.
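
A possible way to derive the divider is shown below. It assumes the register holds the plain number of core cycles per bit (no minus-one encoding), which should be checked against the RTL; `uart_clock_divider` is a hypothetical helper.

```python
def uart_clock_divider(core_hz, baud):
    """Return the core-cycles-per-bit value for a given baud rate.

    Assumes the register stores the cycle count directly; the field is
    16 bits wide, so out-of-range values are rejected.
    """
    divider = round(core_hz / baud)
    if not 1 <= divider <= 0xFFFF:
        raise ValueError(f"divider {divider} does not fit in 16 bits")
    return divider


divider = uart_clock_divider(core_hz=50_000_000, baud=115_200)   # 434
```

With a 50 MHz core clock and 115200 baud, the rounded divider is 434, giving an actual rate of about 115207 baud.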
316391
317- ##### Register 2: TX FIFO [WO] - Address 0x8
392+ ##### TX FIFO [WO] - Address 0x8
318393
319394Push data into the TX FIFO. Writing to this register will block the APB write request if the TX FIFO
320395is full, until the engine transmits a new word.
321396
322- - Bit 7:0 : data to write
323- - Bit 31:8 : Reserved
397+ - `Bit 7:0`: data to write
398+ - `Bit 31:8`: Reserved
324399
325400
326- ##### Register 3: RX FIFO [RO] - Address 0xC
401+ ##### RX FIFO [RO] - Address 0xC
327402
328403Pull data from the RX FIFO. Reading from this register will block the APB read request if the FIFO
329404is empty, until the engine receives a new word.
330405
331- - Bit 7:0 : data ready to be read
332- - Bit 31:8 : Reserved
406+ - `Bit 7:0`: data ready to be read
407+ - `Bit 31:8`: Reserved
333408
334409Current limitations:
335410- only 8-bit wide data words are supported