//===- AIEX.td ---------------------------------------------*- tablegen -*-===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// (c) Copyright 2019 Xilinx Inc.
//
//===----------------------------------------------------------------------===//
#ifndef AIEX_OPS
#define AIEX_OPS
include "aie/Dialect/AIE/IR/AIEAttrs.td"
include "aie/Dialect/AIE/IR/AIEInterfaces.td"
include "mlir/IR/OpBase.td"
include "mlir/IR/AttrTypeBase.td"
include "mlir/IR/EnumAttr.td"
include "mlir/IR/BuiltinAttributes.td"
include "mlir/IR/SymbolInterfaces.td"
include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/Interfaces/DataLayoutInterfaces.td"
include "mlir/IR/BuiltinTypeInterfaces.td"
include "mlir/IR/CommonAttrConstraints.td"
def AIEX_Dialect : Dialect {
let name = "aiex";
let cppNamespace = "::xilinx::AIEX";
let description = [{
This is a dialect for experimental work related to AIEngine processors.
The expectation is that new ideas can be developed here before migration
to the more mature AIE dialect.
}];
let extraClassDeclaration = [{
}];
let useDefaultTypePrinterParser = 1;
}
//===----------------------------------------------------------------------===//
// AIEX Attributes
//===----------------------------------------------------------------------===//
include "aie/Dialect/AIEX/IR/AIEXAttrs.td"
//===----------------------------------------------------------------------===//
// AIEX Types
//===----------------------------------------------------------------------===//
def AIEX_BlockFloatingPointType : TypeDef<AIEX_Dialect, "BlockFloat", [
MemRefElementTypeInterface,
DeclareTypeInterfaceMethods<DataLayoutTypeInterface>
]> {
let summary = "AIEX type representing a block floating point type.";
let mnemonic = "bfp";
let description = [{
This is a type representing a block floating point.
It is meant to eventually be lowered into a standard type further down the pipeline.
In the meantime, it can be used for blocked fp related dataflow adaptations.
Available types are v8bfp16ebs8 and v16bfp16ebs16.
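As an illustrative sketch (the function name and memref shape are hypothetical; only the
`!aiex.bfp<...>` spelling follows from the mnemonic and assembly format of this type), the type
might be used as a memref element type like this:

```mlir
// Hypothetical signature: a buffer of 16 blocks of the v8bfp16ebs8 format.
func.func @bfp_example(%arg0: memref<16x!aiex.bfp<"v8bfp16ebs8">>) {
  return
}
```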
}];
let parameters = (
ins StringRefParameter<"">:$block_type
);
let assemblyFormat = "`<` $block_type `>`";
let extraClassDeclaration = [{
int getBlockSize() const;
int getMantissaBits() const;
int getExponentBits() const;
int getSubtileShiftBits() const;
using BlockFormat = struct BlockFormat {
int blockSize;
int mantissaBits;
int exponentBits;
int subtileShiftBits;
};
std::optional<BlockFormat> getBlockFormat() const;
static std::optional<BlockFormat> getBlockFormat(llvm::StringRef blockType);
uint64_t getTotalSizeInBits() const;
}];
let extraClassDefinition = [{
int $cppClass::getBlockSize() const { return getBlockFormat().value().blockSize; }
int $cppClass::getMantissaBits() const { return getBlockFormat().value().mantissaBits; }
int $cppClass::getExponentBits() const { return getBlockFormat().value().exponentBits; }
int $cppClass::getSubtileShiftBits() const { return getBlockFormat().value().subtileShiftBits; }
std::optional<$cppClass::BlockFormat> $cppClass::getBlockFormat() const {
return getBlockFormat(getBlockType());
}
}];
let genVerifyDecl = 1;
}
//===----------------------------------------------------------------------===//
// AIEX Operations
//===----------------------------------------------------------------------===//
class AIEX_Op<string mnemonic, list<Trait> traits = []> :
Op<AIEX_Dialect, mnemonic, traits>;
def AIE_GetTileOp: AIEX_Op<"getTile", []>, Results<(outs Index:$result)> {
let arguments = (
ins Index:$col,
Index:$row
);
let summary = "Get a reference to an AIE tile";
let description = [{
Return a reference to an AIE tile, given the column and the row of the tile.
}];
let assemblyFormat = [{ `(` $col `,` $row `)` attr-dict }];
}
def AIE_ConnectionOp: AIEX_Op<"connection", []> {
let arguments = (
ins Index:$source,
WireBundle:$sourceBundle,
AIEI32Attr:$sourceChannel,
Index:$dest,
WireBundle:$destBundle,
AIEI32Attr:$destChannel
);
let summary = "A logical circuit-switched connection between cores";
let description = [{
The "aiex.connection" operation represents a circuit-switched connection between two endpoints, usually
"aie.core" operations. During routing, this is replaced by "aie.connect" operations which represent
the programmed connections inside a switchbox, along with "aie.wire" operations which represent
physical connections between switchboxes and other components. Note that while "aie.flow" operations
can express partial routes between tiles, this is not possible with "aiex.connection" operations.
Example:
%22 = aie.tile(2, 2)
%c22 = aie.core(%22)
%11 = aie.tile(1, 1)
%c11 = aie.core(%11)
aiex.connection(%c22, "Core" : 0, %c11, "Core" : 1)
}];
let assemblyFormat = [{
`(` $source `,` $sourceBundle `:` $sourceChannel `,` $dest `,` $destBundle `:` $destChannel `)` attr-dict
}];
let extraClassDeclaration = [{
int sourceIndex() { return getSourceChannel(); }
int destIndex() { return getDestChannel(); }
}];
}
def AIE_MulticastOp: AIEX_Op<"multicast", [SingleBlockImplicitTerminator<"AIE::EndOp">]> {
let arguments = (
ins Index:$tile,
WireBundle:$bundle,
AIEI32Attr:$channel
);
let regions = (region AnyRegion:$ports);
let summary = "An abstraction of multicast";
let description = [{
An abstraction of multicast: one source port feeding several destination ports.
During place and route, it will be replaced by multiple flows.
Example:
```
%70 = aie.tile(7, 0)
%73 = aie.tile(7, 3)
%74 = aie.tile(7, 4)
%63 = aie.tile(6, 3)
%64 = aie.tile(6, 4)
aiex.multicast(%70, "DMA" : 0){
aiex.multi_dest<%73, "DMA" : 0>
aiex.multi_dest<%74, "DMA" : 0>
aiex.multi_dest<%63, "DMA" : 0>
aiex.multi_dest<%64, "DMA" : 0>
}
```
}];
let assemblyFormat = [{ `(` $tile `,` $bundle `:` $channel `)` regions attr-dict }];
let hasVerifier = 1;
let extraClassDeclaration = [{
int channelIndex() { return getChannel(); }
AIE::Port port() { return {getBundle(), channelIndex()}; }
}];
}
def AIE_MultiDestOp: AIEX_Op<"multi_dest", [HasParent<"MulticastOp">]> {
let arguments = (
ins Index:$tile,
WireBundle:$bundle,
AIEI32Attr:$channel
);
let summary = "A destination port of multicast flow";
let description = [{
An object representing the destination of a multicast flow. This must exist
within an [aiex.multicast] operation. There can be multiple destinations within an
[aiex.multicast] operation.
See [aiex.multicast] for an example.
}];
let assemblyFormat = [{
`<` $tile `,` $bundle `:` $channel `>` attr-dict
}];
let extraClassDeclaration = [{
int channelIndex() { return getChannel(); }
AIE::Port port() { return {getBundle(), channelIndex()}; }
}];
}
def AIE_BroadcastPacketOp: AIEX_Op<"broadcast_packet", [SingleBlockImplicitTerminator<"AIE::EndOp">]> {
let arguments = (
ins Index:$tile,
WireBundle:$bundle,
AIEI32Attr:$channel
);
let regions = (region AnyRegion:$ports);
let summary = "Combination of broadcast and packet-switch";
let description = [{
An abstraction of broadcast and packet-switched flow. During place and
route, it will be replaced by packet-switched flow and further replaced
by MasterSets and PacketRules inside switchboxes.
Example:
```
%70 = aie.tile(7, 0)
%73 = aie.tile(7, 3)
%74 = aie.tile(7, 4)
%63 = aie.tile(6, 3)
%64 = aie.tile(6, 4)
aiex.broadcast_packet(%70, "DMA" : 0){
aiex.bp_id(0x0){
aiex.bp_dest<%73, "DMA" : 0>
aiex.bp_dest<%63, "DMA" : 0>
}
aiex.bp_id(0x1){
aiex.bp_dest<%74, "DMA" : 0>
aiex.bp_dest<%64, "DMA" : 0>
}
}
```
}];
let assemblyFormat = [{ `(` $tile `,` $bundle `:` $channel `)` regions attr-dict }];
let hasVerifier = 1;
let extraClassDeclaration = [{
int channelIndex() { return getChannel(); }
AIE::Port port() { return {getBundle(), channelIndex()}; }
}];
}
def AIE_BPIDOp: AIEX_Op<"bp_id", [SingleBlockImplicitTerminator<"AIE::EndOp">]> {
let arguments = (ins AIEI8Attr:$ID);
let regions = (region AnyRegion:$ports);
let summary = "A set of packets that share the same ID";
let description = [{
A set of destination packets that share the same source and ID. This must exist
within an [aiex.broadcast_packet] operation.
See [aiex.broadcast_packet] for an example.
}];
let assemblyFormat = [{ `(` $ID `)` regions attr-dict }];
let extraClassDeclaration = [{
int IDInt() { return getID(); }
}];
}
def AIE_BPDestOp: AIEX_Op<"bp_dest", [HasParent<"BPIDOp">]> {
let arguments = (
ins Index:$tile,
WireBundle:$bundle,
AIEI32Attr:$channel
);
let summary = "A destination port";
let description = [{
An object representing the destination of a broadcast packet. This must exist
within an [aiex.bp_id] operation.
See [aiex.broadcast_packet] for an example.
}];
let assemblyFormat = [{
`<` $tile `,` $bundle `:` $channel `>` attr-dict
}];
let extraClassDeclaration = [{
int channelIndex() { return getChannel(); }
AIE::Port port() { return {getBundle(), channelIndex()}; }
}];
}
def AIE_TokenOp: AIEX_Op<"token", [Symbol]> {
let summary = "Declare a token (a logical lock)";
let description = [{
This operation creates a logical lock. We use Symbol so that it can be referenced globally.
Unlike physical locks, logical locks are unlimited, and we can specify any integer value
associated with a lock. The logical lock is used to manually specify the dependence of tasks, or
core executions.
The operation can also be generated automatically if the Dependence Analysis can be leveraged.
Example:
aiex.token(0) {sym_name = "token0"} // Declare token0 with initial value of 0
...
aiex.useToken @token0("Acquire", 0) // acquire token0 if its value is 0
...
aiex.useToken @token0("Release", 5) // release token0 and set its value to 5
}];
let arguments = (ins AIEI32Attr:$value);
let assemblyFormat = [{ `(` $value `)` attr-dict }];
let extraClassDeclaration = [{
int getTokenValue() { return getValue(); }
}];
}
def AIE_UseTokenOp: AIEX_Op<"useToken", []> {
let summary = "acquire/release a logical lock";
let description = [{
This operation uses a token (logical lock). A logical lock can be acquired or released with a value.
Similar to UseLockOp, this operation can be understood as a "blocking" op.
}];
let arguments = (
ins FlatSymbolRefAttr:$tokenName,
AIEI32Attr:$value,
LockAction:$action
);
let assemblyFormat = [{ $tokenName `(` $action `,` $value `)` attr-dict }];
let hasVerifier = 1;
let extraClassDeclaration = [{
bool acquire() { return (getAction() == AIE::LockAction::Acquire); }
bool release() { return (getAction() == AIE::LockAction::Release); }
int getTokenValue() { return getValue(); }
}];
}
def AIE_MemcpyOp: AIEX_Op<"memcpy", []> {
let summary = "A memcpy op";
let description = [{
This operation defines a logical data transfer from a buffer in a source tile to another buffer
in a destination tile.
This operation should be lowered to Mem ops with DMA setup and Flow ops for routing data from
the source tile to the destination tile.
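A minimal sketch of the textual form, derived from the assembly format of this operation. All SSA
names, the token name and the numeric values are hypothetical; `%t11` and `%t22` stand for tiles
and `%buf0` and `%buf1` for buffers defined elsewhere:

```mlir
// Hypothetical token guarding the transfer.
aiex.token(0) {sym_name = "token0"}
// acqValue = 1, relValue = 2; source and destination both use offset 0 and length 256.
aiex.memcpy @token0(1, 2) (%t11 : <%buf0, 0, 256>, %t22 : <%buf1, 0, 256>) : (memref<256xi32>, memref<256xi32>)
```

The first parenthesized pair carries the acquire and release token values; each `<buffer, offset, length>`
triple describes one side of the transfer.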
}];
let arguments = (
ins FlatSymbolRefAttr:$tokenName,
AIEI32Attr:$acqValue,
AIEI32Attr:$relValue,
Index:$srcTile,
AnyMemRef:$srcBuf,
AIEI32Attr:$srcOffset,
AIEI32Attr:$srcLen,
Index:$dstTile,
AnyMemRef:$dstBuf,
AIEI32Attr:$dstOffset,
AIEI32Attr:$dstLen
);
let assemblyFormat = [{
$tokenName `(` $acqValue `,` $relValue `)` `(`
$srcTile `:` `<` $srcBuf `,` $srcOffset `,` $srcLen `>` `,`
$dstTile `:` `<` $dstBuf `,` $dstOffset `,` $dstLen `>` `)`
attr-dict `:` `(` type($srcBuf) `,` type($dstBuf) `)`
}];
let extraClassDeclaration = [{
int getAcquireTokenValue() { return getAcqValue(); }
int getReleaseTokenValue() { return getRelValue(); }
int getSrcOffsetValue() { return getSrcOffset(); }
int getDstOffsetValue() { return getDstOffset(); }
int getSrcLenValue() { return getSrcLen(); }
int getDstLenValue() { return getDstLen(); }
}];
}
/// Experimental Herd operations
def AIE_HerdOp: AIEX_Op<"herd", []>, Results<(outs Index)> {
let summary = "Declare a herd, which is a bundle of cores organized in a rectangular shape";
let description = [{
This operation creates a group of AIE tiles in a 2D shape.
Example:
%herd0 = aiex.herd[1][1] // a single AIE tile. location unknown
%herd1 = aiex.herd[4][1] // a row of four AIE tiles
The operation can be used in place of a TileOp, in case we want to select a group of
hardware entities (cores, mems, switchboxes) instead of an individual entity, and we don't want to
specify their locations just yet. This can be useful if we want to generate parameterizable
code (the column and row values are parameterized).
Example:
%herd = aiex.herd[2][2] // a herd of 2x2 AIE tiles
aie.core(%herd) {
// all the cores belonging to this herd run the same code
}
}];
let arguments = (
ins AIEI32Attr:$width,
AIEI32Attr:$height
);
let extraClassDeclaration = [{
int getHerdWidth() { return getWidth(); }
int getHerdHeight() { return getHeight(); }
int getNumAIETiles() { return getHerdWidth() * getHerdHeight(); }
mlir::StringAttr name() {
if (auto attr = getOperation()->getAttrOfType<mlir::StringAttr>(
mlir::SymbolTable::getSymbolAttrName()))
return attr;
emitOpError("does not have '")
<< mlir::SymbolTable::getSymbolAttrName() << "' attribute specified";
llvm::report_fatal_error("couldn't get name");
}
}];
let assemblyFormat = [{ `[` $width `]` `[` $height `]` attr-dict }];
let builders = [
OpBuilder<(ins "int":$width, "int":$height),
[{
build($_builder, $_state, $_builder.getIndexType(),
$_builder.getI32IntegerAttr(width),
$_builder.getI32IntegerAttr(height));
}]>
];
}
def AIE_PlaceOp: AIEX_Op<"place", []> {
let summary = "A place operation that specifies the relative placement (XY) of one herd to another";
let description = [{
A place operation that specifies the relative placement (XY) of one herd to another.
}];
let arguments = (
ins Index:$sourceHerd,
Index:$destHerd,
AIEI32Attr:$distX,
AIEI32Attr:$distY
);
let assemblyFormat = [{ `(` $sourceHerd `,` $destHerd `,` $distX `,` $distY `)` attr-dict }];
let extraClassDeclaration = [{
int getDistXValue() { return getDistX(); }
int getDistYValue() { return getDistY(); }
}];
}
def AIE_RouteOp: AIEX_Op<"route", []> {
let summary = "A route operation that routes one herd to another";
let description = [{
A route operation that routes one herd to another.
}];
let arguments = (
ins Index:$sourceHerds,
WireBundle:$sourceBundle,
AIEI32Attr:$sourceChannel,
Index:$destHerds,
WireBundle:$destBundle,
AIEI32Attr:$destChannel
);
let assemblyFormat = [{
`(` `<` $sourceHerds `,` $sourceBundle `:` $sourceChannel `>` `,`
`<` $destHerds `,` $destBundle `:` $destChannel `>` `)` attr-dict
}];
let extraClassDeclaration = [{
int getSourceChannelValue() { return getSourceChannel(); }
int getDestChannelValue() { return getDestChannel(); }
}];
}
def AIE_IterOp: AIEX_Op<"iter", []>, Results<(outs Index)> {
let summary = "An iter operation";
let description = [{
This operation generates index values that can be used with the SelectOp to select a group of tiles
from a herd.
Example:
%iter0 = aiex.iter(0, 15, 1) // 0, 1, 2, ... , 14
%iter1 = aiex.iter(2, 8, 2) // 2, 4, 6
}];
let arguments = (
ins AIEI32Attr:$start,
AIEI32Attr:$end,
AIEI32Attr:$stride
);
let assemblyFormat = [{ `(` $start `,` $end `,` $stride `)` attr-dict }];
let extraClassDeclaration = [{
int getStartValue() { return getStart(); }
int getEndValue() { return getEnd(); }
int getStrideValue() { return getStride(); }
}];
let builders = [
OpBuilder<(ins "int":$start, "int":$end, "int":$stride), [{
build($_builder, $_state, $_builder.getIndexType(),
$_builder.getI32IntegerAttr(start),
$_builder.getI32IntegerAttr(end),
$_builder.getI32IntegerAttr(stride));
}]>
];
}
def AIE_SelectOp: AIEX_Op<"select", []>, Results<(outs Index)> {
let summary = "A select operation";
let description = [{
This operation selects a group of tiles based on the selected indices.
Example:
%herd = aiex.herd[4][4] // a herd of 4x4 tiles
%ix = aiex.iter(0, 4, 1) // 0, 1, 2, 3
%iy = aiex.iter(0, 1, 1) // 0
%sub_herd = aiex.select(%herd, %ix, %iy)
The SelectOp in the above example will select the tiles %herd[0][0], %herd[1][0],
%herd[2][0], %herd[3][0] (the first column of the herd).
}];
let arguments = (
ins Index:$startHerd,
Index:$iterX,
Index:$iterY
);
let assemblyFormat = [{ `(` $startHerd `,` $iterX `,` $iterY `)` attr-dict }];
let builders = [
OpBuilder<(ins "mlir::Value":$startHerd, "mlir::Value":$iterX, "mlir::Value":$iterY), [{
build($_builder, $_state, $_builder.getIndexType(),
startHerd, iterX, iterY);
}]>
];
}
// NOTE: runtime_sequence operation has been moved to the AIE dialect (AIEOps.td)
// Use aie.runtime_sequence instead of aiex.runtime_sequence
def AIE_ConfigureOp: AIEX_Op<"configure", [
HasParent<"AIE::RuntimeSequenceOp">,
NoTerminator
]>
{
let summary = "Set up a configuration (program memories, stream switches, etc.) on the NPU device.";
let arguments = (
ins FlatSymbolRefAttr:$symbol
);
let assemblyFormat = [{
$symbol regions attr-dict
}];
let extraClassDeclaration = [{
AIE::DeviceOp getReferencedDeviceOp();
}];
let regions = (region
AnyRegion:$body
);
let hasVerifier = 1;
}
def AIE_RunOp: AIEX_Op<"run", [HasParent<"ConfigureOp">]> {
let arguments = (
ins FlatSymbolRefAttr:$runtime_sequence_symbol,
Variadic<AnyType>:$args
);
let assemblyFormat = [{
$runtime_sequence_symbol `(` $args `)` `:` `(` type($args) `)` attr-dict
}];
let extraClassDeclaration = [{
AIE::DeviceOp getCalleeDeviceOp();
AIE::RuntimeSequenceOp getCalleeRuntimeSequenceOp();
}];
let summary = "Execute a runtime sequence";
let description = [{
Executes an `aie.runtime_sequence` with the given name and arguments by inlining its instructions at the call site.
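A minimal sketch of the textual form, derived from the assembly formats of `aiex.configure` and
`aiex.run`. The symbols `@other_device` and `@seq`, the value `%arg0` and its type are hypothetical:

```mlir
// The configure op must itself be nested in a runtime sequence.
aiex.configure @other_device {
  // Run the runtime sequence @seq of the referenced device, passing %arg0.
  aiex.run @seq(%arg0) : (memref<16xi32>)
}
```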
}];
}
def AIE_NpuDmaMemcpyNdOp: AIEX_Op<"npu.dma_memcpy_nd", [
AttrSizedOperandSegments,
MyOffsetSizeAndStrideOpInterface
]> {
let summary = "half DMA operator";
let description = [{
An n-dimensional half DMA operator.
Programs a DMA to access a memory `memref` with an access pattern specified by `offsets`,
`sizes` and `strides` or `static_offsets`, `static_sizes` and `static_strides`. The operator
references the target DMA coordinates (`x`, `y`) and channel through the `metadata`
symbol and specifies a descriptor `id` to be used, which will become the `bd_id` to be used
when lowered further. The `issue_token` attribute specifies whether the execution of this
operation should issue a token which can be received and read for synchronization purposes.
This `issue_token` attribute is set to `false` by default for `MM2S` for backward compatibility
and **is always set to true for** `S2MM` channels.
The burst length attribute specifies the burst length in bytes for the DMA operation. A value
of 0 indicates that the burst length is not specified and the maximal burst length is used.
#### `metadata` -- Specifying Tile, Channel, Direction and Linking a `dma_memcpy_nd` to its Other Half
The `metadata` attribute must point to a symbol referencing a
[`aie.shim_dma_allocation` operation](AIEDialect.html#aiedma_bd-xilinxaiedmabdop).
The tile coordinates of the DMA to configure, the channel number and the direction (`MM2S` or `S2MM`) are taken from this operation.
To connect the DMA to its other half (i.e. a `MM2S` DMA to its receiving end and a `S2MM` to the sending end),
the user must configure a flow (`aie.flow`) between the tile and channel referenced in the `aie.shim_dma_allocation` and the corresponding other end.
When using ObjectFIFOs, the `aie.shim_dma_allocation` operations and the `aie.flows` are generated automatically.
The symbol of the `aie.objectfifo` (create) operation can be used directly in `metadata` in this case.
#### Notes on Synchronization and Reusing Buffer Descriptor IDs
When the `dma_memcpy_nd` operation executes, it immediately reprograms the buffer descriptor with ID `bd_id` on tile (`x`, `y`), even if that buffer descriptor is currently executing.
Without proper synchronization, this inevitably leads to nondeterministic results.
Programming a buffer descriptor that is not currently executing is harmless.
Thus, the first `dma_memcpy_nd` call for each `bd_id` requires no synchronization.
However, if you wish to later re-use a `bd_id` on the same tile, you must wait for the previous buffer descriptor to complete.
The `sync` or `dma_wait` operations can be used for this.
`sync` blocks until it receives a _task completion token_ (TCT).
To properly synchronize, you must thus configure your BD to issue a TCT using the `issue_token` attribute, then wait on that token before reusing the BD.
`dma_wait` is a convenience operation that lowers to the corresponding `sync` operation for the referenced symbol.
Note that if you have multiple concurrently running BDs and you can reason one BD will always complete after all others, it is not strictly necessary to issue and wait on the TCT for every BD.
For example, if you have input and output BDs on the shim, and you know the cores will only push output onto the output BD after the input BDs have completed, it may be sufficient to synchronize only on the output BD before reusing input BDs.
#### Data Layout Transformations
The `sizes` and `strides` attributes describe a data layout transformation to be performed by the DMA.
These transformations are described in more depth in the documentation for the
[`aie.dma_bd` operation](AIEDialect.html#aiedma_bd-xilinxaiedmabdop).
Note that the syntax here differs from that of the `dma_bd` operation:
sizes and strides are given as separate arrays instead of tuples.
The `offsets` array is used to calculate a static offset into the memref.
Each offset in the array is understood in relation to the shape of the memref;
the lowest-dimension `offset` is a direct offset in units of memref element type, and the higher dimensions are multiplied by the size of the memref in those dimensions.
Note that this is for convenience of the user only.
The hardware only supports a single static offset, and this offset is calculated at compile time.
Thus, all offsets can be equivalently expressed with the lowest dimension only.
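As a worked illustration of this arithmetic, here is a sketch assuming a row-major
`memref<32x64xi32>`. The value `%arg0`, the symbol `@in0`, the BD ids and the sizes and strides are
hypothetical:

```mlir
// Offsets [0, 0, 1, 2] select row 1, column 2, i.e. element 1 * 64 + 2 = 66.
aiex.npu.dma_memcpy_nd(%arg0[0, 0, 1, 2][1, 1, 4, 8][0, 0, 64, 1]) {id = 0 : i64, metadata = @in0} : memref<32x64xi32>
// The same starting element, expressed in the lowest dimension only.
aiex.npu.dma_memcpy_nd(%arg0[0, 0, 0, 66][1, 1, 4, 8][0, 0, 64, 1]) {id = 1 : i64, metadata = @in0} : memref<32x64xi32>
```

Under that assumption both operations start at the same element, because the hardware only sees
the single linearized offset.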
#### Automatic Linearization of Contiguous Accesses
A canonicalization pattern automatically folds a contiguous row-major access pattern into
the canonical linear form `[s3, 1, 1, N][st3, 0, 0, 1]`, where N is the product of the
inner three sizes. An access is contiguous when the innermost stride (d0, listed last) is 1
and each outer stride equals the product of the inner sizes (i.e. a standard row-major scan).
This means users can express naturally multidimensional accesses such as a 2D image
`[1, 1, height, width][0, 0, width, 1]` or a 3D activation tensor
`[1, H, W, C][0, W*C, C, 1]` without worrying about hardware dimension size limits.
The compiler will fold them to the linear form, which uses a wider hardware register
and avoids the 10-bit d0 wrap-size constraint that applies to ND transfers.
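As a sketch of what this fold is expected to do, based only on the canonical form stated above
(`%arg0`, `@in0` and the shapes are hypothetical):

```mlir
// As written by the user: a contiguous 32x64 row-major scan.
aiex.npu.dma_memcpy_nd(%arg0[0, 0, 0, 0][1, 1, 32, 64][0, 0, 64, 1]) {id = 0 : i64, metadata = @in0} : memref<32x64xi32>
// Expected linear form after the fold: N = 1 * 32 * 64 = 2048.
aiex.npu.dma_memcpy_nd(%arg0[0, 0, 0, 0][1, 1, 1, 2048][0, 0, 0, 1]) {id = 0 : i64, metadata = @in0} : memref<32x64xi32>
```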
#### Packet Header Attribute
The optional `packet` attribute defines the packet header and packet type that gets issued per DMA BD.
If the attribute is set, then every time the DMA BD gets issued, a packet header is generated prior to the transmission of data.
The packet header is used to guide arbitration throughout a packet-routed data flow, where each switch box arbitrates the data packet to stream to a successor based on the packet header.
}];
let arguments = (
ins AnyRankedOrUnrankedMemRef:$memref,
// NOTE: these are in reverse order: offset3, offset2, ...
Variadic<I64>:$offsets,
Variadic<I64>:$sizes,
Variadic<I64>:$strides,
ConfinedAttr<DenseI64ArrayAttr, [DenseArrayCount<4>]>:$static_offsets,
ConfinedAttr<DenseI64ArrayAttr, [DenseArrayCount<4>]>:$static_sizes,
ConfinedAttr<DenseI64ArrayAttr, [DenseArrayCount<4>]>:$static_strides,
OptionalAttr<PacketInfoAttr>:$packet,
SymbolRefAttr:$metadata,
I64Attr:$id,
DefaultValuedOptionalAttr<BoolAttr, "false">:$issue_token,
DefaultValuedOptionalAttr<I64Attr, "0">:$d0_zero_before,
DefaultValuedOptionalAttr<I64Attr, "0">:$d1_zero_before,
DefaultValuedOptionalAttr<I64Attr, "0">:$d2_zero_before,
DefaultValuedOptionalAttr<I64Attr, "0">:$d0_zero_after,
DefaultValuedOptionalAttr<I64Attr, "0">:$d1_zero_after,
DefaultValuedOptionalAttr<I64Attr, "0">:$d2_zero_after,
DefaultValuedOptionalAttr<I64Attr, "0">:$burst_length,
// if set, the aiex.parameter that will override the BD's address
OptionalAttr<FlatSymbolRefAttr>:$offset_parameter
);
let assemblyFormat = [{
`(` $memref ``
custom<DynamicIndexList>($offsets, $static_offsets) ``
custom<DynamicIndexList>($sizes, $static_sizes) ``
custom<DynamicIndexList>($strides, $static_strides) ``
(`,` `packet` `=` $packet^)? `)`
attr-dict `:` type($memref)
}];
let extraClassDeclaration = [{
static unsigned getOffsetSizeAndStrideStartOperandIndex();
static std::array<unsigned, 3> getArrayAttrMaxRanks();
/* Returns the data transfer offset in bytes, i.e. the first N bytes of the
target buffer will be skipped. In the IR, offsets are expressed in units
of memref element data type size. */
int64_t getOffsetInBytes();
bool isLinearTransferWithoutTransformation();
/* Returns the bitwidth of the type inside the memref through a
call to DataLayout */
uint64_t getElementTypeBitwidth();
}];
let extraClassDefinition = [{
unsigned $cppClass::getOffsetSizeAndStrideStartOperandIndex() { return 1; }
std::array<unsigned, 3> $cppClass::getArrayAttrMaxRanks() { return {4, 4, 4}; }
uint64_t $cppClass::getElementTypeBitwidth() {
DataLayout dataLayout = DataLayout::closest(*this);
return dataLayout.getTypeSizeInBits(getMemref().getType().getElementType());
}
}];
let hasVerifier = 1;
let hasCanonicalizer = 1;
}
def AIE_NpuDmaWaitOp: AIEX_Op<"npu.dma_wait", []> {
let summary = "Blocking operation to wait for a DMA to complete execution.";
let description = [{
The NpuDmaWaitOp blocks until the DMA referenced through `symbol` completes execution
and issues a task-complete-token (TCT).
`symbol` is a reference to a `aie.shim_dma_allocation`, which contains information about the column, channel and channel direction on which to wait for a TCT.
The `aie.shim_dma_allocation` may be generated from an ObjectFIFO, in which case you can directly pass the ObjectFIFO symbol reference.
`npu.dma_wait` will be lowered to the corresponding `npu.sync` operation using the information from `symbol`.
Example:
```mlir
...
aie.objectfifo @out0(%tile_0_1, {% raw %}{%tile_0_0}{% endraw %}, 4 : i32) : !aie.objectfifo<memref<32x32xi32>>
...
aiex.npu.dma_memcpy_nd(%arg2[1, 1, 0, 0][1, 1, 32, 32][1, 1, 64, 1]) {id = 0 : i64, issue_token = true, metadata = @out0} : memref<32x64xi32>
...
aiex.npu.dma_wait { symbol = @out0 }
```
Here, we have an objectfifo with symbol name `out0`, which is then referenced in the
`npu.dma_memcpy_nd` operation as the target for the respective DMA operation. Afterwards,
an `npu.dma_wait` operation references the same symbol to block until the respective DMA
has executed all of its tasks.
}];
let arguments = (
ins FlatSymbolRefAttr:$symbol
);
let assemblyFormat = [{
attr-dict
}];
let hasVerifier = 1;
}
// Write RTP
def AIE_NpuWriteRTPOp: AIEX_Op<"npu.rtp_write", []> {
let summary = "Write a runtime parameter (RTP) value";
let arguments = (
ins FlatSymbolRefAttr:$buffer,
UI32Attr:$index,
I32Attr:$value
);
let results = (outs );
let assemblyFormat = [{ `(` $buffer `,` $index `,` $value `)` attr-dict
}];
let description = [{
Writes `value` at position `index` of the runtime parameter (RTP) buffer referenced by the symbol `buffer`.
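A minimal sketch of the textual form, derived from the assembly format. The symbol `@rtp` and the
numbers are hypothetical:

```mlir
// Write the value 50 at index 0 of the buffer named @rtp.
aiex.npu.rtp_write(@rtp, 0, 50)
```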
}];
}
// Push BD to Queue
def AIE_NpuPushQueueOp: AIEX_Op<"npu.push_queue", []> {
let summary = "bd queue push operator";
let arguments = (
ins I32Attr:$column,
I32Attr:$row,
DMAChannelDir:$direction,
I32Attr:$channel,
BoolAttr:$issue_token,
I32Attr:$repeat_count,
I32Attr:$bd_id
);
let results = (outs );
let assemblyFormat = [{
`(` $column `,` $row `,` $direction `:` $channel `)` attr-dict
}];
let hasVerifier = 1;
let description = [{
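Pushes the buffer descriptor `bd_id` onto the queue of DMA channel `channel` (with direction
`direction`) on tile (`column`, `row`). `issue_token` selects whether a token is issued for the
queued task, and `repeat_count` gives its repeat count.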
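A minimal sketch of the textual form, derived from the assembly format. All values are
hypothetical:

```mlir
// Queue BD 3 on MM2S channel 0 of tile (0, 0), without issuing a token.
aiex.npu.push_queue(0, 0, MM2S : 0) {bd_id = 3 : i32, issue_token = false, repeat_count = 0 : i32}
```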
}];
}
// WRITE32
def AIE_NpuWrite32Op: AIEX_Op<"npu.write32", []> {
let summary = "write32 operator";
let arguments = (
ins UI32Attr:$address,
UI32Attr:$value,
OptionalAttr<FlatSymbolRefAttr>:$buffer,
OptionalAttr<I32Attr>:$column,
OptionalAttr<I32Attr>:$row
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
NPU write32 operator writes a 32-bit value to the AIE array.
If 'buffer' is present then 'address' is interpreted as an offset into the
aie.buffer with symbol name 'buffer'.
If 'column' and 'row' are present then 'address' is interpreted as an offset
into the memory space of aie.tile(column, row).
If 'buffer' is not present and 'column' and 'row' are not present then
'address' is interpreted as a full 32-bit address in the AIE array.
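
Example (a sketch of the three addressing forms described above, written in the attribute-dictionary syntax this op uses; the addresses, values, and the symbol `@buf` are illustrative assumptions):
```mlir
// Absolute form: 'address' is a full 32-bit address in the AIE array.
aiex.npu.write32 {address = 119300 : ui32, value = 1 : ui32}
// Tile-relative form: 'address' is an offset into the memory space of aie.tile(0, 2).
aiex.npu.write32 {address = 1024 : ui32, column = 0 : i32, row = 2 : i32, value = 1 : ui32}
// Buffer-relative form: 'address' is an offset into the aie.buffer named @buf (hypothetical symbol).
aiex.npu.write32 {address = 0 : ui32, buffer = @buf, value = 1 : ui32}
```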
}];
let extraClassDeclaration = [{
std::optional<uint32_t> getAbsoluteAddress();
}];
}
// MASKWRITE
def AIE_NpuMaskWrite32Op: AIEX_Op<"npu.maskwrite32", []> {
let summary = "Write a masked 32-bit value to the AIE array";
let arguments = (
ins UI32Attr:$address,
UI32Attr:$value,
UI32Attr:$mask,
OptionalAttr<FlatSymbolRefAttr>:$buffer,
OptionalAttr<I32Attr>:$column,
OptionalAttr<I32Attr>:$row
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
The NPU mask write32 operator writes a masked 32-bit value to the AIE array.
If 'buffer' is present then 'address' is interpreted as an offset into the
aie.buffer with symbol name 'buffer'.
If 'column' and 'row' are present then 'address' is interpreted as an offset
into the memory space of aie.tile(column, row).
If 'buffer' is not present and 'column' and 'row' are not present then
'address' is interpreted as a full 32-bit address in the AIE array.
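
Example (a sketch in the attribute-dictionary syntax this op uses; the address, mask, and value are illustrative assumptions):
```mlir
// Masked write at an absolute address in the AIE array.
aiex.npu.maskwrite32 {address = 119300 : ui32, mask = 255 : ui32, value = 1 : ui32}
// The same write expressed as an offset into the memory space of aie.tile(0, 2).
aiex.npu.maskwrite32 {address = 1024 : ui32, column = 0 : i32, row = 2 : i32, mask = 255 : ui32, value = 1 : ui32}
```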
}];
let extraClassDeclaration = [{
std::optional<uint32_t> getAbsoluteAddress();
}];
}
// BLOCKWRITE
def AIE_NpuBlockWriteOp: AIEX_Op<"npu.blockwrite", []> {
let summary = "blockwrite operator";
let arguments = (
ins UI32Attr:$address,
AnyMemRef:$data,
OptionalAttr<FlatSymbolRefAttr>:$buffer,
OptionalAttr<I32Attr>:$column,
OptionalAttr<I32Attr>:$row
);
let results = (outs );
let assemblyFormat = [{
`(` $data `)` attr-dict `:` type($data)
}];
let description = [{
The blockwrite operator writes the data from the memref 'data' to the AIE array.
If 'buffer' is present then 'address' is interpreted as an offset into the
aie.buffer with symbol name 'buffer'.
If 'column' and 'row' are present then 'address' is interpreted as an offset
into the memory space of aie.tile(column, row).
If 'buffer' is not present and 'column' and 'row' are not present then
'address' is interpreted as a full 32-bit address in the AIE array.
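
Example (a sketch derived from the assembly format of this op; it assumes the data is provided through a constant `memref.global`, and the symbol name, contents, and address are illustrative):
```mlir
// Constant data to be written (hypothetical symbol name and contents).
memref.global "private" constant @blockwrite_data : memref<4xi32> = dense<[1, 2, 3, 4]>
...
%data = memref.get_global @blockwrite_data : memref<4xi32>
// Write the four words starting at an absolute address in the AIE array.
aiex.npu.blockwrite(%data) {address = 118816 : ui32} : memref<4xi32>
```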
}];
let extraClassDeclaration = [{
std::optional<uint32_t> getAbsoluteAddress();
mlir::DenseIntElementsAttr getDataWords();
}];
}
// OP_SYNC
def AIE_NpuSyncOp: AIEX_Op<"npu.sync", []> {
let summary = "sync operator";
let arguments = (
ins I32Attr:$column,
I32Attr:$row,
I32Attr:$direction,
I32Attr:$channel,
I32Attr:$column_num,
I32Attr:$row_num
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
The sync operation blocks execution of the instruction stream until a task-complete token (TCT) is received on `column`, `row`, channel `channel`, direction `direction` (where `0` is `S2MM` and `1` is `MM2S`).
#### Troubleshooting
If this operation appears to deadlock, ensure that at least one buffer descriptor is configured to issue a TCT on the channel you expect.
By default, `dma_memcpy_nd` operations only issue tokens for `S2MM` channels, and `issue_token` must be set to `true` to issue tokens for `MM2S` channels.
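
#### Example
A sketch in the attribute-dictionary syntax this op uses. The coordinates and channel are illustrative assumptions, and the values chosen for `column_num` and `row_num` are assumptions as well, since their meaning is not described here.
```mlir
// Block until a TCT arrives for S2MM (direction = 0) channel 0 on tile (0, 0).
aiex.npu.sync {column = 0 : i32, row = 0 : i32, direction = 0 : i32, channel = 0 : i32, column_num = 1 : i32, row_num = 1 : i32}
```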
}];
}
// XAIE_IO_CUSTOM_OP_BEGIN + 1 (address patch)
def AIE_NpuAddressPatchOp: AIEX_Op<"npu.address_patch", []> {
let summary = "address patch operator";
let arguments = (
ins UI32Attr:$addr,
I32Attr:$arg_idx,
I32Attr:$arg_plus
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
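Patches the address stored at `addr` with the runtime address of the host
buffer passed as argument `arg_idx`, offset by `arg_plus`.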
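
Example (a sketch in the attribute-dictionary syntax this op uses; the address, argument index, and offset are illustrative assumptions):
```mlir
// Patch the location at address 118816 using argument 0 with an offset of 0.
aiex.npu.address_patch {addr = 118816 : ui32, arg_idx = 0 : i32, arg_plus = 0 : i32}
```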
}];
}
def AIE_NpuPreemptOp: AIEX_Op<"npu.preempt", []> {
let summary = "Preempt transaction operation";
let arguments = (
ins UI8Attr:$level
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
Yield to higher priority task(s). Indicates to the transaction processor that the instruction stream can be interrupted at this point.
Levels:
- 0: Noop.
- 1: Mem tile.
- 2: AIE tile.
- 3: AIE registers.
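
Example (a sketch in the attribute-dictionary syntax this op uses; the chosen level is an illustrative assumption):
```mlir
// Mark a point where the instruction stream may be interrupted, using level 1 (mem tile).
aiex.npu.preempt {level = 1 : ui8}
```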
}];
}
// XAIE_IO_CREATE_SCRATCHPAD (opcode 10)
def AIE_NpuCreateScratchpadOp: AIEX_Op<"npu.create_scratchpad", []> {
let summary = "Create a control code scratchpad memory region";
let arguments = (
ins UI32Attr:$size,
DefaultValuedOptionalAttr<UI8Attr, "0">:$usage_type
);
let results = (outs );
let assemblyFormat = [{
attr-dict
}];
let description = [{
Create a scratchpad memory for data exchange between the host's main memory and the NPU command processor and copy its contents to the command processor's memory.
When the runtime (XRT) observes that this instruction is present in the runtime sequence, it will allocate a scratchpad memory of the specified size on the host.
When the command processor firmware executes this instruction, it copies the data in the runtime-allocated scratchpad region from the host's main memory to the NPU command processor's memory.
From there, you can write values from the copy of the scratchpad memory in the command processor to arbitrary locations in the NPU (with restrictions) via the `npu.update_from_scratchpad` op.
The flow of data therefore looks like this:
```
               create_scratchpad                               update_from_scratchpad
[Host memory] -------------------> [Command processor memory] ------------------------> [NPU partition memory/registers]
      ^               |
      |               |
      -----------------
        XRT sees inst.
        and allocates
```
To get a handle on the allocated scratchpad memory from XRT, use the `run.get_ctrl_scratchpad_bo()` method on the `xrt::run` object in your host application (`test.cpp`).
An example can be found in `test/npu-xrt/scratchpad_regwrite`.
The `usage_type` attribute specifies the scratchpad layout; currently only a value of `0` is supported.
The `size` attribute specifies the size of the scratchpad in bytes.
The host's main memory address is patched into the instruction at runtime by XRT based on the `.ctrl.scratchpad` section in the ELF. The assembler (aiebu) generates patching information for this address when it encounters this opcode.
The scratchpad memory contains a `StateTable`, which the `npu.update_from_scratchpad` op indexes in units of 32-bit words. `StateTable` constraints: at most 32 entries and a total scratchpad size of at most 128 bytes.
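
Example (a sketch in the attribute-dictionary syntax this op uses; the size is an illustrative assumption chosen to stay within the 128-byte limit stated above):
```mlir
// Request a 64-byte scratchpad; 'usage_type' is omitted and takes its default value of 0.
aiex.npu.create_scratchpad {size = 64 : ui32}
```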
}];
let hasVerifier = 1;
}
// XAIE_IO_UPDATE_REG (opcode 12)
def AIE_NpuUpdateFromScratchpadOp: AIEX_Op<"npu.update_from_scratchpad", []> {
let summary = "Add a computed value to an 8-byte section of NPU memory from scratchpad state";
let arguments = (