feat: introduce `ListType<T>` type and attribute #121

dshaaban01 · 2025-04-04T14:20:58Z

This PR introduces the container type `ListType<T>` and the attribute `ListAttr<T>`, which represent lists of values of type T, where T can be an atomic or parameterized type.

ingomueller-net

Nice progress! A few comments, most of them minor and some optional.

ingomueller-net · 2025-04-07T14:09:51Z

include/substrait-mlir/Dialect/Substrait/IR/SubstraitAttrs.td

+  let description = [{
+    This attribute represents a list of atomic attributes, parameterized with a ListType.
+  }];
+  let parameters = (ins "mlir::ArrayAttr":$value, "ListType":$type);


Don't we typically use Substrait_ListType here?

Oh no, wait: I'm pretty sure that ListType is the syntax I should use in this file, not Substrait_ListType. I did the same thing for my other attributes that reference types, e.g. VarCharAttr has a parameter of type VarCharType, NOT Substrait_VarCharType.

And the other syntax doesn't work? Maybe we've been doing it wrong the whole time? I don't have the details in my head and haven't checked, so maybe there is a reason why we are doing it this way...

https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/#parameters

Substrait_ListType is not an AttrParameter or TypeParameter, but rather a TypeDef. Therefore, based on my understanding of what the above link says, we have to use the raw C++ type string instead, which in this case would be ListType.

(The other syntax does not work, by the way; I tried.)

OK, cool, thanks for confirming.

ingomueller-net · 2025-04-07T14:11:09Z

include/substrait-mlir/Dialect/Substrait/IR/SubstraitAttrs.td

+def Substrait_ListAttr : Substrait_Attr<"List", "list", [TypedAttrInterface]> {
+  let summary = "Substrait list attribute";
+  let description = [{
+    This attribute represents a list of atomic attributes, parameterized with a ListType.


Is the restriction to atomic element types intentional? If so, we should probably add a TODO (here and to the type if it has the same restriction). Note that we can have "lists of lists" and "structs of lists of structs of lists" etc. in Substrait.

Will add a TODO. I think it's better to lift the restriction in a separate PR.

ingomueller-net · 2025-04-07T14:12:51Z

include/substrait-mlir/Dialect/Substrait/IR/SubstraitTypes.td

+  let description = [{
+    This type represents a list type.
+  }];
+  let parameters = (ins Substrait_AtomicType:$type);


Same here: if the restriction to atomic types is intentional, then add a TODO. Adding support for arbitrary nesting of lists and structs is probably not trivial and it may be a good strategy to do it in two steps.
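To make the two-step idea concrete, here is a minimal, self-contained C++ sketch. It does not use MLIR: `Type`, `atomic`, `listOf`, `elementOf`, and the predicates are hypothetical stand-ins, and the set of atomic type names is made up for illustration. Step one mirrors the current atomic-only restriction; step two shows what a recursive check for nested lists could look like.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a Substrait type: atomic, or a list of an element
// type. The real code would use MLIR types instead.
struct Type {
  std::string name;                    // e.g. "si32" for atomics, "list" for lists
  std::shared_ptr<const Type> element; // non-null only for list types
  bool isList() const { return element != nullptr; }
};

Type atomic(std::string name) { return Type{std::move(name), nullptr}; }

Type listOf(Type element) {
  return Type{"list", std::make_shared<Type>(std::move(element))};
}

const Type &elementOf(const Type &type) {
  assert(type.isList() && "only list types have an element type");
  return *type.element;
}

// Single place that names the allowed atomic types (illustrative set only).
bool isAtomic(const Type &type) {
  static const std::set<std::string> atomics = {"si1", "si32", "string"};
  return !type.isList() && atomics.count(type.name) > 0;
}

// Step 1: only lists whose element type is atomic are accepted.
bool isListOfAtomic(const Type &type) {
  return type.isList() && isAtomic(elementOf(type));
}

// Step 2: atomics, or lists (of lists ...) that bottom out in an atomic type.
bool isAtomicOrNestedList(const Type &type) {
  if (!type.isList())
    return isAtomic(type);
  return isAtomicOrNestedList(elementOf(type));
}
```

With these stand-ins, `listOf(listOf(atomic("si32")))` is rejected by `isListOfAtomic` but accepted by `isAtomicOrNestedList`. Keeping the atomic set in one predicate is what lets the second step reuse it unchanged.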

ingomueller-net · 2025-04-07T14:14:38Z

lib/Dialect/Substrait/IR/Substrait.cpp

+    }
+
+    if (typedAttr.getType() != expectedType) {
+      return emitError() << "Mismatched element type in ListAttr: expected "


Nits: error messages should start with a lowercase letter, and the attribute name should be single-quoted.

Suggested change

-      return emitError() << "Mismatched element type in ListAttr: expected "
+      return emitError() << "mismatched element type in 'ListAttr': expected "
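For illustration, the two element checks in this verifier can be sketched without MLIR as follows. `Attr` and `verifyListElements` are hypothetical stand-ins (the real code uses `mlir::TypedAttr` and `emitError()`), and because the diff truncates the mismatch message after "expected ", the sketch appends only the expected type name, which is an assumption. The messages follow the conventions from the review: lowercase start and single-quoted class name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an attribute: its printed form plus an optional
// type name. An empty `type` models an attribute that is not a typed attribute.
struct Attr {
  std::string text;
  std::optional<std::string> type;
};

// Returns std::nullopt if all elements are typed and match `expectedType`,
// otherwise the diagnostic message for the first offending element.
std::optional<std::string>
verifyListElements(const std::vector<Attr> &values,
                   const std::string &expectedType) {
  assert(!expectedType.empty() && "expected element type must be known");
  for (const Attr &attr : values) {
    // Check 1: every element must carry a type.
    if (!attr.type)
      return "'ListAttr' values must be typed attributes, but got: " +
             attr.text;
    // Check 2: the element type must equal the list's element type.
    if (*attr.type != expectedType)
      return "mismatched element type in 'ListAttr': expected " + expectedType;
  }
  return std::nullopt;
}
```

Returning the message instead of printing it keeps the sketch testable; in the actual verifier the message would be streamed into the diagnostic returned by `emitError()`.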

ingomueller-net · 2025-04-07T14:14:57Z

lib/Dialect/Substrait/IR/Substrait.cpp

+    auto typedAttr = mlir::dyn_cast<mlir::TypedAttr>(attr);
+    if (!typedAttr) {
+      return emitError()
+             << "ListAttr values must be typed attributes, but got: " << attr;


Nit: single-quoted class name.

Suggested change

-             << "ListAttr values must be typed attributes, but got: " << attr;
+             << "'ListAttr' values must be typed attributes, but got: " << attr;

ingomueller-net · 2025-04-07T14:24:02Z

include/substrait-mlir/Dialect/Substrait/IR/SubstraitTypes.td

+  }];
+  let parameters = (ins Substrait_AtomicType:$type);
+  let assemblyFormat = [{ `<` $type `>` }];
+  let genVerifyDecl = 1;


I think it is possible, and almost certainly preferable, to verify this using TableGen. Take a look at TupleOf in CommonTypeConstraints.td upstream. (NestedTupleOf will then be helpful as inspiration for nested Substrait structs, lists, and maps...). One problem with the current approach is that this file doesn't show which types can be used; in addition, we now have to maintain two lists in two different files, which risk getting out of sync at some point.

ingomueller-net · 2025-04-07T14:28:02Z

lib/Target/SubstraitPB/Export.cpp

  // `IntegerType`s.
  if (auto intType = dyn_cast<IntegerType>(literalType)) {
    if (!intType.isSigned())
-      op->emitOpError("has integer value with unsupported signedness");


Why this change (here and two times below)?

Maybe the intention was to only emit an error message below in exportOperation(LiteralOp). However, we have more information here and I think it's better to emit the more precise message here (and none below).

So I created a function called exportAttribute(Attribute value) that contains all of the logic that was previously in exportOperation(LiteralOp).

Then I altered exportOperation(LiteralOp op) to look like this.

    FailureOr<std::unique_ptr<Expression>>
    SubstraitExporter::exportOperation(LiteralOp op) {
      // Build `Literal` message depending on type.
      Attribute value = op.getValue();
      auto literal = exportAttribute(value);
      if (failed(literal))
        return op->emitOpError("has unsupported value");

      // Build `Expression` message.
      auto expression = std::make_unique<Expression>();
      expression->set_allocated_literal(literal->release());
      return expression;
    }

Since exportAttribute(Attribute value) does not have access to the op, I can't emit an error message via the op. I'm also not sure how to attach an error string to the failure() result. I wanted to talk to you about this.

Ah, I see. Three patterns come to mind: (1) use InFlightDiagnostic, but maybe you have the same problem of not having an op; (2) provide a Location loc argument and emit the error at that location; or (3) provide an emitError function argument and use that to emit the error.
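As a rough illustration of pattern (3), here is a self-contained C++ sketch that does not depend on MLIR. All names (`Value`, `EmitErrorFn`, `exportValue`, `exportLiteral`) and the string-based "serialization" are hypothetical stand-ins for `Attribute`, the error callback, `exportAttribute`, and `exportOperation(LiteralOp)`. The helper reports the precise error through a callback, while the caller, which owns the op context, decides how the diagnostic is attached.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for attribute values.
struct IntValue {
  int value;
  bool isSigned;
};
struct StringValue {
  std::string value;
};
using Value = std::variant<IntValue, StringValue>;

// Callback through which the helper reports errors without knowing the op.
using EmitErrorFn = std::function<void(const std::string &)>;

// Analogue of `exportAttribute`: returns the exported form, or std::nullopt
// after reporting a precise message through `emitError`.
std::optional<std::string> exportValue(const Value &value,
                                       const EmitErrorFn &emitError) {
  assert(emitError && "caller must provide an error callback");
  if (const auto *intValue = std::get_if<IntValue>(&value)) {
    if (!intValue->isSigned) {
      emitError("has integer value with unsupported signedness");
      return std::nullopt;
    }
    return "i:" + std::to_string(intValue->value);
  }
  return "s:" + std::get<StringValue>(value).value;
}

// Analogue of `exportOperation(LiteralOp)`: binds the op context into the
// callback, so the helper's message ends up attached to the op.
std::optional<std::string> exportLiteral(const std::string &opName,
                                         const Value &value,
                                         std::vector<std::string> &diagnostics) {
  auto emitError = [&](const std::string &message) {
    diagnostics.push_back("'" + opName + "' op " + message);
  };
  return exportValue(value, emitError);
}
```

Calling `exportLiteral("substrait.literal", Value{IntValue{1, false}}, diagnostics)` returns `std::nullopt` and records the signedness message prefixed with the op name, so the caller no longer needs a generic "has unsupported value" fallback. Pattern (2) would look similar, with a location argument taking the place of the callback.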

Okay, will do. And do you agree with the restructuring? Should I similarly implement an importAttribute function in Import.cpp?

OK, cool. Yeah, I think it's a good idea to cut this mega function into pieces :) Thanks!

ingomueller-net · 2025-04-07T14:28:56Z

lib/Target/SubstraitPB/Import.cpp

+    llvm::SmallVector<Attribute> listElements;
+    listElements.reserve(listType.values_size());
+    for (const Expression_Literal &element : listType.values()) {
+      // TODO: Create importAttribute function to avoid creating redundant


Is this TODO still up-to-date?

This TODO is to implement the same logic in Import.cpp as what I did in Export.cpp (i.e. the creation of exportAttribute(Attribute value) and the changes to exportOperation(LiteralOp)).

I did not want to implement it until we had spoken about your thoughts on my Export.cpp implementation.

ingomueller-net · 2025-04-07T14:30:21Z

test/Dialect/Substrait/literal.mlir

+    %0 = named_table @t1 as ["a"] : tuple<si1>
+    %1 = project %0 : tuple<si1> -> tuple<si1, !substrait.list<!substrait.fixed_binary<4>>> {
+    ^bb0(%arg : tuple<si1>):
+      %bytes = literal #substrait.list<[


Nit: variable name.

Suggested change

-      %bytes = literal #substrait.list<[
+      %list = literal #substrait.list<[

ingomueller-net · 2025-04-07T14:36:41Z

test/Dialect/Substrait/literal.mlir

+      %bytes = literal #substrait.list<[
+                        #substrait.fixed_binary<"8181">,
+                        #substrait.fixed_binary<"8181">,
+                        #substrait.fixed_binary<"8181">], !substrait.list<!substrait.fixed_binary<4>>>


There is a lot of redundancy in the assembly format but I think it's hard to reduce. Unfortunately, there is no built-in way to remove the #substrait. and !substrait. prefixes, for example. In the LLVM dialect, there is a work-around that parses type names nested in other LLVM types manually, so the !llvm. prefix can be omitted there. We could do something like that. What would be even more concise is if we could do something like #substrait.list<["8181", "8181", "8181"], list<fixed_binary<4>>> and use the type for the parsing of the values (potentially swapping the order of the values and the type). Both are probably quite a lift, and that "only" for the assembly format, so this is a clear candidate for skipping now (and potentially forever).

Yeah, it's super redundant, and I wasn't quite sure how to handle it. :( Should I maybe add a TODO to implement a printer/parser?

Yeah, TODO is definitely fine. As I said, all alternatives that come to my mind are significantly more complex to implement, which is a good reason not to do it, at least immediately.

dshaaban01 · 2025-05-02T09:46:35Z

Note to self: Final TODO for this PR, implement ListOf (similar to TupleOf).


dshaaban01 requested a review from ingomueller-net April 4, 2025 14:21

dshaaban01 changed the title ~~introduce ListType<T> type and attribute~~ [feat] introduce ListType<T> type and attribute Apr 4, 2025

dshaaban01 changed the title ~~[feat] introduce ListType<T> type and attribute~~ feat: introduce ListType<T> type and attribute Apr 4, 2025

ingomueller-net reviewed Apr 7, 2025


dshaaban01 added 12 commits May 2, 2025 11:59

introduce varchar type

86e0c17

OPTION: type inference + literal #substrait.var_char<"hello", 6> custom parsing/printing

fcfa671

introduce fixed_binary type and attribute

b56fe6b

introduce fixed_binary type

6a5844a

fix rebase

3293b40

ingo comments

0e4120e

introduce list type and attribute

5ca4ffc

ingo comments

f991a6c

add todo

75b3f8a

inflight diagnostics fix

82447b4

clang

ac24aac

refactor importLiteral into importAttribute

3eaa9cf

dshaaban01 force-pushed the dalia/compound-types/list branch from bf5d254 to 3eaa9cf Compare May 2, 2025 09:59

ingomueller-net mentioned this pull request May 13, 2025

feat: implement real relation type #78

Merged
