Skip to content

Conversation

@dshaaban01
Copy link
Collaborator

This PR introduces the container type ListType<T> and ListAttr<T> which represent values of type T, where T can be an atomic or parameterized type.

@dshaaban01 dshaaban01 changed the title introduce ListType<T> type and attribute [feat] introduce ListType<T> type and attribute Apr 4, 2025
@dshaaban01 dshaaban01 changed the title [feat] introduce ListType<T> type and attribute feat: introduce ListType<T> type and attribute Apr 4, 2025
Copy link
Collaborator

@ingomueller-net ingomueller-net left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice progress! A few comments, most of them minor and some optional.

let description = [{
This attribute represents a list of atomic attributes, parameterized with a ListType.
}];
let parameters = (ins "mlir::ArrayAttr":$value, "ListType":$type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we typically use Substrait_ListType here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no wait: I'm pretty sure that ListType is the syntax I should use in this file, not Substrait_ListType. I did the same thing for my other attributes that referenced types, i.e. VarCharAttr has a parameter called VarCharType, NOT Substrait_VarCharType.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the other syntax doesn't work? Maybe we've been doing it wrong the whole time? I don't have the details in my head and haven't checked, so maybe there is a reason why we are doing it this way...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/#parameters

Substrait_ListType is not AttrParameter or TypeParameter, but rather a TypeDef. Therefore, based on my understanding of what the above link says, we have to use the "raw c++Type string instead", which in this case would be ListType.

(Other syntax does not work btw, I tried)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool, thanks for confirming.

def Substrait_ListAttr : Substrait_Attr<"List", "list", [TypedAttrInterface]> {
let summary = "Substrait list attribute";
let description = [{
This attribute represents a list of atomic attributes, parameterized with a ListType.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the restriction to atomic element types intentional? If so, we should probably add a TODO (here and to the type if it has the same restriction). Note that we can have "lists of lists" and "structs of lists of structs of lists" etc. in Substrait.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add TODO. I think better to do in a separate PR.

let description = [{
This type represents a list type.
}];
let parameters = (ins Substrait_AtomicType:$type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: if the restriction to atomic types is intentional, then add a TODO. Adding support for arbitrary nesting of lists and structs is probably not trivial and it may be a good strategy to do it in two steps.

}

if (typedAttr.getType() != expectedType) {
return emitError() << "Mismatched element type in ListAttr: expected "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits: error messages start with lower case and single-quoted attribute name.

Suggested change
return emitError() << "Mismatched element type in ListAttr: expected "
return emitError() << "mismatched element type in 'ListAttr': expected "

auto typedAttr = mlir::dyn_cast<mlir::TypedAttr>(attr);
if (!typedAttr) {
return emitError()
<< "ListAttr values must be typed attributes, but got: " << attr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: single-quoted class name.

Suggested change
<< "ListAttr values must be typed attributes, but got: " << attr;
<< "'ListAttr' values must be typed attributes, but got: " << attr;

}];
let parameters = (ins Substrait_AtomicType:$type);
let assemblyFormat = [{ `<` $type `>` }];
let genVerifyDecl = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible and most certainly preferrable to verify this using tablegen (more). Take a look at TupleOf in CommonTypeConstraints.td upstream. (NestedTupleOf will then be helpful as inspiration for nested Substrait structs, lists, and maps...). One problem here is that we don't see in this file which types can be used, plus we now have to maintain two lists in two different files, which risk to run out of sync at some point.

// `IntegerType`s.
if (auto intType = dyn_cast<IntegerType>(literalType)) {
if (!intType.isSigned())
op->emitOpError("has integer value with unsupported signedness");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change (here and two times below)?

Maybe the intention was to only emit an error message below in exportOperation(LiteralOp). However, we have more information here and I think it's better to emit the more precise message here (and none below).

Copy link
Collaborator Author

@dshaaban01 dshaaban01 Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I created a function called exportAttribute(Attribute value) that contains all of the logic that was previously in exportOperation(LiteralOp).

Then I altered exportOperation(LiteralOp op) to look like this.

FailureOr<std::unique_ptr<Expression>>
SubstraitExporter::exportOperation(LiteralOp op) {
  // Build `Literal` message depending on type.
  Attribute value = op.getValue();
  auto literal = exportAttribute(value);

  if (failed(literal))
    return op->emitOpError("has unsupported value");

  // Build `Expression` message.
  auto expression = std::make_unique<Expression>();
  expression->set_allocated_literal(literal->release());

  return expression;
}

Since exportAttribute(Attribute value) does not have access to the op, I can't emit an error message via the op. And not sure how to pass the error strings into failure() failures. Wanted to talk to you about this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Three patterns come to mind: (1) Use InFlightDiagnostics but maybe you have the same problem of not having an op; (2) provide a Location loc argument and emit the error at that location; or (3) provide an emitError function argument and use that to emit the error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will do. And do you agree with the restructuring? Should I similarly implement exportAttribute(Attribute value) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool. Yeah, I think it's a good idea to cut this mega function into pieces :) Thanks!

llvm::SmallVector<Attribute> listElements;
listElements.reserve(listType.values_size());
for (const Expression_Literal &element : listType.values()) {
// TODO: Create importAttribute function to avoid creating redundant
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO still up-to-date?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO is to implement the same logic in the import.cpp, as what I did in export.cpp (i.e. with the creation of exportAttribute(Attribute value) and augmentation of exportOperation(LiteralOp).)

Did not want to implement it yet until we spoke about you're thoughts on my export.cpp implementation

%0 = named_table @t1 as ["a"] : tuple<si1>
%1 = project %0 : tuple<si1> -> tuple<si1, !substrait.list<!substrait.fixed_binary<4>>> {
^bb0(%arg : tuple<si1>):
%bytes = literal #substrait.list<[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: variable name.

Suggested change
%bytes = literal #substrait.list<[
%list = literal #substrait.list<[

%bytes = literal #substrait.list<[
#substrait.fixed_binary<"8181">,
#substrait.fixed_binary<"8181">,
#substrait.fixed_binary<"8181">], !substrait.list<!substrait.fixed_binary<4>>>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of redundancy in the assembly format but I think it's hard to reduce. Unfortunately, there is no built-in way to remove the #substrait. and !substrait. prefixes, for example. In the LLVM dialect, there is a work-around that parses type names nested in other LLVM types manually, so the !llvm. prefix can be omitted there. We could do something like that. What would be even more concise is if we could do something like #substrait.list<["8181", "8181", "8181"], list<fixed_binary<4>>> and use the type for the parsing of the values (potentially swapping the order of the values and the type). Both are probably quite a lift, and that "only" for the assembly format, so this is a clear candidate for skipping now (and potentially forever).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it's super redundant, also wasn't quite sure how to handle. :( Should I maybe add a TODO to implement a printer/parser?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, TODO is definitely fine. As I said, all alternatives that come to my mind are significantly more complex to implement, which is a good reason not to do it, at least immediately.

@dshaaban01
Copy link
Collaborator Author

Note to self: Final TODO for this PR, implement ListOf (similar to TupleOf).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants