Skip to content

Represent externref payloads in Literals#8391

Open
tlively wants to merge 9 commits intomainfrom
externref-literals
Open

Represent externref payloads in Literals#8391
tlively wants to merge 9 commits intomainfrom
externref-literals

Conversation

@tlively
Copy link
Member

@tlively tlively commented Feb 26, 2026

Literals could previously represent externalized internal references and
strings, but they could not represent externref values that actually
represent external references. Fix this by using the i32 field to hold
externref payloads in the case where they are not externalized
references or strings. Strings literals are already differentiated
because they have type (ref string), but that still leaves external
references and externalized internal references to be differentiated.
They were previously differentiated using the type field in GCData,
but that field was otherwise redundant wit the type field in the outer
Literal. Remove type from GCData and instead use the low bit of the
i32/shared_ptr union to mark external references. This depends
on shared_ptr<GCData> not using that low bit.

Literals could previously represent externalized internal references and
strings, but they could not represent externref values that actually
represent external references. Fix this by using the `i32` field to hold
externref payloads in the case where they are not externalized
references or strings. Strings literals are already differentiated
because they have type `(ref string)`, but that still leaves external
references and externalized internal references to be differentiated.
They were previously differentiated using the `type` field in GCData,
but that field was otherwise redundant wit the `type` field in the outer
Literal. Remove `type` from GCData and instead use the low bit of the
i32/shared_ptr<GCData> union to mark external references. This depends
on `shared_ptr<GCData>` not using that low bit.
src/literal.h Outdated
}
if (type.isMaybeShared(wasm::HeapType::any)) {
// This may be an extern string that was internalized to |any|. Undo
// This may be an extern string that was 9d to |any|. Undo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even have a cat to blame it on 😿

src/literal.h Outdated
//
// Externref payloads are also stored here, with their low bit set to
// differentiate an externref with a payload from an externalized internal
// reference, which uses the gcData field instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "here" mean the i32 field?

Why can we assume externrefs can be stored as i32? (or really i31 as you reserve one bit?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the optimizer, externrefs are totally opaque and only their identities matter, so i32s (or i31s, or i63s if we use the i64 field) are sufficient. For the spec tests, externrefs are conveniently already represented as values like ref.extern 42, so this is exactly what we need. For the fuzzer, I plan to import JS objects that look like { payload: 42 } and have fuzzing imports that can log those payloads. The fuzzing interpreter will similarly log the interpreted externref payloads.

Copy link
Member

@kripken kripken Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, then (1) let's document those limitations, but also (2) if spec tests have ref.extern 42 then don't we need to represent any integer? (i.e. we can't give up a bit, and should it be 32 or 64 bits?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it won't matter in practice. If we do encounter issues, we can just switch to using the i64 field of the union instead. (Or I could eagerly do that now, but I think using the i32 field is a more naturally first choice.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sgtm to use i32, but please document all this in this header.

@tlively tlively requested a review from kripken February 27, 2026 20:44
@tlively tlively enabled auto-merge (squash) February 27, 2026 20:58
int32_t getPayload() const {
assert(hasPayload());
return int32_t(uint32_t(i32) >> 1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps hasExternrefPayload / getExternrefPayload etc., to clarify this isn't a generic payload?

}
case HeapType::any:
// Internalized external reference.
new (&gcData) std::shared_ptr<GCData>(other.gcData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the code below that clarifies this is a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can expand the comment, but the gcData no longer has a type field for an assertion to look at. I could assert gcData.values[0].type.getHeapType().isMaybeShared(HeapType::ext), but I don't think it's worth it.

assert(data->values.size() == 1);
o << "internalized " << literal.getGCData()->values[0];
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change from printing a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyref literals are now either internalized strings or other internalized extern references. We don't always have a string to print.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but not hitting approve as there is automerge and I would fuzz this one...

@tlively tlively disabled auto-merge February 28, 2026 01:03
@tlively
Copy link
Member Author

tlively commented Feb 28, 2026

Feel free to disable my auto-merges! There should be a button for it.

@tlively
Copy link
Member Author

tlively commented Feb 28, 2026

Fuzzer looks happy.

@tlively tlively enabled auto-merge (squash) February 28, 2026 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants