@ottomorac
Contributor

@ottomorac ottomorac commented Oct 19, 2025

This PR addresses #200 and #201:

  • Includes normative statement about UTF-8 for DID parameters
  • Removes the ASCII-only normative statement from the individual DID parameters (except versionTime, which in my view should probably remain ASCII-only)


@jandrieu
Contributor

jandrieu commented Dec 4, 2025

I think it is important not to remove the % encoding language, and possibly the UTF-8 language is also misleading.

Given that we are still following RFC3986, which says that when a character that is not part of the unreserved ASCII set is to be included in a URI, its UTF-8 byte sequence is first determined, and then each byte in that sequence is percent-encoded.

So, technically, a UTF-8 string would break current parsing.
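
The RFC 3986 rule described above can be sketched in Python as a minimal illustration. The helper name `percent_encode` is hypothetical and does not come from any DID spec. It keeps unreserved ASCII characters as they are and percent-encodes every UTF-8 byte of everything else, which is why a raw UTF-8 string is not a valid serialized value.

```python
import string
from urllib.parse import quote

# Unreserved characters per RFC 3986 §2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

def percent_encode(value: str) -> str:
    """Hypothetical helper: percent-encode a parameter value per RFC 3986."""
    out = []
    for ch in value:
        if ch in UNRESERVED:
            out.append(ch)  # unreserved ASCII is copied unchanged
        else:
            # Determine the UTF-8 byte sequence, then encode each byte as %HH
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

encoded = percent_encode("Zürich")
print(encoded)  # Z%C3%BCrich
# The standard library gives the same result when no characters are marked safe
assert encoded == quote("Zürich", safe="")
```

Because this helper encodes a single data value, it also encodes reserved characters such as "/" (as %2F). That is the usual choice for a value placed inside a query parameter.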

It may be that this wholesale change is the wrong layer for whatever we're aiming for here.

@w3cbot

w3cbot commented Dec 4, 2025

This was discussed during the #did meeting on 04 December 2025.


w3c/did-resolution#217

<ottomorac> Prefer UTF-8 for DID Parameters #217

ottomorac: related to an issue raised by Addison Phillips as part of the i18n review

<ottomorac> Also appreciate reviews on 219, 248, 253


@w3cbot

w3cbot commented Dec 4, 2025

This was discussed during the #did meeting on 04 December 2025.


w3c/did-resolution#217

JoeAndrieu: I just added a comment. It was the language that removed the percent-encoding bit that caught my eye.
… I don't think that we can remove it, because if we use UTF-8 in a URL, it will get percent-encoded.

pchampin: is percent-encoding our responsibility?

JoeAndrieu: as we are defining URL parameters, which are part of a URL, they need to be percent-encoded

manu: there are 3 different things here
… I agree with JoeAndrieu that things will need to be percent-encoded; we should not re-define it, but refer to the RFC

<ottomorac1> Addison talked about RFC3987: w3c/did-resolution#200 (comment)

manu: another point is: are we talking about the value space, which needs to support UTF-8, or the lexical space, where percent-encoding is required?
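
The value-space vs. lexical-space distinction can be illustrated with a round trip. The following is a minimal sketch, assuming a DID parameter whose value is an arbitrary Unicode string. The helpers `to_lexical` and `to_value` are hypothetical names. The value space holds the Unicode string itself, and the lexical space holds its ASCII-only, percent-encoded serialization.

```python
from urllib.parse import quote, unquote

def to_lexical(value: str) -> str:
    """Serialize a value-space string: UTF-8 encode, then percent-encode."""
    return quote(value, safe="", encoding="utf-8")

def to_value(lexical: str) -> str:
    """Parse a lexical-space string back into its Unicode value."""
    # errors="strict" rejects percent-encoded bytes that are not valid UTF-8
    return unquote(lexical, encoding="utf-8", errors="strict")

value = "東京"                  # value space: any Unicode text
lexical = to_lexical(value)     # lexical space: "%E6%9D%B1%E4%BA%AC"
assert lexical.isascii()
assert to_value(lexical) == value
```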

<Zakim> TallTed, you wanted to check whether we're staying with RFC 3986/7 or moving to WHAT WG's URL spec

TallTed: RFC3986 and RFC3987 travel together; they don't really make sense without each other
… another question is: are we sticking to these RFCs, or moving to the WHATWG's definition of URL

<ottomorac1> w3c/did-resolution#200 (comment)

TallTed: (mostly the same, but slightly different)

<Zakim> JoeAndrieu, you wanted to say I see the fix

JoeAndrieu: my concerns have actually been addressed
… there is a global statement stating that everything must be serialized
… I got distracted by the title of the issue; with TallTed's clarification, the text seems fine
… I think we should talk about the issue raised by TallTed; the VC WG has moved to the WHATWG's definition of URL

manu: now reading Addison's commentary, it is aligned with what I said.
… He calls it "logical" vs "serialized", what I called "value space" vs "lexical space"
… He says we don't make the distinction and we should make it.
… He says "don't constrain systems to be ASCII only", talking about the value space
… There are two encoding algorithms; we need to be careful about which one we refer to.
… We need to make sure this change does not also affect DID Core; I'm looking at it right now.

ottomorac: Will pointed out that DID parameters were dropped from DID Core 1.1.

manu: is that right?

JoeAndrieu: if that's right, it feels weird to me.

manu: the DID URL section defines parameters; it talks mostly about the lexical space
… we should be good.
… I agree with JoeAndrieu that it would have been wrong to completely remove them.
… This now raises the question of DID Resolution re-stating something that DID Core states more cleanly.

JoeAndrieu: looking at the differences between RFC3986 and RFC3987. They are not many, but they may be significant.
… RFC3987 aims to support international characters that are not supported by RFC3986.
… We could switch to RFC3987. There are questions about similar-looking characters, but I don't think that's our issue.

manu: going back to the PR. I think it is fine as is.
… I don't think we need to change anything.
… JoeAndrieu would you agree?

JoeAndrieu: it addresses my concern with percent-encoding.
… But there is a shift about international characters.
… The shift to RFC3987 would allow characters that were not allowed before.
… We need to think about whether it would break implementations.
… Allowing international characters is a good thing as long as we don't introduce security or privacy problems.

manu: trying to get closure on this. §3.1 in RFC3987 describes how to map an IRI to a URI.

JoeAndrieu: I was looking at the ABNF in §2.2; 'ucschar' is a class of new characters.
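
For readers following the ABNF, the `ucschar` production in RFC 3987 §2.2 is a set of Unicode code-point ranges. The sketch below transcribes those ranges into a hypothetical `is_ucschar` predicate. It operates on code points, not on UTF-8 bytes.

```python
# Code-point ranges of the 'ucschar' production, RFC 3987 §2.2
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFEF),
    # Planes 1 through 13: each plane minus its last two code points (xFFFE, xFFFF)
    *[(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xE)],
    (0xE1000, 0xEFFFD),
]

def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch is in the ucschar class."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)

print(is_ucschar("ü"))       # True  (U+00FC)
print(is_ucschar("a"))       # False (ASCII is covered by RFC 3986's own classes)
print(is_ucschar("\uFDD0"))  # False (noncharacter, excluded from ucschar)
```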

manu: ucschar is not UTF-8, but can be converted to UTF-8.
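
The conversion to UTF-8 mentioned here is the IRI-to-URI mapping of RFC 3987 §3.1. Each character outside ASCII is converted to UTF-8, and each resulting byte is percent-encoded. ASCII characters, including delimiters such as `:`, `?`, `=` and `&`, are left alone. The following is a simplified sketch that assumes the input is already a valid IRI held as a Python (Unicode) string, so it skips the spec's normalization step for non-Unicode sources. The function name and the `name` query parameter are hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Map a valid IRI to a URI (simplified RFC 3987 §3.1)."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII, including URI delimiters, is copied unchanged
        else:
            # ucschar/iprivate: UTF-8 encode, then percent-encode each byte
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(iri_to_uri("did:example:123?name=Zürich"))
# did:example:123?name=Z%C3%BCrich
```

Unlike a per-value encoder, this mapping leaves reserved delimiters intact, because it converts a whole identifier rather than a single parameter value.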
… I think we are fine with the PR as is.

JoeAndrieu: +1


@wip-abramson
Contributor

Reviewing the minutes, this looks good to merge.

@swcurran
Contributor

While not part of this PR (which I'm fine to merge), Line 324, immediately above the first change, is incorrect, as the relativeRef query parameter is not percent-encoded as required. Probably best to update that outside this PR, but it should be fixed.
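
To illustrate the relativeRef point, the following sketch builds a DID URL query with each parameter value percent-encoded, so that a relative reference containing `/` and `#` does not collide with URL delimiters. The input values are illustrative and are not quoted from line 324. `build_did_url` is a hypothetical helper.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

def build_did_url(did: str, params: Mapping[str, str]) -> str:
    """Append a query whose parameter values are percent-encoded (RFC 3986)."""
    # quote_via=quote with urlencode's default safe="" also encodes "/" and "#",
    # so the fragment-like part of relativeRef stays inside the query value.
    return f"{did}?{urlencode(params, quote_via=quote)}"

url = build_did_url("did:example:123",
                    {"service": "agent", "relativeRef": "/credentials#degree"})
print(url)
# did:example:123?service=agent&relativeRef=%2Fcredentials%23degree
```

If it were left unencoded, the `#` would instead be read as the start of the DID URL's fragment.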

wip-abramson added the "discuss" label (Needs further discussion before a pull request can be created) on Dec 16, 2025
6 participants