Prefer UTF-8 for DID Parameters #217

ottomorac · 2025-10-19T03:45:38Z

This PR addresses #200 and #201:

Includes normative statement about UTF-8 for DID parameters
Removes ASCII only normative statement from the individual DID parameters (except versionTime which should probably remain as ASCII only in my view)

jandrieu · 2025-12-04T16:26:07Z

I think it is important not to remove the percent-encoding language, and possibly the UTF-8 language is also misleading.

We are still following RFC3986, which says that when a character outside the unreserved ASCII set is to be included in a URI, its UTF-8 byte sequence is first determined, and then each byte in that sequence is percent-encoded.

So, technically, a UTF-8 string would break current parsing.
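To make the RFC3986 behaviour described above concrete, here is a minimal sketch that assumes Python's standard `urllib.parse.quote`. The helper name `percent_encode_utf8` is made up for illustration and is not taken from the spec or the PR.

```python
from urllib.parse import quote


def percent_encode_utf8(value: str) -> str:
    """Percent-encode every character outside RFC 3986's unreserved set.

    Hypothetical helper, for illustration only.
    """
    # safe="" keeps no reserved characters as-is. quote() never escapes the
    # unreserved set (letters, digits, "-", ".", "_", "~") and encodes the
    # remaining characters as UTF-8 bytes, one %HH triplet per byte.
    return quote(value, safe="", encoding="utf-8")


print(percent_encode_utf8("é"))          # %C3%A9 (two UTF-8 bytes, two triplets)
print(percent_encode_utf8("a-b_c.d~e"))  # a-b_c.d~e (unreserved, unchanged)
```

The output is ASCII only, which is why a raw, unencoded UTF-8 string is a different thing from what an RFC3986 parser expects to see.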

It may be that this wholesale change is the wrong layer for whatever we're aiming for here.

w3cbot · 2025-12-04T17:12:03Z

This was discussed during the #did meeting on 04 December 2025.

View the transcript

w3c/did-resolution#217

<ottomorac> Prefer UTF-8 for DID Parameters #217

ottomorac: related to an issue raised by Addison Phillips as part of the i18n review

<ottomorac> Also appreciate reviews on 219, 248, 253

w3cbot · 2025-12-04T17:12:08Z

This was discussed during the #did meeting on 04 December 2025.

View the transcript

w3c/did-resolution#217

JoeAndrieu: I just added a comment. The language that removed the percent-encoding bit caught my eye.
… I don't think that we can remove it, because if we use UTF-8 in a URL, it will get percent-encoded.

pchampin: is percent-encoding our responsibility?

JoeAndrieu: as we are defining URL parameters, they are defined as being part of a URL, so they need to be percent-encoded

manu: there are 3 different things here
… I agree with JoeAndrieu that things will need to be percent-encoded; we should not re-define it, but refer to the RFC

<ottomorac1> Addison talked about RFC3987: w3c/did-resolution#200 (comment)

manu: another point is: are we talking about the value space, which needs to support UTF-8, or the lexical space, where percent-encoding is required

<Zakim> TallTed, you wanted to check whether we're staying with RFC 3986/7 or moving to WHATWG's URL spec

TallTed: RFC3986 and RFC3987 travel together, they don't really make sense without each other
… another question is: are we sticking to these RFCs, or moving to the WHATWG's definition of URL

<ottomorac1> w3c/did-resolution#200 (comment)

TallTed: (mostly the same, but slightly different)

<Zakim> JoeAndrieu, you wanted to say I see the fix

JoeAndrieu: my concerns have actually been addressed
… there is a global statement stating that everything must be serialized
… I got distracted by the title of the issue; with TallTed's clarification, the text seems fine
… I think we should talk about the issue raised by TallTed; the VC WG has moved to the WHATWG's definition of URL

manu: now reading Addison's commentary, it is aligned with what I said.
… He calls it "logical" vs "serialized", what I called "value space" vs "lexical space"
… He says we don't make the distinction and we should make it.
… He says "don't constrain systems to be ASCII only", talking about the value space
… There are two encoding algorithms, we need to be careful about which one we refer to.
… We need to be sure this change does not impact also DID core, looking at it right now.
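A minimal sketch of the "value space" vs "lexical space" (or "logical" vs "serialized") distinction, assuming Python's standard `urllib.parse`. The DID and the parameter name `hint` are hypothetical and do not come from the spec.

```python
from urllib.parse import parse_qs, quote

# Value space ("logical"): the parameter value as an application sees it.
logical_value = "résumé"

# Lexical space ("serialized"): the same value as it appears in a DID URL.
serialized_value = quote(logical_value, safe="")
did_url = f"did:example:123?hint={serialized_value}"  # "hint" is hypothetical
print(did_url)  # did:example:123?hint=r%C3%A9sum%C3%A9

# Parsing the serialized query recovers the logical value unchanged.
_, _, query = did_url.partition("?")
recovered = parse_qs(query)["hint"][0]
print(recovered == logical_value)  # True
```

Under these assumptions the value space is not limited to ASCII, while the serialized DID URL still is.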

ottomorac: Will pointed out that DID parameters were dropped from DID core 1.1.

manu: is that right?

JoeAndrieu: if that's right, it feels weird to me.

manu: DID URLs define parameters, it talks mostly about the lexical space
… we should be good.
… I agree with JoeAndrieu that it would have been wrong to completely remove them.
… This now raises the question of DID Resolution re-stating something that DID core states more cleanly.

JoeAndrieu: looking at the differences between RFC3986 and RFC3987. They are not many, but they may be significant.
… RFC3987 aims to support international characters that are not supported by RFC3986.
… We could switch to RFC3987. There are questions about similar-looking characters, but I don't think that's our issue.

manu: going back to the PR. I think it is fine as is.
… I don't think we need to change anything.
… JoeAndrieu would you agree?

JoeAndrieu: it addresses my concern with percent-encoding.
… But there is a shift about international characters.
… The shift to RFC3987 would allow characters that were not allowed before.
… We need to think about whether it would break implementations.
… Allowing international characters is a good thing as long as we don't introduce security or privacy problems.

manu: trying to get closure on this. §3.1 in RFC3987 describes how to map an IRI to a URI.

JoeAndrieu: I was looking at the ABNF, §2.2. 'ucschar' is a class of new characters

manu: ucschar is not UTF-8, but can be converted to UTF-8.
… I think we are fine with the PR as is.

JoeAndrieu: +1
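The IRI-to-URI mapping mentioned above can be sketched roughly as follows. This is a simplification and not a complete RFC3987 §3.1 implementation: it percent-encodes every non-ASCII character (rather than exactly the `ucschar` and `iprivate` classes), leaves all ASCII untouched, and skips the other steps of the mapping such as normalization and host name handling. The function name `iri_to_uri` is made up for illustration.

```python
def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (hypothetical helper, see caveats above)."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII characters, including delimiters such as ":", "?" and "=",
            # are copied through so the URL structure is preserved.
            out.append(ch)
        else:
            # Non-ASCII: convert to UTF-8 bytes, then one %HH triplet per byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("did:example:123?name=Zoë"))  # did:example:123?name=Zo%C3%AB
```

Unlike percent-encoding a single parameter value, this operates on the whole identifier and only touches the characters that RFC3986 would not accept.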

wip-abramson · 2025-12-11T13:44:36Z

Reviewing the minutes, this looks good to merge.

swcurran · 2025-12-11T14:37:21Z

While not part of this PR (which I'm fine to merge), Line 324 -- immediately above the first change -- is incorrect, as the relativeRef query parameter is not percent-encoded as required. Probably best to update that outside this PR, but it should be fixed.
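For illustration, a minimal sketch of building a DID URL whose `relativeRef` value is percent-encoded, assuming Python's standard `urllib.parse.quote`. The DID, the service name, and the path are hypothetical and are not a quote of Line 324; the helper `with_relative_ref` is made up for this example.

```python
from urllib.parse import quote


def with_relative_ref(did: str, service: str, relative_ref: str) -> str:
    """Append service and relativeRef parameters, percent-encoding both values.

    Hypothetical helper, for illustration only.
    """
    # safe="" also encodes "/" so the value cannot be confused with URL syntax.
    encoded_service = quote(service, safe="")
    encoded_ref = quote(relative_ref, safe="")
    return f"{did}?service={encoded_service}&relativeRef={encoded_ref}"


url = with_relative_ref("did:example:123", "files", "/docs/résumé.pdf")
print(url)
# did:example:123?service=files&relativeRef=%2Fdocs%2Fr%C3%A9sum%C3%A9.pdf
```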

Prefer UTF-8 for DID Parameters

349e2b6

ottomorac requested review from ChristopherA, danpape, dmitrizagidulin, mccown and peacekeeper as code owners October 19, 2025 03:45

ottomorac requested a review from msporny October 19, 2025 03:47

This was referenced Nov 6, 2025

relativeRef should prefer UTF-8 for percent encoding #201

Open

DID parameters require ASCII-only #200

Open

msporny approved these changes Dec 4, 2025

View reviewed changes

w3cbot mentioned this pull request Dec 4, 2025

Conflicting normative statements about didDocument representation #234

Open

wip-abramson added the "discuss" label (Needs further discussion before a pull request can be created) Dec 16, 2025


6 participants
