Description
Describe the bug
If a document ID contains a non-ASCII (UTF-8) character or most reserved characters, it is percent-encoded in the stored document ID (apparently on the server side). However, get_document does not apply the same encoding to the ID it is given, so clients have to percent-encode the ID themselves, in exactly the same way, for get_document to find the document.
To Reproduce
import urllib.parse
from typing import Optional

class FakeDoc(ParentDocument):
    name: str
    _key = LexicalKey(["name"])
    aliastype: Optional[str]

client = OpenDB()
awfulstring = "2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm(ish)!"
f = FakeDoc(name=awfulstring, aliatype=awfulstring)
client.update_document(f)

# Fails with document not found error
try:
    client.get_document("FakeDoc/%s" % awfulstring)
except Exception:
    print("Nope")

# Succeeds:
t = client.get_document("FakeDoc/%s" % urllib.parse.quote(awfulstring, safe=",=()'!"))
print(t)
Expected behavior
It would be helpful if the client applied the encoding itself, so that set/get on the same document ID worked. It is unclear exactly what encoding is being applied; I merely emulated it by trial and error. For example, "()" are not encoded but "[]" are, so at minimum a helper that emulates the encoding properly would be useful.
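A minimal sketch of the kind of helper meant here, using only the standard library. The name `encode_doc_id` is hypothetical (it is not part of the client), and the safe-character set is just the one found by trial and error above, not something taken from the TerminusDB documentation, so it may be incomplete:

```python
from urllib.parse import quote

# Characters observed (by trial and error) to be left unencoded in the
# stored document ID. This set is an assumption, not a documented rule.
OBSERVED_SAFE = ",=()'!"


def encode_doc_id(doc_type: str, key: str) -> str:
    """Build an ID of the form 'Type/percent-encoded-key' (hypothetical helper)."""
    return "%s/%s" % (doc_type, quote(key, safe=OBSERVED_SAFE))


if __name__ == "__main__":
    # "()" stay as they are, "[]" and spaces are percent-encoded.
    print(encode_doc_id("FakeDoc", "mm(ish) [x]"))
```

With something like this, the lookup in the reproduction would be `client.get_document(encode_doc_id("FakeDoc", awfulstring))`.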
Error logs
Here is the encoding of the awful string above:
{'@id': 'FakeDoc/2H%E2%82%82%20%2B%20O%E2%82%82%20%E2%87%8C%202H%E2%82%82O,%20R%20=%204.7%20k%CE%A9,%20%E2%8C%80%20200%20mm(ish)!', '@type': 'FakeDoc', 'name': '2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm(ish)!'}
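As a sanity check, the stored ID above looks like plain percent-encoding of the UTF-8 bytes of the key. A small sketch with the standard library (nothing here touches the client; the variable names are only for illustration) decodes it back into the original name:

```python
from urllib.parse import unquote

# The '@id' value copied from the output above.
stored_id = (
    "FakeDoc/2H%E2%82%82%20%2B%20O%E2%82%82%20%E2%87%8C%202H%E2%82%82O,"
    "%20R%20=%204.7%20k%CE%A9,%20%E2%8C%80%20200%20mm(ish)!"
)

# Split off the type prefix, then undo the percent-encoding of the key.
doc_type, _, encoded_key = stored_id.partition("/")
decoded_key = unquote(encoded_key)

print(doc_type)
print(decoded_key)
```

The decoded key should match the 'name' field of the same document.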
System information (please complete the following information):
- OS: RHEL 8 (server), Ubuntu 22 (client)
- terminus-client-python version = 10.2.3
Additional context
Not sure if this is actually a bug or a feature request, but it is certainly unexpected behavior.