Skip to content

Conversation

brontolosone
Copy link
Contributor

@brontolosone brontolosone commented Sep 9, 2025

Following up on this Slack discussion.

What has been done to verify that this works as intended?

Tests, a few of which needed fixing, as they were supplying non-UUIDs as if they were UUIDs.

Why is this the best possible solution? Were any other approaches considered?

No. Well, it'd be good to do some light validation of inputs before they reach the DB. You'll get a 500 when supplying a non-UUID.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

  • Users who get a sense of accomplishment out of disk space usage may be sad to see their DB disk space usage shrink and overall performance improved.
  • Users who somehow have succeeded in using non-UUID entity IDs (as the DB used to permit so), will be faced with an unmigratable DB. But on Slack I've been assured that this is not a likely scenario and that something must have gone very wrong to end up in that state.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

Not really.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@brontolosone brontolosone requested review from alxndrsn and removed request for alxndrsn September 9, 2025 10:41
@github-project-automation github-project-automation bot moved this to 🕒 backlog in ODK Central Sep 10, 2025
@brontolosone brontolosone moved this from 🕒 backlog to ✏️ in progress in ODK Central Sep 10, 2025
@brontolosone brontolosone marked this pull request as ready for review September 10, 2025 12:18
@brontolosone brontolosone requested review from ktuite and sadiqkhoja and removed request for alxndrsn September 15, 2025 08:37
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These migrations are small enough that I don't think we should use this pattern of the sidecar sql files. Especially because the mechanism for loading the extra files is longer than the migrations themselves. If we want to use this pattern more in the future, maybe we put getSqlFiles in a utility file or something instead of copying it from migration to migration.

one: `<data xmlns:jr="http://openrosa.org/javarosa" xmlns:entities="http://www.opendatakit.org/xforms" id="simpleEntity" version="1.0">
<meta>
<instanceID>one</instanceID>
<entities:entity dataset="people" id="uuid:12345678-1234-4123-8234-123456789abc" create="1">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These prefixes should stay in these tests. I can't find where it's mentioned explicitly in the spec but this was an explicit decision to allow these prefixes in submissions and then strip them out before storing them because there's a similar functionality for submission instance IDs.

When test/data/xml.js is reverted, most of the other test files (the ones that remove the uuid) also need to be reverted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow these prefixes in submissions and then strip them out before storing them because there's a similar functionality for submission instance IDs.

IIUC submission instance IDs are stored as-is though. They're not mangled (that is, their uuid: (if they have one at all - instance IDs are freeform strings...) prefix is not stripped).


it('should call entities purge if entities uuid is specified', testTask(() =>
purgeTask({ entityUuid: 'abc', projectId: 1, datasetName: 'people' })
purgeTask({ entityUuid: '00000000-0000-0000-8000-000000000000', projectId: 1, datasetName: 'people' })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of change is reasonable to me.

But if you change abc to a valid uuid structure here, you should probably do that for the other tests in this file, even though many of them never get to the uuid validity checking stage.

create: '1',
dataset: 'people',
id: 'uuid:12345678-1234-4123-8234-123456789abc'
id: '12345678-1234-4123-8234-123456789abc'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this to bring uuid: back here and elsewhere in this file

.then((result) => {
should(result.system).eql({
create: '1',
id: 'uuid:12345678-1234-4123-8234-123456789abc',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this to bring uuid: back here and elsewhere in this file

.send(testData.instances.simpleEntity.one
.replace('create="1"', 'update="1"')
.replace('<instanceID>one', '<deprecatedID>one</deprecatedID><instanceID>one2')
.replace('id="uuid:12345678-1234-4123-8234-123456789abc"', 'id="uuid:12345678-1234-4123-8234-123456789aaa"'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this to bring uuid: back here and in similar situations in this file

await asAlice.post('/v1/projects/1/datasets/people/entities/bulk-delete')
.send({
ids: ['12345678-1234-4123-8234-nonexistent']
ids: ['12345678-1234-4123-8234-0123456789ab']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of change to make it into a valid UUID is good.

const asAlice = await service.login('alice');

await asAlice.post('/v1/projects/1/datasets/people/entities/nonexistant/restore')
await asAlice.post('/v1/projects/1/datasets/people/entities/00000000-0000-0000-0000-000000000000/restore')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is fine but maybe there should be another test of the other way this endpoint can fail

    it('should reject if the entity uuid is not valid', testEntities(async (service) => {
      const asAlice = await service.login('alice');

      await asAlice.post('/v1/projects/1/datasets/people/entities/abc/restore')
        .expect(400)
        .then(({ body }) => {
          body.code.should.equal(400.11);
          body.message.should.equal('Invalid input data type: expected "abc" to be (type uuid)');
        });
    }));

I'm not thinking of a test like this for every endpoint using entity UUIDs, but at least one somewhere to capture this new validation.

it('should return notfound if the dataset does not exist', testEntities(async (service) => {
const asAlice = await service.login('alice');

await asAlice.get('/v1/projects/1/datasets/nonexistent/entities/123')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are fine

await asAlice.post('/v1/projects/1/forms/multiPropertyEntity2/submissions')
.send(testData.instances.multiPropertyEntity.one
.replace('multiPropertyEntity', 'multiPropertyEntity2')
.replace('uuid:12345678-1234-4123-8234-123456789aaa', 'uuid:12345678-1234-4123-8234-123456789ccc')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this to bring uuid: back here and elsewhere in this file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✏️ in progress

Development

Successfully merging this pull request may close these issues.

2 participants