
Option to disallow duplicate JSON properties #115856


Open
wants to merge 7 commits into main
Conversation

PranavSenthilnathan (Member)

Adds option to disallow duplicate JSON properties. There are some open questions:

  • Are duplicate properties an error in the JSON payload itself, or only an error during deserialization? In other words, if AllowDuplicateProperties = false, should we validate this in the Utf8JsonReader before giving it to converters or should the converters dedup themselves? This PR makes the change in the converters, but we might want to consider moving it down to the reader. Pros/cons for each:

    • Utf8JsonReader dedup - all converters get deduplication for free. All JSON that has duplicate properties will be rejected.
    • Converter dedup - types like JsonObject have dictionaries internally, so they can do the dedup more efficiently. There are probably workarounds to improve perf though (e.g. opting out of the reader dedup behavior when TokenType.StartArray is seen)
  • For JsonDocument deserialization, do we want to deserialize everything first and then dedup, or dedup while parsing? The former makes the duplicate tracking a little easier since the number of properties for an object would be known. The latter allows failing fast.

  • The dictionary converter is not yet implemented. It will also need to decide when to dedup, as mentioned in the point above (note that dictionary keys do not have to be strings, so we need an additional data structure to do the dedup).

/cc @eiriktsarpalis @jozkee

Contributes to #108521

Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis (Member)

should we validate this in the Utf8JsonReader before giving it to converters or should the converters dedup themselves?

Honestly, I don't believe this is possible with the current design of Utf8JsonReader. The reader would need to track arbitrarily many properties in all objects it is currently nested within. This would almost certainly necessitate allocating from the heap for sufficiently complex data. In turn though, this would break the checkpointing semantics that Utf8JsonReader currently enjoys and that many converters depend on:

Utf8JsonReader checkpoint = reader;
var value = JsonSerializer.Deserialize<T>(ref reader);
reader = checkpoint; // resets the reader state

For this reason, I think we need to make it strictly a deserialization-level concern.

For JsonDocument deserialization, do we want to deserialize everything first and then dedup, or dedup while parsing?

I think we want to make this a fail-fast operation. Specifically for JsonDocument, not doing so makes it easy to bypass validation altogether, since you can always forward the underlying JSON using JsonElement.GetRawText().
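A minimal illustration of the bypass: if validation were deferred, nothing would stop the raw payload, duplicates included, from being forwarded.

using System.Text.Json;

// Without eager validation, the duplicate survives the round-trip:
using JsonDocument doc = JsonDocument.Parse("""{ "a": 1, "a": 2 }""");
string raw = doc.RootElement.GetRawText(); // forwards the duplicate "a" verbatim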

It will also need to decide when to dedup as mentioned in the point above (note the dictionary keys do not have to be string, so we need an additional data structure to do the dedup).

I suspect you might need to implement this on a case-by-case basis. For known mutable dictionary types you can rely on the result itself to track duplicates, but for immutable or frozen dictionaries (which IIRC have constructor methods with overwrite semantics) you probably need to enlist an auxiliary dictionary type to do so (it might help if you could make this a pooled object that can be reused across serializations).
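A minimal sketch of the auxiliary-tracking approach for immutable dictionaries (the helper name and shape are hypothetical, not part of this PR):

using System.Collections.Generic;
using System.Collections.Immutable;
using System.Text.Json;

static ImmutableDictionary<TKey, TValue> BuildRejectingDuplicates<TKey, TValue>(
    IEnumerable<KeyValuePair<TKey, TValue>> pairs) where TKey : notnull
{
    ImmutableDictionary<TKey, TValue>.Builder builder = ImmutableDictionary.CreateBuilder<TKey, TValue>();
    // Auxiliary set: the builder's indexer would silently overwrite duplicates.
    HashSet<TKey> seen = new();

    foreach (KeyValuePair<TKey, TValue> pair in pairs)
    {
        if (!seen.Add(pair.Key))
        {
            throw new JsonException($"Duplicate key '{pair.Key}' encountered.");
        }

        builder[pair.Key] = pair.Value;
    }

    return builder.ToImmutable();
}

Pooling the auxiliary set across deserializations, as suggested above, would avoid the per-object allocation.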

}
else
{
    if (jObject.ContainsKey(propertyName))
Member

Is it possible we could avoid duplicate lookups here using a TryAdd-like operation?

Member Author

Still in PR: #111229
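For reference, a sketch of the shape that would enable it (a first-class TryAdd on JsonObject is what #111229 tracks; the CollectionExtensions.TryAdd extension for IDictionary<string, JsonNode?> offers the same contract today):

// Single lookup instead of ContainsKey followed by Add:
if (!jObject.TryAdd(propertyName, value))
{
    ThrowHelper.ThrowJsonException_DuplicatePropertyNotAllowed(propertyName);
}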

// False means the property is not set (has not yet occurred in the payload).
// The length of the BitArray equals the number of non-extension properties.
// Every JsonPropertyInfo has a PropertyIndex property which maps to an index in this BitArray.
public BitArray? AssignedProperties;
@eiriktsarpalis (Member), May 22, 2025

One optimization we left out when adding required-property validation is avoiding allocations for a small number of properties. You could do that by implementing a ValueBitArray struct that uses an int to track objects with fewer than 32 properties and falls back to allocating a regular BitArray for anything bigger. Not required for this PR but nice to have nonetheless.

@krwq (Member), May 22, 2025

perhaps some simple solution like this could be good enough to significantly reduce allocations:

struct OptimizedBitArray
{
    private ulong _fastBits;
    private BitArray? _slowArray;

    public OptimizedBitArray(int numberOfProperties)
    {
        if (numberOfProperties > 64)
        {
            // BitArray has no parameterless constructor; it needs the length up front.
            _slowArray = new BitArray(numberOfProperties);
        }
    }

    public void SetBit(int index)
    {
        if (_slowArray != null)
        {
            _slowArray[index] = true;
        }
        else
        {
            _fastBits |= 1UL << index;
        }
    }

    // etc.
}

Member

but honestly I'd check if this will actually make any difference in an E2E scenario

Member Author

I'll add the data structure optimizations in separate PRs. I also want to add a HasAllSet(BitArray mask) overload, which would be useful for the required-properties bitmask (might be good for a new API proposal too).

The other optimization I want to do is for PropertyNameSet to use a HashSet. The elements are ReadOnlyMemory<byte> in that case, so we need to implement a custom comparer, and it needs to use a randomized Marvin seed to prevent hash-collision attacks.
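A rough sketch of the comparer shape (the name is hypothetical; System.HashCode stands in for Marvin here since it is already randomly seeded per process):

using System;
using System.Collections.Generic;

sealed class Utf8PropertyNameComparer : IEqualityComparer<ReadOnlyMemory<byte>>
{
    public bool Equals(ReadOnlyMemory<byte> x, ReadOnlyMemory<byte> y) =>
        x.Span.SequenceEqual(y.Span);

    public int GetHashCode(ReadOnlyMemory<byte> name)
    {
        // The randomized per-process seed mitigates hash-flooding attacks.
        HashCode hc = default;
        hc.AddBytes(name.Span);
        return hc.ToHashCode();
    }
}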

[InlineData("""{ "1": 0 , "1": 1 }""")]
[InlineData("""{ "1": null, "1": null }""")]
[InlineData("""{ "1": "a" , "1": null }""")]
[InlineData("""{ "1": null, "1": "b" }""")]
@krwq (Member), May 22, 2025

Perhaps I missed it, but I think there are a couple of test cases that would be nice to see:

  • a recursive structure with duplicate properties in one of the inner objects, with both positive and negative test cases
  • two C# properties with the same name but different casing
  • are duplicates allowed inside a dictionary? regardless of the answer there should be a test case
  • arrays - both a duplicated property whose value is an array, and an item in the array with duplicated properties (it might also be worth adding a test case showing we don't trigger falsely)
  • if there is a per-type setting on JsonTypeInfo, I'd imagine it should be exercised (I think there isn't, but I've only skimmed the PR)
  • for arrays, another interesting case is deserializing into a pre-populated list to see whether it appends when the property is encountered for the first time - is that expected with this setting on? (if someone doesn't like the dups they might want to erase the content first; it's a bit of the user shooting themselves in the foot, so I'm not sure we should do anything special, but it might be worth defining expectations)

Member Author

Arrays are not affected by this change, only objects. Added the remaining tests. Duplicates in dictionaries raise errors, with equality determined by TryAdd (using an ordinal comparer where applicable).

@PranavSenthilnathan marked this pull request as ready for review May 26, 2025 21:56
@PranavSenthilnathan (Member Author)

Current behavior when AllowDuplicateProperties = false:

  • POCOs always throw on duplicate properties (honoring case-insensitivity, naming policy, etc.)
  • Dictionaries throw when TryAdd returns false. For Dictionary<string, TValue> we always use the default comparer (ordinal, case-sensitive); see the usage sketch after this list
  • JsonObject - already throws after deserialization when duplicates are found during evaluation of the lazy ordered dictionary
  • JsonDocument - dedups case-sensitively
  • Extension properties - behavior depends on the type but always matches one of the bullet points above
  • Serialization - AllowDuplicateProperties does not apply at all
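For illustration, a minimal usage sketch of the dictionary case described above (exception message paraphrased):

using System.Collections.Generic;
using System.Text.Json;

var options = new JsonSerializerOptions { AllowDuplicateProperties = false };

// Throws JsonException: the payload assigns "a" twice, so TryAdd fails.
JsonSerializer.Deserialize<Dictionary<string, int>>("""{ "a": 1, "a": 2 }""", options);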

Note that JsonObject deserialization is still lazy. I believe there's value in keeping the lazy behavior when AllowDuplicateProperties is true because it avoids unnecessary allocations. This means that deserialization will not throw but accessing the property will. If we keep the lazy behavior for that case, I think it's best to be consistent and also keep the lazy behavior for AllowDuplicateProperties = false. There are likely cases where only a specific part of a JsonObject is evaluated and laziness improves perf for these cases.

Follow up items (in future PRs):

  • Add ValueBitArray
  • Use HashSet in PropertyNameSet

@eiriktsarpalis (Member)

Note that JsonObject deserialization is still lazy. I believe there's value in keeping the lazy behavior when AllowDuplicateProperties is true because it avoids unnecessary allocations. This means that deserialization will not throw but accessing the property will. If we keep the lazy behavior for that case, I think it's best to be consistent and also keep the lazy behavior for AllowDuplicateProperties = false. There are likely cases where only a specific part of a JsonObject is evaluated and laziness improves perf for these cases.

Keep in mind that there are certain operations that bypass lazy initialization altogether. Crucially, it happens when you try to re-serialize a freshly created JsonObject instance:

if (dictionary is null && jsonElement.HasValue)
{
    // Write the element without converting to nodes.
    jsonElement.Value.WriteTo(writer);
}

This means an object with duplicate properties could end up being forwarded while avoiding detection. I think we need to remove lazy initialization, at the very least for the new mode.
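A hypothetical repro of that bypass: parse and immediately re-serialize, which never materializes the lazy dictionary.

using System.Text.Json.Nodes;

JsonNode node = JsonNode.Parse("""{ "a": 1, "a": 2 }""")!;
// Writes the backing JsonElement verbatim; the duplicate "a" is forwarded.
string forwarded = node.ToJsonString();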

@PranavSenthilnathan (Member Author)

I think we need to remove lazy initialization, for the new mode in the very least.

I changed it to do the validation on the JsonElement but keep the lazy evaluation, which still lets us avoid allocations. However, for the case-insensitive mode I had to switch to eagerly evaluating all child nodes, since we don't thread that option into JsonElement.Parse.

@@ -827,4 +827,13 @@
<data name="CannotMixEncodings" xml:space="preserve">
<value>Mixing UTF encodings in a single multi-segment JSON string is not supported. The previous segment's encoding was '{0}' and the current segment's encoding is '{1}'.</value>
</data>
<data name="DuplicatePropertiesNotAllowed_JsonPropertyInfo" xml:space="preserve">
<value>Duplicate property '{0}' encountered during deserialization of type '{1}'.</value>
Member

Given that we've settled on a design where we don't just throw on duplicate JSON properties in the strictly syntactic sense (case insensitivity, dictionary key conflicts), I'm starting to wonder whether the term "duplicate" is fit for purpose. Is the term "conflicting property" more appropriate, perhaps?

@@ -708,13 +735,16 @@ private static JsonDocument Parse(
stack.Dispose();
}

return new JsonDocument(utf8Json, database, extraRentedArrayPoolBytes, extraPooledByteBufferWriter);
JsonDocument document = new JsonDocument(utf8Json, database, extraRentedArrayPoolBytes, extraPooledByteBufferWriter);
ValidateDocument(document, allowDuplicateProperties);
Member

What happens if ValidateDocument throws? Shouldn't we be disposing the database or the JsonDocument?

// Positive heap cache count implies it is not null.
Debug.Assert(_heapCacheCount is 0 || _heapCache is not null);

for (int i = 0; i < _heapCacheCount; i++)
@eiriktsarpalis (Member), May 28, 2025

Am I incorrect in thinking that validation is $\mathcal O(n^2)$ for objects of $n$ properties? That would make this code susceptible to potential DOS attacks. For object sizes exceeding the fixed threshold we should use a regular hashtable so that complexity remains at $\mathcal O(n)$.
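A rough sketch of that shape (all names hypothetical): scan linearly while the object is small, then promote to a hash set so validation stays O(n) overall.

using System;
using System.Collections.Generic;

sealed class DuplicateNameTracker
{
    private const int Threshold = 16;
    private readonly List<string> _small = new();
    private HashSet<string>? _large;

    // Returns false if the name was already seen.
    public bool TryAdd(string name)
    {
        if (_large is not null)
        {
            return _large.Add(name);
        }

        foreach (string existing in _small)
        {
            if (string.Equals(existing, name, StringComparison.Ordinal))
            {
                return false;
            }
        }

        _small.Add(name);
        if (_small.Count > Threshold)
        {
            // Promote: past this point each lookup is O(1).
            _large = new HashSet<string>(_small, StringComparer.Ordinal);
        }

        return true;
    }
}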


if (dbRow.HasComplexChildren)
{
    propertyName = JsonReaderHelper.GetUnescaped(propertyName.Span);
Member

How is the buffer containing the unescaped property allocated?

{
// Data structure to track property names in an object while deserializing
// into a JsonDocument and validate that there are no duplicates. A small inline
// array is used for the first few property names and then a pooled array is used
Member

Given we know the number of properties of a given object ahead of time, it seems we could predetermine whether a heap allocated cache is necessary or not.


do
{
    JsonElement curr = new JsonElement(document, traversalPath.Peek());
Member

Why can't we just traverse the JsonElement with the usual publicly available APIs?

Comment on lines +411 to +415
case JsonValue:
    break;
default:
    Debug.Fail($"Unknown JsonNode type encountered during initialization: {this.GetType().FullName}");
    break;
Member

Suggested change
case JsonValue:
    break;
default:
    Debug.Fail($"Unknown JsonNode type encountered during initialization: {this.GetType().FullName}");
    break;
default:
    Debug.Assert(this is JsonValue);
    break;

Comment on lines +25 to +37
if (options.AllowDuplicateProperties)
{
    collection[key] = value;
}
else
{
    if (collection.Contains(key))
    {
        ThrowHelper.ThrowJsonException_DuplicatePropertyNotAllowed(key);
    }

    collection.Add(key, value);
}
Member

Suggested change
if (options.AllowDuplicateProperties)
{
    collection[key] = value;
}
else
{
    if (collection.Contains(key))
    {
        ThrowHelper.ThrowJsonException_DuplicatePropertyNotAllowed(key);
    }

    collection.Add(key, value);
}
if (!options.AllowDuplicateProperties && collection.Contains(key))
{
    ThrowHelper.ThrowJsonException_DuplicatePropertyNotAllowed(key);
}

collection[key] = value;

{
Debug.Fail("F# Map is a collection");

foreach (KeyValuePair<TKey, TValue> _ in map)
Member

Why add logic if we're asserting that this case is impossible? The type does implement ICollection so we can just assert that.


[<Fact>]
let ``Map deserialization should reject duplicate keys``() =
Assert.Throws<JsonException>(fun () -> JsonSerializer.Deserialize<Map<string, int>>("""{ "1": "a", "1": "a" }""") |> ignore) |> ignore
Member

I'm surprised that this would throw in the default case. Shouldn't it only happen when AllowDuplicateProperties has been disabled?

{
    JsonElement jElement = JsonElement.ParseValue(ref reader);
    return new JsonArray(jElement, options);
    if (options.PropertyNameCaseInsensitive)
Member

This appears to always validate for duplicates. Shouldn't it be predicated on JsonSerializerOptions.AllowDuplicateProperties?

@@ -613,7 +629,10 @@ protected static bool TryLookupConstructorParameter(
createExtensionProperty: false);

// Mark the property as read from the payload if required.
state.Current.MarkRequiredPropertyAsRead(jsonPropertyInfo);
if (!useExtensionProperty)
Member

Why is the method guarded by this boolean now? I would assume we validate required properties even if the type uses an extension property.

/// </summary>
internal int NumberOfRequiredProperties { get; private set; }
internal BitArray? NegatedRequiredPropertiesMask { get; private set; }
Member

Consider using a simpler description

Suggested change
internal BitArray? NegatedRequiredPropertiesMask { get; private set; }
internal BitArray? OptionalPropertiesMask { get; private set; }


if (!RequiredPropertiesSet.HasAllSet())
// All properties must be either assigned or not required
BitArray assignedOrNotRequiredPropertiesSet = AssignedProperties.Or(typeInfo.NegatedRequiredPropertiesMask);
Member

I take it we're ok with this operation mutating AssignedProperties?
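For context, BitArray.Or performs the bitwise OR in place and returns the receiver:

using System.Collections;

var a = new BitArray(new[] { true, false });
var b = new BitArray(new[] { false, true });
BitArray c = a.Or(b); // c and a are the same, now-mutated instance: { true, true }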

Member

I see you're clearing out the field, so probably yes.
