OAK-12101 - Skip indexing of very long similarity tags#2747
OAK-12101 - Skip indexing of very long similarity tags#2747antonhosgood wants to merge 10 commits intoapache:trunkfrom
Conversation
| private boolean isTagWithinLengthLimit(String value, String pname) { | ||
| int maxLength = definition.getSimilarityTagMaxLength(); | ||
| if (maxLength < 0 || value.length() <= maxLength) { | ||
| return true; | ||
| } | ||
| if (!LOG_SILENCER.silence(LOG_KEY_SIMILARITY_TAG_SKIPPED)) { | ||
| log.warn("[{}] Skipping similarity tag for property {}. Value length {} exceeds maximum allowed length {}", | ||
| getIndexName(), pname, value.length(), maxLength); |
There was a problem hiding this comment.
(Just my view) I think it's less surprising if the logging itself is done outside of this method, that is in the caller. If it is needed in multiple places, that would mean code duplication or another method, which is also bad. An alternative would be to change the method name to eg. "isTagWithinLengthLimitOrLogWarning". Yes it's a long name... but it's less surprising.
|
|
||
| private final Logger log = LoggerFactory.getLogger(getClass()); | ||
|
|
||
| private static final LogSilencer LOG_SILENCER = new LogSilencer(Duration.ofMinutes(1).toMillis(), 4); |
There was a problem hiding this comment.
That looks good... alternatively, we could use the default silencer settings (15 min, 64 entries cache)
| if (maxLength < 0 || value.length() <= maxLength) { | ||
| return true; | ||
| } | ||
| if (!LOG_SILENCER.silence(LOG_KEY_SIMILARITY_TAG_SKIPPED)) { |
There was a problem hiding this comment.
This means only one key. I think it would be better to use the property name as the key:
| if (!LOG_SILENCER.silence(LOG_KEY_SIMILARITY_TAG_SKIPPED)) { | |
| if (!LOG_SILENCER.silence(pname)) { |
If there are multiple properties indexed like that, each property name will have a separate silence period. I'm not sure if this happens in reality thought.
reschke
left a comment
There was a problem hiding this comment.
Please do not merge until the Sonar checks have been successful.
See https://jackrabbit.apache.org/oak/docs/participating.html#pull-requests-prs
|
Could you open a new PR against the (new) OAK-12101 branch please, so that we can run Sonar tests? |
Very long similarity tag names inflate index sizes and probably defeat the purpose of tags. This PR introduces an optional configurable value per index to configure how similarity tags should be filtered in
indexSimilarityTagandindexDynamicBoost.