Commit 289c6fd

Author: Manuel Erdoes
Commit message: allow strings and multiple fylr fields in datacite fields
1 parent a401fe8, commit 289c6fd

5 files changed

Lines changed: 224 additions & 20 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.1.0] - 2026-05-11
+
+### Added
+- Expression syntax for `fylr Field Path`: combine multiple fields and static strings using `+` (e.g. `"Label: " + objecttype.field1 + "\n" + objecttype.field2`)
+- String literals with escape sequences (`\n` newline, `\t` tab) in field path expressions
+- Conditional groups `(...)`: if any field inside a group is empty, the entire group — including its static labels — is omitted from the output
+- `|decimal2` format specifier for fixed-point integer fields stored as ×100 (e.g. `2030` → `20.30`)
+- Support for nested table fields in dot-path resolution: paths now traverse through fylr's `_nested:<objecttype>__<fieldname>` array keys automatically
+
 ## [1.0.1] - 2026-05-06
 
 ### Added

README.md

Lines changed: 39 additions & 2 deletions
@@ -51,8 +51,45 @@ Each row maps a DataCite metadata field to a field path in the fylr object. The
 |---|---|
 | Profile ID | Must match the `ID` of a profile in the Profiles table |
 | DataCite Field | Target field name. Supported: `title`, `creator`, `publisher`, `date`, `description`, `contributor`, `subjects`, `resourceTypeGeneral`, `publicationYear` |
-| fylr Field Path | Dot-separated path to the field in the fylr object, e.g. `haustieranatomie.titel` |
-| Default Value | Fallback value if the field path resolves to nothing |
+| fylr Field Path | A field path or expression (see below) |
+| Default Value | Fallback value if the expression resolves to an empty string |
+
+#### fylr Field Path expressions
+
+The **fylr Field Path** column accepts either a plain dot-path or a concatenation expression built from field references, string literals, and optional conditional groups.
+
+**Plain path** (single field, existing behaviour):
+
+```
+haustieranatomie.titel
+```
+
+**Concatenation** — join multiple values with `+`:
+
+```
+haustieranatomie.field1 + haustieranatomie.field2
+```
+
+**String literals** — use `"..."` or `'...'` for static text. `\n` inserts a newline, `\t` a tab:
+
+```
+haustieranatomie.titel + "\n" + haustieranatomie.beschreibung
+"Title: " + haustieranatomie.titel
+```
+
+**Conditional groups** — wrap part of an expression in `(...)`. The group is omitted entirely if any field inside it is empty. Use this to avoid orphan labels when optional fields are missing:
+
+```
+haustieranatomie.titel + ("\nHeight: " + haustieranatomie.hoehe) + ("\nWidth: " + haustieranatomie.breite)
+```
+
+If `hoehe` is empty the whole `("\nHeight: " + haustieranatomie.hoehe)` group is dropped — no stray label appears.
+
+**Decimal format specifier** — append `|decimal2` to a field reference to divide the stored integer by 100 and format it with two decimal places. Useful for fields stored as fixed-point integers (e.g. a value of `2030` is displayed as `20.30`):
+
+```
+"Height: " + haustieranatomie.hoehe|decimal2 + " cm"
+```
 
 ## Webhook URL
 
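As a rough illustration of the conditional-group and `|decimal2` rules described in the README hunk above, the sketch below evaluates a hand-written token list against a flat lookup table. The `fields` record, the `resolveToken` and `evaluate` helpers, and the sample values are assumptions made for this example; the plugin itself resolves paths through its asynchronous resolver and handles more cases.

```javascript
// Hypothetical sample record; stands in for the plugin's real field resolver.
const fields = { titel: 'Schaedel', hoehe: '2030', breite: '' };

// Hand-written tokens for the expression:
//   titel + ("\nHeight: " + hoehe|decimal2 + " cm") + ("\nWidth: " + breite|decimal2 + " cm")
const tokens = [
  { type: 'path', value: 'titel' },
  { type: 'group', tokens: [
    { type: 'literal', value: '\nHeight: ' },
    { type: 'path', value: 'hoehe', format: 'decimal2' },
    { type: 'literal', value: ' cm' },
  ] },
  { type: 'group', tokens: [
    { type: 'literal', value: '\nWidth: ' },
    { type: 'path', value: 'breite', format: 'decimal2' },
    { type: 'literal', value: ' cm' },
  ] },
];

// Resolve one path token: look the value up, then apply |decimal2 if requested.
function resolveToken(token, lookup) {
  const raw = lookup[token.value] || '';
  if (raw && token.format === 'decimal2') {
    const num = parseFloat(raw) / 100;
    return isNaN(num) ? raw : num.toFixed(2);
  }
  return raw;
}

// Evaluate a token list; only one level of grouping is handled in this sketch.
function evaluate(tokenList, lookup) {
  const parts = [];
  for (const token of tokenList) {
    if (token.type === 'literal') {
      parts.push(token.value);
    } else if (token.type === 'path') {
      parts.push(resolveToken(token, lookup));
    } else if (token.type === 'group') {
      const subParts = [];
      let anyEmpty = false;
      for (const sub of token.tokens) {
        if (sub.type === 'literal') {
          subParts.push(sub.value);
        } else {
          const val = resolveToken(sub, lookup);
          if (!val) anyEmpty = true;
          subParts.push(val);
        }
      }
      // Drop the whole group, labels included, when any field in it is empty.
      if (!anyEmpty) parts.push(subParts.join(''));
    }
  }
  return parts.join('');
}

const result = evaluate(tokens, fields);
console.log(JSON.stringify(result)); // "Schaedel\nHeight: 20.30 cm"
```

Because `breite` is empty in the sample record, the second group disappears together with its `Width:` label and its ` cm` suffix, which is the behaviour the README describes.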

documentation.md

Lines changed: 46 additions & 10 deletions
@@ -24,6 +24,9 @@ This document explains how the `fylr-plugin-datacite` plugin is implemented. It
 9. [The Main Script (`server/webhook/register-doi.js`)](#the-main-script-serverwebhookregister-doijs)
    - [Top-level structure](#top-level-structure)
    - [`main()`](#main)
+   - [`resolveExpressionAsync()`](#resolveexpressionasync)
+   - [`tokenizeExpression()`](#tokenizeexpression)
+   - [`formatDecimal2()`](#formatdecimal2)
    - [`resolveFieldPathAsync()`](#resolvefieldpathasync)
    - [`getNestedValue()`](#getnestedvalue)
    - [`httpRequest()`](#httprequest)
@@ -290,14 +293,17 @@ If the admin forgets the query parameter, the script fails fast with `datacite.c
 
 ### Top-level structure
 
-The file is ~480 lines with this shape:
+The file is ~560 lines with this shape:
 
 - Lines 1-13: require `http`/`https`, parse `info.json` from `argv[2]`.
 - Lines 15-41: buffer stdin, invoke `main()` when stdin ends, catch unhandled errors.
 - Lines 43-334: `main()` — the whole orchestration.
-- Lines 341-419: `resolveFieldPathAsync()` — dot-path resolution with fylr-specific fallbacks.
-- Lines 424-437: `getNestedValue()` — simple dot-path getter.
-- Lines 443-476: `httpRequest()` — Node stdlib HTTP wrapper returning a Promise.
+- `resolveExpressionAsync()` — evaluates a field-path expression (single path or concatenation); calls `resolveFieldPathAsync()` per path token.
+- `tokenizeExpression()` — parses an expression string into typed tokens.
+- `formatDecimal2()` — divides an integer by 100 and formats to two decimal places.
+- `resolveFieldPathAsync()` — dot-path resolution with fylr-specific fallbacks (linked objects, nested tables).
+- `getNestedValue()` — simple dot-path getter for reading from `info.json`.
+- `httpRequest()` — Node stdlib HTTP wrapper returning a Promise.
 
 There are commented-out `console.error` debug lines throughout. They are intentionally preserved for quick re-enablement during future debugging; do not delete them without a reason.
 
@@ -318,7 +324,7 @@ Orchestrates everything in a single async function:
 9. **Compute the DataCite Basic Auth header** ([L128](server/webhook/register-doi.js#L128)).
 10. **Loop over objects** ([L143-323](server/webhook/register-doi.js#L143-L323)):
     - Read `_system_object_id` and `_objecttype`.
-    - For each field mapping: strip an optional `<objecttype>.` prefix from the dot-path (the UI sometimes includes it, sometimes not), pick `_current[objecttype]` as the root if it exists (richer data, closer to what a full object fetch would return), and resolve the path.
+    - For each field mapping: call `resolveExpressionAsync()` with the raw `fylr_field_path` value. This handles both plain dot-paths and concatenation expressions transparently.
     - Construct the DOI as `<doi_prefix><system_object_id>`.
     - Construct the landing URL by substituting `%system_object_id%` in the template.
     - Build the DataCite payload (see [JSON:API payload format](#jsonapi-payload-format)). Note the defensive `:unkn` fallbacks — DataCite requires certain fields and rejects empty ones.
@@ -328,15 +334,44 @@ Orchestrates everything in a single async function:
 
 **Why `exit(0)` on config errors?** A non-zero exit causes fylr to treat the whole webhook invocation as failed and may hide the emitted JSON body. Emitting a structured error on stdout plus `exit(0)` gives cleaner admin-visible diagnostics.
 
-### `resolveFieldPathAsync()`
+### `resolveExpressionAsync()`
+
+The entry point for all field-path resolution. It accepts either a plain dot-path (backwards compatible) or a concatenation expression and returns a string.
+
+The expression syntax:
+
+| Token | Example | Behaviour |
+|---|---|---|
+| Plain path | `haustieranatomie.titel` | resolves via `resolveFieldPathAsync` |
+| Concatenation | `field1 + field2` | results joined without separator |
+| String literal | `"Label: "` or `"\n"` | used verbatim (`\n` → newline, `\t` → tab) |
+| Conditional group | `("Label: " + field)` | entire group omitted if any field inside is empty |
+| Format specifier | `field\|decimal2` | integer ÷ 100, two decimal places (e.g. `2030` → `20.30`) |
+
+An optional `<objecttype>.` prefix on each path token is stripped before resolution, so paths with or without the prefix work identically.
+
+### `tokenizeExpression()`
 
-[register-doi.js:341-419](server/webhook/register-doi.js#L341-L419)
+Parses an expression string into an array of typed tokens used by `resolveExpressionAsync`. Token types:
+
+- `{type: 'literal', value: string}` — a quoted string literal
+- `{type: 'path', value: string, format?: string}` — a dot-path reference, with an optional `format` field when `|decimal2` is appended
+- `{type: 'group', tokens: Token[]}` — a conditional group enclosed in `(...)`, recursively parsed
+
+The parser is recursive-descent, handling nested groups naturally. `+` is the only operator; whitespace around it is ignored.
+
+### `formatDecimal2()`
+
+Divides the raw string value by 100 and formats the result with `Number.toFixed(2)`. Used for field values stored as fixed-point integers (×100). Returns the original value unchanged if it is not a valid number.
+
+### `resolveFieldPathAsync()`
 
-Resolves a dot-separated path (e.g. `haustieranatomie.titel` or `hersteller.hersteller.name`) into a value. The logic has three subtleties that would not be obvious from the signature:
+Resolves a dot-separated path (e.g. `haustieranatomie.titel` or `hersteller.hersteller.name`) into a value. The logic has four subtleties that would not be obvious from the signature:
 
-1. **fylr date fields are wrapped objects** (`{value: "2025-04-05"}`). At the end of the walk, if the result is an object with a `value` key, it is unwrapped ([L417](server/webhook/register-doi.js#L417)).
+1. **fylr date fields are wrapped objects** (`{value: "2025-04-05"}`). At the end of the walk, if the result is an object with a `value` key, it is unwrapped.
 2. **The root is always `obj[objecttype]`** — fylr objects are keyed by objecttype at the top level. Passing in a bare object without that wrapper returns `undefined`.
-3. **Linked objects are shallow in the webhook payload**. When an object links to another object, the webhook payload only carries `{_id, _version}` for the linked object. If a dot-path descends into a linked object (either directly navigating into its typed key or trying to access a missing field on a wrapper), the function fetches the full linked object via `GET /api/v1/db/<objecttype>/_all_fields/<id>?format=long` using the plugin user's Bearer token, then continues the walk.
+3. **Nested table fields use a special key format.** In the fylr API payload, a nested table named `tierart` on objecttype `haustieranatomie` is stored under the key `_nested:haustieranatomie__tierart` (not `tierart`). The traversal loop tries the plain part name first; if not found, it automatically tries the `_nested:<objecttype>__<part>` variant before proceeding. The value is an array; the first element is taken and traversal continues into it.
+4. **Linked objects are shallow in the webhook payload**. When an object links to another object, the webhook payload only carries `{_id, _version}` for the linked object. If a dot-path descends into a linked object (either directly navigating into its typed key or trying to access a missing field on a wrapper), the function fetches the full linked object via `GET /api/v1/db/<objecttype>/_all_fields/<id>?format=long` using the plugin user's Bearer token, then continues the walk.
 
 Both linked-object-fetch branches record failures via the shared `warnings` array so the caller can decide how to surface them. Errors from fetches are never fatal; the path just returns `undefined` and the mapping falls back to its `default_value`.
 
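To make the token shapes listed under `tokenizeExpression()` concrete, the sketch below reproduces the tokenizer from this commit in condensed form (the small `flush` helper is a refactoring added here) and prints the tokens for one expression. The sample expression and its `type.` field names are invented for illustration.

```javascript
// Condensed copy of tokenizeExpression() from this commit, for illustration only.
function tokenizeExpression(expression) {
  let i = 0;

  // Split "path|format" into a path token with an optional format field.
  function makePathToken(raw) {
    const pipeIdx = raw.indexOf('|');
    if (pipeIdx === -1) return { type: 'path', value: raw };
    return { type: 'path', value: raw.slice(0, pipeIdx).trim(), format: raw.slice(pipeIdx + 1).trim() };
  }

  // Read a quoted literal, translating \n and \t; other escaped characters are kept as-is.
  function parseString(quote) {
    i++; // skip opening quote
    let s = '';
    while (i < expression.length && expression[i] !== quote) {
      if (expression[i] === '\\' && i + 1 < expression.length) {
        const n = expression[i + 1];
        s += n === 'n' ? '\n' : n === 't' ? '\t' : n;
        i += 2;
      } else {
        s += expression[i++];
      }
    }
    i++; // skip closing quote
    return s;
  }

  // Collect tokens until stopChar (or end of input); recurses for "(...)" groups.
  function parseTokenList(stopChar) {
    const result = [];
    let current = '';
    // Emit any pending unquoted text as a path token.
    const flush = () => {
      const t = current.trim();
      if (t) result.push(makePathToken(t));
      current = '';
    };
    while (i < expression.length && expression[i] !== stopChar) {
      const ch = expression[i];
      if (ch === '"' || ch === "'") {
        flush();
        result.push({ type: 'literal', value: parseString(ch) });
      } else if (ch === '+') {
        flush();
        i++;
      } else if (ch === '(') {
        flush();
        i++; // skip '('
        const groupTokens = parseTokenList(')');
        if (i < expression.length) i++; // skip ')'
        result.push({ type: 'group', tokens: groupTokens });
      } else {
        current += ch;
        i++;
      }
    }
    flush();
    return result;
  }

  return parseTokenList(undefined);
}

// String.raw keeps the backslash in "\n" so the tokenizer sees the two-character escape.
const tokens = tokenizeExpression(String.raw`type.titel + ("\nHeight: " + type.hoehe|decimal2)`);
console.log(JSON.stringify(tokens, null, 2));
```

The printed structure is one `path` token for `type.titel` followed by one `group` token whose inner list holds a `literal` (with a real newline) and a `path` token carrying `format: 'decimal2'`.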

@@ -445,4 +480,5 @@ Using the internal API URL (`info.api_url`) avoids any reverse-proxy interferenc
 | `PublishUnknownCollector: collector ""` | The `_basetype` wrapper got removed, or the `collector` field inside `publish` is empty, or the collector name doesn't match the one configured in fylr |
 | Publish entry returns 401/403 | The plugin user in `datacite_global.api_user` lacks the `system.api.publish.post` right |
 | Linked-object field resolves to `undefined` | Either the plugin user can't read the linked objecttype, or the path is wrong. Uncomment the linked-object `console.error` lines to see the failed fetch URL. |
+| Nested table field (e.g. `tierart`) resolves to `undefined` | The field is stored under `_nested:<objecttype>__<fieldname>` in the payload. Make sure the path continues *into* the nested rows (e.g. `objecttype.nestedField.linkedType.linkedType.textField`). |
 | Admin UI shows raw l10n keys instead of labels | New parameter was added to `manifest.master.yml` but not to `datacite-loca.csv` |

manifest.master.yml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 plugin:
   name: fylr-plugin-datacite
-  version: "1.0.1"
+  version: "1.1.0"
   url: https://github.com/eth-library/fylr-publish-datacite-plugin
   l10n: l10n/datacite-loca.csv
   displayname:

server/webhook/register-doi.js

Lines changed: 129 additions & 7 deletions
@@ -162,15 +162,9 @@ async function main() {
 
             if (!dataciteField) continue;
 
-            // Strip objecttype prefix if present (e.g. "haustieranatomie.titel" -> "titel")
-            let resolvedPath = fylrPath;
-            if (resolvedPath && resolvedPath.startsWith(objecttype + '.')) {
-                resolvedPath = resolvedPath.slice(objecttype.length + 1);
-            }
-
             // Prefer _current which contains the full field data; fall back to top-level obj
             const sourceObj = (obj._current && obj._current[objecttype]) ? obj._current : obj;
-            const resolvedValue = await resolveFieldPathAsync(sourceObj, objecttype, resolvedPath, fylrApiUrl, accessToken, warnings);
+            const resolvedValue = await resolveExpressionAsync(fylrPath, sourceObj, objecttype, fylrApiUrl, accessToken, warnings);
             mappedFields[dataciteField] = resolvedValue || defaultValue || '';
         }
 

@@ -364,6 +358,120 @@ async function main() {
     process.exit(0);
 }
 
+/**
+ * Parses a fylr_field_path value into typed tokens for resolveExpressionAsync.
+ *
+ * Token types:
+ * literal – a quoted string ("..." or '...'), with \n and \t escape support
+ * path – an unquoted dot-path field reference
+ * group – a (...) conditional block; omitted entirely when any path inside is empty
+ *
+ * Examples:
+ * type.field + " — " + type.other + "\n"
+ * ("Label: " + type.field + "\n") + ("Other: " + type.other + "\n")
+ */
+function formatDecimal2(value) {
+    if (!value) return '';
+    const num = parseFloat(value) / 100;
+    if (isNaN(num)) return value;
+    // return num.toFixed(2).replace('.', ',');
+    return num.toFixed(2);
+}
+
+function tokenizeExpression(expression) {
+    let i = 0;
+
+    function makePathToken(raw) {
+        const pipeIdx = raw.indexOf('|');
+        if (pipeIdx === -1) return { type: 'path', value: raw };
+        return { type: 'path', value: raw.slice(0, pipeIdx).trim(), format: raw.slice(pipeIdx + 1).trim() };
+    }
+
+    function parseString(q) {
+        i++; // skip opening quote
+        let s = '';
+        while (i < expression.length && expression[i] !== q) {
+            if (expression[i] === '\\' && i + 1 < expression.length) {
+                const n = expression[i + 1];
+                s += n === 'n' ? '\n' : n === 't' ? '\t' : n;
+                i += 2;
+            } else { s += expression[i++]; }
+        }
+        i++; // skip closing quote
+        return s;
+    }
+
+    function parseTokenList(stopChar) {
+        const result = [];
+        let current = '';
+        while (i < expression.length && expression[i] !== stopChar) {
+            const ch = expression[i];
+            if (ch === '"' || ch === "'") {
+                const t = current.trim(); if (t) result.push(makePathToken(t)); current = '';
+                result.push({ type: 'literal', value: parseString(ch) });
+            } else if (ch === '+') {
+                const t = current.trim(); if (t) result.push(makePathToken(t)); current = ''; i++;
+            } else if (ch === '(') {
+                const t = current.trim(); if (t) result.push(makePathToken(t)); current = '';
+                i++; // skip '('
+                const groupTokens = parseTokenList(')');
+                if (i < expression.length) i++; // skip ')'
+                result.push({ type: 'group', tokens: groupTokens });
+            } else { current += ch; i++; }
+        }
+        const t = current.trim(); if (t) result.push(makePathToken(t));
+        return result;
+    }
+
+    return parseTokenList(undefined);
+}
+
+/**
+ * Resolves a fylr_field_path expression to a string value.
+ *
+ * Supports:
+ * - Single dot-path: type.field (existing behaviour)
+ * - Concatenation: type.field + " — " + type.other + "\n"
+ * - Conditional group: ("Label: " + type.field + "\n")
+ * → the group is omitted entirely when any field path inside resolves to empty,
+ * so labels/separators never appear without their corresponding value.
+ */
+async function resolveExpressionAsync(expression, sourceObj, objecttype, fylrApiUrl, accessToken, warnings) {
+    if (!expression) return '';
+    const tokens = tokenizeExpression(expression);
+
+    async function resolvePath(token) {
+        let path = token.value;
+        if (path.startsWith(objecttype + '.')) path = path.slice(objecttype.length + 1);
+        const raw = (await resolveFieldPathAsync(sourceObj, objecttype, path, fylrApiUrl, accessToken, warnings)) || '';
+        if (token.format === 'decimal2') return formatDecimal2(raw);
+        return raw;
+    }
+
+    const parts = [];
+    for (const token of tokens) {
+        if (token.type === 'literal') {
+            parts.push(token.value);
+        } else if (token.type === 'path') {
+            parts.push(await resolvePath(token));
+        } else if (token.type === 'group') {
+            const subParts = [];
+            let anyEmpty = false;
+            for (const sub of token.tokens) {
+                if (sub.type === 'literal') {
+                    subParts.push(sub.value);
+                } else {
+                    const val = await resolvePath(sub);
+                    if (!val) anyEmpty = true;
+                    subParts.push(val);
+                }
+            }
+            if (!anyEmpty) parts.push(subParts.join(''));
+        }
+    }
+    return parts.join('');
+}
+
 /**
  * Like resolveFieldPath but handles fylr date objects ({value: "..."})
  * and fetches linked objects from the fylr API when the path goes deeper
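The edge cases of `formatDecimal2()` are easy to miss when reading the hunk above, so here is a small runnable copy with a few sample calls. The function body is taken from this commit (minus the commented-out line); the sample inputs are illustrative.

```javascript
// Copy of formatDecimal2() from this commit, without the commented-out variant.
function formatDecimal2(value) {
  if (!value) return '';                 // empty string, null, undefined and numeric 0 all end here
  const num = parseFloat(value) / 100;   // fixed-point integer stored as x100
  if (isNaN(num)) return value;          // not a number: hand the input back unchanged
  return num.toFixed(2);
}

console.log(formatDecimal2('2030'));  // 20.30
console.log(formatDecimal2('5'));     // 0.05
console.log(formatDecimal2('0'));     // 0.00
console.log(formatDecimal2('abc'));   // abc
console.log(formatDecimal2('12abc')); // 0.12 (parseFloat reads the leading digits)
console.log(formatDecimal2(''));      // prints an empty line
console.log(formatDecimal2(0));       // prints an empty line: numeric 0 is falsy
```

In the commit, `resolvePath()` already replaces falsy resolver results with `''` before calling `formatDecimal2()`, so a numeric `0` produces an empty string on that path either way.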
@@ -379,6 +487,13 @@ async function resolveFieldPathAsync(obj, objecttype, dotPath, fylrApiUrl, acces
     for (let i = 0; i < parts.length; i++) {
         if (current === null || current === undefined) return undefined;
 
+        // Nested tables (haustieranatomie > tierart etc.) are arrays in the API payload.
+        // Take the first element so dot-path traversal can continue into it.
+        if (Array.isArray(current)) {
+            current = current[0];
+            if (current === null || current === undefined) return undefined;
+        }
+
         const part = parts[i];
 
         if (typeof current !== 'object') return undefined;
@@ -412,6 +527,13 @@ async function resolveFieldPathAsync(obj, objecttype, dotPath, fylrApiUrl, acces
         }
 
         if (!(part in current)) {
+            // fylr stores nested table rows under _nested:<objecttype>__<fieldname>
+            const nestedKey = '_nested:' + objecttype + '__' + part;
+            if (nestedKey in current) {
+                current = current[nestedKey];
+                continue;
+            }
+
             // If current is a linked object wrapper, fetch the full object and retry
             if (current._objecttype && current._system_object_id && fylrApiUrl && accessToken) {
                 const innerId = current[current._objecttype] && current[current._objecttype]._id;
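The two nested-table changes above (first-element selection for arrays and the `_nested:<objecttype>__<part>` fallback key) can be tried in isolation with the simplified walk below. It is a synchronous sketch, not the plugin's function: the `walk` helper, the sample object and the `name` field inside the nested rows are assumptions, and linked-object fetching is left out entirely.

```javascript
// Simplified, synchronous sketch of the nested-table handling; no linked-object fetches.
function walk(root, objecttype, dotPath) {
  let current = root[objecttype]; // fylr objects are keyed by objecttype at the top level
  for (const part of dotPath.split('.')) {
    // Nested tables are arrays: continue into the first row.
    if (Array.isArray(current)) current = current[0];
    if (current === null || typeof current !== 'object') return undefined;

    if (part in current) {
      current = current[part];
    } else {
      // Fall back to the key format fylr uses for nested table rows.
      const nestedKey = '_nested:' + objecttype + '__' + part;
      if (!(nestedKey in current)) return undefined;
      current = current[nestedKey];
    }
  }
  // Date fields arrive wrapped as {value: "..."}; unwrap them at the end of the walk.
  if (current !== null && typeof current === 'object' && 'value' in current) return current.value;
  return current;
}

// Made-up payload for illustration.
const obj = {
  haustieranatomie: {
    titel: 'Schaedel',
    datum: { value: '2025-04-05' },
    '_nested:haustieranatomie__tierart': [{ name: 'Pferd' }, { name: 'Rind' }],
  },
};

console.log(walk(obj, 'haustieranatomie', 'titel'));        // Schaedel
console.log(walk(obj, 'haustieranatomie', 'datum'));        // 2025-04-05
console.log(walk(obj, 'haustieranatomie', 'tierart.name')); // Pferd
console.log(walk(obj, 'haustieranatomie', 'missing'));      // undefined
```

Only the first row of a nested table is ever read in this sketch, mirroring the `current[0]` choice in the hunk; later rows such as `Rind` are ignored.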
