Commit bd4f59d

chore(seed): widen K10 fixture — add NAF 'N' sector + workforce bias
Two visibility fixes for local K10 exploration:

- Add a ninth NAF sample code (`N78.10Z`, services administratifs) so the "Segmenter par NAF" mode shows all five dominant series (C, G, M, N, Q) + Autres instead of only four.
- `pseudoAverageGap` now takes `workforce` into account with a linear gradient (+1.5pt on <50 down to -1pt on 250+). Without it, the five workforce curves rendered on top of each other in "Segmenter par effectif" mode, since every bucket had the same synthetic baseline.

Catalog grows from 5×8=40 to 5×9=45 companies, and from 160 to 180 submitted declarations across four campaign years.
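The catalog arithmetic can be checked with a small sketch. The three constants are copied from the diff to `seed-conformite-stats.mjs`; the helper `buildCatalogSketch` and the dominant-section set are illustrative assumptions, since the commit does not show how the script actually builds its catalog.

```javascript
// Constants as they appear in the diff.
const WORKFORCE_BUCKETS = [20, 60, 120, 180, 300];
const NAF_SAMPLE_CODES = [
  "A01.11Z", "C10.11Z", "F41.10A", "G47.11B", "J62.01Z",
  "K64.19Z", "M70.10Z", "N78.10Z", "Q86.10Z",
];
const CAMPAIGN_YEARS_BACK = 4;

// Hypothetical helper (not the script's real code): one company per
// (workforce bucket x NAF code) pair.
function buildCatalogSketch() {
  const catalog = [];
  for (const workforce of WORKFORCE_BUCKETS) {
    for (const nafCode of NAF_SAMPLE_CODES) {
      catalog.push({ workforce, nafCode });
    }
  }
  return catalog;
}

const catalog = buildCatalogSketch();
// Assumes one submitted declaration per company per campaign year.
const declarations = catalog.length * CAMPAIGN_YEARS_BACK;

// Dominant K10 sections, taken from the doc comment added in the diff.
const DOMINANT_SECTIONS = new Set(["C", "G", "M", "N", "Q"]);
const dominantSeen = new Set(
  NAF_SAMPLE_CODES.map((code) => code[0]).filter((s) => DOMINANT_SECTIONS.has(s)),
);

console.log(catalog.length, declarations, dominantSeen.size);
```

Under these assumptions the sketch yields 45 companies, 180 declarations, and five dominant sections, which matches the figures in the commit message.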
1 parent 1ee09be commit bd4f59d

1 file changed

Lines changed: 54 additions & 17 deletions

packages/app/scripts/seed-conformite-stats.mjs

@@ -25,17 +25,23 @@ const SEED_DECLARANT_ID = "seed-conformite-declarant-0000-0000000";
 /** 777XXXXXX SIRENs are reserved for this fixture. */
 const SIREN_PREFIX = "777";
-/** Generates one company per (bucket × sector) combination, 5 × 8 = 40. */
+/** Generates one company per (bucket × sector) combination, 5 × 9 = 45. */
 const WORKFORCE_BUCKETS = [20, 60, 120, 180, 300];
+/**
+ * Nine NAF sample codes cover the five K10 dominant sections (C, G, M, N, Q)
+ * so every expected curve shows up in "Segmenter par NAF" mode, plus four
+ * non-dominant sections (A, F, J, K) that collapse into the "Autres" series.
+ */
 const NAF_SAMPLE_CODES = [
-  "A01.11Z", // A — Agriculture
-  "C10.11Z", // C — Industrie manufacturière
-  "F41.10A", // F — Construction
-  "G47.11B", // G — Commerce
-  "J62.01Z", // J — Information & communication
-  "K64.19Z", // K — Activités financières et d'assurance
-  "M70.10Z", // M — Activités spécialisées
-  "Q86.10Z", // Q — Santé humaine
+  "A01.11Z", // A — Agriculture → "Autres"
+  "C10.11Z", // C — Industrie manufacturière (dominant)
+  "F41.10A", // F — Construction → "Autres"
+  "G47.11B", // G — Commerce (dominant)
+  "J62.01Z", // J — Information & communication → "Autres"
+  "K64.19Z", // K — Activités financières et d'assurance → "Autres"
+  "M70.10Z", // M — Activités spécialisées (dominant)
+  "N78.10Z", // N — Services administratifs (dominant)
+  "Q86.10Z", // Q — Santé humaine (dominant)
 ];
 /** Number of most-recent campaign years to seed (current year + N-1 … N-3). */
 const CAMPAIGN_YEARS_BACK = 4;
@@ -119,20 +125,46 @@ function shouldHaveAlertGap(companyIndex, yearsBeforeCurrent, nafCode) {
 /**
  * Build a plausible average gap (0..12%) for a seed row so K10 has something
- * to plot. Same shape as `shouldHaveAlertGap`: slightly improving over time,
- * with a sector bias (K finance skewed up, M services skewed down). Clamped
- * to [0.5, 12] so the chart's Y-axis stays readable.
+ * to plot. Combines three effects so every K10 segmentation produces visibly
+ * distinct curves:
+ *   - **year drift**: ~+0.6pt per year going back, so the trend slopes down
+ *     toward the current year.
+ *   - **NAF sector bias**: K (finance) runs hotter, M (services spécialisés)
+ *     cooler — the "Segmenter par NAF" mode shows spread between Autres and
+ *     the dominant series.
+ *   - **workforce bias**: smaller companies (< 50) trend ~+1.5pt vs. 250+,
+ *     so the "Segmenter par effectif" mode separates the buckets vertically
+ *     instead of drawing five overlapping lines.
+ * Clamped to [0.5, 12] so the Y-axis stays readable.
  */
-function pseudoAverageGap(companyIndex, yearsBeforeCurrent, nafCode) {
+function pseudoAverageGap(
+  companyIndex,
+  yearsBeforeCurrent,
+  nafCode,
+  workforce,
+) {
   const baseGap = 4 + 0.6 * yearsBeforeCurrent;
   const sectorShift = nafCode.startsWith("K")
     ? 2.5
     : nafCode.startsWith("M")
       ? -1.5
       : 0;
+  // Linear gradient across the five buckets: <50 → +1.5, 50-99 → +0.75,
+  // 100-149 → 0, 150-249 → -0.5, 250+ → -1. Enough spread for the chart
+  // to render five distinct curves without overlapping.
+  const workforceShift =
+    workforce < 50
+      ? 1.5
+      : workforce < 100
+        ? 0.75
+        : workforce < 150
+          ? 0
+          : workforce < 250
+            ? -0.5
+            : -1;
   const jitter =
-    (pseudoRandom(companyIndex * 211 + yearsBeforeCurrent) - 0.5) * 3;
-  const value = baseGap + sectorShift + jitter;
+    (pseudoRandom(companyIndex * 211 + yearsBeforeCurrent) - 0.5) * 1.5;
+  const value = baseGap + sectorShift + workforceShift + jitter;
   return Math.max(0.5, Math.min(12, Math.round(value * 10) / 10));
 }

@@ -194,10 +226,15 @@ async function seed(sql) {
   for (let yearsBack = 0; yearsBack < CAMPAIGN_YEARS_BACK; yearsBack++) {
     const year = currentYear - yearsBack;
     let companyIndex = 0;
-    for (const { siren, nafCode } of catalog) {
+    for (const { siren, nafCode, workforce } of catalog) {
       companyIndex++;
       const hasAlertGap = shouldHaveAlertGap(companyIndex, yearsBack, nafCode);
-      const averageGap = pseudoAverageGap(companyIndex, yearsBack, nafCode);
+      const averageGap = pseudoAverageGap(
+        companyIndex,
+        yearsBack,
+        nafCode,
+        workforce,
+      );
       // Spread submissions over January-February so the rows look realistic
       // (even though the campaign progression chart is not what we exercise
       // here, keeping a plausible date avoids surprises elsewhere).
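To see how the new workforce term separates the buckets, the following standalone sketch reproduces the arithmetic of the updated `pseudoAverageGap`. The script's real `pseudoRandom` is not part of this diff, so `pseudoRandomSketch` is a hypothetical stand-in, assumed only to be deterministic and to return a value in [0, 1). The extra `random` parameter is also an illustrative addition so the jitter can be neutralised.

```javascript
// Hypothetical stand-in for the script's `pseudoRandom` (not shown in the diff).
// Assumption: deterministic, returns a value in [0, 1).
function pseudoRandomSketch(seed) {
  const x = Math.sin(seed * 12.9898) * 43758.5453;
  return x - Math.floor(x);
}

// Same thresholds and offsets as the nested ternary in the diff.
function workforceShift(workforce) {
  if (workforce < 50) return 1.5;
  if (workforce < 100) return 0.75;
  if (workforce < 150) return 0;
  if (workforce < 250) return -0.5;
  return -1;
}

// Mirrors the updated function; `random` is injectable for illustration only.
function pseudoAverageGapSketch(
  companyIndex,
  yearsBeforeCurrent,
  nafCode,
  workforce,
  random = pseudoRandomSketch,
) {
  const baseGap = 4 + 0.6 * yearsBeforeCurrent;
  const sectorShift = nafCode.startsWith("K")
    ? 2.5
    : nafCode.startsWith("M")
      ? -1.5
      : 0;
  const jitter = (random(companyIndex * 211 + yearsBeforeCurrent) - 0.5) * 1.5;
  const value = baseGap + sectorShift + workforceShift(workforce) + jitter;
  // Round to one decimal, then clamp to [0.5, 12].
  return Math.max(0.5, Math.min(12, Math.round(value * 10) / 10));
}

// Neutralise the jitter (random() === 0.5) to expose the per-bucket baseline
// for a sector without bias, in the current campaign year.
const noJitter = () => 0.5;
const baselines = [20, 60, 120, 180, 300].map((workforce) =>
  pseudoAverageGapSketch(1, 0, "C10.11Z", workforce, noJitter),
);
console.log(baselines);
```

With the jitter neutralised, the five buckets land on 5.5, 4.8, 4, 3.5 and 3. The steps between adjacent buckets are 0.75, 0.75, 0.5 and 0.5 before rounding, so the gradient is stepped rather than strictly linear. The jitter now spans ±0.75, which means individual rows from neighbouring buckets can still overlap even though the per-bucket baselines are distinct.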
