- Overview
- Installation
- Running the App
- Data Requirements
- Workflow
- Configuration
- Knowledge Base Export
- Limitations
tseda is an automated signal decomposition and diagnostic engine for high-fidelity time series preprocessing. It is designed for enterprise data pipelines working with data sampled hourly or less frequently (daily, weekly, monthly, quarterly, etc.) and guides you through a structured validation process:
- Initial assessment — upload your data, inspect its statistical properties, and understand its autocorrelation structure.
- SSA decomposition — apply Singular Spectrum Analysis (SSA) to separate the series into trend, seasonal, and noise components.
- Observation logging — review auto-generated narrative summaries, annotate your findings, and export results.
Configuration callout: all thresholds, constants, and heuristics used by the app are configurable in src/tseda/config/tseda_config.yaml.
```bash
conda create -n tseda python=3.13
conda activate tseda
```

Install with pip:

```bash
pip install tseda
```

Or install it as an isolated command-line tool with pipx:

```bash
pipx install tseda
```

To install from source for development:

```bash
git clone <repo-url>
cd tseda
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

After installation, launch tseda from the command line:

```bash
tseda
```

Or equivalently:

```bash
python -m tseda
```

The app starts a local Dash web server and opens automatically in your default browser. No arguments are required.
tseda expects a CSV or Excel file in long (tidy) format with:
| Column | Description |
|---|---|
| Column 1 | Timestamps — must be parseable as dates/datetimes |
| Column 2 | Numeric values — the time series variable |
Additional requirements:
- Regular cadence: the sampling interval must be uniform and inferable (e.g., daily, monthly, quarterly, hourly).
- No missing values: gaps in the series are not supported.
- Maximum 2,000 rows: longer files will be rejected.
- The file must have at least two columns; only the first two are used.
Example valid formats: daily sales data, monthly energy consumption, quarterly GDP, hourly sensor readings.
Upload your CSV or Excel file using the file upload control. The app parses the file and displays:
A summary table showing:
- N: number of observations
- Start / End: date range of the series
- Duration: total time span
- Inferred frequency: e.g., "Monthly", "Daily"
- Heuristic SSA window: a suggested window size for decomposition based on the detected cadence
| Cadence | Initial Window |
|---|---|
| Hourly | 24 |
| Daily | 5 |
| Weekly | 4 |
| Monthly | 12 |
| Quarterly | 4 |
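The cadence lookup in the table above can be sketched with `pandas.infer_freq`. This is an illustrative sketch, not tseda's own code: `heuristic_window`, `CADENCE_WINDOWS`, and `ALIAS_TO_CADENCE` are hypothetical names. Because pandas spells some offset aliases differently across versions (`H` vs `h` for hourly, `M` vs `ME` for month end), the sketch matches only the alias prefix, and it treats multiples such as `2D` or business-day data as unsupported, which is an assumption.

```python
import pandas as pd

# Initial SSA window per cadence, copied from the table above.
CADENCE_WINDOWS = {"hourly": 24, "daily": 5, "weekly": 4, "monthly": 12, "quarterly": 4}

# Hypothetical mapping from pandas offset-alias prefixes to cadence names.
# Both older ("H", "M", "Q") and pandas 2.2+ ("h", "ME", "QE") spellings are listed.
ALIAS_TO_CADENCE = {
    "H": "hourly", "h": "hourly",
    "D": "daily",
    "W": "weekly",
    "M": "monthly", "ME": "monthly", "MS": "monthly",
    "Q": "quarterly", "QE": "quarterly", "QS": "quarterly",
}


def heuristic_window(timestamps) -> tuple[str, int]:
    """Infer the cadence of a regular timestamp column and return (cadence, window)."""
    index = pd.DatetimeIndex(pd.to_datetime(timestamps))
    freq = pd.infer_freq(index)  # e.g. "D", "MS", "W-SUN", "QS-OCT"; None if irregular
    if freq is None:
        raise ValueError("invalid window: cadence could not be inferred")
    base = freq.split("-")[0]  # drop anchors such as "-SUN" or "-OCT"
    cadence = ALIAS_TO_CADENCE.get(base)
    if cadence is None:
        raise ValueError(f"invalid window: unsupported cadence {freq!r}")
    return cadence, CADENCE_WINDOWS[cadence]


monthly = pd.date_range("2020-01-01", periods=36, freq="MS")
print(heuristic_window(monthly))  # ('monthly', 12)
```

Raising on an unrecognised alias mirrors step 6 of the window-setup algorithm, which fails with "invalid window" when the cadence is unknown.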
A kernel density estimate (KDE) overlaid with a box plot of the raw values. Use this to:
- Assess the central tendency and spread.
- Identify multi-modality (multiple peaks may indicate regime changes or mixed populations).
A scatter/line plot of the full time series, letting you visually inspect trends, seasonality, outliers, and structural breaks.
Autocorrelation Function (ACF) and Partial ACF plots computed via statsmodels. These reveal:
- Slow decay in ACF: trend component present.
- Seasonal spikes: periodicity at regular lags.
- PACF cutoff: guides autoregressive model order if needed.
SSA decomposes the series into a set of components ranked by their contribution to total variance (measured by their eigenvalues).
The app pre-selects a window size based on the detected cadence. You can adjust it manually. A good rule of thumb:
- The window should be at least as long as the expected seasonal period.
- Larger windows capture longer-range structure but increase computation.
Before grouping, the app computes a cadence-based initial window and then applies an eigen-spectrum spread check. The goal is to avoid starting with a window where the smallest eigenvalue still explains too much variance.
Algorithm: Initial SSA Window Setup

```text
Input:  regular time series x of length N, inferred cadence c
Params: min_tail_spread = 0.10

--- Cadence-based initialization ---
1. If c = hourly    -> w ← 24
2. If c = daily     -> w ← 5
3. If c = weekly    -> w ← 4
4. If c = monthly   -> w ← 12
5. If c = quarterly -> w ← 4
6. If cadence is unknown -> fail with "invalid window"

--- Spectrum-spread refinement ---
7. Build SSA(x, w) and compute eigenvalues λ₁ ≥ ... ≥ λ_w
8. tail_ratio ← λ_w / Σᵢ λᵢ
9. While tail_ratio ≥ min_tail_spread and 2w ≤ floor(N/2):
       w ← 2w
       Rebuild SSA(x, w)
       tail_ratio ← λ_w / Σᵢ λᵢ

--- Output ---
10. Return final w as the decomposition default and slider value
```
Operationally, this means the startup window can be larger than the raw cadence mapping if the first decomposition is too flat in the tail. The UI slider is synchronized to this final value.
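A NumPy sketch of steps 7 to 10 follows. It assumes the eigenvalues are the squared singular values of the Hankel trajectory matrix, which is the standard SSA construction; `ssa_eigenvalues` and `refine_window` are hypothetical names, and tseda may build the spectrum through a different library.

```python
import numpy as np


def ssa_eigenvalues(x: np.ndarray, w: int) -> np.ndarray:
    """Eigenvalues of the SSA lag-covariance X Xᵀ, sorted descending (λ₁ ≥ ... ≥ λ_w)."""
    n = len(x)
    k = n - w + 1
    # Trajectory (Hankel) matrix: column j holds the lagged vector x[j : j + w].
    traj = np.column_stack([x[j:j + w] for j in range(k)])
    singular_values = np.linalg.svd(traj, compute_uv=False)  # already descending
    return singular_values ** 2


def refine_window(x, w: int, min_tail_spread: float = 0.10) -> int:
    """Double w while the smallest eigenvalue's share is >= min_tail_spread and 2w <= N // 2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lam = ssa_eigenvalues(x, w)
    tail_ratio = lam[-1] / lam.sum()
    while tail_ratio >= min_tail_spread and 2 * w <= n // 2:
        w *= 2
        lam = ssa_eigenvalues(x, w)
        tail_ratio = lam[-1] / lam.sum()
    return w


t = np.arange(240)
rng = np.random.default_rng(1)
print(refine_window(np.sin(2 * np.pi * t / 12), 12))  # rank-2 signal: window stays 12
print(refine_window(rng.normal(size=240), 4))          # flat noise spectrum: window is doubled
```

Doubling keeps the search to a handful of SSA fits, and the `2w ≤ floor(N/2)` guard keeps the window at or below half the series length, where the trajectory matrix stays wider than it is tall.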
A bar chart showing the explained variance for each SSA component (eigenvalue rank). Use this to decide how many components to retain.
Plots of the leading eigenvectors. Paired eigenvectors (similar shape, similar eigenvalues) indicate a periodic/seasonal component.
Before the grouping controls are activated, the app automatically checks whether this series is structurally suited to SSA decomposition.
Why this check exists. SSA decomposes a series by finding a small number of dominant directions (eigenvectors) that capture most of the variance. This works well when the series has real structure — a trend pulling values in a consistent direction, or a seasonal oscillation repeating at a known period. Those structures produce large, concentrated eigenvalues at the top of the spectrum. The remaining eigenvectors, which correspond to noise, are small and comparably sized.
When a series is dominated by noise — white noise, a random walk with no drift, or any process without persistent structure — the eigenspectrum looks completely different: variance is spread roughly equally across all eigenvectors. No single component stands out. Applying SSA to such a series still produces a mathematically valid result, but the Trend and Seasonality groups it generates are statistical artefacts rather than meaningful signal components. The decomposition cannot be trusted, and the Durbin-Watson check on the noise residual will rarely give a clean result because there is no coherent structure left to separate.
The suitability check quantifies this directly: it sums the explained variance of the top k eigenvectors and requires that sum to reach a minimum threshold. If it does not, the Apply Grouping button is disabled and a red alert explains what was found:
```text
Params: top_k = 5, min_explained_variance = 0.40

1. total ← Σᵢ λᵢ
2. k ← min(top_k, spectrum_length)
3. top_k_ratio ← Σᵢ₌₁ᵏ λᵢ / total
4. If top_k_ratio < min_explained_variance:
     → Block Apply Grouping
     → Show: "Top k eigenvectors explain X.X% — minimum required Y%"
     → Recommend: external stochastic modelling (outside this SSA app)
5. Else:
     → Allow grouping to proceed
```
The alert reports the actual ratio alongside the threshold so you can judge whether the dataset is marginally below the cutoff (and perhaps worth trying with a different window) or deeply unsuitable. Both top_k and min_explained_variance are configurable — see Section 10 of the Configuration reference.
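The concentration test takes only a few lines of NumPy. `suitability_check` is a hypothetical helper that returns both the verdict and the ratio, since the alert reports the actual ratio next to the threshold; the eigenvalue lists are made-up examples of a steep and a flat spectrum.

```python
import numpy as np


def suitability_check(eigenvalues, top_k: int = 5, min_explained_variance: float = 0.40):
    """Return (passes, top_k_ratio) for the eigen-spectrum concentration test."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # λ₁ ≥ λ₂ ≥ ...
    k = min(top_k, len(lam))                                   # step 2
    top_k_ratio = lam[:k].sum() / lam.sum()                    # steps 1 and 3
    return bool(top_k_ratio >= min_explained_variance), float(top_k_ratio)


structured = [50.0, 20.0, 19.0, 3.0, 2.0] + [0.5] * 19  # steep drop-off
flat = [1.0] * 24                                        # noise-like spectrum

for name, spectrum in [("structured", structured), ("flat", flat)]:
    ok, ratio = suitability_check(spectrum)
    status = "grouping allowed" if ok else "grouping blocked"
    print(f"{name}: top 5 explain {ratio:.1%} (minimum 40%) -> {status}")
```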
What to do if the check fails:
- Try a larger SSA window using the slider. Sometimes a cadence-based default window is too small to reveal seasonal structure.
- Inspect the eigenvalue profile plot — if the bars form a steep drop-off rather than a flat line, the series may still be worth exploring at a different scale.
- If the spectrum remains flat at all window sizes, the series is most likely noise-dominated. Since this tool is SSA-focused, switch to an external stochastic approach (for example random walk/Brownian-motion-style models or ARIMA/SARIMA).
The automatic grouping heuristic assigns each SSA component to Trend, Seasonality, or Noise using the following procedure:
Algorithm: SSA Eigenvalue Group Assignment

```text
Input:  eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λ_K (sorted descending),
        noise residual r
Params: variance_threshold = 0.10, pair_tolerance = 0.05,
        pool_selection_method = "kneedle",
        kneedle_min_distance = 0.03,
        min_signal_components = 1,
        min_noise_components = 2,
        dw_low = 1.5, dw_high = 2.5

--- Initial classification ---
1. Trend ← ∅; Seasonality ← ∅
2. If pool_selection_method = "kneedle":
     a) yᵢ ← log(1 + λᵢ)
     b) Normalize y between first and last points
     c) Compute distance to endpoint chord line
     d) knee ← argmax(distance) if max(distance) ≥ kneedle_min_distance
     e) Eligible ← {0..knee} with bounds:
          min |Eligible| = min_signal_components
          max |Eligible| = K - min_noise_components
   Else (legacy fallback):
     Eligible ← { i : (λᵢ / Σⱼ λⱼ) ≥ variance_threshold }
3. Noise ← all indices not in Eligible

--- Scan eligible components in rank order ---
4. cursor ← 0
5. While cursor < |Eligible|:
     j ← Eligible[cursor]
     m ← Eligible[cursor + 1] (if it exists)
     if m = j + 1 and |λⱼ − λₘ| / max(λⱼ, λₘ) ≤ pair_tolerance then
       Seasonality ← Seasonality ∪ { j, m }
       cursor ← cursor + 2
     else
       Trend ← Trend ∪ { j }
       cursor ← cursor + 1

--- Validate with Durbin-Watson ---
6. r_noise ← r − Σᵢ∈(Trend∪Seasonality) component(i)
7. dw ← DurbinWatson(r_noise)
8. best ← current assignment; best_dist ← |dw − 2.0|

--- Iterative expansion from noise pool ---
9. While dw ∉ [dw_low, dw_high] and |Noise| > 0:
     candidate ← Noise[0]   // largest remaining noise eigenvalue
     next ← Noise[1] (if it exists)
     if next = candidate + 1 and |λ_candidate − λ_next| / max(…) ≤ pair_tolerance then
       Seasonality ← Seasonality ∪ { candidate, next }
       Noise ← Noise \ { candidate, next }
     else
       Trend ← Trend ∪ { candidate }
       Noise ← Noise \ { candidate }
     r_noise ← r − Σᵢ∈(Trend∪Seasonality) component(i)
     dw ← DurbinWatson(r_noise)
     if |dw − 2.0| < best_dist then
       best ← current assignment; best_dist ← |dw − 2.0|

--- Output ---
10. If dw ∉ [dw_low, dw_high]:
      Return best with warning "DW criterion not met — try a different window size"
11. Else:
      Return (Trend, Seasonality, Noise)
```
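A Python sketch of the initial classification and pair scan (steps 1 to 5) under the kneedle pool method is shown below. Assumptions: `initial_grouping` is a hypothetical name; "distance to the chord" is taken as the vertical gap between the normalised curve and the chord, as in the Kneedle difference curve (a perpendicular distance would rescale `kneedle_min_distance`); and when no knee passes the threshold the pool falls back to `min_signal_components`. The Durbin-Watson expansion (steps 6 to 11) is omitted because it needs the reconstructed components.

```python
import numpy as np


def initial_grouping(eigenvalues, pair_tolerance=0.05, kneedle_min_distance=0.03,
                     min_signal_components=1, min_noise_components=2):
    """Steps 1-5: kneedle pool selection, then a pair scan into Trend / Seasonality."""
    lam = np.asarray(eigenvalues, dtype=float)  # assumed sorted descending
    n_comp = len(lam)

    # Steps 2a-2c: log-compress, normalise so the first point is 1 and the last 0,
    # then measure how far each point sits below the chord from (0, 1) to (1, 0).
    y = np.log1p(lam)
    span = y[0] - y[-1]
    x = np.linspace(0.0, 1.0, n_comp)
    if span > 0:
        distance = (1.0 - x) - (y - y[-1]) / span
    else:
        distance = np.zeros(n_comp)  # perfectly flat spectrum: no knee

    # Steps 2d-2e: accept the knee only if it is pronounced enough, then clamp the pool size.
    pool_size = int(np.argmax(distance)) + 1 if distance.max() >= kneedle_min_distance else 0
    pool_size = max(min_signal_components, min(pool_size, n_comp - min_noise_components))
    eligible = list(range(pool_size))
    noise = list(range(pool_size, n_comp))  # step 3

    # Steps 4-5: adjacent, near-equal eigenvalues form a seasonal pair.
    trend, seasonality = [], []
    cursor = 0
    while cursor < len(eligible):
        j = eligible[cursor]
        m = eligible[cursor + 1] if cursor + 1 < len(eligible) else None
        if m == j + 1 and abs(lam[j] - lam[m]) / max(lam[j], lam[m]) <= pair_tolerance:
            seasonality += [j, m]
            cursor += 2
        else:
            trend.append(j)
            cursor += 1
    return trend, seasonality, noise


print(initial_grouping([100, 30, 29.5, 1, 0.9, 0.8, 0.7, 0.6]))  # ([0, 3], [1, 2], [4, 5, 6, 7])
```

In this example the elbow point itself (index 3) joins the signal pool, because step 2e keeps indices 0 through `knee` inclusive, so it ends up in Trend.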
A heatmap of correlations between SSA components. Strongly correlated component pairs should be grouped together in the reconstruction step.
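The text does not say which correlation measure the heatmap uses. SSA practice usually relies on the weighted (w-)correlation, in which each time point is weighted by how often it appears in the trajectory matrix; the sketch below assumes that measure and also shows how elementary components are reconstructed by diagonal averaging. `ssa_elementary_components` and `w_correlation` are hypothetical names.

```python
import numpy as np


def ssa_elementary_components(x, w):
    """Decompose x into rank-1 SSA components via SVD and diagonal averaging."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - w + 1
    traj = np.column_stack([x[j:j + w] for j in range(k)])  # w x k Hankel matrix
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    components = np.empty((len(s), n))
    for i in range(len(s)):
        elementary = s[i] * np.outer(u[:, i], vt[i])  # entry (r, c) maps to time r + c
        flipped = elementary[::-1]                      # anti-diagonals become diagonals
        components[i] = [flipped.diagonal(offset).mean() for offset in range(-(w - 1), k)]
    return components


def w_correlation(components, w):
    """Weighted correlation matrix between reconstructed SSA components."""
    n = components.shape[1]
    l_star = min(w, n - w + 1)
    t = np.arange(n)
    weights = np.minimum(np.minimum(t + 1, l_star), n - t)  # occurrences of x_t in the trajectory matrix
    gram = (components * weights) @ components.T
    norms = np.sqrt(np.diag(gram))
    return gram / np.outer(norms, norms)


rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=120)
comps = ssa_elementary_components(x, w=24)
wcorr = w_correlation(comps, w=24)
print(np.round(wcorr[:4, :4], 2))
```

Entries near ±1 between two components suggest they belong in the same group; entries near 0 indicate well-separated components.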
The app first renders a Suggested Grouping table in the center of the decomposition panel and prepopulates the input rows for Trend, Seasonality, and Noise using the heuristic above. You can edit those values before reconstruction if the diagnostic plots or domain context suggest a different interpretation.
Assign SSA components to interpretable groups by entering index ranges (0-based). Typical groupings:
| Group | Example indices | Meaning |
|---|---|---|
| Trend | [0, 1] | Low-frequency trend |
| Seasonality | [2, 3] | Dominant periodic component |
| Noise | [4:] | Residual / noise |
The app reconstructs the signal for each group and displays the result overlaid on the original series.
If you click Clear Uploaded File in Step 1, the suggested grouping table and the prepopulated grouping fields are reset with the rest of the session analysis state.
For notebook users, the same retry pattern is available via
NotebookThreeStepAPI.suggest_grouping_with_window_autotune(...), which
automatically reapplies grouping after each window reassignment until the
Durbin-Watson gate passes or the configured maximum window is reached.
The Durbin-Watson statistic is applied to the noise component to assess residual independence:
- Value ≈ 2: residuals are uncorrelated (good).
- Value < 1.5: positive autocorrelation remains (consider adding more components to the structured groups).
- Value > 2.5: negative autocorrelation.
Export gating behavior: The Export Components button is enabled only when Durbin-Watson is within the configured valid range (noise_validation.dw_low to noise_validation.dw_high, default [1.5, 2.5]). When enabled, clicking export downloads a CSV containing timestamp, Trend, Seasonality, and Noise.
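The Durbin-Watson statistic is DW = Σ(eₜ − eₜ₋₁)² / Σeₜ². The sketch below computes it directly with NumPy (statsmodels exposes the same statistic as `statsmodels.stats.stattools.durbin_watson`) and applies the export gate to a white-noise residual and a random-walk residual. `export_enabled` is a hypothetical stand-in for the button logic, using the default bounds.

```python
import numpy as np


def durbin_watson(residual) -> float:
    """DW = sum((e_t - e_{t-1})**2) / sum(e_t**2); about 2 for uncorrelated residuals."""
    e = np.asarray(residual, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


def export_enabled(noise, dw_low: float = 1.5, dw_high: float = 2.5) -> bool:
    """Allow export only when DW lies inside [dw_low, dw_high]."""
    return dw_low <= durbin_watson(noise) <= dw_high


rng = np.random.default_rng(42)
white = rng.normal(size=500)  # uncorrelated residual
walk = np.cumsum(white)       # strongly positively autocorrelated residual

for name, residual in [("white noise", white), ("random walk", walk)]:
    print(f"{name}: DW = {durbin_watson(residual):.2f}, export enabled: {export_enabled(residual)}")
```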
After applying the reconstruction grouping, the app runs a change-point analysis that independently examines trend shifts and seasonal amplitude shifts. Noise is excluded from this analysis: by definition, noise has no persistent structure and is not a source of meaningful change points.
| Component | Analysed? | Rationale |
|---|---|---|
| Trend | ✅ Yes | Persistent mean-level shifts are the primary structural break of interest. |
| Seasonality | ✅ Yes (amplitude envelope) | Changes in how strong the seasonal pattern is (e.g. seasonality growing or shrinking) are detected. Phase/frequency shifts are not — see note below. |
| Noise | ❌ No | Noise is by construction structureless; running a change-point detector on it would produce spurious breaks. |
Phase and frequency shifts are not detected by the current algorithm. Detecting period changes robustly requires either a sliding-window FFT or a Hilbert-transform instantaneous frequency approach; both are noisy on short series (< 200 points) and are not implemented here.
Detects points where the long-run mean level of the series changes permanently.
```text
Input: Trend component from the SSA reconstruction

1. Z-score normalise the trend:
     z[t] = (trend[t] − mean(trend)) / std(trend)
   Normalisation makes the penalty scale-invariant across datasets
   with very different value ranges.
2. Fit PELT (ruptures, l2 cost model) with a BIC-style penalty:
     penalty = log(n)
   where n is the series length.
3. Collect interior breakpoints (PELT appends n as a sentinel;
   discard it). These are the trend-shift indices.
```
Visualisation: vertical - - - dashed lines, labelled T1, T2, … at the top of the plot.
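The app delegates step 2 to ruptures (`rpt.Pelt(model="l2").fit(z).predict(pen=penalty)`). To keep this sketch dependency-free, it instead re-implements PELT with the same l2 (piecewise-constant mean) cost in NumPy; it is a simplified stand-in, not ruptures' code, and it omits ruptures' `jump` option that restricts candidate breakpoints. `pelt_l2` and `trend_shifts` are hypothetical names, and the `penalty_multiplier` argument is an assumption that defaults to 1.0 to match the `log(n)` penalty above.

```python
import numpy as np


def pelt_l2(signal, penalty, min_size=2):
    """Minimal PELT with an l2 cost; returns interior breakpoints (no end sentinel)."""
    y = np.asarray(signal, dtype=float)
    n = len(y)
    s1 = np.concatenate([[0.0], np.cumsum(y)])
    s2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def cost(a, b):
        # Sum of squared deviations from the mean of y[a:b].
        seg = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)

    best = np.full(n + 1, np.inf)      # best[t]: optimal penalised cost of y[:t]
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment of y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        valid = [s for s in candidates if t - s >= min_size]
        if valid:
            totals = [best[s] + cost(s, t) + penalty for s in valid]
            i = int(np.argmin(totals))
            best[t], last[t] = totals[i], valid[i]
            # Pruning: drop start points that can no longer lead to an optimum.
            candidates = [s for s in candidates
                          if t - s < min_size or best[s] + cost(s, t) <= best[t]]
        candidates.append(t)

    breakpoints, t = [], n
    while t > 0:
        t = int(last[t])
        if t > 0:
            breakpoints.append(t)
    return sorted(breakpoints)


def trend_shifts(trend, penalty_multiplier=1.0):
    """Steps 1-3: z-score the trend, then run PELT with penalty = multiplier * log(n)."""
    trend = np.asarray(trend, dtype=float)
    z = (trend - trend.mean()) / trend.std()
    return pelt_l2(z, penalty=penalty_multiplier * np.log(len(z)))


step_trend = np.concatenate([np.zeros(40), np.full(40, 4.0), np.full(40, 1.0)])
print(trend_shifts(step_trend))  # [40, 80]
```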
Detects points where the strength of the seasonal pattern changes (the seasonal oscillations become noticeably larger or smaller).
```text
Input: Seasonality component from the SSA reconstruction

1. Compute the rolling RMS envelope over a window w equal to the
   SSA window size (captures one nominal seasonal cycle):
     envelope[t] = sqrt( mean( seasonality[t-w/2 : t+w/2]² ) )
   This converts the oscillating seasonality signal into a smooth
   amplitude envelope.
2. Z-score normalise the envelope (same reason as for the trend).
3. Fit PELT (ruptures, l2 cost model) with penalty = log(n).
4. Collect interior breakpoints. These are the seasonal-amplitude-
   shift indices.
```
Visualisation: vertical ··· dotted lines, labelled S1, S2, … at the bottom of the plot.
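Steps 1 and 2 can be sketched with a centred pandas rolling mean. Assumptions: `seasonal_envelope` is a hypothetical name, points near the edges use a shorter window (`min_periods=1`), and the z-score uses NumPy's population standard deviation. The z-scored envelope would then go through the same PELT step as the trend.

```python
import numpy as np
import pandas as pd


def seasonal_envelope(seasonality, window):
    """Centred rolling RMS of the seasonal component, plus its z-score (steps 1-2)."""
    s = pd.Series(np.asarray(seasonality, dtype=float))
    # Mean of squares over a centred window of one nominal cycle, then square root.
    mean_square = s.pow(2).rolling(window, center=True, min_periods=1).mean()
    envelope = np.sqrt(mean_square.to_numpy())
    z = (envelope - envelope.mean()) / envelope.std()
    return envelope, z


t = np.arange(120)
# Seasonal amplitude triples halfway through the series.
seasonality = np.where(t < 60, 1.0, 3.0) * np.sin(2 * np.pi * t / 12)
envelope, z = seasonal_envelope(seasonality, window=12)
print(envelope[[30, 90]].round(3))
```

Over a full cycle the mean of sin² is 1/2, so a sinusoid of amplitude A has an RMS envelope of A/√2 away from the edges and the change point.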
The smoothed signal (trend plus seasonality, i.e. all non-noise components) is drawn as a single continuous line and is never broken at a segment boundary. The two sets of markers are visually distinct:
| Marker style | Label | Meaning |
|---|---|---|
| Vertical - - - dashed line | T1, T2, … | The trend mean shifted here |
| Vertical ··· dotted line | S1, S2, … | The seasonal amplitude changed here |
A plain-language summary is printed below the plot, for example:
```text
Trend shifts (- -): 2026-11-29, 2027-02-07
Seasonal amplitude shifts (···): 2026-10-25, 2026-11-29, 2027-02-07
```

If no changes are detected for a component, the line reads "none detected".
When a trend-shift date and a seasonal-amplitude-shift date coincide, both a dashed and a dotted line will appear at the same x position, indicating a simultaneous structural regime change in both the level and the seasonal strength.
The app produces a prose summary combining:
- Sampling metadata and descriptive statistics (mean, median, std, skewness, kurtosis)
- SSA findings (dominant components, seasonality flag)
- Residual diagnostics (Durbin-Watson result)
- Change point locations
The narrative is editable in the app. Add your own observations, domain context, or caveats before exporting.
The finalised report can be exported as a plain text file for documentation or sharing.
Use Save to Knowledge Base in the Observation Logging step to persist observations to a KMDS OWL/RDF file.
When entering the save location:
- The app validates that the directory exists.
- The app validates that the current user has sufficient write privileges.
- If validation fails, a clear error is shown and the Save button is disabled.
- The app blocks save attempts to invalid or non-writable locations.
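The two save-location checks map onto standard-library calls. This sketch is an assumption about how such validation could look, not tseda's implementation: `validate_save_location` is a hypothetical name, it requires write and execute (search) permission on the directory so a new file can be created there, and it also rejects an existing file that is read-only. As the Python documentation notes, `os.access` can report success for operations that still fail, notably on network filesystems.

```python
import os
import tempfile
from pathlib import Path


def validate_save_location(directory, filename):
    """Return (ok, message) for a knowledge-base save target."""
    folder = Path(directory).expanduser()
    if not folder.is_dir():
        return False, f"Directory does not exist: {folder}"
    # Creating a file in a directory needs write and search (execute) permission.
    if not os.access(folder, os.W_OK | os.X_OK):
        return False, f"No write permission for: {folder}"
    target = folder / filename
    if target.exists() and not os.access(target, os.W_OK):
        return False, f"Existing file is not writable: {target}"
    return True, f"OK: {target}"


with tempfile.TemporaryDirectory() as folder:
    print(validate_save_location(folder, "observations.xml"))
    print(validate_save_location(os.path.join(folder, "missing"), "observations.xml"))
```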
The selected directory and file name are stored in app state during the session and are reset when you click Clear Uploaded File in Step 1.
tseda uses an externalized configuration file to manage thresholds, parameters, and limits across the application. This allows you to customize algorithm behavior without modifying code.
The configuration file is stored at: src/tseda/config/tseda_config.yaml
When the application starts, it automatically loads this configuration into memory. You can modify any settings in this file to customize the application behavior.
```yaml
file_upload:
  max_file_lines: 2000  # Maximum rows allowed in uploaded CSV/Excel files
```

- Default: 2000 rows
- Suggested range: 1,000–5,000
- Purpose: Prevents memory exhaustion from very large files and maintains UI responsiveness.
```yaml
window_selection:
  hourly: 24     # One diurnal cycle
  daily: 5       # One business week
  weekly: 4      # Approximately one month
  monthly: 12    # One full annual cycle
  quarterly: 4   # One full annual cycle (4 quarters)
```

- Default values: As listed above.
- Purpose: Provides initial SSA window sizes based on the detected sampling frequency. Values represent one expected seasonal cycle at each cadence.
```yaml
grouping_heuristic:
  pool_selection_method: "kneedle"   # "kneedle" (default) or legacy "variance_threshold"
  variance_threshold: 0.10           # Used only in legacy variance_threshold mode
  pair_similarity_tolerance: 0.05    # Maximum allowed difference (as fraction) for paired eigenvalues
  kneedle_min_distance: 0.03         # Minimum normalized knee distance to accept elbow
  min_signal_components: 1           # Minimum initial signal pool size
  min_noise_components: 2            # Minimum initial noise pool size
```

Pool selection method:
- Default: `kneedle`
- Options: `kneedle`, `variance_threshold`
- Purpose: Defines how the initial signal pool is selected before DW-guided expansion.

Kneedle minimum distance:
- Default: `0.03`
- Range: `0.01`–`0.08`
- Purpose: Controls how strong the elbow must be to accept a detected noise floor.

Minimum signal components:
- Default: `1`
- Purpose: Prevents empty initial signal pools on flat spectra.

Minimum noise components:
- Default: `2`
- Purpose: Preserves room for DW-based reassignment and residual checks.

Variance threshold (legacy mode):
- Default: `0.10` (10%)
- Range: `0.05`–`0.20`
- Purpose: Used only when `pool_selection_method` is set to `variance_threshold`.

Pair similarity tolerance:
- Default: `0.05` (5%)
- Range: `0.02`–`0.10`
- Purpose: When two adjacent eigenvalues differ by at most this fraction of the larger one, they are paired and assigned to seasonality.
```yaml
noise_validation:
  dw_low: 1.5    # Minimum acceptable Durbin-Watson statistic
  dw_high: 2.5   # Maximum acceptable Durbin-Watson statistic
```

- Default range: [1.5, 2.5]
- Suggested range: [1.4, 2.6]
- Purpose: The Durbin-Watson (DW) statistic measures autocorrelation in the noise residual. A value near 2.0 indicates uncorrelated noise; values outside [`dw_low`, `dw_high`] suggest residual autocorrelation that should be addressed by adjusting the component grouping.
```yaml
window_refinement:
  min_tail_spread: 0.10  # The smallest eigenvalue's share of total variance must fall below this value
```

- Default: 0.10 (10%)
- Suggested range: 0.05–0.15
- Purpose: After initial window selection, the algorithm checks whether the smallest eigenvalue explains too much variance (a share of at least `min_tail_spread`). If it does, the window is doubled and SSA is recomputed until the share drops below `min_tail_spread` or the next doubling would exceed half the series length. This ensures the eigenvalue spectrum has meaningful spread.
```yaml
seasonality_heuristic:
  leading_eigenvalues_to_check: 6  # Number of top eigenvalues inspected for paired structure
```

- Default: 6
- Suggested range: 4–12
- Purpose: When deciding whether to flag the series as seasonal, the algorithm examines the top N eigenvalues for paired (near-equal) structure. Higher values inspect more components.
```yaml
periodicity:
  fmin: 0.1              # Minimum search frequency (cycles per sample)
  fmax: 2.0              # Maximum search frequency (cycles per sample)
  num_frequencies: 1000  # Number of discrete frequency points to evaluate
```

fmin / fmax:
- Default: `0.1`–`2.0`
- Purpose: Defines the frequency range for Lomb-Scargle periodogram analysis.

num_frequencies:
- Default: `1000`
- Suggested range: `500`–`2000`
- Purpose: Higher values give finer frequency resolution but increase computation cost.
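A minimal Lomb-Scargle period search over this grid, using SciPy's `scipy.signal.lombscargle`, is sketched below. SciPy expects angular frequencies (radians per sample), so the sketch passes `fmin`/`fmax` through as angular values; that is an assumption about units. If the values are genuinely cycles per sample, multiply them by 2π first, and note that for regularly sampled data any frequency above 0.5 cycles per sample aliases onto a lower one. `dominant_period` is a hypothetical name.

```python
import numpy as np
from scipy.signal import lombscargle


def dominant_period(values, fmin=0.1, fmax=2.0, num_frequencies=1000):
    """Period (in samples) at the Lomb-Scargle peak over an angular-frequency grid."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y), dtype=float)  # regular sampling: use the sample index as time
    omegas = np.linspace(fmin, fmax, num_frequencies)  # radians per sample (assumed)
    power = lombscargle(t, y - y.mean(), omegas)       # centre y before the periodogram
    return 2 * np.pi / omegas[np.argmax(power)]


series = np.sin(2 * np.pi * np.arange(240) / 12)  # period of 12 samples
print(round(float(dominant_period(series)), 2))
```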
```yaml
loess:
  min_fraction: 0.05      # Minimum smoothing fraction (share of data points used in each local regression)
  max_fraction: 0.5       # Maximum smoothing fraction
  default_fraction: 0.05  # Default value shown in the UI slider
  step: 0.05              # Slider increment
```

- Default values: As listed above.
- Purpose: The LOESS smoother fits a local regression in a sliding window. The fraction parameter controls the width of each window as a share of the series; lower values produce noisier but more detailed curves, while higher values produce smoother curves that may hide detail.
```yaml
change_point_detection:
  model: "rbf"              # Cost model for PELT algorithm ("rbf", "l2", "linear")
  penalty_multiplier: 2.0   # Multiplier for BIC-style penalty = penalty_multiplier * log(n)
```

Model:
- Default: `"rbf"`
- Options: `"rbf"`, `"l2"`, `"linear"`
- Purpose: Defines the cost function used by the PELT algorithm. `"rbf"` (radial basis function) is robust and recommended for most time series.

Penalty multiplier:
- Default: `2.0` (yields a BIC-style penalty of 2 × log(n))
- Suggested range: `1.5`–`2.5`
- Purpose: Higher values discourage finding many small segments (conservative), while lower values permit more breakpoints (liberal).
```yaml
suitability_check:
  top_k_eigenvectors: 5          # Number of leading eigenvectors to sum for the concentration test
  min_explained_variance: 0.40   # Minimum fraction of total variance the top-k must explain
```

top_k_eigenvectors:
- Default: `5`
- Suggested range: `3`–`8`
- Purpose: Defines how many leading eigenvectors are summed for the suitability check. A value of 5 is a reasonable default for most series: it covers one trend component and up to two seasonal pairs. For series with complex multi-period seasonality (e.g. hourly data with daily and weekly cycles), you may want to raise this to 7 or 8.

min_explained_variance:
- Default: `0.40` (40%)
- Suggested range: `0.30`–`0.55`
- Purpose: The minimum fraction of total variance the top-k eigenvectors must collectively explain. If the actual ratio falls below this threshold, the series is deemed unsuitable for SSA and the Apply Grouping button is disabled. Raising this threshold makes the check stricter (only highly structured series pass); lowering it is more permissive and appropriate if your series has moderate but real structure diluted by heavy noise.
Example: relaxing the check for a noisy-but-structured series:

```yaml
suitability_check:
  top_k_eigenvectors: 6
  min_explained_variance: 0.30  # Allow series where the top 6 explain at least 30%
```

To change a setting:

- Open `src/tseda/config/tseda_config.yaml` in a text editor.
- Modify any values in the appropriate section.
- Save the file.
- Restart the `tseda` application. Configuration is loaded at startup.
If you find that the automatic component grouping rarely satisfies the DW criterion on your datasets, try relaxing the bounds:
```yaml
noise_validation:
  dw_low: 1.4   # Relaxed from 1.5
  dw_high: 2.6  # Relaxed from 2.5
```

This will allow groupings with slightly more residual autocorrelation to be accepted.
tseda can persist your exploratory observations to an OWL/RDF knowledge base using the kmds and owlready2 libraries. This is designed for teams that maintain a structured, machine-readable log of analytical findings.
- Observations are appended to a `.xml` OWL file.
- Existing observations can be deleted from the UI.
- The knowledge base can accumulate findings across multiple analysis sessions.
- Save-location validation prevents writing to non-existent or privilege-restricted directories and prompts you to choose a valid location.
| Constraint | Detail |
|---|---|
| Maximum series length | 2,000 rows |
| Sampling cadence | Must be regular and inferable; gaps not supported |
| Input format | CSV or Excel with timestamp + numeric value columns only |
| Frequency range | Hourly cadence or lower (sub-hourly not supported) |
| Missing values | Not supported |
