Proposal: Import retroactively added transactions from CSV files #2464

@lfos

This is a concrete proposal to fix #1943. Without knowing much about how the import functionality is currently implemented (and knowing little about Haskell in general), I'd expect this to be relatively straightforward to add -- even though the description below is a bit lengthy. I grouped the proposed changes into three logical steps below but it might be easier to implement them all at once.

I'm looking forward to feedback!

Goals

  1. Fix import of CSV files that are only "mostly append-only", i.e., where existing entries are immutable and new entries can be inserted anywhere within the last X days.
  2. Maintain backwards-compatibility and keep the import framework as simple as possible.

Why? For some types of accounts (e.g., credit cards), transactions are commonly posted with a delay of a few days. The delay may vary for different transactions within the same account.

Change 1 - Keep more context in the .latest state

A new keep-extra-state rule is added to the set of allowed CSV rules. It controls how many extra days are stored in the state file. The default is 0, i.e., the current behavior. The resulting file with three extra days might look as follows:

2025-09-23
2025-09-23
2025-09-22
2025-09-21
2025-09-21
2025-09-21
2025-09-20
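As a rough illustration of the retention rule, here is a minimal Python sketch (not hledger code). The function name `build_state` and the reading of "extra days" as calendar days before the latest date are assumptions made for this example.

```python
from datetime import date, timedelta

def build_state(dates, keep_extra_days=0):
    """Return state-file lines, newest first, for the imported dates.

    Assumption: "extra days" means calendar days before the latest date.
    With keep_extra_days=0 only the latest date is kept, once per
    transaction on that date.
    """
    if not dates:
        return []
    latest = max(dates)
    cutoff = latest - timedelta(days=keep_extra_days)
    kept = sorted((d for d in dates if d >= cutoff), reverse=True)
    return [d.isoformat() for d in kept]

if __name__ == "__main__":
    imported = [date(2025, 9, d) for d in (19, 20, 21, 21, 21, 22, 23, 23)]
    print("\n".join(build_state(imported, keep_extra_days=3)))
```

With the sample dates, three extra days reproduce the seven-line file shown above; the record from 2025-09-19 falls outside the window and is dropped.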

When importing a CSV file:

  1. Iterate over CSV transactions and state file records in parallel to ensure they match. It may be necessary to first skip some CSV transactions or some state file records, depending on which side's earliest entry lies further in the past.
  2. Error out if there's a mismatch and provide details in the error message (e.g., unexpected new transaction on 2025-09-21 in bank.csv).
  3. Otherwise, once all state file entries have a match, continue importing as usual and update the state file accordingly.
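The three steps above could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual hledger implementation: `StateMismatch` and `first_new_index` are hypothetical names, dates are ISO strings (which compare correctly as text), the CSV is in ascending date order, and the state file is newest first as in the example.

```python
class StateMismatch(Exception):
    """Raised when the CSV does not agree with the stored state."""

def first_new_index(csv_dates, state_dates, source="bank.csv"):
    """Return the index of the first CSV record that should be imported.

    csv_dates:   ISO date strings, oldest first.
    state_dates: ISO date strings, newest first (as stored in the file).
    """
    state = list(reversed(state_dates))  # oldest first, like the CSV
    i = j = 0
    # Step 1: skip whichever prefix reaches further into the past.
    if csv_dates and state:
        while i < len(csv_dates) and csv_dates[i] < state[0]:
            i += 1
        while j < len(state) and state[j] < csv_dates[0]:
            j += 1
    # Steps 1-2: every remaining state record must match the next CSV record.
    while j < len(state):
        if i >= len(csv_dates) or csv_dates[i] > state[j]:
            raise StateMismatch(f"missing transaction on {state[j]} in {source}")
        if csv_dates[i] < state[j]:
            raise StateMismatch(
                f"unexpected new transaction on {csv_dates[i]} in {source}")
        i += 1
        j += 1
    # Step 3: everything from here on is new.
    return i
```

Because only dates are compared, an extra record on the most recent state date simply lands after the matched block and is imported as new, which is the limitation described next.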

Why? This change results in detection (but not import) of most retroactively added entries. Entries retroactively inserted on the most recent date in the state file are not yet detected. They still result in duplicate/incorrect imports.

Alternatives:

  • Keep all imported transaction dates, and make the CSV rule an on/off toggle.
  • Support both "number of extra days" (keep-extra-state 3d) and "number of extra transactions" (keep-extra-state 50) in the syntax.
  • Store state dates in ascending order in the state file instead.

Change 2 - Add optional transaction identifiers to the state file

A new state-descriptor hledger field is added. It can be set in a rules file to store additional information with each transaction. What to store is up to the user, and the best choice depends on the information available in the CSV file. E.g., if the bank provides a stable and unique transaction ID, state-descriptor %transaction_id may be a good approach. For other banks, state-descriptor %description %amount may be good enough.

It is not important that the descriptor is unique; what matters is that two entries on the same date with the same descriptor are interchangeable when importing. For example, if exactly the same charge is made twice on a day, there will be two identical lines in the state file, and that's okay.

The field is then stored in the state file, together with dates, e.g.,

2025-09-23 88ce31a
2025-09-23 0882689
2025-09-22 dc460da
2025-09-21 4ad72c4
2025-09-21 82231e2
2025-09-21 8e688e0
2025-09-20 1f2778a

When importing a CSV file, the pattern matching described in the previous section now also compares the descriptors. The error messages can be more detailed, e.g., referring to the exact transaction with a mismatch instead of only the date.
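To make the extended record format concrete, here is a small Python sketch of reading and writing such lines. The on-disk layout (date, one space, then the descriptor) is inferred from the example above, and the helper names are hypothetical. Splitting on the first space only keeps descriptors with embedded spaces (e.g. from %description %amount) intact, and a line without a descriptor still parses, so existing state files would remain readable.

```python
def parse_state_line(line):
    """Split a state-file line into (iso_date, descriptor).

    The descriptor is everything after the first space; it is the empty
    string for old-style lines that contain only a date.
    """
    date_part, _, descriptor = line.strip().partition(" ")
    return (date_part, descriptor.strip())

def format_state_line(record):
    """Inverse of parse_state_line; omits the separator if there is no descriptor."""
    iso_date, descriptor = record
    return f"{iso_date} {descriptor}" if descriptor else iso_date

if __name__ == "__main__":
    for raw in ["2025-09-23 88ce31a", "2025-09-22 COFFEE SHOP -4.50", "2025-09-20"]:
        print(parse_state_line(raw))
```

Comparing the resulting (date, descriptor) pairs instead of bare dates is all the matching step needs in order to report the exact mismatching transaction.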

Why? Further improved validation of "append only" nature of CSV files. Detection of violations in all cases, including entries retroactively inserted on the most recent date in the state file.

Change 3 - Import retroactively inserted entries

Instead of matching the state file against the CSV file as a consecutive block, perform "scattered" pattern matching:

  • Iterate over the CSV records and state file in parallel (skipping a prefix as needed) but allow additional non-matching entries in the CSV file. Those non-matching entries are exactly those that end up getting imported.
  • The only requirement is that if a new date is reached in the CSV file, all state file records for the same date must have been processed. Otherwise, an error is displayed (e.g., unexpected transaction in bank.csv: <details>).
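One possible reading of the scattered matching, as a Python sketch rather than a definitive implementation: `StateMismatch` and `scattered_import` are hypothetical names, records are (ISO date, descriptor) pairs, and both lists are assumed to be oldest first (so a newest-first state file would be reversed before the call).

```python
class StateMismatch(Exception):
    """Raised when an already imported record is missing or changed."""

def scattered_import(csv_records, state_records, source="bank.csv"):
    """Return the CSV records that still need to be imported.

    Both arguments are lists of (iso_date, descriptor), oldest first.
    """
    i = j = 0
    # Skip whichever prefix reaches further into the past.
    if csv_records and state_records:
        while i < len(csv_records) and csv_records[i][0] < state_records[0][0]:
            i += 1
        while j < len(state_records) and state_records[j][0] < csv_records[0][0]:
            j += 1
    to_import = []
    for rec in csv_records[i:]:
        pending = state_records[j] if j < len(state_records) else None
        # Reaching a later date while an earlier state record is still
        # unmatched means that record was changed or deleted.
        if pending is not None and pending[0] < rec[0]:
            raise StateMismatch(f"unexpected transaction in {source}: {rec}")
        if rec == pending:
            j += 1                   # already imported earlier
        else:
            to_import.append(rec)    # non-matching entry: import it
    if j < len(state_records):
        raise StateMismatch(
            f"transaction missing from {source}: {state_records[j]}")
    return to_import
```

With empty descriptors, all records on a date compare equal, so the state records consume the first CSV records of that date and any surplus at the end of the day is imported.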

Why? Retroactively inserted transactions are now imported properly. Moreover, any unexpected changes/deletions in already imported transactions are detected.

Remark: When no transaction identifiers (as proposed in the previous section) are used, this logic automatically behaves as "append at the end of each day". This is consistent with current behavior.
