Commit 91b5eab

Update README.rst to document the algorithms used
fixes #139 by documenting the machine learning algorithms used in smart_importer
1 parent ef0b279 commit 91b5eab

File tree: 1 file changed (+36, −0)
README.rst

Lines changed: 36 additions & 0 deletions
@@ -234,3 +234,39 @@ In your importer code, you can then pass `jieba` to be used as tokenizer:
tokenizer = lambda s: list(jieba.cut(s))

predictor = PredictPostings(string_tokenizer=tokenizer)
Privacy
-------
smart_importer uses machine learning (artificial intelligence, AI) algorithms in an ethical, privacy-conscious way:
All data processing happens on the local machine; no data is sent to or retrieved from external servers or the cloud.
All the code, including the machine learning implementation, is open source.
Model:
The machine learning model used in smart_importer is a classification model.
Its goal is to predict transaction attributes, such as postings/accounts and payee names,
in order to reduce the manual effort of importing transactions.
The model is implemented with the open-source `scikit-learn <https://scikit-learn.org/>`__ library,
specifically scikit-learn's `SVC (support vector machine) <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`__ implementation.
Training data:
The model is trained on historical transactions from your Beancount ledger.
This training happens on the fly when the import process starts, by reading ``existing_entries`` from the importer.
The trained model is then used locally on your machine during the import process, as follows.
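To illustrate the idea, the following sketch fits a scikit-learn ``SVC`` on transaction narrations to predict the account of the missing posting. This is a simplified stand-in, not smart_importer's actual implementation: the narrations, accounts, and bag-of-words feature extraction are invented assumptions for the example.

```python
# Sketch only: a simplified stand-in for smart_importer's internal pipeline.
# The narrations and accounts below are invented example data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# "Historical transactions": narrations with accounts already assigned in the ledger.
narrations = [
    "SUPERMARKET FOO 123",
    "SUPERMARKET FOO 456",
    "GAS STATION BAR",
    "GAS STATION BAZ",
]
accounts = [
    "Expenses:Food:Groceries",
    "Expenses:Food:Groceries",
    "Expenses:Car:Fuel",
    "Expenses:Car:Fuel",
]

# Bag-of-words text features feeding a support vector classifier.
model = make_pipeline(CountVectorizer(), SVC())
model.fit(narrations, accounts)

# Predict the account for a newly imported transaction.
print(model.predict(["SUPERMARKET FOO 789"])[0])
```

Everything here runs locally; the "training data" is nothing more than the entries already present in the ledger.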
Input:
The input data are the transactions to be imported.
Typically, these are transactions with a single posting: one posting (e.g., the bank account) is known, while the counter posting is missing.
Output:
The output data are transactions with a predicted second posting and/or other predicted transaction attributes.
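The input/output relationship can be pictured as follows. The dictionary-based transaction representation is an assumption made for illustration only (smart_importer itself operates on Beancount entries), and the tiny training set is invented:

```python
# Illustration only: a plain dict stands in for a Beancount transaction.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

model = make_pipeline(CountVectorizer(), SVC())
model.fit(
    ["SUPERMARKET FOO", "GAS STATION BAR"],
    ["Expenses:Food:Groceries", "Expenses:Car:Fuel"],
)

# Input: a transaction with a single (known) posting.
incomplete = {
    "narration": "SUPERMARKET FOO 789",
    "postings": [("Assets:Bank:Checking", "-42.10 EUR")],
}

# Output: the same transaction with a predicted second posting.
predicted_account = model.predict([incomplete["narration"]])[0]
complete = dict(incomplete)
complete["postings"] = incomplete["postings"] + [(predicted_account, None)]
print(complete["postings"])
```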
Accuracy and Feedback Loops:
The effectiveness of the model depends on the volume and diversity of your historical data; small or homogeneous datasets may result in poor predictions.
Predictions are made automatically when importing new transactions, but users should always review them for accuracy before committing them to the ledger.
Users can manually adjust predictions (e.g., change the payee or account) and save the corrected transactions to their ledger.
These corrections then become training data for future predictions, allowing accuracy to improve over time.
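This feedback loop can be sketched as follows (again a simplified stand-in with invented data, not the actual implementation): a corrected transaction simply becomes part of the training data the next time an import runs.

```python
# Sketch of the correction/retraining loop with invented example data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

history = [
    ("SUPERMARKET FOO", "Expenses:Food:Groceries"),
    ("GAS STATION BAR", "Expenses:Car:Fuel"),
]

def train(data):
    """Fit a fresh classifier on the current ledger history."""
    model = make_pipeline(CountVectorizer(), SVC())
    model.fit([n for n, _ in data], [a for _, a in data])
    return model

model = train(history)

# The model has never seen a pharmacy transaction, so its prediction
# may be wrong; the user corrects the account before saving the entry.
history.append(("PHARMACY QUX", "Expenses:Health:Pharmacy"))

# On the next import, training includes the correction, so similar
# transactions are now classified correctly.
model = train(history)
print(model.predict(["PHARMACY QUX"])[0])
```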
The smart_importer project is fully open source, meaning you can inspect and modify the code as needed.
