Commit 4dbd6e0

chg ! bulk records update (#43)
1 parent bce9bc0 commit 4dbd6e0

13 files changed, +394 -155 lines changed


docs/src/.pages

Lines changed: 1 addition & 0 deletions
@@ -6,4 +6,5 @@ nav:
 - dev_helpers.md
 - flows
 - Interfaces: interfaces.md
+- Data cleaning and updates: data_cleaning_updates.md
 - Import Data: import_data

docs/src/data_cleaning_updates.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# Data cleaning and updates.
+
+There are several options to clean data for households or individuals in the collector interface.
+
+## Bulk Actions
+
+The system allows users to perform mass operations on multiple records at once. These actions help streamline data management by applying updates to multiple `Household` or `Individual` entities in a single process.
+
+### In-System Bulk Data Modifications
+
+To perform bulk updates, go to:
+```
+Home › Households | Individuals
+```
+Select the records and apply the desired action:
+
+- **Mass update record fields**
+
+Allows the user to define values and apply them using functions like:
+
+```
+set, set null, upper, lower, toggle, ...
+```
+
+- **Update fields using RegEx**
+
+Allows the user to define a regular expression and a substitution pattern for any field and apply it.
+
+---
+
+### Bulk Data Export, Offline Editing, and Reimport
+
+Users can modify data in bulk by exporting records to a `.xlsx` file, making necessary changes outside the system, and then importing the updated file back.
+
+The `.xlsx` file contains two protected columns: `id` and `version`. **These columns should not be modified during regular editing**, as they are essential for correctly identifying and updating records.
+
+Additionally, the system includes a **concurrency control mechanism** to prevent accidental overwrites of updated data.
+
+#### Process Overview
+
+1. **Export Data**
+
+To begin, navigate to:
+```
+Home › Households | Individuals
+```
+Select the records and apply the action **"Export records as .xlsx for bulk updates"**.
+Then, choose the columns to update and press **[Export]**.
+The export process will be scheduled as an asynchronous task.
+
+2. **Edit the Exported File**
+
+Once the `.xlsx` file is generated, update the necessary values while ensuring that the `id` and `version` columns remain unchanged.
+
+3. **Import the Updated File Back into the System**
+
+After making the necessary modifications, navigate to:
+```
+Home › Program
+```
+Press **[Update Records]**, select the file, choose the target (`Household | Individual`), and optionally provide a description.
+Finally, press **[Import]**. This action will schedule an asynchronous task to process the updates.
+
+#### **Preventing Conflicts with `CONCURRENCY_GUARD`**
+
+To ensure data consistency, the system provides the **`CONCURRENCY_GUARD`** parameter.
+When enabled, it prevents updates if a record has changed after the export. This ensures that newer modifications made by other users or processes are not accidentally overwritten during import.
+If a conflict is detected, the system will reject the update, requiring the user to re-export the latest data before making further changes.
+
+---
+
+## **Single-Entity Operations**
+
+Unlike bulk actions, these operations apply to a single `Household` or `Individual` entry. To access these actions, go to:
+```
+Home › Households | Individuals
+```
+Select a specific `Household` or `Individual`, then use the available buttons:
+
+### Validate
+Ensures that the record meets the required data standards.
+If any issues are detected, the affected fields turn red, and an explanatory error message appears.
+
+### View Raw Data
+
+Allows the user to see the raw, unprocessed data of the selected record.
+This can be useful for debugging or reviewing the exact stored values before applying further modifications.
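The new page names the in-system operations but not their exact semantics. As a rough, stdlib-only illustration of what a mass update and a RegEx update do to each selected record, the sketch below treats records as plain dicts of flex-field values. The helpers `apply_mass_update` and `apply_regex_update`, the `OPERATIONS` table, and the meaning given to each operation (for example, `toggle` as boolean negation) are assumptions for illustration; they are not the project's `mass_update_impl` or `regex_update_impl`.

```python
import re
from typing import Any, Callable

# Hypothetical operation table: names mirror the docs, semantics are assumed.
OPERATIONS: dict[str, Callable[[Any, Any], Any]] = {
    "set": lambda current, arg: arg,
    "set null": lambda current, arg: None,
    "upper": lambda current, arg: current.upper() if isinstance(current, str) else current,
    "lower": lambda current, arg: current.lower() if isinstance(current, str) else current,
    "toggle": lambda current, arg: not current,
}


def apply_mass_update(records: list[dict[str, Any]], field: str, op: str, arg: Any = None) -> None:
    """Apply one named operation to `field` on every record, in place."""
    func = OPERATIONS[op]
    for record in records:
        record[field] = func(record.get(field), arg)


def apply_regex_update(records: list[dict[str, Any]], field: str, pattern: str, repl: str) -> None:
    """Replace every match of `pattern` with `repl` in string values of `field`, in place."""
    compiled = re.compile(pattern)
    for record in records:
        value = record.get(field)
        if isinstance(value, str):
            record[field] = compiled.sub(repl, value)


households = [{"name": "  rossi ", "active": True}, {"name": "bianchi", "active": False}]
apply_regex_update(households, "name", r"^\s+|\s+$", "")  # trim surrounding whitespace
apply_mass_update(households, "name", "upper")
apply_mass_update(households, "active", "toggle")
print(households)  # [{'name': 'ROSSI', 'active': False}, {'name': 'BIANCHI', 'active': True}]
```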

docs/src/index.md

Lines changed: 1 addition & 1 deletion
@@ -14,6 +14,6 @@ management process.
 
 - [Import data](import_data/index.md) from [Kobo](import_data/kobo.md) / [Aurora](import_data/aurora.md) / [XLS](import_data/xls.md) (RDI format)
 - Data validation
-- Data cleaning and updates
+- [Data cleaning and updates](data_cleaning_updates.md)
 - Push data to HOPE
 - Export/Amend/Import process

src/country_workspace/config/fragments/constance.py

Lines changed: 8 additions & 0 deletions
@@ -50,6 +50,13 @@
     "KOBO_API_URL": ("", "Kobo API Server address", str),
     "CACHE_TIMEOUT": (86400, "Cache Redis TTL", int),
     "CACHE_BY_VERSION": (False, "Invalidate Cache on CW version change", bool),
+    "CONCURRENCY_GUARD": (
+        True,
+        "Prevent updates if data has changed after export. When enabled, the system will reject updates to records"
+        " that were modified after they were exported. This helps maintain data consistency and prevents accidental"
+        " overwrites of newer information.",
+        bool,
+    ),
 }
 
 CONSTANCE_CONFIG_FIELDSETS = {
@@ -63,6 +70,7 @@
         "KOBO_API_TOKEN",
         "KOBO_API_URL",
     ),
+    "Data consistency": ("CONCURRENCY_GUARD",),
 }
 
 # Mapping of config keys to masked default display values in the Constance admin UI.
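`CONCURRENCY_GUARD` follows the usual optimistic-locking pattern: the export writes each record's current `version` next to its `id`, and on import a row is applied only if the stored version still equals the exported one. The sketch below isolates that decision. It assumes `version` is an integer that changes whenever the record is saved (this commit does not show how the version is maintained); `Record` and `should_apply` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    version: int  # assumed to change whenever the record is saved


def should_apply(record: Record, exported_version: int, guard_enabled: bool) -> bool:
    """Decide whether an imported row may overwrite the stored record."""
    if not guard_enabled:
        return True  # guard off: the imported value always wins
    return record.version == exported_version


stored = Record(id=7, version=3)
exported_version = stored.version  # value written to the .xlsx "version" column
stored.version += 1  # another user edits the record after the export

print(should_apply(stored, exported_version, guard_enabled=True))   # False: rejected
print(should_apply(stored, exported_version, guard_enabled=False))  # True: overwritten
```

With the guard disabled the check degrades to "last write wins", which is exactly the accidental overwrite the setting's help text warns about.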

src/country_workspace/workspaces/admin/cleaners/actions.py

Lines changed: 17 additions & 17 deletions
@@ -1,12 +1,13 @@
 from typing import TYPE_CHECKING
 
 from django.contrib import admin, messages
-from django.http import HttpRequest, HttpResponse, HttpResponseRedirect
-from django.shortcuts import render
+from django.http import HttpRequest, HttpResponse
+from django.shortcuts import render, redirect
 from django.utils.translation import gettext as _
 from strategy_field.utils import fqn
 
-from .bulk_update import BulkUpdateForm, bulk_update_export_template
+from country_workspace.workspaces.admin.forms import BulkUpdateExportForm
+from .bulk_update import bulk_update_export_template
 from .calculate_checksum import calculate_checksum_impl
 from .mass_update import MassUpdateForm, mass_update_impl
 from .regex import RegexUpdateForm, regex_update_impl
@@ -29,7 +30,7 @@ def validate_records(
 ) -> None:
     opts = queryset.model._meta
     job = AsyncJob.objects.create(
-        description="Validate Queryset records for updates",
+        description=validate_records.short_description,
         type=AsyncJob.JobType.ACTION,
         owner=state.request.user,
         action=fqn(validate_queryset),
@@ -47,7 +48,7 @@ def mass_update(
     request: HttpRequest,
     queryset: "QuerySet[Beneficiary]",
 ) -> "HttpResponse":
-    ctx = model_admin.get_common_context(request, title=_("Mass update"))
+    ctx = model_admin.get_common_context(request, title=_(mass_update.short_description))
     ctx["checker"] = checker = model_admin.get_checker(request)
     ctx["preserved_filters"] = model_admin.get_preserved_filters(request)
     form = MassUpdateForm(request.POST, checker=checker)
@@ -57,7 +58,7 @@ def mass_update(
         opts = queryset.model._meta
 
         job = AsyncJob.objects.create(
-            description="Mass update record fields",
+            description=mass_update.short_description,
             type=AsyncJob.JobType.ACTION,
             owner=state.request.user,
             action=fqn(mass_update_impl),
@@ -82,7 +83,7 @@ def regex_update(
     request: "HttpRequest",
     queryset: "QuerySet[Beneficiary]",
 ) -> HttpResponse:
-    ctx = model_admin.get_common_context(request, title=_("Regex update"))
+    ctx = model_admin.get_common_context(request, title=_(regex_update.short_description))
     ctx["checker"] = checker = model_admin.get_checker(request)
     ctx["queryset"] = queryset
     ctx["opts"] = model_admin.model._meta
@@ -96,9 +97,8 @@ def regex_update(
         form = RegexUpdateForm(request.POST, checker=checker)
         if form.is_valid():
             opts = queryset.model._meta
-
             job = AsyncJob.objects.create(
-                description="Mass update record fields",
+                description=regex_update.short_description,
                 type=AsyncJob.JobType.ACTION,
                 owner=state.request.user,
                 action=fqn(regex_update_impl),
@@ -125,22 +125,22 @@ def regex_update(
     return render(request, "workspace/actions/regex.html", ctx)
 
 
-@admin.action(description="Create XLS template for bulk updates", permissions=["export"])
+@admin.action(description="Export records as .xlsx for bulk updates", permissions=["export"])
 def bulk_update_export(
     model_admin: "BeneficiaryBaseAdmin",
     request: HttpRequest,
     queryset: "QuerySet[Beneficiary]",
 ) -> HttpResponse:
-    ctx = model_admin.get_common_context(request, title=_("Export data for bulk update"))
+    ctx = model_admin.get_common_context(request, title=_(bulk_update_export.short_description))
     ctx["checker"] = checker = model_admin.get_checker(request)
     ctx["preserved_filters"] = model_admin.get_preserved_filters(request)
-    form = BulkUpdateForm(request.POST, checker=checker)
+    form = BulkUpdateExportForm(request.POST, checker=checker)
     ctx["form"] = form
     if "_export" in request.POST and form.is_valid():
-        columns = {"fields": ["id"] + sorted(form.cleaned_data["fields"])}
+        columns = ["id", "version"] + sorted(form.cleaned_data["fields"])
         opts = queryset.model._meta
         job = AsyncJob.objects.create(
-            description="Mass update record fields",
+            description=bulk_update_export.short_description,
             type=AsyncJob.JobType.TASK,
             owner=state.request.user,
             action=fqn(bulk_update_export_template),
@@ -153,7 +153,7 @@ def bulk_update_export(
         )
         job.queue()
         model_admin.message_user(request, "Task scheduled", messages.SUCCESS)
-        return HttpResponseRedirect(".")
+        return redirect(".")
 
     return render(request, "workspace/actions/bulk_update_export.html", ctx)
 
@@ -166,7 +166,7 @@ def calculate_checksum(
 ) -> HttpResponse:
     opts = queryset.model._meta
     job = AsyncJob.objects.create(
-        description="Calculate record checksum",
+        description=calculate_checksum.short_description,
         type=AsyncJob.JobType.ACTION,
         owner=state.request.user,
         action=fqn(calculate_checksum_impl),
@@ -178,4 +178,4 @@ def calculate_checksum(
     )
     job.queue()
     model_admin.message_user(request, "Task scheduled", messages.SUCCESS)
-    return HttpResponseRedirect(".")
+    return redirect(".")
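Most of the `actions.py` changes replace hard-coded job descriptions and page titles with `<action>.short_description`. This relies on Django's `@admin.action(description=...)` setting `short_description` on the function object and returning the function itself, so by the time the action runs, the module-level name already carries the attribute. The runnable sketch below uses a hypothetical stand-in decorator with the same behaviour so that it does not need Django installed.

```python
from typing import Any, Callable, Optional


def action(*, description: Optional[str] = None, permissions: Optional[list[str]] = None) -> Callable:
    """Hypothetical stand-in for django.contrib.admin.action: tags and returns the same function."""

    def decorator(func: Callable) -> Callable:
        if permissions is not None:
            func.allowed_permissions = permissions
        if description is not None:
            func.short_description = description
        return func

    return decorator


@action(description="Export records as .xlsx for bulk updates", permissions=["export"])
def bulk_update_export(queryset: Any) -> dict[str, Any]:
    # The global name is looked up at call time, after decoration has set the attribute.
    return {"description": bulk_update_export.short_description, "records": len(queryset)}


job_kwargs = bulk_update_export([1, 2, 3])
print(job_kwargs)  # {'description': 'Export records as .xlsx for bulk updates', 'records': 3}
```

Keeping one source for the label stops the admin menu entry, the page title and the `AsyncJob.description` from drifting apart; before this commit, `regex_update` and `bulk_update_export` both created jobs labelled "Mass update record fields".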

src/country_workspace/workspaces/admin/cleaners/bulk_update.py

Lines changed: 49 additions & 36 deletions
@@ -1,35 +1,27 @@
 import io
 from io import BytesIO
-from typing import TYPE_CHECKING, Any
+from typing import TYPE_CHECKING, Any, Callable
 
 from django import forms
 from django.apps import apps
 from django.core.exceptions import ObjectDoesNotExist
 from django.core.files.storage import default_storage
 from xlsxwriter import Workbook
 
+from constance import config as constance_config
+
 from hope_flex_fields.models import DataChecker, FlexField
 from hope_flex_fields.xlsx import get_format_for_field
 from hope_smart_import.readers import open_xls
 
 from country_workspace.models import AsyncJob, Program
-from country_workspace.workspaces.admin.cleaners.base import BaseActionForm
 
 if TYPE_CHECKING:
     from django.db.models import QuerySet
 
     from country_workspace.types import Beneficiary
 
 
-class BulkUpdateForm(BaseActionForm):
-    fields = forms.MultipleChoiceField(choices=[], widget=forms.CheckboxSelectMultiple())
-
-    def __init__(self, *args: Any, **kwargs: Any) -> None:
-        checker: "DataChecker" = kwargs.pop("checker")
-        super().__init__(*args, **kwargs)
-        self.fields["fields"].choices = [(name, name) for name, fld in checker.get_form()().fields.items()]
-
-
 """
 # class Criteria:
 # pass
@@ -130,12 +122,11 @@ def create_xls_importer(
     queryset: "QuerySet[Beneficiary]",
     program: Program,
     columns: list[str],
-) -> [io.BytesIO, Workbook]:
+) -> tuple[io.BytesIO, Workbook]:
     out = BytesIO()
     dc: DataChecker = program.get_checker_for(queryset.model)
 
     workbook = Workbook(out, {"in_memory": True, "default_date_format": "yyyy/mm/dd"})
-
     header_format = workbook.add_format(
         {
             "bold": False,
@@ -147,14 +138,13 @@ def create_xls_importer(
             "indent": 1,
         },
     )
-
     header_format.set_bg_color("#DDDDDD")
     header_format.set_locked(True)
     header_format.set_align("center")
     header_format.set_bottom_color("black")
     worksheet = workbook.add_worksheet()
     worksheet.protect()
-    worksheet.unprotect_range("B1:ZZ999", None)
+    worksheet.unprotect_range("C1:ZZ999", None)
 
     for i, fld_name in enumerate(columns):
         fld = dc_get_field(dc, fld_name)
@@ -179,39 +169,62 @@ def create_xls_importer(
     return out, workbook
 
 
-# def bulk_update_export_template(queryset, program_pk: str, columns: list[str], filename: str) -> bytes:
 def bulk_update_export_template(job: AsyncJob) -> bytes:
     model = apps.get_model(job.config["model_name"])
     queryset = model.objects.filter(pk__in=job.config["pks"])
     filename = "bulk_update_export_template/%s/%s/%s.xlsx" % (job.program.pk, job.owner.pk, job.config["model_name"])
-    out, __ = create_xls_importer(queryset.all(), job.program, job.config["columns"])
+    out, __ = create_xls_importer(queryset, job.program, job.config["columns"])
     path = default_storage.save(filename, out)
     job.file = path
     job.save()
     return path
 
 
-def bulk_update_individual(job: AsyncJob) -> dict[str, Any]:
-    ret = {"not_found": []}
-    for e in open_xls(io.BytesIO(job.file.read()), start_at=0):
+def bulk_update_collection(job: AsyncJob, collection_getter: Callable[[int], Any]) -> dict[str, Any]:
+    result: dict[str, Any] = {"not_found": []}
+    version_check = constance_config.CONCURRENCY_GUARD
+    if version_check:
+        result["version_mismatch"] = []
+    errors = {}
+
+    file_data = job.file.read()
+    rows = open_xls(io.BytesIO(file_data), start_at=0)
+    for line_number, row in enumerate(rows, start=1):
         try:
-            _id = e.pop("id")
-            ind = job.program.individuals.get(id=_id)
-            ind.flex_fields.update(**e)
-            ind.save()
+            _id = int(row.pop("id"))
+            entity = collection_getter(_id)
+        except (KeyError, ValueError):
+            errors.setdefault("Invalid or missing 'id' on line", []).append(line_number)
+            continue
         except ObjectDoesNotExist:
-            ret["not_found"].append(_id)
-    return ret
+            result["not_found"].append(_id)
+            continue
+
+        if version_check:
+            try:
+                _version = int(row.pop("version"))
+            except (KeyError, ValueError):
+                errors.setdefault("Invalid or missing 'version' on line", []).append(line_number)
+                continue
+
+            if entity.version != _version:
+                result["version_mismatch"].append(_id)
+                continue
+
+        entity.flex_fields.update(**row)
+        entity.save()
+
+    if errors:
+        result["errors"] = errors
+
+    return result
+
+
+def bulk_update_individual(job: AsyncJob) -> dict[str, Any]:
+    program = job.program
+    return bulk_update_collection(job, lambda _id: program.individuals.get(id=_id))
 
 
 def bulk_update_household(job: AsyncJob) -> dict[str, Any]:
-    ret = {"not_found": []}
-    for e in open_xls(io.BytesIO(job.file.read()), start_at=0):
-        try:
-            _id = e.pop("id")
-            ind = job.program.households.get(id=_id)
-            ind.flex_fields.update(**e)
-            ind.save()
-        except ObjectDoesNotExist:
-            ret["not_found"].append(_id)
-    return ret
+    program = job.program
+    return bulk_update_collection(job, lambda _id: program.households.get(id=_id))
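The export now always writes `id` and `version` first, followed by the selected fields in sorted order (`columns = ["id", "version"] + sorted(...)` in `actions.py`). That is why `create_xls_importer` moves the editable range from `B1:ZZ999` to `C1:ZZ999`: with `worksheet.protect()` active, columns A and B stay locked. The stdlib sketch below reproduces that column mapping; `column_letter` and `export_layout` are hypothetical helpers, not project code.

```python
def column_letter(index: int) -> str:
    """Convert a 0-based column index to an Excel column name (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def export_layout(selected_fields: list[str]) -> dict[str, str]:
    """Map each exported column to its Excel letter, using the committed column order."""
    columns = ["id", "version"] + sorted(selected_fields)
    return {name: column_letter(i) for i, name in enumerate(columns)}


layout = export_layout(["given_name", "birth_date"])
print(layout)  # {'id': 'A', 'version': 'B', 'birth_date': 'C', 'given_name': 'D'}
```

Assuming the header occupies row 1, the unprotected range ends at row 999, so cells for records beyond the 998th would stay locked unless their own cell format unlocks them; large exports may be worth checking.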

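To see how `bulk_update_collection` classifies rows, here is a stdlib-only sketch that reproduces its control flow against in-memory data. Assumptions: rows arrive as dicts keyed by column header (as `open_xls` appears to yield them), `Entity` and `FakeDoesNotExist` are hypothetical stand-ins for the Django model and `ObjectDoesNotExist`, saving is assumed to bump `version`, and the guard flag is passed as an argument instead of being read from `constance_config`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable


class FakeDoesNotExist(Exception):
    """Stand-in for django.core.exceptions.ObjectDoesNotExist."""


@dataclass
class Entity:
    version: int
    flex_fields: dict[str, Any] = field(default_factory=dict)

    def save(self) -> None:
        self.version += 1  # assumption: every save bumps the version


def bulk_update_collection(
    rows: Iterable[dict[str, Any]], getter: Callable[[int], Entity], guard: bool
) -> dict[str, Any]:
    result: dict[str, Any] = {"not_found": []}
    if guard:
        result["version_mismatch"] = []
    errors: dict[str, list[int]] = {}
    for line_number, row in enumerate(rows, start=1):
        row = dict(row)  # copy so the caller's data is left untouched
        try:
            _id = int(row.pop("id"))
            entity = getter(_id)
        except (KeyError, ValueError):
            errors.setdefault("Invalid or missing 'id' on line", []).append(line_number)
            continue
        except FakeDoesNotExist:
            result["not_found"].append(_id)
            continue
        if guard:
            try:
                _version = int(row.pop("version"))
            except (KeyError, ValueError):
                errors.setdefault("Invalid or missing 'version' on line", []).append(line_number)
                continue
            if entity.version != _version:
                result["version_mismatch"].append(_id)
                continue
        else:
            row.pop("version", None)  # deviation from the commit, see the note below
        entity.flex_fields.update(**row)
        entity.save()
    if errors:
        result["errors"] = errors
    return result


db = {1: Entity(version=2), 2: Entity(version=5)}


def getter(pk: int) -> Entity:
    if pk not in db:
        raise FakeDoesNotExist(pk)
    return db[pk]


rows = [
    {"id": 1, "version": 2, "name": "Ada"},   # applied
    {"id": 2, "version": 4, "name": "Bob"},   # stale: stored version is 5
    {"id": 3, "version": 1, "name": "Cy"},    # no such record
    {"id": "x", "version": 1, "name": "Di"},  # id is not an integer
]
report = bulk_update_collection(rows, getter, guard=True)
print(report)
# {'not_found': [3], 'version_mismatch': [2], 'errors': {"Invalid or missing 'id' on line": [4]}}
```

The `else` branch is a deliberate deviation: in the committed loop, `version` is popped only when the guard is enabled, so with `CONCURRENCY_GUARD` off the exported `version` value would apparently be merged into `flex_fields` together with the edited columns. Likewise, `int(None)` raises `TypeError`, which neither the commit nor this sketch catches, so a blank `id` cell would abort the whole job if the reader returns `None` for empty cells.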