Commit 0201873

Added an S3 endpoint and a duplicate-file validation with feedback in frontend
1 parent e2fbf7b commit 0201873

File tree: 12 files changed (+179, -64 lines)

README.md

Lines changed: 99 additions & 57 deletions
````diff
@@ -10,37 +10,111 @@ See deployment for notes on how to deploy the project on a live system (Coming s
 
 ## Prerequisites
 
-Taiga 2 is built on the following stack:
+- **Python 3.10** (required — the project pins `>=3.10,<3.11`)
+- **Poetry** (dependency manager)
+- **Node.js** and **Yarn** (for the React frontend)
+- **Redis** (Celery broker and result backend)
+- **Docker** (optional — only needed if you want to test file uploads via MiniStack)
 
-### Database
+### Stack overview
 
-- PostgreSQL (to store the app information)
-- Psycopg2
-- SQLAlchemy
-- Amazon S3 (to store the files)
+| Layer | Technologies |
+|-------|-------------|
+| Database | SQLite (local dev), PostgreSQL (production), SQLAlchemy |
+| Backend/API | Python 3.10, Flask, Connexion (Swagger/OpenAPI), Celery/Redis |
+| Object storage | AWS S3 (production), MiniStack (optional, local dev) |
+| Frontend | React, TypeScript, Webpack, Yarn |
 
-### Backend/API
+## Installing
+
+1. Install Python dependencies:
+
+       poetry install
+
+2. Install frontend dependencies:
+
+       cd react_frontend && yarn install && cd ..
+
+3. Copy the sample settings file (if you don't already have one):
+
+       cp settings.cfg.sample settings.cfg
+
+4. Create the dev database:
+
+       poetry run bash -c 'source setup_env.sh && flask recreate-dev-db'
+
+## Running Locally
+
+Start these four processes in separate terminal windows:
+
+```bash
+# Terminal 1 — Redis (skip if already running; check with `redis-cli ping`)
+redis-server
+
+# Terminal 2 — Webpack dev server (frontend hot reload)
+poetry run bash -c 'source setup_env.sh && flask webpack'
+
+# Terminal 3 — Flask app server
+poetry run bash -c 'source setup_env.sh && flask run'
+
+# Terminal 4 — Celery worker (async file conversion tasks)
+poetry run bash -c 'source setup_env.sh && flask run-worker'
+```
+
+Open your browser to: **http://127.0.0.1:5000/taiga/**
+
+You are automatically logged in as the seeded admin user (`admin@broadinstitute.org`) via the `DEFAULT_USER_EMAIL` setting.
+
+Without S3 configured, you can browse/search the seeded data, create folders, and work with the UI. File uploads require either MiniStack or real AWS credentials (see below).
+
+## Local S3 with MiniStack (Optional)
+
+[MiniStack](https://github.com/Nahuel990/ministack) is a free, open-source AWS emulator that runs 33 AWS services (including S3 and STS) in a single Docker container. It lets you test the full upload pipeline locally without an AWS account.
+
+### Setup
+
+1. Start MiniStack:
+
+       docker run -d --name ministack -p 4566:4566 nahuelnucera/ministack
+
+2. Create the local S3 bucket (run once):
+
+   ```bash
+   python -c "import boto3; boto3.client('s3', endpoint_url='http://localhost:4566', aws_access_key_id='test', aws_secret_access_key='test').create_bucket(Bucket='taiga-dev')"
+   ```
 
-- Python 3.6
-- Flask
-- Celery/Redis
-- Swagger
+3. In `settings.cfg`, uncomment the MiniStack block (Option A) and comment out Option B:
 
-### FrontEnd
+   ```python
+   S3_ENDPOINT_URL = 'http://localhost:4566'
+   AWS_ACCESS_KEY_ID = 'test'
+   AWS_SECRET_ACCESS_KEY = 'test'
+   S3_BUCKET = 'taiga-dev'
+   ```
 
-- React
-- TypeScript
-- Webpack
-- Yarn
+4. Restart Flask and the Celery worker to pick up the new settings.
 
-### Configuring AWS users
+### Managing MiniStack
+
+```bash
+docker start ministack   # start (if previously stopped)
+docker stop ministack    # stop
+docker rm ministack      # remove entirely
+```
+
+### Switching back to no-S3 mode
+
+Set `S3_ENDPOINT_URL = ''` and clear the AWS keys in `settings.cfg`. The app runs fine without S3 — you just can't upload files.
+
+## Configuring AWS (Production)
 
 We need two users: One IAM account (main) is used in general by the app to read/write to S3. The second (uploader) has it's rights delegated via STS on a short term basis. However, this user should
 only have access to upload to a single location within S3.
 
 Permissions for the main user:
 
-```{
+```json
+{
 "Version": "2012-10-17",
 "Statement": [
 {
@@ -57,7 +131,8 @@ Permissions for the main user:
 
 Permissions for the "upload" user:
 
-```{
+```json
+{
 "Version": "2012-10-17",
 "Statement": [
 {
@@ -112,46 +187,13 @@ For our case, it is pretty simple:
 
 #### Configure Taiga to use your Bucket
 
-1. Copy `settings.cfg.sample` to `settings.cfg`
-2. edit `settings.cfg` and set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
-3. also set S3_BUCKET to the bucket created above
-
-## Installing
-
-1. Install all the dependencies:
-
-`poetry install`
-`poetry shell`
-
-2. Create a test database to have some data to work with:
-
-`./flask recreate-dev-db`
+1. Edit `settings.cfg` and set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
+2. Set `S3_BUCKET` to the bucket created above
+3. Remove `S3_ENDPOINT_URL` (or leave it empty) so the app connects to real AWS
 
-3. Open 4 terminal windows to launch Webpack, Taiga 2, Celery and Redis processes:
+## Adding user to admin group
 
-a. In terminal 1:
-
-`./flask webpack`
-
-b. In terminal 2:
-
-`redis-server`
-
-c. In terminal 3:
-
-`./flask run`
-
-d. In terminal 4:
-
-`./flask run-worker`
-
-4. Congratulations! You can now access to Taiga 2 through your browser at:
-
-`http://127.0.0.1:5000/taiga/`
-
-## adding user to admin group
-
-```
+```sql
 INSERT INTO group_user_association (group_id, user_id) select 1, id FROM users WHERE name =
 'pmontgom';
 ```
````
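The admin-group SQL above can be exercised against a throwaway SQLite database. A minimal sketch: the two-column schema below is an assumption for illustration, not the app's real migrations.

```python
import sqlite3

# Minimal stand-ins for the two tables the README's SQL touches (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE group_user_association (group_id INTEGER, user_id INTEGER);
""")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'pmontgom')")

# Same INSERT ... SELECT pattern as the README: group 1 is the admin group.
conn.execute(
    "INSERT INTO group_user_association (group_id, user_id) "
    "SELECT 1, id FROM users WHERE name = ?",
    ("pmontgom",),
)

rows = conn.execute("SELECT group_id, user_id FROM group_user_association").fetchall()
print(rows)  # → [(1, 1)]
```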

react_frontend/src/components/UploadTracker.tsx

Lines changed: 7 additions & 2 deletions
```diff
@@ -207,14 +207,19 @@ export class UploadTracker {
   ): Promise<DatasetIdAndVersionId> {
     // TODO: If we change the page, we lose the download
     // Configure the AWS S3 object with the received credentials
-    let s3 = new AWS.S3({
+    let s3Options: AWS.S3.ClientConfiguration = {
       apiVersion: "2006-03-01",
       credentials: {
         accessKeyId: s3_credentials.accessKeyId,
         secretAccessKey: s3_credentials.secretAccessKey,
         sessionToken: s3_credentials.sessionToken,
       },
-    });
+    };
+    if (s3_credentials.endpointUrl) {
+      s3Options.endpoint = s3_credentials.endpointUrl;
+      s3Options.s3ForcePathStyle = true;
+    }
+    let s3 = new AWS.S3(s3Options);
 
     // Looping through all the files
     let uploadPromises: Array<Promise<boolean>> = [];
```

react_frontend/src/components/modals/UploadDialogs/UploadDialog.tsx

Lines changed: 14 additions & 0 deletions
```diff
@@ -123,6 +123,20 @@ export class UploadDialog extends React.Component<
       validationError = "At least one file is required";
     }
 
+    if (!validationError) {
+      const nameCounts = new Map<string, number>();
+      for (const file of this.state.uploadFiles) {
+        const trimmed = file.name.trim();
+        nameCounts.set(trimmed, (nameCounts.get(trimmed) || 0) + 1);
+      }
+      const duplicates = Array.from(nameCounts.entries())
+        .filter(([_, count]) => count > 1)
+        .map(([name]) => name);
+      if (duplicates.length > 0) {
+        validationError = `Duplicate file name${duplicates.length > 1 ? "s" : ""}: ${duplicates.join(", ")}`;
+      }
+    }
+
     let submitButton: any;
 
     // If we have a new datasetVersion in the state, we can show the link button
```

react_frontend/src/models/models.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -236,6 +236,7 @@ export interface S3Credentials {
   sessionToken: string;
   bucket: string;
   prefix: string;
+  endpointUrl?: string;
 }
 
 export interface S3UploadedData {
```

settings.cfg.sample

Lines changed: 16 additions & 2 deletions
```diff
@@ -9,14 +9,28 @@ PREFIX = '/taiga'
 USE_FRONTEND_DEV_SERVER = True
 
 # Amazon Web Services
+# Option A — Local dev with MiniStack (no real AWS needed):
+#   docker run -d --name ministack -p 4566:4566 nahuelnucera/ministack
+# Then uncomment the MiniStack block below and comment out Option B.
+#
+# Option B — Real AWS or no S3 (browse seeded data only):
+# Leave S3_ENDPOINT_URL empty and fill in real AWS keys, or leave all blank.
+
+# --- Option A: MiniStack ---
+# S3_ENDPOINT_URL = 'http://localhost:4566'
+# AWS_ACCESS_KEY_ID = 'test'
+# AWS_SECRET_ACCESS_KEY = 'test'
+# S3_BUCKET = 'taiga-dev'
+
+# --- Option B: Real AWS (or blank for no uploads) ---
+S3_ENDPOINT_URL = ''
 AWS_ACCESS_KEY_ID = ''
 AWS_SECRET_ACCESS_KEY = ''
+S3_BUCKET = ''
 
 CLIENT_UPLOAD_AWS_ACCESS_KEY_ID = AWS_ACCESS_KEY_ID
 CLIENT_UPLOAD_AWS_SECRET_ACCESS_KEY = AWS_SECRET_ACCESS_KEY
 
-S3_BUCKET = ''
-
 # Database configuration
 SQLALCHEMY_DATABASE_URI = 'sqlite:///db.sqlite3'
```
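The Option A / Option B switch in `settings.cfg` boils down to one decision at client-construction time: pass an endpoint override only when `S3_ENDPOINT_URL` is set. A hedged sketch of that logic (the `s3_client_kwargs` helper is mine, not part of the codebase; the keys mirror `boto3.client("s3", ...)` keyword arguments):

```python
def s3_client_kwargs(config: dict) -> dict:
    """Build boto3.client('s3', **kwargs) arguments from settings.cfg values.

    With Option A (MiniStack) the endpoint override is included; with
    Option B (real AWS, or all-blank) it is omitted so boto3 uses AWS defaults.
    """
    kwargs = {
        "aws_access_key_id": config.get("AWS_ACCESS_KEY_ID", ""),
        "aws_secret_access_key": config.get("AWS_SECRET_ACCESS_KEY", ""),
    }
    endpoint = config.get("S3_ENDPOINT_URL")
    if endpoint:  # Option A: point at the local emulator
        kwargs["endpoint_url"] = endpoint
    return kwargs

# Option A: MiniStack — endpoint_url is present
ministack = s3_client_kwargs({"S3_ENDPOINT_URL": "http://localhost:4566",
                              "AWS_ACCESS_KEY_ID": "test",
                              "AWS_SECRET_ACCESS_KEY": "test"})
# Option B: empty endpoint — no endpoint_url key, boto3 targets real AWS
real_aws = s3_client_kwargs({"S3_ENDPOINT_URL": "",
                             "AWS_ACCESS_KEY_ID": "",
                             "AWS_SECRET_ACCESS_KEY": ""})
```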

taiga2/api_app.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -53,7 +53,7 @@ def create_app(settings_override=None, settings_file=None):
     init_backend_auth(app)
 
     # Exception report with StackDriver
-    exception_reporter.init_app(app=app, service_name="taiga-" + app.config["ENV"])
+    exception_reporter.init_app(app=app, service_name="taiga-" + app.config.get("ENV", "dev"))
     register_errorhandlers(app=app)
 
     return api_app, app
```
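The one-line change above swaps a hard `KeyError` for a default: Flask's `app.config` behaves like a dict, so `.get("ENV", "dev")` falls back to `"dev"` when the key was never set. In plain-dict terms:

```python
config = {}  # simulating a config with no ENV key set

# Before the fix: config["ENV"] raises KeyError when ENV is absent.
try:
    service_name = "taiga-" + config["ENV"]
except KeyError:
    service_name = None

# After the fix: .get() falls back to "dev" when ENV is absent.
service_name = "taiga-" + config.get("ENV", "dev")
print(service_name)  # → taiga-dev
```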

taiga2/controllers/endpoint.py

Lines changed: 12 additions & 0 deletions
```diff
@@ -213,6 +213,10 @@ def get_s3_credentials():
         "prefix": prefix,
     }
 
+    endpoint_url = flask.current_app.config.get("S3_ENDPOINT_URL")
+    if endpoint_url:
+        model_frontend_credentials["endpointUrl"] = endpoint_url
+
     # See frontend/models/models.ts for the S3Credentials object and Swagger.yaml
     return flask.jsonify(model_frontend_credentials)
@@ -637,6 +641,10 @@ def create_dataset(sessionDatasetInfo):
     dataset_description = sessionDatasetInfo.get("datasetDescription", None)
     current_folder_id = sessionDatasetInfo["currentFolderId"]
 
+    duplicates = models_controller.validate_session_has_no_duplicate_filenames(session_id)
+    if duplicates:
+        api_error("Duplicate file names in upload: {}".format(", ".join(duplicates)))
+
     added_dataset = models_controller.add_dataset_from_session(
         session_id, dataset_name, dataset_description, current_folder_id
     )
@@ -723,6 +731,10 @@ def create_new_dataset_version(datasetVersionMetadata):
     dataset_version = datasetVersionMetadata.get("datasetVersion", None)
     current_user = models_controller.get_current_session_user()
 
+    duplicates = models_controller.validate_session_has_no_duplicate_filenames(session_id)
+    if duplicates:
+        api_error("Duplicate file names in upload: {}".format(", ".join(duplicates)))
+
     models_controller.lock()
 
     new_dataset_version = _create_new_dataset_version_from_session(
```

taiga2/controllers/models_controller.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -1507,6 +1507,17 @@ def log_datafile_read_access_info(datafile_id: str):
     db.session.commit()
 
 
+def validate_session_has_no_duplicate_filenames(session_id: str):
+    """Check that no two files in the upload session share the same name.
+    Returns a list of duplicate names (empty if none)."""
+    files = get_upload_session_files_from_session(session_id)
+    seen = {}
+    for f in files:
+        name = f.filename.strip()
+        seen[name] = seen.get(name, 0) + 1
+    return [name for name, count in seen.items() if count > 1]
+
+
 def _add_datafiles_from_session(session_id: str):
     # We retrieve all the upload_session_files related to the UploadSession
     added_files = get_upload_session_files_from_session(session_id)
```
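The counting logic inside `validate_session_has_no_duplicate_filenames` can be written more compactly with `collections.Counter`; this standalone sketch (the function name is mine) matches its behaviour, including stripping whitespace before comparing names:

```python
from collections import Counter

def find_duplicate_filenames(filenames):
    """Return the names that appear more than once after stripping whitespace."""
    counts = Counter(name.strip() for name in filenames)
    return [name for name, count in counts.items() if count > 1]

print(find_duplicate_filenames(["a.csv", "b.csv", " a.csv "]))  # → ['a.csv']
print(find_duplicate_filenames(["a.csv", "b.csv"]))             # → []
```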

taiga2/default_settings.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -16,3 +16,6 @@
 
 # S3 settings
 CLIENT_UPLOAD_TOKEN_EXPIRY = 86400  # A day
+
+ENV = "dev"
+REPORT_EXCEPTIONS = False
```

taiga2/swagger/swagger.yaml

Lines changed: 2 additions & 0 deletions
```diff
@@ -243,6 +243,8 @@ paths:
               type: string
             prefix:
               type: string
+            endpointUrl:
+              type: string
 
   /upload_session:
     get:
```
