
Commit 1d3d8d2

Script for Bulk import of Metadata fields in Phrase TMS - v 0.05
1 parent a2ec663 commit 1d3d8d2

File tree

2 files changed

+380
-0
lines changed

Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
Phrase TMS Bulk Import Script Documentation
Version: 2.0 (Progress Bar Enhanced)
Last Updated: [Date]

1. Prerequisites

1.1 System Requirements

- Python 3.8+ (tested on 3.8, 3.9, 3.10)
- RAM: minimum 512 MB (optimized for low memory usage)
- Disk space: 50 MB free
- Network: HTTPS access to cloud.memsource.com
1.2 Software Dependencies

Install these packages using pip:

```bash
pip install \
  requests==2.31.0 \
  tqdm==4.66.1 \
  python-dotenv==1.0.0 \
  urllib3==1.26.18
```
1.3 Phrase TMS Requirements

- Valid API credentials (admin-level access)
- Existing domain structure (if creating subdomains)
- API access enabled for your account

2. Setup Instructions

2.1 Environment Configuration

Create a project folder:

```bash
mkdir phrase-import && cd phrase-import
```

Add credentials to a .env file:

```ini
# .env
PHRASE_USER="[email protected]"
PHRASE_PASSWORD="yourSecurePassword123!"
```

Restrict file permissions (Linux/macOS):

```bash
chmod 600 .env
```
2.2 CSV File Preparation

File requirements:

- UTF-8 encoding
- First row as header
- Columns (case-insensitive):

| Column Name | Required For | Example Value |
|---|---|---|
| type | All | domain |
| name | All | Marketing Team |
| timezone | Domains | Europe/Paris |
| parent_domain_id | Subdomains | DOM-1234 |
| client_id | Business Units | CLIENT-5678 |

Sample CSV (structure.csv):

```csv
type,name,timezone,parent_domain_id,client_id
domain,EMEA Division,Europe/Berlin,,
subdomain,France Team,,DOM-9876,
client,Acme Corporation,,,
business_unit,Legal Dept,,,CLIENT-123
```
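Before a full run, the column rules above can be checked locally. A minimal pre-flight sketch (the `REQUIRED` mapping mirrors the table above; the `preflight` helper name is ours, not part of the tool):

```python
import csv

# Mirrors the required-column table above; adjust if your instance differs.
REQUIRED = {
    "domain": ["name", "timezone"],
    "subdomain": ["name", "parent_domain_id"],
    "client": ["name"],
    "business_unit": ["name", "client_id"],
}

def preflight(path: str, delimiter: str = ",") -> list:
    """Return (row_number, problem) tuples; an empty list means the file looks importable."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        # Data rows start at line 2 (line 1 is the header).
        for i, row in enumerate(csv.DictReader(f, delimiter=delimiter), start=2):
            row = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
            etype = row.get("type", "").lower()
            if etype not in REQUIRED:
                problems.append((i, f"unknown type {etype!r}"))
                continue
            missing = [c for c in REQUIRED[etype] if not row.get(c)]
            if missing:
                problems.append((i, f"missing {missing}"))
    return problems
```

Running `preflight("structure.csv")` before the real import surfaces bad rows without touching the API.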
3. Execution Guide

3.1 Basic Command

```bash
python import_tool.py structure.csv
```

Expected output (screenshot omitted): a real-time progress bar with success/error counts.

3.2 Advanced Options

| Flag | Description | Example |
|---|---|---|
| --delimiter | CSV delimiter character | --delimiter=';' |
| --dry-run | Validate without creating entities | --dry-run |
| --help | Show help message | python import_tool.py -h |

Dry run example:

```bash
python import_tool.py test_data.csv --delimiter=',' --dry-run
```
4. Post-Execution Steps

4.1 Verify Results

Check bulk_import.log:

```bash
tail -f bulk_import.log
```

Validate in the Phrase TMS UI:

- Domains: Admin Console → Domains
- Clients: Admin Console → Clients

4.2 Handle Errors

Retry failed items:

1. Extract the ERROR lines from the log:

```bash
grep "ERROR" bulk_import.log > failed_rows.txt
```

2. Rebuild a CSV containing only the failed rows (the log lines themselves are not valid CSV, so map them back to rows in the original file), then re-run:

```bash
python import_tool.py failed_rows.csv
```
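One way to rebuild a re-runnable failed_rows.csv from the log (a sketch; it assumes the script's default `%(asctime)s - %(levelname)s - %(message)s` log format and that a failing row's `name` value appears in the ERROR message):

```python
import csv

def rebuild_failed_csv(log_path: str, source_csv: str, out_csv: str,
                       delimiter: str = ",") -> int:
    """Copy rows from source_csv whose 'name' appears in an ERROR log line,
    producing a CSV that import_tool.py can re-run. Returns rows written."""
    with open(log_path, encoding="utf-8") as f:
        # Matches the script's '%(asctime)s - %(levelname)s - %(message)s' format.
        error_lines = [line for line in f if " - ERROR - " in line]
    written = 0
    with open(source_csv, newline="", encoding="utf-8") as src, \
         open(out_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            name = (row.get("name") or "").strip()
            if name and any(name in line for line in error_lines):
                writer.writerow(row)
                written += 1
    return written
```

Because matching is by name substring, review the rebuilt file before re-running it.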
Common error codes:

| Code | Meaning | Solution |
|---|---|---|
| 401 | Invalid credentials | Verify the .env file |
| 403 | Permission denied | Check admin rights |
| 409 | Entity already exists | Update the CSV with unique names/IDs |
| 500 | Server error | Retry after 5 minutes |

5. Performance Optimization

5.1 For Large Files (>10,000 Rows)

Split the CSV into chunks:

```bash
split -l 1000 large_file.csv chunk_
```

Note that plain split leaves the header row only in the first chunk, while the tool expects a header in every file it reads.

Parallel processing (GNU Parallel):

```bash
parallel -j 4 "python import_tool.py {}" ::: chunk_*
```
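A header-preserving variant of the split step looks like this (a sketch, assuming GNU coreutils; the tiny demo file stands in for your real large_file.csv, and -l would be 1000 in practice):

```shell
# Demo input; in practice this is your real large_file.csv.
printf 'type,name\ndomain,EMEA Division\ndomain,APAC Division\ndomain,LATAM Division\n' > large_file.csv

# Split the data rows only (header stripped), then prepend the header to each
# chunk so every chunk_*.csv is a complete, importable CSV on its own.
head -n 1 large_file.csv > header.tmp
tail -n +2 large_file.csv | split -l 2 - part_
for f in part_*; do
  cat header.tmp "$f" > "chunk_${f#part_}.csv"
  rm "$f"
done
rm header.tmp
```

Each resulting chunk_*.csv can then be fed to import_tool.py directly or via GNU Parallel.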
5.2 Memory Management

- Windows: monitor the Python process in Task Manager
- Linux:

```bash
top -p $(pgrep -f import_tool.py)
```

- macOS:

```bash
top -pid $(pgrep -f import_tool.py)
```

6. Security Best Practices

6.1 Credential Safety

- Rotate API passwords quarterly
- Never commit .env to version control
- Use environment variables in production:

```bash
export PHRASE_USER="[email protected]"
export PHRASE_PASSWORD="..."
```

6.2 Network Security

- Whitelist the Phrase TMS IP ranges:
  - 52.28.160.0/19
  - 52.57.224.0/19
- Use a VPN for on-premise deployments

7. Support

7.1 Troubleshooting Guide

| Symptom | Diagnostic Command |
|---|---|
| Slow performance | ping cloud.memsource.com |
| Connection failures | curl -v https://cloud.memsource.com/web/api/v1/auth/login |
| Encoding errors | file -I structure.csv |

7.2 Contact Information

Phrase TMS Support: [email protected]

import_tool.py

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
"""
Phrase TMS Bulk Import Script with Progress Tracking
- Memory-safe streaming CSV processing
- Real-time progress statistics
- Enterprise-grade error handling
"""

import os
import csv
import logging
from time import sleep
from typing import Any, Dict, Generator

import requests
from dotenv import load_dotenv
from requests.adapters import HTTPAdapter
from tqdm import tqdm
from urllib3.util.retry import Retry

# Configuration
load_dotenv()
BASE_URL = "https://cloud.memsource.com/web/api/v1"  # Verified correct version
MAX_RETRIES = 3
BACKOFF_FACTOR = 1
TIMEOUT = 30  # seconds
CSV_FIELDS = {
    'domain': ['name', 'timezone'],
    'subdomain': ['name', 'parent_domain_id'],
    'client': ['name'],
    'business_unit': ['name', 'client_id']
}

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('bulk_import.log'),
        logging.StreamHandler()
    ]
)


class PhraseTMSClient:
    """API client with connection pooling and smart retries."""

    def __init__(self):
        self.session = requests.Session()
        retry = Retry(
            total=MAX_RETRIES,
            backoff_factor=BACKOFF_FACTOR,
            status_forcelist=[500, 502, 503, 504],
            # Note: retrying POST can create duplicates if the server
            # succeeded just before the connection dropped.
            allowed_methods=['POST', 'PUT', 'GET', 'DELETE']
        )
        adapter = HTTPAdapter(max_retries=retry)
        self.session.mount('https://', adapter)
        self.token = self._authenticate()

    def _authenticate(self) -> str:
        """Log in using credentials taken from environment variables."""
        credentials = {
            'userName': os.getenv('PHRASE_USER'),
            'password': os.getenv('PHRASE_PASSWORD')
        }
        response = self.session.post(
            f"{BASE_URL}/auth/login",
            json=credentials,
            timeout=TIMEOUT
        )
        response.raise_for_status()
        return response.json()['token']

    def create_entity(self, entity_type: str, data: Dict[str, Any]) -> Dict[str, Any]:
        """Generic entity creation with conflict (409) detection."""
        endpoints = {
            'domain': '/domains',
            'subdomain': lambda d: f"/domains/{d['parent_domain_id']}/subDomains",
            'client': '/clients',
            'business_unit': '/businessUnits'
        }

        endpoint = endpoints[entity_type]
        url = BASE_URL + (endpoint(data) if callable(endpoint) else endpoint)

        response = self.session.post(
            url,
            json=data,
            headers={'Authorization': f'ApiToken {self.token}'},
            timeout=TIMEOUT
        )

        if response.status_code == 409:
            # WARNING (not DEBUG) so conflicts are visible at the default INFO level.
            logging.warning(f"Entity conflict: {data.get('name')}")
            return {'status': 'conflict'}

        response.raise_for_status()
        return response.json()


def validate_row(entity_type: str, row: Dict[str, str]) -> bool:
    """Structural validation of CSV rows."""
    required = CSV_FIELDS[entity_type]
    missing = [field for field in required if not row.get(field)]
    if missing:
        logging.warning(f"Missing fields: {missing} in {row.get('name')}")
        return False
    return True


def count_csv_rows(file_path: str, delimiter: str) -> int:
    """Memory-efficient row counting (header excluded)."""
    with open(file_path, 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # Skip header
        return sum(1 for _ in reader)


def process_csv(file_path: str, delimiter: str) -> Generator[Dict[str, str], None, None]:
    """Streaming CSV parser that lower-cases headers and strips whitespace."""
    with open(file_path, 'r', encoding='utf-8', newline='') as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for row in reader:
            # (v or '') guards against None values in short rows.
            yield {k.strip().lower(): (v or '').strip() for k, v in row.items()}


def bulk_import(file_path: str, delimiter: str, dry_run: bool = False):
    """Main import workflow with progress tracking."""
    client = PhraseTMSClient()
    stats = {'success': 0, 'errors': 0, 'skipped': 0}

    total_rows = count_csv_rows(file_path, delimiter)

    with tqdm(
        total=total_rows,
        desc="🚀 Importing",
        unit="row",
        bar_format="{l_bar}{bar:20}{r_bar}",
        dynamic_ncols=True
    ) as pbar:
        for row in process_csv(file_path, delimiter):
            try:
                entity_type = row.get('type', '').lower()
                if entity_type not in CSV_FIELDS:
                    stats['errors'] += 1
                    logging.error(f"Invalid type: {row.get('type')}")
                    continue

                if not validate_row(entity_type, row):
                    stats['errors'] += 1
                    continue

                if dry_run:
                    stats['success'] += 1
                    continue

                result = client.create_entity(entity_type, row)
                if result.get('status') == 'conflict':
                    stats['skipped'] += 1
                elif result:
                    stats['success'] += 1
                else:
                    stats['skipped'] += 1

            except Exception as e:
                stats['errors'] += 1
                # ERROR (not DEBUG) so failed rows survive the INFO log level
                # and can be grepped out afterwards.
                logging.error(f"Row error: {str(e)}")
                sleep(0.5)  # Brief cooldown after an error

            finally:
                pbar.update(1)
                pbar.set_postfix(
                    success=stats['success'],
                    errors=stats['errors'],
                    skipped=stats['skipped'],
                    refresh=False
                )

    logging.info("\n🔥 Final Statistics:")
    logging.info(f"✅ Success: {stats['success']}")
    logging.info(f"⚠️ Skipped: {stats['skipped']}")
    logging.info(f"❌ Errors: {stats['errors']}")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description='Phrase TMS Bulk Import Tool')
    parser.add_argument('file', help='CSV file path')
    parser.add_argument('--delimiter', default=',', help='CSV delimiter')
    parser.add_argument('--dry-run', action='store_true', help='Simulate import')
    args = parser.parse_args()

    try:
        bulk_import(args.file, args.delimiter, args.dry_run)
    except KeyboardInterrupt:
        logging.info("\n🛑 Operation cancelled by user")
    except Exception as e:
        logging.error(f"💥 Catastrophic failure: {str(e)}")
