# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
*.egg
# Jupyter Notebook
.ipynb_checkpoints
*/.ipynb_checkpoints/*
# Virtual environments
venv/
venv_*/
env/
ENV/
.venv
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Model checkpoints and outputs
models/
output/
output_*/
seq2seq_model_handwritten/
seq2seq_*/
cyrillic_seq2seq_*/
kazars_trocr_*/
checkpoint-*/
*.pt
*.pth
*.bin
*.safetensors
# Data
data/
datasets/
full-datasets/
unpacked-datasets/
*.arrow
*.csv
*.txt
!requirements.txt
!requirements-gpu.txt
!requirements-kraken.txt
!README.txt
# Images and media
*.jpg
*.jpeg
*.png
*.tif
*.tiff
*.bmp
!example*.png
!sample*.png
!assets/*.png
# Test images directory
HTR_Images/
# Logs and temporary files
logs/
*.log
runs/
tensorboard/
wandb/
# Security scanning reports
gitleaks_*.json
*_report.json
# Compressed files
*.zip
*.tar.gz
*.rar
# Large files
*.h5
*.hdf5
# Transkribus exports
page/
*.xml
!example*.xml
# Local Claude settings
.claude/
# Test/debug scripts
investigate_page440.py
resume_extraction.py
run_ddp_manual.py
run_ddp_test.py
train_minimal_example.py
test_segmenter_comparison.py
config_test_ddp.yaml
config_ukrainian.yaml
# Platform-specific build scripts
run_training_ddp.bat
run_training_ddp.ps1
# Windows artifacts
nul
.hf_model_history.json
# API keys storage (from GUI)
.trocr_gui/
.env
*.env
# External repositories
party_repo/
kraken_repo/
# Backup and temporary files
*_backup.py
*_backup_*.py
fix_*.py
# Session documentation (auto-generated)
COMPREHENSIVE_*.md
QWEN_*.md
KRAKEN_*.md
SESSION_SUMMARY_*.md
Documentation/
# Detailed project documentation (hardware-specific, too detailed for the public repo)
CLAUDE.md
# Internal planning documents (not for public repo)
*_PLAN.md
*_PLAN_*.md
IMPLEMENTATION_SUMMARY.md
PARTY_FIX_TESTING.md
PARTY_POC_VS_PLUGIN_COMPARISON.md
QUICK_START_IMPROVEMENTS.md
# Training scripts and logs (specific to our setup)
run_pylaia_*.sh
start_pylaia_*training*.py
start_pylaia_*replica.py
resume_pylaia_*.py
train_pylaia_*_pagexml.py
monitor_*.sh
*.backup
# Test and debug scripts (temporary)
check_*.py
test_*.py
# Exception: web API test suite is a proper test, not a throwaway script
!web/tests/test_server.py
# Accidentally created files
=*
# Legacy/broken inference implementations
inference_pylaia.py
inference_pylaia_lm.py
# Virtual environments (project-specific)
churro_venv/
party_env/
# Status and implementation notes (temporary documentation)
*_STATUS.md
*_NOTES.md
*_READY.md
*_COMPLETE.md
*_TODO.md
*_ISSUE.md
*_QUICKSTART.md
*_LESSONS_LEARNED.md
*_UPDATE.md
*_IMPLEMENTATION.md
*_BUGFIX_*.md
TRAINING_STATUS_*.md
# Experimental scripts (Churro)
convert_pylaia_to_churro_*.py
prepare_*_churro.py
finetune_churro_*.py
inference_churro.py
run_churro_*.sh
# Debug scripts
debug_*.py
# Server environment config (private)
SERVER_ENV.md
htr_gui/
# Training logs
training_ukrainian_v2c.log
nohup.out
nohup_*.log
# Lightning logs
lightning_logs/
# Internal planning docs
PLAN_*.md
# Training run scripts (local)
run_party_*.sh
# Jupyter notebooks (local experiments)
*.ipynb
# Web UI upload temp dirs (created at runtime)
/tmp/polyscriptor_uploads_*/
# Web UI key store (contains API keys — never commit)
web/api_keys.json
web/uploads/
# Diagnostic and inspection scripts (temporary)
diagnose_exif_mismatch.py
inspect_*.ipynb
# Gabelsberger shorthand preparation (work in progress)
prepare_gabelsberger_shorthand.py
*.gitlab-token