Commit a2beacf

Added readme
1 parent 453ebf7 commit a2beacf

1 file changed

+373
-0
lines changed

README.md

Lines changed: 373 additions & 0 deletions
@@ -0,0 +1,373 @@
1+
# D47crunch
2+
3+
Python library for processing and standardizing carbonate clumped-isotope analyses, from low-level data out of a dual-inlet mass spectrometer to final, “absolute” Δ<sub>47</sub> values with fully propagated analytical error estimates.
4+
5+
All questions and suggestions are welcome and should be directed at [Mathieu Daëron](mailto:[email protected]?subject=[D47crunch]).
6+
7+
## 1. Requirements
8+
9+
Python 3, [numpy], [lmfit]. We recommend installing the [Anaconda] distribution.
10+
11+
[numpy]: https://numpy.org
12+
[lmfit]: https://lmfit.github.io
13+
[Anaconda]: https://www.anaconda.com/distribution
14+
15+
## 2. Installation
16+
17+
This should do the trick:
18+
19+
```bash
20+
pip install D47crunch
21+
```
22+
23+
Alternatively:
24+
25+
1. download [D47crunch-master.zip]
26+
2. unzip it
27+
3. rename the resulting directory to `D47crunch`
28+
4. move the `D47crunch` directory to somewhere in your `PYTHONPATH` or to your current working directory
29+
30+
[D47crunch-master.zip]: https://github.com/mdaeron/D47crunch/archive/master.zip
31+
32+
## 3. Documentation
33+
34+
For the full API documentation, see [https://github.com/mdaeron/D47crunch/docs/index.html].
35+
36+
For a short tutorial see below.
37+
38+
[https://github.com/mdaeron/D47crunch/docs/index.html]: https://github.com/mdaeron/D47crunch/docs/index.html
39+
40+
## 4. Usage
41+
42+
### 4.1 Import data
43+
44+
Start with some raw data stored as CSV in a file named `rawdata.csv` (spaces after commas are optional). Each line corresponds to a single analysis.
45+
46+
The only required fields are a sample identifier (`Sample`) and the working-gas delta values `d45`, `d46`, `d47`. If no session information is provided, all analyses will be treated as belonging to a single analytical session. Alternatively, to group analyses into sessions, provide session identifiers in a `Session` field. If not specified by the user, a unique identifier (`UID`) will be assigned automatically to each analysis. Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to VSMOW, with λ equal to `D47data.lambda_17`), and are assumed to be zero otherwise. Working-gas deltas `d48` and `d49` may also be provided, and are otherwise treated as `NaN`.
47+
48+
Example `rawdata.csv` file:
49+
50+
```
51+
UID, Session, Sample, d45, d46, d47, d48, d49
52+
A01, Session1, ETH-1, 5.79502, 11.62767, 16.89351, 24.56708, 0.79486
53+
A02, Session1, IAEA-C1, 6.21907, 11.49107, 17.27749, 24.58270, 1.56318
54+
A03, Session1, ETH-2, -6.05868, -4.81718, -11.63506, -10.32578, 0.61352
55+
A04, Session1, IAEA-C2, -3.86184, 4.94184, 0.60612, 10.52732, 0.57118
56+
A05, Session1, ETH-3, 5.54365, 12.05228, 17.40555, 25.96919, 0.74608
57+
A06, Session1, ETH-2, -6.06706, -4.87710, -11.69927, -10.64421, 1.61234
58+
A07, Session1, ETH-1, 5.78821, 11.55910, 16.80191, 24.56423, 1.47963
59+
A08, Session1, IAEA-C2, -3.87692, 4.86889, 0.52185, 10.40390, 1.07032
60+
A09, Session1, ETH-3, 5.53984, 12.01344, 17.36863, 25.77145, 0.53264
61+
A10, Session1, IAEA-C1, 6.21905, 11.44785, 17.23428, 24.30975, 1.05702
62+
A11, Session2, ETH-1, 5.79958, 11.63130, 16.91766, 25.12232, 1.25904
63+
A12, Session2, IAEA-C1, 6.22514, 11.51264, 17.33588, 24.92770, 2.54331
64+
A13, Session2, ETH-2, -6.03042, -4.74644, -11.52551, -10.55907, 0.04024
65+
A14, Session2, IAEA-C2, -3.83702, 4.99278, 0.67529, 10.73885, 0.70929
66+
A15, Session2, ETH-3, 5.53700, 12.04892, 17.42023, 26.21793, 2.16400
67+
A16, Session2, ETH-2, -6.06820, -4.84004, -11.68630, -10.72563, 0.04653
68+
A17, Session2, ETH-1, 5.78263, 11.57182, 16.83519, 25.09964, 1.26283
69+
A18, Session2, IAEA-C2, -3.85355, 4.91943, 0.58463, 10.56221, 0.71245
70+
A19, Session2, ETH-3, 5.52227, 12.01174, 17.36841, 26.19829, 1.03740
71+
A20, Session2, IAEA-C1, 6.21937, 11.44701, 17.26426, 24.84678, 0.76866
72+
```
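The field rules described above can be sketched with the standard library alone. This is only an illustration of the defaults, not the code behind `D47data.read()`: the helper name `read_rawdata`, the fallback session name, and the `UID` numbering scheme are all hypothetical.

```python
import csv
import io
import math

def read_rawdata(text, default_session='mySession'):
    """Parse raw-data CSV text into a list of dicts (illustrative helper only)."""
    # skipinitialspace=True makes the spaces after commas optional
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    records = []
    for i, row in enumerate(reader, start=1):
        r = dict(row)
        # required working-gas deltas
        for k in ('d45', 'd46', 'd47'):
            r[k] = float(r[k])
        # optional working-gas deltas default to NaN
        for k in ('d48', 'd49'):
            r[k] = float(r[k]) if k in r else math.nan
        # optional oxygen-17 anomaly defaults to zero
        r['D17O'] = float(r['D17O']) if 'D17O' in r else 0.0
        # without a Session field, all analyses share a single session
        r.setdefault('Session', default_session)
        # hypothetical numbering scheme for automatically assigned UIDs
        r.setdefault('UID', str(i))
        records.append(r)
    return records

minimal = """Sample, d45, d46, d47
ETH-1, 5.79502, 11.62767, 16.89351
ETH-2, -6.05868, -4.81718, -11.63506
"""
for r in read_rawdata(minimal):
    print(r['UID'], r['Session'], r['Sample'], r['d47'], r['d48'], r['D17O'])
```

With only the four required columns present, every record ends up in the same session, with `d48` and `d49` set to `NaN` and `D17O` set to zero.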
73+
74+
First create a `D47data` object named `foo` and import `rawdata.csv`:
75+
76+
```python
77+
import D47crunch
78+
79+
foo = D47crunch.D47data()
80+
foo.read('rawdata.csv')
81+
82+
print('foo contains:')
83+
print(f'{len(foo)} analyses')
84+
print(f'{len({r["Sample"] for r in foo})} samples')
85+
print(f'{len({r["Session"] for r in foo})} sessions')
86+
87+
# output:
88+
# foo contains:
89+
# 20 analyses
90+
# 5 samples
91+
# 2 sessions
92+
```
93+
94+
We can inspect the elements of `foo`:
95+
96+
```python
97+
r = foo[0]
98+
for k in r:
    print(f'r["{k}"] = {repr(r[k])}')
100+
101+
# output:
102+
# r["UID"] = 'A01'
103+
# r["Session"] = 'Session1'
104+
# r["Sample"] = 'ETH-1'
105+
# r["d45"] = 5.79502
106+
# r["d46"] = 11.62767
107+
# r["d47"] = 16.89351
108+
# r["d48"] = 24.56708
109+
# r["d49"] = 0.79486
110+
```
111+
112+
### 4.2 Working gas composition
113+
114+
There are two ways to define the isotopic composition of the working gas.
115+
116+
#### 4.2.1 Option 1: explicit definition
117+
118+
Directly writing to fields `d13Cwg_VPDB` and `d18Owg_VSMOW`:
119+
120+
```python
121+
for r in foo:
    if r['Session'] == 'Session1':
        r['d13Cwg_VPDB'] = -3.75
        r['d18Owg_VSMOW'] = 25.14
    elif r['Session'] == 'Session2':
        r['d13Cwg_VPDB'] = -3.74
        r['d18Owg_VSMOW'] = 25.17
128+
```
129+
130+
#### 4.2.2 Option 2: based on the known composition of a sample
131+
132+
```python
133+
# The 2 code lines below are the default settings. It is thus not
134+
# necessary to include them unless you wish to use different values.
135+
136+
foo.SAMPLE_CONSTRAINING_WG_COMPOSITION = ('ETH-3', 1.71, -1.78)
137+
foo.ALPHA_18O_ACID_REACTION = 1.00813 # (Kim et al., 2007), calcite at 90 °C
138+
139+
# Compute the WG composition for each session:
140+
foo.wg()
141+
142+
```
143+
144+
### 4.3 Crunch the data
145+
146+
Now compute δ<sup>13</sup>C, δ<sup>18</sup>O, and raw Δ<sub>47</sub>, Δ<sub>48</sub>, Δ<sub>49</sub> values. Note that δ<sup>18</sup>O refers to the composition of the analyte CO<sub>2</sub>, not of the carbonate: the user is responsible for any acid fractionation correction.
147+
148+
```python
149+
foo.crunch()
150+
151+
r = foo[0]
152+
for k in r:
    print(f'r["{k}"] = {r[k]}')
154+
155+
# output:
156+
# r["UID"] = A01
157+
# r["Session"] = Session1
158+
# r["Sample"] = ETH-1
159+
# r["d45"] = 5.79502
160+
# r["d46"] = 11.62767
161+
# r["d47"] = 16.89351
162+
# r["d48"] = 24.56708
163+
# r["d49"] = 0.79486
164+
# r["d13Cwg_VPDB"] = -3.7555729459832765
165+
# r["d18Owg_VSMOW"] = 25.1145492463934
166+
# r["D17O"] = 0.0
167+
# r["d13C_VPDB"] = 1.9948594073404546
168+
# r["d18O_VSMOW"] = 37.03357105550355
169+
# r["D47raw"] = -0.5746856128030498
170+
# r["D48raw"] = 1.1496833191546596
171+
# r["D49raw"] = -27.690248970251407
172+
```
173+
174+
### 4.4 Oxygen-17 correction parameters
175+
176+
Note that this crunching step uses the IUPAC oxygen-17 correction parameters, as recommended by [Daëron et al. (2016)](https://dx.doi.org/10.1016/j.chemgeo.2016.08.014) and [Schauer et al. (2016)](https://dx.doi.org/10.1002/rcm.7743):
177+
178+
```python
179+
R13_VPDB = 0.01118 # (Chang & Li, 1990)
180+
R18_VSMOW = 0.0020052 # (Baertschi, 1976)
181+
lambda_17 = 0.528 # (Barkan & Luz, 2005)
182+
R17_VSMOW = 0.00038475 # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
183+
R18_VPDB = R18_VSMOW * 1.03092
184+
R17_VPDB = R17_VSMOW * 1.03092 ** lambda_17
185+
```
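To make the two derived ratios concrete, the sketch below simply evaluates the expressions listed above and shows how a different λ value would change `R17_VPDB`. It is plain arithmetic rather than library code, and the helper names `r17_vpdb` and `vsmow_to_vpdb` are hypothetical. The scale conversion assumes that the factor 1.03092 is the <sup>18</sup>O/<sup>16</sup>O ratio of VPDB relative to VSMOW, as implied by the definition of `R18_VPDB`.

```python
R18_VSMOW = 0.0020052    # (Baertschi, 1976)
R17_VSMOW = 0.00038475   # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)
VPDB_OVER_VSMOW = 1.03092

R18_VPDB = R18_VSMOW * VPDB_OVER_VSMOW

def r17_vpdb(lambda_17):
    """R17 of VPDB for a given mass-dependent exponent lambda."""
    return R17_VSMOW * VPDB_OVER_VSMOW ** lambda_17

def vsmow_to_vpdb(d18O_VSMOW):
    """Re-express a d18O value (in permil) from the VSMOW to the VPDB scale."""
    return (1000 + d18O_VSMOW) / VPDB_OVER_VSMOW - 1000

print(f'R18_VPDB              = {R18_VPDB:.10f}')
print(f'R17_VPDB (lambda 0.528)  = {r17_vpdb(0.528):.10f}')
print(f'R17_VPDB (lambda 0.5164) = {r17_vpdb(0.5164):.10f}')
print(f'd18O_VSMOW = 37.03 permil -> d18O_VPDB = {vsmow_to_vpdb(37.03):.2f} permil')
```

A smaller λ yields a slightly smaller `R17_VPDB`, which is why the parameter must be set before the crunching step if a non-default value is wanted.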
186+
187+
To use different numerical values for these parameters, change them before performing `foo.crunch()`:
188+
189+
```python
190+
# to change the lambda value to 0.5164,
191+
# leaving the other parameters unchanged:
192+
foo.lambda_17 = 0.5164
193+
```
194+
195+
### 4.5 Reference frame
196+
197+
The nominal Δ<sub>47</sub> values assigned to the anchor samples are defined in `foo.Nominal_D47`, which may be redefined arbitrarily:
198+
199+
```python
200+
print(foo.Nominal_D47) # default values from Bernasconi et al. (2018)
201+
# output:
202+
# {'ETH-1': 0.258, 'ETH-2': 0.256, 'ETH-3': 0.691}
203+
204+
foo.Nominal_D47 = {
    "Foo-1": 0.232,
    "Foo-2": 0.289,
    "Foo-3": 0.455,
    "Foo-4": 0.704,
}
210+
211+
print(foo.Nominal_D47)
212+
# output:
213+
# {'Foo-1': 0.232, 'Foo-2': 0.289, 'Foo-3': 0.455, 'Foo-4': 0.704}
214+
```
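One practical consequence is that the set of anchors depends entirely on this dictionary. The sketch below assumes that a sample is treated as an anchor exactly when its name is a key of `Nominal_D47`; this membership rule is an assumption made for illustration, and `split_samples` is a hypothetical helper rather than a D47crunch function.

```python
def split_samples(sample_names, nominal_D47):
    """Split sample names into (anchors, unknowns) by membership in nominal_D47."""
    anchors = [s for s in sample_names if s in nominal_D47]
    unknowns = [s for s in sample_names if s not in nominal_D47]
    return anchors, unknowns

samples = ['ETH-1', 'ETH-2', 'ETH-3', 'IAEA-C1', 'IAEA-C2']

default_nominal = {'ETH-1': 0.258, 'ETH-2': 0.256, 'ETH-3': 0.691}
print(split_samples(samples, default_nominal))

custom_nominal = {'Foo-1': 0.232, 'Foo-2': 0.289, 'Foo-3': 0.455, 'Foo-4': 0.704}
print(split_samples(samples, custom_nominal))
```

Under that assumption, the `Foo-*` values shown above would leave the example data set without any anchors, so a redefined dictionary should use names that actually occur in the `Sample` field.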
215+
216+
### 4.6 Standardization: default method (`pooled`)
219+
220+
The default standardization approach computes the best-fit standardization parameters (a,b,c) for each session, along with the best-fit Δ<sub>47</sub> values of unknown samples, using a pooled regression model taking into account the relative mapping of all samples (anchors and unknowns) in (δ<sub>47</sub>, Δ<sub>47</sub>) space.
221+
222+
```python
223+
foo.standardize()
224+
```
225+
226+
The following text is output:
227+
228+
```
229+
--------------------------------  -----------
N samples (anchors + unknowns)    5 (3 + 2)
N analyses (anchors + unknowns)   20 (12 + 8)
Repeatability of δ13C_VPDB        13.8 ppm
Repeatability of δ18O_VSMOW       41.9 ppm
Repeatability of Δ47 (anchors)    10.7 ppm
Repeatability of Δ47 (unknowns)   3.4 ppm
Repeatability of Δ47 (all)        8.6 ppm
Model degrees of freedom          12
Student's 95% t-factor            2.18
--------------------------------  -----------

--------  --  --  -----------  ------------  ------  ------  ------  -------------  -------------  --------------
Session   Na  Nu  d13Cwg_VPDB  d18Owg_VSMOW  r_d13C  r_d18O   r_D47         a ± SE   1e3 x b ± SE          c ± SE
--------  --  --  -----------  ------------  ------  ------  ------  -------------  -------------  --------------
Session1   6   4       -3.756        25.115  0.0035  0.0415  0.0066  0.838 ± 0.016  3.340 ± 0.247  -0.859 ± 0.007
Session2   6   4       -3.743        25.117  0.0174  0.0490  0.0119  0.815 ± 0.015  4.601 ± 0.246  -0.847 ± 0.007
--------  --  --  -----------  ------------  ------  ------  ------  -------------  -------------  --------------


-------  -  ---------  ----------  ------  ------  --------  ------  --------
Sample   N  d13C_VPDB  d18O_VSMOW     D47      SE    95% CL      SD  p_Levene
-------  -  ---------  ----------  ------  ------  --------  ------  --------
ETH-1    4       2.00       37.00  0.2580                    0.0096
ETH-2    4     -10.03       20.18  0.2560                    0.0154
ETH-3    4       1.71       37.45  0.6910                    0.0039
IAEA-C1  4       2.46       36.88  0.3624  0.0061  ± 0.0133  0.0031     0.901
IAEA-C2  4      -8.04       30.18  0.7246  0.0082  ± 0.0178  0.0037     0.825
-------  -  ---------  ----------  ------  ------  --------  ------  --------
258+
```
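As a consistency check on the tables above, the `95% CL` column appears to equal the model SE multiplied by Student's t-factor. The sketch below verifies this for IAEA-C1, using the rounded t-factor from the summary table and the full-precision `SE_D47` value stored in `foo.samples['IAEA-C1']`. The relationship is inferred from the printed numbers and is not a documented formula.

```python
t_factor = 2.18                 # Student's 95% t-factor (12 degrees of freedom), from the summary table
SE_D47 = 0.006107113137661028   # model SE of IAEA-C1, from foo.samples['IAEA-C1']['SE_D47']

CL95 = t_factor * SE_D47
print(f'± {CL95:.4f}')

# output:
# ± 0.0133
```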
259+
260+
### 4.6.1 `D47data.sessions`
261+
262+
Under the hood, the standardization step does many things. It stores session information in `foo.sessions`:
263+
264+
```python
265+
print([k for k in foo.sessions])
266+
# output: ['Session1', 'Session2']
267+
268+
for k in foo.sessions['Session1']:
    if k == 'data':
        print(f"{k:>16}: [...] (too large to print)")
    else:
        print(f"{k:>16}: {foo.sessions['Session1'][k]}")
# output:
#             data: [...] (too large to print)
# scrambling_drift: False
#      slope_drift: False
#         wg_drift: False
#      d13Cwg_VPDB: -3.7555729459832765
#     d18Owg_VSMOW: 25.1145492463934
#               Na: 6
#               Nu: 4
#                a: 0.8381700022050721
#             SE_a: 0.015603758280720111
#                b: 0.0033401762623331823
#             SE_b: 0.00024740622188942793
#                c: -0.8586982120784981
#             SE_c: 0.006737855778815339
#               a2: 0.0
#               b2: 0.0
#               c2: 0.0
#      r_d13C_VPDB: 0.0035270933192414504
#     r_d18O_VSMOW: 0.04146499779427257
#            r_D47: 0.006638319347677248
294+
```
295+
296+
Each element of `foo.sessions` has the following attributes:
297+
298+
+ `data`: list of all the analyses in this session
299+
+ `scrambling_drift`, `slope_drift`, `wg_drift`: whether parameters `a`, `b`, `c` are allowed to drift (change linearly with time)
300+
+ `d13Cwg_VPDB`, `d18Owg_VSMOW`: working gas composition
301+
+ `Na`: number of anchor analyses in this session
302+
+ `Nu`: number of unknown analyses in this session
303+
+ `a`,`SE_a`: best-fit value and model SE of scrambling factor
304+
+ `b`,`SE_b`: best-fit value and model SE of compositional slope
305+
+ `c`,`SE_c`: best-fit value and model SE of working gas offset
306+
+ `a2`, `b2`, `c2`: drift rates (per unit of `TimeTag`) of `a`, `b`, `c`. If `TimeTag` is one of the fields in the raw data, it will be used; otherwise, `TimeTag` starts at 0 for each session and increases by 1 for each analysis, in the listed order (thus beware of datasets ordered by sample name).
307+
+ `r_d13C_VPDB`, `r_d18O_VSMOW`, `r_D47`: repeatabilities for `d13C_VPDB`, `d18O_VSMOW`, `D47` in this session
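
The attributes listed above suggest how raw and standardized values relate. The following is a minimal sketch, assuming that the standardization model is linear, Δ<sub>47</sub><sup>raw</sup> = a·Δ<sub>47</sub> + b·δ<sub>47</sub> + c, and that each parameter drifts linearly with `TimeTag`. This functional form is an assumption consistent with the parameter descriptions, and `standardize_one` is a hypothetical helper, not part of D47crunch. The numbers are those printed earlier for analysis `A01` and for `Session1`.

```python
def standardize_one(D47raw, d47, session, timetag=0.0):
    """Invert the assumed model D47raw = a*D47 + b*d47 + c for a single analysis."""
    # each parameter may drift linearly with TimeTag (drift rates are zero by default)
    a = session['a'] + session['a2'] * timetag
    b = session['b'] + session['b2'] * timetag
    c = session['c'] + session['c2'] * timetag
    return (D47raw - b * d47 - c) / a

session1 = {
    'a': 0.8381700022050721,    # scrambling factor
    'b': 0.0033401762623331823, # compositional slope
    'c': -0.8586982120784981,   # working gas offset
    'a2': 0.0, 'b2': 0.0, 'c2': 0.0,
}

# analysis A01 (ETH-1): raw D47 and working-gas d47 as printed after foo.crunch()
D47 = standardize_one(-0.5746856128030498, 16.89351, session1)
print(f'D47 = {D47:.4f}')
```

Under these assumptions, this single analysis of ETH-1 should land near the nominal ETH-1 value of 0.258, with a residual comparable to the anchor repeatability.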
308+
309+
### 4.6.2 `D47data.samples`, `D47data.anchors`, and `D47data.unknowns`
310+
311+
Additional information about the samples is stored in `foo.samples` (the same information can also be accessed via `foo.anchors` and `foo.unknowns`):
312+
313+
```python
314+
print([k for k in foo.samples])
315+
# output:
316+
# ['ETH-1', 'ETH-2', 'ETH-3', 'IAEA-C1', 'IAEA-C2']
317+
318+
for k in foo.samples['IAEA-C1']:
    if k == 'data':
        print(f"{k:>12}: [...] (too large to print)")
    else:
        print(f"{k:>12}: {foo.samples['IAEA-C1'][k]}")
# output:
#         data: [...] (too large to print)
#            N: 4
#       SD_D47: 0.003120794222015294
#    d13C_VPDB: 2.4606390899379327
#   d18O_VSMOW: 36.87682448377142
#          D47: 0.36241877475632883
#       SE_D47: 0.006107113137661028
#     p_Levene: 0.9011524351870661
332+
```
333+
334+
Each element of `foo.samples` has the following attributes:
335+
336+
+ `N`: total number of analyses of this sample in the whole data set
337+
+ `SD_D47`: the sample SD of Δ<sub>47</sub> for this sample
338+
+ `d13C_VPDB`, `d18O_VSMOW`: average δ<sup>13</sup>C, δ<sup>18</sup>O values for the analyte CO<sub>2</sub>
339+
+ `D47`, `SE_D47`: best-fit value and model SE for the Δ<sub>47</sub> of this sample
340+
+ `p_Levene`: p-value for a [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test) of whether the observed Δ<sub>47</sub> variance for this sample is significantly larger than that for ETH-3 (to change the reference sample to compare with, e.g. to ETH-1: `foo.LEVENE_REF_SAMPLE = 'ETH-1'`).
341+
342+
### 4.6.3 `D47data.repeatability`
343+
344+
The overall analytical repeatabilities are now saved to `foo.repeatability`:
345+
346+
```python
347+
for k in foo.repeatability:
    print(f"{k:>12}: {foo.repeatability[k]}")

# output:
#  r_d13C_VPDB: 0.013821704833171146
# r_d18O_VSMOW: 0.04191487414887982
#       r_D47a: 0.010690471302409636
#       r_D47u: 0.0034370447628642863
#        r_D47: 0.008561367687546161
356+
```
357+
358+
+ `r_d13C_VPDB`: Analytical repeatability of δ<sup>13</sup>C for all samples
359+
+ `r_d18O_VSMOW`: Analytical repeatability of δ<sup>18</sup>O for all samples (CO<sub>2</sub> values)
360+
+ `r_D47a`: Analytical repeatability of Δ<sub>47</sub> for anchor samples only
361+
+ `r_D47u`: Analytical repeatability of Δ<sub>47</sub> for unknown samples only
362+
+ `r_D47`: Analytical repeatability of Δ<sub>47</sub> for all samples.
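
These numbers can be roughly reproduced from the per-sample SD values in the sample table. The sketch below assumes that repeatability is a pooled standard deviation, with each sample's variance weighted by its degrees of freedom. That definition is an assumption, and `pooled_sd` is a hypothetical helper. Because the inputs are the rounded SD values from the table, the results only approximate the stored values.

```python
import math

def pooled_sd(groups):
    """Pooled SD from (N, SD) pairs, weighting each variance by N - 1."""
    weighted = sum((n - 1) * sd ** 2 for n, sd in groups)
    dof = sum(n - 1 for n, sd in groups)
    return math.sqrt(weighted / dof)

# (N, SD) for each sample, as printed in the sample table
anchors = [(4, 0.0096), (4, 0.0154), (4, 0.0039)]   # ETH-1, ETH-2, ETH-3
unknowns = [(4, 0.0031), (4, 0.0037)]               # IAEA-C1, IAEA-C2

print(f'anchors:  {1e3 * pooled_sd(anchors):.1f} ppm')
print(f'unknowns: {1e3 * pooled_sd(unknowns):.1f} ppm')
print(f'all:      {1e3 * pooled_sd(anchors + unknowns):.1f} ppm')
```

The three values come out close to the 10.7, 3.4 and 8.6 ppm reported in the summary table, which supports the pooled-SD interpretation without proving it.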
363+
364+
### 4.6.4 `D47data.result`
365+
366+
By default, `foo.standardize()` uses the [`lmfit.Minimizer.leastsq()`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.Minimizer) method, which returns an instance of [`lmfit.MinimizerResult`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.MinimizerResult). This `MinimizerResult` instance is stored in `foo.result`. A detailed report may be printed using `foo.report()`.
367+
368+
```python
369+
print(type(foo.result))
370+
# output:
371+
# <class 'lmfit.minimizer.MinimizerResult'>
372+
```
373+
