|
| 1 | +# D47crunch |
| 2 | + |
| 3 | +Python library for processing and standardizing carbonate clumped-isotope analyses, from low-level data out of a dual-inlet mass spectrometer to final, “absolute” Δ<sub>47</sub> values with fully propagated analytical error estimates. |
| 4 | + |
| 5 | +All questions and suggestions are welcome and should be directed to [Mathieu Daëron](mailto:[email protected]?subject=[D47crunch]). |
| 6 | + |
| 7 | +## 1. Requirements |
| 8 | + |
| 9 | +Python 3, [numpy], [lmfit]. We recommend installing the [Anaconda] distribution. |
| 10 | + |
| 11 | +[numpy]: https://numpy.org |
| 12 | +[lmfit]: https://lmfit.github.io |
| 13 | +[Anaconda]: https://www.anaconda.com/distribution |
| 14 | + |
| 15 | +## 2. Installation |
| 16 | + |
| 17 | +This should do the trick: |
| 18 | + |
| 19 | +```bash |
| 20 | +pip install D47crunch |
| 21 | +``` |
| 22 | + |
| 23 | +Alternatively: |
| 24 | + |
| 25 | +1. download [D47crunch-master.zip] |
| 26 | +2. unzip it |
| 27 | +3. rename the resulting directory to `D47crunch` |
| 28 | +4. move the `D47crunch` directory to somewhere in your `PYTHONPATH` or to your current working directory |
| 29 | + |
| 30 | +[D47crunch-master.zip]: https://github.com/mdaeron/D47crunch/archive/master.zip |
| 31 | + |
| 32 | +## 3. Documentation |
| 33 | + |
| 34 | +For the full API documentation, see [https://github.com/mdaeron/D47crunch/docs/index.html]. |
| 35 | + |
| 36 | +For a short tutorial see below. |
| 37 | + |
| 38 | +[https://github.com/mdaeron/D47crunch/docs/index.html]: https://github.com/mdaeron/D47crunch/docs/index.html |
| 39 | + |
| 40 | +## 4. Usage |
| 41 | + |
| 42 | +### 4.1 Import data |
| 43 | + |
| 44 | +Start with some raw data stored as CSV in a file named `rawdata.csv` (spaces after commas are optional). Each line corresponds to a single analysis. |
| 45 | + |
| 46 | +The only required fields are a sample identifier (`Sample`) and the working-gas delta values `d45`, `d46`, `d47`. If no session information is provided, all analyses will be treated as belonging to a single analytical session. Alternatively, to group analyses into sessions, provide session identifiers in a `Session` field. If not specified by the user, a unique identifier (`UID`) will be assigned automatically to each analysis. Independently known oxygen-17 anomalies may be provided as `D17O` (in ‰ relative to VSMOW, with λ equal to `D47data.lambda_17`), and are assumed to be zero otherwise. Working-gas deltas `d48` and `d49` may also be provided, and are otherwise treated as `NaN`. |
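The field rules above can be illustrated with a small standalone sketch. It uses only the standard-library `csv` module and is not the reader implemented by `D47crunch`; the helper name `read_rawdata`, the automatic `UID` scheme, and the default session name are all hypothetical.

```python
import csv
import io

REQUIRED = ('Sample', 'd45', 'd46', 'd47')
NUMERIC = ('d45', 'd46', 'd47', 'd48', 'd49', 'D17O')

def read_rawdata(text):
    """Parse CSV text into a list of dicts, filling in the documented defaults."""
    # skipinitialspace=True makes the spaces after commas optional
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    missing = [f for f in REQUIRED if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f'missing required fields: {missing}')
    records = []
    for i, row in enumerate(reader):
        r = {k: v.strip() for k, v in row.items()}
        for k in NUMERIC:
            if k in r:
                r[k] = float(r[k])
        r.setdefault('UID', str(i + 1))        # hypothetical UID scheme
        r.setdefault('Session', 'mySession')   # hypothetical single-session name
        r.setdefault('D17O', 0.0)              # assumed zero when not provided
        r.setdefault('d48', float('nan'))      # treated as NaN when not provided
        r.setdefault('d49', float('nan'))
        records.append(r)
    return records

example = """Sample, d45, d46, d47
ETH-1, 5.79502, 11.62767, 16.89351
ETH-2, -6.05868, -4.81718, -11.63506
"""
records = read_rawdata(example)
print(f'{len(records)} analyses in {len({r["Session"] for r in records})} session(s)')
```

With only the four required columns present, every record ends up in the same session, with `D17O` set to zero and `d48`, `d49` set to `NaN`.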
| 47 | + |
| 48 | +Example `rawdata.csv` file: |
| 49 | + |
| 50 | +``` |
| 51 | +UID, Session, Sample, d45, d46, d47, d48, d49 |
| 52 | +A01, Session1, ETH-1, 5.79502, 11.62767, 16.89351, 24.56708, 0.79486 |
| 53 | +A02, Session1, IAEA-C1, 6.21907, 11.49107, 17.27749, 24.58270, 1.56318 |
| 54 | +A03, Session1, ETH-2, -6.05868, -4.81718, -11.63506, -10.32578, 0.61352 |
| 55 | +A04, Session1, IAEA-C2, -3.86184, 4.94184, 0.60612, 10.52732, 0.57118 |
| 56 | +A05, Session1, ETH-3, 5.54365, 12.05228, 17.40555, 25.96919, 0.74608 |
| 57 | +A06, Session1, ETH-2, -6.06706, -4.87710, -11.69927, -10.64421, 1.61234 |
| 58 | +A07, Session1, ETH-1, 5.78821, 11.55910, 16.80191, 24.56423, 1.47963 |
| 59 | +A08, Session1, IAEA-C2, -3.87692, 4.86889, 0.52185, 10.40390, 1.07032 |
| 60 | +A09, Session1, ETH-3, 5.53984, 12.01344, 17.36863, 25.77145, 0.53264 |
| 61 | +A10, Session1, IAEA-C1, 6.21905, 11.44785, 17.23428, 24.30975, 1.05702 |
| 62 | +A11, Session2, ETH-1, 5.79958, 11.63130, 16.91766, 25.12232, 1.25904 |
| 63 | +A12, Session2, IAEA-C1, 6.22514, 11.51264, 17.33588, 24.92770, 2.54331 |
| 64 | +A13, Session2, ETH-2, -6.03042, -4.74644, -11.52551, -10.55907, 0.04024 |
| 65 | +A14, Session2, IAEA-C2, -3.83702, 4.99278, 0.67529, 10.73885, 0.70929 |
| 66 | +A15, Session2, ETH-3, 5.53700, 12.04892, 17.42023, 26.21793, 2.16400 |
| 67 | +A16, Session2, ETH-2, -6.06820, -4.84004, -11.68630, -10.72563, 0.04653 |
| 68 | +A17, Session2, ETH-1, 5.78263, 11.57182, 16.83519, 25.09964, 1.26283 |
| 69 | +A18, Session2, IAEA-C2, -3.85355, 4.91943, 0.58463, 10.56221, 0.71245 |
| 70 | +A19, Session2, ETH-3, 5.52227, 12.01174, 17.36841, 26.19829, 1.03740 |
| 71 | +A20, Session2, IAEA-C1, 6.21937, 11.44701, 17.26426, 24.84678, 0.76866 |
| 72 | +``` |
| 73 | + |
| 74 | +First create a `D47data` object named `foo` and import `rawdata.csv`: |
| 75 | + |
| 76 | +```python |
| 77 | +import D47crunch |
| 78 | + |
| 79 | +foo = D47crunch.D47data() |
| 80 | +foo.read('rawdata.csv') |
| 81 | + |
| 82 | +print('foo contains:') |
| 83 | +print(f'{len(foo)} analyses') |
| 84 | +print(f'{len({r["Sample"] for r in foo})} samples') |
| 85 | +print(f'{len({r["Session"] for r in foo})} sessions') |
| 86 | + |
| 87 | +# output: |
| 88 | +# foo contains: |
| 89 | +# 20 analyses |
| 90 | +# 5 samples |
| 91 | +# 2 sessions |
| 92 | +``` |
| 93 | + |
| 94 | +We can inspect the elements of `foo`: |
| 95 | + |
| 96 | +```python |
| 97 | +r = foo[0] |
| 98 | +for k in r: |
| 99 | + print(f'r["{k}"] = {repr(r[k])}') |
| 100 | + |
| 101 | +# output: |
| 102 | +# r["UID"] = 'A01' |
| 103 | +# r["Session"] = 'Session1' |
| 104 | +# r["Sample"] = 'ETH-1' |
| 105 | +# r["d45"] = 5.79502 |
| 106 | +# r["d46"] = 11.62767 |
| 107 | +# r["d47"] = 16.89351 |
| 108 | +# r["d48"] = 24.56708 |
| 109 | +# r["d49"] = 0.79486 |
| 110 | +``` |
| 111 | + |
| 112 | +### 4.2 Working gas composition |
| 113 | + |
| 114 | +There are two ways to define the isotopic composition of the working gas. |
| 115 | + |
| 116 | +#### 4.2.1 Option 1: explicit definition |
| 117 | + |
| 118 | +Directly writing to fields `d13Cwg_VPDB` and `d18Owg_VSMOW`: |
| 119 | + |
| 120 | +```python |
| 121 | +for r in foo: |
| 122 | + if r['Session'] == 'Session1': |
| 123 | + r['d13Cwg_VPDB'] = -3.75 |
| 124 | + r['d18Owg_VSMOW'] = 25.14 |
| 125 | + elif r['Session'] == 'Session2': |
| 126 | + r['d13Cwg_VPDB'] = -3.74 |
| 127 | + r['d18Owg_VSMOW'] = 25.17 |
| 128 | +``` |
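When there are many sessions, the same explicit assignment can be driven by a lookup table instead of an `if`/`elif` chain. The sketch below uses a plain list of dicts as a stand-in for a `D47data` object, so it runs without the library; the two records and the name `WG_COMPOSITION` are illustrative assumptions.

```python
# Stand-in for a D47data object: a plain list of dicts (hypothetical records).
analyses = [
    {'UID': 'A01', 'Session': 'Session1', 'Sample': 'ETH-1'},
    {'UID': 'A11', 'Session': 'Session2', 'Sample': 'ETH-1'},
]

# One entry per session, using the working-gas values from the example above.
WG_COMPOSITION = {
    'Session1': {'d13Cwg_VPDB': -3.75, 'd18Owg_VSMOW': 25.14},
    'Session2': {'d13Cwg_VPDB': -3.74, 'd18Owg_VSMOW': 25.17},
}

for r in analyses:
    # A KeyError here flags a session with no working-gas composition defined.
    r.update(WG_COMPOSITION[r['Session']])

print(analyses[1]['d18Owg_VSMOW'])
```

Compared with the `if`/`elif` form, a session that was forgotten raises an error instead of being silently left without a working-gas composition.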
| 129 | + |
| 130 | +#### 4.2.2 Option 2: based on the known composition of a sample |
| 131 | + |
| 132 | +```python |
| 133 | +# The 2 code lines below are the default settings. It is thus not |
| 134 | +# necessary to include them unless you wish to use different values. |
| 135 | + |
| 136 | +foo.SAMPLE_CONSTRAINING_WG_COMPOSITION = ('ETH-3', 1.71, -1.78) |
| 137 | +foo.ALPHA_18O_ACID_REACTION = 1.00813 # (Kim et al., 2007), calcite at 90 °C |
| 138 | + |
| 139 | +# Compute the WG composition for each session: |
| 140 | +foo.wg() |
| 141 | + |
| 142 | +``` |
| 143 | + |
| 144 | +### 4.3 Crunch the data |
| 145 | + |
| 146 | +Now compute δ<sup>13</sup>C, δ<sup>18</sup>O, and raw Δ<sub>47</sub>, Δ<sub>48</sub>, Δ<sub>49</sub> values. Note that δ<sup>18</sup>O here is the composition of the analyte CO<sub>2</sub>, not of the carbonate; the user is responsible for applying any acid fractionation correction. |
| 147 | + |
| 148 | +```python |
| 149 | +foo.crunch() |
| 150 | + |
| 151 | +r = foo[0] |
| 152 | +for k in r: |
| 153 | + print(f'r["{k}"] = {r[k]}') |
| 154 | + |
| 155 | +# output: |
| 156 | +# r["UID"] = A01 |
| 157 | +# r["Session"] = Session1 |
| 158 | +# r["Sample"] = ETH-1 |
| 159 | +# r["d45"] = 5.79502 |
| 160 | +# r["d46"] = 11.62767 |
| 161 | +# r["d47"] = 16.89351 |
| 162 | +# r["d48"] = 24.56708 |
| 163 | +# r["d49"] = 0.79486 |
| 164 | +# r["d13Cwg_VPDB"] = -3.7555729459832765 |
| 165 | +# r["d18Owg_VSMOW"] = 25.1145492463934 |
| 166 | +# r["D17O"] = 0.0 |
| 167 | +# r["d13C_VPDB"] = 1.9948594073404546 |
| 168 | +# r["d18O_VSMOW"] = 37.03357105550355 |
| 169 | +# r["D47raw"] = -0.5746856128030498 |
| 170 | +# r["D48raw"] = 1.1496833191546596 |
| 171 | +# r["D49raw"] = -27.690248970251407 |
| 172 | +``` |
| 173 | + |
| 174 | +### 4.4 Oxygen-17 correction parameters |
| 175 | + |
| 176 | +Note that this crunching step uses the IUPAC oxygen-17 correction parameters, as recommended by [Daëron et al. (2016)](https://dx.doi.org/10.1016/j.chemgeo.2016.08.014) and [Schauer et al. (2016)](https://dx.doi.org/10.1002/rcm.7743): |
| 177 | + |
| 178 | +```python |
| 179 | +R13_VPDB = 0.01118 # (Chang & Li, 1990) |
| 180 | +R18_VSMOW = 0.0020052 # (Baertschi, 1976) |
| 181 | +lambda_17 = 0.528 # (Barkan & Luz, 2005) |
| 182 | +R17_VSMOW = 0.00038475 # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB) |
| 183 | +R18_VPDB = R18_VSMOW * 1.03092 |
| 184 | +R17_VPDB = R17_VSMOW * 1.03092 ** lambda_17 |
| 185 | +``` |
| 186 | + |
| 187 | +To use different numerical values for these parameters, change them before performing `foo.crunch()`: |
| 188 | + |
| 189 | +```python |
| 190 | +# to change the lambda value to 0.5164, |
| 191 | +# leaving the other parameters unchanged: |
| 192 | +foo.lambda_17 = 0.5164 |
| 193 | +``` |
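To get a feel for how much the choice of λ matters, the two derived VPDB ratios can be evaluated by hand. This is standalone arithmetic based on the definitions listed above, not a demonstration of how the library propagates a changed `lambda_17`; the function name `vpdb_ratios` is hypothetical.

```python
R18_VSMOW = 0.0020052    # (Baertschi, 1976)
R17_VSMOW = 0.00038475   # (Assonov & Brenninkmeijer, 2003, rescaled to R13_VPDB)

def vpdb_ratios(lambda_17):
    """Return (R17_VPDB, R18_VPDB) from the definitions given above."""
    R18_VPDB = R18_VSMOW * 1.03092
    R17_VPDB = R17_VSMOW * 1.03092 ** lambda_17
    return R17_VPDB, R18_VPDB

for lam in (0.528, 0.5164):
    R17, R18 = vpdb_ratios(lam)
    print(f'lambda_17 = {lam}: R17_VPDB = {R17:.6e}, R18_VPDB = {R18:.6e}')
```

Only `R17_VPDB` depends on λ; `R18_VPDB` is the same in both cases.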
| 194 | + |
| 195 | +### 4.5 Reference frame |
| 196 | + |
| 197 | +The nominal Δ<sub>47</sub> values assigned to the anchor samples are defined in `foo.Nominal_D47`, which may be redefined arbitrarily: |
| 198 | + |
| 199 | +```python |
| 200 | +print(foo.Nominal_D47) # default values from Bernasconi et al. (2018) |
| 201 | +# output: |
| 202 | +# {'ETH-1': 0.258, 'ETH-2': 0.256, 'ETH-3': 0.691} |
| 203 | + |
| 204 | +foo.Nominal_D47 = { |
| 205 | + "Foo-1": 0.232, |
| 206 | + "Foo-2": 0.289, |
| 207 | + "Foo-3": 0.455, |
| 208 | + "Foo-4": 0.704, |
| 209 | + } |
| 210 | + |
| 211 | +print(foo.Nominal_D47) |
| 212 | +# output: |
| 213 | +# {'Foo-1': 0.232, 'Foo-2': 0.289, 'Foo-3': 0.455, 'Foo-4': 0.704} |
| 214 | +``` |
| 215 | + |
| 216 | +### 4.6 Standardization: default method (`pooled`) |
| 219 | + |
| 220 | +The default standardization approach computes the best-fit standardization parameters (a,b,c) for each session, along with the best-fit Δ<sub>47</sub> values of unknown samples, using a pooled regression model taking into account the relative mapping of all samples (anchors and unknowns) in (δ<sub>47</sub>, Δ<sub>47</sub>) space. |
| 221 | + |
| 222 | +```python |
| 223 | +foo.standardize() |
| 224 | +``` |
| 225 | + |
| 226 | +The following text is output: |
| 227 | + |
| 228 | +``` |
| 229 | +-------------------------------- ----------- |
| 230 | +N samples (anchors + unknowns) 5 (3 + 2) |
| 231 | +N analyses (anchors + unknowns) 20 (12 + 8) |
| 232 | +Repeatability of δ13C_VPDB 13.8 ppm |
| 233 | +Repeatability of δ18O_VSMOW 41.9 ppm |
| 234 | +Repeatability of Δ47 (anchors) 10.7 ppm |
| 235 | +Repeatability of Δ47 (unknowns) 3.4 ppm |
| 236 | +Repeatability of Δ47 (all) 8.6 ppm |
| 237 | +Model degrees of freedom 12 |
| 238 | +Student's 95% t-factor 2.18 |
| 239 | +-------------------------------- ----------- |
| 240 | + |
| 241 | +-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- -------------- |
| 242 | +Session Na Nu d13Cwg_VPDB d18Owg_VSMOW r_d13C r_d18O r_D47 a ± SE 1e3 x b ± SE c ± SE |
| 243 | +-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- -------------- |
| 244 | +Session1 6 4 -3.756 25.115 0.0035 0.0415 0.0066 0.838 ± 0.016 3.340 ± 0.247 -0.859 ± 0.007 |
| 245 | +Session2 6 4 -3.743 25.117 0.0174 0.0490 0.0119 0.815 ± 0.015 4.601 ± 0.246 -0.847 ± 0.007 |
| 246 | +-------- -- -- ----------- ------------ ------ ------ ------ ------------- ------------- -------------- |
| 247 | + |
| 248 | + |
| 249 | +------- - --------- ---------- ------ ------ -------- ------ -------- |
| 250 | +Sample N d13C_VPDB d18O_VSMOW D47 SE 95% CL SD p_Levene |
| 251 | +------- - --------- ---------- ------ ------ -------- ------ -------- |
| 252 | +ETH-1 4 2.00 37.00 0.2580 0.0096 |
| 253 | +ETH-2 4 -10.03 20.18 0.2560 0.0154 |
| 254 | +ETH-3 4 1.71 37.45 0.6910 0.0039 |
| 255 | +IAEA-C1 4 2.46 36.88 0.3624 0.0061 ± 0.0133 0.0031 0.901 |
| 256 | +IAEA-C2 4 -8.04 30.18 0.7246 0.0082 ± 0.0178 0.0037 0.825 |
| 257 | +------- - --------- ---------- ------ ------ -------- ------ -------- |
| 258 | +``` |
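The `95% CL` column in the sample table appears to equal the model SE multiplied by the Student's 95% t-factor from the summary table. The check below tests that reading for IAEA-C1, using only numbers printed in this README (the t-factor of 2.18 and the full-precision `SE_D47` of IAEA-C1). It is an arithmetic sketch, not the library's own code.

```python
t95 = 2.18                            # Student's 95% t-factor, from the summary table
SE_D47 = 0.006107113137661028         # model SE of IAEA-C1, as stored in foo.samples

CL95 = t95 * SE_D47                   # assumed definition of the 95% confidence limit
print(f'IAEA-C1: SE = {SE_D47:.4f}, 95% CL = ± {CL95:.4f}')
# output:
# IAEA-C1: SE = 0.0061, 95% CL = ± 0.0133
```

The result agrees with the `± 0.0133` reported for IAEA-C1 in the table.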
| 259 | + |
| 260 | +#### 4.6.1 `D47data.sessions` |
| 261 | + |
| 262 | +Under the hood, the standardization step does many things. It stores session information in `foo.sessions`: |
| 263 | + |
| 264 | +```python |
| 265 | +print([k for k in foo.sessions]) |
| 266 | +# output: ['Session1', 'Session2'] |
| 267 | + |
| 268 | +for k in foo.sessions['Session1']: |
| 269 | + if k == 'data': |
| 270 | + print(f"{k:>16}: [...] (too large to print)") |
| 271 | + else: |
| 272 | + print(f"{k:>16}: {foo.sessions['Session1'][k]}") |
| 273 | +# output: |
| 274 | +# data: [...] (too large to print) |
| 275 | +# scrambling_drift: False |
| 276 | +# slope_drift: False |
| 277 | +# wg_drift: False |
| 278 | +# d13Cwg_VPDB: -3.7555729459832765 |
| 279 | +# d18Owg_VSMOW: 25.1145492463934 |
| 280 | +# Na: 6 |
| 281 | +# Nu: 4 |
| 282 | +# a: 0.8381700022050721 |
| 283 | +# SE_a: 0.015603758280720111 |
| 284 | +# b: 0.0033401762623331823 |
| 285 | +# SE_b: 0.00024740622188942793 |
| 286 | +# c: -0.8586982120784981 |
| 287 | +# SE_c: 0.006737855778815339 |
| 288 | +# a2: 0.0 |
| 289 | +# b2: 0.0 |
| 290 | +# c2: 0.0 |
| 291 | +# r_d13C_VPDB: 0.0035270933192414504 |
| 292 | +# r_d18O_VSMOW: 0.04146499779427257 |
| 293 | +# r_D47: 0.006638319347677248 |
| 294 | +``` |
| 295 | + |
| 296 | +Each element of `foo.sessions` has the following attributes: |
| 297 | + |
| 298 | ++ `data`: list of all the analyses in this session |
| 299 | ++ `scrambling_drift`, `slope_drift`, `wg_drift`: whether parameters `a`, `b`, `c` are allowed to drift (change linearly with time) |
| 300 | ++ `d13Cwg_VPDB`, `d18Owg_VSMOW`: working gas composition |
| 301 | ++ `Na`: number of anchor analyses in this session |
| 302 | ++ `Nu`: number of unknown analyses in this session |
| 303 | ++ `a`,`SE_a`: best-fit value and model SE of scrambling factor |
| 304 | ++ `b`,`SE_b`: best-fit value and model SE of compositional slope |
| 305 | ++ `c`,`SE_c`: best-fit value and model SE of working gas offset |
| 306 | ++ `a2`, `b2`, `c2`: drift rates (per unit of `TimeTag`) of `a`, `b`, `c`. If `TimeTag` is one of the fields in the raw data, it is used as is; otherwise `TimeTag` starts at 0 for each session and increases by 1 for each analysis, in the listed order (so beware of datasets ordered by sample name). |
| 307 | ++ `r_d13C_VPDB`, `r_d18O_VSMOW`, `r_D47`: repeatabilities for `d13C_VPDB`, `d18O_VSMOW`, `D47` in this session |
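To make the roles of `a`, `b`, `c` concrete, the sketch below converts one raw Δ<sub>47</sub> value to the absolute scale. It assumes the linear model `D47raw = a * D47 + b * d47 + c`, which this README does not spell out, and it assumes that drift enters as `a + a2 * TimeTag` (likewise for `b` and `c`). The function name `absolute_D47` is hypothetical; the numbers are copied from the outputs above for analysis A01 in Session1.

```python
# Session1 parameters, copied from the foo.sessions output above.
session = {
    'a': 0.8381700022050721, 'b': 0.0033401762623331823, 'c': -0.8586982120784981,
    'a2': 0.0, 'b2': 0.0, 'c2': 0.0,
}

# Analysis A01 (ETH-1); TimeTag 0 because it is the first analysis of the session.
r = {'d47': 16.89351, 'D47raw': -0.5746856128030498, 'TimeTag': 0}

def absolute_D47(r, s):
    """Invert the assumed model D47raw = a * D47 + b * d47 + c for D47."""
    t = r['TimeTag']
    a = s['a'] + s['a2'] * t   # scrambling factor, with optional linear drift
    b = s['b'] + s['b2'] * t   # compositional slope
    c = s['c'] + s['c2'] * t   # working gas offset
    return (r['D47raw'] - b * r['d47'] - c) / a

D47 = absolute_D47(r, session)
print(f'A01 (ETH-1): D47 = {D47:.4f}')
```

Under these assumptions the result lands close to the nominal ETH-1 value of 0.258, with a residual comparable to the anchor repeatability reported above.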
| 308 | + |
| 309 | +#### 4.6.2 `D47data.samples`, `D47data.anchors`, and `D47data.unknowns` |
| 310 | + |
| 311 | +Additional information about the samples is stored in `foo.samples` (the same information can also be accessed via `foo.anchors` and `foo.unknowns`): |
| 312 | + |
| 313 | +```python |
| 314 | +print([k for k in foo.samples]) |
| 315 | +# output: |
| 316 | +# ['ETH-1', 'ETH-2', 'ETH-3', 'IAEA-C1', 'IAEA-C2'] |
| 317 | + |
| 318 | +for k in foo.samples['IAEA-C1']: |
| 319 | + if k == 'data': |
| 320 | + print(f"{k:>12}: [...] (too large to print)") |
| 321 | + else: |
| 322 | + print(f"{k:>12}: {foo.samples['IAEA-C1'][k]}") |
| 323 | +# output: |
| 324 | +# data: [...] (too large to print) |
| 325 | +# N: 4 |
| 326 | +# SD_D47: 0.003120794222015294 |
| 327 | +# d13C_VPDB: 2.4606390899379327 |
| 328 | +# d18O_VSMOW: 36.87682448377142 |
| 329 | +# D47: 0.36241877475632883 |
| 330 | +# SE_D47: 0.006107113137661028 |
| 331 | +# p_Levene: 0.9011524351870661 |
| 332 | +``` |
| 333 | + |
| 334 | +Each element of `foo.samples` has the following attributes: |
| 335 | + |
| 336 | ++ `N`: total number of analyses of this sample in the whole data set |
| 337 | ++ `SD_D47`: the sample SD of Δ<sub>47</sub> for this sample |
| 338 | ++ `d13C_VPDB`, `d18O_VSMOW`: average δ<sup>13</sup>C, δ<sup>18</sup>O values for the analyte CO<sub>2</sub>. |
| 339 | ++ `D47`, `SE_D47`: best-fit value and model SE for the Δ<sub>47</sub> of this sample |
| 340 | ++ `p_Levene`: p-value for a [Levene's test](https://en.wikipedia.org/wiki/Levene%27s_test) of whether the observed Δ<sub>47</sub> variance for this sample is significantly larger than that for ETH-3 (to change the reference sample to compare with, e.g. to ETH-1: `foo.LEVENE_REF_SAMPLE = 'ETH-1'`). |
| 341 | + |
| 342 | +#### 4.6.3 `D47data.repeatability` |
| 343 | + |
| 344 | +The overall analytical repeatabilities are now saved to `foo.repeatability`: |
| 345 | + |
| 346 | +```python |
| 347 | +for k in foo.repeatability: |
| 348 | + print(f"{k:>12}: {foo.repeatability[k]}") |
| 349 | + |
| 350 | +# output: |
| 351 | +# r_d13C_VPDB: 0.013821704833171146 |
| 352 | +# r_d18O_VSMOW: 0.04191487414887982 |
| 353 | +# r_D47a: 0.010690471302409636 |
| 354 | +# r_D47u: 0.0034370447628642863 |
| 355 | +# r_D47: 0.008561367687546161 |
| 356 | +``` |
| 357 | + |
| 358 | ++ `r_d13C_VPDB`: Analytical repeatability of δ<sup>13</sup>C for all samples |
| 359 | ++ `r_d18O_VSMOW`: Analytical repeatability of δ<sup>18</sup>O for all samples (CO<sub>2</sub> values) |
| 360 | ++ `r_D47a`: Analytical repeatability of Δ<sub>47</sub> for anchor samples only |
| 361 | ++ `r_D47u`: Analytical repeatability of Δ<sub>47</sub> for unknown samples only |
| 362 | ++ `r_D47`: Analytical repeatability of Δ<sub>47</sub> for all samples. |
| 363 | + |
| 364 | +#### 4.6.4 `D47data.result` |
| 365 | + |
| 366 | +By default `foo.standardize()` uses the [`lmfit.Minimizer.leastsq()`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.Minimizer) method, which returns an instance of [`lmfit.MinimizerResult`](https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.MinimizerResult). This `MinimizerResult` instance is stored in `foo.result`. A detailed report may be printed using `foo.report()`. |
| 367 | + |
| 368 | +```python |
| 369 | +print(type(foo.result)) |
| 370 | +# output: |
| 371 | +# <class 'lmfit.minimizer.MinimizerResult'> |
| 372 | +``` |
| 373 | + |