- python (3.11 or above)
- numpy/scipy/matplotlib
- dask (used in some plotting functions only)
- h5py
- cupy (most important; needed for the GPU computations)
- scikit-optimize (used in refinement for Bayesian optimisation)
To run the integration code, you need to have:
- integrated.expt or models_with_profiles.expt
- refined.refl
Make sure sigma_b and sigma_m values are present in the experiment file.
The refined.refl file is used to generate predicted reflections with updated flags.
```
dials.python sim_integrate.py integrated.expt refined.refl [options]
```
This will run and create three output files:
- `simulated.h5`: This file contains the data from the simulated spots. Notably:
  - `sim_bbox`: the simulated bounding box of each reflection
  - `sim_data`: the flattened simulated profile
  - `sim_mask`: the flattened mask values as integers

  These and other datasets can be viewed with any HDF5 viewer (see the sketch after this list).
- `simulated.refl`: The HDF5 file converted to a reflection file. This can be analysed further for symmetry analysis and scaling.
- `simulated.log`
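If you prefer to inspect `simulated.h5` programmatically rather than with an HDF5 viewer, a minimal h5py sketch is shown below. The dataset names follow the description above; the exact layout inside the file may differ, so treat this as a starting point.

```python
import h5py

# Open the simulated output and list its contents.
# Dataset names (sim_bbox, sim_data, sim_mask) follow the description above;
# adjust the paths if the file groups them differently.
with h5py.File("simulated.h5", "r") as f:
    f.visit(print)  # print every group/dataset path in the file

    if "sim_bbox" in f:
        bbox = f["sim_bbox"][...]   # simulated bounding boxes
        data = f["sim_data"][...]   # flattened simulated profiles
        mask = f["sim_mask"][...]   # flattened mask values (integers)
        print("bbox shape:", bbox.shape)
        print("profile shape:", data.shape, "mask shape:", mask.shape)
```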
To view all options, type:
```
dials.python sim_integrate.py --help
```
Among the options:
- `--sigma_b` and `--sigma_m`: If you want to test specific values, or you obtain values from running `sim_refine.py`, you can provide them here. Otherwise, the values are taken from the experiment file.
- `--block_size`: The number of reflections in each block. Dividing the full set of predicted reflections into blocks helps manage RAM during shoebox extraction. The default is 50_000; you can increase it if you like. Blocks are executed sequentially.
- `--chunk_size`: Each block is further divided into chunks of this size. Chunks are executed concurrently, and all reflections in a chunk are processed in parallel on the GPU. The default is 32; decrease or increase it according to GPU memory capacity.
- `--num_samples`: The number of samples for the Monte Carlo simulation. The default is 100_000; increase or decrease it to balance memory and accuracy.
- `--bkg_key`: Choice of background estimation algorithm. The default is "linear", a 2D regression model; the others are "constant" (mean) and "robust" (median). A background-estimation sketch follows below.
The other options are related to the experimental setup. If you know them, you can provide them.
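As an illustration of the three `--bkg_key` choices, the sketch below estimates the background of one 2D shoebox slice from its background pixels. The function name and the plane-fit formulation are illustrative assumptions, not the code used by sim_integrate.py.

```python
import numpy as np

def estimate_background(pixels, bg_mask, method="linear"):
    """Estimate the background of a 2D shoebox slice from its background pixels.

    pixels  : 2D array of counts
    bg_mask : boolean array, True where a pixel is usable background
    method  : "constant" (mean), "robust" (median) or "linear" (2D plane fit)
    """
    ys, xs = np.nonzero(bg_mask)
    values = pixels[bg_mask]

    if method == "constant":
        return np.full_like(pixels, values.mean(), dtype=float)
    if method == "robust":
        return np.full_like(pixels, np.median(values), dtype=float)

    # "linear": least-squares fit of a plane a*x + b*y + c to the background pixels
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, values, rcond=None)
    yy, xx = np.mgrid[0:pixels.shape[0], 0:pixels.shape[1]]
    return coeffs[0] * xx + coeffs[1] * yy + coeffs[2]

# Tiny usage example with a synthetic 5x5 slice
rng = np.random.default_rng(0)
slice_ = rng.poisson(5, size=(5, 5)).astype(float)
mask = np.ones((5, 5), dtype=bool)
mask[2, 2] = False  # pretend the central pixel is foreground
print(estimate_background(slice_, mask, "linear").round(2))
```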
To view the overlaid simulated (red scale) and observed (gray scale) spot profiles along with the mask values:
```
dials.python plot_hkl_profile.py simulated.h5 h k l
```
Meaning of the mask values:
- 2: invalid
- 3: valid background
- 5: valid foreground
- 19: valid background used
They are consistent with those used in DIALS shoeboxes.
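The values above are bitwise combinations of DIALS-style shoebox mask flags. The bit values used in the sketch below (Valid = 1, Background = 2, Foreground = 4, BackgroundUsed = 16) are my reading of the standard DIALS MaskCode convention; the decoding is only for orientation.

```python
# Decode sim_mask integers into DIALS-style shoebox mask flags.
# Bit values assumed here follow the standard DIALS MaskCode convention.
FLAGS = {1: "Valid", 2: "Background", 4: "Foreground", 16: "BackgroundUsed"}

def decode_mask(value: int) -> list[str]:
    return [name for bit, name in FLAGS.items() if value & bit]

for v in (2, 3, 5, 19):
    print(v, decode_mask(v))
# 2  -> ['Background']                      (no Valid bit, i.e. invalid)
# 3  -> ['Valid', 'Background']
# 5  -> ['Valid', 'Foreground']
# 19 -> ['Valid', 'Background', 'BackgroundUsed']
```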
To view the I/sigma_I plot:
```
dials.python intensity_plots.py simulated.h5
```
It plots the results of summation integration only. In the scatter plot, you can click on any data point to view the overlaid spot profiles as above.
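If you want to reproduce the I/sigma_I values outside the plotting script, a minimal sketch follows. It assumes simulated.refl carries the standard DIALS summation columns (`intensity.sum.value` and `intensity.sum.variance`); adjust the column names if they differ.

```python
import numpy as np
from dials.array_family import flex  # requires a DIALS environment (dials.python)

refl = flex.reflection_table.from_file("simulated.refl")

# Summation-integration intensities and variances (standard DIALS column names).
intensity = refl["intensity.sum.value"].as_numpy_array()
variance = refl["intensity.sum.variance"].as_numpy_array()

valid = variance > 0
i_over_sigma = intensity[valid] / np.sqrt(variance[valid])
print("mean I/sigma_I:", i_over_sigma.mean())
```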
To compare the DIALS integration results with the simulated integration results, use:
```
dials.python compare_dials_sim.py integrated.refl simulated.refl --key <key>
```
It will match the reflections in the two files by hkl and give a log-log scatter plot. The plot also shows a linear regression line between the two, together with the regression parameters (a matching sketch follows the note below).
Note:
- This program can be used to compare any key in the reflection table, not only the integration keys.
- Other reflection files can also be compared, e.g. scaled.refl.
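A minimal sketch of the kind of comparison compare_dials_sim.py performs is shown below; the matching and regression details here are illustrative, not the script's actual implementation.

```python
import numpy as np
from dials.array_family import flex  # run with dials.python

key = "intensity.sum.value"  # any common column can be compared
dials_refl = flex.reflection_table.from_file("integrated.refl")
sim_refl = flex.reflection_table.from_file("simulated.refl")

def to_dict(table):
    # Map miller index -> value; duplicate hkls are simply overwritten here,
    # which the real script may handle differently.
    return {hkl: v for hkl, v in zip(table["miller_index"], table[key])}

d1, d2 = to_dict(dials_refl), to_dict(sim_refl)
common = [h for h in d1 if h in d2 and d1[h] > 0 and d2[h] > 0]

x = np.log10([d1[h] for h in common])
y = np.log10([d2[h] for h in common])
slope, intercept = np.polyfit(x, y, 1)
print(f"{len(common)} matched reflections, log-log fit: "
      f"slope={slope:.3f}, intercept={intercept:.3f}")
```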
To run the refinement code, you need to have:
- refined.expt
- refined.refl
```
dials.python sim_refine.py refined.expt refined.refl [options]
```
The refinement will:
- Remove unindexed reflections.
- Select reflections with `intensity.sum.value` ranging between `0.5*intensity_mean` and `2*intensity_mean`. This is the region where reflections are most affected by the model parameters.
  Note: at present these limits are hard-coded and can only be modified inside the sim_refine.py file.
- Refine the following parameters:
  - mosaicity (in degrees)
  - z_coord: think of it as the position along the z-axis of the pinhole from which the X-rays originate in a diverging fashion. It helps determine the beam divergence along the x- and y-axes.
Note:
- The divergence along the x-axis is converted to degrees and used as the `sigma_b` parameter.
- The limits of the parameters are hard-coded in `src/processing/bayesOpt_refinement/bayesOpt_refine()`. You can change the limits there if you wish.
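For orientation, the snippet below shows how parameter limits are typically expressed for scikit-optimize's `gp_minimize`. The bounds, parameter names and the toy objective are placeholders, not the values hard-coded in `bayesOpt_refine()`.

```python
from skopt import gp_minimize
from skopt.space import Real

# Illustrative search space: the real limits live in
# src/processing/bayesOpt_refinement/bayesOpt_refine() and will differ.
space = [
    Real(0.01, 0.5, name="mosaicity_deg"),
    Real(10.0, 500.0, name="z_coord"),
]

def objective(params):
    mosaicity, z_coord = params
    # Placeholder loss; the real objective simulates profiles on the GPU
    # and compares them with the observed shoeboxes.
    return (mosaicity - 0.1) ** 2 + (z_coord - 150.0) ** 2 / 1e4

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best parameters:", result.x, "lowest loss:", result.fun)
```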
This will run and create two output files:
- `sim_refine.log`: At the end of the file, the refined parameters corresponding to the lowest loss are shown. It reports the optimised:
  - `sigma_b`
  - `sigma_m`
  - z_coord: think of it as the position along the z-axis of the pinhole from which the X-rays originate.
  - x_divergence and y_divergence in radians: calculated by dividing the corresponding FWHM by z_coord (see the sketch after this list).
- `BOpt_result.gz`: This file stores the result of the `skopt.gp_minimize()` optimisation from scikit-optimize. It can be used later to analyse the optimised Gaussian process surrogate model.
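The relationships stated above (divergence = FWHM / z_coord, and the x divergence converted to degrees giving sigma_b) can be checked with a few lines. The numbers below are made up, purely to show the arithmetic.

```python
import math

# Made-up values, for illustration only.
z_coord = 200.0   # position of the pinhole along z (same length units as the FWHM)
fwhm_x = 0.35     # FWHM of the beam in x
fwhm_y = 0.30     # FWHM of the beam in y

x_divergence = fwhm_x / z_coord   # radians, as reported in sim_refine.log
y_divergence = fwhm_y / z_coord   # radians

sigma_b = math.degrees(x_divergence)  # x divergence converted to degrees
print(f"x_divergence={x_divergence:.5f} rad, "
      f"y_divergence={y_divergence:.5f} rad, sigma_b={sigma_b:.4f} deg")
```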
To view all options, type:
```
dials.python sim_refine.py --help
```
Among the options:
- `--num_spots`: The number of reflections to be used for refinement. The default is 10_000; you can increase or decrease it.
- `--chunk_size`: The selected reflections are divided into chunks of this size. Chunks are executed concurrently, and all reflections in a chunk are processed in parallel on the GPU. The default is 32; decrease or increase it according to GPU memory capacity (see the chunking sketch below).
- `--num_samples`: The number of samples for the Monte Carlo simulation. The default is 100_000; increase or decrease it to balance memory and accuracy.
- `--loss_key`: Key to choose the loss function. The default is "l2". Other choices are "l1", "exp" and "gaussian"; "l2" or "exp" are recommended, "gaussian" is not.
The other options are related to the experimental setup. If you know them, you can provide them.
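As a schematic illustration of how `--block_size` and `--chunk_size` divide the work (blocks run sequentially, while every reflection in a chunk is handled together on the GPU), here is a small sketch. The helper name is invented for the example and the real code dispatches the chunks concurrently on the GPU rather than looping over them.

```python
import numpy as np

def process_in_blocks(reflection_ids, block_size=50_000, chunk_size=32):
    """Schematic work division: blocks run sequentially, chunks within a block
    are the units handed to the GPU (here just counted)."""
    n = len(reflection_ids)
    for start in range(0, n, block_size):              # blocks: sequential
        block = reflection_ids[start:start + block_size]
        chunks = np.array_split(block, max(1, len(block) // chunk_size))
        for chunk in chunks:                           # real code: concurrent
            # all reflections in `chunk` would be processed in parallel on the GPU
            pass
        print(f"block {start // block_size}: "
              f"{len(block)} reflections in {len(chunks)} chunks")

process_in_blocks(np.arange(120_000))
```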
To analyse the optimisation, run:
```
dials.python analyse_Bopt.py BOpt_result.gz
```
It provides:
- The convergence plot of the `skopt.gp_minimize()` optimisation.
- The comparison of loss vs the refinement parameters over the minimisation iterations.
- Bootstrap results around 95% of the minimum loss value, providing:
  - Confidence interval
  - Standard error
  - Mean
Note: The 95% is hard-coded. It is up to the user to change the percentage inside the code.
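If you want to poke at BOpt_result.gz yourself, the sketch below loads it with scikit-optimize and reproduces the two simplest pieces of the analysis: the convergence plot and a bootstrap of the near-minimum losses. The way the 95% threshold and the bootstrap are interpreted here is illustrative, not necessarily what analyse_Bopt.py does internally.

```python
import numpy as np
import matplotlib.pyplot as plt
from skopt import load
from skopt.plots import plot_convergence

result = load("BOpt_result.gz")      # skopt OptimizeResult
plot_convergence(result)
plt.savefig("convergence.png")

# Bootstrap of the loss values close to the minimum.
# "Close" is taken here as within 5% of the lowest loss (assuming positive
# losses), one reading of the hard-coded 95% threshold described above.
losses = np.asarray(result.func_vals)
near_min = losses[losses <= result.fun * 1.05]

rng = np.random.default_rng(0)
boot_means = [rng.choice(near_min, size=near_min.size, replace=True).mean()
              for _ in range(2000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={np.mean(boot_means):.4f}  std error={np.std(boot_means):.4f}  "
      f"95% CI=({ci_low:.4f}, {ci_high:.4f})")
```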
More detailed and statistically rigorous analyses can be added in future.