dials/gen_imp_cuda


Instructions for Running the Code

Dependencies

  • python (3.11 or above)
  • numpy/scipy/matplotlib
  • dask (used in some plotting functions only)
  • h5py
  • cupy (most important)
  • scikit-optimize (used in refinement for Bayesian optimisation)

Integration Code

To run the integration code, you need to have:

  • integrated.expt or models_with_profiles.expt
  • refined.refl

Make sure sigma_b and sigma_m values are present in the experiment file.

The refined.refl file is used to generate predicted reflections with updated flags.

dials.python sim_integrate.py integrated.expt refined.refl [options]

This will run and create three output files:

  1. simulated.h5 : This file contains the data from the simulated spots, notably:

    • sim_bbox: simulated bbox of the reflection.
    • sim_data: flattened simulated profile.
    • sim_mask: flattened mask values as integers.

    There are other datasets as well; you can view them using any HDF5 viewer.
  2. simulated.refl : The HDF5 file converted to a reflection file. This can be analysed further for symmetry analysis and scaling.

  3. simulated.log : The run log.
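
The contents of simulated.h5 can also be inspected programmatically. A minimal sketch using h5py; the dataset names follow the list above, but the exact group layout inside the file is an assumption:

```python
import h5py

def list_simulated_datasets(path):
    """Return {dataset_name: (shape, dtype_str)} for every dataset in the file."""
    out = {}
    with h5py.File(path, "r") as f:
        def visitor(name, obj):
            # visititems walks groups recursively; keep only datasets
            if isinstance(obj, h5py.Dataset):
                out[name] = (obj.shape, str(obj.dtype))
        f.visititems(visitor)
    return out
```

For example, `list_simulated_datasets("simulated.h5")` would report entries such as sim_bbox, sim_data and sim_mask along with their shapes.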

To view all options, type:

dials.python sim_integrate.py --help

Among the options:

  • --sigma_b and --sigma_m: If you want to test specific values, or you obtained values from running sim_refine.py, you can provide them here. Otherwise, the values are taken from the experiment file.
  • --block_size: The number of reflections in each block. Dividing the full set of predicted reflections into blocks helps in managing RAM during shoebox extraction. The default is 50_000; you can increase it if you like. Blocks are executed sequentially.
  • --chunk_size: Further division of each block into chunks of this size. The chunks are executed concurrently, and all the reflections in a chunk are processed in parallel on the GPU. The default is 32; decrease or increase it according to GPU memory capacity.
  • --num_samples: The number of samples for the Monte Carlo simulation. The default is 100_000; increase or decrease it to balance memory and accuracy.
  • --bkg_key: Choice of background estimation algorithm. The default is "linear", a 2D regression model. The others are "constant" (mean) and "robust" (median).

The other options are related to the experimental setup. If you know them, you can provide them.
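
To illustrate the three --bkg_key choices, here is a hedged sketch of what each estimator might look like. It only mirrors the one-line descriptions above (a 2D plane fit, a mean, and a median over background pixels); the actual implementation in the code may differ:

```python
import numpy as np

def estimate_background(pixels, x, y, method="linear"):
    """Estimate the background from background-region pixel values.

    pixels : 1-D array of background pixel intensities
    x, y   : 1-D arrays of the pixel coordinates (same length as pixels)
    Returns a scalar (constant/robust) or plane coefficients (linear).
    """
    if method == "constant":          # plain mean
        return float(np.mean(pixels))
    if method == "robust":            # median, resistant to outliers
        return float(np.median(pixels))
    if method == "linear":            # least-squares plane a*x + b*y + c
        A = np.column_stack([x, y, np.ones_like(x, dtype=float)])
        coeffs, *_ = np.linalg.lstsq(A, pixels, rcond=None)
        return coeffs                 # (a, b, c)
    raise ValueError(f"unknown method: {method}")
```

The "linear" model can follow a sloping background across the shoebox, whereas "constant" and "robust" assume it is flat.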

Observing Images

To see the overlaid simulated (red scale) and observed (gray scale) spot profiles along with mask values:

dials.python plot_hkl_profile.py simulated.h5 h k l
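
The overlay itself is conceptually simple. A minimal matplotlib sketch, assuming you have already extracted the observed and simulated 2-D profiles for one reflection (the extraction from simulated.h5 is not shown):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

def normalise(img):
    """Scale an image to [0, 1] for display (flat images map to zeros)."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img)

def overlay(observed, simulated, out="overlay.png"):
    """Observed profile in gray scale with the simulated profile in red on top."""
    fig, ax = plt.subplots()
    ax.imshow(normalise(observed), cmap="gray")
    ax.imshow(normalise(simulated), cmap="Reds", alpha=0.5)
    fig.savefig(out)
    plt.close(fig)
```

The semi-transparent red layer lets agreement (dark red over bright gray) and mismatch stand out at a glance.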

Meaning of the mask values:

  • 2: Invalid
  • 3: Valid background
  • 5: Valid foreground
  • 19: Valid background used

They are consistent with the values used in the DIALS shoebox mask.
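
These values are bitwise combinations of the DIALS shoebox MaskCode flags (Valid = 1, Background = 2, Foreground = 4, BackgroundUsed = 16), so they can be decoded with a few lines of Python:

```python
# Bit flags matching the DIALS shoebox MaskCode convention.
MASK_FLAGS = {
    1: "Valid",
    2: "Background",
    4: "Foreground",
    16: "BackgroundUsed",
}

def decode_mask(value):
    """Return the set of flag names encoded in a shoebox mask value."""
    return {name for bit, name in MASK_FLAGS.items() if value & bit}
```

For instance, 19 = 16 + 2 + 1 decodes to a valid background pixel that was actually used in the background fit, while 2 lacks the Valid bit and is therefore treated as invalid.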

Intensity Analysis Plots

To view I/sigma_I plot:

dials.python intensity_plots.py simulated.h5

It plots the results of summation integration only. In the scatter plot, you can click on any data point to view the overlaid spot profiles, as described above.
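
For reference, the plotted quantity is the summed intensity divided by its estimated error. A minimal sketch of the computation; the column names intensity.sum.value and intensity.sum.variance follow the usual DIALS convention, which is an assumption here:

```python
import numpy as np

def i_over_sigma(intensity, variance):
    """I/sigma(I) for summation-integrated reflections.

    Entries with non-positive variance are returned as NaN rather than
    dividing by zero.
    """
    intensity = np.asarray(intensity, dtype=float)
    variance = np.asarray(variance, dtype=float)
    out = np.full_like(intensity, np.nan)
    good = variance > 0
    out[good] = intensity[good] / np.sqrt(variance[good])
    return out
```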

Compare Integration

To compare dials integration with simulated integration results, use

dials.python compare_dials_sim.py integrated.refl simulated.refl --key <key>

It matches the reflections in the two files by hkle and produces a log-log scatter plot. The plot also shows a linear regression line between the two sets of values, along with the regression parameters.

Note

  • This program is not limited to integration keys; it can be used to compare any key in the reflection table.
  • Other reflection files can also be compared, e.g. scaled.refl.
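
The matching-and-regression step can be sketched as follows. This is a simplified illustration that matches on plain hkl tuples; the actual script matches by hkle and works on DIALS reflection tables rather than bare arrays:

```python
import numpy as np

def compare_values(keys_a, vals_a, keys_b, vals_b):
    """Match two tables by key, then fit a line in log-log space.

    Returns (slope, intercept) of log10(b) = slope * log10(a) + intercept
    over the matched, strictly positive pairs.
    """
    lookup = dict(zip(keys_b, vals_b))
    a, b = [], []
    for k, v in zip(keys_a, vals_a):
        # keep only keys present in both tables, with positive values
        if k in lookup and v > 0 and lookup[k] > 0:
            a.append(v)
            b.append(lookup[k])
    slope, intercept = np.polyfit(np.log10(a), np.log10(b), 1)
    return slope, intercept
```

A slope near 1 with intercept near 0 indicates the two integrations agree over the full intensity range.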

Refinement Code

To run the refinement code, you need to have:

  • refined.expt
  • refined.refl

dials.python sim_refine.py refined.expt refined.refl [options]

The refinement will:

  • Remove unindexed reflections.
  • Select reflections with intensity.sum.value between 0.5*intensity_mean and 2*intensity_mean. This is the region where reflections are most affected by the model parameters.

Note: At present the limits are hard-coded; they can only be modified inside the sim_refine.py file.
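
The selection step can be sketched as follows; the 0.5x and 2x factors mirror the hard-coded limits mentioned above:

```python
import numpy as np

def select_mid_intensity(intensities, lo=0.5, hi=2.0):
    """Boolean mask selecting reflections whose summed intensity lies
    between lo*mean and hi*mean of the input intensities."""
    intensities = np.asarray(intensities, dtype=float)
    mean = intensities.mean()
    return (intensities >= lo * mean) & (intensities <= hi * mean)
```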

Refinement Parameters

  • mosaicity (in degrees)
  • z_coord : Think of it as the position of a pinhole on the z-axis from which the x-rays originate in a diverging fashion. It helps determine the beam divergence along the x- and y-axes.

Note:

  • The divergence along the x-axis is converted to degrees and used as the sigma_b parameter.
  • The limits of the parameters are hard-coded in src/processing/bayesOpt_refinement/bayesOpt_refine(). If needed, you can change the limits there.

This will run and create two output files:

  1. sim_refine.log : At the end of the file, the refined parameters corresponding to the lowest loss are shown. It shows the optimised

    • sigma_b
    • sigma_m
    • z_coord : Think of it as the position of a pinhole on the z-axis from which the x-rays originate.
    • x_divergence and y_divergence in radians: each is calculated by dividing the corresponding FWHM by z_coord.
  2. BOpt_result.gz : This file stores the result of the scikit-optimize gp_minimize() optimisation. It can be used later to analyse the optimised Gaussian process surrogate model.
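
The divergence relation described above can be made concrete in a few lines; this is a small-angle sketch of the stated rule (FWHM divided by z_coord, with the x divergence converted to degrees for sigma_b), and the exact conventions in the code may differ:

```python
import math

def divergence(fwhm, z_coord):
    """Beam divergence in radians: the angle subtended by the FWHM
    at a pinhole sitting z_coord away (small-angle approximation)."""
    return fwhm / z_coord

def sigma_b_degrees(fwhm_x, z_coord):
    """x-axis divergence converted to degrees, used as sigma_b."""
    return math.degrees(divergence(fwhm_x, z_coord))
```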

To view all options, type:

dials.python sim_refine.py --help

Among the options:

  • --num_spots: The number of reflections to be used for refinement. The default is 10_000; you can increase or decrease it.
  • --chunk_size: The selected reflections are divided into chunks of this size. The chunks are executed concurrently, and all the reflections in a chunk are processed in parallel on the GPU. The default is 32; decrease or increase it according to GPU memory capacity.
  • --num_samples: The number of samples for the Monte Carlo simulation. The default is 100_000; increase or decrease it to balance memory and accuracy.
  • --loss_key: Key selecting the loss function. The default is "l2"; other choices are "l1", "exp" and "gaussian". I would recommend "l2" or "exp"; "gaussian" is not recommended.

The other options are related to the experimental setup. If you know them, you can provide them.
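
For orientation, the standard "l1" and "l2" losses can be sketched as below. The "exp" and "gaussian" variants are project-specific and their exact forms are not reproduced here:

```python
import numpy as np

def loss(observed, simulated, key="l2"):
    """Per-reflection loss between observed and simulated profiles.

    Only "l1" and "l2" are sketched; the "exp" and "gaussian" variants
    used by sim_refine.py are defined in its source code.
    """
    diff = np.asarray(observed, dtype=float) - np.asarray(simulated, dtype=float)
    if key == "l2":
        return float(np.sum(diff ** 2))   # sum of squared residuals
    if key == "l1":
        return float(np.sum(np.abs(diff)))  # sum of absolute residuals
    raise ValueError(f"unsupported loss key: {key}")
```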

Optimised Parameters Analysis

To analyse the optimisation, run

dials.python analyse_Bopt.py BOpt_result.gz

It provides:

  • The convergence plot of the skopt.gp_minimize() optimisation.
  • A comparison of loss vs. refinement parameters across the minimisation iterations.
  • Bootstrap results around 95% of the minimum loss value, providing:
    1. Confidence interval
    2. Standard error
    3. Mean

Note: The 95% is hard-coded. It is up to the user to change the percentage inside the code.
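
One plausible reading of this bootstrap is sketched below: keep the optimiser samples whose loss lies within a small fraction of the minimum, then bootstrap the mean of the corresponding parameter values. The 5% window, the percentile confidence interval, and the assumption of non-negative losses are all choices made for this illustration, not necessarily those of analyse_Bopt.py:

```python
import numpy as np

def bootstrap_summary(params, losses, frac=0.05, n_boot=2000, ci=95, seed=0):
    """Bootstrap statistics of a refined parameter near the loss minimum.

    Keeps samples whose (non-negative) loss is within `frac` of the minimum
    loss, then bootstraps the mean of the corresponding parameter values.
    Returns (mean, std_error, (ci_lo, ci_hi)).
    """
    params = np.asarray(params, dtype=float)
    losses = np.asarray(losses, dtype=float)
    keep = params[losses <= losses.min() * (1 + frac)]
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        rng.choice(keep, size=keep.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    half = (100 - ci) / 2
    lo, hi = np.percentile(boot_means, [half, 100 - half])
    return keep.mean(), boot_means.std(ddof=1), (lo, hi)
```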

More detailed and statistically rigorous analyses may be added in the future.
