Medicines & Healthcare products Regulatory Agency (MHRA) laboratories at South Mimms handle, prepare, and distribute more than 90% of WHO International Standards, which serve as primary calibrants to standardize the measurement of complex biologicals and diagnostics, including those characterized by Next Generation Sequencing. Scientists conduct numerous international collaborative research studies annually to determine the suitability of candidate biological reference materials and/or analytical methods. Each project involves sharing materials and data among various laboratories worldwide and within the UK.
Currently, no off-the-shelf platform is available to manage collaborative research studies efficiently while securely handling metadata and sharing experimental data. The involvement of multiple research labs worldwide makes managing and sharing metadata and experimental data a challenging task, especially in -omics projects, where researchers generate extensive amounts of complex metadata and single or multimodal experimental data. The absence of an integrated collaborative platform and coherent policies makes data sharing and management a weak point in collaborative studies, leading to significant delays, increased workload for scientists, and difficulty in disseminating data effectively. Additionally, current data management and sharing practices do not always adhere strictly to the FAIR (Findable, Accessible, Interoperable, and Reusable) data management principles. We are aware of the recent nationwide initiative to develop and implement BioFAIR data principles for Life Sciences, and we intend to adapt and integrate these principles into the proposed platform.
We propose developing a software platform designed to meet this scientific need. This platform will manage collaborative research studies and provide data management and sharing capabilities aligned with strict (Bio)FAIR data management principles. It will enable the capture and recording of comprehensive metadata for each study and participant. The platform will also provide a simple-to-use interface, built using user-centric design principles, that allows the study lead to add participants, define metadata profiles for the study, monitor the progress and quality of the data, and share the data securely and efficiently. Study participants will use the platform to fill in the metadata profile as they progress through and complete the study. This will allow all relevant details about the study, however minor, to be captured. The platform will be adaptable to different computing environments and storage options. The technical teams at MHRA will maintain and continuously improve the platform in an agile manner, keeping its functionality, usability, security, and other best practices up to date in line with Government Digital Service guidelines.
We believe that capturing comprehensive information, particularly in large collaborative studies, will facilitate the creation of a complete metadata profile. This, in turn, will allow advanced utilization through ML/AI approaches to better understand intra-study variations and identify sources of bias. To further enhance data quality, we will implement an algorithm to evaluate the completeness of each participant's metadata profile and assess data quality, providing a data quality index/indicator based on (Bio)FAIR data principles.
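As an illustration only, the completeness part of such an indicator could be sketched as the fraction of required metadata fields that a participant has filled in. The proposal does not specify the algorithm; the function names, field names, and the equal weighting of fields below are all assumptions, and a real (Bio)FAIR-based index would need to cover more than completeness.

```python
from typing import Any, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness_index(
    profile: Mapping[str, Any], required_fields: Sequence[str]
) -> float:
    """Fraction of required fields that carry a value (0.0 to 1.0)."""
    if not required_fields:
        # Nothing is required, so the profile is trivially complete.
        return 1.0
    filled = sum(1 for field in required_fields if is_filled(profile.get(field)))
    return filled / len(required_fields)


# Hypothetical required fields defined by a study lead.
REQUIRED = ["lab_id", "instrument", "protocol_version", "sample_batch"]

# Hypothetical participant profile: one field is blank, one is absent.
participant = {
    "lab_id": "LAB-007",
    "instrument": "",
    "protocol_version": "1.2",
}

score = completeness_index(participant, REQUIRED)
print(f"Completeness: {score:.0%}")
```

In this sketch every required field counts equally; a weighted variant, in which the study lead marks some fields as more important, would be a natural extension.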
To ensure the platform meets the needs of a broader user community, we will conduct surveys to gather feedback from researchers who have conducted or participated in collaborative research in the UK and internationally. These surveys will help build an active metadata catalogue and address previous blockers and bottlenecks experienced by the research community. We anticipate that this platform will aid the research community in the UK and worldwide, enabling them to effectively manage and store metadata and research data in collaborative research studies.
Data related to this project: This project idea was conceived with life sciences projects in mind, so the data types will come from standard molecular and biochemical methods, proteomics, metabolomics, imaging, next generation sequencing, etc.
Vinoy Ramachandran, Sumitra Varma