SHARPEN Project

SHARing data to accelerate Pharmaceutical manufacturing Efficiency across trusted Networks: a framework for risk assessing the value of Federated Learning to improve the fidelity of models in pharmaceutical manufacturing.
Eligibility: United Kingdom
Max Sop

Data Science and AI Manager

About the project

This project is a collaboration between CPI, the University of Strathclyde, CCDC, and Wyoming, with Good Digital Practice (GDP) providing the technical implementation for risk management. We are researching the sector’s current use of Federated Learning (FL), identifying the barriers to its adoption, and building demonstrators that showcase the opportunities FL offers.

The project is funded by Innovate UK via the Future Medicines programme.

Innovate UK, part of UK Research and Innovation, is the UK’s innovation agency. Innovate UK works to create a better future by inspiring, involving, and investing in businesses developing life-changing innovations.

The current challenge

The pharmaceutical industry excels at using molecular-level chemistry models to help discover new treatments for diseases. However, it lags behind other industries in applying modelling to manufacturing processes. This shortfall is driven by a lack of high-quality data, which limits the accuracy of predictive models. More data generally leads to better predictions, but generating this data within a single organisation is expensive and inefficient.

Sharing data across organisations is a better approach. In practice, however, the complexities of data sharing often hold back pre-competitive collaboration, owing to cybersecurity concerns, commercial risks, and uncertainty over how the data will be used.

The role of Federated Learning

Federated data sharing technologies can address some of these concerns. They allow organisations to retain control over who accesses their data and why, while maintaining strong cybersecurity protections. However, concerns over commercial risks remain, as sharing complete datasets could expose valuable intellectual property.

A potential solution is to share only selected segments of data rather than entire datasets. While this could improve modelling outcomes without revealing sensitive information, there is little evidence to date that redacted datasets improve predictions. This project aims to provide evidence that redacted datasets can be used successfully for this purpose. Businesses also face risks because there are few practical tools for determining how much data is ‘too much’ to share. We will therefore also demonstrate an approach for managing these risks safely.

SHARPEN’s objectives

The SHARPEN project intends to deliver:

  • A platform for secure, federated data sharing that covers the full lifecycle from R&D to manufacturing.

  • A risk assessment tool to enable rapid evaluation and responsible data sharing.

Purpose of our research

We have developed an FL demonstrator to show that the approach is effective, fast, and safe. Its primary benefit is improved fidelity of predictive models in pharmaceutical research: it provides evidence that models trained on redacted datasets can achieve almost identical fidelity to models trained on open datasets.

To focus on the areas of greatest need, the team has produced a risk tool for research teams. The tool is built around a questionnaire on the perceived risks of FL. It then presents mitigation strategies, workflows, scripts, demonstrators, and case studies that clarify how significant these risks are and how they can be avoided or minimised.

For more information

Max Sop

Data Science and AI Manager

Start Date: 01 Aug 2024
End Date: 01 Feb 2026