RD-TRC Informatics


Good quality data is the key to everything we do; we are committed to in-depth phenotyping in all research areas, and in some cases where funding permits we conduct whole genome analyses. In every case, though, we are rigorous in ensuring total patient confidentiality and that our patient contact, patient data, clinical data, laboratory data and sample handling systems meet every ethical requirement.

Clinical Data or Phenotyping

This involves the collection and integration of clinical and health data, which may be the results of patient assessments, patient-entered data, laboratory data or elements of the patient record from primary or hospital care. Because specific symptom patterns can indicate a disease’s underlying mechanism, this data helps us identify patients who share a disease and the signs to look for in the analysis.

Clinical data today is of many types and comes from many sources, so integrating it effectively calls for a specialist data management system. We have chosen OpenClinica for this purpose because it is already used on a national basis in Holland with great success. Because it was specifically developed for clinical trials, it already has high levels of security and internal ‘firewalls’, can fully audit any access or changes to the data, and can remove or redact data if a patient’s consent changes. On the technical front, it links easily to other data sources such as NHS or hospital systems and, as the core system for managing the patient’s journey through the BioResource, it can extend and augment the record across different studies or as the disease progresses.

We are also using OpenClinica as the core secure and linked system for TRC records, along with a freely available library of Case Record Forms for data collection. We will make it available for use on an institutional and individual basis. Where OpenClinica is not an investigator’s primary system, we will work with them to enable a two-way exchange between their system and OpenClinica, thereby ensuring the longevity both of the data and of the collection method.

DNA Data or Genotyping


Where we have funding for whole genome analysis, we submit the DNA for sequencing. Thanks to technological advances, the cost per sequence, which was many millions of pounds just a few years ago, is now about £1,000. With public support and patients’ consent, this technology-driven saving enables Genomics England to sequence some 100,000 genomes, creating a lasting legacy that is extremely valuable to individual patients, the NHS and the UK economy.

When you consider the numbers involved, you come to appreciate the scale of this task. The human genome consists of a sequence of four bases (C, A, G and T), of which there are about 3 billion in the 23 chromosome pairs in the nucleus of every cell in our body. If all 3 billion letters were printed in a telephone directory, they would fill 200 volumes of 1,000 pages each. It is a massive amount of data.
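As a rough check on these figures, the short sketch below works through the arithmetic. The variable names are illustrative, and the storage estimate is an assumption added here: with four possible bases, each letter needs two bits, and the figure ignores the quality scores and other metadata that real sequencing files carry.

```python
# Figures taken from the text above.
BASES_IN_GENOME = 3_000_000_000   # about 3 billion letters
VOLUMES = 200                     # telephone-directory volumes
PAGES_PER_VOLUME = 1_000

# How many pages in total, and how many letters each page must hold.
total_pages = VOLUMES * PAGES_PER_VOLUME
letters_per_page = BASES_IN_GENOME // total_pages

# Assumption (not from the text): four bases fit in 2 bits per letter.
BITS_PER_BASE = 2
two_bit_bytes = BASES_IN_GENOME * BITS_PER_BASE // 8

print(f"Total pages:      {total_pages:,}")
print(f"Letters per page: {letters_per_page:,}")
print(f"Bare sequence at 2 bits per base: {two_bit_bytes / 1_000_000:,.0f} MB")
```

On these numbers each page would carry 15,000 letters, and the bare sequence alone would occupy about 750 MB per genome before any supporting data is added.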

Each of those chromosomes contains hundreds to thousands of genes carrying the instructions for making proteins. It is here, when the instructions go wrong, that diseases are caused.

We use bioinformatics to list the 3 billion bases in the sequence and then use this information, unique to each patient, to identify where changes in the DNA sequence can be associated with specific conditions, thus linking the genotype data with the phenotype data. The patterns we see in this data lead to research findings and uncover the mechanisms of disease.
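The idea of linking genotype to phenotype can be illustrated with a toy sketch. Everything in it is hypothetical: the eight-letter reference, the patient records and the function names are invented for illustration and are not our pipeline. It only compares sequences of equal length letter by letter, whereas real analyses must also handle alignment, insertions and deletions, and statistical testing across many patients.

```python
from collections import defaultdict

# Hypothetical toy data: a tiny "reference" and three invented patient records.
REFERENCE = "CAGTCAGT"
patients = {
    "P1": {"sequence": "CAGTCTGT", "phenotype": "affected"},
    "P2": {"sequence": "CAGTCTGT", "phenotype": "affected"},
    "P3": {"sequence": "CAGTCAGT", "phenotype": "unaffected"},
}


def find_substitutions(reference, sample):
    """Return (position, reference_base, sample_base) for each differing letter.

    Positions are 1-based. Only single-letter substitutions between
    equal-length sequences are detected.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length in this sketch")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def variants_by_phenotype(reference, records):
    """Count how often each substitution occurs within each phenotype group."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in records.values():
        for variant in find_substitutions(reference, record["sequence"]):
            counts[variant][record["phenotype"]] += 1
    return {variant: dict(groups) for variant, groups in counts.items()}


summary = variants_by_phenotype(REFERENCE, patients)
for variant, groups in summary.items():
    print(variant, groups)
```

In this invented example the A-to-T change at position 6 appears only in the two affected patients, which is the kind of pattern that would then be examined for a possible link to the condition.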

To handle this enormous computing task, easily comparable with those of astrophysics research, we use the University of Cambridge’s High Performance Computing facilities. We develop efficient algorithms to speed up the processing, use sound statistical techniques to recognise the patterns in the data, and apply acute biological and medical insight to explain these patterns in mechanistic, diagnostic and therapeutic terms.

As with the phenotyping work, data security and managed access are fundamental to our work. Our policy is that ‘data does not move’, and the systems’ built-in data privacy safeguards ensure that we always retain control of access and that no copies can be made and taken away.