Biorepositories—at the bleeding edge
National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-2152, USA.
E-mail: manoliot{at}nhgri.nih.gov
Accepted 10 December 2007
Biorepositories, or biobanks, which collect and store specimens linked to individual information on health characteristics from large numbers of persons, are increasingly being established for medical research.1–6 Extensive international experience with biorepositories has demonstrated their power and efficiency, but few have fully anticipated the challenges involved in collecting, processing, storing and retrieving very large numbers of samples.
This issue of the International Journal of Epidemiology describes the experience of UK Biobank in planning and implementing a protocol for the collection and archiving of 15 million sample aliquots from 500 000 participants over 4 years, requiring the processing of 19 000 sample aliquots per day. The associated biorepository was designed to house the samples securely and provide them for research use for 20 years or more. Owing to its size and complexity, the repository was one of UK Biobank's largest cost components, as well as one of its greatest assets. The project's leadership thus crafted a careful strategy for design, testing and implementation, drawing heavily on standard industrial design principles, that appears unparalleled in epidemiologic research efforts and provides valuable lessons for the epidemiologic community.
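The scale quoted above can be checked with simple arithmetic. The sketch below uses only the four figures given in the text; the derived quantities (aliquots per participant, implied processing days, participants per day) are not stated in the source, and it assumes the 19 000 per day figure is a steady daily rate rather than a peak capacity.

```python
# Figures quoted in the text above.
TOTAL_ALIQUOTS = 15_000_000
PARTICIPANTS = 500_000
COLLECTION_YEARS = 4
ALIQUOTS_PER_DAY = 19_000


def throughput_summary(total_aliquots, participants, years, aliquots_per_day):
    """Derive planning quantities implied by the quoted figures.

    Assumes a constant daily processing rate, which the source does not confirm.
    """
    aliquots_per_participant = total_aliquots / participants
    processing_days_total = total_aliquots / aliquots_per_day
    return {
        "aliquots_per_participant": aliquots_per_participant,
        "processing_days_total": processing_days_total,
        "processing_days_per_year": processing_days_total / years,
        "participants_per_day": aliquots_per_day / aliquots_per_participant,
    }


summary = throughput_summary(
    TOTAL_ALIQUOTS, PARTICIPANTS, COLLECTION_YEARS, ALIQUOTS_PER_DAY
)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the figures imply 30 aliquots per participant and roughly 200 processing days per year, which is at least consistent with a facility operating on working days only.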
Four key principles guided the development of the repository and its sample handling and storage protocol: future proofing and the control of sample quality, security and overall cost.7 Future proofing involved collecting and processing samples to permit the widest possible range of scientific uses, while avoiding approaches that would inherently preclude possible future uses. Though the success of this approach, and the prescience of its designers, will only be proven with time, the care and thought given at its outset are evidenced, for example, by the choices of stabilizing, inhibitory and preservative agents and the justifications provided for each.
A detailed description of the sample collection protocol demonstrates interesting innovations, such as uniquely barcoding each collection tube and linking it to a unique participant number only after collection is completed.7 Bar code scanning also activates automated timers, as, apparently, do initiation of centrifugation and arrival at the central processing facility. Quadruply redundant bar codes reduce the risk of potentially disastrous sample misidentification or de-identification, and split-septum seals reduce contamination and eliminate the need for manually capping tubes.
Unlike many similar, albeit smaller, repositories,8–10 UK Biobank chose a centralized approach for sample processing and aliquoting after deciding that adequate and consistent quality could not be sustained at multiple manual collection sites for the volume and duration needed. Centralization lessened the variability and potential for error in sample processing, and diminished the cost considerably, but raised significant concerns about the impact of delayed processing and transport on analyte stability. Stability and validity of a wide range of analytes were thus assessed in a series of carefully designed validation studies, following varying storage conditions and imposed delays in processing times.11–17
Not surprisingly, these validation studies nearly all demonstrated that delays in processing were associated with reductions in sample quality or validity, but reductions were generally minimal. A notable exception was in isolation of DNA from blood stored for up to 24 h before harvesting of white cells, which appeared to have no impact on DNA yield, length, or purity.12 Success of EBV-transformation of B-lymphocytes was also apparently unaffected by processing delays.17 In this and other analyses showing no detectable differences, estimating the minimum detectable effect for the available sample size would have been useful, and perhaps this information will be forthcoming. 5' RNA tags were another story, however, and though their stability per se appeared unaffected by processing delays, expression levels were considerably higher in delayed samples, potentially due to cellular anoxia or cell death following collection.13 Many of these changes seemed greatest in the first 12 h and point to the need for further research to develop predictive algorithms, if possible, on the effect of delayed processing time and other vicissitudes of sample handling on expression levels. Changes in haematological, clinical chemistry, proteomic and metabolomic parameters after 24 h processing delays were detectable but generally minimal, though here again predictive equations or nomograms generated from these data would be quite useful. One assumes that the recording of processing delays described above will permit some form of adjustment during data analysis, but this is not described and may remain to be defined.
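As an illustration of the minimum detectable effect suggested above, the following sketch uses the standard normal approximation for a two-sided comparison of means between two equal-sized independent groups. The sample size, significance level and power shown are hypothetical and are not taken from the validation studies; a paired design, in which the same samples are measured at different delays, would call for a different formula.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, sd=1.0, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power.

    Normal approximation for two independent groups of equal size.
    With sd=1.0 the result is in standard deviation units.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * sd * sqrt(2.0 / n_per_group)


# Hypothetical example: 20 samples processed promptly, 20 after a delay.
mde_20 = minimum_detectable_effect(20)
print(f"MDE with 20 per group: {mde_20:.2f} SD")
```

With only 20 samples per group, this approximation suggests that differences smaller than about 0.9 standard deviations could easily go undetected, which is why a finding of no difference is hard to interpret without such an estimate.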
The size, complexity and importance of the repository allowed its designers to explore (and ultimately to adopt) a custom-designed, or fit for purpose, automated processing facility built to UK Biobank's specifications.18 Heavy reliance was placed on modern manufacturing principles such as ensuring sufficient processing capacity to allow efficient operation, standardizing processes and quality, emphasizing a production facility environment and culture, and thoroughly testing all new technology off-line before implementation. Experience, both positive and negative, from comparable projects was also factored into the design; perhaps a subsequent description of early experience with this facility will include comparisons with the experience of others.
The innovative use of a Failure Modes Effect Analysis to identify major risks to success of the processing facility and approaches for minimizing them was quite intriguing, but only one example of a key failure risk was provided. Here again, more detail would have been enlightening, particularly on the highest failure risks, but the subsequent description of the prototyping, commissioning and quality management steps was very instructive. The comparisons of initial design proposals to final designs and prototypes,18 even without detailed reasons for the choices made, should be required reading for anyone considering a similar project. Only time will tell if these decisions were the best ones, and they likely are not transferable to all situations, but the expertise and thought that went into them suggest that they are well worth considering.
The supplemental issue closes with three papers describing the design and implementation of an automated blood fractionation system19 and two automated archives.20,21 The need for geographically separate sample archives has been recognized by many epidemiologic studies, though regrettably often not until some catastrophic sample loss has occurred, so implementing it in UK Biobank from the outset is a clear advantage. Less commonly, however, are two archives designed to be quite different, with one as a working archive maintained at –80°C and the other a long-term storage facility maintained at –180°C. Engineering considerations involved in the design of these archives are well described and instructive. Few studies will have the capacity needs of UK Biobank, but some thought might be given to expanding one or both repositories to hold other studies' samples, perhaps on a fee-for-service basis.
Aspects of the sample handling and storage protocols that might be described in future communications include approaches for retrieving samples for analysis, including credentials for access to the repository and other security measures. Approaches for training and certifying staff, and for re-certifying them over time, should also not be overlooked. Supervised routine, as well as unannounced, testing of fail-safe procedures for events such as temperature alarms, mislabelled or damaged samples, or power failures might be undertaken and described after ensuring that testing will not threaten the archives themselves. Data from validation and stability studies could be made widely available through the internet, in more than just summarized manuscript form, for others to use in designing their own sample repositories.
The computer science field coined the term bleeding edge to refer to technology that is so new and unproven that the user incurs considerable risks in adopting it (http://en.wikipedia.org/wiki/Bleeding_edge). Such technologies have been characterized by lack of consensus, lack of knowledge and industry resistance to change—all characteristics of the situation when UK Biobank began to design its protocol and storage facilities. The designers have risen to the challenge admirably, addressing lack of consensus through a series of consultations and meetings, lack of knowledge through a series of elegantly designed pilot studies, and resistance to change through a carefully reasoned and scientifically supported justification as described in this issue of the Journal. With the initiation of the full-scale UK Biobank protocol, experience with these protocols will accrue and will be of great value in determining the appropriateness of the choices made for guiding future research and practice.
References
1 Hakonarson H, Gulcher JR, Stefansson K. deCODE genetics, Inc. Pharmacogenomics (2003) 4:209–15.
2 Kaiser J. Genetics. U.S. hospital launches large biobank of children's DNA. Science (2006) 312:1584–85.
3 LifeGene Sweden. Accessed May 28, 2007. Available at: http://lifegene.ki.se/research/index_en.html.
4 McCarty CA, Wilke RA, Giampietro PF, Wesbrook SD, Caldwell MD. Marshfield clinic personalized medicine research project (PMRP): design, methods and recruitment for a large population-based Biobank. Personalized Med (2005) 2:49–79.
5 Parfitt T. Estonian efficiency. Lancet (2004) 364:1475–78.
6 Triendl R. Japan launches controversial Biobank project. Nat Med (2003) 9:982.
7 Elliott P, Peakman TC. The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int J Epidemiol (2008) 37:234–44.
8 Bild DE, Detrano R, Peterson D, et al. Ethnic differences in coronary calcification: the multi-ethnic study of atherosclerosis (MESA). Circulation (2005) 111:1313–20.
9 Cushman M, Cornell ES, Howard PR, Bovill EG, Tracy RP. Laboratory methods and quality assurance in the cardiovascular health study. Clin Chem (1995) 41:264–70.
10 Papp AC, Hatzakis H, Bracey A, Wu KK. ARIC hemostasis study – I. Development of a blood collection and processing system suitable for multicenter hemostatic studies. Thromb Haemost (1989) 61:15–19.
11 Peakman TC, Elliott P. The UK Biobank sample handling and storage validation studies. Int J Epidemiol (2008) 37(Suppl 1):i2–i6.
12 Halsall A, Ravetto P, Reyes Y, et al. The quality of DNA extracted from liquid or dried blood is not adversely affected by storage at 4°C for up to 24 hours. Int J Epidemiol (2008) 37(Suppl 1):i7–i10.
13 Salway F, Day PJR, Ollier WER, Peakman T. Levels of 5' RNA tags present in plasma and buffy coat from EDTA blood increase with time. Int J Epidemiol (2008) 37(Suppl 1):i11–i15.
14 Jackson C, Best N, Elliott P. UK Biobank Pilot Study: Stability of haematological and clinical chemistry analytes for up to 36 hours; validation study. Int J Epidemiol (2008) 37(Suppl 1):i16–i22.
15 Dunn WB, Broadhurst D, Ellis DI, et al. A GC-TOF-MS study of the stability of serum and urine metabolomes during the UK Biobank sample collection and preparation protocols. Int J Epidemiol (2008) 37(Suppl 1):i23–i30.
16 Barton RH, Nicholson JK, Elliott P, Holmes E. High throughput 1H NMR-based metabolic analysis of human biofluids for large-scale epidemiological studies: validation study. Int J Epidemiol (2008) 37(Suppl 1):i31–i40.
17 Amoli MM, Carthy D, Platt H, Ollier WER. EBV immortalisation of human B lymphocytes separated from small volumes of cryopreserved whole blood. Int J Epidemiol (2008) 37(Suppl 1):i41–i45.
18 Downey P, Peakman TC. Design and implementation of a high throughput biological sample processing facility using modern manufacturing principles. Int J Epidemiol (2008) 37(Suppl 1):i46–i50.
19 McQuillan AC, Sales SD. Designing an automated blood fractionation system. Int J Epidemiol (2008) 37(Suppl 1):i51–i55.
20 Owen JM. Designing and implementing a large scale automated –80°C archive. Int J Epidemiol (2008) 37(Suppl 1):i56–i61.
21 Fagan M, Ball P. Design and implementation of a large-scale liquid nitrogen archive. Int J Epidemiol (2008) 37(Suppl 1):i62–i64.