By: David Shifrin, PhD
Science Writer, Filament Life Science Communications

These days, there is a significant push in parts of the genetics community to share testing results. One of the organizations spearheading this effort is Free the Data, which focuses on getting patients to contribute their BRCA1/2 sequences to an anonymized database for use in research.

Along those lines, a series of special reports recently came out in The New England Journal of Medicine dealing with one of the primary reasons data sharing has become a hot topic. Together, they cover many aspects of one major question: How do we deal with the gap between the promise of genetic testing and the current reality?

Why is there a gap?

The gap exists largely because the clinical significance of so many variants is still unknown. Or, to look at it from a slightly different perspective, there are relatively few variants that are clearly pathogenic, while many others may or may not be. The goal, as Dr. Elizabeth Phimister points out in “Curating the Way to Better Determinants of Genetic Risk,” is to eventually understand the clinical relevance of every variant in each disease-associated gene, across different populations. Unfortunately, while the sequence data for many (most?) variants is available, it is still often unclear, or wrongly recorded, whether a variant is pathogenic or benign. Additionally, Phimister notes that individual sequence data is often divorced from context, i.e., family history and the like, which adds another layer of difficulty to clarifying the relevance of any given variant.

Another article, titled “ClinGen – The Clinical Genome Resource,” summarizes the problem nicely by pointing out that genetic testing has uncovered over 80 million variants. Again, most of these are of unknown clinical significance. How far, then, are we from really being able to implement fully personalized medicine? Data is being accumulated incredibly rapidly. But aggregating all of it across databases and platforms, and then interpreting the combined dataset, is made difficult not only by the sheer volume but also by the isolated nature of many databases.

Finally, even where a variant has been interpreted, that interpretation may be different between testing labs or databases. Once again, the silo effect is likely causing significant problems in understanding the relevance and consequences of any given genetic variant.
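To make the discordance problem concrete, here is a minimal sketch of how conflicting interpretations could be flagged once submissions from different labs sit in one place. The variant names, lab labels, and the `find_discordant` helper are all hypothetical and invented for illustration; this is not how any particular database implements it.

```python
from collections import defaultdict

# Hypothetical submissions: (variant, submitting lab, interpretation).
# Variant names and lab labels are invented for illustration only.
submissions = [
    ("variant_A", "lab_1", "pathogenic"),
    ("variant_A", "lab_2", "pathogenic"),
    ("variant_B", "lab_1", "benign"),
    ("variant_B", "lab_3", "uncertain significance"),
    ("variant_C", "lab_2", "uncertain significance"),
]


def find_discordant(records):
    """Return variants whose submitters disagree, with the sorted list of calls made."""
    calls = defaultdict(set)
    for variant, _lab, interpretation in records:
        calls[variant].add(interpretation)
    # A variant is discordant when more than one distinct interpretation exists.
    return {variant: sorted(seen) for variant, seen in calls.items() if len(seen) > 1}


discordant = find_discordant(submissions)
print(discordant)
```

A check like this is only possible when the submissions are pooled. While each lab's calls stay in a separate silo, the disagreement over the second variant would go unnoticed.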

Is curation the answer?

For the above reasons, “curation” has become a primary goal for many geneticists. If the first step in understanding how genetic mutations relate to disease state is having all the information surrounding each mutation, then simply getting that data into a database is fundamental. ClinVar, an NCBI-run database, is making strides in this effort. Importantly, several testing companies deposit their results into the repository.

In “ClinGen – The Clinical Genome Resource,” the authors point out that curation and collaboration – in short, data sharing – were vital for completion of the human reference genome. Additionally, data sharing allowed for increased reproducibility and validation, leading to “more than 300 complex traits [being] identified and reported in more than 2000 articles.”

As an umbrella program of sorts, ClinGen has been funded by the NIH to build a network of “community resources” that will allow for better data sharing, interpretation, and eventually, clinical practice. The database serves as a nexus for both depositing information and moving it between more targeted repositories. In practice, this means that anyone – clinician, company, or patient – can deposit information. Additionally, virtually any piece of information can be fed into databases that may include more detailed annotation of the variant, so the clinical implications of that variant can more easily be determined. Then, the resulting analysis can be fed back into the larger repository, where everyone benefits from it.

Improving interpretation of genetic information

One of the big problems in the field is that individual publications often get the interpretation of a variant wrong. This is an unfortunate reality of research, where one study or the work of one group is not enough to validate a major finding. Coming at it from different angles is often necessary. In basic research this isn’t as much of a problem. Science goes from experiment to result to interpretation, then refinement of the initial idea occurs and the cycle repeats. When dealing with human health and individual patients, though, the stakes are much higher and more immediate. So, to get proper validation everyone has to have access to the same data.

Once the data is all in one place, it has to be scored. ClinVar does this using a 4-star rating system, which has just undergone a slight revision. The highest rating is given to information that “is endorsed by published practice guidelines,” while no stars are given to information where no claim to clinical relevance is made or no “documented method” is included. This system is valuable for aggregating and ranking data, but it still leaves the question of what should be done with the 0-star entries. It’s great to know whether something has confirmed clinical significance, but if it doesn’t, well, then what? In “Curating the Way to Better Determinants of Genetic Risk,” Phimister begins with the example of her family, where a variant of unknown significance in the gene MRE11A was found but BRCA1 and BRCA2 were normal. Why include the MRE11A result, she asks, if there isn’t much evidence to support its role in breast cancer risk?

Part of the answer comes later in Phimister’s piece. If tests are going to be carried out where the results may not be clinically applicable, perhaps patients should at least be informed of this. Additionally, they should perhaps be given the opportunity to consent to having “all relevant data (including ancestry, age, and family history)” added to a database for use in future studies. Thus, through data sharing, patients could directly participate in the effort to clarify and validate clinical relevance, which is precisely the goal of efforts like Free the Data.

Tue. August 4, 2015