PRISMA harms checklist: improving harms reporting in systematic reviews

Authors: Liliane Zorzela, Yoon K. Loke, John P. A. Ioannidis, Su Golder, Pasqualina Santaguida, Douglas G. Altman, David Moher, Sunita Vohra, PRISMA harms group
Journal: BMJ
Year: 2016
DOI: 10.1136/bmj.i157
Citations: 549

TL;DR

Systematic reviews of medical interventions almost never report harms adequately—only 5% of reviews focus on adverse events—so a team of 25 international experts created a 4-item checklist extension to PRISMA that forces authors to explicitly define, search for, and report harms with the same rigour as benefits.

What they tested

This is not an experiment testing an intervention on human subjects. It is a **reporting guideline development study** that created a checklist to improve how systematic reviews report harms (adverse events, side effects, toxicity, complications). The "intervention" is the PRISMA harms checklist itself: a set of four mandatory reporting items plus recommendations on how to apply the 27 existing PRISMA items to harms.

There is no comparator in the experimental sense; the reference point is the original PRISMA statement (2009), which focused almost exclusively on efficacy/benefits. Instead of outcome measures, the four new checklist items ask:

Whether systematic reviews explicitly mention "harms" in their title

Whether they define each harm and how it was ascertained

Whether they specify how zero-event studies were handled

Whether they describe any assessment of possible causality

Who was studied

No human subjects were enrolled in an experiment. Instead, the study used:

**324 invited Delphi survey participants** (experts in systematic reviews, adverse events research, methodology, statistics, epidemiology, clinical medicine, journal editing, consumer advocacy, and health regulation). Of these, 112 contributed to at least one of three Delphi rounds; 56 completed more than one round.

**25 in-person consensus meeting participants** from 7 countries (Canada, UK, USA, Netherlands, Australia, Germany, Italy), including a consumer representative and a member of Health Canada (federal drug regulator).

**Background data from prior systematic reviews**: 296 DARE reviews and 13 Cochrane reviews published 2008–2011 that had adverse events as a primary outcome, plus longitudinal data from 1994–2010 showing that only ~5% of all systematic reviews focus on harms.

How they measured it

The study used a **modified Delphi process**—a structured consensus-building method where experts vote anonymously on items across multiple rounds, with results fed back between rounds to converge on agreement.

**Round 1**: 37 potential new checklist items were presented. Experts rated relevance on an unspecified scale. One item was excluded; 28 were voted relevant; 8 received scattered votes.

**Round 2**: Refined items based on Round 1 feedback.

**Round 3**: Final voting on remaining items.

**In-person meeting**: 2-day consensus conference in Banff, Canada (May 2012) where the 25 experts reviewed Delphi results and prior systematic review data to finalise the checklist.

The "measurement" was expert consensus—not a quantitative outcome like blood pressure or symptom score. The key metric was: did ≥80% of experts agree an item was essential for harms reporting?
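As a rough illustration of the ≥80% rule, the sketch below tallies votes per item and flags which ones clear the threshold. It assumes a simple binary "essential" vote per expert, since the paper's rating scale is not specified here; the class name, item names, and vote counts are all invented.

```python
from dataclasses import dataclass

# Agreement level quoted above. Assumption: each expert casts a binary
# "essential / not essential" vote (the actual rating scale is unspecified).
CONSENSUS_THRESHOLD = 0.80


@dataclass
class ItemVotes:
    name: str       # hypothetical checklist item label
    essential: int  # experts who rated the item essential
    total: int      # experts who voted on the item

    @property
    def agreement(self) -> float:
        """Share of voters rating the item essential (0.0 if nobody voted)."""
        return self.essential / self.total if self.total else 0.0


def reached_consensus(item: ItemVotes, threshold: float = CONSENSUS_THRESHOLD) -> bool:
    """True if the share of 'essential' votes meets or exceeds the threshold."""
    return item.agreement >= threshold


if __name__ == "__main__":
    # Invented vote counts for two hypothetical items.
    votes = [
        ItemVotes("Hypothetical item A", essential=50, total=56),
        ItemVotes("Hypothetical item B", essential=30, total=56),
    ]
    for item in votes:
        verdict = "keep" if reached_consensus(item) else "carry to next round"
        print(f"{item.name}: {item.agreement:.0%} agreement -> {verdict}")
```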

Methodology

### Study design

This is a **reporting guideline development study** following the EQUATOR Network framework—the same methodology used to create the original PRISMA statement, PRISMA for abstracts, and PRISMA for protocols. It is not a randomised trial, cohort study, or meta-analysis of patient data. It is a **consensus-building exercise** using a modified Delphi technique plus an in-person consensus meeting.

### Why this design matters

The Delphi method is specifically designed for situations where:

1. There is no single gold-standard evidence source (e.g., no RCT comparing "good harms reporting" vs "bad harms reporting")

2. Expert opinion must be synthesised systematically

3. Group dynamics (dominant personalities, status hierarchies) could bias face-to-face decisions

The modified Delphi used anonymous electronic voting across three rounds, which reduces the risk that a loud voice or senior author dominates. The in-person meeting then allowed discussion of ambiguous items. This two-stage process is the standard for creating clinical reporting guidelines (CONSORT, STROBE, PRISMA all used similar methods).

### What this design can and cannot prove

**Can prove:**

That a group of international experts agreed these 4 items are essential for harms reporting

That prior systematic reviews of harms had systematic reporting deficiencies (lack of definitions, no causality assessment, no handling of zero events)

That the checklist has face validity (experts believe it would improve reporting)

**Cannot prove:**

That using the checklist actually improves the quality of systematic reviews (no randomised trial of checklist vs no checklist was done)

That better harms reporting leads to better clinical decisions (no patient outcomes were measured)

That the checklist is complete or optimal (it represents a minimum set, not an exhaustive list)

That the checklist works across all settings (the experts were mostly from high-income countries; no low-income country representation was reported)

### Major methodological weaknesses

1. **No validation study**: The checklist was developed but never tested prospectively to see if it actually changes author behaviour or improves review quality.

2. **Selection bias in Delphi participants**: Experts were "selected on the basis of their expertise in systematic reviews"—this likely over-represents people who already care about harms reporting, potentially inflating agreement.

3. **High dropout**: Only 56 of 324 invited participants (17%) completed more than one Delphi round. Those who stayed may be more motivated/opinionated than those who dropped out.

4. **No pre-registered protocol**: The paper does not mention pre-registering the Delphi process or analysis plan, which is now standard for systematic reviews.

5. **Funding and conflicts of interest not reported**: The paper does not state who funded the consensus meeting or whether any Delphi participants had conflicts of interest (e.g., pharmaceutical company employees who might resist harms reporting).

Key findings

### Primary finding: Four mandatory checklist items

1. **Title (Item 1)**: "Specifically mention 'harms' or other related terms, or the harm of interest in the review."

- Rationale: Without "harms" in the title, readers (and database searchers) cannot distinguish a review that assessed harms from one that simply ignored them.

2. **Synthesis of results in the Methods section (Item 14)**: "Specify how zero events were handled, if relevant."

- Rationale: Many harms are rare. If no events occurred in a study, authors must state whether they excluded that study, treated it as "zero," or used continuity corrections. Different methods produce different meta-analytic results.

3. **Study characteristics (Item 18)**: "Define each harm addressed, how it was ascertained (e.g., patient report, active search), and over what time period."

- Rationale: "Headache" means different things if measured by spontaneous patient report vs weekly questionnaire vs daily diary. Without this, you cannot compare across studies.

4. **Synthesis of results in the Results section (Item 21)**: "Describe any assessment of possible causality."

- Rationale: An adverse event occurring during treatment is not necessarily caused by treatment. Authors must state whether they attempted to assess causality (e.g., using the Naranjo algorithm, Bradford Hill criteria, or simply noting that causality was not assessed).
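To see why item 14 matters, here is a minimal Python sketch of one common convention, not one the checklist prescribes: studies with zero events in both arms are dropped from a risk-ratio analysis, and studies with a zero in one arm get a continuity correction added to every cell of their 2×2 table. The function name and the study counts are hypothetical.

```python
from typing import Optional


def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int,
               correction: float = 0.5) -> Optional[float]:
    """Risk ratio (treatment vs control) for a single study.

    - Zero events in both arms: returns None, i.e. the study is excluded
      from a ratio-based meta-analysis (and the review should say so).
    - Zero events in one arm: adds `correction` to every cell of the 2x2
      table (events and non-events in each arm), so each arm's total
      grows by 2 * correction.
    """
    if events_t == 0 and events_c == 0:
        return None
    et, nt, ec, nc = float(events_t), float(n_t), float(events_c), float(n_c)
    if events_t == 0 or events_c == 0:
        et += correction
        ec += correction
        nt += 2 * correction
        nc += 2 * correction
    return (et / nt) / (ec / nc)


if __name__ == "__main__":
    # Invented study: 3 harms among 100 treated, none among 100 controls.
    print(risk_ratio(3, 100, 0, 100))                  # correction 0.5
    print(risk_ratio(3, 100, 0, 100, correction=0.1))  # smaller correction
    print(risk_ratio(0, 100, 0, 100))                  # double-zero study
```

For the invented study, a 0.5 correction gives a risk ratio of about 7, while a 0.1 correction gives about 31. The same data can look modestly or dramatically harmful depending on a choice the reader never sees unless the authors report it.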

### Secondary findings: Background data on poor harms reporting

**Only 5% of systematic reviews** focus on adverse events as a primary outcome, and this proportion has remained stable from 1994 to 2010 (despite absolute numbers increasing).

**In reviews that do assess harms**, common deficiencies include:

- No clear definition of the adverse event being reviewed

- No specification of which study designs were eligible for inclusion

- No report of length of participant follow-up

- No measurement of patient risk factors that could cause the adverse event

**Less than 10% of systematic reviews** have adverse events as a primary outcome.

### Additional recommendations (not mandatory but "desirable")

The paper provides 27 additional recommendations for how existing PRISMA items should be adapted when reporting harms. Key examples:

**Abstract**: Should report any analysis of harms undertaken, whether primary or secondary outcome.

**Introduction**: Should clearly describe which events are considered harms and provide rationale for the specific harm(s), condition(s), and patient group(s).

**Eligibility criteria**: Should report how the review handled studies where the outcomes of interest were not reported (e.g., did they contact authors? Exclude? Assume zero events?).

**Search strategy**: Should include adverse events-specific search terms and databases (e.g., TOXLINE, adverse effects subheadings in MEDLINE).

**Risk of bias**: Should assess whether harms were adequately measured in primary studies, not just whether randomisation was adequate.

**Results**: Should present absolute risks (not just relative risks) for harms, since a doubling of a rare risk may still be clinically negligible.
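The point about absolute risks can be made concrete with two invented scenarios that both double the risk of a harm. The helper below (a hypothetical name) reports the relative risk, the risk difference, and the number needed to harm (1 divided by the risk difference).

```python
import math


def harm_summary(risk_control: float, risk_treatment: float) -> dict[str, float]:
    """Relative and absolute measures for one harm; risks are proportions (0-1)."""
    relative_risk = risk_treatment / risk_control
    risk_difference = risk_treatment - risk_control  # absolute risk increase
    # Number needed to harm is only meaningful when the treatment adds risk.
    nnh = 1 / risk_difference if risk_difference > 0 else math.inf
    return {
        "relative_risk": relative_risk,
        "risk_difference": risk_difference,
        "number_needed_to_harm": nnh,
    }


if __name__ == "__main__":
    # Two invented scenarios with the same doubling of risk.
    rare = harm_summary(0.0001, 0.0002)  # 1 in 10,000 -> 2 in 10,000
    common = harm_summary(0.10, 0.20)    # 10% -> 20%
    for label, s in (("rare", rare), ("common", common)):
        print(f"{label}: RR={s['relative_risk']:.1f}, "
              f"extra cases per 10,000={s['risk_difference'] * 10_000:.0f}, "
              f"NNH={s['number_needed_to_harm']:.0f}")
```

Both scenarios have a relative risk of 2, but the rare harm adds 1 extra case per 10,000 people treated (NNH 10,000) while the common one adds 1,000 (NNH 10).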

Effect magnitude

This is not a clinical trial with an effect size like "2.3 kg weight loss." The "effect" is a change in scientific reporting standards. However, we can quantify the problem the checklist aims to solve:

**Baseline**: In 2010, only 104 out of ~2,080 systematic reviews in CDSR and DARE (5%) evaluated adverse events exclusively. The remaining 95% either assessed harms alongside benefits, often as a secondary outcome, or did not assess them at all.

**Target**: If the PRISMA harms checklist is adopted by journals, every systematic review that touches on harms would be required to define those harms, state how they were measured, handle zero events transparently, and assess causality.

To put this in perspective: a 2013 study found that of 300 systematic reviews published in high-impact medical journals, only 23% reported harms adequately. If the PRISMA harms checklist were universally adopted, that number could theoretically rise to near 100% for the four mandatory items—but no study has tested this.

Limitations

### What the authors acknowledge

The checklist is a "minimum set"—not exhaustive. Authors may need additional items depending on the specific harm and review question.

The checklist applies to systematic reviews of observational studies as well as interventional studies, but the authors note that observational studies have additional complexities (confounding, selection bias) that the checklist does not fully address.

The checklist does not specify how to handle multiple harms (e.g., if a review assesses 20 different adverse events, must each be defined separately?).

### What a critical reader would note

1. **No prospective validation**: The checklist was never tested in a randomised trial where one group of systematic review authors used it and another did not. We have no evidence it actually changes behaviour or improves review quality.

2. **No patient outcomes measured**: Even if authors follow the checklist perfectly, we don't know if this leads to better clinical decisions or fewer patient harms.

3. **Selection bias in experts**: The Delphi participants were predominantly from high-income, English-speaking countries (Canada, UK, USA). Harms reporting in low-resource settings may have different challenges.

4. **Conflict of interest not reported**: The paper does not state whether any Delphi participants or consensus meeting attendees had financial ties to pharmaceutical companies, which could bias their views on harms reporting.

5. **No cost-benefit analysis**: Adding 4 mandatory items plus 27 recommendations increases the burden on systematic review authors. The paper does not estimate how much extra time or resources this requires, or whether the benefit justifies the cost.

6. **Sampling bias in the background data**: The claim that "only 5% of reviews focus on harms" comes from DARE and Cochrane databases, which may not represent all systematic reviews (e.g., industry-funded reviews not indexed in these databases might have even worse harms reporting).

7. **No guidance on implementation**: The paper does not say how journals should enforce the checklist, what to do if authors refuse, or how to handle reviews published before the checklist existed.

Practical takeaways

This paper is not about self-experimentation: it is about how systematic reviews should report harms. Still, its principles carry over directly to anyone running an n=1 experiment who needs to judge whether a treatment is safe:

### What to test

**The PRISMA harms checklist itself**: When you read a scientific paper or systematic review about a supplement, drug, or intervention you're considering, apply the 4-item checklist to evaluate whether the harms reporting is trustworthy.

### Minimum meaningful duration

**One reading session**: You can apply the checklist to any paper in 10–15 minutes. No need to run a long-term experiment.

### What to measure

For each systematic review you read, ask:

1. **Title**: Does the title mention "harms," "adverse events," "side effects," or "safety"? If not, check the abstract and methods: harms may have been only a secondary concern, or not assessed at all.

2. **Zero events**: If the review includes studies where no harms occurred, does it state how those studies were handled? (Red flags: dropping them without saying so, or never mentioning them at all. Excluding studies with zero events in both arms can be a legitimate choice, but it has to be reported.)

3. **Harm definitions**: Does the review define each harm? E.g., "headache" vs "migraine with aura lasting >4 hours" vs "any head pain." Does it state how harms were ascertained (spontaneous report, diary, questionnaire, blood test)?

4. **Causality assessment**: Does the review attempt to distinguish harms caused by the intervention from harms that happened coincidentally? (E.g., "The rate of headache was 12% in the treatment group vs 10% in placebo—this difference was not statistically significant, and no causality assessment was performed.")
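If you apply these four questions to many reviews, recording the answers in a fixed structure keeps the assessment consistent. The class and field names below are hypothetical and not part of PRISMA; each yes/no answer still comes from reading the review yourself.

```python
from dataclasses import dataclass, fields


@dataclass
class HarmsReportingCheck:
    """Yes/no answers to the four questions for one review (hypothetical structure)."""
    title_mentions_harms: bool
    zero_events_handling_stated: bool
    harms_defined_with_ascertainment: bool
    causality_assessment_described: bool

    def failed_items(self) -> list[str]:
        """Names of the questions answered 'no', in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passes(self) -> bool:
        """True only if all four questions were answered 'yes'."""
        return not self.failed_items()


if __name__ == "__main__":
    review = HarmsReportingCheck(
        title_mentions_harms=True,
        zero_events_handling_stated=False,
        harms_defined_with_ascertainment=True,
        causality_assessment_described=False,
    )
    print(review.passes(), review.failed_items())
```

For this example review, which mentions harms in its title and defines them but says nothing about zero events or causality, `passes()` returns `False` and `failed_items()` lists those two fields.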

### Key confounds to control for

When evaluating harms reporting in a systematic review, watch for:

**Publication bias**: Reviews that only include published studies may miss unpublished harms data. Check if the review searched clinical trial registries (ClinicalTrials.gov) for unpublished results.

**Sponsorship bias**: Reviews funded by the manufacturer of the intervention are less likely to report harms thoroughly. Check the funding statement.

**Duration of follow-up**: Short-term studies miss long-term harms. If the review only includes studies with <6 months follow-up, it cannot assess harms that take years to develop (e.g., cancer, organ damage).

**Population differences**: Harms in healthy young adults may differ from harms in elderly, children, or people with comorbidities. Check if the review's population matches your own characteristics.

**Outcome switching**: Some systematic reviews change their primary outcome after seeing the data (e.g., "We planned to assess liver toxicity but found no studies, so we assessed headache instead"). Check if the review has a pre-registered protocol.

### What a positive result would look like

A "positive result" in applying the PRISMA harms checklist means the systematic review passes all 4 checks:

1. Title says "harms" or equivalent

2. Zero events are handled transparently (e.g., "We used a continuity correction of 0.5 for studies with zero events in both arms")

3. Each harm is defined with ascertainment method and time period (e.g., "Nausea was defined as any self-reported feeling of sickness in the stomach, assessed via daily diary for 4 weeks")

4. Causality is assessed (e.g., "We used the Naranjo algorithm to assess causality for each reported adverse event; only events classified as 'probable' or 'definite' were included in the primary analysis")
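The Naranjo algorithm mentioned in that example scores ten questions about an event and sums them; the standard cut-offs are ≥9 definite, 5–8 probable, 1–4 possible, and ≤0 doubtful. The sketch below only maps a total score to its category and applies the "probable or definite" filter from the example. The ten questions are not reproduced, and the event names and scores are invented.

```python
def naranjo_category(total_score: int) -> str:
    """Map a Naranjo total score to its causality category."""
    if total_score >= 9:
        return "definite"
    if total_score >= 5:
        return "probable"
    if total_score >= 1:
        return "possible"
    return "doubtful"


def primary_analysis_events(scores: dict[str, int]) -> list[str]:
    """Keep only events classified as probable or definite."""
    return [event for event, score in scores.items()
            if naranjo_category(score) in {"probable", "definite"}]


if __name__ == "__main__":
    # Invented total scores for four hypothetical adverse events.
    scores = {"nausea": 6, "headache": 2, "rash": 9, "dizziness": 0}
    for event, score in scores.items():
        print(f"{event}: {score} -> {naranjo_category(score)}")
    print(primary_analysis_events(scores))
```

With these invented scores, only nausea (6, probable) and rash (9, definite) would enter the primary analysis.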

If a review fails any of these checks, treat its safety conclusions with extreme caution. Absence of evidence of harm is not evidence of safety—this is the core message of the PRISMA harms checklist.
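A standard way to put a number on that message (a general statistical result, not something the PRISMA harms paper derives) is the upper confidence limit on an event rate when no events are observed. If none of n participants has the harm, rates up to about 3/n remain plausible at 95% confidence, the so-called rule of three. The sketch compares the exact one-sided bound, which solves (1 − p)^n = 0.05, with that shortcut; the function names are hypothetical.

```python
def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper confidence limit for the event rate after 0 events in n people.

    Solves (1 - p) ** n == alpha for p: any true rate above this bound
    would have produced at least one event with probability > 1 - alpha.
    """
    return 1 - alpha ** (1 / n)


def rule_of_three(n: int) -> float:
    """Common shortcut for the 95% bound: about 3 / n."""
    return 3 / n


if __name__ == "__main__":
    for n in (100, 300, 1000):
        print(f"n={n}: exact {zero_event_upper_bound(n):.4f}, "
              f"rule of three {rule_of_three(n):.4f}")
```

With 300 participants and no events, the exact bound is just under 1%, so such a study cannot exclude a harm affecting about 1 person in 100.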
