Six Common Quantitative Mistakes in Planning Clinical Trials
DOUGLAS A. MILIKIEN
According to a 2014 analysis conducted for the Department of Health and Human Services, for any new drug compound, the total cost of conducting clinical trials across all phases can be as much as $115 million (1). Failures are expensive. Even well-planned studies “fail”, in that the outcome does not work out in the direction desired to support regulatory approval. However, a trial planning mistake that results in a significant design flaw compromises the ability to form rational scientific conclusions and is therefore the worst kind of failure. This white paper enumerates some common clinical trial planning mistakes, pertaining to data collection and quantification, that we have seen here at Accudata Solutions.
1. Overestimating the expected success/treatment benefit of the investigational product
We refer to this as True Believer Syndrome. Enthusiasm for the new drug, device, biologic, or diagnostic is widespread among employees at the Sponsor company. Often, early enthusiasm is based on positive results from early-stage trials, with restrictive inclusion/exclusion criteria, a small to moderate number of patients, and a small number of study centers. Later in the clinical development timeline, when it comes to designing pivotal, confirmatory studies for submission, inclusion/exclusion criteria are relaxed and more investigative centers are recruited in order to increase enrollment and justify a broad label upon approval.
The mistake comes from expecting the same magnitude of treatment benefit in a broad, pivotal study that was evident in the earlier, restricted studies. If the pivotal study was powered on the basis of the treatment effect observed in earlier studies and then achieves a smaller treatment effect, the results, even if generally positive, are unlikely to be statistically significant. In that case, no conclusion regarding efficacy can be drawn.
An example of an unrealistic expected treatment effect, δ, is described in DeLong (2012) (2). The authors conducted a critical review of four clinical trials on the use of the antibiotic ceftriaxone in Lyme disease patients who were still exhibiting symptoms after a primary round of antibiotics. The primary outcomes were changes in SF-36 Physical Component Score and Mental Component Score. In the Klempner trial, the expected treatment effects for planning sample size were set at 6.7 and 9.1, roughly twice the magnitude of clinically-significant changes of 2-5 seen in SF-36 improvement in a survey of five chronic diseases with severity similar to Lyme disease. Thus, the Klempner trial enrolled too few subjects to statistically identify a plausible clinical benefit.
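The cost of an optimistic δ can be sketched with the usual normal-approximation formulas for a two-arm comparison of means. This is a minimal illustration, not the calculation used in any of the cited trials: the standard deviation of 10 and the "plausible" effect of 3.5 are hypothetical values chosen for the example, and the function names are our own.

```python
from math import ceil, sqrt
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided, two-sample z-test."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    z_beta = _z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def achieved_power(delta_true, sd, n, alpha=0.05):
    """Approximate power with n subjects per arm when the true effect is delta_true."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    return _z.cdf(delta_true / (sd * sqrt(2 / n)) - z_alpha)


SD = 10.0              # hypothetical standard deviation, NOT taken from the cited trials
planned_delta = 6.7    # effect assumed at the planning stage
plausible_delta = 3.5  # hypothetical value inside the 2-5 range quoted above

n_planned = n_per_arm(planned_delta, SD)
n_needed = n_per_arm(plausible_delta, SD)
power_if_smaller = achieved_power(plausible_delta, SD, n_planned)

print(f"n per arm planned for delta={planned_delta}: {n_planned}")
print(f"n per arm needed for delta={plausible_delta}: {n_needed}")
print(f"power of the planned study if the true delta is {plausible_delta}: {power_if_smaller:.2f}")
```

Because the required sample size scales with 1/δ², halving the assumed effect roughly quadruples the number of subjects needed, which is why a study sized for the optimistic effect has little chance of detecting the plausible one.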
2. Underestimating the clinical performance of the comparator treatment
Whether it’s a superiority comparison or a non-inferiority comparison, the value of an investigational drug, device, or biologic to the patient and the payer is only as good as what other treatments are available. A common mistake is underestimating how well subjects enrolled in a “standard of care” arm or placebo/sham arm will do.
Timing is a common reason for the “standard of care” underestimation mistake. The length of time from the conception of the clinical trial by Clinical Development management to completion by all study subjects can exceed one to two years. During that time, new competitive treatments or medical practices may become available, improving the clinical outcomes of subjects enrolled in the “standard of care” arm beyond what was originally considered feasible. Thus, the anticipated treatment effect originally conceived for study planning purposes, δ, is no longer achievable; the original sample size is too small for a more realistic, alternative treatment effect, δA; and the observed difference in clinical outcome between the investigational arm and the “standard of care” arm may no longer be statistically significant.
Similarly, if the estimate of clinical outcome for “standard of care” or placebo subjects is based on published literature, then there is the problem of extrapolating that published result from a historical study to the current environment for the intended population. The historical study may have had a different patient mix than the current study or there may be new treatments or practices available since the conduct of that study.
In a placebo-controlled study of 393 pediatric patients with severe meningococcemia reported by Levin (2000) (3), Xoma overestimated mortality in the placebo group (observed mortality = 9.9%), and the study was therefore underpowered to detect a difference compared to patients in the arm treated with Neuprex® (observed mortality = 7.4%).
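To give a feel for the numbers, the sketch below applies a normal-approximation power calculation for two proportions to the observed mortality rates. It is an illustration only: it treats the observed rates as if they were the true rates, it assumes an equal split of about 196 patients per arm (the actual allocation is not stated here), and the function names are our own.

```python
from math import ceil, sqrt
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test for two proportions (unpooled variance)."""
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    return _z.cdf(abs(p_control - p_treated) / se - z_alpha)


def n_per_arm_two_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects per arm needed to detect p_control vs p_treated."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    z_beta = _z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


P_PLACEBO = 0.099  # observed placebo mortality quoted above
P_TREATED = 0.074  # observed treated-arm mortality quoted above
N_PER_ARM = 196    # ASSUMPTION: roughly equal allocation of the 393 patients

power_at_observed = power_two_proportions(P_PLACEBO, P_TREATED, N_PER_ARM)
n_required = n_per_arm_two_proportions(P_PLACEBO, P_TREATED)

print(f"approximate power with {N_PER_ARM} per arm: {power_at_observed:.2f}")
print(f"approximate n per arm for 80% power: {n_required}")
```

Under these assumptions, a difference of this size would need a study several times larger than the one conducted, which is the practical meaning of "underpowered" in this example.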
3. Failing to give detailed consideration in the protocol to what role each endpoint will play in the product submission story
This flaw typically arises when Sponsors take a “kitchen sink” approach to clinical data collection and/or are overly influenced by the desire of academic investigative centers to collect measurements that are themselves experimental.
Think of every endpoint (efficacy, safety, utility) as one beam in the structure of your compelling research story when seeking marketing approval. Just like beams in a house have different roles, endpoints have different roles, whether they are primary to the argument, secondary (supportive) to the argument, or exploratory (not salient to the current argument but worthy of collecting data for future use). How, or even if, a given type of clinical information can be quantified and converted to a meaningful endpoint in the right role must be decided at the protocol development stage in order to specify the proper clinical procedures for measurement and collection. Kicking the decision down the road to the development of the Statistical Analysis Plan is too late.
A classic example of this case is a former Accudata Solutions client who was developing a treatment for a childhood neuromuscular disease. Children with this disease are typically assessed through both neurologist examination and standardized motor function testing administered by a physical therapist. In addition to these routine clinical procedures, the Sponsor added videotaping of the physical therapy sessions to the study procedures. Those videotaped sessions would then be placed in random order and viewed by an independent neurologist, blinded to the child’s age, identity, and assigned treatment arm. The neurologist would then score whether the videotape showed that the child could perform certain motor function milestones. These scores were added as a secondary endpoint to the protocol, which was finalized and approved by the IRB. It wasn’t until many months later, during development of the Statistical Analysis Plan, that the Sponsor realized that the blinded-reviewer scores of motor milestone progress based on watching videotapes were neither quantifiably nor contextually similar to the treating neurologist’s in-person assessment and therefore could not serve as a secondary endpoint. Thus, time and money were wasted performing clinical procedures with no validated value, and lengthy deviations from the protocol had to be written, damaging the reputation of the company.
4. Failing to provide adequate clinical justification for an acceptance criterion or non-inferiority margin
Single-arm, open-label studies arise on occasion when, during development of an investigational treatment, it is unethical to not treat a study subject when no other cure is available. In the absence of a control arm, the argument for clinical efficacy of the investigational product hinges on the choice for what magnitude of treatment effect, δ0, is sufficiently high to warrant approval and reimbursement, i.e. the acceptance criterion. To find a justifiable δ0, Sponsors often turn to high-impact published literature of observational studies in the indication of interest. Similar to Mistake #2 above, the problem can be one of extrapolation. The population in the published literature may be a mixture of many subpopulations, only some of which are applicable to the current drug, device, or biologic study. However, the literature only provides aggregated clinical results.
Secondly, the research on which the publication is based may be old enough that, despite the publication’s medical reputation, the clinical outcomes observed then are now outdated because of the availability of new treatments or different types of patients now being referred to treatment. What was justifiable then as an acceptance criterion for superiority is no longer justified and could fail to meet the level of evidence for approvability or reimbursability.
Choosing an acceptance criterion without a convincing justification runs the risk of failure.
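How sensitive a single-arm study is to the choice of δ0 can be sketched with an exact binomial calculation. Every number below is hypothetical: a response-rate endpoint, 60 subjects, a true response rate of 50%, an acceptance criterion of 30% taken from older literature, and a stricter criterion of 40% that a reviewer might insist on today. The function names are our own.

```python
from math import comb


def binom_tail(n, p, r):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))


def critical_count(n, p0, alpha=0.025):
    """Smallest responder count r such that P(X >= r | p0) <= alpha (one-sided)."""
    for r in range(n + 2):  # r = n + 1 gives an empty tail, so the loop always returns
        if binom_tail(n, p0, r) <= alpha:
            return r


def power_single_arm(n, p0, p_true, alpha=0.025):
    """Probability of beating the acceptance criterion p0 when the true rate is p_true."""
    return binom_tail(n, p_true, critical_count(n, p0, alpha))


N = 60          # hypothetical single-arm sample size
P_TRUE = 0.50   # hypothetical true response rate of the investigational product
P0_OLD = 0.30   # hypothetical criterion drawn from older literature
P0_NEW = 0.40   # hypothetical criterion reflecting current practice

power_old_criterion = power_single_arm(N, P0_OLD, P_TRUE)
power_new_criterion = power_single_arm(N, P0_NEW, P_TRUE)

print(f"power against the older criterion:   {power_old_criterion:.2f}")
print(f"power against the current criterion: {power_new_criterion:.2f}")
```

In this hypothetical setting, a study that looks comfortably powered against the outdated criterion has less than an even chance of succeeding against the updated one, even though the product itself has not changed.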
5. Lack of agreement in, or drift in definition of, clinical endpoint
Some clinical endpoints are either difficult to measure objectively or defy a single, unifying standard of measurement, e.g. episodes of remission or relapse, occurrence of a disease exacerbation, occurrence of infection, etc. The problem arises when either: a) the Sponsor’s understanding of what constitutes a clinical event unknowingly differs from the understanding of one or more investigators; b) there is unwitting inconsistency across investigators as to what constitutes a clinical event, especially if investigator sites are spread out across many geographic regions; or c) there is a drift or fine-tuning over time during the clinical trial so that the understanding of the clinical event of interest changes as more patient experience is accumulated. In the last case, the subjects enrolled toward the end of the study are monitored with the full benefit of that accumulated knowledge while the earliest enrollers do not receive such scrutiny.
If the sample size has been planned on the Sponsor’s estimate that X of these events will occur according to the Sponsor’s understanding, but Investigators record only Y events, where Y < X, based on the aggregate Investigator understanding, then the study is likely to fail: a sample size that was adequate with X events is too small to detect the treatment effect with only Y events.
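For a time-to-event endpoint, the dependence of power on the number of recorded events can be sketched with Schoenfeld's approximation for the log-rank test under 1:1 randomization. This is an assumption made for illustration, since the discussion above does not tie itself to any particular type of endpoint; the hazard ratio of 0.70 and the count of 150 recorded events are hypothetical, and the function names are our own.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_z = NormalDist()  # standard normal distribution


def events_required(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate total events needed (Schoenfeld, 1:1 allocation, two-sided test)."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    z_beta = _z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def power_from_events(n_events, hazard_ratio, alpha=0.05):
    """Approximate power of the log-rank test given the number of observed events."""
    z_alpha = _z.inv_cdf(1 - alpha / 2)
    return _z.cdf(sqrt(n_events) / 2 * abs(log(hazard_ratio)) - z_alpha)


HR = 0.70              # hypothetical hazard ratio assumed at planning
events_recorded = 150  # hypothetical Y: events actually recorded by Investigators

events_planned = events_required(HR)  # plays the role of X in the text
print(f"events planned (X): {events_planned}")
print(f"power with X events: {power_from_events(events_planned, HR):.2f}")
print(f"power with Y={events_recorded} events: {power_from_events(events_recorded, HR):.2f}")
```

The point of the sketch is that, for this kind of endpoint, power is driven by the event count rather than by the number of subjects enrolled, so a definitional mismatch that lowers the count erodes power even when enrollment targets are met.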
Clinical Endpoint Committees/Clinical Event Committees (CECs) can help prevent this lack of consistency. CECs are typically composed of physicians who specialize in the indication but who are NOT themselves investigators in the study. They adjudicate events in question by comprehensively reviewing laboratory results, imaging data, pathology reports, medications, hospitalization notes, reported adverse events, clinical encounter observations, etc. to come to a joint decision. Thus, the endpoints are uniformly defined.
6. Employing a DSMB to review interim results without a statistically and operationally sound plan
The ability to know how well study subjects are performing in a clinical trial while the trial is still underway is attractive to Sponsors for business reasons. An operational mechanism for doing this while maintaining the scientific integrity of the study is hiring a Data Safety Monitoring Board/Data Monitoring Committee (DSMB/DMC). A complete discussion of the use of DSMBs/DMCs is beyond the scope of this paper; see Ellenberg (2003) (4) for more extensive coverage of the topic. If, early in the trial, an investigational treatment appears to be overwhelmingly successful or overwhelmingly a failure, the Sponsor would like to stop enrollment as soon as possible so that the company can move on to the next study or go back to the drawing board. In a randomized, controlled clinical trial, Sponsors and Investigators are blinded to which subjects received the investigational treatment and which subjects received the comparator so that the conduct of the study or evaluation of outcomes is not biased by this knowledge.
In this context, DSMBs, which are given access to the actual treatment assignments, can monitor efficacy and safety on an ongoing basis and not have to wait until the end of the clinical trial. Typically, the first of these interim analyses during an ongoing study is pre-planned and scheduled when some fraction of the subjects have completed the study, say one-fourth or one-third. Without jeopardizing the study blind, the DSMB can then inform the Sponsor whether the interim results in aggregate are more extreme than the pre-specified thresholds necessary to claim superiority or futility and therefore recommend stopping the study.
However, the convenience of early decision-making comes at a price. Because inferences will be made on less than the full set of data, statistical stopping rules must be clear and pre-specified. The management of the clinical trial becomes operationally more complex because data have to be entered and cleaned faster and enrollment has to be closely monitored to schedule the DSMB meetings as close to the nominal milestone as possible. Additionally, firewalls between the Sponsor’s data analysis teams preparing the interim analyses and Sponsor’s clinical support team must be established, and communications between the Sponsor and the DSMB must be limited.
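Why stopping rules must be pre-specified can be illustrated with a small simulation of a trial in which the treatment truly has no effect. The sketch below assumes four equally spaced looks and a two-sided z-test, and compares the usual 1.96 threshold applied at every look with a stricter constant of about 2.361, a commonly tabulated Pocock-type value for four looks. The number of looks, the seed, and the function name are choices made for this example only.

```python
import random
from math import sqrt


def false_positive_rate(looks, threshold, n_sims=20000, seed=1):
    """Fraction of no-effect trials that cross |z| > threshold at any look.

    looks: increasing information fractions, e.g. (0.25, 0.5, 0.75, 1.0).
    Under the null, the z-statistic at fraction t behaves like B(t) / sqrt(t),
    where B is a standard Brownian motion.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sims):
        score = 0.0  # cumulative score B(t)
        info = 0.0   # information fraction reached so far
        for frac in looks:
            score += rng.gauss(0.0, sqrt(frac - info))
            info = frac
            if abs(score / sqrt(info)) > threshold:
                crossings += 1  # trial stopped early and declared "significant"
                break
    return crossings / n_sims


FOUR_LOOKS = (0.25, 0.5, 0.75, 1.0)

single_look = false_positive_rate((1.0,), 1.96)
naive_four_looks = false_positive_rate(FOUR_LOOKS, 1.96)
pocock_four_looks = false_positive_rate(FOUR_LOOKS, 2.361)

print(f"one final analysis at 1.96:  {single_look:.3f}")
print(f"four looks, each at 1.96:    {naive_four_looks:.3f}")
print(f"four looks, each at 2.361:   {pocock_four_looks:.3f}")
```

Testing at the conventional threshold on every look inflates the chance of a false claim of efficacy well beyond the nominal 5%, whereas a pre-specified, stricter boundary keeps it near the intended level. Other boundary families exist and make different trade-offs between early and late looks.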
Sponsors who do not pay enough attention to the statistical or operational details do so at their own peril. Prematurely terminating a study before enough statistical evidence has been accumulated will result in a study in which no efficacy conclusion at all can be made, necessitating a new clinical trial for the same research objectives.
One real-world example of this mistake occurred in a Phase 1B/2A protocol for a gene therapy client who was developing a treatment for a rare, incurable childhood disease. The draft protocol had a fundamental design flaw: the DSMB had been granted the authority to recommend stopping the placebo-controlled trial even before enough evidence of superiority had been demonstrated. The trial in its original design had a high probability of failure, and children randomized to the placebo arm would have died unnecessarily. Accudata Solutions prevented the company from conducting a study that had no scientific validity and, worse yet, would have caused children to die.
Clinical research for submission and approval of a new medical product is expensive. Once a Sponsor’s executive team has made the decision to commence a clinical trial, there is great urgency during the planning stage to get that first patient enrolled. No company can perfectly plan clinical trials all the time, but keeping in mind these six common quantitative planning mistakes and avoiding their pitfalls will result in clinical trials more likely to succeed. Accudata Solutions is happy to be your strategic partner in planning.
For a free consultation, go to https://www.accudatasolutions.com.
1. Sertkaya, Aylin, et al. Examination of Clinical Trial Costs and Barriers for Drug Development. Assistant Secretary of Planning and Evaluation, US Department of Health and Human Services. [Online] July 25, 2014. [Cited: October 18, 2017.] https://aspe.hhs.gov/report/examination-clinical-trial-costs-and-barriers-drug-development.
2. DeLong, Allison K., et al. Antibiotic retreatment of Lyme disease in patients with persistent symptoms: A biostatistical review of randomized, placebo-controlled, clinical trials. Contemporary Clinical Trials 33. 2012. 1132-1142.
3. Levin, Michael. Recombinant bactericidal/permeability-increasing protein (rBPI21) as adjunctive treatment for children with severe meningococcal sepsis: a randomised trial. Lancet 356(9234). 2000. 961-967.
4. Ellenberg, Susan S., Fleming, Thomas R. and DeMets, David L. Data Monitoring Committees in Clinical Trials: A Practical Perspective. Chichester, West Sussex : John Wiley & Sons, 2003.