@arxivabs @mgtmccartney I fear that no one really cares how well PCRs compare to LFDs in selecting people for targeted isolation so as to reduce spread of SARS-CoV-2 in the population (https://t.co/gcOBbzwhSV). Instead they chatter about irrelevant false positives when LFDs try to predict PCR results.
@pash22 Dichotomising numerical test results wastes information. They indicate degrees of disease severity and the extent to which homeostatic and reparative feedback mechanisms are coping or failing and the extent to which treatment helps, including for cancer: https://t.co/TYLsJf4c2C
PS. My ‘subjective’ prior distribution of possible true mean values of a study would be conditional on my expected mean of the paired differences (e.g. 1.96) and the expected standard deviation (e.g. 10) in an imagined study based on prior knowledge etc., and on the ‘conventional’ sample size required (e.g. 204.3 => 205) for a power of 80% of getting P ≤ 0.05 two-sided. The sample size required to get P ≤ 0.05 in a proposed study would be 408.6 => 409, and to also get P ≤ 0.05 in a second, replicating study would be 612.9 => 613. How does this compare to assurance calculations?
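The three sample sizes quoted above can be reproduced with a short sketch. It assumes the usual normal (z) approximation for a paired-difference design; the function name `conventional_n` is mine, not from any of the linked calculators. Doubling and tripling the variance is treated here as simply doubling and tripling the conventional n.

```python
from math import ceil
from statistics import NormalDist


def conventional_n(mean_diff, sd, alpha_two_sided=0.05, power=0.80):
    """Conventional sample size for paired differences (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)  # about 1.96
    z_power = z.inv_cdf(power)                    # about 0.84
    return ((z_alpha + z_power) * sd / mean_diff) ** 2


n_conv = conventional_n(mean_diff=1.96, sd=10)
n_proposed = 2 * n_conv      # variance doubled: proposed study
n_replication = 3 * n_conv   # variance tripled: second, replicating study

for label, n in [("conventional", n_conv),
                 ("proposed study", n_proposed),
                 ("replicating study", n_replication)]:
    print(f"{label}: {n:.1f} => {ceil(n)}")
```

Because n scales linearly with the variance, multiplying the variance by 2 or 3 multiplies the required number of observations by the same factor.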
An interesting Power and Sample Size Calculator
https://t.co/5DkykYmDwo from @emil_hs & @FralickMike et al. Should we use a different sample size calculation to get a high probability of replication to address the replication crisis? See: https://t.co/kO6uXbrGD2 @pash22 @stephensenn @f2harrell @d_spiegel
Thank you. My approach is different. My calculation of the sample size for a power of x that a proposed study should provide a P value ≤ y is based on doubling the variance of the subjective prior distribution of possible true values of the study. The sample size required for a subsequent (replicating) study to give a P value ≤ y too is based on tripling the variance.
@stephensenn @emil_hs @FralickMike @pash22 In addition to the discussion on DataMethods since February (https://t.co/d4u5U7BNHL), there is also a pre-print that goes into the details of the underlying assumptions: https://t.co/vTYNeb0KUV
@stephensenn @emil_hs @FralickMike @pash22 Thank you. For example, if a study result gave a P value of 0.025 one-sided based on 100 observations and the probability of replication was 0.283, then by repeating it with 409 observations the expected probability of replication would be 0.8 (see https://t.co/BMUgUVK61n).
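One calculation that reproduces both quoted figures is sketched below. It assumes the same illustrative mean difference (1.96) and SD (10) as the sample size example, divides the observed z score by √2 (variance doubled), and then subtracts the one-sided critical value. Whether this matches the linked post step for step is an assumption on my part; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def replication_probability(mean_diff, sd, n, alpha_one_sided=0.025):
    """Probability that a repeat study also gives P <= alpha (one-sided).

    Assumed form: Phi(z_obs / sqrt(2) - z_crit), i.e. variance doubled.
    """
    z = NormalDist()
    z_obs = mean_diff / (sd / sqrt(n))        # observed mean in SEM units
    z_crit = z.inv_cdf(1 - alpha_one_sided)   # about 1.96
    return z.cdf(z_obs / sqrt(2) - z_crit)


p_100 = replication_probability(mean_diff=1.96, sd=10, n=100)
p_409 = replication_probability(mean_diff=1.96, sd=10, n=409)

print(f"n = 100: {p_100:.3f}")
print(f"n = 409: {p_409:.3f}")
```

Under these assumptions the first value comes out at about 0.283 and the second at about 0.80.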
I agree of course that we estimate the minimum number of observations, but this 80% power is a likelihood. To calculate the number needed to get a probability of 0.8 of a P value ≤ 0.05, we need to use the Gaussian prior probability distribution conditional on the data (on which the likelihood distribution is based by assuming it to be identical). This means doubling the estimated variance, so you need twice as many observations as for a ‘conventional’ power calculation to estimate the minimum posterior probability of getting a P value ≤ 0.05 (see https://t.co/1OCdIAW1jy).
If there is an estimated Gaussian distribution based on 100 observations with a mean of 0 and an SD of 10, then 58% of the distribution will fall below a point 0.202 SDs above the mean. As the SEM is 10/√100 = 1 (0.1 SDs), the 95% CLs of 58% will be based on 0.202 ± 0.196 SDs and will be 0.655 and 0.502. However, the binomial 95% CLs for 58/100 are wider, at 0.677 and 0.483, and so less precise. Does this illustrate your point about dichotomisation reducing precision, or is there another explanation?
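The two sets of limits can be checked with a few lines. This is a sketch: it assumes the binomial limits quoted are the simple normal-approximation (Wald) interval, which is the form that reproduces 0.677 and 0.483.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()
n, sd = 100, 10

# Gaussian route: shift the 0.202 SD cut-off by 1.96 SEMs, expressed in SD units.
sem_in_sd_units = (sd / sqrt(n)) / sd    # 0.1
half_width = 1.96 * sem_in_sd_units      # 0.196
cut = 0.202                              # Phi(0.202) is about 0.58
gauss_low = z.cdf(cut - half_width)
gauss_high = z.cdf(cut + half_width)

# Dichotomised route: treat 58/100 as a binomial proportion (Wald interval, assumed).
p = 0.58
wald_half_width = 1.96 * sqrt(p * (1 - p) / n)
binom_low, binom_high = p - wald_half_width, p + wald_half_width

print(f"Gaussian: {gauss_low:.3f} to {gauss_high:.3f}")
print(f"Binomial: {binom_low:.3f} to {binom_high:.3f}")
```

The Gaussian interval is about 0.15 wide and the binomial one about 0.19 wide, which is the loss of precision the question refers to.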
The same problem affects diagnostic test assessment. Sensitivity and specificity require dichotomisation of numerical values that experienced diagnosticians interpret as numbers. Instead of criticising the dichotomisation they think that they can’t understand statistics. P values also convert an observed mean into a range (what was actually observed or something more extreme). Has anyone else raised these issues too?
For example, we apply the risk ratio R from an RCT on statins to individuals with a total risk of a CV event over 10% based on summing the risks from individual risk factors. We then apply R to the 10%. But if that individual had a very low risk lipid profile already and a very high risk BP, would the estimated risk reduction be sensible? The RCT was designed with the hypothesis that statins reduce CV risk via lipid profile, which was already low risk in this individual and there is no evidence that statins lower the BP. These considerations apply to all applications of RCTs to individuals.
Agreed. It is undefined. However, we seem to assume that a new individual to whom the measure of efficacy is to be applied is from the same population. That individual’s RELEVANT baseline risk has to be estimated conditional on the known features used to recruit into the trial. The unknown features will contribute to the variance of this risk.
@stephensenn 3/3. The foregoing posts make all sorts of assumptions of course, especially about the ‘transportability’ of risk ratios, differences etc. How should we test these assumptions?
2/2. The exchangeable sets allow us to estimate the efficacy of treatment in the form of average risk ratios, odds ratios, risk differences based on individuals in those sets. The question is: How are the latter applied to other individuals in the population from which they were recruited? How do we estimate the untreated risk for such individuals?