The validity is independent of the population size!
Let's see what kind of difference we can detect, actually.
Let's say we make a 9-point scale from -4 to +4:
-4 is kind, 0 is neutral, +4 is brutal. (So it rhymes. Sue me.) With our sample of n = 50 soldiers, the standard error will be equal to
s / sqrt(n) = s / sqrt(50) =~= s / 7
Now, we want to see what magnitude of difference we can detect. We're only concerned with detecting brutality, so we'll do a one-sided test at the 5% level: with 49 df the critical t is about 1.68, call it 1.7, so the threshold is
1.7 * s / 7 ,
but we have to guess s. (If we had the data, we'd estimate it from there, natch.)
Let's guesstimate the standard deviation s to be 1.5, so that about 95% of responses fall within two standard deviations, between -3 and +3. Sounds reasonable, no?
Then the minimal difference that we can detect, the null hypothesis being mean <= 0:
1.7 * 1.5 / 7 =~= 0.36, call it 0.4
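The arithmetic above can be checked with a few lines of Python; the 1.68 below is the tabled one-sided 5% critical t at 49 df (the text rounds it to 1.7):

```python
import math

# Minimum detectable difference for a one-sided t-test,
# using the back-of-the-envelope numbers from the text.
n = 50          # soldiers rated, so df = 49
t_crit = 1.68   # one-sided 5% critical t at 49 df (tabled value)
s = 1.5         # guesstimated standard deviation

se = s / math.sqrt(n)   # standard error of the mean, ~= s / 7
mdd = t_crit * se       # smallest mean shift we'd flag as "brutal"

print(f"standard error  = {se:.3f}")
print(f"min. detectable = {mdd:.2f}")  # about 0.36 on the -4..+4 scale
```

Swapping in other guesses for s (or a bigger n) shows directly how the detectable difference shrinks with sqrt(n).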
That's not such bad power, is it now? It'd be quite good at detecting any tendencies towards the brutal. And before you start thinking that it's too sensitive, remember that there's a lot of kind soldiers out there, too.
* * *
I just realised that that test is not quite appropriate. Better, perhaps, to treat kind/brutal as dichotomous, and determine whether the number of brutal ones is 'acceptably small' . . . perhaps counting the number of incidents of evil and comparing against a Poisson distribution. Or perhaps use an F-test on the ratio good/evil. Hmm. Anyway, you get my point --- 50 is not bad at all, though 100 would be better.
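For what it's worth, the dichotomous version could be sketched as a one-sided exact binomial test. The 'acceptable' brutality rate p0 = 0.05 and the observed count k = 6 below are invented numbers purely for illustration, not anything from the text:

```python
from math import comb

# Classify each of n = 50 soldiers as kind or brutal, then test
# H0: p <= p0 against the alternative that brutality is too common.

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) -- the one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

n, p0, k = 50, 0.05, 6   # hypothetical numbers for the sketch
p_value = binom_tail(k, n, p0)
print(f"P(X >= {k} | p = {p0}) = {p_value:.3f}")
```

With these made-up numbers the tail probability comes out below 5%, i.e. six brutal soldiers out of fifty would already be evidence against an acceptable 5% rate. (The Poisson version mentioned above would give nearly the same answer here, since n is large and p0 small.)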