Haututū

Are your fries underweight?

15 Jan 2018

Overview

In this article I aimed to identify whether McDonald's fries in New Zealand are served at the advertised serving size. I originally had a small sample, so I borrowed power from a previous sample using a Bayesian prior. Bayesian estimation was conducted using the BEST package, a simple-to-use entry point for Bayesian analyses (although some background reading is advisable). All code and data can be found here, and there is also a branch with the original three samples I used. I highly suggest looking at the other branch to see how the smaller sample behaves.

This is a very easy way to ease into Bayesian analysis and wrap your head around Bayesian posteriors.

Data background

I managed to find one study where people weighed ten large fries from a Japanese McDonald's, plus a single sample from the UK. This data will be used to produce a prior for the New Zealand data. However, each country seems to have a different serving weight for its fries. To standardize the data I subtracted the serving size weight, so each measurement represents how many grams over- or under-weight the serving was (see the code below).

#Set expected serving weights so we can standardize to zero
japWgt <- 170
britWgt <- 150
nzWgt <- 120

#Load Japanese data and British datum, subtracting serving weight
japData <- read.csv("japaneseData.csv")$wgt - japWgt
britData <- read.csv("britishData.csv")$wgt - britWgt

#Merge Japanese and British data
priorData <- append(japData, britData)

#Load NZ data
nzData <- read.csv("nzData.csv")$wgt - nzWgt

The table below shows a sample of the data. Just eyeballing it, we can see the Japanese data is under serving size and the New Zealand data is over serving size. It will be interesting to see how we can use this data to inform our small dataset.

Weight   Country
    14   New Zealand
    19   New Zealand
    -3   Japan
    10   Britain
   -21   Japan
    -9   Japan
    -1   Japan
    -6   Japan
    17   New Zealand
    15   New Zealand
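
For reference, a quick way to build such a table is to label and combine the rescaled samples; this data frame construction is my own sketch rather than code from the repository:

#Hypothetical sketch: combine the rescaled samples and draw ten rows at random
allData <- rbind(
  data.frame(Weight = nzData,   Country = "New Zealand"),
  data.frame(Weight = japData,  Country = "Japan"),
  data.frame(Weight = britData, Country = "Britain")
)
allData[sample(nrow(allData), 10), ]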

What would we expect the distribution to look like?

I was interested in what we might expect the distribution to look like in NZ. What is an acceptable amount of variation around our expected mean / serving size?

Well… we can assume people will be pissed off if they order a large fries and the weight falls within the same distribution as medium fries. In NZ a medium fries weighs 100g and a large fries weighs 120g (estimated from average serving info).

Therefore we can tinker with some distributions using our rescaled means of 0 and -20 for 120g and 100g respectively.

Using the plotAreaInROPE() function from the BEST package we can test distributions for large fries against the serving size of medium fries. The x-axis shows the radius, or interval: for example, a value of 2 means 2g either side of the value we are comparing to, in this case -20g. The y-axis shows the proportion of samples from our distribution within the given radius. This is pretty useful for testing our distributions, although the actual purpose of this function is generating a Bayesian p-value (see intended use for a ROPE here).

#Simulate a candidate large-fries distribution (mean 0, SD 5) and plot the
#proportion of samples within a given radius of the medium-fries mean (-20)
plotAreaInROPE(rnorm(10000, 0, 5), 
               compVal=-20, 
               maxROPEradius = 10,
               cex.lab=1,
               cex.axis=1,
               main="Proportion in given range of medium fry weight"
               )

We can see that with mu=0 and sigma=5 only around 2% of large fries will be closer to a medium-fry weight (more than 10g under weight) than a large-fry weight. That seems acceptable, and we certainly wouldn't want the error to be much higher than that. This translates to an approximate 95% credible interval of +/- 10g, a 20g range or about 16% of the 120g serving.
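
We can sanity-check that 2% figure directly: under a normal distribution with mean 0 and SD 5, the -10g cut-off sits two standard deviations into the lower tail.

#Probability a large fries is more than 10g under weight,
#i.e. closer to a medium fries than a large fries
pnorm(-10, mean = 0, sd = 5)  #about 0.023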

At this point I read this back and realise how obvious it is that around 2% of samples fall more than two standard deviations out on a single tail of the distribution. At least we have confirmed that the maths still works and played with a new concept, the ROPE.

What does previous data tell us?

The previous data looks very low. It is possible that the data we collected is abnormally high, but there are a lot of confounds in both our data and the Japanese data, e.g. it is likely the same person filled all the boxes. Interestingly, with Bayesian estimation we can build these concerns into our models using the prior.

Using the BESTmcmc() command I found it really easy to stick data in and specify a prior. I set a moderate uncertainty around the mean and standard deviation using the muSD and sigmaSD arguments (I realise moderate can be a little subjective). This means we are weighting the model more towards our prior than the data (see priorModel.prior). If we were incredibly uncertain of how the data is distributed then our prior would have very low weight in the model (see priorModel.vague).

#Prior with low certainty, i.e. high muSD and sigmaSD
priorModel.vague <- BESTmcmc(priorData,
                             priors = list(muM=0, 
                                           muSD=100, 
                                           sigmaMode=5, 
                                           sigmaSD=100)
                             )

#Prior with loose assumption that mu is near 0 and sigma near 5
priorModel.prior <- BESTmcmc(priorData,
                             priors = list(muM=0, 
                                           muSD=5, 
                                           sigmaMode=5, 
                                           sigmaSD=5)
                             )

The table below aggregates information from summary() when used on a BESTmcmc object [1]. Compared to the vague prior, our informed prior did affect the posterior as suspected, pulling parameters about 2g closer to our expected values. The informed prior also narrowed the Highest Density Interval (HDI), which is somewhat like a confidence interval. Comparing the two posterior distributions, the effect of the informed prior seems fair and I'm happy to use it going forward.

                 mean       HDIlo       HDIup
Vague prior
  mu        -8.298119  -14.250125   -2.170984
  sigma      9.227303    4.802665   14.607963
Informed prior
  mu        -6.419240  -11.311337   -1.412083
  sigma      8.591092    4.953592   12.965692
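
A table like this can be pulled straight from summary(); assuming the summary matrix uses row names mu/sigma and column names mean/HDIlo/HDIup (as in the BEST documentation), something like:

#Extract the mean and HDI bounds for mu and sigma from each model
summary(priorModel.vague)[c("mu", "sigma"), c("mean", "HDIlo", "HDIup")]
summary(priorModel.prior)[c("mu", "sigma"), c("mean", "HDIlo", "HDIup")]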

The BEST package has a really nice plot feature too. It can plot multiple parameters, and below we can see the posterior distribution of the mean (note that the variance on the mean is not the same as sigma; that's a different parameter with its own posterior distribution in Bayesian estimation). The comparison value is also displayed, which in this case is zero: a mean equal to the serving size of large fries. Based on what we see here there is only a 1% chance that the mean is equal to or greater than zero (my frequentist brain wants to say this interpretation is wrong though).

plot(priorModel.prior)
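
The 1% figure can also be read straight off the posterior samples, since the BESTmcmc object exposes the draws for mu:

#Proportion of posterior samples where the mean is at or above serving size
mean(priorModel.prior$mu >= 0)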

If we consider the overlap with our prior (mu=0, sigma=5), we only see a 36% overlap between the posterior and prior distributions. Conversely, with the vague prior the posterior overlaps our assumed prior distribution by about 28%. We can see that even with a prior of moderate certainty the data pulled the posterior far from our expected parameters.

postPriorOverlap(priorModel.prior$mu, dnorm, mean=0, sd=5)
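
For comparison, the roughly 28% overlap quoted above comes from the same call applied to the vague-prior model:

#Overlap between the vague-prior posterior and our assumed distribution
postPriorOverlap(priorModel.vague$mu, dnorm, mean=0, sd=5)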

What does the New Zealand data tell us?

Now that we have a posterior using previous data, we can use it as a prior for our NZ data. I'll also run the model using the original prior we assumed and a vague prior to understand how the priors are affecting the data. Note that in the data-informed prior below, I have increased the uncertainty of mu and sigma by multiplying muSD and sigmaSD, to reflect our uncertainty in how reliably the previous data represents the New Zealand distribution. muSD is multiplied by a factor of 10 because it is probable there is something particular about those samples leading to the under-weight servings; we want to allow a reasonable probability that the mean is zero or even positive. sigmaSD is only multiplied by a factor of two, to reflect that different employees could introduce higher variability, whereas the Japanese sample came from a single visit and likely a single person.

#Prior with low certainty
nzModel.vague <- BESTmcmc(nzData, 
                          priors = list(muM=0, 
                                        muSD=100, 
                                        sigmaMode=5, 
                                        sigmaSD=100))

#Prior with our assumed distribution
nzModel.prior <- BESTmcmc(nzData, 
                          priors = list(muM=0, 
                                        muSD=5, 
                                        sigmaMode=5, 
                                        sigmaSD=5))

#Prior using posterior from previous data.
#Uncertainty increased to account for hypothesised bias
nzModel.japPrior <- BESTmcmc(nzData,
                             priors = list(muM=mean(priorModel.prior$mu),
                                           muSD=sd(priorModel.prior$mu)*10,
                                           sigmaMode=summary(priorModel.prior)[2,3],  #mode of sigma
                                           sigmaSD=sd(priorModel.prior$sigma)*2))

Our table output shows the uncertainty around sigma is vastly reduced with the data-informed prior; that is, certainty about the variance in our New Zealand data has been borrowed from the previous data. We also see that the informed prior pulls mu down towards our prior mean of zero. It was interesting to see how our stricter expectations in the prior affect the posterior.

We can see across all models that a mean less than zero is quite unlikely, and with the data-informed prior, which borrows certainty for sigma from the previous data, our credible interval (HDI) doesn't even include zero. Further, there is only a 0.4% chance of the mean being less than zero given the data.

                 mean      HDIlo     HDIup
Vague prior
  mu        18.282054  12.781038  23.96057
  sigma      6.022361   1.711460  11.93653
Informed prior
  mu        13.464925   5.684140  19.62233
  sigma      7.274326   2.258868  14.23917
Data informed prior (DI prior)
  mu        18.028217  12.281887  23.73467
  sigma      6.451253   2.748194  11.02815
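
The 0.4% figure quoted above can be checked directly against the posterior samples:

#Proportion of posterior samples with a mean below serving size
mean(nzModel.japPrior$mu < 0)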

If we compare the posterior distribution of mu to our suspected mean (0g) and standard deviation (5g), there is only a 3% overlap in the distributions. If this effect holds with a larger sample then it's good news, although if for some reason you count calories and eat at McDonald's, it could be you consume more calories than you think.

postPriorOverlap(nzModel.japPrior$mu, dnorm, mean=0, sd=5)

We can also look at our standard deviation to confirm whether we are in the right ballpark with an estimated 5g of variation. The plot below shows that the mode of sigma was 5.38 and that there is around a 70% chance that sigma is greater than 5g. Not too shabby.

plot(nzModel.japPrior, which = "sd", compVal = 5)
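
Both numbers in that plot can also be recovered from the samples themselves; here is a minimal sketch, using a kernel density estimate to approximate the mode:

#Chance that sigma exceeds our assumed 5g
mean(nzModel.japPrior$sigma > 5)

#Approximate posterior mode of sigma via a kernel density estimate
d <- density(nzModel.japPrior$sigma)
d$x[which.max(d$y)]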

As one final check, I wanted to know how the posterior distributions of mu and sigma stack up between the three models, by calculating the overlap of each parameter (a sketch of this calculation follows the table). No surprise that the largest overlap for mu is between the vague prior and the data-informed (DI) prior. This suggests that the data-informed prior was more representative of the true distribution than our vague and informed priors.

                                 mu   sigma
DI prior ~ Informed prior      0.47    0.84
DI prior ~ Vague prior         0.95    0.82
Informed prior ~ Vague prior   0.43    0.80
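
To my understanding, postPriorOverlap() also accepts a vector of samples in place of a density function, so (assuming that holds) each pairwise overlap can be computed from two sets of posterior draws, e.g.:

#Overlap of the mu and sigma posteriors between the DI-prior and vague-prior models
#(assumes postPriorOverlap accepts a sample vector as its second argument)
postPriorOverlap(nzModel.japPrior$mu, nzModel.vague$mu)
postPriorOverlap(nzModel.japPrior$sigma, nzModel.vague$sigma)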

Conclusions

  • The mean of the Japanese/UK data was vastly different to the NZ data.

  • There is less than a 1% chance that the average weight of large McDonald's fries from the Japanese data is the 170g serving size in Japan.

  • Our samples benefit hugely from the increased certainty of sigma from the previous data.

  • It is possible the standard deviation of fry weight is larger than we estimate in our prior, with a 70% chance of being greater than 5g.

  • There is less than a 1% chance that the average weight of large McDonald's fries is the estimated 120g or less.


  1. To check the MCMC models converged properly I checked that Rhat was close to one and that the effective sample size (n.eff) was reasonably large. You can bring these numbers up by typing your BESTmcmc object into the console, and the documentation on these measures is quite good.