Homework for my TA

Here are the comments I have made this week (due 28th October):




Thank you very much!

Does removing outliers make us liars?

There are many reasons why outliers occur in data: a technical fault with machinery, a flaw in the design of the experiment, a participant misunderstanding the instructions (or not really respecting the concept of the experiment), or a participant who simply produces extraordinary results. Although it is often accepted that outliers can be removed in certain circumstances, I am going to walk through some of these reasons and explain why I think outliers should not be excluded from the findings.

When there is a technical fault with machinery, the data obtained can be drastically affected. On these occasions, the results may not represent what the investigator is attempting to analyse. For example, a researcher may be interested in reaction times, but if a machine is faulty it might record times that are hugely different from the time it took the participant to react. In instances like this it is easy to assume that outliers should simply be cast away, as they are in fact not valid. However, I would argue that if the machine was faulty for some of the trials, then there is no certainty that it worked for the rest. Perhaps the other results only seem ‘normal’ because the inaccuracies of the machinery were less obvious. I feel that really there is no other way to proceed than to recollect the data from the participants (after ensuring the machinery is no longer faulty). Obviously, this would come at a great cost in time and effort for the researchers, but as scientists surely we need to make sure that the results are true. Here is a link http://pareonline.net/getvn.asp?v=9&n=6 that describes how, using certain statistical methods, you can instead keep your outliers without invalidating your results.
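The kind of statistical method that link describes can be illustrated with a toy example. Below is a minimal Python sketch (the reaction times are invented) showing how robust summaries such as the median, or a winsorised mean, keep the extreme value in the data set while limiting its pull on the average:

```python
import statistics

# Hypothetical reaction times in ms; 4000 stands in for a faulty recording
times = [310, 295, 330, 305, 320, 4000]

def winsorise(data, limit=1):
    """Replace the `limit` smallest and largest values with their
    nearest remaining neighbours instead of deleting them."""
    s = sorted(data)
    s[:limit] = [s[limit]] * limit
    s[-limit:] = [s[-limit - 1]] * limit
    return s

print(statistics.mean(times))              # dragged far upwards by the outlier
print(statistics.median(times))            # barely moved by it
print(statistics.mean(winsorise(times)))   # outlier kept, but capped
```

The point of the sketch is that the extreme trial is never thrown away; its influence is simply bounded, so the summary still reflects the bulk of the data.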

The next outlier issue I will discuss concerns participants. It is sometimes thought that participants can, purposefully or not, make mistakes when taking part in experiments. Although there are reasons for which people remove outliers from data, such as data entry mistakes, I believe that regardless of whether the participant did not understand or simply didn’t do the task to the best of their abilities, outliers should not be removed from the results of the investigation. I think that if the participants did not understand what was expected of them, there has been a fault in the methods of the experimenter (Rosenthal, 1994). The instructions should be written or delivered well enough for everyone to understand easily. This will ensure that participants are aware of what is expected of them, as ethical guidelines demand.

If participants have not completed the task suitably because of a lack of interest, it is widely accepted that the data is not valid and so should be dismissed. However, I think that this sometimes completely misses the point. We, as psychologists, attempt to investigate human behaviour. If we ask a person to do a task, no matter how they react, what gives us the right to say that their reaction is incorrect and so shouldn’t be counted? Any human reaction should be important to our full understanding of human behaviour, even if it is not the reaction we were looking or hoping for. We are not laboratory rats; we are humans.

This link http://pareonline.net/getvn.asp?v=9&n=6 also describes beautifully how sometimes the only valid scores look like outliers. For example, if you ask teenagers about their drug use, many of them might under-report their true use due to demand characteristics. The honest scores would then seem exceptionally high and might wrongly be regarded as outliers, which is another thing we must be careful to consider.

Then, finally, there is the occurrence of chance. Sometimes a person’s results are just very extreme compared to the rest of the sample. Whether this is because they come from a different population or because some other factor is affecting them, should we really exclude them just because their results are different? And if we can do this, where do we draw the line? We might one day end up with studies removing any and all pieces of data that do not agree with the theories, and that is not really science, is it?
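The “where do we draw the line” worry is real, because the common cut-offs are conventions rather than laws. As an illustration (with invented scores), here is the widely used 1.5 × IQR rule in Python; notice that the flagged value is simply “far from the middle half” of the data, which says nothing about whether it is an error or a genuinely extreme person:

```python
import statistics

scores = [12, 15, 14, 13, 16, 15, 14, 41]  # invented data; 41 is extreme

# Quartiles via the statistics module (n=4 gives Q1, median, Q3)
q = statistics.quantiles(scores, n=4)
q1, q3 = q[0], q[2]
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the fences gets flagged as a potential outlier
flagged = [x for x in scores if x < low or x > high]
print(flagged)
```

Changing the multiplier from 1.5 to, say, 3 would change who gets flagged, which is exactly the arbitrariness the paragraph above is worried about.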


Rosenthal, R. (1994). Science and ethics in conducting, analysing, and reporting psychological research. Psychological Science, 5(3), 127-134. doi: 10.1111/j.1467-9280.1994.tb00646.x

Do you NEED statistics to understand your data?

For the duration of this blog I would like you to assume that when I say statistics I mean the level of statistics above what is learnt at GCSE level: the level above the basic mean, median and mode that my maths teacher desperately tried to yell across the room, over the cries of “I don’t GET IT”. With that disclaimer over, I will try to answer a question that I am sure has passed through many a mind of a first-year psychology undergraduate during a particularly distressing lecture about analysis of variance: do you NEED statistics to understand your data?

With fairly basic levels of numeracy it is easy enough to work out the mean or the median of a set of data. Then, if you have collected this data from more than one group or condition, you can draw conclusions about the different groups and voilà… you have interpreted data on a basic level. You can suggest that Group A could be happier/smarter/faster than Group B because its average was much higher. But that is where you will begin to get stuck if you have no idea of even the basic workings of more advanced statistics. You will not be able to say whether the difference was significant, and therefore even relevant, or whether it could have arisen just by chance. Here is a link http://www.skorks.com/2010/03/you-dont-need-math-skills-to-be-a-good-developer-but-you-do-need-them-to-be-a-great-one/ to a lovely article that explains how statistics/maths is crucial if you want to have an interesting career (granted, the article is talking about software development, but I think the point is the same for almost all professions). If you want to do research then, OK, maybe you don’t need statistics, but if you want to find your research interesting and become a great researcher then you will definitely have to do much more than comparing two means with each other time after time!
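To make the “significant or just chance” point concrete, here is a small Python sketch with invented happiness scores for two groups. Comparing the means alone is the GCSE-level step; the independent-samples t statistic (compared against a critical value from a t-table) is what actually tells you whether the gap is bigger than chance would plausibly produce:

```python
import statistics

# Invented happiness scores for two groups
group_a = [7, 8, 6, 9, 7, 8]
group_b = [5, 6, 5, 7, 6, 5]

def t_statistic(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

t = t_statistic(group_a, group_b)
print(round(t, 2))
# Compare |t| against a critical value from a t-table
# (about 2.23 for 10 degrees of freedom at p = .05, two-tailed).
```

The means differ either way; only the t statistic lets you say whether that difference is worth writing about.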

So maybe we do need statistics. But then there is the beautiful invention of the computer! The brain that was born from numbers and science, and so has no problem crunching numbers all day to give us that glorious printed page that tells us quite (almost) plainly whether the data we have given it is in any way worth writing about. So if the computer is going to tell us the answer anyway, why should we bother trying to shove all of this information about calculations and workings-out into our brains when the computer can spit the information at us and we can just write it down? I believe the answer is that, as potential scientists, surely we want to really understand our work and research. When I am looking at data, especially if it is my own, I want to know WHY individual differences between the people in the study can make such a difference (and why the computer thinks it can work this out without knowing them). I want to know WHY anomalies can occur and WHY we can sometimes exclude them (and HOW we do this). When creating experiments I want to know WHY three people normally isn’t sufficient whereas thirty is quite good. But maybe that is just me…
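The why-three-isn’t-enough question actually has a one-formula answer: the standard error of the mean is the standard deviation divided by √n, so your estimate of the average tightens as you add people. A quick sketch, assuming a standard deviation of 15 (roughly that of IQ scores):

```python
import math

sd = 15  # assumed population standard deviation (IQ-like scale)

# Standard error of the mean shrinks with the square root of n
for n in (3, 30, 300):
    se = sd / math.sqrt(n)
    print(n, round(se, 2))
```

With three people the average wobbles by almost nine points either way; with thirty it wobbles by under three, which is why the bigger study is so much more convincing.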

Even if you aren’t really that bothered about finding your own research interesting or understanding why your computer knows that your data isn’t significant, I bet that you want to be able to tell when you are being lied to. If you read a research report and something about it just doesn’t feel right, you want to be able to go and find out why you feel like that and understand the data for yourself. Here is an article that tells us about statistics in the world today and how, without statistics, we won’t know when we are being lied to: http://www.wired.com/magazine/2010/04/st_thompson_statistics/. Now, this obviously is not going to happen very often, but how often has some news programme or newspaper reported some statistic that you just can’t believe? With statistics you have the tools to go and check their facts and see whether what they are reporting is reported in a fully decent manner or whether they have left out some specific detail that changes everything. Even the BBC’s Science & Environment news page has been thought to contain flaws that mean the full story is never really told to its audience (http://www.guardian.co.uk/science/the-lay-scientist/2010/sep/28/science-journalism-spoof). Although, in the case of the BBC, it seems that it is their desire to remain COMPLETELY impartial that is leading to this problem.

So there we have it. Technically you could go and try to complete research without statistics, but that research could well be quite dull, and you might not be able to answer questions about the variance between your participants because it was your computer that worked it out. You might not be able to tell if there is a mistake in your own work or the work of others. Is that the kind of scientist you would want to be?
