Bias in market research, social listening and Big Data.

In analysis, big data, customer insight, market research on July 5, 2013 by sdobney

Bias and the potential for bias is a fundamental concept in market research and market analysis. It’s not that we always want perfect data. In some circumstances we know we have to make do with data that has certain potential biases because there is no other practical alternative (and by practical we might include the expense of the alternatives). However, the emergence of Big Data and the ease of access to large internet panels has meant that bias has slipped down the market research agenda a little.

The gold standard of market research used to be a fully random sample with, in the UK at least, names picked from the electoral roll and face-to-face interviewers making repeated visits to try to interview the person selected. This was, not surprisingly, very expensive and only carried out for the most major projects and prestige government work. In the US, random digit dialling – RDD – was used to the same effect (at least among telephone users) but at a much lower cost.

Now much more reliance is placed on convenience samples of people who have signed up to do surveys in the large internet panels, giving rise to issues like ‘professional’ respondents. Even here, though, internet-based samples (with suitable weighting) have proven to be accurate at predicting the outcome of elections; better even than RDD telephone samples, where interviewer effects have had an impact.
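As a rough illustration of what "suitable weighting" can mean, here is a minimal sketch of one common approach, post-stratification, where each respondent is weighted by how under- or over-represented their group is in the sample. The group names, population shares and answers are invented for the example, and real panel weighting is usually more elaborate than this.

```python
def poststratified_share(sample, population_shares):
    """Weighted share of 'yes' answers.

    sample: list of (group, answered_yes) tuples, one per respondent.
    population_shares: dict mapping group -> share of the target population.
    (Both are hypothetical inputs for illustration.)
    """
    # Count how many respondents fall into each group
    counts = {}
    for group, _ in sample:
        counts[group] = counts.get(group, 0) + 1
    n = len(sample)

    # Weight = population share divided by sample share for that group
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

    weighted_yes = sum(weights[g] for g, yes in sample if yes)
    weighted_total = sum(weights[g] for g, _ in sample)
    return weighted_yes / weighted_total


# Invented panel: younger people are over-represented (8 of 10 respondents)
sample = [("under_40", True)] * 6 + [("under_40", False)] * 2 \
       + [("over_40", True)] * 1 + [("over_40", False)] * 1
population = {"under_40": 0.5, "over_40": 0.5}

unweighted = sum(1 for _, yes in sample if yes) / len(sample)
weighted = poststratified_share(sample, population)
print(round(unweighted, 3), round(weighted, 3))
```

In this made-up panel the unweighted figure overstates agreement because the over-represented group is also the more favourable one. Weighting corrects for the groups you know about; it cannot correct for differences you have not measured.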

For those doing Social Media monitoring or tracking web-behaviour, these are obviously live data unblemished by sampling. So the data available should be getting more accurate. Shouldn’t it?

The problem is that in something like social media, the comments and voices that we can see tend to be those of the loudest customers, not necessarily the most important. Back in the mists of time (the 1990s) I worked in the IT industry and we did a lot of what would now be known as ‘social listening’ – listening in to newsgroups, opinion pieces and discussion forums to find out what our customers were thinking. What we discovered was that those who were making comments were quite different to our main customers. The people who bought the largest volume from us were business customers and never made public comments. Their issues were much more about management of technology than the technology itself. The online community was an important influencer community – it affected some of the opinions of our major buyers – but it did not represent the buyer community. In fact some of the noisiest bloggers were using the most out-of-date products – they had opinions despite not being buyers. Unless you can match viewpoints back to purchases it can be very difficult to say if you’re really listening to the market, or just to some small subset. That doesn’t mean you shouldn’t monitor and react to the opinions, just that the opinions often aren’t the whole picture.

Similarly with Big Data in the form of transactional information (eg web-tracking), the data the business has is information about people who use the business. For those who do not buy or do not visit (potential buyers) you have no information. It is very difficult to see what is turning people off if they just stop visiting. This has a parallel for customer satisfaction research, in that satisfaction research measures the opinions of customers. By becoming customers they have already shown themselves to be broadly satisfied with what you offer. The hidden group are those who didn’t buy, but as they are not customers you have no data. This leads to an occasional paradox in that failing businesses can see their customer satisfaction scores go up, not because they do anything better, but because dissatisfied customers go elsewhere.
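The satisfaction paradox is easy to reproduce with a toy calculation. The scores below are invented (a 1 to 10 scale), and the assumption that everyone scoring under 5 leaves is purely for illustration; nobody's opinion changes, yet the average goes up.

```python
def mean(scores):
    """Arithmetic mean of a list of satisfaction scores."""
    return sum(scores) / len(scores)


def drop_dissatisfied(scores, threshold):
    """Customers scoring below the threshold leave; the rest are unchanged."""
    return [s for s in scores if s >= threshold]


# Invented satisfaction scores for ten customers
before = [2, 3, 3, 5, 6, 7, 8, 8, 9, 9]
after = drop_dissatisfied(before, threshold=5)

print(len(before), mean(before))            # 10 customers, mean 6.0
print(len(after), round(mean(after), 2))    # 7 customers, mean 7.43
```

The score improves only because the base it is measured on has shrunk, which is why it helps to read satisfaction figures alongside customer numbers.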

In practice, researchers try to look at and understand the biases in the data sources they are using. Sometimes a customer list is the only source of viable contacts for an industry, so research can only take place using those lists (a good researcher will try to isolate a group close to ‘non-buyers’ as a comparison group to check for potential variations). An online panel may be the fastest and cheapest way of getting a consumer sample, but the researcher will want to screen for hidden professional respondents with check questions so as to ensure quality.

The other factor is to think about the type of decision that is to be taken. For many mass market situations, the choice is a simple go/no-go decision. If a majority of people like something then go with it. Small-scale bias only has an impact if the outcome is close to 50%:50%. If the results are more like 80%:20% the decision is much clearer. In fact for these types of simple decisions relatively small sample sizes can be used, or rough rules of thumb applied – if 5 out of 5 randomly picked people agree with something, then roughly 95% of the time a majority of people would agree, or if 8 out of 10 people agree with something, then roughly 95% of the time a majority of people would agree (obviously assuming random selection). These low N approaches can be used for rough and ready learning.
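One way to check these rules of thumb is a one-sided binomial calculation: if opinion were really split 50:50, how often would a random sample show a result at least this lopsided? Reading the rule this way is an assumption on my part, and the function name is made up for the sketch.

```python
from math import comb


def tail_probability(agree, n, p=0.5):
    """Chance that at least `agree` of `n` randomly picked people agree,
    if the true share agreeing in the population were `p`."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(agree, n + 1)
    )


print(tail_probability(5, 5))    # 0.03125
print(tail_probability(8, 10))   # 0.0546875
print(tail_probability(9, 10))   # 0.0107421875
```

So 5 out of 5 would happen only about 3% of the time in an evenly split population, and 8 out of 10 about 5.5% of the time, which is close to, though just short of, the conventional 95% level. None of this helps if the five or ten people were not randomly picked.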

For trend-based data, change is often the most important element and researchers can find themselves able to achieve a replicable convenience sample, but not one that is fully representative. In these cases change between sample waves may be sufficient to show or test for market changes without having a fully representative picture. Trend measurement is typically looking for small changes, so larger sample sizes really help. There are ways of rolling up samples across waves to increase the effective sample size if only small samples can be done wave-by-wave.
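One simple way of rolling up waves, sketched here with invented figures, is to pool each wave with the ones immediately before it and report the share on the pooled sample. The three-wave window and the function name are assumptions for the example rather than a standard.

```python
def rolled_up_shares(waves, window=3):
    """Pool each wave with the preceding ones to give a rolling share.

    waves: list of (agree_count, sample_size) per wave, in time order.
    Returns one (share, pooled_sample_size) tuple per full window.
    """
    results = []
    for end in range(window, len(waves) + 1):
        chunk = waves[end - window:end]
        agree = sum(a for a, _ in chunk)
        n = sum(size for _, size in chunk)
        results.append((agree / n, n))
    return results


# Invented tracker: four waves of 50 interviews each
waves = [(20, 50), (22, 50), (25, 50), (27, 50)]
for share, n in rolled_up_shares(waves):
    print(round(share, 3), n)
```

Each reported figure now rests on 150 interviews rather than 50. The price is that pooled figures react more slowly to a genuine change, because each one still includes older waves.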

In some areas, though, accuracy becomes important – for example market share measurement, pricing research, and research involving correlations or statistical modelling. In these situations it is very useful to have external check points to validate the sample data. Does the market volume implied by the sample seem reasonable given reported financial revenue figures, for instance?

The best way to approach bias is often to triangulate known factors and sources. Even a very good random sample will give an incorrect significant result 1 time in 20 (which is what 95% confidence means). And even with great statistics there are potential biases in the way the question was asked or answered, as human factors like social conditioning, anchoring or problems with recollection come into play too.

