Big Data – The theory of everything vs experimental designs

In big data, market research on May 23, 2013 by sdobney

For those in the research world, Big Data must be the biggest catchphrase of the moment. Marketing gurus are rushing to tell us that we can have so much data from observations and conversations online that we’ll never need to run a market research survey again. The data is out there; we just need to build bigger and bigger models and we’ll know everything about everyone. We won’t need research. Of course I’m exaggerating, but there is a sense being fostered that with Big Data we will have the ultimate theory of everything about people and how they buy. We, at the experimental end of the scale, disagree.

The biggest problem with the Big Data nirvana viewpoint is that it implies that the data we have contains all the information that we will need. This is sort of a frequentist viewpoint: with lots of data, we know the world. We can build models and theories from the data and we will know everything.

The problem, though, is that Big Data is historic: we only have data about things that existed or happened in the past. It may be the near past, eg tweets about a live event, but what we know about what happened doesn’t necessarily tell us what will happen in different circumstances, with different products or services. Take that tweet about a live event: it tells us how people react after the event. But if we are trying to plan and build a live event for the future, and so need to do all the hard work to get the event up and running, we need ideas about how people will react to something that doesn’t yet exist. We don’t have tweets about the future.

The second issue is that Big Data can hold hidden biases. Market research became established because George Gallup showed that an unbiased small sample (50,000) beat a biased large sample (2 million) when forecasting the outcome of the US elections in 1936. Lots of data doesn’t make that data representative. In work we’ve done, particularly for mixed B2B and consumer products, we’ve noticed that blogs and discussions tend to reflect the views of the noisy over the wealthy. In particular, the people (often in big companies) who spend the most money are the least likely to be commenting on or discussing the products in question. And when we do representative research, we find these bigger customers often have specific needs that may not crop up directly in the discussions, or occur so rarely that they are not well captured at an aggregate or summary level.

In fact, this cuts to the chase of what Big Data is really useful for. If we’re looking for widespread views and opinions, these are easily identified. Simple sampling theory tells us we don’t need to capture every piece of Big Data, just a representative sample, to get the broad picture. However, what Big Data can provide for us are the nuggets of gold, the a-ha moments among the outliers: the rare moments of real insight. But that means panning through a lot of comments, not for what they have in common, but to find things that are different, and then being able to see value in this difference.
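As a rough illustration of the sampling point, here is a minimal sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random (ie representative) sample and the usual normal approximation; the function name and the sample sizes are ours, purely for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes only; p=0.5 is the worst case for the margin.
for n in (100, 1_000, 10_000, 1_000_000):
    moe = margin_of_error(n) * 100
    print(f"n={n:>9,}: +/- {moe:.2f} percentage points")
```

Under these assumptions a sample of 1,000 already gets to within about 3 percentage points, and each tenfold increase in sample size only shrinks the margin by a factor of about three. None of this helps if the sample is biased, which is the Gallup point.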

Essentially, Big Data gives us a finer resolution of the issues and problems we are looking at. We should be able to identify smaller groups, imaginative thinkers and thought leaders, and get more information within the detail. It is not just the general picture that 5% were dissatisfied with the staff in store, but the detail that two members of staff were rude, in one case the cash register broke down, and in another a member of staff was sleeping in the changing rooms. The detail is important.

The second benefit of Big Data comes from seeing it not as a pool to be dipped into, modelled and splashed about in, but as a stream of information (our company brand is built around the metaphor of knowledge as a flow of water). In a changing market, the information from Big Data from last year is already out of date. Your market has new competitors, new marketing campaigns, new information and new products. The way to use Big Data is then to see it as a flow, and see if you can perturb the flow. In other words, run tests and experiments.

Experiments are easy to think of in an online advertising campaign, where the results are clicks and responses that can be measured. But we can also use other forms of Big Data as response measures. Tweets, social conversations – can we influence these with marketing and activities? More to the point, we want to make small-scale tests to tune and optimise what we are doing. We need resolution, and Big Data gives us the resolution to try things small before launching large later. Unlike a frequentist view, where we need a Big Analysis followed by a Big Presentation and a Big Strategy from Big Data before we take any action, the experimentalist viewpoint is that we take tiny steps. We build straw men and try them small scale, then learn and improve. Our experimenter is less worried about representativeness and more worried about replicability. Much more Bayesian than Frequentist.
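To make the "tiny steps" idea concrete, here is a minimal sketch of how a small two-version test might be read in a Bayesian way. It is one common approach (a Beta-Binomial model with flat Beta(1, 1) priors), not a description of any particular tool, and the function name and all the counts are hypothetical.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A), assuming independent
    Beta(1, 1) priors on each response rate (an illustrative choice)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each version is Beta(1 + successes, 1 + failures).
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if b > a:
            wins += 1
    return wins / draws

# Hypothetical small-scale test: 200 people see each version.
p = prob_b_beats_a(conv_a=18, n_a=200, conv_b=30, n_b=200)
print(f"Estimated probability that B outperforms A: {p:.2f}")
```

The output is a running degree of belief rather than a one-off verdict. After the next small test, the same calculation can be repeated with the updated counts, which is the learn-and-improve loop described above.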

It doesn’t mean there is no place for a frequentist view, but the bigger the dataset the more chance of finding spurious correlations and the more difficulty with developing an overarching viewpoint. We have experience from large survey studies and cohort studies (eg TGI, Food survey, 1951 medical cohort study) that the amount of potential analysis increases exponentially with the amount of data. Unless we know what we are looking for, we just end up with a slush of data – all potentially interesting, but without being able to say what is important.
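A back-of-envelope sketch of the spurious-correlation problem: it assumes, purely for illustration, that none of the variables are genuinely related and that every pair is tested at the conventional 5% significance level. The function name and the variable counts are ours.

```python
from math import comb

def expected_false_positives(n_variables, alpha=0.05):
    """Number of variable pairs, and the expected count of pairs that look
    'significant' by chance alone if no variables are truly related."""
    pairs = comb(n_variables, 2)
    return pairs, alpha * pairs

# Illustrative dataset widths only.
for k in (10, 100, 1000):
    pairs, spurious = expected_false_positives(k)
    print(f"{k:>5} variables: {pairs:>7,} pairs, "
          f"about {spurious:,.0f} spurious 'findings' expected")
```

Pairwise comparisons alone grow with the square of the number of variables, and the number of possible subsets of variables that could go into a model doubles with every variable added. Without a prior idea of what to look for, most of what turns up will be noise.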

