Big Data Bigger Flops – Lessons Learned from Big Data Projects

Natalie has been an analyst and designer on multiple big data projects, some successful and some not. In this talk, she’ll take a trip down memory lane through some of the NOT-so-successful ones, exploring what she learned from them and discussing what she has found big data projects need in order to succeed, as well as what to avoid. We’ll discuss visionary goals, prioritization, architecture, build versus buy, and how to educate the rest of your team on big data concepts.


Is third party data really that bad?

The research below raises questions about the value of third-party data and its worth for marketing campaigns. I’ve always been a fan of the data, even when it’s not perfect. Imperfection just means one should 1) carefully evaluate whether third-party data is the right approach for your business case and needs, recognizing its inaccuracies before jumping in, and 2) test it on a small scale with select data providers to see whether it’s worth the money.

Third-party data is so often questioned for its value, and the statistic that gender (male or female) is misidentified 35% of the time is frequently thrown around. But where does that number come from, and how much bearing does it have on the accuracy of third-party data? In 2012, an anonymous ad tech exec told Digiday that “the gender is wrong 30-35 percent of the time,” and that statistic has since spread through the analytics market like wildfire.

Marketers are then commonly forced to ask: How well does that number really hold up? How inaccurate is third-party data, really? Is the price I am paying for third-party data worth it?

During the Digiday Programmatic Summit in November 2016, Matt Rosenberg of ChoiceStream described the pressure for scale that third-party data sellers are under, and how accuracy is often thrown to the wind to meet it. “Advertisers need scale, and as a data vendor, if you can’t provide that, no one will buy your segment,” he said.

Rosenberg put it like this: “If you can get 300,000 people in a group with 95 percent confidence that they belong there, or 30 million people in a group with 60 percent confidence, well, it might not be such a hard decision to relax your model a bit, especially when no one is set up to audit you.”
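To make Rosenberg’s trade-off concrete, here is a small Python sketch that treats his “confidence” figure as the share of a segment’s members who truly belong there. That reading is an assumption (he may have meant something more model-specific), and the `Segment` class is hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size: int         # number of people the vendor puts in the segment
    precision: float  # assumed: share of members who truly belong (0.0 to 1.0)

    def expected_true_members(self) -> float:
        """People we expect to be correctly assigned to the segment."""
        return self.size * self.precision

    def expected_false_members(self) -> float:
        """People we expect to be wrongly assigned to the segment."""
        return self.size * (1 - self.precision)


# The two options from Rosenberg's quote.
small_accurate = Segment(size=300_000, precision=0.95)
large_loose = Segment(size=30_000_000, precision=0.60)

for name, seg in [("small, accurate", small_accurate), ("large, loose", large_loose)]:
    print(
        f"{name}: {seg.expected_true_members():,.0f} correct, "
        f"{seg.expected_false_members():,.0f} misassigned"
    )
```

Under that reading, the looser segment delivers about 63 times as many correctly assigned people (18,000,000 vs. 285,000) but 800 times as many misassigned ones (12,000,000 vs. 15,000), which helps explain why a vendor that no one audits might be tempted to relax its model.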

A ChoiceStream study (Rosenberg was once the company’s CMO) found that one data vendor had identified 84 percent of users as both male and female, far higher than the “35%” figure that is usually thrown around. Because that could easily be dismissed as an outlier, ChoiceStream also examined the two vendors least likely to identify people as both male and female. After bringing the third-party data in-house from those vendors and syncing the data sets, it found that the two vendors still disagreed on an individual’s gender about a third of the time.
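The two checks described above, how often a single vendor tags users as both genders and how often two vendors disagree once their data sets are synced, can be sketched in a few lines of Python. This is a minimal illustration, not ChoiceStream’s method: it assumes each vendor’s data arrives as a mapping from a shared user ID to the set of gender labels that vendor assigns, and the function names and sample data are hypothetical.

```python
from typing import Dict, Set

# Hypothetical format: the gender label(s) a vendor assigns to one user.
Labels = Set[str]


def share_labeled_both(vendor: Dict[str, Labels]) -> float:
    """Fraction of a vendor's users tagged as both male and female."""
    if not vendor:
        return 0.0
    both = sum(1 for labels in vendor.values() if {"male", "female"} <= labels)
    return both / len(vendor)


def disagreement_rate(vendor_a: Dict[str, Labels], vendor_b: Dict[str, Labels]) -> float:
    """Among users both vendors know and each labels with exactly one gender,
    the fraction where the two vendors disagree."""
    comparable = [
        uid
        for uid in vendor_a.keys() & vendor_b.keys()  # sync on shared user IDs
        if len(vendor_a[uid]) == 1 and len(vendor_b[uid]) == 1
    ]
    if not comparable:
        raise ValueError("no users can be compared across the two vendors")
    disagreements = sum(1 for uid in comparable if vendor_a[uid] != vendor_b[uid])
    return disagreements / len(comparable)


# Made-up sample data for illustration only.
vendor_a = {"u1": {"male"}, "u2": {"female"}, "u3": {"male", "female"}, "u4": {"female"}}
vendor_b = {"u1": {"female"}, "u2": {"female"}, "u4": {"female"}, "u5": {"male"}}

print(f"Vendor A tags {share_labeled_both(vendor_a):.0%} of its users as both genders")
print(f"The vendors disagree on {disagreement_rate(vendor_a, vendor_b):.0%} of comparable users")
```

Users that either vendor tags as both genders are left out of the disagreement rate, so the two measures don’t double-count the same problem.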

Imperfect data leads to imperfect analysis

I’m a big believer in looking at data even when it’s imperfect, because some data is better than nothing and it may still yield insights. But it’s important to be realistic and treat it as an indicator to test rather than a TRUTH to build on. I thought this article did a good job of pointing out several potential flaws that I’ve seen occur in my career.

The three-legged stool of data science

“Data Science is a three-legged stool that combines business acumen, data wrangling and analytics to create extreme value. Focusing on the hard science skills such as statistical methods is a common mistake when actually, developing the knowledge about a particular business and wrangling the relevant data are often the most important skills to bring to the table.”

In my experience, data science and advanced analytics projects most often fail because of a lack of business understanding or a lack of clean data. Business knowledge and data wrangling skills are often undervalued when searching for data scientists, which can diminish or eliminate the ROI of big data implementations and projects.

A Christmas carol created by an AI

I’m not sure that AIs are really ready to create art I can appreciate, but everything has to start somewhere. Here’s a first attempt by an AI (a recurrent neural network) to create a Christmas carol from an inspirational picture and over 100 hours of music. Give it a listen and see what you think.


Data usage without consumer knowledge

As someone who has used online customer data extensively in multiple roles and at multiple companies, I really believe most companies are respectful in their use of customer data and are trying to improve their customers’ lives. I tend to share my own data, when asked, with companies I respect and have a relationship with. Perhaps that is why it is especially difficult for me to hear that this extremely personal data was being taken without knowledge, consent, or request, and used for undefined purposes.

Ten reasons data projects fail

I found this to be a very blunt and honest warning about why data projects can fail. It’s a good (if pessimistic) read for anyone who is running or planning to run a data science project within their company.