Statistics aren’t politically correct

There certainly are limitations to human decision making, and when seeded with human decisions, models don’t have the guile to hide the statistics of those decisions.  In this example, the AI made it statistically evident that Amazon’s hiring recommendations leaned heavily on gender – even going so far as to score women-only colleges as undesirable.

In addition, it sounds like the model Amazon built for recruitment wasn’t able to provide qualified candidates for positions, so I’m surprised it was rolled out to recruiters at all.  I know of a statistician who would testify in court from the 1970s through the 2000s about whether corporate decisions were statistically discriminatory.  The algorithm here is really no different in terms of how discrimination was measured, so it’s hard for me to believe this wasn’t caught before a recommendations engine was handed to recruiters.  Obviously, this is not the story that Amazon would like leaked under any circumstances.

Lessons to learn here:

  • When building models to mimic current human decisions, be aware that human discrimination might become part of the model.
  • Before rolling a model out to people who will use its recommendations, validate that what it bases those recommendations on fits your plans for the future.


Discrimination Issues with Digital Targeting

It’s not too surprising to me that digital targeting is starting to be labelled discriminatory.  When applied just to marketing messages and pushing specific brands, targeting by gender, race, or political affiliation has been accepted.  However, as digital targeting is used for purposes beyond marketing, it raises questions of egalitarian principles and, in this case, legality.  It’s a good reminder for those of us in the field to consider the ramifications of what we do as we start applying accepted marketing methods to other problems.

Really? Is it AI?

I found this article while planning for my upcoming panel session (tomorrow) on Artificial Intelligence in Marketing Analytics.  I have struggled so much with the term.  So many vendors and data scientists are saying that they are doing AI.  Recently, a friend posted that they wrote their first AI and it was only four lines of code.  Umm… that’s just not how it works.  (Sorry, friend, if you are reading this…)

I think about some of the analysis I was doing in 2006, and I believe if we’d had a PR or sales person attached to our team, they’d label it “AI” today.  At the time, I was doing pricing with a model whose data was refreshed constantly; the model itself was tweaked every few months by yours truly, and the suggested price changes were evaluated by a pricing manager.  Yet it was an ever-improving, ever-changing, fast model that made decisions without requiring human intervention for each one.  In 2006, AI wasn’t the trendy term and we’d never have thought to call it that, but…

Is it just me who thinks we have lowered the standard in what is meant by Artificial Intelligence?  (Here’s a dated article that I think helps capture my point.)


Does it really matter if you round your model results?

I worked with a fantastic marketing EVP who adored analytics and used it to drive her decision-making despite not being especially math-savvy.  She quickly discovered that not all insights were equally strong.  Due to differences in assumptions, measurement error, or dirtiness of data, some insights were much better than others.  She asked me to color-code the insights coming from my team with my assessment of how strong each one was.  I was honored by her trust in me to net out the confidence of each insight, but I also felt the responsibility of taking all the uncertainty of an analysis and communicating it with a single color.  It was challenging.

In recent years, I’ve recognized that I continue to do this communication – maybe not as directly with color coding, but with the rounding and format of my charts and insights. I try not to present data as more exact or precise than it actually is, and this helps set expectations for the viewers.

Take, for example, this picture of a speed limit sign from a downtown street.  It’s a bit silly – we all know that vehicle speeds aren’t measured that precisely.  And yet, when you do an analysis and show a number like 5.2435, the recipient thinks you have a very exact answer – and it doesn’t matter if elsewhere on the screen you say ‘results are accurate to +/- 1%’.  Rounding is a conscious choice, and the rounding you choose communicates your confidence in the precision of the answer.  To an analyst, model outputs are sometimes just numbers… but to a business person, there is meaning in whether you round to the nearest penny, dollar, or thousand dollars.  It communicates your certainty in your analysis.

Since realizing this, I have tried to always be intentional in my rounding and consider in graphs what I show and what it means. The better your communication, the better your chance to make an impact!
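One way to make rounding intentional rather than incidental is to round to significant figures instead of a fixed number of decimal places.  A minimal sketch in Python (the function name and the choice of two or three figures are my own, not from any particular tool):

```python
import math

def round_sig(x: float, sig: int = 2) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Shift the rounding position based on the magnitude of x.
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

# A raw output of 5.2435 with roughly +/- 1% error only supports 2-3 figures:
print(round_sig(5.2435, 2))   # 5.2
print(round_sig(5.2435, 3))   # 5.24
print(round_sig(1234.5, 2))   # 1200.0
```

Showing 5.2 instead of 5.2435 tells the reader exactly how much of the number they should trust.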

Chief Data/Analytics Officer

I’ve always believed that more companies should have a Chief Data/Analytics Officer.  Separate analytics departments, or the lack of a C-level title, can make analysts’ job as truth tellers more difficult: to keep a seat at the table, they are forced not only to navigate but to yield to the politics and interests of the C-levels in the company.  I’m excited to hear that more companies are moving in that direction.  Also, 70% of these new Chief Data/Analytics Officers have backgrounds in math.  I really want to go show this to the academic adviser who told me that a career in math was only useful for those who wanted to teach.  :)  Analytics is definitely integrating well into the business world these days.


Is third party data really that bad?

The research below raises the question of the value of third party data and its worth for marketing campaigns.  I’ve always been a fan of the data – even when it’s not perfect.  This just means one should 1) really evaluate whether third party data is the right approach for your business case and recognize its inaccuracies before jumping in, and 2) test it on a small scale with specific data providers to see if it’s money worth spending.

Third party data is often questioned for its value, and the statistic of 35% inaccuracy in determining whether a subject is male or female is frequently thrown around.  But where does that number come from, and how much bearing does it have on the accuracy of third party data overall?  In 2012, one anonymous ad tech exec told Digiday “the gender is wrong 30-35 percent of the time,” and that statistic has plagued the analytics market ever since.

Marketers are then commonly forced to ask: how well does that number really stand up, how inaccurate is third party data really, and is the price I am paying for it worth it?

During the Digiday Programmatic Summit in November 2016, Matt Rosenberg of ChoiceStream went over the pressure for scale that third-party data sellers are under – and how, in order to meet that need for scale, accuracy is often thrown to the wind.  “Advertisers need scale, and as a data vendor, if you can’t provide that, no one will buy your segment,” he said.

Rosenberg put it like this: “If you can get 300,000 people in a group with 95 percent confidence that they belong there, or 30 million people in a group with 60 percent confidence, well, it might not be such a hard decision to relax your model a bit, especially when no one is set up to audit you.”
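It’s worth doing the arithmetic on that trade-off.  Using the figures Rosenberg quotes (the back-of-the-envelope framing is mine):

```python
# Expected number of true segment members under each model
tight_model   = 300_000 * 0.95      # high-confidence, small segment
relaxed_model = 30_000_000 * 0.60   # relaxed model, huge segment

print(f"Tight model:   {tight_model:,.0f} true members")    # 285,000
print(f"Relaxed model: {relaxed_model:,.0f} true members")  # 18,000,000
```

The relaxed model delivers over 60x the reach in true members – but it also mislabels 12 million people, a cost the vendor rarely bears when no one audits the segments.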

In a study done by ChoiceStream, the company Rosenberg was once CMO of, it was found that a particular data vendor had identified 84 percent of users as both male and female, much higher than the traditional “35%” that is usually thrown around. While this could easily be seen as an outlier, ChoiceStream took the time to examine two vendors that were least likely to identify people as both male and female. By getting the third-party data internally from the vendors and syncing across data-sets, it was still found that about a third of the time the two vendors disagreed on what gender an individual was.
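ChoiceStream’s cross-vendor check – match users across two vendors and count how often their gender labels disagree – is simple to sketch.  The user IDs and labels below are hypothetical, just to show the shape of the calculation:

```python
# Hypothetical gender labels from two third-party vendors, keyed by user id.
vendor_a = {"u1": "F", "u2": "M", "u3": "F", "u4": "M", "u5": "F", "u6": "M"}
vendor_b = {"u1": "F", "u2": "F", "u3": "F", "u4": "M", "u5": "M", "u6": "M"}

# Only compare users that both vendors claim to know.
shared = vendor_a.keys() & vendor_b.keys()
disagreements = sum(1 for uid in shared if vendor_a[uid] != vendor_b[uid])
rate = disagreements / len(shared)

print(f"{disagreements}/{len(shared)} shared users disagree ({rate:.0%})")
```

Here two of the six shared users get conflicting labels – the same “about a third” disagreement the study found between its two most conservative vendors.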

Imperfect data leads to imperfect analysis

I’m a big believer in looking at data even when it’s imperfect to see if you can gain insights, since some data is better than none.  But it’s important to be realistic and treat imperfect data as an indicator to test rather than a TRUTH to build on.  I thought this article did a good job of pointing out several potential flaws that I’ve seen occur in my career.