Arthur C. Clarke, best remembered for his screenwriting role in 2001: A Space Odyssey, is also the source of a famous quotation:
Any sufficiently advanced technology is indistinguishable from magic.
We can think of data science as an advanced technology, drawing heavily on statistics, that has developed in the past decade to exploit the huge gains in power and reduction in cost of computing. As with any rapidly developing technology, its approaches, methods, results and usefulness are all over the map. It would be hard to find an executive who has not heard or read glowing tributes to many of the successful applications of data science.
Accounts of failure and disappointment are harder to come by, if for no other reason than that they don’t make good press and that there has been a shortage of spectacular damage to the unfortunate enterprises that couldn’t find the secret sauce.
It’s been 20 years since we were last enthralled by the vast new horizons in technology that appeared in the dotcom boom. Some of that potential has materialized into vast empires; most of the effort disappeared in stories we didn’t hear about until the dotbust.
Relatively few in executive management are data scientists, nor do they need to be. But they should know enough to understand the pitfalls.
Everyone understands cherry picking (aka selection bias). The executive needs to keep that in mind when reading the latest success story.
Every public company has an audit committee of the board of directors, a CFO and a CEO with some combination of finance education, experience and acumen. They worry, rightly, about the data stream measured in dollars and cents. How is it collected? How recorded? How reconciled? How tested? How accurate?
Data governance has been an evolving corporate tool over the past several years, recognizing the hazards of not knowing what you actually do know. I’ll round out Donald Rumsfeld’s known knowns, known unknowns and unknown unknowns with the unknown knowns. It can get ugly when a court decides that a corporate defendant had constructive knowledge of a crucial adverse fact, because there it was, right in its own business records, but buried.
Data science, of course, involves data. It may be corporate data, commissioned data or public data. As with all the rest of a company’s data, the danger exists that if you know it, you own it. Where did it come from? When? How was it accessed? Who accessed it? How was it transformed? Was it sampled? Were the samples retained? What version control was applied? What tests were performed? Aren’t those test statistics data, too?
Money is fungible; data is not. Does the difference argue for less stringent or more stringent internal controls on the application of data science? I don’t have the answer, and I’m not privy to any internal debates. I do know of at least one substantial organization that is struggling to harmonize the Sarbanes-Oxley processes of a recent acquisition with those of the new parent.
We may see Chief Data Science Officers or even audit committee members with data science literacy. The Industrial Age had well-developed protocols for R&D. Data science hasn’t yet matured to the point where it’s susceptible to that level of discipline. At some point, it will need to be.