“Democratizing” data and insight, open access, data transparency: it goes by a lot of names and many people want it, but the correct decision for you depends on your org.
I’m only going into the layers of trust issues today, and will leave the technical and practical considerations for another day.
Every organization needs to make a decision, whether implicitly or explicitly, about who has access to its data. The “correct” answer for your org will be unique to your org at that time. As Data Scientists, we’re often seen as the stewards of good data practice and are often asked to lead the way in building a data-driven culture. But how do we decide what to do?
Why does “democratizing data” come up so often?
The basic desire to share data broadly and effectively stems from how information naturally tends to stay within local parts of an organization. You have the finance team looking at their data, the customer support team looking at theirs, and the product teams looking at their product metrics. Unless there are active efforts to push or pull information across organizational boundaries, things tend to stay local just from inertia.
On the surface, the calculus seems simple: people all across an organization need to make decisions. Good evidence and data are needed to make good decisions, and such data comes from all across the organization. Ergo, we should make sure decision makers can access the data they need to maximize their effectiveness.
And yet, working in a number of tech companies over the years, I’ve witnessed different approaches to how companies handle data access. At one extreme, you find companies that are very hesitant to give workers access to much information and data at all. At the other extreme, there are companies that give access to practically everything with few exceptions. Where you stand on this spectrum seems to depend on your specific organizational realities.
Many Different Views, Thanks DS-Twitter!
I was recently curious about where different data scientists felt their companies are/should be on the data democratization spectrum, and so I asked some people on Twitter.
There were so many good responses that I’m gonna try to summarize and highlight examples along the way.
Different Kinds of Trust
When synthesizing my own experiences with the wide range of feedback I’ve gotten, it seems clear to me that the question of how widely data is shared within a company revolves around a few different sorts of trust. The unique combination of where you stand on each axis determines what you share, and changing what you share implies changing where you stand.
#1 — Access
We can start with the most obvious axis: who can access what data to begin with. There are many external constraints here. For example, compliance with regulations may tightly restrict who can access certain kinds of data: healthcare data, personal data, and credit card data all fall into this category. These constraints are generally not negotiable and must be navigated.
Then there is sensitive data like company secrets, plans, etc. You can share these out to the extent that you trust the people who have access. Some organizations flat out don’t trust certain levels of employees with anything remotely related to such information, others share practically everything. It’s the result of the risk trade-offs the organization is willing to make.
But beyond those sensitive bits of data, there’s a very wide array of data that could potentially help someone do their job better without posing a major risk. I’d argue that much of a company’s data falls under this scope. My observation is that companies that tend to hire data scientists and analysts already have a certain level of desire to be data driven at a company-wide level, so access restrictions tend to be on the more open side of the spectrum.
Against this backdrop, I don’t find it particularly surprising that many data scientists seem to lean towards giving people more access to data, where practical. It’s the “where practical” part that hits upon the next trust axis.
#2 — Individual data literacy
A very common concern amongst management when data is accessible company wide is “what if someone takes data, does an incorrect analysis, and makes poor decisions based off it? Where are the guard rails?” This is fundamentally a question about the state of data literacy within the company. There are two classes of people to worry about here: people who are bad actors, and people who simply don’t have the skills needed to do an analysis.
There is a real worry that bad actors can use bad data to justify bad, even horrible, decisions without safeguards. However, a good number of people chimed in that this risk of dubious decision making is always present, regardless of whether data is available or not. Having bad actors within an organization is a larger organizational issue, not merely a data one.
Instead, many more people, myself included, treat having people who don’t have the skills to do analysis as an opportunity for education and training. People, even those who have no data credentials, are encouraged to engage with data. The caveat is that when they want to use data to make a decision, they should have their work checked by someone who is more skilled with data. It gives the data scientist/analyst/etc a chance to educate on better methods, while also having more eyes on the data, looking for opportunities and inconsistencies.
Strong process and a culture of learning data literacy can act as the guardrails. Yes, it’s okay for a layperson to look at data and use it to figure out how their part of the organization works, especially if the data has been cleaned and curated to some degree. But only as long as they’re aware that they are not experts at using data, and there is a support network involved. It’s a “Trust, but verify” kind of mentality that works in many practical situations.
If you take up this sort of model, the time, resources, and interest (not everyone wants to teach) that the actual data scientists can devote to it will largely determine how far you can trust people along this axis.
#3 — Organization’s Decision Making
Finally, how your organization delegates trust in decision making has a strong effect on how efforts of democratizing data play out.
A company organized to be rigidly top-down with centralized decision and power structures uses data and information differently than one that is strongly bottom-up where decision-making power is diffuse.
While the bottom-up organizations that are common in tech tend, on average, to be more inclined to share data widely, it’s not necessarily a given. Both kinds of organization might actually want data widely available to everyone for different reasons, but the exact form it takes will be very different to suit the existing structure. The structures within the org may make it so that people don’t seek out data to help their decision making. Incentives built into the org can bias people to be risk averse, or to become risk seekers. It all depends, and you have to take a long hard look at things to come to a decision on where you want to stand.
Data as a force for change
People tend to have a notion that an organization “is ready” for data democratization, like it’s a vague property of the org. I’d argue that since there’s a very big leap to go from “everyone having data” to “everyone having insight”, giving people access to data is the first step in actually pushing an org to be more data driven.
The more people are familiar with interacting with data, asking questions using data, and leveraging the skills that analysts and data scientists can provide, the easier it is for them to adopt it into their work.
Thanks to all the great folk who made lots of useful comments on Twitter about this topic. My brain still hurts trying to synthesize the many viewpoints.
I also want to specifically call out this Data Landscape Manifesto that Scout24 referred me to, which is an awesome example of one organization laying out where they stand on many of these issues. It works for them; obviously, YMMV.