Data assets or data liabilities?
A contrarian view on the value of raw data
We all discuss data as an asset, but is there a contrarian view?
Those who know me are familiar with how much I appreciate many of Nassim Taleb’s (Antifragile, The Black Swan, Fooled by Randomness) views, specifically that of Via Negativa. This approach argues that it is sometimes more productive to flip a situation to its opposite, since the negative view can be more actionable.
For example, instead of motivating a data science team, we should think about how not to demotivate them (and stop doing that).
How does this apply to data? Instead of looking at the myriad ways it can be valuable, let’s have a look at the situations where it can become a problem:
Data leads to management overhead: the more data we have, the more effort we need to spend managing it. For example, we need to establish who owns the data, what its lineage looks like, and who should have access to it.
Data leads to security problems: the more data we hold, the greater the chance that confidential data leaks.
Data leads to infrastructural stress: at some point, datasets can take up an exorbitant amount of space, especially in media formats (image, audio, video) and when they come from sensors. This can be expensive, especially if the data needs to be both accessible and safely stored.
So next time, before we decide to collect all possible data, let’s pause and reflect on this contrarian view.



And the cost of data literacy... getting your community to use the data and pull insights from it. The sheer number of different datasets that need to be evaluated as fit for use (or not) can be a barrier to entry for analysts.
I would add to infrastructural stress that it's not only storage capacity; it's also network bandwidth and compute power. E.g., imagine a backup job for all your stored data using compute-intensive compression. In the end, data liabilities lead to unnecessary infrastructure costs.