Dumping data results in data swamps, not data lakes. What does it take to harness it better?
Data lakes are, at their core, repository platforms that collate raw data in its native format and make it quickly accessible at the most granular level. When banks set out to build one, the most common approach is to simply ‘dump’ data into a shared repository and call it a data lake. This creates more problems than it solves, not unlike the murky water bodies the term borrows its name from. Heavily duplicated data sets, rampant inconsistencies, and untagged data create more confusion than convenience. Are there, then, some simple rules to follow that can make a data lake effective?
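To make the distinction concrete, one common discipline is to require that every data set landing in the lake be registered in a catalog with tags and a checksum, so duplicates and untagged data are caught at ingestion rather than discovered later. The sketch below is a hypothetical, minimal illustration of that idea; the `DataCatalog` class and its rules are assumptions for this example, not a reference to any specific product.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Metadata recorded for every data set admitted to the lake."""
    name: str
    source_system: str
    schema_tags: list
    checksum: str


class DataCatalog:
    """A toy catalog: data may enter the lake only via register(),
    which rejects exact duplicates and untagged data sets."""

    def __init__(self):
        self._entries = {}

    def register(self, name, raw_bytes, source_system, schema_tags):
        if not schema_tags:
            # Untagged data is what turns a lake into a swamp.
            raise ValueError("untagged data set rejected")
        checksum = hashlib.sha256(raw_bytes).hexdigest()
        for entry in self._entries.values():
            if entry.checksum == checksum:
                # Same bytes already registered under another name.
                raise ValueError(f"duplicate of {entry.name}")
        entry = CatalogEntry(name, source_system, schema_tags, checksum)
        self._entries[name] = entry
        return entry
```

With such a gate in place, "dumping" the same extract twice, or landing a file with no tags, fails loudly instead of silently polluting the repository.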