
Data Lake


A data lake is a system or repository of data stored in its natural or raw format, usually object blobs or files. A data lake is usually a single store of data that includes raw copies of source system data, sensor data, social data, and so on, as well as transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video). A data lake can be established "on premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon, Microsoft, or Google).


A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or provides little value.


A data pit is an enigma. Like a data swamp, it provides little value; unlike a data swamp, however, the root cause of the problem is usually hard to identify. The raw data collected is typically accessible and contains the appropriate data points, yet it fails to yield meaningful insights for its intended application. A thorough understanding of how to act on the data by applying proper data science techniques is key to preventing the data pit enigma.


Source: Wikipedia




