Data Warehouse

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Data Modeling

Conceptual Modeling provides a high level of abstraction in describing multidimensional data, and is aimed at achieving independence of implementation issues. It is widely recognised to be the necessary foundation for building a database that is well-documented and that fully satisfies user requirements; usually, it relies on a graphical notation that facilitates writing, understanding, and managing conceptual schemata by both designers and business users.
The main benefits of Conceptual Modeling can be summarised as follows:
• it gives designers and end-users a platform-independent, non-ambiguous, comprehensive picture of the Data Warehouse content
• it is 100% independent of the OLAP multidimensional engine chosen for deployment
• it enables effective communication between designers and end-users with the goal of formalizing more accurately the requirement specifications
• it decreases the overall complexity of design by breaking it into two distinct but inter-related phases
• it streamlines the Data Warehouse life-cycle by enabling logical design to be automated based on widely recognised best practices
• it provides clear and expressive design documentation, readable by both ICT and business people, which improves the overall system maintainability
• it provides a graphical environment for expressing OLAP queries in a more intuitive way
• it enables early testing of requirements based on the core workload expressed by users, thus reducing the probability of errors and misunderstanding
• it encourages self-service BI by helping business users understand the information content of their Data Warehouse
• it enables the computation of metrics, in order to assess the quality of design effectively

Architecture Design

Data warehouse architecture is the key factor in building a good data warehouse for your business. Choosing the most suitable data warehouse architecture is a critical task in data warehouse lifecycle.

Enterprise data warehouse architecture - the information from various source systems are consolidated into a central repository called an enterprise data warehouse. Data warehouse applications, such as reporting tools, query data from data marts instead of enterprise data warehouse directly.

Dimensional data warehouse architecture –data is brought from throughout the enterprise into a central place called a dimensional data warehouse. Like Inmon’s data warehouse architecture, the dimensional data warehouse also has the enterprise focus. In Kimball’s data warehouse architecture, the data mart is a subset of the tables linking together using star and snowflake schema. Unlike Inmon’s enterprise data warehouse architecture, analytic systems can access data directly from the dimensional data warehouse.

Data Transformation

Data transformation is the process of converting information or data from one format to another format which can reveal features hidden in the data that are not observable in their original form. We can transform the distribution of data to make it easier to see.. While the strategy is often thought of in terms of converting documents from one format to another, data transformations may also involve converting programs from one type of computer language to a different format in order to allow the program to run on a specific platform. The actual transformation may involve converting multiple data streams into a common format, or converting a single format into multiple different forms for use across a wide spectrum of platforms.

Data Quality & Cleaning

Poor data quality is a well-known problem in data warehouses that arises for a variety of reasons such as data entry errors and differences in data representation among data sources. For example, one source may use abbreviated state names while another source may use fully expanded state names. However, high quality data is essential for accurate data analysis. Data cleaning is the process of detecting and correcting errors and inconsistencies in data. Typical data cleaning tasks include record matching, deduplication, and column segmentation which often need logic that go beyond using traditional relational queries. This has led to development of utilities for data transformation and cleaning.

#DataWarehouse #數據倉庫

Comments are closed.