The following benefits underscore the importance of a central data repository and are regularly realized in real-world implementations.
One location = improved analysis
One of the biggest benefits of a central data repository is that it puts the entire data landscape in one place. With the data collected together, it is easier to create analytical reports that cover each individual source as well as multiple sources at once. These reports provide a 360-degree view of the data landscape, identifying conditions that cause problems such as duplicates, missing or invalid values, and other integrity issues. Without a data repository, cross-system analysis is difficult, leaving many issues undetected and allowing faulty, duplicate, or orphaned data into the target system.
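As a minimal sketch of this kind of cross-source analysis, the snippet below profiles records pulled from two hypothetical legacy extracts (the source names, fields, and values are all illustrative, not from any real system), flagging duplicates that span systems and counting missing values per field:

```python
from collections import Counter

# Hypothetical extracts from two legacy systems, loaded into one repository.
crm_records = [
    {"id": "C1", "email": "ann@example.com", "phone": "555-0100"},
    {"id": "C2", "email": "bob@example.com", "phone": None},
]
erp_records = [
    {"id": "E9", "email": "ann@example.com", "phone": "555-0100"},
    {"id": "E7", "email": None, "phone": "555-0199"},
]

def profile(records):
    """Report duplicate emails across sources and missing values per field."""
    emails = Counter(r["email"] for r in records if r["email"])
    duplicates = {e for e, n in emails.items() if n > 1}
    missing = {
        field: sum(1 for r in records if not r[field])
        for field in ("email", "phone")
    }
    return duplicates, missing

# Profiling the combined landscape finds the cross-system duplicate, which
# neither source would reveal on its own.
dupes, missing = profile(crm_records + erp_records)
print(dupes)
print(missing)
```

The point of the sketch is that the duplicate email only becomes visible once both extracts sit side by side in the same repository.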
No impact on production
The ability to create and run analytical reports across the landscape adds considerable value to the project. Without a repository, analysis would have to be performed in test or production environments. If analysis is performed in test environments, the data will be out of date and out of sync, and if there is an attempt to cleanse the data, monitoring the status of data quality issues becomes difficult. If the analysis is performed in production, the impact on ongoing processing is a concern, and actions must be squeezed into limited windows. Because data can be extracted as needed, the repository avoids both issues: the extraction itself has minimal impact on the production databases.
Predicting transformation results
Predicting the outcome of a conversion without actually populating the new application is critical to knowing that the data is ready before the switch. A central data repository makes this job easier. With a repository in place, you can import configuration data, table structures, and metadata from the target system and run mini-conversion tests to ensure there are no missing cross-reference values, customizations, or other conversion errors. This predictability becomes even more important in cloud implementations, where control of the environment is limited and restoring backups or rolling back is difficult or impossible. Because these mini-test cycles are frequent, problems are identified and resolved early in the project. By the time a formal test cycle or go-live arrives, the project team has high confidence in the data.
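One common mini-conversion check is verifying that every legacy value has an entry in the target system's cross-reference configuration. The sketch below assumes a hypothetical status-code mapping imported from the target; the table and field names are illustrative only:

```python
# Hypothetical cross-reference table imported from the target system's
# configuration: legacy status codes -> target status codes.
status_xref = {"A": "ACTIVE", "I": "INACTIVE", "P": "PENDING"}

# Point-in-time snapshot of legacy records held in the repository.
legacy_rows = [
    {"id": 1, "status": "A"},
    {"id": 2, "status": "X"},   # no mapping in the target configuration
    {"id": 3, "status": "P"},
]

def missing_xref(rows, xref):
    """Return the ids of rows whose legacy value has no target
    cross-reference, so the gap can be fixed before a formal test
    cycle or go-live."""
    return [r["id"] for r in rows if r["status"] not in xref]

# Record 2 would fail conversion; catching it here keeps the surprise
# out of the formal test cycle.
gaps = missing_xref(legacy_rows, status_xref)
print(gaps)
```

Running this against each repository snapshot turns "will the conversion succeed?" into a question that can be answered before the switch.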
Cleanse data independently of production
Data quality is a primary complaint in nearly every project, and meeting data cleansing requirements often means incorporating additional data sources and cleansing spreadsheets into the process. These sources are typically populated by the business to replace the data that needs to be cleansed. A data repository provides a method to validate these additional cleansing sources and incorporate them into the transformation and cleansing processes.
Consistently transform and enrich multiple data sources
Legacy and target applications were developed by different companies, with different business requirements, in different eras of technology. A byproduct of this is that the new system will require data that does not currently exist, and that new data has to come from somewhere. Sometimes it can be calculated by the conversion programs themselves. Perhaps a county code is needed in the new system but not stored in the legacy systems; the conversion programs can derive county values as they convert the data.
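The county example can be sketched as a small enrichment step inside a conversion routine. The ZIP-to-county lookup and the record layout below are hypothetical, stand-ins for whatever reference data a real project would load into the repository:

```python
# Hypothetical ZIP-to-county reference data loaded into the repository;
# the legacy systems store only the ZIP code.
zip_to_county = {"30303": "Fulton", "30060": "Cobb"}

def convert(record):
    """Convert a legacy address record, deriving the county value the
    target system requires but the legacy system never stored."""
    out = dict(record)
    # Flag unmapped ZIPs explicitly so they surface in analysis reports
    # instead of silently loading as blanks.
    out["county"] = zip_to_county.get(record["zip"], "UNKNOWN")
    return out

converted = convert({"id": 42, "zip": "30303"})
print(converted)
```

Because the enrichment runs inside the repository, every source system gets the same derivation logic, which is the consistency the heading above refers to.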
Optimize data migration testing
Data is constantly changing, and that constant state of change makes it difficult to validate and test data conversion programs. Because data in a repository is captured at a specific point in time, changes can be reviewed and tested without worrying about timing issues. Without a data repository, checking record counts and tracking various data issues becomes a futile effort, as these metrics change minute by minute.
Simplify post-conversion reconciliation
An important part of every data conversion project is post-conversion reconciliation. There are several ways to approach reconciliation, and a repository speeds up the effort. Once the data is migrated, it can be extracted back into the repository alongside the original source data. Because the repository holds the data exactly as it was at the moment of conversion, the migration can be validated without worrying about post-conversion updates.
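A minimal reconciliation sketch, assuming a frozen source snapshot and a post-conversion target extract keyed by record id (the balances and ids below are illustrative):

```python
# Source snapshot frozen in the repository at conversion time, and a
# post-conversion extract from the target system (id -> balance).
source_snapshot = {"C1": 100.0, "C2": 250.0, "C3": 75.0}
target_extract = {"C1": 100.0, "C2": 255.0}   # C3 was never loaded

def reconcile(source, target):
    """Compare the frozen source snapshot against the migrated data,
    reporting records that were dropped or altered by the migration."""
    dropped = sorted(source.keys() - target.keys())
    altered = sorted(k for k in source.keys() & target.keys()
                     if source[k] != target[k])
    return {"dropped": dropped, "altered": altered}

result = reconcile(source_snapshot, target_extract)
print(result)
```

Because the comparison baseline is the snapshot, a legitimate business update made in the target five minutes after go-live cannot masquerade as a migration defect.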
Focus on exceptions
A powerful validation method is to identify the exceptions, or differences, that occur from test cycle to test cycle. A central data repository lets you capture data at any point in time. Once a snapshot is created, any part of the data that differs from another point in time can be identified. This focuses attention only on the differences, meaning no time is spent re-checking unchanged data. This exception-based reporting method shortens review time and spares a lot of frustration, since no one has to recheck the same data.
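Exception-based reporting between two snapshots can be sketched as a simple diff; here the snapshots are hypothetical id-to-value maps from two test cycles, and only changed, removed, or new records are reported:

```python
def exceptions(previous, current):
    """Return only the records that changed between two point-in-time
    snapshots; everything unchanged is omitted and never re-reviewed."""
    keys = previous.keys() | current.keys()
    return {
        k: (previous.get(k), current.get(k))
        for k in keys
        if previous.get(k) != current.get(k)
    }

# Hypothetical snapshots from two test cycles (id -> record value/hash).
cycle1 = {"A1": "h1", "A2": "h2", "A3": "h3"}
cycle2 = {"A1": "h1", "A2": "h2b", "A4": "h4"}

# A1 is unchanged and drops out; only the exceptions remain for review.
diff = exceptions(cycle1, cycle2)
print(diff)
```

On a real landscape the values would be row hashes rather than raw records, but the principle is the same: reviewers only ever look at what moved.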