The SQA2 Blog:
Data volume and complexity grow over time
Raj Kamal talks about some key points of Business Intelligence and Data Warehouse testing
In this global economy, mergers, expansions and acquisitions have become quite common. As a result, multiple data sources are combined and a single repository is built to consolidate all the data in one place. As the data grows, the complexity of understanding its syntax and semantics increases sharply. The complex transformation logic needed to tackle this problem may also degrade user query performance.
Upstream changes often lead to failure
Any change made to the design of upstream data sources directly impacts the integration process, which in turn forces modifications to the existing schema and/or transformation logic. These modifications can eventually cause the SLA to be missed. Another constraint is the availability of data sources, which can be interrupted by unplanned outages.
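One way to catch upstream design changes before they break a load is to compare the schema the integration expects with the one the source currently reports. The following is a minimal sketch, not a definitive implementation: the function name schema_drift and the column names and types are hypothetical, and real schemas would come from the source system's catalog rather than hard-coded dictionaries.

```python
from typing import Dict, List


def schema_drift(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare an expected column -> type mapping with what the source reports now."""
    # Columns the integration relies on that the source no longer provides.
    missing = sorted(c for c in expected if c not in actual)
    # Columns the source has started sending that the integration does not know.
    added = sorted(c for c in actual if c not in expected)
    # Columns present on both sides whose declared type has changed.
    retyped = sorted(c for c in expected if c in actual and expected[c] != actual[c])
    return {"missing": missing, "added": added, "retyped": retyped}


# Hypothetical schemas, for illustration only.
expected_schema = {"order_id": "int", "amount": "decimal", "created_at": "timestamp"}
actual_schema = {"order_id": "int", "amount": "float", "region": "string"}

drift = schema_drift(expected_schema, actual_schema)
print(drift)
```

Running a check like this at the start of each load lets the team fail fast and review the change, rather than discovering it after the SLA window has passed.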
Upstream Data Quality Issues
The quality of the upstream data acquired is often in question. Primary keys are not always as unique as expected, and duplicate or malformed data does exist in source systems.
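Checks for these problems can be automated as part of testing. The sketch below assumes rows arrive as dictionaries of strings. The field names (order_id, amount, order_date), the date format, and the helper names are hypothetical choices for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def duplicate_keys(rows: List[Dict[str, str]], key: str) -> List[str]:
    """Return key values that appear more than once, in first-seen order."""
    counts = Counter(row.get(key) for row in rows)
    return [value for value, n in counts.items() if n > 1]


def malformed_rows(rows: List[Dict[str, str]]) -> List[int]:
    """Return indexes of rows whose amount or order_date cannot be parsed."""
    bad = []
    for i, row in enumerate(rows):
        try:
            float(row["amount"])
            datetime.strptime(row["order_date"], "%Y-%m-%d")
        except (KeyError, TypeError, ValueError):
            bad.append(i)
    return bad


# Hypothetical sample extract from a source system.
rows = [
    {"order_id": "1001", "amount": "25.50", "order_date": "2024-01-15"},
    {"order_id": "1002", "amount": "n/a", "order_date": "2024-01-16"},    # bad amount
    {"order_id": "1001", "amount": "25.50", "order_date": "2024-01-15"},  # repeated key
    {"order_id": "1003", "amount": "12.00", "order_date": "2024-13-01"},  # month 13
]

print(duplicate_keys(rows, "order_id"))
print(malformed_rows(rows))
```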
Data Retention (Archival & Purge) Policies increase maintenance and storage cost
Business needs drive Data Archival and Purging policies. The longer data is kept, the higher the cost of maintaining it. To ensure that performance doesn’t degrade over time, proper data partitioning techniques must be applied.
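As an illustration of how partitioning supports a retention policy, the sketch below assumes the warehouse table is partitioned by month, with partitions named in YYYY-MM form, and that the policy keeps the current month plus the most recent months before it. The naming scheme, the function name and the retention window are assumptions, not something the policy discussion above prescribes.

```python
from datetime import date
from typing import List


def partitions_to_purge(partitions: List[str], today: date, keep_months: int) -> List[str]:
    """Return monthly partitions (named 'YYYY-MM') that fall outside the retention window.

    The window keeps the current month and the (keep_months - 1) months before it.
    """
    current = today.year * 12 + (today.month - 1)
    cutoff = current - keep_months  # partitions at or before this index are purged
    expired = []
    for name in partitions:
        year, month = (int(part) for part in name.split("-"))
        if year * 12 + (month - 1) <= cutoff:
            expired.append(name)
    return expired


# Hypothetical partition list and retention period.
existing = ["2023-12", "2024-03", "2024-04", "2024-05", "2024-06"]
expired = partitions_to_purge(existing, today=date(2024, 6, 15), keep_months=3)
print(expired)
```

Dropping or archiving whole partitions is usually cheaper than deleting rows one by one, which is one reason partitioning and retention are planned together.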
The data freshness required for NRTR (Near Real-Time Reports) can be quite costly
Accuracy and timeliness are critical for many time-critical applications in sectors such as stock exchanges and banking. It’s important that trending reports, scorecards and dashboards present the latest operational information from transactional sources. Access to this information, even from around the globe, then facilitates accurate decisions in a timely fashion. This requires very frequent processing of the source data, which can be cumbersome and cost intensive.
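One common way to limit the cost of frequent processing is to pull only the rows that changed since the previous run, tracked with a "watermark" timestamp. This is a sketch of that general technique rather than a method the discussion above prescribes. The updated_at field, the function name and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def extract_increment(rows: List[Dict], watermark: datetime) -> Tuple[List[Dict], datetime]:
    """Return rows changed after the watermark, plus the watermark for the next run."""
    fresh = [row for row in rows if row["updated_at"] > watermark]
    # If nothing changed, keep the old watermark so no rows are skipped later.
    new_watermark = max((row["updated_at"] for row in fresh), default=watermark)
    return fresh, new_watermark


# Hypothetical transactional rows with a last-modified timestamp.
source_rows = [
    {"trade_id": 1, "updated_at": datetime(2024, 6, 1, 9, 0)},
    {"trade_id": 2, "updated_at": datetime(2024, 6, 1, 9, 5)},
    {"trade_id": 3, "updated_at": datetime(2024, 6, 1, 9, 10)},
]

last_run = datetime(2024, 6, 1, 9, 2)
changed, next_watermark = extract_increment(source_rows, last_run)
print([row["trade_id"] for row in changed], next_watermark)
```

Each run then processes a small slice of the source instead of the full data set, at the price of relying on the source to maintain a trustworthy last-modified timestamp.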