The SQA2 Blog: General
Having worked on multiple business intelligence (BI) projects, I know they can be very complex and overwhelming to develop and test. BI projects are often plagued with vague requirements and long development and test cycle times. The following tips will help you navigate the BI waters and deal with the problems that plague these projects.
Requirement Honing
Vague requirements can plague any project, but they are especially pronounced in a BI project. The purpose of a BI solution is to integrate data from different systems to tell an impactful and sensible story about a business. This information enhances the business’s ability to make more informed decisions. The problem is that the people driving the requirements are often the people who need the answers the solution will provide. Also, most decision makers do not understand all the intricacies of the incoming data. This results in vague requirements. It is our job as QA professionals to lead the effort in honing the requirements. Here are some quick steps to take toward achieving strong requirements:
Understand the Big Picture
Understanding the big picture means understanding the business. This means understanding not only what the business owner wants, but also why they want it. Understanding the goals of the business puts you in a unique position to make sure the acceptance criteria truly align with the business goals.
Break It Down
Break down the big picture into smaller independent pieces and have the business owner take a first pass at writing the purpose of each piece. This helps in understanding how each piece fits in the bigger picture. Also have the business owner take a first pass at filling out the requirements for completion.
Analysis Meeting
When it comes time to begin development on each item, have an analysis meeting. Go through each requirement and make sure everyone on the development team understands its purpose.
Explore the Data and Write Test Cases
Writing test cases often brings up questions that did not come up in the initial analysis meeting. When these questions come up, ask them, and update the requirements as needed.
Execute Test Cases and Iterate
At this point you should be confident that you have gotten the requirements to a strong state. Due to the complexity of data, it is often the case that more questions will come up during test case execution. If this happens, go back to the previous step, ask the question, and update the requirement and test cases as needed.
Each step is just as important as the next. It is very important to always keep the big picture in mind, even when you are highly focused on a smaller part of the project. The least obvious bugs, which are often the most impactful, are caused when the big picture is forgotten. If you follow each of the listed steps, you’ll end up with clear requirements that are actionable and testable.
BI applications have very expensive development and testing cycles, making strong requirements even more important than in most other projects. Unlike web-based applications, which can be tested right after deployment, BI applications require a data processing step that can take several hours. The data processing step greatly increases the cost of each testing cycle, and that cost can continue to increase as the data grows. The next step after achieving strong requirements is to reduce the cost of each development and testing cycle, which can be achieved by working with less data.
Work With Less Data
BI solutions often deal with large amounts of data that can take several hours to process. The development process is greatly slowed when each bug and fix iteration requires several hours of data processing to begin testing. This results in slow reporting of bugs back to the developers, slow discovery of unclear requirements, and overall slow delivery of the project. Even though using the full set of production data is effective when verifying that the logic was applied correctly, it is not the most efficient. Finding the right balance between effectiveness and efficiency is very important for all technical projects, and BI is no exception. What this means in BI is using less data during the development and testing phases. Here are a few strategies for getting to a reduced data set:
Trimming Data
Start with a restore of the required source databases to a test environment and begin deleting excess data. Larger tables within a database should be looked at first when identifying where to trim data. Likely candidates for trimming data are tables that contain some sort of transactional data. Your focus, when trimming data, is to reduce data redundancy. For example, if your test case needs to verify the ETL correctly averages transactions across several days for any given store, it makes sense to only keep data related to a few stores. The advantage of this method is that you have all the data you need already loaded into the test environment. To reduce cycle times, you only need to work backwards to reduce the data.
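As a rough illustration of trimming, the sketch below uses an in-memory SQLite database as a stand-in for a restored source database. The `transactions` table, its columns, and the `trim_to_stores` helper are hypothetical names chosen for this example, not part of any particular BI tool; a real restore would likely involve more tables and foreign keys to account for.

```python
import sqlite3

def trim_to_stores(conn, keep_store_ids):
    """Delete transactions for every store not in keep_store_ids; return rows deleted."""
    placeholders = ",".join("?" for _ in keep_store_ids)
    cur = conn.execute(
        f"DELETE FROM transactions WHERE store_id NOT IN ({placeholders})",
        list(keep_store_ids),
    )
    conn.commit()
    return cur.rowcount

# Stand-in for a restored copy of a source database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store_id INTEGER, txn_date TEXT, amount REAL)")

# 100 stores with 7 days of transactions each: 700 rows of largely redundant data.
rows = [(store, f"2024-01-{day:02d}", 10.0 * store + day)
        for store in range(1, 101) for day in range(1, 8)]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Keep only three stores; the averaging logic can still be verified on them.
deleted = trim_to_stores(conn, [1, 2, 3])
remaining = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(f"deleted {deleted} rows, {remaining} remain")
```

Trimming by store keeps every day of data for the stores that remain, so a test that averages across days still has a complete series to work with.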
Cherry Picking Data
This strategy is similar to the trimming strategy, but works by first identifying the data you need. You then use a process to extract the data into a testing environment. You can achieve this by building a simple ETL process that can be configured based on a data point that makes sense. Using the store transactions example, the ETL would accept a key that identifies a store (storeId), then query all transaction records related to that store and load them into your test environment. The advantage of this method is the ability to quickly grab data related to other stores for additional testing. The main disadvantage of this strategy is that you’ll likely have to bring along additional data that is not directly related to your test case if you are dealing with a relational database.
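A minimal sketch of such a store-keyed extract might look like the following. It assumes two SQLite databases standing in for the source system and the test environment, and a hypothetical two-table schema (`stores` and `transactions`); the `copy_store` function is an illustrative name, not an existing tool. Note how the parent `stores` row has to come along with the transactions, which is the relational overhead mentioned above.

```python
import sqlite3

# Hypothetical schema shared by the source and the test environment.
SCHEMA = """
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY,
    store_id INTEGER REFERENCES stores(store_id),
    amount REAL
);
"""

def copy_store(source, target, store_id):
    """Copy one store and its transactions from source to target; return transaction count."""
    # The parent row must come along, or the transactions would reference a missing store.
    store_rows = source.execute(
        "SELECT store_id, name FROM stores WHERE store_id = ?", (store_id,)
    ).fetchall()
    target.executemany("INSERT INTO stores VALUES (?, ?)", store_rows)

    txn_rows = source.execute(
        "SELECT txn_id, store_id, amount FROM transactions WHERE store_id = ?",
        (store_id,),
    ).fetchall()
    target.executemany("INSERT INTO transactions VALUES (?, ?, ?)", txn_rows)
    target.commit()
    return len(txn_rows)

# Stand-in source system: 5 stores with 4 transactions each.
source = sqlite3.connect(":memory:")
source.executescript(SCHEMA)
source.executemany("INSERT INTO stores VALUES (?, ?)",
                   [(i, f"Store {i}") for i in range(1, 6)])
source.executemany(
    "INSERT INTO transactions (store_id, amount) VALUES (?, ?)",
    [(store, 5.0 * n) for store in range(1, 6) for n in range(1, 5)],
)
source.commit()

# Empty test environment that receives only the cherry-picked store.
target_db = sqlite3.connect(":memory:")
target_db.executescript(SCHEMA)
copied = copy_store(source, target_db, store_id=2)
print(f"copied {copied} transactions for store 2")
```

Pulling in another store for additional testing is then a single extra call such as `copy_store(source, target_db, store_id=4)`.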
Generating Data
Creating data from scratch is another method of reducing the amount of data you are using. You can do this by using one of the many tools that support the generation of data, by using the system that populates the database you are sourcing from, or simply by creating insert statements as needed. This method gives you full control over the data and allows you to specifically target your test cases. The most effective way of generating the data is by using the system that actually populates the database you are sourcing from. This isn’t always an option, and can oftentimes take too long to do. Using a data generation tool, or creating your own scripts, can be effective and efficient, but also requires a very strong knowledge of the data. Without that familiarity, it is difficult to generate any meaningful data.
Each of these strategies requires a strong understanding of the data, and of the requirements of the BI solution. Taking the time to hone requirements and explore the data will give you the ability to create an effective data set for testing. Creating a smaller data set will greatly decrease the cycle times in development and test. Once you begin working with these modified data sets, it will also help to have separate environments to deploy these data sets to.
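For the "create your own scripts" route, one possible sketch is shown below. It generates a handful of transactions whose averages are known by construction, so the expected result of the test case is obvious before anything runs. The `transactions` table, the `generate_transactions` helper, and the store IDs are assumptions made for illustration, and the `AVG` query is only a stand-in for whatever the real ETL would produce.

```python
import sqlite3
from datetime import date, timedelta

def generate_transactions(store_id, start, daily_amounts):
    """Build one (store_id, ISO date, amount) row per day, starting at `start`."""
    return [
        (store_id, (start + timedelta(days=offset)).isoformat(), amount)
        for offset, amount in enumerate(daily_amounts)
    ]

# Hypothetical target table in an empty test environment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store_id INTEGER, txn_date TEXT, amount REAL)")

# Amounts are chosen so the expected averages are easy to state: 200.0 and 50.0.
rows = generate_transactions(7, date(2024, 1, 1), [100.0, 200.0, 300.0])
rows += generate_transactions(8, date(2024, 1, 1), [50.0, 50.0])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)

# Stand-in for the ETL output under test: average amount per store.
averages = dict(conn.execute(
    "SELECT store_id, AVG(amount) FROM transactions GROUP BY store_id"
))
print(averages)
```

Choosing the input values deliberately is where knowledge of the data matters: the generated rows are only useful if they exercise the same cases the production data would.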
Separate Test Environment
Another issue that slows the progress of BI teams is not having separate environments to work in. The ability to frequently deploy, run ETL jobs, and test the results will speed up development and test cycles. In shared environments, team members end up blocking each other, and test case results end up invalid.
It is a good idea to have a Staging environment that is as close to production as resources allow. This environment is used to run regression tests against data sets that are as close to production as possible. Team members should also use the Staging environment for performance testing and demoing to the business. In addition, each team member should be given a smaller, independent environment. The development and testing environments do not need to be resource heavy, since smaller data sets will be used. Independent environments give each team member full control of the data and the flexibility to test specific features. This also reduces the chance of inadvertently invalidating another team member’s efforts, and speeds up cycle times.
Conclusion
In the business intelligence world, data is always changing and growing and can be very difficult to handle. It can be hard for product owners to fully flesh out all the requirements. It is the job of the entire team to support them in this effort. The requirement honing process ensures that the team goes through the exercise of fleshing out details. Also, when the data begins to grow, development and testing cycles tend to grow with it. By making use of smaller, more efficient data sets and separate testing environments, we can shorten the development and testing cycles. Successful BI projects have clear requirement definitions and short feedback loops. Following these three tips will get you to both.