Saturday, February 4, 2012

In 10 Words Or Less, What Do You Believe Is The Purpose Of A Data Warehouse?

Through a lot of recent discussions, blog entries, and industry articles, I started to think about why there are so many differing opinions about data warehouse implementation best practices. When you look at it from an IT or vendor perspective, there are a lot of technology pieces to the puzzle. There are ETL tools, metadata tools, data model preferences, end user access tools, etc. Along with that, you have technology specialists in each of those areas. And then, to round out the picture, you have roles known as architects, who are responsible for making sure all the pieces fit together. Too often, I feel the technology side of things takes a higher priority than what I feel the real purpose of the data warehouse is, which is my answer to the question: “to facilitate fast, flexible, accurate, and sustainable access to information”.

In my view, there are so many technical specialists in the data warehousing industry that the real purpose of the data warehouse, providing data to users, gets lost in the shuffle. I’m not suggesting that we should be sloppy about our technology choices and their implementation. However, elegant technical solutions in this space, which would be considered a success from a technical standpoint, often fail to make data easy for end users to access and analyze. Quite often, we as technologists get so focused on the mechanics of the data warehouse that we lose sight of the big picture.

Granted, any data warehouse implementation has a lot of complicated moving pieces. You have very complex source data, which requires complicated tools and methodologies to consolidate. You have different data model implementations that range anywhere from properly structuring diverse data together (i.e. normalized models), all the way through data models that facilitate high speed end user access (i.e. star schemas), and combinations of the two that land somewhere in the middle. Then, you have end user tools that each implement an end user access strategy ranging from static reports, through ad hoc queries, all the way through dashboards. While many end user tools attempt to cover the gamut of end user access needs, none of them cover everything. As a technologist, I admit it can be a lot of fun solving the difficult and complex technical problems associated with a data warehouse implementation. However, I often feel that fun gets in the way of what we really should be trying to do, which is facilitate access to information. We should always try to step back from the technical details and look at the bigger picture.

When I built the original data warehouse solution, which was the genesis for the founding of DecisionPoint, the goal was to facilitate access to, and analysis of, financial information within Sequent. We built an ETL and management infrastructure, had metadata components, and built out star schema data models. We stayed away from the end user tool space because Sequent already had investments in a set of tools that were being used in other areas of the business. I would not call our tools very sophisticated from a technical perspective. However, the users could access financial information in a very fast and flexible manner. We didn’t really build a lot of pre-defined reports or dashboards. We gave them ad hoc query access to the data in a way that made sense to them. We also didn’t spend a lot of time protecting the users from the data or massaging the data so that it always made sense to them. We wanted them poking around at the data and exploring on their own. It not only allowed them to access the information they needed, but also gave them the power to poke around in the data to research anything that might look “odd” or not make sense from a business perspective. While the solution was not elegant, it accomplished the goal of providing access to data for the end users without a lot of IT support.

I did have to make one tough technology choice, because the alternative would have meant choosing technology for its own sake, not for the benefit of the users. The original solution we developed was based on an Oracle database. At the time, Sequent was forming a strong partnership with a new database software company called Redbrick. It was a database product originally built on the principles of Ralph Kimball’s star schema data model design. I was asked to look into migrating our solution to Redbrick, and got a lot of pressure from Sequent management and marketing to use it. However, Redbrick was designed around the principle that users would ask specific questions and get quick answers. The problem was that if the questions weren’t already known, Redbrick wasn’t a good answer, as it had limited ability to scan large quantities of data quickly. Oracle was much better at that. So, even though Redbrick would have been the cool choice to make from a technology standpoint, it would have made the user environment much more difficult to use. I did have to put up with a lot of pressure from the highest levels of management in Sequent, but I resisted, and have no regrets about doing that. Anything that did not directly benefit end users was not something I was willing to add to the mix.

When DecisionPoint was officially started, one of the more controversial decisions we made was to build our own end user query tool. We had worked with companies that provide tools in that space, and even looked into pre-integrating with some of them. However, no matter which tool we chose, there was always a customer that wasn’t using it, and there was no way we could possibly integrate with all of the tools out there. Additionally, we felt we were in a better position to provide a tool because we knew what our data structures were and we knew how people were accessing the data. The one thing we did to make this decision a bit easier was to not charge a per seat fee for our tool. Customers could let as many users as they wanted use the tool. We didn’t want to limit access to the information based on the cost of the tool. Our feeling was that the more users who were able to look at the data, the more business value they could get from it. What was interesting was that even customers that had already invested in an end user access tool would still implement our tool, and at many customer sites it became the most used tool, counting by number of users. The reason for such high user counts was that the tool not only provided access to the data we were providing, but also helped users navigate that data in a way that was easy and familiar to them. The tool is still part of the Decision Experts solution that Teradata sells, and it’s still, in my opinion, the only tool that truly understands both the data and how it is accessed. We implemented a unique and cool feature called “Drill Across” that is heavily used by the user community, but is still not part of other tools on the market.

I guess this experience brings me back to my core philosophy. You can have the best technology components on the market to implement your data warehouse, and from a technology standpoint, the data warehouse would be considered a success. However, in my view, if the users can’t easily access the data in a performant way and navigate it, the data warehouse is a failure. Technology is cool, but it’s only as good as the business value it helps the users derive. In the end, the business value of any data warehouse implementation is directly related to the ability of users to access data, and their ability to use that data to guide the decisions that drive changes in the business. Once again, without that, nothing else matters.

So, when you ask yourself the same question I asked at the beginning of this blog entry, try putting it in the context of whether your answer makes it easier for business users to ask questions of the data warehouse and get answers. I would argue that if your answer doesn’t include the user component, your data warehouse implementation would not be viewed as a success by the business.
