Data inquiry complexity tied to increasing volume, velocity, variation

In the book [e-Data](http://www.amazon.com/e-Data-Turning-Data-Information-Warehousing/dp/0201657805), [Jill Dyche](http://hbr.org/search/Jill%20Dyche) laid out a model of data presentation and interaction. Here is the original model as Jill presented it:
![Dychepyramid Thumb 580x 369 2762](http://66.147.244.76/~shawnmeh/blog/wp-content/uploads/2012/11/dychepyramid_thumb_580x_369_2762.png)
This pyramid is capped by Knowledge Discovery, the detection of patterns in data. She wrote, “These patterns are too specific and seemingly arbitrary to specify, and the analyst would be playing a perpetual guessing-game trying to figure out all the possible patterns in the database. Instead, special knowledge discovery software tools find the patterns and tell the analyst what–and where–they are.”

Jill captured real wisdom back in 2000, when the book was published: she wisely partitioned the layers of data presentation and interaction by hypothesis formulation. That was valuable. What strikes me as no longer correct, though, is the shape of the model itself; it needs to be inverted to match the realities of big data in today's data-centric organization.

![Data Query Stack](http://66.147.244.76/~shawnmeh/blog/wp-content/uploads/2012/11/data_query_stack.png)
With the introduction of data warehouses, especially those built on a [2.0](http://www.amazon.com/2-0-Architecture-Generation-Warehousing-Management/dp/0123743192) model, we will see growing data volumes of different types, from different sources, with an emphasis on semi-structured and unstructured data. Jill saw the data space used for discovery as continuously refined and restricted. In reality, we will see more data pulled into that space, over and above what the typical business user draws on for standard reports, and more skilled, data-focused users will need to join in more types of data, as the sketch below illustrates. The volumes are going to increase, not decrease.
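To make that concrete, here is a minimal sketch in Python with pandas of the kind of join a more data-focused user might run, pulling semi-structured JSON events into the same frame as a structured sales extract. The file names, columns, and schema are my own assumptions for illustration, not anything from Jill's book:

```python
import pandas as pd

# Structured data: a conventional fact-table extract from the warehouse.
# (File name and columns are hypothetical.)
sales = pd.read_csv("sales_fact.csv")  # customer_id, order_date, amount

# Semi-structured data: clickstream events as line-delimited JSON;
# pandas flattens each JSON object into columns.
events = pd.read_json("clickstream_events.json", lines=True)
# customer_id, ts, page, referrer

# Join the two data spaces on a shared key so behavior (clicks) and
# outcomes (purchases) can be explored together.
combined = sales.merge(events, on="customer_id", how="left")

# A discovery-style question: which referrers precede the
# highest-value orders?
by_referrer = (
    combined.groupby("referrer")["amount"]
    .mean()
    .sort_values(ascending=False)
)
print(by_referrer.head())
```

The point is not this specific query but that the analyst's data space grows to include feeds that never fit the classic report-driven pyramid.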

Two things Jill wasn't concerned with at the time of her model in 2000, which we now know to be true and which also affect this view of segmentation, are velocity and variation. The velocity at which new data arrives will increase in direct proportion to increasing volumes, and with technologies like [Hadoop](http://hadoop.apache.org) it will become easier and easier to accommodate a growing number of sources in the warehouse, as the streaming sketch below shows.
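As an illustration of how Hadoop lowers the barrier for new sources, here is a hedged sketch of a Hadoop Streaming job in Python that counts incoming records per source system. Hadoop Streaming lets any program that reads stdin and writes stdout serve as a mapper or reducer, so a new raw feed dropped into HDFS can be processed without a custom loader. The tab-delimited layout, with the source name in the first field, is an assumption for the example:

```python
#!/usr/bin/env python
# mapper.py -- emits "source\t1" for every input record.
# Assumes each line is tab-delimited with the source system
# name in the first field (a hypothetical layout).
import sys

for line in sys.stdin:
    if not line.strip():
        continue
    source = line.rstrip("\n").split("\t")[0]
    print("%s\t1" % source)
```

```python
#!/usr/bin/env python
# reducer.py -- sums counts per source. Hadoop sorts mapper output
# by key, so all lines for a given source arrive consecutively.
import sys

current, total = None, 0
for line in sys.stdin:
    if not line.strip():
        continue
    key, _, value = line.rstrip("\n").partition("\t")
    if key != current:
        if current is not None:
            print("%s\t%d" % (current, total))
        current, total = key, 0
    total += int(value)
if current is not None:
    print("%s\t%d" % (current, total))
```

Both scripts run under the stock hadoop-streaming jar (passed as `-mapper` and `-reducer`), which is exactly why adding the next feed is cheap: no schema negotiation up front, just another directory of raw files.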

All this said, it merely reinforces some of the points Jill made in a recent [article](http://blogs.hbr.org/cs/2012/11/eureka_doesnt_just_happen.html), where she talks about the need to allocate time and resources for more complex data discoveries:

> This kind of “eureka” doesn’t just happen. Business leaders have to foster a culture of discovery, allotting resources for big data proofs-of-concept and surrendering expectations for their outcomes. It also means training the new batch of data scientists to leverage the technologies that enable such discovery, and then translating the findings into business actions whose outcomes are then measured. Running discovery trials on big data should be a continuous process, where the results may feed more traditional business intelligence or drive additional discovery tests.
