Sangeeta, Nov 11, 2014
If you work with Big Data or advanced analytics, you will invariably come across one term: the data sandbox. The term takes its cue from the traditional sandbox, which is designed to keep sand from mixing with other on-site material while a child plays. The ‘data sandbox’ is similarly devised to enable free experimentation with data in an isolated environment.
Initially, the sandbox was a working directory or test server that served as a testing environment for data and queries. Later, it assumed importance in real-time analytic environments, where it prevents incoming data from affecting a live system. The traditional approach to data sandboxes often resulted in the creation of IT ‘shadow systems’, which limited analytics maturity. This led to the creation of the analytic sandbox, considered critical to every Big Data system today.
Techopedia defines the data sandbox as
“a scalable and developmental platform used to explore an organization’s rich information sets through interaction and collaboration.”
Components of a Sandbox
In the context of Big Data
The data sandbox is obtained “from stand-alone, analytic datamarts or logical partitions in enterprise data warehouses, providing the computing required for data scientists”. Data sandboxes let analysts dig deep into corporate information using a range of high-powered computing technologies.
One article describes how the auction site eBay uses a data sandbox environment to analyze the more than 50 terabytes of data it handles each day, keep data movement down, and reduce the need for copies and storage in other systems.
Another post, from Oracle, describes the data sandbox as a non-operational environment where business analysts and data scientists can test ideas, manipulate data and model “what if” scenarios without placing computational load on the core operational processes or putting the central enterprise data warehouse (EDW) at risk. At any given time, an organization may therefore be running any number of analytical experiments spread across hundreds of sandboxes.
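To make the isolation idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is only an illustration, not how eBay, Oracle or any particular product implements a sandbox: the `sales` table, the `make_sandbox` and `total_revenue` helpers, and the figures are all hypothetical. An in-memory database stands in for the central warehouse, and the sandbox is a separate copy in which a “what if” scenario is run.

```python
import sqlite3


def make_sandbox(source: sqlite3.Connection) -> sqlite3.Connection:
    """Return an independent in-memory copy of the source database (hypothetical helper)."""
    sandbox = sqlite3.connect(":memory:")
    source.backup(sandbox)  # copy the source database into the sandbox
    return sandbox


def total_revenue(conn: sqlite3.Connection) -> float:
    """Sum the revenue column of the illustrative sales table."""
    return conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]


# Hypothetical stand-in for the central warehouse, with made-up figures.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 200.0)],
)
warehouse.commit()

# The analyst works on a copy, never on the warehouse itself.
sandbox = make_sandbox(warehouse)

# "What if" scenario: revenue in the north region grows by 50%.
sandbox.execute("UPDATE sales SET revenue = revenue * 1.5 WHERE region = 'north'")
sandbox.commit()

print("warehouse total:", total_revenue(warehouse))
print("sandbox total:  ", total_revenue(sandbox))
```

The experiment changes only the sandbox copy; the warehouse total stays as it was. A real sandbox would more likely be a logical partition or a separate datamart than a full copy, but the principle of experimenting away from the live system is the same.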
The Sandbox platform can be deployed in various architectural modes, depending upon requirements of hosting, storage and processing.
Benefits of Data Sandboxes
Bottom line – The Analytic Sandbox or Data Sandbox is a platform largely used by senior business analysts and “power users” when testing and reporting within an isolated framework.