What Is a Data Sandbox?

If you work with Big Data or advanced analytics, one term you will invariably come across is the data sandbox. The term takes its cue from the traditional sandbox, which keeps sand from mixing with other material on site while a child plays. The ‘data sandbox’ is similarly designed to allow free experimentation with data in an isolated environment.

Initially, the data sandbox was a working directory or test server that served as a testing environment for data and queries. Later, it gained importance in real-time analytics, where it prevents incoming data from affecting a live system. The traditional approach to data sandboxes often resulted in IT ‘shadow systems’ that limited analytics maturity. This led to the creation of the analytic sandbox, now considered critical to every Big Data system.

Techopedia defines the data sandbox as

“a scalable and developmental platform used to explore an organization’s rich information sets through interaction and collaboration.”

Components of a Sandbox

[Figure: components of a data sandbox]

In the context of Big Data

The data sandbox is obtained “from stand-alone, analytic datamarts or logical partitions in enterprise data warehouses, providing the computing required for data scientists”. Data sandboxes enable analysts to dig deep into corporate information using various high-powered computing technologies.

One article describes how the auction site eBay uses a data sandbox environment to analyze the more than 50 terabytes of data it handles each day, keep data movement down and reduce the need for copies and storage in other systems.

Another post from Oracle describes the data sandbox as a non-operational environment in which business analysts and data scientists can test ideas, manipulate data and model “what if” scenarios without placing computational load on core operational processes or putting the central enterprise data warehouse (EDW) at risk. At any given time, an organization may therefore run any number of analytical experiments spread across hundreds of sandboxes.
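The isolation idea can be illustrated with a minimal sketch. It assumes a tiny in-memory SQLite database as a stand-in for the warehouse; the `sales` table, its columns and the helper names are hypothetical and chosen only for illustration. The sandbox is a separate copy, so a “what if” change made there does not touch the source.

```python
import sqlite3


def make_warehouse() -> sqlite3.Connection:
    """Build a tiny stand-in for the central warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, units INTEGER, price INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", 100, 10), ("south", 50, 20)],
    )
    conn.commit()
    return conn


def open_sandbox(warehouse: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the warehouse contents into a separate in-memory database."""
    sandbox = sqlite3.connect(":memory:")
    warehouse.backup(sandbox)  # full copy; later changes stay in the sandbox
    return sandbox


def total_revenue(conn: sqlite3.Connection) -> int:
    """Sum units * price over all rows of the sales table."""
    return conn.execute("SELECT SUM(units * price) FROM sales").fetchone()[0]


warehouse = make_warehouse()
sandbox = open_sandbox(warehouse)

# "What if" scenario: every price rises by 1. Only the sandbox copy is modified.
sandbox.execute("UPDATE sales SET price = price + 1")
sandbox.commit()

print("warehouse revenue:", total_revenue(warehouse))
print("sandbox revenue:  ", total_revenue(sandbox))
```

A real deployment would more likely use a logical partition or a separate datamart than a full copy; the copy here only keeps the example self-contained.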

Deployment options

The sandbox platform can be deployed in various architectural modes, depending on hosting, storage and processing requirements.

  • Analytical processing – BI, traditional distributed servers, on-premise, cloud
  • In-memory business analytics
  • Database options – relational, columnar, structured, unstructured, hybrid

Benefits of Data Sandboxes

  • Allow access to, filtering of and combination of data from multiple sources, whether external or internal, structured or unstructured
  • Enable analysts to conduct situational analytics and improve the effectiveness of decisions
  • Provide an isolated area with dedicated storage and processing resources
  • Allow analysts to work with the data first, before defining metrics and building BI reports
  • Facilitate integration of external data with data from the EDW
  • Make one-off exploratory initiatives and experimentation possible without affecting other users or systems
  • Provide an architectural framework and foundation for creating self-service data environments for advanced analytics
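As a rough illustration of integrating external data with an EDW extract, the sketch below loads both into a throwaway in-memory SQLite sandbox and joins them. The table names, the sample figures and the CSV feed are all hypothetical, invented for the example rather than taken from any real system.

```python
import csv
import io
import sqlite3

# Hypothetical extract pulled from the EDW: (region, revenue).
EDW_EXTRACT = [("north", 2000), ("south", 1500), ("east", 900)]

# Hypothetical external feed, delivered as CSV text.
EXTERNAL_CSV = "region,population\nnorth,400\nsouth,500\n"


def build_sandbox() -> sqlite3.Connection:
    """Load the internal extract and the external feed into one sandbox."""
    sandbox = sqlite3.connect(":memory:")
    sandbox.execute("CREATE TABLE edw_sales (region TEXT, revenue INTEGER)")
    sandbox.executemany("INSERT INTO edw_sales VALUES (?, ?)", EDW_EXTRACT)

    sandbox.execute("CREATE TABLE ext_population (region TEXT, population INTEGER)")
    reader = csv.DictReader(io.StringIO(EXTERNAL_CSV))
    rows = [(row["region"], int(row["population"])) for row in reader]
    sandbox.executemany("INSERT INTO ext_population VALUES (?, ?)", rows)
    sandbox.commit()
    return sandbox


def revenue_per_capita(sandbox: sqlite3.Connection) -> dict:
    """Join internal and external data; regions missing from either side drop out."""
    query = """
        SELECT s.region, s.revenue * 1.0 / p.population
        FROM edw_sales AS s
        JOIN ext_population AS p ON p.region = s.region
        ORDER BY s.region
    """
    return dict(sandbox.execute(query).fetchall())


sandbox = build_sandbox()
result = revenue_per_capita(sandbox)
print(result)
```

Because the join runs entirely inside the sandbox, the exploratory metric can be tried and discarded without defining it in the EDW first.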

Bottom line – The analytic sandbox, or data sandbox, is a platform used largely by senior business analysts and “power users” for testing and reporting within an isolated environment.
