How to learn popular analytics tools on your own – Apache Hadoop

Often it makes sense to learn the basics of a technology informally before studying it formally. There are plenty of resources on the internet that show you how to download analytics tools for free and work with them to understand the ecosystem.

We begin with open source software projects.

Apache Hadoop

To learn Hadoop, you need to be familiar with a Linux-based operating system, and programming knowledge in Java or Python is a definite advantage. Central to Hadoop, however, is an understanding of its architecture, which is actually a combination of two technologies: the Hadoop Distributed File System (HDFS), used for storage, and the MapReduce programming model, used for processing.
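To get a feel for the MapReduce model before touching Hadoop itself, here is a minimal sketch of the two phases in plain Python. This is not Hadoop code, just an illustration of the idea: a map step emits key-value pairs, and a reduce step groups and aggregates them. The function names are my own, chosen for clarity.

```python
# A local sketch of the MapReduce programming model -- not Hadoop itself,
# just the two phases that every Hadoop job is built from.
from collections import defaultdict

def map_phase(line):
    # Map: emit a (key, value) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle + reduce: group the emitted values by key, then sum each group.
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

lines = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts["data"])  # "data" appears once in each line -> 2
print(counts["hdfs"])  # "HDFS" appears once in each line -> 2
```

In real Hadoop, the map and reduce functions run in parallel across the cluster, and the framework handles splitting the input, shuffling the intermediate pairs and collecting the results.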

The recommended installation hardware is a dual-core processor with 2 GB of RAM and 2 GB of available hard drive space. Your system will also need a recent Linux distribution with Java installed, and a bash shell environment.
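Before downloading anything, it is worth confirming the software prerequisites above are in place. A small illustrative check, assuming only that the tools are on your PATH (Hadoop's own start-up scripts do their own checks via JAVA_HOME):

```python
# Illustrative prerequisite check for the requirements listed above:
# a Java runtime and a bash shell available on the PATH.
import shutil

def check_prerequisites():
    # Return a dict mapping each required tool to whether it was found.
    return {tool: shutil.which(tool) is not None for tool in ("java", "bash")}

status = check_prerequisites()
for tool, found in status.items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

If `java` shows up as missing, install a JDK from your distribution's package manager before proceeding.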

The next step is to download Apache Hadoop from the appropriate mirror site. Download the 2.2.0 version and read the documentation to get started. You can begin with the non-distributed (standalone) mode, setting up a single-node Hadoop installation in which everything runs inside a single Java Virtual Machine (JVM). Then you can move up to the Cluster Setup guide for a multi-node installation that lets you run MapReduce jobs across machines.
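When you move beyond standalone mode on a single node, the Hadoop documentation has you point the default filesystem at a local HDFS instance. A minimal sketch of the relevant configuration file, assuming the default port used in the official single-node setup guide:

```xml
<!-- etc/hadoop/core-site.xml: point the default filesystem at a
     single-node HDFS instance (pseudo-distributed mode). -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

After editing this file, you format the namenode and start the HDFS daemons as described in the documentation; check the exact steps against the docs for your version.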

I would also recommend joining the Hadoop mailing lists, as they give you excellent insights into the software framework. Learning your way around Hadoop can be very lucrative, whether you are a student of analytics, a developer or a professional working with Big Data. You learn a technology that allows you to work with terabytes or petabytes of data while gaining insights into daily operations, driving new ideas or answering questions such as how to personalise ad targeting, or how to generate a more efficient ROI on the marketing budget.

However, Hadoop has its own limitations for Big Data analytics, so you will come across out-of-the-box smart analytics products from companies that leverage Hadoop as the underpinning.

A good way to get an insight into how you can analyse and visualise Hadoop data in a Big Data platform is to download Splunk's Hunk with a free 60-day trial license. It enables you to detect patterns across voluminous raw data in Hadoop without specialized skill sets. Another recommendation is a trial of IBM's offering, the InfoSphere BigInsights Basic Edition with built-in analytics. It is free to download and a great resource for analysing massive volumes of diverse data on the Apache Hadoop platform. The Hadoop tutorial from Yahoo is also a good source for the developer community.

So get started on Hadoop and drive your Big Data decisions equipped with Hadoop fundamentals!