6 Useful R Packages for Data Science & Machine Learning



R covers a wide range of tasks in the data world. From loading and manipulating data to modeling it and creating sophisticated visualizations, R can handle it all. Over 10,000 packages are at our disposal to download, install, and use according to need and preference.


The packages I have covered in this post fall broadly under four categories:


  1. Exploratory Data Analysis & Visualization
  2. Machine Learning
  3. Integration of R and Python
  4. Sentiment Analysis


Exploratory Data Analysis & Visualization




A crucial phase of data analysis and predictive modeling is Exploratory Data Analysis (EDA), where we get a first look at the data, generate relevant hypotheses, and often decide on the next steps as well. The DataExplorer package automates the data exploration process so that users can focus on understanding the data and extracting insights. The package scans and analyzes each variable and visualizes it with typical graphical techniques. Common data processing methods are also available to treat and format data.

The package can be installed directly from CRAN.



(i) To generate a report

(ii) For visualizations

(iii) To make quick updates to data

(iv) To set values for missing observations
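The four uses above can be sketched as follows. This is a minimal example, assuming the package described here is DataExplorer (its CRAN description matches the text) and using R's built-in airquality dataset, which is not part of the original post.

```r
# One-time install from CRAN (assumed package name: DataExplorer)
# install.packages("DataExplorer")
library(DataExplorer)

df <- airquality  # built-in data; Ozone and Solar.R contain missing values

# (i) Generate a full HTML report in the working directory
# create_report(df)

# (ii) Individual visualizations
plot_missing(df)     # share of missing values per column
plot_histogram(df)   # histograms of all continuous columns

# (iii) Quick update: convert the Month column to a factor
df <- update_columns(df, "Month", as.factor)

# (iv) Replace missing values in the numeric columns with 0
df <- set_missing(df, 0L)
```

Replacing missing values with 0 is only for demonstration; in a real analysis the fill value should be chosen to suit the variable.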




Fancy creating charts in ‘ggplot2’ with just drag and drop? The esquisse package is your answer. This add-in lets you explore your data interactively by visualizing it with the ggplot2 package. It can be used to draw bar graphs, curves, scatter plots, and histograms, then export the graph to ‘PNG’ or ‘PowerPoint’, or even retrieve the code that generates the graph. It is perfect for quick and easy visualization of data according to its type.

Install from CRAN with install.packages("esquisse").



Source : rdocumentation


Why :

(i) It allows quick exploration of the data to extract insights by creating simple plots; for custom scales and other advanced functionality, use ggplot2 directly.
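As a sketch of that workflow: the add-in is launched with esquisser(), and the code it exports can then be extended by hand in ggplot2. The dataset (iris) and the specific colour scale and labels below are illustrative choices, not taken from the original post.

```r
library(ggplot2)

# Launch the drag-and-drop builder (needs an interactive session, e.g. RStudio)
if (interactive() && requireNamespace("esquisse", quietly = TRUE)) {
  esquisse::esquisser(iris)
}

# Code of the kind the add-in exports, extended with a custom colour scale
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point() +
  scale_colour_brewer(palette = "Dark2") +
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)")

print(p)
```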





Machine Learning


What :

The mlr package provides an interface to a large number of classification and regression techniques, including machine-readable parameter descriptions. There is also an experimental extension for survival analysis, clustering, and general, example-specific cost-sensitive learning.

It supports generic resampling, including cross-validation, bootstrapping, and subsampling. It also offers hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems, as well as methods for feature selection.

Why :

(i) It can handle basically all your Machine Learning requirements.

(ii) Creating multiple linear regression for familial or population data, converting predictions to a format the ROCR package can handle, running machine learning benchmarks as distributed experiments, generating dummy variables for factor features, creating (spatial) resampling plot objects, and getting the underlying R model of a learner integrated into mlr are just a few of the functions this package has built in.
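A minimal sketch of the task, learner, and resampling workflow in mlr. The dataset (iris), the learner (classif.rpart, which needs the rpart package), and the 5-fold setting are illustrative assumptions rather than choices from the original post.

```r
library(mlr)

set.seed(42)

# Define the learning problem: predict Species from the other iris columns
task <- makeClassifTask(data = iris, target = "Species")

# Pick a learner; "classif.rpart" wraps a decision tree from rpart
learner <- makeLearner("classif.rpart")

# 5-fold cross-validation, scored by accuracy
rdesc <- makeResampleDesc("CV", iters = 5)
res <- resample(learner, task, rdesc, measures = acc)
res$aggr  # mean accuracy across folds

# Train on the full task and extract the underlying rpart model
fit <- train(learner, task)
model <- getLearnerModel(fit)
```

Swapping the string passed to makeLearner() is enough to try a different algorithm with the same task and resampling setup.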




Integration of R and Python


What :

The reticulate package describes itself as an “Interface to Python” modules, classes, and functions. The best part about this package is that when calling into ‘Python’, R data types are automatically converted to their equivalent ‘Python’ types. When values are returned from ‘Python’ to R, they are converted back to R types.

Install the reticulate package from CRAN with install.packages("reticulate").


Why :

(i) It simplifies using R and Python in the same R Notebook.

(ii) Get a unique identifier for a Python object with py_id()

(iii) Install Python packages

(iv) Convert between Python and R objects

(v) Create NumPy arrays and convert the data type and in-memory ordering of existing NumPy arrays
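A brief sketch of the conversions listed above, assuming a Python installation with NumPy is available to reticulate. The values are made up for illustration.

```r
library(reticulate)

# (iii) Install a Python package into the active environment (run once)
# py_install("numpy")

# (v) Create a NumPy array with an explicit data type and memory order
a <- np_array(c(1, 2, 3, 4), dtype = "float32", order = "C")

# (iv) Convert between Python and R objects
r_vec <- py_to_r(a)                        # NumPy array -> R numeric array
py_obj <- r_to_py(list(x = 1L, y = "a"))   # named R list -> Python dict

# (ii) Unique identifier of a Python object
id <- py_id(py_obj)
```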


Sentiment Analysis


What :

Is sentiment analysis on your to-do list, and do you also like coding in R? The rtweet package makes collecting Twitter data simpler.

Disclaimer: rtweet should be used in strict accordance with Twitter’s developer terms.

To get the current released version from CRAN, run install.packages("rtweet").

To load the rtweet package, run library(rtweet).


Why :

(i) Search tweets

(ii) Quickly visualize frequency of tweets over time using ts_plot()

(iii) Search tweets by geo-location

(iv) Randomly sample (approximately 1%) from the live stream of all tweets

(v) Stream all geo-enabled tweets from a particular location, or get the most recent trends there

(vi) Retrieve a list of all the accounts a user follows, or the accounts following a user

In short, basically everything you require.

Twitter rate limits cap the number of search results returned to 18,000 every 15 minutes. To request more than that, simply set retryonratelimit = TRUE and rtweet will wait for rate limit resets for you.
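A sketch of a typical session, assuming Twitter API credentials are already configured for rtweet and that the installed version still exposes these functions (both the API and the package have changed over time). The query string and the account name are placeholders, and collect_tweets() is a hypothetical helper defined here for illustration.

```r
library(rtweet)

# Hypothetical helper: nothing hits the API until it is invoked
collect_tweets <- function(query = "#rstats", n = 100) {
  # retryonratelimit = TRUE waits for rate-limit resets when n exceeds the cap
  search_tweets(query, n = n, include_rts = FALSE, retryonratelimit = TRUE)
}

if (interactive()) {
  tweets <- collect_tweets("#rstats", n = 500)

  # Frequency of tweets over time, aggregated by hour
  ts_plot(tweets, by = "hours")

  # Random sample (about 1%) of the live stream for 30 seconds
  sample_stream <- stream_tweets(q = "", timeout = 30)

  # Accounts a user follows, and accounts following that user
  friends <- get_friends("rstudio")
  followers <- get_followers("rstudio", n = 1000)
}
```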



What :

There is no need to browse for the latest version of a piece of software. The installr package offers a set of R functions for installing and updating software from within R (on Windows). Its updateR() command finds the latest R version, downloads it, runs the installer, deletes the installation file, and copies and updates old packages to the new R installation.


Install the stable version from CRAN with install.packages("installr").


Why :

(i) Shorter installation process (for Windows users)

(ii) This package has two main goals:

To make updating R (on windows) as easy as running a function.

To make it as easy as possible to install all of the needed software for R development (such as git, RTools, etc), as well as for reproducible research using R (such as MikTeX, pandoc, etc).
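As a sketch: updateR() is provided by the installr package, so the workflow on a Windows machine might look like this. The run_update guard is an addition for illustration, so that the snippet does nothing on other platforms or in non-interactive sessions.

```r
# install.packages("installr")  # stable version from CRAN

# Only proceed on Windows, and only when a user can answer the prompts
run_update <- .Platform$OS.type == "windows" && interactive()

if (run_update) {
  library(installr)

  # Find the latest R version, download and install it, then copy packages over
  updateR()

  # Other software used for R development and reproducible research
  install.Rtools()
  install.git()
  install.pandoc()
}
```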


Another two packages I recently came across and had to cover in this post are:



What :

Prophet is open source software created by Facebook for forecasting time series data. It is based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data.

Why :

Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. It is available for download on CRAN and PyPI.
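A minimal forecasting sketch with the prophet R package. The series below is synthetic, generated only to have a trend and seasonal pattern to fit; it is not data from the original post.

```r
library(prophet)

set.seed(1)

# Synthetic daily series: linear trend + yearly and weekly seasonality + noise
n <- 3 * 365
ds <- seq(as.Date("2017-01-01"), by = "day", length.out = n)
t <- seq_len(n)
y <- 10 + 0.01 * t + 2 * sin(2 * pi * t / 365.25) +
  0.5 * sin(2 * pi * t / 7) + rnorm(n, sd = 0.3)
history <- data.frame(ds = ds, y = y)

# Fit the additive model; prophet expects columns named ds and y
m <- prophet(history)

# Extend the timeline by 30 days and forecast
future <- make_future_dataframe(m, periods = 30)
forecast <- predict(m, future)

tail(forecast[, c("ds", "yhat", "yhat_lower", "yhat_upper")])
plot(m, forecast)
```

The forecast data frame holds the point forecast (yhat) along with the lower and upper bounds of its uncertainty interval for every date in future.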


RevoScaleR package

What :

RevoScaleR is a Microsoft package that provides new implementations of some of R’s statistical functions (for example, rxGlm is the equivalent of R’s glm).

Why :

It is designed to work with data sizes much larger than available memory. It also uses parallel computing to speed things up when running on a multi-core server, in a Hadoop or Spark cluster, or in a SQL Server database.



Companies around the globe are using R for various applications:

  1. Airbnb uses the googlesheets and mailR packages in R to automate reporting
  2. Twitter uses R to monitor user experience
  3. The New York Times uses R to create infographics
  4. Amazon Web Services containerizes applications with R for fraud detection
  5. The BBC Visual and Data Journalism team creates graphics in R


You can also check out the newest packages here.


As a data professional, you need to keep raising your game and stay current with cutting-edge technologies and the latest industry trends, because the field is evolving quickly. The courses at Ivy Pro School are designed with that in mind. The modules are short enough to hold your interest, yet long enough to give good exposure to, and the ability to explore further, not-so-easy tools like R, Python, SQL, VBA, etc.


Learn from industry experts with extensive knowledge and experience in this field. Check out the course most suitable to you with hands on case studies and live industry projects.







Shromona Kahali – Content Strategist – IvyPro School

