Write Your Own Functions in R – II

Spread the love


write your own functions in R

Welcome to the second article in my series of “Writing Your Own Functions in R”.

In this article, we will see how we can write a custom function that iterates through different sheets in .xlsx files stored in a directory. We will also see how we can use this function to even process the data stored in each sheet based on our need or even extract certain information from the data as required.

Let’s assume we have set our directory and listed all .xlsx files in a particular folder. For ease of understanding let us further assume the files are kept in our current working directory. To do that we will use the following chunk of code.

[php]
cdir<-(dirname(rstudioapi::getSourceEditorContext()$path)) ##This gives the path of the folder where the current R script is stored.
setwd(cdir)
files<-list.files(cdir,pattern=".xlsx",full.names = F)
[/php]

This is how my global environment looks like after running the chunk above.

look of the R

As we can see, “cdir” is my working directory, and all “.xlsx” files have been stored under the “files” list.

What we want to do next is to write the custom function which will iterate through each sheet in each of the files listed under “files”. The idea here is, the function should read the files one after the other from the list, and while it reads a particular file, it prints the name of the file, tells us how many sheets are stored in that file and the names of the sheets. Think about it! It should be a loop wrapped within a loop. The wrapper should read the files, while the loop within iterates through the sheets in a particular file.

Read Now: Write Your Own Functions In R 1

Now, since I may use functions from certain libraries within the loop, I will load the required libraries within the function. If they are not locally installed, the library will be installed first and then loaded. Additionally, in the end, it creates a data frame that can be found in the Global Environment which stores all the File Names and their corresponding sheet names. This enables us to open the gateway towards retrieving and storing more such information from the data. Let us now look at the function itself.

[php]
datapro_excel<-function(files){

###########################Load Libraries###############################
libraries<<-c("tidyverse","filesstrings","tools","readxl","data.table")
a<-suppressWarnings(sapply(libraries,require,character.only=T))
unlib<-c()
if(FALSE %in% suppressWarnings(sapply(libraries,require,character.only=T))){
unlib<-c(unlib,names(a[grep(FALSE,a)]))
install.packages(unlib,dependencies = T)
print(sapply(libraries,require,character.only=T))
}else{
print(sapply(libraries,require,character.only=T))
}
###########################Start processing###############################
D<-0
C<-0
File_Name<-c()
Sheet_Name<-c()
for(i in files){
C<-C+1
D<-0
names<-excel_sheets(files[C])
message(paste("File Name:",files[C]))
print(paste("No of Sheets:",length(names)))
File_Name<-c(File_Name,rep(files[C],length(names)))
for(j in names){
D<-D+1
# print(paste("File Name:",files[C]))
print(paste("Sheet Name:",names[D]))
Sheet_Name<-c(Sheet_Name,names[D])
if(C==length(files) & D==length(names)){
message("Processing Successful!")
print(paste("Files Processed:",C))
print(paste("Last File Sheet Count:",D))
File_Info<<-data.frame(File_Name,Sheet_Name, stringsAsFactors = F)
message("File Info generated!")
}
}
}
}
[/php]

Now let’s see the function in action and check the output below,

R function in action

Let us also take a peek at the data frame that is generated

data frame generated

Now, one can always modify this function to process data in each sheet, just remember to add the processing chunk within the inner loop. That is all for today, and I hope this helps!


Spread the love

Leave a Reply

Your email address will not be published. Required fields are marked *

Paste your AdWords Remarketing code here