Rounak Jain, Mar 11, 2020
“To err is human” is an age-old saying. But does the same apply to Data Scientists of the 21st century? Data Scientists around the globe have been in high demand for the last few years. They influence decisions that directly impact the business, so the room for error in such scenarios is very narrow. Ivy Professional School has trained 19000+ students over the last 13 years. We have closely observed Data Science beginners and identified some common mistakes they make on their journey towards becoming a Data Scientist. In this article, we elaborate on the top 6 mistakes every Data Science beginner makes and how one can avoid them.
Being a Data Science expert is no easy task. A Data Scientist is expected to have the related technical knowledge, comprising Mathematics, Statistics, Programming Languages, Machine Learning algorithms, etc. You can visit our article A Beginner’s guide to Data Science to know more about these. They are also required to show a good understanding of the domain to which their client belongs. Data Science beginners are often completely occupied with learning and upskilling themselves technically. They often miss the fact that a clear understanding of the domain is also necessary to solve a business problem in the best possible manner. Showing off technical prowess on a project without clarity about the business is rarely a good idea.
Let us consider a scenario. A banking company is looking for a Data Scientist who can work on fraud detection, credit card defaulters, loan defaulters, etc. Naturally, they will prefer someone who understands the criticality of the banking business and also has sound technical knowledge.
Developing a multi-dimensional thought process is essential if this mistake is to be avoided. Technical and domain knowledge are equally vital. The aspirant must step into the shoes of the client, treat the problem as their own, and then think critically about it. This way the problem is solved the way the business wants it solved, not the way the aspirant would like the solution to be. The more problems we solve with this mindset, the better we get at providing solutions to the business.
Once beginners learn the various technical concepts and algorithms, the next step is to solve real-world problems and self-check the lessons learned. It is tempting to try all the relevant algorithms on the available data to generate predictive or descriptive analytics. Here, a Golden Rule that is often overlooked is “Garbage in, garbage out”. When data cleaning is skipped, the results suffer badly.
For example, if we are performing Sentiment Analysis on Twitter data, we need to clean the collected tweets. In particular, we need to remove the web links present in many tweets. If we do not, the leftover noise can badly hamper the Sentiment Analysis results.
In practice, the available or collected data must first be thoroughly analyzed and cleaned. Only once we are satisfied that the data is tidy should we proceed with the other steps towards the solution.
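To make the tweet example concrete, here is a minimal sketch of removing web links before any Sentiment Analysis step. The function name `clean_tweet`, the regular expression, and the sample tweets are all assumptions made for illustration; a real project would likely also handle mentions, hashtags, emojis, and so on.

```python
import re

# Matches http(s) links and bare "www." links.
# This is a deliberate simplification, not a full URL grammar.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def clean_tweet(text: str) -> str:
    """Remove web links from a tweet and collapse leftover whitespace."""
    without_links = URL_PATTERN.sub("", text)
    return " ".join(without_links.split())


# Hypothetical tweets, made up for this example.
raw_tweets = [
    "Loving the new phone! https://example.com/review",
    "Worst service ever   www.example.com/complaint #fail",
]

cleaned_tweets = [clean_tweet(tweet) for tweet in raw_tweets]
print(cleaned_tweets)
```

Keeping the cleaning logic in one small function makes it easy to inspect a few cleaned tweets by eye before moving on to modelling.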
Data visualization is a very important phase in the life cycle of a Data Science project. After cleaning the data, we must visualize it. Doing so generates important insights and immediately gives a better understanding of the data. Data Visualization can be tedious and time-consuming, but it becomes interesting once one knows an appropriate tool or programming language for it. One can visit our Tableau vs Power BI blog to get an idea about the two data visualization tools.
For example, if we are working on predicting fatalities from the coronavirus outbreak, it would be beneficial to first visualize the spread of infections rather than jumping straight to models.
One must develop a solid grasp of Exploratory Data Analysis, whether through a dedicated tool or through Python, R, etc. Once the beauty of data visualization is experienced first hand, aspirants will perform this step out of curiosity and interest, thereby avoiding this mistake.
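As a rough illustration of looking at the data before modelling, the sketch below turns cumulative case counts into daily increases and draws them as a plain-text bar chart. The numbers are hypothetical and the helper names `daily_increase` and `text_bar_chart` are assumptions for this example. In practice one would more likely use a plotting library or a tool such as Tableau or Power BI; plain text is used here only to keep the example dependency-free.

```python
# Hypothetical cumulative case counts for seven consecutive days (made-up numbers).
cumulative_cases = [3, 5, 9, 17, 30, 52, 88]


def daily_increase(cumulative):
    """Convert cumulative counts into day-over-day increases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def text_bar_chart(values, width=40):
    """Render each value as a row of '#' characters scaled to the largest value."""
    largest = max(values) or 1  # avoid dividing by zero when all values are 0
    lines = []
    # The first increase belongs to day 2, so labels start there.
    for day, value in enumerate(values, start=2):
        bar = "#" * max(1, round(value / largest * width))
        lines.append(f"day {day}: {bar} {value}")
    return lines


new_cases = daily_increase(cumulative_cases)
chart = text_bar_chart(new_cases)
print("\n".join(chart))
```

Even a crude chart like this shows whether growth is steady or accelerating, which is the kind of insight worth having before choosing a model.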
Another important step Data Science beginners miss is to brainstorm the project and ask as many relevant questions about it as possible. Once the points mentioned above are followed, the aspirant will start having many doubts related to the data, the problem, the tools to use, etc. A Data Scientist must keep thinking critically and get these doubts clarified.
For example, when we work on a loan default problem in Banking, it is useful to first build a good understanding of the problem and the data we have.
One must follow this as the first step as soon as one gets a problem to work on. Also, aspirants sometimes hesitate to ask questions. Changing this habit can help an aspirant considerably.
Believing that complex algorithms will solve a problem faster and more accurately is another common mistake. We tend to overlook that simple Machine Learning algorithms can often beat complex ones while also being less computation-intensive.
For example, beginners often assume that Deep Learning should be used in all kinds of predictive analytics cases. This is not true. Deep Learning finds its use in a different set of cases.
While practicing on various problems, aspirants must cultivate the habit of keeping things simple. The idea is to first apply algorithms like Logistic Regression, Decision Tree, etc. in several cases. This enables us to recognize the power of simple algorithms over complex ones.
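To show how little machinery a simple model needs, here is a minimal sketch of Logistic Regression on a single feature, trained with plain batch gradient descent. The toy data, the learning rate, and the function names are assumptions chosen for illustration; on a real project one would normally rely on an established library rather than hand-written training code.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit a weight and bias for one feature by batch gradient descent on log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every example under the current parameters.
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        # Gradient step: average the per-example gradients.
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


def predict(x, weight, bias):
    """Return class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0


# Hypothetical toy data: one centered feature, label 1 for positive values.
features = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_logistic_regression(features, labels)
predictions = [predict(x, weight, bias) for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"weight={weight:.3f} bias={bias:.3f} accuracy={accuracy:.2f}")
```

A baseline like this trains in a fraction of a second and its two parameters are easy to interpret, so it makes a useful reference point before reaching for anything more complex.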
Aspirants often do not consider Data Science competitions an important means to learn and grow. This is another notable mistake every Data Science beginner should avoid. Not only do these contests help us practice, but they also provide a way to build a network of aspiring Data Scientists. The discussion forums help clarify doubts and build confidence in one's Data Science skills.
To learn more about Data Science competitions, you can visit our blog Everything you need to know about Data Science competitions.
We hope all beginners find this post useful. Learning these points can save a lot of time and effort on the journey to becoming a Data Scientist. If you are thinking of starting your journey, you can contact us at the Ivy Professional School website. Happy Learning!