8 Steps To Learn Data Science
Surveys over the past few years have reported varying results on the educational background of data scientists. In the O’Reilly Data Science Salary Survey of 2014, about 28% of the respondents had a Bachelor’s degree, while 44% had a Master’s degree and 20% had a Ph.D. Common background fields for data scientists are Mathematics/Statistics, Computer Science, and Engineering.
In general, you could conclude that data scientists usually hold a Master’s degree or Ph.D. The field that you come from matters less, but a quantitative background gives you an advantage.
Step 1. Get Good at Stats, Maths and Machine Learning:
The perspective on the definition of data science might have changed over the years, but data science has remained a somewhat technical occupation. A sound knowledge of statistics, mathematics, and machine learning is still considered a main requirement for anyone who wants to do data science.
Getting up to speed with these three can be a pain, especially for those who have no technical background whatsoever. Luckily, there are more than enough quality resources to help you out: Khan Academy offers online courses on a variety of mathematics topics that will be of great value to you, but make sure to also take a look at the Linear Algebra course from MIT OpenCourseWare. For statistics, the material from DataCamp, Udacity and OpenIntro might help you, and for machine learning, you should keep an eye out for the content on DataCamp, Stanford Online and Coursera.
Step 2. Learn to Code:
Developing your hacking skills is another thing that you need to take into account if you want to learn data science.
You can start by getting familiar with the computer science fundamentals: get to know the basic data structures and search algorithms. Then, step up to understanding how end-to-end development works: the things you work on will be integrated with other systems, so it’s best to understand how development works from beginning to end, from requirements gathering and analysis to testing and maintaining code. When you have grasped this concept, it’s time to pick a language. You can go for an open source language or a commercial one. Things to take into account in your decision are the learning curve, the industry you want to work in, and the salary that comes with being proficient in the language.
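As one concrete example of the search algorithms mentioned above, here is a minimal sketch of binary search in Python. The function name `binary_search` and the sample data are illustrative choices, not something prescribed by this guide, and the sketch assumes the input is already sorted in ascending order.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in a sorted sequence, or None if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # middle of the remaining search range
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


if __name__ == "__main__":
    sorted_values = [1, 3, 5, 7, 9]
    print(binary_search(sorted_values, 7))
    print(binary_search(sorted_values, 4))
```

Each pass halves the range that still has to be searched, which is why this approach scales much better than scanning every element of a large sorted list.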
Step 3. Understand Databases:
When you start out learning data science, you will see that a lot of tutorials have you retrieve data from flat files. However, once you start working or get in touch with the industry itself, you will see that most of the work happens through a connection with one or more databases.
And there are a lot of databases out there. Companies might work with commercial ones like Oracle, or they might opt for open-source alternatives. The key to seeing the forest for the trees here is to understand how databases work. Learn about the why and how of databases, and the what will follow. Concepts that you should know your way around are Relational Database Management Systems (RDBMS) and data warehousing. That means that relational versus dimensional modeling should hold no secrets for you, and neither SQL nor the Extract-Transform-Load (ETL) process should surprise you.
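To make the ETL and SQL ideas above a little more tangible, here is a small sketch that uses only Python's standard library (`csv`, `io` and `sqlite3`). The sample data, the `orders` table, and the function names `run_etl` and `totals_per_customer` are all hypothetical and chosen purely for illustration; a real pipeline would read from actual source systems and load into a proper data warehouse.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Hypothetical raw data standing in for a flat-file export.
RAW_CSV = """customer,amount
alice,10.50
bob,3.25
alice,4.00
"""


def run_etl(raw_csv: str) -> sqlite3.Connection:
    """Extract rows from CSV text, transform them, and load them into SQLite."""
    # Extract: parse the CSV text into dictionaries keyed by column name.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))

    # Transform: tidy up names and store amounts as integer cents.
    cleaned = [
        (row["customer"].strip().title(), round(float(row["amount"]) * 100))
        for row in rows
    ]

    # Load: write the cleaned rows into an in-memory relational table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
    conn.commit()
    return conn


def totals_per_customer(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Aggregate the loaded data with a plain SQL query."""
    query = (
        "SELECT customer, SUM(amount_cents) "
        "FROM orders GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = run_etl(RAW_CSV)
    for customer, total_cents in totals_per_customer(connection):
        print(customer, total_cents / 100)
```

The same three stages apply whatever database a company uses; only the connection details and the SQL dialect change.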
Step 4. Explore The Data Science Workflow:
The next phase in the learning process would be to explore the data science workflow. A lot of tutorials or courses focus on only one or two aspects of it and lose the general overview of the process that you will need to go through once you’re working as a data scientist or in a data science team. It’s essential not to lose sight of the iterative process that data science is.
For data science beginners who know how to program, the easiest way to discover how the data science workflow works is to practice their coding skills: get started on your journey with R or Python. There are several packages and libraries in both R and Python that will make your coding life easier.
Step 5. Gain Understanding of Big Data:
Big data might have been hyped, but it’s definitely out there, and it’s important to realize this and understand what it encompasses. Three things to learn about big data are:
- See why big data requires a different approach to data processing. The best way to do this is probably by looking at big data use cases. You can read up on some here.
- Get familiar with the Hadoop framework: it’s widely used for distributed data storage and processing.
- Don’t forget about Spark. Getting the hang of Spark in combination with Scala is the way to go. And, even better, you kill two birds with one stone: you practice your coding skills and widen your view on data science.
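To get a feel for why big data calls for a different approach, it can help to look at the MapReduce pattern that Hadoop's processing model is built on. The sketch below imitates the map, shuffle and reduce phases for a word count in plain Python on a single machine. It does not use the Hadoop or Spark APIs, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas"]))
```

Because every chunk is mapped independently, the work can be spread over many machines, which is the core idea behind distributed processing frameworks.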
Step 6. Grow, Connect and Learn:
Grow: Once you have mastered the fundamentals, it’s time to grow: practice as much as you can by doing data science challenges, like the ones you find on Kaggle or DrivenData. They will definitely challenge you to put the theory into practice. You should also let your intuition grow.
Connect: As a data science learner, you might fall into the trap of staying occupied with your own learning and that of other learners, but it is equally important to connect with those who already have more experience in the field. This way, you build up a network to fall back on in case you have questions or need advice or tips. These people will motivate you to keep up the good learning and will challenge you to go even further.
Learn: Continuous learning and data science could be synonyms. The Kaggle and DrivenData challenges mentioned above will teach you a thing or two about how data science is done in practice. Apart from these relatively small exercises, you might consider starting a pet project and exploring some things on an even deeper level.
Step 7. Immerse Yourself Completely:
Just as language learners benefit from a language bath, you need a data science bath. Depending on the skills and knowledge that you already have, you might consider a bootcamp, an internship or a job. A bootcamp is an amazing way of kickstarting or boosting your data science learning. As a plus, you meet a lot of people, and you have an opportunity to build or extend your network. Are you having trouble finding one? Check out Galvanize and Metis, but don’t forget that your Meetup groups might also organize bootcamps and workshops for the community!
Secondly, when you have the basics of data science under control, you should consider getting an internship. A lot of the big companies like Facebook, Quora and Amazon have looked for interns before, so these are great places to start your search. You can also use your social channels or your network to get first-hand information on open internship positions. Lastly, take a look at startups: these smaller companies can be willing to let you learn on the job as long as you learn quickly. AngelList is worth checking out for startup jobs.
Step 8. Engage with The Community:
This last step is one that can be overlooked sometimes. Even when you have a job in data science or as a data scientist, you still need to remember that data science equals continuous learning. There are new advancements all the time, and it’s of key importance to stay informed and curious about what’s happening around you. So don’t hesitate to contribute to discussions on social media, subscribe to a newsletter, follow the key people of the data science industry, or listen to a podcast. Whatever you can do to engage with the community!
To stay up to date with the latest news, you can subscribe to the following newsletters: the bimonthly KDnuggets newsletter, Data Elixir, or the Data Science Weekly newsletter. Next, follow some of the key people in the data science industry on Twitter. This will also keep you up to speed with the latest developments. Just some of the people who might interest you are DJ Patil, Andrew Ng, and Ben Lorica.
Join some communities online. LinkedIn, Facebook, and Reddit all offer the possibility to connect with peers. You should take the opportunity to become a member of one of those groups:
- On LinkedIn, make sure to take a look at the “Big Data, Analytics, Business Intelligence”, “Big Data Analytics”, “Data Scientists” or “Data Mining, Statistics, Big Data, Data Visualization, and Data Science” groups.
- On Facebook, the “Beginning Data Science, Analytics, Machine Learning, Data Mining, R, Python” and “Learn Python” groups might interest you.
- Subreddits that you can keep an eye on are “/r/datascience”, “/r/rstats” and “/r/python”, among many others!