We are going to learn ML by practice, so we need to pick a platform to work with. I look for the following attributes in an ML platform:
- Easy to learn: this is paramount. We need to be able to pick it up quickly and improve as we go along.
- Modern
- Easy to set up and use
- Has an easy-to-use interface (an IDE or other UI)
- Has a thriving ecosystem (a good community around it, plenty of content on Stack Overflow, etc.)
- Has good, proven ML libraries
Here are a few good choices:
- Python
  - Easy to learn and use
  - ML libraries: NumPy, Pandas, scikit-learn
- Java
  - Well established in enterprise environments
  - ML libraries: Weka, Mahout
- R
  - Very capable
  - ML libraries: a very large selection
- Spark
  - Scales to large datasets
  - ML libraries: Spark MLlib
You can't go wrong with any of the above.
My suggestion is that you learn Python.
Here is why:
- In the early days, Python was considered a 'toy language for ML' and much of the serious work was done in R. But Python has come a long way in the last few years and is now a very solid ML / DL language.
- It has a thriving ecosystem of libraries in both ML and DL. Many popular packages, such as TensorFlow, have Python APIs.
- It is a very easy language to pick up. Most programmers coming from Java, C, or PHP can learn the basics in a couple of days and keep learning as they go along. To me this is very important, as we don't want to spend too much time learning the language.
- Python is a general-purpose language. If you learn Python and end up not practicing ML, you can still write pretty much any other kind of system with it: web services, general scripting, etc.
- R, on the other hand, as good as it is for ML work, is very specific to analytics. It is not a general-purpose language.
- Python has very easy-to-use UIs. My favorite is Jupyter notebooks. They are web based, lightweight, and easy to use.
- There are lots of FREE and open source resources to learn Python
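To give a feel for how readable Python is, here is a minimal sketch of a first ML-style task: fitting a straight line to a handful of points by ordinary least squares. It uses only the standard library so it runs anywhere. The function name `fit_line` and the data are made up for illustration; in real work you would more likely reach for NumPy or scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance of x and y, and variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Use the fitted line to predict the score for 6 hours of study.
predicted = slope * 6 + intercept
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

Even without knowing Python, a Java or C programmer can probably follow every line here, which is the point of choosing it as a first ML platform.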
Once you are comfortable in Python, I'd recommend also learning Spark. Here is why:
- When you have more data than a single computer can handle, you need cluster computing to crunch all that data.
- Spark is a popular distributed computing framework. It allows us to process large amounts of data.
- Spark has a built-in machine learning library that you can use out of the box.
- And the best news is that Spark has a Python API (in addition to Scala, Java, and R).
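The core idea behind cluster computing can be sketched on a single machine: split the data into partitions, compute a small partial result on each one independently, then combine the partial results. The example below is plain Python, not the Spark API, and the helper names (`partition`, `partial_stats`, `combine`) are hypothetical. On a real cluster each partition would live on a different machine and the per-partition work would run in parallel.

```python
from functools import reduce


def partition(data, num_partitions):
    """Split a list into roughly equal chunks, one per pretend worker."""
    size = -(-len(data) // num_partitions)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Work done independently on each partition: a (sum, count) pair."""
    return sum(chunk), len(chunk)


def combine(a, b):
    """Merge two partial (sum, count) results into one."""
    return a[0] + b[0], a[1] + b[1]


# Made-up dataset: the integers 1..100.
data = list(range(1, 101))

chunks = partition(data, 4)

# "Map" step: each chunk is processed without looking at the others.
partials = [partial_stats(chunk) for chunk in chunks]

# "Reduce" step: merge the small partial results into the final answer.
total, count = reduce(combine, partials)
mean = total / count
print(f"mean of {count} values across {len(chunks)} partitions: {mean}")
```

Only the small `(sum, count)` pairs need to be brought together at the end, never the full dataset, which is why this pattern scales to data that does not fit on one computer.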