Selecting a platform for Machine Learning

We are going to learn ML by practice, so we need a platform to practice on. I look for the following attributes in an ML platform:

  • Easy to learn: this is paramount. We need to be able to pick it up quickly and improve as we go along.
  • Modern
  • Easy to set up and use
  • Has an easy-to-use interface (IDE or other UI)
  • Has a thriving ecosystem (an active community, plenty of content on Stack Overflow, etc.)
  • Has good, proven ML libraries

Here are a few good choices:

  • Python
    • Easy to learn and use
    • ML libraries: NumPy, pandas, scikit-learn
  • Java
    • Enterprise-oriented
    • ML libraries: Weka, Mahout
  • R
    • Very capable
    • ML libraries: many, many
  • Spark
    • Provides scale (distributed processing across a cluster)
    • ML libraries: Spark ML

First Choice: Python

You can't go wrong with any of the above.

My suggestion is that you learn Python.

Here is why:

  • In the early days, Python was considered a ‘toy language’ for ML, and serious work was mostly done in R. But Python has come a long way in the last few years and is now a very solid language for ML and deep learning (DL).
  • It has a thriving ecosystem of libraries for both ML and DL. Many popular packages, such as TensorFlow, have Python APIs.
  • It is a very easy language to pick up. Most programmers (Java / C / PHP) can pick up Python in a couple of days and keep learning as they go along. To me this is very important, as we don’t want to spend too much time learning the language.
  • Python is a general-purpose language. Even if you end up not practicing ML, you can use Python to write pretty much any other kind of system: web services, general scripting, etc.
  • R, on the other hand, as good as it is for ML work, is very specific to analytics. It is not typically used as a general-purpose language.
  • Python has very easy-to-use UIs. My favorite is Jupyter notebooks. They are web-based, lightweight and easy to use.
  • There are lots of free and open-source resources for learning Python.
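
To give a feel for how little code a first ML experiment needs, here is a minimal sketch that fits a straight line to a handful of points using only the Python standard library. The function name `fit_line` and the toy data (hours studied vs. exam score) are made up for illustration; in practice you would more likely reach for one of the ML libraries listed above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 64, 69]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
# prints: score = 4.40 * hours + 46.80
```

You can paste this into a Jupyter notebook cell and change the data to see how the fitted line moves.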

Second Pick: Spark

Once you are comfortable in Python, I'd recommend also learning Spark. Here is why:

  • When you have more data than a single computer can handle, you need cluster computing to crunch all that data.
  • Spark is a popular distributed computing framework. It allows us to process large amounts of data.
  • Spark has a built-in machine learning library that you can use out of the box.
  • And the best news is that Spark has a Python API (in addition to Scala, Java and R).
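
As a rough illustration of the last two points, the sketch below uses Spark's Python API (PySpark) and its built-in ML library to fit a linear regression. It assumes PySpark is installed (`pip install pyspark`) and a Java runtime is available. The function name `fit_line_with_spark`, the app name and the column names are invented for this example, and it runs Spark in local mode rather than on a real cluster.

```python
def fit_line_with_spark(rows):
    """Fit y = slope * x + intercept with Spark ML.

    `rows` is a list of (x, y) tuples of floats.
    Assumes PySpark and a Java runtime are installed.
    """
    # Imported inside the function so this file loads even without PySpark
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # local[*] uses all cores of this machine; on a cluster you would
    # point the session at the cluster manager instead
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("ml-platform-demo")  # hypothetical app name
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(rows, ["x", "y"])
        # Spark ML expects the input features packed into one vector column
        assembler = VectorAssembler(inputCols=["x"], outputCol="features")
        features = assembler.transform(df)
        model = LinearRegression(featuresCol="features", labelCol="y").fit(features)
        return float(model.coefficients[0]), float(model.intercept)
    finally:
        spark.stop()
```

Calling `fit_line_with_spark([(1.0, 52.0), (2.0, 54.0), (3.0, 61.0)])` returns the fitted slope and intercept. The same code works unchanged when the DataFrame holds far more rows than fit in one machine's memory, which is the main reason to use Spark here.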