Powerful Open-Source Analytics/Machine Learning Tools: SparkR, H2O, and WEKA

Below are three powerful open-source analytics/machine learning tools that can potentially replace commercial packages such as SAS or IBM SPSS Modeler.


1. WEKA: machine learning toolkit from the University of Waikato (New Zealand)

WEKA is a popular machine learning tool in the computer science and machine learning community. It is written in Java, so it integrates well with other Java-based tools such as RapidMiner. In addition, R's Java interface (for example, the RWeka package) lets R work with WEKA as well. The download site is below.

http://www.cs.waikato.ac.nz/ml/weka/downloading.html


2. H2O: machine learning toolkit from 0xdata

H2O is a big data machine learning tool that integrates well with Hadoop and enables scalable in-memory analytics on large datasets. It is written in Java, so it can be customized in Java. H2O also offers an R API, which allows scripting from R.

http://0xdata.com/h2o/


3. SparkR: Use Apache Spark natively from R

R is one of the most powerful languages for data scientists, and SparkR extends its capabilities further by enabling R programmers to use Apache Spark natively from R. Apache Spark, a big data framework for in-memory data processing at scale, has been gaining a lot of traction lately. SparkR should therefore help drive adoption of Apache Spark among data scientists who are more familiar with R than with Java or Scala. Links to Apache Spark and SparkR are below.

- Apache Spark
http://spark.apache.org/

- SparkR from AMPLab
http://amplab-extras.github.io/SparkR-pkg/
