Powerful Open-Source Analytics/Machine Learning Tools: SparkR, H2O, and WEKA
Below is a list of powerful open-source analytics and machine learning tools that can potentially replace commercial packages such as SAS or IBM SPSS Modeler.
1. WEKA: machine learning toolkit from University of Waikato (New Zealand)
WEKA is a popular tool in the machine learning community within computer science. It is written in Java, so it integrates well with RapidMiner, which is also Java-based. In addition, R's Java bridge (for example, the RWeka package, which builds on rJava) lets R call WEKA. The download site is shown below.
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. H2O: machine learning toolkit from 0xdata
H2O is a machine learning tool for big data that integrates well with Hadoop. It is written in Java, so it can be customized in Java. It provides scalable in-memory analytics on large data sets. It also offers an R API, so analyses can be scripted from R.
http://0xdata.com/h2o/
3. SparkR: Use Apache Spark natively from R
R is one of the most powerful languages for data scientists, and its capabilities are extended further by SparkR, which lets R programmers use Apache Spark natively from R. Apache Spark, a big data framework for in-memory data processing at scale, has been gaining a lot of traction lately. SparkR should therefore help drive adoption of Apache Spark among the many data scientists who are more familiar with R than with Java or Scala. Links to Apache Spark and SparkR are shown below.
- Apache Spark
http://spark.apache.org/
- SparkR from AMPLab
http://amplab-extras.github.io/SparkR-pkg/