From my perspective, the most important thing that happened at useR! 2014 was that I got to meet the 0xdata team. Long story short, here I am introducing the latest version of H2O, labeled Lagrange (2.6.0.11), to the R and greater data science communities. Before joining 0xdata, I worked at a competitor on a rival project and was repeatedly asked why my generalized linear model analytic didn’t run as fast as H2O’s GLM. The answer is the same now as it was then: H2O has a cutting-edge distributed in-memory parallel computing architecture. The difference is that I no longer receive an electric shock every time I say so.
For those hearing about H2O for the first time, it is an open-source, distributed, in-memory data analysis tool designed for extremely large data sets. The H2O Lagrange (2.6.0.11) release provides scalable solutions for the following analysis techniques:
- Generalized Linear Model
- K-Means
- Random Forest
- Principal Components Analysis
- Summary
- Gradient Boosted Regression and Classification
- Naive Bayes
- Deep Learning
In my first blog post at 0xdata, I want to keep it simple and make sure R users know how to get the h2o package up and running on their computers. The package is cross-referenced on the High-Performance and Parallel Computing and the Machine and Statistical Learning CRAN Task Views. To install it, open an R console of your choice and type:
# Download, install, and initialize the H2O package
install.packages("h2o",
                 repos = c("http://h2o-release.s3.amazonaws.com/h2o/rel-lagrange/11/R",
                           getOption("repos")))
library(h2o)
localH2O <- h2o.init()
# List and run some demos to see H2O at work
demo(package = "h2o")
demo(h2o.glm)
demo(h2o.deeplearning)
After you are done experimenting with the demos in R, open a web browser at http://localhost:54321/ to give the H2O web interface a once-over, then hop over to 0xdata’s YouTube channel for some in-depth talks.
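If you would rather see the moving parts than run a canned demo, here is a minimal sketch of fitting a GLM by hand. It assumes the H2O 2.x R interface (h2o.uploadFile, and h2o.glm with x, y, data, and family arguments) and the prostate.csv sample file in the package's extdata directory. The choice of predictors and the key name are mine, purely for illustration, and argument names may differ in other H2O releases.

```r
library(h2o)

# Start a local H2O instance, or connect to one already running on port 54321
localH2O <- h2o.init()

# Locate the sample data bundled with the h2o package (assumed to be present)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")

# Upload and parse the file into H2O; the key is an arbitrary label (my choice)
prostate_hex <- h2o.uploadFile(localH2O, path = prostate_path, key = "prostate.hex")
summary(prostate_hex)

# Illustrative choice of predictors for the binary CAPSULE outcome
glm_predictors <- c("AGE", "RACE", "PSA", "DCAPS")

# Fit a logistic regression inside H2O (alpha = 0.5 mixes L1 and L2 penalties)
prostate_glm <- h2o.glm(x = glm_predictors, y = "CAPSULE", data = prostate_hex,
                        family = "binomial", alpha = 0.5)
print(prostate_glm)
```

Note that prostate_hex is only a handle on the R side. The parsed data lives in the H2O instance, so it should also be visible from the web interface under the key given above.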
Over the coming weeks we at 0xdata will continue to blog about how to use H2O through R and other interfaces. If there is a particular use case you would like to see addressed, join our h2ostream Google Groups conversation or e-mail us at support@0xdata.com. Until then, happy analyzing.