Hello!
I am trying to get RHadoop working on a Hadoop cluster.
This is a test cluster for development / proof of concept.
It has 5 nodes, virtualized in VMware.
The OS on all nodes is CentOS 6.4.
To install R and the RHadoop packages on the cluster, I followed these instructions:
https://github.com/RevolutionAnalytics/RHadoop/wiki
I did not encounter any errors during installation.
However, the following example analysis consistently fails with a Java heap size error:
# Local (non-Hadoop) equivalent of the analysis, for comparison:
groups = rbinom(100, n = 500, prob = 0.5)
tapply(groups, groups, length)

# The same count run as a MapReduce job via rmr2:
require('rmr2')
groups = rbinom(100, n = 500, prob = 0.5)
groups = to.dfs(groups)
result = mapreduce(
  input = groups,
  map = function(k, v) keyval(v, 1),
  reduce = function(k, vv) keyval(k, length(vv)))
print(result())
print(from.dfs(result, to.data.frame = TRUE))
The code above is from this repo:
https://github.com/hortonworks/HDP-Public-Utilities/tree/master/Installation/r
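For reference, below is a variant of the same job that raises the task JVM heap through mapreduce()'s backend.parameters argument, which rmr2 passes to Hadoop as a -D option. The property name mapred.child.java.opts and the -Xmx1024m value are assumptions on my side and would likely need to be adapted to the cluster's Hadoop version and available memory.

# Same job, but with the mapper/reducer JVM heap raised explicitly.
# mapred.child.java.opts and the 1 GB value are assumptions, not verified settings.
library(rmr2)
groups = rbinom(100, n = 500, prob = 0.5)
groups = to.dfs(groups)
result = mapreduce(
  input = groups,
  map = function(k, v) keyval(v, 1),
  reduce = function(k, vv) keyval(k, length(vv)),
  backend.parameters = list(
    hadoop = list(D = "mapred.child.java.opts=-Xmx1024m")))
print(from.dfs(result, to.data.frame = TRUE))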
Please find more information here:
https://raw.githubusercontent.com/manuel-at-coursera/mixedStuff/master/RHadoop_bugReport.md
Any help to get this solved would be very much appreciated!
Best,
Manuel