Big data analytics is becoming an important topic for companies, and R is a popular research tool. I had trouble finding an up-to-date resource on how to install RHadoop on the most recent platforms; many of the tutorials out there are years old or several software versions behind. So I felt it necessary to provide an updated guide to the installation.

I am using the most up-to-date software possible:

Hortonworks HDP-2.2.6.3-1
Ubuntu 12.04.5 LTS (GNU/Linux 3.13.0-55-generic x86_64), the latest stable Ubuntu version supported
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
RStudio Version 0.99.463

We will begin by making sure that some prerequisites are installed:

$ sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libboost-system-dev libboost-filesystem-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev

Then we can proceed to install R. You may need to add the CRAN repository to your sources list, which can be found in /etc/apt/sources.list. An example of the line you will need to add is:

deb http://cran.revolutionanalytics.com/bin/linux/ubuntu precise/

The Ubuntu universe repository does contain R, however, so you may want to see what is available there first.

$ sudo apt-get install r-base r-base-dev

Next we will need to download a few files from Revolution Analytics for the actual RHadoop installation.
In your terminal execute:

$ wget https://github.com/RevolutionAnalytics/plyrmr/releases/download/0.6.0/plyrmr_0.6.0.tar.gz
$ wget https://github.com/RevolutionAnalytics/rmr2/releases/download/3.3.1/rmr2_3.3.1.tar.gz
$ wget https://github.com/RevolutionAnalytics/rhdfs/releases/download/1.0.8/rhdfs_1.0.8.tar.gz
$ wget https://github.com/RevolutionAnalytics/rhbase/releases/download/1.2.1/rhbase_1.2.1.tar.gz

For some reason I had trouble downloading those files, so for convenience (and my own sanity later on) I've attached copies to this post.
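Since those downloads can be unreliable, one option is to wrap the four wget calls in a small retry script. The following is only a sketch: the function names release_url and fetch_all and the FETCH switch are made up for illustration, and the only assumption about the URLs is that they follow the same pattern as the four wget lines above.

```shell
#!/bin/sh
# Sketch only: release_url, fetch_all and FETCH are hypothetical names.

# Build the release URL for one package, following the pattern of the wget lines.
release_url() {
    echo "https://github.com/RevolutionAnalytics/${1}/releases/download/${2}/${1}_${2}.tar.gz"
}

# Download every tarball that is not already present and readable.
fetch_all() {
    for spec in plyrmr:0.6.0 rmr2:3.3.1 rhdfs:1.0.8 rhbase:1.2.1; do
        pkg=${spec%%:*}
        ver=${spec##*:}
        file="${pkg}_${ver}.tar.gz"
        # Skip archives that tar can already list as gzip-compressed tar files.
        if tar -tzf "$file" >/dev/null 2>&1; then
            echo "already have $file"
            continue
        fi
        # --continue resumes a partial file; --tries retries transient failures.
        wget --tries=3 --continue "$(release_url "$pkg" "$ver")" \
            || echo "could not download $file" >&2
    done
}

# Nothing is downloaded unless FETCH=1 is set in the environment.
if [ "${FETCH:-0}" = 1 ]; then
    fetch_all
fi
```

Saved under a name of your choice (for example fetch_rhadoop.sh), it can be run as FETCH=1 sh fetch_rhadoop.sh and safely re-run until all four files are in place.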
Once R is installed, we can start an R session by typing "R" and install a few packages. I've included rmr, rhdfs, rhbase, and plyrmr in case those packages eventually become available from the repository; at the present moment, however, they will error out.
> install.packages(c("rJava", "RJSONIO", "rmr", "rhdfs", "rhbase", "plyrmr"), dependencies=TRUE, repos='http://cran.us.r-project.org')

Once it is complete, we can quit R with q(); there is no need to save the workspace.

At this point we can begin to install some of the RHadoop packages. It is important that sudo is used here because these packages need to be installed in the system library, not the user library:

$ sudo R CMD INSTALL plyrmr_0.6.0.tar.gz
$ sudo R CMD INSTALL rmr2_3.3.1.tar.gz

The Java 7 JDK should already be installed; if it isn't, install it and then let R pick up the Java configuration:

$ sudo apt-get install openjdk-7-jdk
$ sudo R CMD javareconf

Next, build and install Thrift 0.8.0 from source:

$ curl https://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz | tar zx
$ cd thrift-0.8.0/
$ ./configure
$ make
$ sudo make install
$ export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/

Verify that the pkg-config path is correct:

$ pkg-config --cflags thrift

which should return:

-I/usr/local/include/thrift

Then copy the Thrift shared library into /usr/lib:

$ sudo cp /usr/local/lib/libthrift-0.8.0.so /usr/lib/

I had issues using Thrift 0.9.2, so I downgraded to 0.8.0; however, I have heard that 0.9.0 should work as well.

Once these items are installed, we can finish the installation with rhbase:

$ R CMD INSTALL rhbase_1.2.1.tar.gz
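If the rhbase install fails, a quick check of the two Thrift prerequisites from the steps above may help narrow things down. This is a minimal sketch, not part of the original procedure: the function names has_thrift_include and preflight and the RUN_PREFLIGHT switch are hypothetical, and the paths are simply the ones used in this guide.

```shell
#!/bin/sh
# Sketch only: has_thrift_include, preflight and RUN_PREFLIGHT are hypothetical names.

# Succeeds when the given compiler flags contain the Thrift include directory
# that "pkg-config --cflags thrift" is expected to report in this guide.
has_thrift_include() {
    case " $1 " in
        *" -I/usr/local/include/thrift "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on both prerequisites; returns non-zero if either one is missing.
preflight() {
    status=0
    flags=$(pkg-config --cflags thrift 2>/dev/null) || flags=""
    if has_thrift_include "$flags"; then
        echo "ok: pkg-config finds thrift"
    else
        echo "missing: pkg-config cannot find thrift, check PKG_CONFIG_PATH" >&2
        status=1
    fi
    if [ -e /usr/lib/libthrift-0.8.0.so ]; then
        echo "ok: libthrift-0.8.0.so is in /usr/lib"
    else
        echo "missing: /usr/lib/libthrift-0.8.0.so" >&2
        status=1
    fi
    return $status
}

# The checks only run when RUN_PREFLIGHT=1 is set in the environment.
if [ "${RUN_PREFLIGHT:-0}" = 1 ]; then
    preflight
fi
```

Both checks only confirm that the files are where this guide puts them; they say nothing about whether a different Thrift version would work.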
Author: James Benson is an IT professional.