Wednesday, February 19, 2014

Installing Hadoop on Single Pi [Starters]

Hadoop is a framework written in Java for handling large datasets. It has a very robust architecture and is used by many organisations. Since I have been experimenting a bit with my Raspberry Pi, this post is dedicated to everyone who wants to learn more about Hadoop in a hands-on fashion.
  I have ordered 4 Pis from Farnell electronics, but until they arrive I thought it would be a good idea to port the Hadoop framework to a single machine. As we know, the Pi has limited capability when it comes to its memory, which is 512 MB. But make no mistake about it, it is still fairly capable of handling some tedious and complex applications.

Getting Started
 Before you install Hadoop on your system, make sure that your Raspberry Pi has OpenJDK [latest version] installed.
  You can install OpenJDK on the Raspberry Pi using the following command:

$: sudo apt-get install openjdk-7-jdk

After it is installed, you can check the version using the following command

$: java -version

You can treat this as a prerequisite for installing Hadoop on the Raspberry Pi.
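
If you would rather have this check in a script than eyeball the output, a minimal sketch could look like the one below. The helper name require_cmd is my own and not part of any tool; it only tests whether a command can be found on your PATH.

```shell
# Hypothetical helper (name is my own): succeed only if the
# command named in $1 can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found"
    else
        echo "$1 missing" >&2
        return 1
    fi
}
```

For this step you would call it as:

$: require_cmd java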

Step II 

  Now that we have OpenJDK installed on our Pi, we'll create a dedicated Hadoop system user. You can create it by running the following commands:

$: sudo addgroup hadoopuser
$: sudo adduser --ingroup hadoopuser haduser
$: sudo adduser haduser sudo

Just to give you a brief gist of the commands above: the first creates a dedicated group called "hadoopuser", the second creates a user called "haduser" in that group, and the third gives that user "sudo", or in simple terms administrative, privileges. While the user is being created you will be asked to enter a password; be careful with it, since you will need it in the next step.

Step III
   Now we'll log into the "haduser" account we have created. We can do this using the following command:

$: su - haduser

This command will prompt you for your password, enter and GO!

Now you need to generate the RSA key pair.

$: ssh-keygen -t rsa -P ""

We are creating a key with an empty passphrase because every time Hadoop accesses its nodes it asks for a passphrase, and we avoid that repetition with the following step.

Now in your terminal type the following command
$: cat ~/.ssh/ >> ~/.ssh/authorized_keys
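
If you want this step to be safe to run more than once, here is a minimal sketch. The helper name authorize_key is my own invention, and the usage line after it assumes you accepted the default file name that ssh-keygen offers for an RSA key (~/.ssh/id_rsa.pub); adjust it if you chose another name.

```shell
# Hypothetical helper (name is my own): append the public key in file $1
# to the authorized-keys file $2, unless that exact line is already there.
authorize_key() {
    pub=$1
    auth=$2
    touch "$auth"
    # -x: whole-line match, -F: fixed strings, -f: read patterns from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # sshd is usually strict about permissions on this file
    chmod 600 "$auth"
}
```

Under the assumption above, the call would be:

$: authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys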

Step IV 

 Now comes the most fun part: download Hadoop from any of the mirrors. In my case I used


This will download the Hadoop .tar.gz file to your Pi. Now you can use the "tar" command to decompress it.

$: sudo tar xzvf "FILE NAME".tar.gz

Now move the extracted folder to /usr/local

Rename the folder to hadoop (lower case, so that it matches the paths used below).

Then, from /usr/local, use the following command to set the ownership explicitly:

$: sudo chown -R haduser:hadoopuser hadoop
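
The unpack, move and rename steps above can be sketched as one small shell function. The name unpack_to is hypothetical, and the sketch assumes the archive contains exactly one top-level folder, which I have not verified for every Hadoop release.

```shell
# Hypothetical helper (name is my own): unpack the .tar.gz in $1 and
# place its single top-level folder at the path given in $2.
unpack_to() {
    tarball=$1
    dest=$2
    work=$(mktemp -d) || return 1
    tar xzf "$tarball" -C "$work" || return 1
    # Assumption: the archive holds exactly one top-level folder.
    top=$(ls "$work")
    mv "$work/$top" "$dest" || return 1
    rmdir "$work"
}
```

Since writing to /usr/local needs root, it would have to be run from a root shell, with your downloaded file in place of "FILE NAME":

$: unpack_to "FILE NAME".tar.gz /usr/local/hadoop

The chown command above still has to be run afterwards.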

Step V 

 This is the part where most people make a mistake, and it is the step where I got stuck with my installation. Go to your home directory and install the following first:

$: sudo apt-get install lzop

Once you are done with the installation, open the file ~/.bashrc using the vi editor

and add the following lines to it:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

There are a few things you need to be particular about:
1. First check your JAVA_HOME path very carefully!
2. Make sure you have installed lzop 
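
For the first point, one way to double-check JAVA_HOME is to work it out from where the java binary really lives instead of typing it from memory. Below is a minimal sketch; the helper name java_home_from is my own, and it assumes the usual OpenJDK layout where the binary sits in bin/ or jre/bin/ under the JDK folder.

```shell
# Hypothetical helper (name is my own): turn the resolved path of the
# java binary ($1) into a JAVA_HOME candidate by stripping the trailing
# /bin/java and, if present, the /jre folder before it.
java_home_from() {
    p=${1%/bin/java}
    p=${p%/jre}
    printf '%s\n' "$p"
}
```

You can feed it the real location of java (readlink -f follows the symlinks that Debian-based systems put in /usr/bin) and compare the result with the line in your ~/.bashrc:

$: java_home_from "$(readlink -f "$(command -v java)")"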


Once you are done with this step, go to the following directory

and set the Java path there too. Now restart your Pi board using the following command:

$: sudo shutdown -r 0

This will restart the Pi. Again log in using the same "su" command

$: su -  haduser

Once you are logged in, in the terminal type the following

$: hadoop version

You'll see information about the Hadoop version running on your Pi. If not, then I am afraid you have to check your steps again!
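
If the shell says the hadoop command cannot be found, the first thing I would check is whether the Hadoop bin/ folder actually made it into PATH. Here is a minimal sketch of such a check; the helper name path_has_dir is my own.

```shell
# Hypothetical helper (name is my own): succeed if the directory in $1
# is one of the colon-separated entries of PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With HADOOP_HOME set as in Step V, you could run:

$: path_has_dir "$HADOOP_HOME/bin" && echo "bin is on PATH" || echo "bin is NOT on PATH"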

Furthermore, you can check tutorials on the internet about configuring the site.xml files and so on. That's it for starters; comments and feedback are welcome.

