Hadoop is a Java framework for handling large datasets. It has a very robust architecture and is used by many organisations. Since I have been experimenting a bit with my Raspberry Pi, this post is dedicated to everyone who wants to learn Hadoop in a hands-on fashion.
I have ordered 4 Pis from Farnell, but until they arrive I thought it would be a good idea to see whether I could port the Hadoop framework to a single machine. As we know, the Pi is limited when it comes to memory, with only 512 MB. But make no mistake about it: it is still fairly capable of handling some tedious and complex applications.
Getting Started
Before you install Hadoop on your system, you need to make sure that your Raspberry Pi has openjdk [latest version] installed.
You can install openjdk on the Raspberry Pi using the following command:
$: sudo apt-get install openjdk-7-jdk
After it is installed, you can check the version using the following command:
$: java -version
You can treat this as a prerequisite for installing Hadoop on the Raspberry Pi.
Step II
Now that we have openjdk installed on our Pi, we'll create a dedicated Hadoop system user. You can create that with the following commands:
$: sudo addgroup hadoopuser
$: sudo adduser --ingroup hadoopuser haduser
$: sudo adduser haduser sudo
To give you a brief gist of these commands: the first creates a dedicated group called "hadoopuser", the second creates a user called "haduser" inside that group, and the third grants "haduser" sudo privileges, or in simple terms, administrative rights. During user creation it will ask you to set a password; remember it, as you will need it shortly.
Step III
Now we'll log into the "haduser" account we have created. We can do this using the following command:
$: su - haduser
This command will prompt you for the password; enter it and you are in.
Now you need to generate the RSA key pair.
$: ssh-keygen -t rsa -P ""
We create the key with an empty passphrase because every time Hadoop accesses its nodes over SSH, it would otherwise ask for a passphrase, and we want to avoid that repetition.
Now, in your terminal, type the following command:
$: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
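To check that the passwordless setup works before moving on, you can try an SSH login to the Pi itself (this assumes the SSH server is enabled on your Pi, which it is on stock Raspbian):

```shell
# Should drop you into a shell without asking for a passphrase;
# on the first connection it will only ask you to confirm the host key.
ssh localhost
# Type "exit" to return to your original session.
```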
Step IV
Now comes the most fun part: downloading Hadoop from any of the mirrors. In my case I used:
$: wget http://www.apache.org/dyn/closer.cgi/hadoop/core/"YOUR HADOOP VERSION.tar.gz"
This will download the Hadoop .tar.gz file on your Pi. Now you can use the "tar" command to decompress it.
$: sudo tar xzvf "FILE NAME".tar.gz
Now move the extracted folder to /usr/local and rename it to hadoop.
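As a sketch, assuming the extracted folder is called hadoop-1.2.1 (substitute whatever version you actually downloaded), the move and the rename can be done in one step:

```shell
# Move the extracted tree into /usr/local and rename it to "hadoop"
# in one go; "hadoop-1.2.1" is a placeholder for your version.
sudo mv hadoop-1.2.1 /usr/local/hadoop
```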
Then set the ownership explicitly with the following command:
$: sudo chown -R haduser:hadoopuser hadoop
Step V
This is the part where most people make mistakes, and it is the step where I got stuck during my installation. Go to your home directory and install the following first:
$: sudo apt-get install lzop
Once that is installed, open the file ~/.bashrc in the vi editor and add the following lines to it:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
#
lzohead () {
hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
There are a few things you need to be particular about:
1. First check your JAVA_HOME path very carefully!
2. Make sure you have installed lzop
Once you are done with this step, go to the following file:
PATH_TO_HADOOP/conf/hadoop-env.sh
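In hadoop-env.sh, you set JAVA_HOME again, pointing at the same OpenJDK install as in ~/.bashrc (the armhf path below matches the openjdk-7-jdk package we installed earlier on Raspbian):

```shell
# In hadoop-env.sh, uncomment/set the JAVA_HOME line:
# the JDK location for openjdk-7-jdk on Raspbian (armhf)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf
```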
Set the Java path there as well. Now restart your Pi using the following command:
$: sudo shutdown -r 0
This will restart the Pi. Log in again using the same "su" command:
$: su - haduser
Once you are logged in, in the terminal type the following
$: hadoop version
You'll see information about the Hadoop version running on your Pi. If not, I am afraid you'll have to go back and check your steps again!
Furthermore, you can find tutorials online covering the configuration of the site.xml files and so on. That's it for starters; comments and feedback are welcome.