Wednesday, February 19, 2014

Installing Hadoop on Single Pi [Starters]

Hadoop is a framework written in Java for handling large datasets. It has a very robust architecture and is used by many organisations. Since I have been experimenting a bit with my Raspberry Pi, this post is dedicated to all those people who really want to learn Hadoop in a hands-on fashion.
  I have ordered 4 Pi's from Farnell electronics, but until they arrive I thought it would be a good idea to see whether I could port the Hadoop framework onto a single machine. As we know, the Pi has limited memory, just 512 MB. But make no mistake about it: it is still fairly capable of handling some tedious and complex applications.

Getting Started
 Before you install Hadoop on your system, you need to make sure that your Raspberry Pi has OpenJDK [latest version] installed.
  You can install OpenJDK on the Raspberry Pi using the following command

$: sudo apt-get install openjdk-7-jdk

After it is installed, you can check the version using the following command

$: java -version

You can treat this as a prerequisite for installing Hadoop on the Raspberry Pi.

Step II 

  Now that we have OpenJDK installed on our Pi, we'll create a dedicated Hadoop system user. You can do that with the following commands

$: sudo addgroup hadoopuser
$: sudo adduser --ingroup hadoopuser haduser
$: sudo adduser haduser sudo

To give you a brief gist of these commands: the first creates a dedicated group called "hadoopuser", the second creates a user called "haduser" inside that group, and the third grants "haduser" sudo privileges, in simple terms administrative rights. During user creation it will ask you to enter a password; please be particular about that.

Step III
   Now we'll log into the "haduser" account we have created. We can do this using the following command:

$: su - haduser

This command will prompt you for the password; enter it and GO!

Now you need to generate the RSA key pair.

$: ssh-keygen -t rsa -P ""

We create the key with an empty passphrase because every time Hadoop accesses its nodes over SSH it would otherwise ask for the passphrase; this avoids that repeated prompt.

Now, in your terminal, append the public key to the authorized keys list:

$: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
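If you want to dry-run these two SSH steps without touching your real ~/.ssh, here is a sketch using a scratch directory (the demo_ssh path is just for illustration; on the Pi the keys live in ~/.ssh):

```shell
# Generate an RSA key pair with an empty passphrase into a scratch dir
# (on the Pi, drop the -f option so the key lands in ~/.ssh/id_rsa).
mkdir -p demo_ssh
ssh-keygen -t rsa -P "" -f demo_ssh/id_rsa -q

# Authorise the key: append the public half to authorized_keys.
cat demo_ssh/id_rsa.pub >> demo_ssh/authorized_keys
chmod 600 demo_ssh/authorized_keys   # sshd ignores the file if it is more open
```

After doing the real thing in ~/.ssh, "ssh localhost" should log you in without asking for a password.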

Step IV 

 Now comes the most fun part: download Hadoop from one of the mirrors. In my case I used

$: wget http://www.apache.org/dyn/closer.cgi/hadoop/core/"YOUR HADOOP VERSION.tar.gz"

(Note that the closer.cgi page lists mirrors; substitute a direct link to the release from your chosen mirror.)

This will download the Hadoop .tar.gz file onto your Pi. Now you can use the "tar" command to decompress it.

$: sudo tar xzvf "FILE NAME".tar.gz

Now move the extracted folder to /usr/local and rename it to "hadoop".

Then use the following command to give "haduser" explicit ownership of it

$: sudo chown -R haduser:hadoopuser /usr/local/hadoop
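Putting Step IV together, the sequence looks like this. This is a self-contained sketch that fakes the tarball in a scratch directory so you can try it anywhere; on the Pi you would use the real download, the real /usr/local, and sudo, and "x.y.z" stands in for your Hadoop version:

```shell
# Stand-in for the downloaded archive (on the Pi you already have it).
mkdir -p demo && cd demo
mkdir hadoop-x.y.z && tar czf hadoop-x.y.z.tar.gz hadoop-x.y.z && rm -r hadoop-x.y.z

mkdir -p usr/local                 # stand-in for /usr/local
tar xzf hadoop-x.y.z.tar.gz        # decompress; creates hadoop-x.y.z/
mv hadoop-x.y.z usr/local/hadoop   # move into place and rename to "hadoop"
ls usr/local                       # → hadoop
```

On the Pi, finish with the chown command above so "haduser" owns the installation.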

Step V 

 This is the part where most people make a mistake, and this is the step where I got stuck in my installation. Go to your home directory and install the following first

$: sudo apt-get install lzop

Once you are done with that installation, open the file ~/.bashrc in the vi editor and add the following lines to it:
 #Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
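After saving ~/.bashrc, reload it and check that the variables took effect. The snippet below sources a scratch copy of the two key lines so it can run anywhere; on the Pi you would simply run "source ~/.bashrc":

```shell
# Scratch rc file containing the two lines that matter for finding Hadoop.
cat > demo_rc <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
EOF

. ./demo_rc                # on the Pi: source ~/.bashrc
echo "$HADOOP_HOME"        # → /usr/local/hadoop
```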
 
 
 
 There are a few things you need to be particular about
1. First check your JAVA_HOME path very carefully!
2. Make sure you have installed lzop 
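On point 1 above: JAVA_HOME is simply the path where java resolves to, with the trailing /jre/bin/java (or /bin/java) removed. You can find yours with "readlink -f $(which java)". A small illustration, using a hypothetical resolved path from a Raspbian install:

```shell
# Hypothetical resolved path of the java binary (check your own system).
JAVA_BIN=/usr/lib/jvm/java-7-openjdk-armhf/jre/bin/java

# Strip the /jre/bin/java suffix to get JAVA_HOME.
echo "${JAVA_BIN%/jre/bin/java}"   # → /usr/lib/jvm/java-7-openjdk-armhf
```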



 

Once you are done with this step, go to the following file

PATH_TO_HADOOP/conf/hadoop-env.sh

and set the Java path (JAVA_HOME) there too. Now restart your Pi using the following command

$: sudo shutdown -r 0

This will restart the Pi. Log in again using the same "su" command

$: su - haduser

Once you are logged in, in the terminal type the following

$: hadoop version

You'll see information about the Hadoop version running on your Pi. If not, then I am afraid you have to check your steps again!!

Furthermore, you can find tutorials on the internet about configuring the *-site.xml files and so on. That's it for starters; comments and feedback are welcome.

 







Monday, February 17, 2014

The First Error

Last night, I read an article about configuring Hadoop on Raspberry Pi. I went through all the instructions and also read about basics of Hadoop.

    Today, as usual, I went to Rahul's place to do some more hacking with the Raspberry Pi. I thought it might be easy to configure Hadoop on the Pi, but that was my mistake! I took it too lightly, because some of the steps produce what can be termed a "RECURSIVE ERROR": if you are stuck with an error, you keep on repeating it and ultimately you reach the same point where you started.

Last night, I configured Drupal and the webserver on the Pi. I am also now working with a persistent copy of Raspbian Wheezy.

Here are some of the pics :



Apache Webserver Running on Localhost [Pi]

Upgrading the 'Wheezy'
As you can see in the pics, I have added the HDUSER and have also configured ssh. But I guess there is some problem with the .bashrc file; I am probably not configuring it as required!

My further plan of action is to read the Hadoop manual more carefully, especially about installing it and, most importantly, about .bashrc configuration.

At the end, tasks completed today:
  • Webserver is persistent on the drive
  • Installed OpenJDK 7 for ARM
  • Apache2, PHPMYADMIN, MYSQL-SERVER
  • Managed the DHCP configuration for the ethernet port
  • Added HDUSER, copied the basic hadoop directory, and configured it to some extent; work still to be done. 
  • Fixed the phpmyadmin not-found error, using the following documentation: https://help.ubuntu.com/community/phpMyAdmin

Sunday, February 16, 2014

Task Completed!

Well, basically, I have bought the HDMI to VGA converter. There seems to be a bit of a problem with it, but I guess I'll be able to figure that one out.

The current position as of now is :
  • Installed Drupal on the Pi
  • Modules working fine

The next step is to read about clusters and distributed computing, so that we can install HADOOP on the machine and perform some analysis with it. It will also brush up our knowledge of Distributed Computing. 

Saturday, February 15, 2014

The Agenda of the day

After working with the Raspberry Pi for about 10 hours at a stretch, I have now brushed up my skills related to the Pi. My agenda for today:

  • Buy an HDMI-VGA converter: it eases the pain of connecting to monitors that do not have HDMI slots. 
  • Memory card: Rajat will be buying a memory card of good quality. [SDHC card, 8 GB capacity]
  • Moreover, he will also be buying a powered USB hub; we may need it in case we attach more peripherals to the Pi, such as a wifi dongle. 
  • I'll be porting Drupal onto Raspbian Wheezy. 
  • And most important of all, updating the blog 


Pi 1


  Action begins pretty soon!! Apparently one of my seniors, who is in M.Tech 1st year, invited me for the HACK DAY! Our main task list includes:

 1. Configuring the Pi
 2. Startup
 3. Installing Apache
 4. Webserver Configuration
 5. Testing


Since I was excited about the HACK DAY, I was well prepared for it. I had already downloaded the .img file of the 'Raspbian Wheezy' distro for the Raspberry Pi, and gathered all the necessary peripherals, which include a USB mouse, keyboard, LAN wire, card reader, and micro-USB charger [for powering the Pi].

Configuring the Pi.

This was probably the easiest task of all those mentioned in the list above. I burned the .img file onto my micro SD card using PowerISO [oh yes, for this I use WINDOWS!! LORD SAVE ME :P].

Caution: While making a bootable card, make sure your memory card has an ample amount of space available. Since my basic tasks don't require much space, I used my old card with a total capacity of 4 GB.

After creating the bootable card, I plugged in all the necessary hardware and started up the Pi. Now, there are three small LEDs at the side of the Pi. The RED one denotes that the power is ON; just below it you'll see another LED with ACT written over it, which flickers with SD card activity and tells you that your Pi is alive and you are good to go. Apart from that there is another LED which is an ETHERNET connection indicator.

After turning ON, the Pi takes some time to initialize.

During the first startup it'll ask for a LOGIN ID AND PASSWORD. The default login id is 'pi' and the default password is 'raspberry'.

Once you are logged in, you need to run 'startx', which starts the graphical user interface on the Pi. For those who are familiar with LINUX this won't be a problem.

STEP 2 :

Next I turned to configuring the webserver on the Pi. My basic aim as of now is to install Apache on the Pi's localhost. Well again, it's easy; all I needed to do was run the following command

$: sudo apt-get install apache2 phpmyadmin mysql-server php5 php5-cli

And VOILA!! without any further woes! I was able to install the Apache webserver.

And most important of all: since I do not have a USB wifi dongle with me, I used Connectify [Windows platform] to connect the Pi to the internet through the ethernet port available on the Pi.

That's it for now. The next task is to access the Pi remotely and also to make a full fledged Pi cluster.

To reiterate my words from my first post yesterday, I HOPE WE'LL DO THIS IN THE GIVEN TIME FRAME.

Till the next time Adios!










Friday, February 14, 2014

Intro

Hello everybody!
    This is our blog related to our experiments with Raspberry Pi.

The Raspberry Pi is a credit-card-sized mini computer, developed in the UK by the Raspberry Pi Foundation with the intention of promoting the teaching of basic computer science in schools.

I had some previous exposure to Pi, but my friend Rajat Goyal is completely new to it. During our Cloud Infrastructure class our subject teacher Mr. Manoj Baliyan gave us the task of implementing HADOOP on Raspberry Pi.

Well, as of now I have just a basic idea of Hadoop, and I have no prior experience of configuring Hadoop on a local machine. So currently we are dividing the tasks in the following manner


  • I'll be configuring Raspberry Pi 
  • Cluster Configuration 
  • Hadoop Installation 


On the other hand, Rajat will be doing:
  • Data analysis using Hadoop and Python
  • Running the webserver
  • Optimization 

We'll be writing our progress reports on this blog. Let's hope that we can pull this off in the given time frame.