Install and Configure Hadoop on OS X

by BoxOfSugar in Circuits > Apple

Installing Hadoop on OS X

I decided to set up a Hadoop cluster on the Macs I run, mainly because Xgrid is no longer available in the new version of OS X. I have set up SGE clusters, Xgrid, and Microsoft Cluster Server before, so I wanted to get Hadoop under my belt too. This isn't the definitive guide, but it worked fairly well for me. I am still not sure of some of the concepts, but that will come with practice.

The first step is to make sure you have the basics: the Xcode command line tools and the Java Developer package for your version of OS X.

https://developer.apple.com/downloads/index.action

Let's first create a group and a user on every machine.

Create a group named ‘hadoop’ and then add an admin user ‘hadoopadmin’ to the group.

Let's do everything as hadoopadmin to make it easy.

You can download Hadoop and install it yourself, but I took a shortcut and used Homebrew to install it.

-> brew install hadoop

Homebrew sets the environment paths in the proper Hadoop config files for you, which helps.

Once it is installed, let's set up the Hadoop config files.

I named my first two machines hadoop01 and hadoop02.

Configure the masters and slaves files on all machines.

masters:

hadoopadmin@hadoop01

slaves:

hadoopadmin@hadoop01

hadoopadmin@hadoop02
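Because the same two files go on every machine, it can be easier to generate them than to type them each time. Here is a minimal sketch, assuming a helper called write_cluster_files (a made-up name, not part of Hadoop) that takes the config directory, the master entry, and then the slave entries:

```shell
# write_cluster_files CONF_DIR MASTER SLAVE...
# Hypothetical helper (not part of Hadoop): writes one entry per line
# into CONF_DIR/masters and CONF_DIR/slaves, replacing any old content.
write_cluster_files() {
    conf_dir=$1
    master=$2
    shift 2
    printf '%s\n' "$master" > "$conf_dir/masters"
    # printf repeats the format for every remaining argument.
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Example, using the config directory of this install:
# write_cluster_files /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop \
#     hadoopadmin@hadoop01 hadoopadmin@hadoop01 hadoopadmin@hadoop02
```

The helper overwrites both files, so run it only against a directory whose masters and slaves files you mean to replace.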

Also configure /etc/hosts on all machines.

#

# localhost is used to configure the loopback interface

# when the system is booting. Do not change this entry.

##

127.0.0.1 localhost

255.255.255.255 broadcasthost

::1 localhost

fe80::1%lo0 localhost

#

#

#

# hadoop

132.235.132.67 hadoop01

132.235.132.46 hadoop02
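A quick way to confirm that a machine's hosts file lists every cluster name is a small check like the one below. This is only a sketch: check_hosts is a made-up name, and it only looks for an uncommented line that contains the name. It does not test that the address is right.

```shell
# check_hosts FILE HOST...
# Hypothetical helper: prints "ok" or "missing" for each HOST and
# returns non-zero if any HOST has no uncommented entry in FILE.
check_hosts() {
    hosts_file=$1
    shift
    missing=0
    for host in "$@"; do
        # The name must follow whitespace on a line that has no '#' before it,
        # and must end at whitespace, a comment, or the end of the line.
        if grep -Eq "^[^#]*[[:space:]]${host}([[:space:]]|#|\$)" "$hosts_file"; then
            echo "ok: $host"
        else
            echo "missing: $host"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
# check_hosts /etc/hosts hadoop01 hadoop02
```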

I am using Hadoop 2.4.0, so the Hadoop config files are located in

/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop

Edit

hadoop-env.sh

I changed these two lines.

#export JAVA_HOME="$(/usr/libexec/java_home)"

to

export JAVA_HOME=`/usr/libexec/java_home -v 1.6`

and

#export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

to

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="

This last one stopped an error I was getting upon startup.

Edit

hdfs-site.xml

Insert this configuration

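<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/Cellar/hadoop/2.4.0/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/Cellar/hadoop/2.4.0/hdfs/data</value>
  </property>
</configuration>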

Edit

mapred-site.xml (copy it from mapred-site.xml.template first, since Hadoop only reads mapred-site.xml)

Insert

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop01:9001</value>
  </property>
</configuration>

Edit

core-site.xml

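<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/2.4.0/tmp</value>
  </property>
</configuration>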
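The name and value pairs in these site files all go into Hadoop's usual layout of property elements inside one configuration element. Since the same files end up on every machine, generating them from a script is one way to keep them identical. A minimal sketch, assuming a made-up helper called write_site_xml that does no XML escaping (so it only suits simple values like the ones used here):

```shell
# write_site_xml FILE NAME VALUE [NAME VALUE ...]
# Hypothetical helper: writes a Hadoop-style site file with one
# <property> block per NAME/VALUE pair. Values are not XML-escaped.
write_site_xml() {
    out_file=$1
    shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        # Consume the remaining arguments two at a time.
        while [ "$#" -ge 2 ]; do
            echo '  <property>'
            echo "    <name>$1</name>"
            echo "    <value>$2</value>"
            echo '  </property>'
            shift 2
        done
        echo '</configuration>'
    } > "$out_file"
}

# Example for core-site.xml:
# write_site_xml core-site.xml \
#     fs.default.name hdfs://hadoop01:9000 \
#     hadoop.tmp.dir /usr/local/Cellar/hadoop/2.4.0/tmp
```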

Now let's create a few Hadoop directories. These are ordinary local directories (the ones named in the config files above), so plain mkdir is enough. From inside

/usr/local/Cellar/hadoop/2.4.0

-> mkdir tmp

-> mkdir hdfs

-> mkdir hdfs/name

-> mkdir hdfs/data
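As far as I can tell, these are plain local directories: they are the paths given for hadoop.tmp.dir, dfs.name.dir, and dfs.data.dir, so they can be created with the ordinary mkdir command. A minimal sketch, with make_hadoop_dirs being a made-up helper name:

```shell
# make_hadoop_dirs BASE
# Hypothetical helper: creates BASE/tmp, BASE/hdfs/name and BASE/hdfs/data.
make_hadoop_dirs() {
    base=$1
    # -p creates missing parents (hdfs) and does not fail
    # if a directory already exists, so the helper is safe to re-run.
    mkdir -p "$base/tmp" "$base/hdfs/name" "$base/hdfs/data"
}

# Example:
# make_hadoop_dirs /usr/local/Cellar/hadoop/2.4.0
```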

I enabled passwordless SSH on all machines.

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
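The two commands above only let a machine log in to itself. For the master to reach the other nodes without a password, its public key also has to be in authorized_keys on those nodes. A minimal sketch of a helper that adds a key only once, assuming the made-up name authorize_key and that the public key file holds a single line:

```shell
# authorize_key PUBKEY_FILE AUTH_KEYS_FILE
# Hypothetical helper: appends the public key to AUTH_KEYS_FILE unless
# that exact line is already present, then restricts the file to its owner.
authorize_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -F fixed string, -x whole line must match, -q no output.
    if ! grep -Fxq -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # sshd in its default strict mode rejects key files others can write to.
    chmod 600 "$auth_file"
}

# Example: copy hadoop01's key to hadoop02, then authorize it there.
# On hadoop01: scp ~/.ssh/id_dsa.pub hadoopadmin@hadoop02:hadoop01.pub
# On hadoop02: authorize_key ~/hadoop01.pub ~/.ssh/authorized_keys
```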

I found info on this at

http://stackoverflow.com/questions/7134535/setup-passphraseless-ssh-to-localhost-on-os-x

I then formatted the name node

-> hadoop namenode -format

Then started hadoop by running

/usr/local/Cellar/hadoop/2.4.0/libexec/sbin/start-all.sh
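To see whether the daemons actually came up, the jps tool that ships with the JDK lists running Java processes. The sketch below compares that list with the daemons you expect. The helper name missing_daemons is made up, and the daemon names in the example are the ones Hadoop 2.x normally starts (NameNode, SecondaryNameNode, and DataNode for HDFS, ResourceManager and NodeManager for YARN), so adjust them to your setup.

```shell
# missing_daemons JPS_OUTPUT NAME...
# Hypothetical helper: prints each expected daemon NAME that does not
# appear in JPS_OUTPUT (the text printed by the jps command).
missing_daemons() {
    jps_output=$1
    shift
    for name in "$@"; do
        # jps prints "<pid> <class name>" per line; compare the second
        # field exactly so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -Fxq -e "$name"; then
            echo "$name"
        fi
    done
}

# Example on the master, after start-all.sh:
# missing_daemons "$(jps)" NameNode SecondaryNameNode DataNode ResourceManager NodeManager
```

No output means everything in the list is running. On a machine that is only a slave, DataNode and NodeManager are the ones to look for.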

I did all of this on all my machines, although I think some of the steps do not need to be repeated on every one.

I have to thank

http://stackoverflow.com &

http://dennyglee.com

for tutorials and help getting through this.

Thanks

Joe Murphy

AKA Grehyton