Run a remote WordCount MapReduce job from Eclipse on Windows 8

October 29, 2014

I have already deployed a 4-node Hadoop cluster in VMware. I will run the WordCount application from MyEclipse, which runs on Windows 8 on my own laptop.
1. Deploy a Hadoop working environment in Eclipse. (You can refer to "Build hadoop work environment in MyEclipse" on my blog.)
2. Import hadoop-mapreduce-client-common-2.3.0.jar and hadoop-mapreduce-client-jobclient-2.3.0.jar into the workspace. They are in the HADOOP/share/hadoop/mapreduce directory.

3. You should also have a Hadoop directory on Windows, and set the HADOOP_HOME environment variable. This is mandatory.
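On Windows this can be set once from a Command Prompt; the C:\hadoop path below is only an assumption, use wherever you placed your Hadoop directory:

```shell
:: Persistently set HADOOP_HOME for the current user
:: (open a new Command Prompt afterwards for it to take effect)
setx HADOOP_HOME "C:\hadoop"
```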

4. Download hadoop-common-2.2.0-bin-master.rar, extract the files, and overwrite hadoop/bin with them. Two of the files are necessary here: hadoop.dll and winutils.exe.

5. I copied file1.txt and file2.txt to hdfs://centmaster:9000/input.
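The upload itself can be done from the cluster with the HDFS shell; a sketch, assuming the two files are in the current directory and hadoop is on the PATH:

```shell
# Create the input directory in HDFS and upload the two files
hadoop fs -mkdir -p /input
hadoop fs -put file1.txt file2.txt /input
# Verify the upload
hadoop fs -ls /input
```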

6. Modify the WordCount.java below, and run it in Eclipse.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = { "hdfs://centmaster:9000/input",
                "hdfs://centmaster:9000/output" };
        System.setProperty("HADOOP_USER_NAME", "root"); // this is very important: run as the user that owns HDFS on the NameNode (centmaster)
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
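The map and reduce logic above amounts to tokenizing each line and summing a count of 1 per token. A minimal local sketch of the same logic, runnable without a cluster (the class and method names here are mine, not part of Hadoop):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Same tokenize-and-count logic as TokenizerMapper + IntSumReducer,
    // but in-memory instead of distributed.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();       // the "map" step emits (word, 1)
            counts.merge(word, 1, Integer::sum); // the "reduce" step sums the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a b a c")); // e.g. {a=2, b=1, c=1}
    }
}
```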
7. You can find the result.

Use cat to check the result:
[root@centmaster bin]# ./hadoop fs -cat /output/part-r-00000
2012-3-1        2
2012-3-2        2
2012-3-3        4
2012-3-4        2
2012-3-5        2
2012-3-6        2
2012-3-7        2
a       4
b       4
c       5
d       3
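Each output line is the word, a tab character, then the count, which is TextOutputFormat's default key/value layout. A quick sketch for reading one such line back (the helper class and method are mine, for illustration):

```java
public class OutputLineParser {

    // Parse one "word<TAB>count" line as written to part-r-00000
    // and return the count.
    public static int countFor(String line) {
        String[] parts = line.split("\t");
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(countFor("c\t5")); // prints 5
    }
}
```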

Some differences from online articles:
1. Some articles say you should configure hadoop/etc/hadoop/hadoop-env.cmd and set JAVA_HOME. In my case this was not necessary: I renamed "etc" to "etc2" and MapReduce still ran well.
2. Some articles say you should put hadoop.dll in windows/system32. I copied it there, but I didn't test whether it is actually necessary. You can try that, but you need to restart your computer afterwards.
3. I didn't add %HADOOP_HOME%\bin to the PATH environment variable, and it still ran well.

Debugging experience:
Eclipse doesn't show much error information. The best way is to find the error in the logs/hadoop-root-namenode-centmaster.log file on the NameNode, and google it.