Hadoop tutorial for Windows and Eclipse.
I just posted a tutorial on how to configure a Hadoop environment on Windows using Cygwin. The tutorial explains how to set up a Hadoop cluster in pseudo-distributed mode and how to get it working with Eclipse.
If you have any questions, comments or suggestions about this tutorial, post them here.

Ben Said,
March 29, 2009 @ 10:19 am
Thanks for your excellent tutorial! I followed it this weekend and was able to get mostly up and running.
One question I had was how to use it with EC2: I set it up on EC2 rather than on localhost, and I’m wondering what I need to do to make it run. I’m getting weird unknown host errors when I run, despite having set up a proxy server.
Thanks for the very helpful tutorial!
Ben
vlad Said,
March 29, 2009 @ 11:14 am
No problem.
Setting up Hadoop correctly on EC2 can be tricky. I am going to post another tutorial about it in a few weeks.
Rez Said,
March 31, 2009 @ 5:17 pm
Hey, this page on your tutorial (Unpacking Hadoop)
http://v-lad.org/Tutorials/Hadoop/09%20-%20unpack%20hadoop.html
is not working.
vlad Said,
April 9, 2009 @ 8:30 am
Strange. It works for me; I can’t see what the problem is. Does anybody else have this problem?
Jeff Said,
April 9, 2009 @ 2:32 pm
Thanks for the tutorial… it would have saved me a few hours of frustration.
Have you tried it with other versions of Eclipse? The main distribution is 3.4 (Ganymede), which will be superseded by 3.5 in May.
vlad Said,
April 9, 2009 @ 10:09 pm
Jeff,
I tried other versions of Eclipse: the plug-in doesn’t work with 3.4 and probably won’t work with 3.5 until somebody fixes the Hadoop plug-in, because the plug-in API changed in newer versions of Eclipse. You can use the plug-in with 3.4 to browse HDFS, but you won’t be able to run the project.
Joseph Said,
April 15, 2009 @ 12:09 am
Vlad,
Thanks for the well-documented tutorial. It is good work.
Towards the last step I got the following error:
09/04/15 15:00:33 INFO mapred.JobClient: Task Id : attempt_200904151224_0004_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
Kindly advise; any clue would help.
My code is as follows:
// TODO: specify input and output DIRECTORIES (not files)
//conf.setInputPath(new Path("src"));
//conf.setOutputPath(new Path("out"));
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("In"));
FileOutputFormat.setOutputPath(conf, new Path("Out3"));
Thanks and regards,
Joseph
vlad Said,
April 15, 2009 @ 12:00 pm
The error you are getting is expected. The Mappers/Reducers generated by the plug-in need some tweaking. I will post another tutorial about it sometime in May.
ash Said,
April 16, 2009 @ 11:40 pm
Hi Vlad,
Thanks for the excellent tutorial. In the last step, when I try to run the TestDriver class, I get this error.
Please help.
>>>>>>>>> START >>>>>>>
09/04/17 11:58:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/04/17 11:58:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:40 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 4
09/04/17 11:58:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
09/04/17 11:58:40 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 3
09/04/17 11:58:41 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
09/04/17 11:58:41 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 2
09/04/17 11:58:42 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
09/04/17 11:58:42 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 1
09/04/17 11:58:46 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
09/04/17 11:58:46 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/04/17 11:58:46 WARN hdfs.DFSClient: Could not get block locations. Source file “/tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar” – Aborting…
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
>>>>>> END >>>>>
vlad Said,
April 17, 2009 @ 7:24 am
I have seen this error before. It is usually caused by not having enough disk space on your workstation. Try to free up some space and recreate HDFS. Also check for error messages in the DataNode and NameNode windows.
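A quick way to act on that advice is to check the free space on the drive that holds HDFS. The sketch below is a stand-alone Java helper, not part of Hadoop; it assumes the data lives under Hadoop’s default hadoop.tmp.dir of /tmp/hadoop-<user name> (the same prefix as the path in the log above), and the 1 GiB threshold is an arbitrary, hypothetical value.

```java
import java.io.File;

class DiskSpaceCheck {

    // True if the partition holding 'dir' has at least minFreeBytes usable bytes.
    static boolean hasEnoughSpace(File dir, long minFreeBytes) {
        // getUsableSpace() returns 0 for a path that does not exist yet,
        // so walk up to the nearest existing ancestor first.
        File probe = dir.getAbsoluteFile();
        while (probe != null && !probe.exists()) {
            probe = probe.getParentFile();
        }
        return probe != null && probe.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Default mirrors hadoop.tmp.dir = /tmp/hadoop-${user.name}; pass another path as args[0].
        String dir = args.length > 0 ? args[0] : "/tmp/hadoop-" + System.getProperty("user.name");
        long minFreeBytes = 1L << 30; // hypothetical threshold: 1 GiB
        File hadoopTmp = new File(dir);
        if (hasEnoughSpace(hadoopTmp, minFreeBytes)) {
            System.out.println("OK: enough free space for " + hadoopTmp.getAbsolutePath());
        } else {
            System.out.println("Low disk space for " + hadoopTmp.getAbsolutePath()
                    + "; free some space, then recreate HDFS");
        }
    }
}
```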
tony Said,
April 19, 2009 @ 12:18 am
Hi, I followed your tutorial and also the quick start on the official Apache website.
There is a big problem when I execute the command:
“bin/hadoop namenode -format”
It reports errors in the bin/hadoop script.
When I installed it on Linux in a VM, it worked fine.
How can I run this hadoop script correctly in Cygwin?
Thanks
vlad Said,
April 20, 2009 @ 2:59 pm
Could you post the error message that you are getting?
Wen-Han Said,
April 22, 2009 @ 1:26 pm
Hi VLAD,
May I ask how recent your tutorial is? Is it updated for the most recent versions of Hadoop and Eclipse?
Thank you,
Wen-Han
vlad Said,
April 22, 2009 @ 4:16 pm
The tutorial was written in April using the most recent version of Hadoop, 0.19.1. As for Eclipse, the newest version (Ganymede) is not compatible with the Hadoop plug-in supplied with version 0.19.1, so you have to use the previous version of Eclipse (Europa).
I saw that the new version of Hadoop, 0.20, came out, so I will take a look at what has changed and update the tutorial if needed.
Saurabh Said,
April 23, 2009 @ 5:59 am
Hi Vlad, the tutorial is good.
I am setting it up on my Mandriva machine, and whenever I run
ssh localhost
I get:
[abc@localhost .ssh]$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
Please help me.
vlad Said,
April 23, 2009 @ 6:23 am
Hmm,
This tutorial is written for Windows machines. To resolve your problem, check that you have sshd installed and running. Also check that no firewall is blocking port 22.
Sid Said,
April 25, 2009 @ 12:38 pm
Hi, I am working with Hadoop and Eclipse on Linux. Everything was working fine until one day Hadoop started to ignore any code changes I made in my project; instead it just ran an old copy of the code from somewhere. Looking at the mapred.local folder, where the temporary source files are jarred together to run the job, the source code had indeed changed. I created another dummy project in Eclipse and ran it, and it ran just fine: changes were reflected every time. What could be the problem?
vlad Said,
April 25, 2009 @ 6:20 pm
Sorry, I’ve never seen that happen. Maybe somebody else on this board will comment.
Joe Said,
May 1, 2009 @ 4:44 am
Vlad,
Thank you so much for this tutorial. I am having a problem when running: bin/hadoop namenode -format
First it said “JAVA_HOME not set”, so I set my Windows environment variable to the correct path, which is c:\program files\Java\jdk1.6.0_06
Then I closed and re-opened Cygwin and tried again. This time it appeared to work, but the first line of the output was “bin/hadoop: line 234: C:\Program: command not found”. The rest of the output looked like your screenshot. Is this normal?
Thanks,
Joe
Wen-Han Said,
May 1, 2009 @ 11:12 am
Hi vlad,
Thanks for your reply to my last question. I configured Eclipse Europa according to the Yahoo tutorial on Hadoop:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html
The instructions for creating a new DFS Location say:
“…..Next, click on the “Advanced” tab. There are two settings here which must be changed.
Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user.”
I can’t find this attribute (hadoop.job.ugi) in the Advanced list under “Define Hadoop location” in Eclipse. Do you have any idea?
Thank you; a fast reply would be much appreciated.
Wen-Han
Wen-Han Said,
May 1, 2009 @ 11:15 am
P.S. The Yahoo tutorial has Hadoop installed in VMware, not on localhost under Cygwin.
Thanks,
sneha Said,
May 2, 2009 @ 8:52 am
Hello!
Thank you for the good Hadoop tutorial. I am setting up a Hadoop cluster of 4 systems. When I run the bin/start-dfs.sh command I get the error “JAVA_HOME not set”. Can you please let me know the solution, and also how to set the Java home path in .bash_profile at the Cygwin prompt?
Thank you!
Muhammad Mudassar Said,
May 5, 2009 @ 11:36 pm
Hi
The tutorial is helpful. I want to know how to upload some images or some structured data to HDFS using Cygwin and Eclipse on Windows.
One more thing: after restarting my PC, Hadoop was not working well, but when I restarted the Cygwin sshd service it started working again. Does the service have to be restarted every time I restart the PC?
Thanks.
vlad Said,
May 8, 2009 @ 7:48 am
First you have to ask yourself what you are planning to do with your data. Depending on the answer, you could copy it into HDFS with the hadoop fs shell commands (for example hadoop fs -put) or use HBase.
Note that if you are planning to use binary data you might have to write your own record readers.
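To give an idea of what a record reader does, here is a plain-Java model (not Hadoop’s RecordReader interface) that cuts a binary stream into fixed-length records keyed by byte offset, the same kind of key TextInputFormat uses for lines. Fixed-length records are an assumption made for illustration; formats such as images may instead need one record per file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FixedLengthRecords {

    // One record: its byte offset in the stream (the key) and its raw bytes (the value).
    static final class Record {
        final long offset;
        final byte[] data;

        Record(long offset, byte[] data) {
            this.offset = offset;
            this.data = data;
        }
    }

    // Splits a stream into records of exactly recordLength bytes.
    static List<Record> read(InputStream in, int recordLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<Record> records = new ArrayList<>();
        long offset = 0;
        while (true) {
            byte[] buf = new byte[recordLength];
            try {
                data.readFully(buf); // throws EOFException if fewer than recordLength bytes remain
            } catch (EOFException e) {
                break; // end of stream; a trailing partial record is dropped
            }
            records.add(new Record(offset, buf));
            offset += recordLength;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        for (Record r : read(new ByteArrayInputStream(sample), 4)) {
            System.out.println("offset " + r.offset + " -> " + Arrays.toString(r.data));
        }
    }
}
```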
vlad Said,
May 8, 2009 @ 7:51 am
As for your second question: make sure that your sshd service is set to start automatically in the Services window.
vlad Said,
May 8, 2009 @ 8:01 am
The bin/start-dfs.sh script won’t work in the environment described in this tutorial; to start the DFS services, refer to section 10 of the tutorial. On the additional machines you only have to start the DataNode and TaskTracker processes.
Remember that on the worker machines you have to edit the hadoop-site.xml file to use the name of your NameNode machine instead of localhost. Also make sure all necessary firewall ports are open.
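On a worker, the two settings that point at the master are fs.default.name (the NameNode) and mapred.job.tracker (the JobTracker). The Java sketch below only renders them in the hadoop-site.xml configuration/property layout; the host name namenode-host and the ports 9100/9101 are assumptions (9100 is the NameNode port seen elsewhere in this thread), so copy the values your master already uses. From 0.20 on, fs.default.name belongs in core-site.xml and mapred.job.tracker in mapred-site.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WorkerSiteConfig {

    // Builds the properties that point a worker at the master instead of localhost.
    static Map<String, String> workerProps(String masterHost, int dfsPort, int jobTrackerPort) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("fs.default.name", "hdfs://" + masterHost + ":" + dfsPort);
        props.put("mapred.job.tracker", masterHost + ":" + jobTrackerPort);
        return props;
    }

    // Renders properties as <configuration><property><name/><value/></property>...
    // Values are written as-is (no XML escaping), which is fine for host:port strings.
    static String toSiteXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n");
            sb.append("    <name>").append(e.getKey()).append("</name>\n");
            sb.append("    <value>").append(e.getValue()).append("</value>\n");
            sb.append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        String master = args.length > 0 ? args[0] : "namenode-host"; // hypothetical host name
        System.out.print(toSiteXml(workerProps(master, 9100, 9101)));
    }
}
```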
vlad Said,
May 8, 2009 @ 8:02 am
That’s right. But this way you incur the penalty of running another operating system, and it is tricky to debug processes in VMware.
vlad Said,
May 8, 2009 @ 8:04 am
Not sure what could be causing this. Check the dates on the files.
vlad Said,
May 8, 2009 @ 8:05 am
It’s a problem with the scripts. Try installing your JDK in a directory whose path doesn’t contain a space. I use C:\Java\JDK1.6 for that.
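Before running bin/hadoop, you can check whether JAVA_HOME contains a space and print the Cygwin form of the path for conf/hadoop-env.sh. This is a small stand-alone Java sketch, not part of Hadoop; it relies on Cygwin’s standard /cygdrive/<drive letter>/ mapping of Windows drives, and C:\Java\JDK1.6 is just the example location from the reply above.

```java
class JavaHomeCheck {

    // True if the path contains whitespace (e.g. "C:\Program Files\..."),
    // which the bin/hadoop script does not handle.
    static boolean hasWhitespace(String path) {
        for (int i = 0; i < path.length(); i++) {
            if (Character.isWhitespace(path.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Converts a Windows path such as C:\Java\JDK1.6 to /cygdrive/c/Java/JDK1.6.
    // Paths without a drive letter are only given forward slashes.
    static String toCygwinPath(String windowsPath) {
        String p = windowsPath.replace('\\', '/');
        if (p.length() >= 2 && p.charAt(1) == ':') {
            String drive = Character.toString(Character.toLowerCase(p.charAt(0)));
            p = "/cygdrive/" + drive + p.substring(2);
        }
        return p;
    }

    public static void main(String[] args) {
        String javaHome = args.length > 0 ? args[0] : System.getenv("JAVA_HOME");
        if (javaHome == null) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        if (hasWhitespace(javaHome)) {
            System.out.println("JAVA_HOME contains a space; move the JDK (e.g. to C:\\Java\\JDK1.6)");
        }
        System.out.println("export JAVA_HOME=" + toCygwinPath(javaHome));
    }
}
```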
Kim Said,
May 13, 2009 @ 2:55 pm
This tutorial is great. Hadoop is running perfectly in a VM (Windows XP).
Just one question.
Is there any way that I can use “start-all.sh”, instead of initiating “hadoop namenode”, “hadoop jobtracker”, …. in multiple cygwin windows?
Thank you again for all your efforts.
vlad Said,
May 13, 2009 @ 9:23 pm
Not in Windows XP. The hadoop start scripts are written for Linux machines and for debugging purposes it is just easier to run each of the hadoop components in its own window.
Mayank Said,
May 21, 2009 @ 4:35 am
Hi Vlad, the tutorial is great.
I am currently facing a problem in the upload data step: in Eclipse I get localhost->2->error, and I am unable to see the user and “In” folders and so on. Please suggest what to do now.
Charitha Said,
May 28, 2009 @ 2:12 am
I get an error in Eclipse Europa while running TestDriver.java.
Please advise; help will be appreciated.
09/05/28 14:40:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9100/user/charitha/Out already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
at TestDriver.main(TestDriver.java:41)
Regards,
Charitha Reddy.
vlad Said,
May 28, 2009 @ 8:27 am
Looks like this is the second time you are running the project. Every time you run the project, it creates the “Out” directory to store the output. You have to delete that directory before you run your project, or change the code to create a new directory every time you run. Look at the Hadoop examples to see how to do the latter.
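One way to do the latter is to give every run its own output directory name. The sketch below only builds the name; the helper uniqueOutputDir and the Out-yyyyMMdd-HHmmss scheme are hypothetical, not taken from the Hadoop examples.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class OutputDirNames {

    // Hypothetical helper: appends a UTC timestamp so each run writes to a fresh
    // directory, e.g. "Out" -> "Out-20090528-083600".
    static String uniqueOutputDir(String base, Date when) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd-HHmmss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return base + "-" + fmt.format(when);
    }

    public static void main(String[] args) {
        String outDir = uniqueOutputDir("Out", new Date());
        // In TestDriver this string would go to FileOutputFormat.setOutputPath(conf, new Path(outDir)).
        System.out.println(outDir);
    }
}
```

Two runs started within the same second would still collide. If you would rather delete the old directory, FileSystem.get(conf).delete(new Path("Out"), true) removes it recursively before the job is submitted.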
vlad Said,
May 28, 2009 @ 8:36 am
Do you see any activity in the Cygwin windows when you are trying to connect? It could be a firewall blocking incoming ports.
Run the following command from the command window and let me know what you get. Note that Hadoop has to be running.
telnet localhost 9100
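If telnet is not available, the same check can be done from Java: try to open a TCP connection and see whether it is accepted. This is a stand-alone sketch; the ports it probes (22 for sshd, 9100 for the NameNode as configured in this tutorial) come from this thread, and the 2-second timeout is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {

    // True if a TCP connection to host:port is accepted within timeoutMs,
    // which is what "telnet localhost 9100" checks by hand.
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or unknown host
        }
    }

    public static void main(String[] args) {
        int[] ports = {22, 9100}; // sshd, NameNode
        for (int port : ports) {
            boolean open = isPortOpen("localhost", port, 2000);
            System.out.println("localhost:" + port + (open ? " is accepting connections" : " is closed"));
        }
    }
}
```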
Joseph Said,
May 28, 2009 @ 8:59 pm
Vlad
I would like to know whether you have an update on the following:
>>snip>>
The error you getting is actually correct. The Mappers / Reducers generated by the plug-in need some tweaking. I will post another tutorial regarding sometime in May.
vlad – April 15th, 2009 at 12:00 pm
>>end of snip>>
vlad Said,
May 28, 2009 @ 9:22 pm
Sorry, been really busy lately.
Martinus Said,
June 6, 2009 @ 7:48 am
Hello Vlad,
Thanks for the tutorial. I still have a problem with compiling the TestDriver class. After I compile the class, I get this error message from Eclipse:
09/06/06 16:44:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/06/06 16:44:03 INFO mapred.FileInputFormat: Total input paths to process : 4
09/06/06 16:44:04 INFO mapred.JobClient: Running job: job_200906061639_0001
09/06/06 16:44:05 INFO mapred.JobClient: map 0% reduce 0%
09/06/06 16:44:14 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/06/06 16:44:18 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
09/06/06 16:44:22 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
I have no idea what is wrong; I use all the programs you listed in the tutorial (Eclipse 3.3.2, Hadoop 0.19.1, etc.).
Thanks
Martinus
Carspar Said,
June 9, 2009 @ 12:49 am
Hi Vlad, the tutorial is great.
I followed your tutorial and ran into a problem in step 11 (Setup Hadoop Location in Eclipse).
At step 6: In the Project Explorer tab on the left hand side of the Eclipse window, find the DFS Locations item. Open it using the “+” icon on its left. Inside, you should see the localhost location reference with the blue elephant icon. Keep opening the items below it until you see something like the image below.
I used the “+” icon on the left. Inside, there is a folder with an empty name, like in your image. When I keep opening it, the next folder is not “tmp(1)” but “Error: null”.
thanks,
Carspar Said,
June 9, 2009 @ 1:41 am
I solved the problem. It was because I had not set the Cygwin environment variable correctly.
Thanks,
kerenann Said,
July 22, 2009 @ 12:37 am
Hello Vlad, your tutorial is very helpful.
I have only one problem, in step 11 (Setup Hadoop Location in Eclipse).
At step 6, in the Project Explorer tab on the left side of the Eclipse window, I found the DFS Locations item and clicked the “+” icon. There is a folder named (1). When I keep opening it, the next folder is not “tmp(1)” but “Error:call to localhost/127.0.0.1:9000 failed on connection exception:java.net.ConnectException: Connection refused: no further information”.
I think my Cygwin environment variable is set correctly,
so I don’t know what’s wrong.
thanks
Wylie van den Akker Said,
July 27, 2009 @ 11:07 am
Just thought I would mention that for hadoop-0.20.0+ under Cygwin you also need to install rsync (under the “Net” section) for filesystem replication to work. Additionally, the XML configuration is split into 3 different files (core-site.xml, hdfs-site.xml and mapred-site.xml). Details on that can be found here: http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html
Cheers,
Wylie
Collective Medical Technologies
http://www.collectivemedicaltech.com
vlad Said,
August 5, 2009 @ 6:05 am
Check that your cluster is running (no error messages in the command windows). Also check whether you have a firewall installed that might be blocking the connections.
Arun Jamwal Said,
August 7, 2009 @ 4:55 pm
To get rid of
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
change the following lines in TestDriver.java as follows:
//conf.setOutputKeyClass(Text.class);
//conf.setOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(LongWritable.class);
conf.setOutputValueClass(Text.class);
HTH,
Arun Jamwal
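Why this works: TextInputFormat hands each mapper a (LongWritable byte offset, Text line) pair, IdentityMapper emits that pair unchanged, and when no separate map output classes are set (setMapOutputKeyClass/setMapOutputValueClass), the map output must match the job’s output key/value classes exactly. The plain-Java model below mimics that class comparison, with Long standing in for LongWritable, String for Text and Integer for IntWritable; it is an illustration, not Hadoop code.

```java
import java.io.IOException;

class MapOutputTypeCheck {

    // Models the check made when a map task collects a record: the key and value
    // must be exactly the configured classes (subclasses are not accepted).
    static void collect(Class<?> keyClass, Class<?> valueClass, Object key, Object value)
            throws IOException {
        if (key.getClass() != keyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + keyClass.getName() + ", received " + key.getClass().getName());
        }
        if (value.getClass() != valueClass) {
            throw new IOException("Type mismatch in value from map: expected "
                    + valueClass.getName() + ", received " + value.getClass().getName());
        }
    }

    // Passes one TextInputFormat-style record through an identity map and reports
    // whether the configured output classes accept it.
    static boolean identityMapAccepted(Class<?> outputKeyClass, Class<?> outputValueClass) {
        Long offset = 0L;             // stands in for LongWritable (byte offset of the line)
        String line = "hello hadoop"; // stands in for Text (the line itself)
        try {
            collect(outputKeyClass, outputValueClass, offset, line); // identity: emit input unchanged
            return true;
        } catch (IOException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Generated driver: setOutputKeyClass(Text), setOutputValueClass(IntWritable)
        System.out.println("Text/IntWritable accepted: " + identityMapAccepted(String.class, Integer.class));
        // The fix above: setOutputKeyClass(LongWritable), setOutputValueClass(Text)
        System.out.println("LongWritable/Text accepted: " + identityMapAccepted(Long.class, String.class));
    }
}
```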
richilee Said,
August 22, 2009 @ 1:08 pm
For those who have the “bin/hadoop: line 234: C:\Program: command not found” problem: this is caused by the space in “Program Files”. In other words, if your JAVA_HOME is “c:\Program Files\java”, there is a space between “Program” and “Files”. So one way to solve the problem is to put your JDK in a different folder. I put my JDK in c:\java\jdk, and then everything works pretty well. Hope it helps.
Charanjeet Said,
September 16, 2009 @ 3:49 am
Hi All,
I was using the article to install Hadoop.
While running the command
$ bin/hadoop namenode -format
I found errors because the installed JDK was in ‘C:\Program Files’ and the command referred to it through the environment variable JAVA_HOME; since there is a space between ‘Program’ and ‘Files’, the script was failing.
I resolved it by creating a symbolic link with
$ ln -s "/cygdrive/c/Program Files/java/jdk1.6.0_02" /java
in the ‘/’ folder through Cygwin, and adding an entry to <>/conf/hadoop-env.sh:
‘export JAVA_HOME=/java’
Regards
Charanjeet singh
Senior Engineer
Impetus infotech India Pvt. Ltd.
Ken Church Said,
September 20, 2009 @ 1:46 pm
Extremely useful. I’m thinking of pointing a bunch of students at this. One detail: the tutorial has some stale links to hadoop-0.19.1 (as well as a number of references to that elsewhere in the text). It would be good to write the tutorial in such a way that the text doesn’t need to be updated with each new version.
Deng Wanyu Said,
September 30, 2009 @ 1:17 am
Hi,
It is very helpful for me!
My problem is:
I uploaded a txt file with the command, but the uploaded file is empty. Why?
Azuryy Said,
October 14, 2009 @ 6:50 am
If instead of opening five separate Cygwin windows I run start-all.sh, I get a “Could not obtain block” error.
But if I open five separate Cygwin windows as described in the tutorial, it works.
Azuryy Said,
October 14, 2009 @ 7:06 pm
What I found:
If you want to run start-all.sh instead of opening five separate Cygwin windows as this tutorial says, run
hadoop fs -put before you run start-all.sh; otherwise you will get a “Could not obtain block” error when you run your job.
sam Said,
October 22, 2009 @ 4:08 pm
I get this error when I open the MapReduce perspective in Eclipse, and I don’t see the file after localhost->1 in DFS Locations. The errors below appeared in the NameNode window:
lVersion(org.apache.hadoop.dfs.ClientProtocol, 35) from 127.0.0.1:3282: error: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
09/10/22 15:58:08 INFO ipc.Server: IPC Server handler 4 on 9100, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 35) from 127.0.0.1:3282: error: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
Ravi Said,
October 25, 2009 @ 10:44 am
Hi there, your tutorial is excellent. Very good job, and I don’t say that often.
So I was trying to set up HBase using your Hadoop tutorial. I was able to follow it up to step 12, but when I try to execute
$ bin/hbase namenode -format
I get:
: No such file or directory
bin/hbase: line 45: $'\r': command not found
Can you tell me what I am missing?
Thanks
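The “$'\r': command not found” message is what bash typically prints when a shell script has Windows (CRLF) line endings, for example after it was extracted or saved by a Windows tool. Cygwin’s dos2unix utility, if installed, is the usual fix. As a JDK-only alternative, here is a minimal sketch (the class name LineEndingFixer is hypothetical) that rewrites each file named on the command line, dropping every CR that directly precedes an LF. Since a Windows JVM does not understand Cygwin-style /cygdrive paths, relative or Windows-style paths are the safer choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: converts CRLF line endings to LF, similar in spirit to dos2unix. */
class LineEndingFixer {

    /** Returns a copy of the bytes with every CR that directly precedes an LF removed. */
    static byte[] toUnixLineEndings(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]); // lone CRs and all other bytes are kept as-is
            }
        }
        return out.toByteArray();
    }

    /** Rewrites the file in place; returns true if any CRLF pair was converted. */
    static boolean fixFile(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] fixed = toUnixLineEndings(original);
        if (Arrays.equals(original, fixed)) {
            return false;
        }
        Files.write(file, fixed);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Example: java LineEndingFixer bin/hbase conf/hbase-env.sh
        for (String name : args) {
            Path path = Paths.get(name);
            System.out.println(path + (fixFile(path) ? ": converted to LF" : ": no CRLF found"));
        }
    }
}
```

Working on bytes rather than decoded text avoids any dependence on the script’s character encoding.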
Ravi Said,
October 25, 2009 @ 12:10 pm
Well, after a few internet searches and an hour later, I am able to execute it, but now I get this error:
$ bin/hbase namenode -format
Exception in thread “main” java.lang.NoClassDefFoundError: namenode
Sharad Said,
October 29, 2009 @ 4:14 am
Is there an elegant way to stop DFS? Stopping it with Ctrl-C seems to corrupt it, and bin/hadoop/stop-dfs.sh doesn’t seem to work (I get an error message like localhost: cat: cannot open file /dev/fs/C/tmp/hadoop-sk-secondarynamenode.pid : No such file or directory).
Thanks!
vlad Said,
October 29, 2009 @ 8:30 am
It should be bin/hdfs, not bin/hbase.
vlad Said,
October 29, 2009 @ 8:31 am
Not sure. I’ve never had a problem with corruption.
steve Said,
November 2, 2009 @ 12:01 pm
Great tutorial!
I’ve almost got this working, but I’m having trouble connecting to localhost with ssh.
If I do:
ssh localhost -v
the last two lines are:
Offering public key: /home/user.name/.ssh/id_rsa
Connection closed by xxx.x.x.x
Any ideas what is going on?
I also had to manually add ssh_server to administrators and change the password in order to get the sshd service to run.
-Steve
RezaMor Said,
November 9, 2009 @ 8:05 pm
Thanks for your excellent tutorial! However, in the last step I got the following error, and I noticed that two others have posted the same error in the comments.
Would you please answer me?
09/11/10 12:53:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/11/10 12:53:01 INFO mapred.FileInputFormat: Total input paths to process : 4
09/11/10 12:53:02 INFO mapred.JobClient: Running job: job_200911101209_0003
09/11/10 12:53:03 INFO mapred.JobClient: map 0% reduce 0%
09/11/10 12:53:13 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/11/10 12:53:17 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/11/10 12:53:22 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:41)
vlad Said,
November 18, 2009 @ 12:38 pm
The reason you are getting this error is that the API has changed since Hadoop version 0.17, and the code generated by Eclipse needs some tweaking.
Rill Said,
November 24, 2009 @ 7:58 pm
I have a problem with the Eclipse plugin:
—————————————————-
Cannot connect to the Map/Reduce location:localhost.
Failed to get the current user’s information.
—————————————————-
My Windows user account requires a password to log in.
Please help me, thank you!
Jason Venner Said,
January 1, 2010 @ 10:18 am
The prohadoop website has a lot of information on Hadoop and Hadoop setup as well as a good community of people to ask and answer questions with.
This particular error, java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable,
occurs because the input format for your job is TextInputFormat rather than KeyValueTextInputFormat.
TextInputFormat provides a LongWritable key, which is the byte offset of the line within the input file (not the line number), and a Text value, which is the line itself.
KeyValueTextInputFormat provides a Text key, the portion of the input line up to the first TAB character, and a Text value, the portion of the input line after the first TAB character.
Alternatively, you can modify the definition of your Map class to accept a LongWritable as the input key type.
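To make the difference concrete, here is a minimal plain-Java model of how the two formats turn a text file into map-input (key, value) pairs. It uses no Hadoop classes, so it only imitates the record readers; the class and method names are hypothetical, and it assumes UTF-8 text with \n line endings. The TextInputFormat-style key it produces is the byte offset where each line starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of two Hadoop input formats; it does not use or call Hadoop. */
class InputFormatSketch {

    /** TextInputFormat-style records: key = byte offset where the line starts, value = the line. */
    static List<Map.Entry<Long, String>> textRecords(String file) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < file.length()) {
            int end = file.indexOf('\n', start);
            if (end < 0) {
                end = file.length(); // last line without a trailing newline
            }
            String line = file.substring(start, end);
            records.add(new SimpleEntry<Long, String>(offset, line));
            // Advance past the line's UTF-8 bytes plus the one-byte newline terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    /** KeyValueTextInputFormat-style split: text before the first TAB is the key, the rest is the value. */
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<String, String>(line, ""); // no TAB: whole line is the key
        }
        return new SimpleEntry<String, String>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String file = "apple\t3\nbanana\t5\n";
        for (Map.Entry<Long, String> r : textRecords(file)) {
            System.out.println("TextInputFormat:         " + r.getKey() + " -> " + r.getValue());
        }
        for (Map.Entry<Long, String> r : textRecords(file)) {
            Map.Entry<String, String> kv = keyValueRecord(r.getValue());
            System.out.println("KeyValueTextInputFormat: " + kv.getKey() + " -> " + kv.getValue());
        }
    }
}
```

For the two sample lines, the TextInputFormat-style keys are the offsets 0 and 8, while the KeyValueTextInputFormat-style keys are the Text values apple and banana. With an identity mapper, whichever key type the input format produces is what reaches the map output, which is why a job declaring Text keys fails with the LongWritable keys of TextInputFormat.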
Swetha Said,
January 4, 2010 @ 1:17 am
hello!
When I run the code, I get the error below. I understand that something has changed in the path where the job cache files are created, but I don’t know how to change it. Any clue?
Thanks in advance.
INFO mapred.JobClient: Task Id : attempt_201001041128_0006_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-MBS/mapred/local/taskTracker/jobcache/job_201001041128_0006/attempt_201001041128_0006_m_000006_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Abhishek Said,
February 27, 2010 @ 2:15 pm
Hi,
In step 5, when I enter the command explorer as shown in the tutorial, I get an error:
abc@ZULFI ~
$explorer
-bash: explorer: command not found
Does anybody have any ideas?
vlad Said,
February 27, 2010 @ 5:25 pm
Hi,
What is your system? Is it Windows XP?
Also, type this command:
echo $PATH
and post the results here
vlad Said,
February 27, 2010 @ 9:06 pm
Abhishek,
Either your system is non-standard or your PATH variable is not set up correctly. Type this command in the Cygwin window and post the output here:
echo $PATH
Vlad
Keith Said,
March 2, 2010 @ 2:32 pm
Everything works great, except…
The Run As menu offers “On Hadoop”, but the Debug As menu does not. Obviously, the Run As options don’t trigger breakpoints or otherwise offer debugging capability.
So, how do I debug?
Thanks.
Iris Said,
March 3, 2010 @ 10:39 am
vlad,
Thank you for the excellent tutorial.
I have a problem in the last step: after running the code, it showed the error below:
10/03/04 01:06:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/04 01:06:01 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/04 01:06:01 INFO mapred.JobClient: Running job: job_201003040054_0001
10/03/04 01:06:02 INFO mapred.JobClient: map 0% reduce 0%
10/03/04 01:06:11 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:15 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:20 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:29 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:33 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:37 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:43)
I have no idea what’s wrong with it.
Please help me!
Thank you in advance!
iris Said,
March 4, 2010 @ 4:06 am
vlad,
Thank you for your excellent tutorial. However, I get an error when running the last step; the output is below:
10/03/04 18:59:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/04 18:59:58 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/04 18:59:59 INFO mapred.JobClient: Running job: job_201003041848_0001
10/03/04 19:00:00 INFO mapred.JobClient: map 0% reduce 0%
10/03/04 19:00:14 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:18 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:24 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:33 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:37 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:43 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:44)
I have no idea what is causing it; please help me!
Thank you in advance.
song Said,
March 7, 2010 @ 5:29 am
Thanks for your excellent tutorial! However, in step 9 (Setup Hadoop plugin), I followed the instructions, but when I try to execute it, I don’t find Map/Reduce under “Open Perspective”. Why?
Thanks!
euqinoxia Said,
March 19, 2010 @ 1:43 am
Hi Vlad,
Thanks for the excellent tutorial.
Towards the last step I got the following error:
10/03/19 15:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/19 15:30:23 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/19 15:30:24 INFO mapred.JobClient: Running job: job_201003191529_0002
10/03/19 15:30:25 INFO mapred.JobClient: map 0% reduce 0%
10/03/19 15:30:31 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:35 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:39 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:48 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:52 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:56 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:41)
Thanks and regards,
equinoxia
vlad Said,
March 19, 2010 @ 5:43 am
Iris,
You get this error because your Map task is failing. Could you post your mapper code here?
Vlad
chefc17 Said,
March 20, 2010 @ 4:33 am
Hi Vlad,
Thanks for the excellent tutorial.
I am using Eclipse Galileo.
I have a problem at “Setup Hadoop location”:
in Project Explorer / DFS Locations / localhost,
it’s empty “(0)”.
rananjay Said,
March 26, 2010 @ 4:43 am
Hi,
Thanks for this nice tutorial.
It is really good work, and I have no words to describe your effort.
I followed every step of this tutorial.
But later, while running the Driver class file, I am getting these errors:
10/03/26 17:07:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/26 17:07:36 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/26 17:07:36 INFO mapred.JobClient: Running job: job_201003261700_0002
10/03/26 17:07:37 INFO mapred.JobClient: map 0% reduce 0%
10/03/26 17:07:45 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:07:50 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:07:55 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_2/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:06 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:12 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:19 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_2/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at FirstDriver.main(FirstDriver.java:42)
Manish Said,
April 1, 2010 @ 2:37 am
Hi,Vlad,
Thanks for such an excellent tutorial on Hadoop configuration on Windows.
I have followed each step in the tutorial. Every step went fine, but executing the program is giving me trouble. Following is the error message on the console:
10/04/01 15:05:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/04/01 15:05:25 INFO mapred.FileInputFormat: Total input paths to process : 4
10/04/01 15:05:27 INFO mapred.JobClient: Running job: job_201004011443_0001
10/04/01 15:05:30 INFO mapred.JobClient: map 0% reduce 0%
10/04/01 15:05:41 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:45 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:50 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:58 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:06:04 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:06:08 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:37)
Please let me know what could have gone wrong.
Thanks & R
jamo Said,
April 8, 2010 @ 8:18 am
I get “almost” all the way through the tutorial using version 0.19.2, but when running TestDriver, it throws several FileNotFoundExceptions, as in Swetha’s Jan 4, 2010 post above. I tried changing the mapred.job.tracker setting to c:/cygwin/tmp and restarting the jobtracker, but this didn’t change the error. Any idea which parameter needs to be changed?
thx,
jamo
Vaibhav Said,
April 13, 2010 @ 4:35 am
Hi Vlad,
Thanks for the tutorial. I set up my environment exactly as you specified in the tutorial. However, when I run my project from Eclipse (by selecting the Run on Hadoop option), nothing happens; it fails silently without giving any error. What could be the issue?
Regards,
Vaibhav
princessayu Said,
April 13, 2010 @ 4:24 pm
Hi there,
Nice tutorial… It helped me a lot with my assignment. Could you please tell me where the link to your new tutorial for hadoop-0.20.0 is?
Rim Moussa Said,
April 19, 2010 @ 3:13 am
Excellent tutorial.
Please add the following imports to the code in the last step:
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
kiwibird Said,
April 22, 2010 @ 11:35 am
Hello,
Thank you for your excellent tutorial! However, at the last step, I cannot run the Hadoop project. I right-click the TestDriver class and choose “Run on Hadoop”, but nothing happens: no window comes up, and no information is shown in the console. I just updated my Eclipse to the latest version.
Please help me.
Thanks and regards
xuesf Said,
May 5, 2010 @ 8:20 pm
Thanks for your Hadoop on Windows tutorial.
I have the same problem that some other people described.
I just copied the code of WordCount.java from hadoop-0.19.1; my Eclipse is 3.3.2.
So I hope you can help me.
Thanks a lot.
10/05/06 11:03:23 INFO mapred.FileInputFormat: Total input paths to process : 4
10/05/06 11:03:23 INFO mapred.JobClient: Running job: job_201005061033_0003
10/05/06 11:03:24 INFO mapred.JobClient: map 0% reduce 0%
10/05/06 11:03:29 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:33 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:37 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:46 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:50 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:55 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
Exception in thread “main” java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at WordCount.run(WordCount.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:140)
Jim Said,
May 9, 2010 @ 9:22 am
I followed all the steps up to “Running Hadoop Project”, but I cannot run a Hadoop project. When I click “Run As” -> “Run on Hadoop”, nothing happens: there is no output in the Eclipse console, though I am pretty sure a thread is running in the background.
I am using Windows Vista and Java 6 (latest 32-bit version). I started Eclipse from Windows. Everything is running under Cygwin.
How do I debug a Hadoop application in Eclipse?
Jim
Senthil Said,
May 19, 2010 @ 8:04 am
Thanks for this tutorial.
I have a small issue. In TestDriver.java, JobConf is deprecated. I am using Hadoop 0.20.2:
JobClient client = new JobClient();
JobConf conf = new JobConf(TestDriver.class);
// TODO: specify output types
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// TODO: specify input and output DIRECTORIES (not files)
//conf.setInputPath(new Path("src"));
//conf.setOutputPath(new Path("out"));
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("In"));
FileOutputFormat.setOutputPath(conf, new Path("Out"));
Which classes do I need to import to resolve this with JobConf? I am getting an error like “The setInputFormat in the type JobConf is not applicable for the arguments”. The same happens for setOutputFormat. Kindly do the needful.
Jo Said,
May 25, 2010 @ 8:43 pm
Hi,
I followed your tutorial for the Eclipse part, managed to set up the plugin, and am able to browse/access the DFS directory.
But I am unable to use the plugin to run jobs on Hadoop. Clicking “Run on Hadoop” does not seem to do anything (i.e. no window appears asking me which Hadoop server to choose).
plugin version: 0.20.2
eclipse version: 3.5.2 galileo
os: ubuntu 10.04 desktop 64bit
Any thoughts?
Shivam Sharma Said,
June 1, 2010 @ 1:31 am
I configured Hadoop on Windows + Cygwin according to your document. All my nodes and trackers are running fine. When I run the MapReduce program from Eclipse, it gives me the following exception:
10/06/01 13:58:59 INFO mapred.JobClient: Running job: job_201006011357_0002
10/06/01 13:59:00 INFO mapred.JobClient: map 0% reduce 0%
10/06/01 13:59:09 INFO mapred.JobClient: Task Id : attempt_201006011357_0002_m_000004_0, Status : FAILED
java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop-ssharma1/mapred/local/taskTracker/jobcache/job_201006011357_0002/attempt_201006011357_0002_m_000004_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
I have put the correct entries in all the configuration files.
I would greatly appreciate it if you could help me solve this problem.
Arun Said,
June 3, 2010 @ 12:00 am
Hi,
That is a nice tutorial. Is there any update for the latest version, hadoop-0.20.2? The structure is a bit different compared to the older version. What do we need to change in the code for Eclipse, etc.?
Thanks in advance!
Arun.
vlad Said,
June 3, 2010 @ 12:03 am
I am planning to post an updated tutorial soon.
Siddharth prasad Said,
June 16, 2010 @ 8:54 pm
Hi,
It seems everything is set up cleanly on Windows Vista, but when I run a job (a small word count problem),
I get this in my console:
10/06/17 09:15:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/06/17 09:15:37 INFO mapred.FileInputFormat: Total input paths to process : 1
10/06/17 09:15:37 INFO mapred.JobClient: Running job: job_201006170907_0002
10/06/17 09:15:38 INFO mapred.JobClient: map 0% reduce 0%
But from here it just gets stuck. When I look at the job state in Eclipse, it says running, but when I open localhost:50030, Hadoop says there is no running job.
I can’t understand what is going wrong; I will be glad if you can help me with this.
Thank you,
Siddharth prasad.
Jony Blues Said,
June 22, 2010 @ 8:44 pm
I am working on a standalone server through PuTTY, and I got the namenode and secondarynamenode working without errors. But when running the command “hadoop jobtracker”, I get the following errors:
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
For that reason, I cannot perform “fs -put” to insert the files into HDFS. I don’t think disk space is a problem because the file is really small (1 MB), and I don’t think it is DNS, as I use the direct IP address and replaced localhost with the specific namenode IP address. How can I overcome this problem? I have run namenode -format many times, deleting hadoop_tmp_dir each time, but I still see the problem.
Thanks for your help.
vlad Said,
June 22, 2010 @ 8:50 pm
Jony,
Check the space on the datanodes, and make sure they are reachable from the namenode.
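Both checks can be scripted. The sketch below (hypothetical class DataNodeCheck, JDK only, not part of Hadoop) reports whether a datanode’s storage directory has a given amount of usable space and whether a TCP connection to the datanode succeeds. Run the disk check on the datanode itself and the connection check from the namenode host. The arguments are assumptions to fill in from your own configuration: the storage directory is whatever dfs.data.dir points to (by default a directory under hadoop.tmp.dir), 50010 is the usual default datanode data-transfer port in Hadoop 0.20, and the 100 MB threshold is arbitrary.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper for two quick datanode health checks: free disk space and TCP reachability. */
class DataNodeCheck {

    /** True if dataDir exists and its file system has at least minFreeBytes usable by this JVM. */
    static boolean hasFreeSpace(File dataDir, long minFreeBytes) {
        return dataDir.isDirectory() && dataDir.getUsableSpace() >= minFreeBytes;
    }

    /** True if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java DataNodeCheck <dataDir> <datanodeHost> <port>");
            return;
        }
        long minFree = 100L * 1024 * 1024; // assumed threshold: 100 MB
        System.out.println("enough free space: " + hasFreeSpace(new File(args[0]), minFree));
        System.out.println("datanode reachable: " + isReachable(args[1], Integer.parseInt(args[2]), 3000));
    }
}
```

A successful TCP connection only shows that the port is open; the namenode and datanode logs remain the place to confirm that the datanode has actually registered.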
Bejoy Said,
June 24, 2010 @ 3:31 am
Hi,
I’m new to Hadoop. I found your guide very useful and interactive in helping me with the initial setup. Unfortunately, I’m facing a challenge while configuring Cygwin for Hadoop development.
I generated the RSA key, and when I run
ssh localhost
it prompts me for
@localhost’s password
But I haven’t set any password before. I tried almost all options, but none worked. Could you please help me with this?
Hussain Said,
July 9, 2010 @ 4:43 am
Hi Vlad,
Thank you for the tutorial. I was facing a problem in step 5/6. When I enter the explorer command, my My Documents window opens (maybe it’s the home directory). I pasted the Hadoop archive there and then, as mentioned in step 6, tried to unpack the archive, but it said no such file or directory. I tried the ls command and it came up empty as well. I ran the command
echo $path and the output was
/usr/local/bin:/usr/bin:/bin:/cygdrive/g/WINDOWS/system32:/cygdrive/g/WINDOWS:/cygdrive/g/WINDOWS/System32/Wbem:/cygdrive/c/MATLAB7/bin/win32:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin
What can be the problem?
Sven Said,
July 19, 2010 @ 2:16 am
Thanks for the nice tutorial, Vlad!
I have the same problem as others, with the “java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop- …” exception being thrown.
Has anyone solved this problem already?
Sven Said,
July 19, 2010 @ 5:25 am
I found out what works for me:
1) In Eclipse, open the “localhost” location under “Map/Reduce Locations”. Open the Advanced tab. Set “mapred.child.tmp” to /tmp/hadoop-/mapred/mapred.child.tmp
2) Use the following code as TestDriver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

// Identity job written against the new org.apache.hadoop.mapreduce API (Hadoop 0.20.x).
public class TestDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Applies generic options such as -D name=value to conf; the remaining args are not used below.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "hadoop test");
        job.setJarByClass(TestDriver.class);
        // The base Mapper and Reducer classes pass every record through unchanged.
        job.setMapperClass(Mapper.class);
        job.setCombinerClass(Reducer.class);
        job.setReducerClass(Reducer.class);
        // The default TextInputFormat yields (LongWritable offset, Text line) pairs,
        // so the map output and the final output both use these types.
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // Relative paths, resolved against the default file system's working directory.
        FileInputFormat.addInputPath(job, new Path("In"));
        FileOutputFormat.setOutputPath(job, new Path("Out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Sven Said,
July 19, 2010 @ 5:26 am
P.S.: I use 0.20.2 with Eclipse Europa (newer versions didn’t work)!
vlad Said,
July 19, 2010 @ 5:29 am
Hmm, did you add your RSA key to the authorized_keys file as described in the tutorial?
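The usual way to do this from a Cygwin shell is cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys. For illustration, here is a JDK-only sketch (hypothetical class AuthorizedKeys) that does the same but skips the append when the key is already present. The paths are passed explicitly because a Windows JVM’s user.home is the Windows profile directory, not the Cygwin home directory. Note that sshd may still ask for a password if the .ssh directory or the authorized_keys file has overly permissive permissions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: appends a public key to authorized_keys if it is not already present. */
class AuthorizedKeys {

    /** True if some line of the authorized_keys text equals the key, ignoring surrounding whitespace. */
    static boolean containsKey(String authorizedKeysText, String publicKey) {
        String wanted = publicKey.trim();
        for (String line : authorizedKeysText.split("\n")) {
            if (line.trim().equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the key was appended, false if it was already there. */
    static boolean install(Path publicKeyFile, Path authorizedKeys) throws IOException {
        String key = new String(Files.readAllBytes(publicKeyFile), StandardCharsets.UTF_8).trim();
        String current = Files.exists(authorizedKeys)
                ? new String(Files.readAllBytes(authorizedKeys), StandardCharsets.UTF_8)
                : "";
        if (containsKey(current, key)) {
            return false;
        }
        // Start on a fresh line if the existing file does not end with a newline.
        String prefix = current.isEmpty() || current.endsWith("\n") ? "" : "\n";
        Files.write(authorizedKeys, (prefix + key + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Example: java AuthorizedKeys C:/cygwin/home/you/.ssh/id_rsa.pub C:/cygwin/home/you/.ssh/authorized_keys
        if (args.length != 2) {
            System.err.println("usage: java AuthorizedKeys <id_rsa.pub> <authorized_keys>");
            return;
        }
        boolean added = install(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(added ? "key appended" : "key already present");
    }
}
```

Unlike a plain >> append, running it twice leaves a single copy of the key in the file.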
Guohua Liu Said,
August 18, 2010 @ 8:15 pm
It is a good tutorial, but in the last step, when I click “Run As” -> “Run on Hadoop”, the “Run on Hadoop” window does not come up to let me select a Hadoop location to run on, so I can’t see console output similar to your tutorial. Thank you in advance for your answer!
vlad Said,
August 19, 2010 @ 2:00 pm
What version of Eclipse are you running? The Eclipse plugin only works with the version specified in the tutorial; it is not compatible with newer versions of Eclipse. I am working on an upgrade to the plugin, but it is not available yet.