Hadoop tutorial for Windows and Eclipse.
I just posted a tutorial on how to configure a Hadoop environment on Windows using Cygwin. The tutorial explains how to set up a Hadoop cluster in pseudo-distributed mode and how to get it working with Eclipse.
If you have any questions, comments, or suggestions about this tutorial, post them here.

Ben Said,
March 29, 2009 @ 10:19 am
Thanks for your excellent tutorial! I followed it this weekend and was able to get mostly up and running.
One question I had was how to use it with EC2. I set up on EC2 rather than on localhost, and I’m wondering what I need to do to make it run. I’m getting weird unknown host errors when I run, despite having set up a proxy server.
Thanks for the very helpful tutorial!
Ben
vlad Said,
March 29, 2009 @ 11:14 am
No problem.
Setting up Hadoop correctly on EC2 can be tricky. I am going to post another tutorial about it in a few weeks.
Rez Said,
March 31, 2009 @ 5:17 pm
Hey, this page on your tutorial (Unpacking Hadoop)
http://v-lad.org/Tutorials/Hadoop/09%20-%20unpack%20hadoop.html
is not working.
vlad Said,
April 9, 2009 @ 8:30 am
Strange. Works for me, can’t see what the problem is. Does anybody else have this problem?
Jeff Said,
April 9, 2009 @ 2:32 pm
Thanks for the tutorial… it would have saved me a few hours of frustration.
Have you tried it with other versions of Eclipse? The main distribution is 3.4 (Ganymede), and 3.5 is due shortly, in May.
vlad Said,
April 9, 2009 @ 10:09 pm
Jeff,
I tried other versions of Eclipse. It doesn’t work with 3.4 and probably won’t work with 3.5 until somebody fixes the Hadoop plug-in, because the plug-in API has changed in the newer versions of Eclipse. You can use the plug-in with 3.4 to browse HDFS, but you won’t be able to start the project.
Joseph Said,
April 15, 2009 @ 12:09 am
Vlad,
Thanks for the well-documented tutorial. It is good work.
Towards the last step I got the following error:
09/04/15 15:00:33 INFO mapred.JobClient: Task Id : attempt_200904151224_0004_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
Kindly advise; any clue would help.
My code is as follows:
// TODO: specify input and output DIRECTORIES (not files)
//conf.setInputPath(new Path("src"));
//conf.setOutputPath(new Path("out"));
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("In"));
FileOutputFormat.setOutputPath(conf, new Path("Out3"));
thanks and regards
Joseph
vlad Said,
April 15, 2009 @ 12:00 pm
The error you are getting is actually correct. The Mappers / Reducers generated by the plug-in need some tweaking. I will post another tutorial regarding this sometime in May.
ash Said,
April 16, 2009 @ 11:40 pm
Hi Vlad,
Thanks for the excellent tutorial. In the last step, when I try to run the TestDriver class, I get this error.
Please help.
>>>>>>>>> START >>>>>>>
09/04/17 11:58:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/04/17 11:58:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:40 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 4
09/04/17 11:58:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:40 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 3
09/04/17 11:58:41 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:41 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 2
09/04/17 11:58:42 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:42 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 1
09/04/17 11:58:46 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/04/17 11:58:46 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/04/17 11:58:46 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar" - Aborting...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
at org.apache.hadoop.ipc.Client.call(Client.java:697)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>>>>>> END >>>>>
vlad Said,
April 17, 2009 @ 7:24 am
I have seen this error before. Usually it is caused by not having enough free space on your workstation. Try to free up some space and recreate HDFS. Also check for error messages in the DataNode and NameNode windows.
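A quick way to act on the free-space suggestion is to ask the shell how much room is left before recreating HDFS. This is only a minimal sketch: the `free_kb` helper is a hypothetical name, and checking `/tmp` is an assumption based on the `/tmp/hadoop-...` paths in the log above.

```shell
# Print the available space, in kilobytes, on the filesystem holding a path.
# free_kb is a hypothetical helper name used only for this sketch.
free_kb() {
  # -P forces one output line per filesystem; column 4 is "Available".
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# /tmp is an assumption taken from the paths shown in the log above.
avail="$(free_kb /tmp)"
echo "Available on /tmp: ${avail} KB"
```

If the number is close to zero, the DataNode has nowhere to store blocks, which would fit the "could only be replicated to 0 nodes" message.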
tony Said,
April 19, 2009 @ 12:18 am
Hi, I followed what you showed and also the quick start on the official Apache website.
There is a big problem when I execute the command:
“bin/hadoop namenode -format”
It reports that bin/hadoop, the “hadoop” script, contains certain errors.
When I installed it on Linux in a VM, it was OK.
How can I run this hadoop script in Cygwin correctly?
Thanks
vlad Said,
April 20, 2009 @ 2:59 pm
Could you post the error message that you are getting?
Wen-Han Said,
April 22, 2009 @ 1:26 pm
Hi VLAD,
May I know how recent your tutorial is? Is it up to date with the most recent versions of Hadoop and Eclipse?
Thank you,
Wen-Han
vlad Said,
April 22, 2009 @ 4:16 pm
The tutorial was written in April using the most recent version of Hadoop, 0.19.1. As for Eclipse, the newest version (Ganymede) is not compatible with the Hadoop plug-in supplied with version 0.19.1, so you have to use the previous version of Eclipse (Europa).
I saw that the new version, Hadoop 0.20, has come out, so I will take a look at what has changed and update the tutorial if needed.
Saurabh Said,
April 23, 2009 @ 5:59 am
Hi vlad, the tutorial is good.
I am setting it up on my Mandriva machine, and whenever I run
ssh localhost
I get:
[abc@localhost .ssh]$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
Please help me.
vlad Said,
April 23, 2009 @ 6:23 am
Hmm,
This tutorial was written for Windows machines. To resolve your problem, check that you have sshd installed and running. Also check that you don’t have a firewall blocking port 22.
Sid Said,
April 25, 2009 @ 12:38 pm
Hi, I am working with the Hadoop Eclipse plug-in on Linux. Everything was working fine until one day Hadoop started to ignore any code changes I made in my project. Instead it just ran an old copy of the code from somewhere. Looking at the mapred.local folder, where the temporary source files are jarred together to run the job, the source code was indeed changed. I created another dummy project in Eclipse and ran it, and it ran just fine; changes were reflected every time. What could be the problem?
vlad Said,
April 25, 2009 @ 6:20 pm
Sorry man, I have never seen that happen. Maybe somebody else on this board will comment.
Joe Said,
May 1, 2009 @ 4:44 am
Vlad,
Thank you so much for this tutorial. I am having a problem when running: bin/hadoop namenode -format
First it said “JAVA_HOME not set”, so I set my Windows environment variable to the correct path, which is c:\program files\Java\jdk1.6.0_06
Then I closed and re-opened Cygwin, and tried again. This time it appeared to work, but the first line of the output was “bin/hadoop: line 234: C:\Program: command not found”. The rest of the output looked like your screenshot. Is this normal?
Thanks,
Joe
Wen-Han Said,
May 1, 2009 @ 11:12 am
Hi vlad,
Thanks for your reply to my last question. I configured Eclipse Europa according to the Yahoo tutorial on Hadoop:
http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html
and its instructions for creating a new DFS Location say:
“…..Next, click on the “Advanced” tab. There are two settings here which must be changed.
Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user.”
I can’t find this attribute (hadoop.job.ugi) in the Advanced list of “Define Hadoop location” in Eclipse. Do you have any idea?
Thank you; a fast reply would be much appreciated.
Wen-Han
Wen-Han Said,
May 1, 2009 @ 11:15 am
P.S. The Yahoo tutorial on Hadoop has Hadoop installed in VMware, not on localhost via Cygwin.
Thanks,
sneha Said,
May 2, 2009 @ 8:52 am
Hello!
Thank you for the good Hadoop tutorial. I am setting up a Hadoop cluster of 4 systems. When I run the bin/start-dfs.sh command I get the error “JAVA_HOME not set”. Could you please let me know the solution, and also how to set the Java home path in .bash_profile at the Cygwin prompt?
Thank you!
Muhammad Mudassar Said,
May 5, 2009 @ 11:36 pm
Hi
The tutorial is helpful. I want to know how to upload some images or some structured data to HDFS using Cygwin and Eclipse on Windows.
One more thing: after I restarted my PC, Hadoop was not working well, but once I restarted the Cygwin sshd service it worked again. Does the service have to be restarted every time the PC is restarted?
Thanks.
vlad Said,
May 8, 2009 @ 7:48 am
First you have to ask yourself what you are planning to do with your data. Depending on the answer, you could use the hdfs cp command or use HBase.
Note that if you are planning to use binary data you might have to write your own record readers.
vlad Said,
May 8, 2009 @ 7:51 am
As for your second comment, make sure that in the Services window your sshd service is set to start automatically.
vlad Said,
May 8, 2009 @ 8:01 am
The bin/start-dfs.sh script won’t work in the environment described in this tutorial; to start the DFS services, refer to section 10 of the tutorial. On the additional machines you only have to start the data node and task tracker processes.
Remember that on the worker machines you have to edit the hadoop-site file to configure the name of your namenode machine instead of localhost. Also make sure all necessary firewall ports are open.
vlad Said,
May 8, 2009 @ 8:02 am
That’s right. But this way you will incur the penalties of running another operating system, and it is tricky to debug processes in VMware.
vlad Said,
May 8, 2009 @ 8:04 am
Not sure what could be causing this. Check the dates on the files.
vlad Said,
May 8, 2009 @ 8:05 am
It’s a problem with the scripts. Try installing your JDK in a directory whose path doesn’t contain a space. I use C:\Java\JDK1.6 for that.
Kim Said,
May 13, 2009 @ 2:55 pm
This tutorial is great. Hadoop is running perfectly in a VM (Windows XP).
Just one question.
Is there any way that I can use “start-all.sh” instead of starting “hadoop namenode”, “hadoop jobtracker”, and so on in multiple Cygwin windows?
Thank you again for all your efforts.
vlad Said,
May 13, 2009 @ 9:23 pm
Not in Windows XP. The Hadoop start scripts are written for Linux machines, and for debugging purposes it is just easier to run each of the Hadoop components in its own window.
Mayank Said,
May 21, 2009 @ 4:35 am
Hi vlad, the tutorial is great.
Currently I am facing a problem in the upload data step: in Eclipse I get localhost->2->error, and I am unable to see the user and “In” folders and so on. Please suggest what I should do now.
Charitha Said,
May 28, 2009 @ 2:12 am
I get an error in Eclipse Europa while running TestDriver.java.
Please advise me. Help will be appreciated.
09/05/28 14:40:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9100/user/charitha/Out already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
at TestDriver.main(TestDriver.java:41)
Regards,
Charitha Reddy.
vlad Said,
May 28, 2009 @ 8:27 am
Looks like this is the second time you are trying to run the project. Every time you run the project it creates an “Out” directory to store the output. You have to delete that directory before you run your project, or change the code to create a new directory every time you run. Look at the Hadoop examples to see how to do the latter.
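Both options can be sketched from the Cygwin prompt. This is a minimal illustration rather than the tutorial's own method: `unique_out_dir` is a hypothetical helper, and it assumes the driver has been changed to take the output directory as an argument instead of hard-coding “Out”.

```shell
# Build a timestamped directory name such as Out-20090528-144001 so that a
# rerun never collides with the output of an earlier run.
# unique_out_dir is a hypothetical helper name used only for this sketch.
unique_out_dir() {
  echo "Out-$(date +%Y%m%d-%H%M%S)"
}

out_dir="$(unique_out_dir)"
echo "This run would write to: $out_dir"

# The other option is to remove the old output first. The command is only
# printed here, so the sketch runs on a machine without Hadoop.
echo "To delete the old output instead: bin/hadoop fs -rmr Out"
```

Deleting is simpler for a test project; a fresh name per run keeps earlier results around for comparison.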
vlad Said,
May 28, 2009 @ 8:36 am
Do you see any activity in the Cygwin windows when you are trying to connect? It could be the firewall blocking incoming ports.
Run the following command from the command window and let me know what you get. Note that Hadoop has to be started.
telnet localhost 9100
Joseph Said,
May 28, 2009 @ 8:59 pm
Vlad
I would like to know whether you have any update on the following:
>>snip>>
The error you are getting is actually correct. The Mappers / Reducers generated by the plug-in need some tweaking. I will post another tutorial regarding this sometime in May.
vlad – April 15th, 2009 at 12:00 pm
>>end of snip>>
vlad Said,
May 28, 2009 @ 9:22 pm
Sorry, been really busy lately.
Martinus Said,
June 6, 2009 @ 7:48 am
Hello Vlad,
Thanks for the tutorial. I still have a problem with compiling the TestDriver class. After I compile the class, I get this error message from Eclipse:
09/06/06 16:44:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/06/06 16:44:03 INFO mapred.FileInputFormat: Total input paths to process : 4
09/06/06 16:44:04 INFO mapred.JobClient: Running job: job_200906061639_0001
09/06/06 16:44:05 INFO mapred.JobClient: map 0% reduce 0%
09/06/06 16:44:14 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/06/06 16:44:18 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/06/06 16:44:22 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
I have no idea what is wrong. I use all the programs you listed in the tutorial (Eclipse 3.3.2, Hadoop 0.19.1, etc.).
Thanks
Martinus
Carspar Said,
June 9, 2009 @ 12:49 am
Hi vlad, the tutorial is great.
I followed your tutorial and ran into a problem in step 11 – Setup Hadoop Location in Eclipse.
Step 6 says: “In the Project Explorer tab on the left hand side of the Eclipse window, find the DFS Locations item. Open it using the “+” icon on its left. Inside, you should see the localhost location reference with the blue elephant icon. Keep opening the items below it until you see something like the image below.”
I used the “+” icon on the left. Inside, there is a folder with an empty name, like in your image. When I keep opening, the next folder is not “tmp(1)” but “Error: null”.
thanks,
Carspar Said,
June 9, 2009 @ 1:41 am
I solved the problem. It was because I had not set the Cygwin environment variable correctly.
Thanks,
kerenann Said,
July 22, 2009 @ 12:37 am
Hello vlad, your tutorial is very helpful.
I have only one problem, in step 11 – Setup Hadoop Location in Eclipse.
At step 6, in the Project Explorer tab on the left side of the Eclipse window, I found the DFS Locations item and clicked the “+” icon. There is a folder named (1). When I keep opening, the next folder is not “tmp(1)” but “Error:call to localhost/127.0.0.1:9000 failed on connection exception:java.net.ConnectException: Connection refused: no further information”.
I think my Cygwin environment variable is right,
so I don’t know what’s wrong with it.
thanks
Wylie van den Akker Said,
July 27, 2009 @ 11:07 am
Just thought I would mention that for hadoop-0.20.0+ under Cygwin you also need to install rsync (under the “NET” section) for filesystem replication to work. Additionally, the XML configuration is split up into 3 different files. Details on that can be found here: http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html
Cheers,
Wylie
Collective Medical Technologies
http://www.collectivemedicaltech.com
vlad Said,
August 5, 2009 @ 6:05 am
Check that your cluster is running (no error messages in the command windows). Also check whether you have a firewall installed that might be preventing the connections.
Arun Jamwal Said,
August 7, 2009 @ 4:55 pm
To get rid of
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
change the following lines in TestDriver.java as follows:
//conf.setOutputKeyClass(Text.class);
//conf.setOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(LongWritable.class);
conf.setOutputValueClass(Text.class);
HTH,
Arun Jamwal
richilee Said,
August 22, 2009 @ 1:08 pm
For those who have the “bin/hadoop: line 234: C:\Program: command not found” problem: it is caused by the whitespace in “Program Files”. In other words, if your JAVA_HOME is “c:\Program Files\java”, there is whitespace between “Program” and “Files”. One way to solve the problem is to put your JDK in a different folder. I put my JDK in c:\java\jdk, and then everything works pretty well. Hope it helps.
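The effect of that whitespace is easy to reproduce. The sketch below assumes the failure comes from the script expanding JAVA_HOME without quotes, which is one plausible reading of the “C:\Program: command not found” message; `count_words` is a hypothetical helper and the JAVA_HOME value is copied from the comments above purely for illustration.

```shell
# Hypothetical value mirroring the JAVA_HOME reported in the comments above.
JAVA_HOME='C:\Program Files\Java\jdk1.6.0_06'

# Report how many separate words the shell passes along.
# count_words is a hypothetical helper name used only for this sketch.
count_words() { echo $#; }

unquoted=$(count_words $JAVA_HOME)      # expansion is split at the space
quoted=$(count_words "$JAVA_HOME")      # quotes keep the path as one word

echo "unquoted: $unquoted word(s), quoted: $quoted word(s)"
```

With the unquoted form the shell sees `C:\Program` as one word and the rest of the path as another, which is why moving the JDK to a path without spaces avoids the problem.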
Charanjeet Said,
September 16, 2009 @ 3:49 am
Hi All,
I was using the article to install Hadoop.
While running the command
$ bin/hadoop namenode -format
I found that there were errors because the installed JDK was in ‘C:\Program Files’ and the command was referring to it through the environment variable JAVA_HOME. Since there is a space between ‘Program’ and ‘Files’, it was dying.
I resolved it by creating a symbolic link as
$ ln -s "/cygdrive/c/Program Files/java/jdk1.6.0_02" /java
inside the ‘/’ folder through Cygwin, and made an entry in <>/conf/hadoop-env.sh like
‘export JAVA_HOME=/java’
Regards
Charanjeet singh
Senior Engineer
Impetus infotech India Pvt. Ltd.
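The symbolic-link workaround described above can be tried safely in a scratch directory first. This is a sketch under stated assumptions: the directory layout is a hypothetical stand-in for a JDK under “Program Files”, not a real install, and it touches nothing under `/`.

```shell
# Scratch area standing in for the C: drive; every path below is a
# hypothetical demo path, not a real JDK location.
root="$(mktemp -d)"
mkdir -p "$root/Program Files/java/jdk"

# Quoting the target keeps "Program Files" together as a single argument.
ln -s "$root/Program Files/java/jdk" "$root/java"

# Point JAVA_HOME at the link, which avoids the space in the directory name.
JAVA_HOME="$root/java"
ls -d "$JAVA_HOME/" && echo "JAVA_HOME resolves: $JAVA_HOME"
```

Without the quotes, `ln` would receive the two halves of the path as separate arguments and the link would not be created as intended.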
Ken Church Said,
September 20, 2009 @ 1:46 pm
Extremely useful. I’m thinking of pointing a bunch of students at this. One detail: the tutorial has some stale links to hadoop-0.19.1 (as well as a number of references to that elsewhere in the text). It would be good to write the tutorial in such a way that the text doesn’t need to be updated with each new version.
Deng Wanyu Said,
September 30, 2009 @ 1:17 am
Hi:
It is very helpful for me!
My problem is:
I uploaded the txt file from the command line, but I find that the uploaded file is empty. Why?
Azuryy Said,
October 14, 2009 @ 6:50 am
If I don’t open five separate Cygwin windows and instead run start-all.sh, I get a “Could not obtain block” error.
But if I open five separate Cygwin windows as described in the tutorial, it works.
Azuryy Said,
October 14, 2009 @ 7:06 pm
What I found:
If you want to run start-all.sh instead of opening five separate Cygwin windows as this tutorial describes, please run hadoop fs -put before you run start-all.sh. If not, you will get a “Could not obtain block” error when you run your job.
sam Said,
October 22, 2009 @ 4:08 pm
I get this error when I open the MapReduce perspective in Eclipse, and I don’t see the files under localhost->1 in DFS Locations. The errors below were in the NameNode window:
lVersion(org.apache.hadoop.dfs.ClientProtocol, 35) from 127.0.0.1:3282: error: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
09/10/22 15:58:08 INFO ipc.Server: IPC Server handler 4 on 9100, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 35) from 127.0.0.1:3282: error: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
Ravi Said,
October 25, 2009 @ 10:44 am
Hi there, your tutorial is excellent. Very good job, and I don’t say that often.
I was trying to set up HBase using your Hadoop tutorial. I was able to follow it up to step 12, but when I try to execute the command below, I get:
$ bin/hbase namenode -format
: No such file or directory
bin/hbase: line 45: $'\r': command not found
Can you tell me what I am missing?
Thanks
Ravi Said,
October 25, 2009 @ 12:10 pm
Well, a few internet searches and one hour later, I am able to execute it, but now I get this error:
$ bin/hbase namenode -format
Exception in thread "main" java.lang.NoClassDefFoundError: namenode
Sharad Said,
October 29, 2009 @ 4:14 am
Is there an elegant way to stop DFS? Stopping it with Ctrl-C seems to corrupt it, and bin/hadoop/stop-dfs.sh doesn’t seem to work (I get an error message like: localhost: cat: cannot open file /dev/fs/C/tmp/hadoop-sk-secondarynamenode.pid : No such file or directory).
Thanks!
vlad Said,
October 29, 2009 @ 8:30 am
It should be bin/hdfs not bin/hbase
vlad Said,
October 29, 2009 @ 8:31 am
Not sure. Never had the problem with corruption.
steve Said,
November 2, 2009 @ 12:01 pm
Great tutorial!
I’ve almost got this working, but I’m having trouble connecting to localhost with ssh.
If I do:
ssh localhost -v
the last two lines are:
Offering public key: /home/user.name/.ssh/id_rsa
Connection closed by xxx.x.x.x
Any ideas what is going on?
I also had to manually add ssh_server to administrators and change the password in order to get the sshd service to run.
-Steve
RezaMor Said,
November 9, 2009 @ 8:05 pm
Thanks for your excellent tutorial! However, in the last step I got the following error, and I noticed that two others reported the same error in their comments.
Would you please answer me?
09/11/10 12:53:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
09/11/10 12:53:01 INFO mapred.FileInputFormat: Total input paths to process : 4
09/11/10 12:53:02 INFO mapred.JobClient: Running job: job_200911101209_0003
09/11/10 12:53:03 INFO mapred.JobClient: map 0% reduce 0%
09/11/10 12:53:13 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/11/10 12:53:17 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
09/11/10 12:53:22 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:41)
vlad Said,
November 18, 2009 @ 12:38 pm
You are getting this error because the API has changed since Hadoop version 0.17, so the code generated by Eclipse needs some tweaking.
Rill Said,
November 24, 2009 @ 7:58 pm
I have a problem with the Eclipse plugin.
—————————————————-
Cannot connect to the Map/Reduce location:localhost.
Failed to get the current user’s information.
—————————————————-
My Windows user account needs a password to log in.
Please help me, thank you!
Jason Venner Said,
January 1, 2010 @ 10:18 am
The prohadoop website has a lot of information on Hadoop and Hadoop setup as well as a good community of people to ask and answer questions with.
This particular error, java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable,
occurs because the input format for your job is TextInputFormat rather than KeyValueTextInputFormat.
TextInputFormat provides a LongWritable as the key, which is the byte offset of the line within the input file, and a Text as the value, which is the contents of the line.
KeyValueTextInputFormat provides a Text key, which is the portion of the input line up to the first TAB character, and a Text value, which is the portion of the line after the first TAB character.
Alternatively you can modify the definition of your Map class to accept a LongWritable as the input key type.
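To make the difference concrete, here is a minimal plain-Java sketch of the key/value pair each format would hand to the mapper for the same input line. It deliberately uses no Hadoop classes; the class name InputFormatSketch and both helper methods are made up for illustration, with Long standing in for LongWritable and String for Text. The no-TAB case assumes the documented behaviour of KeyValueTextInputFormat (whole line becomes the key, value is empty).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustration only: hypothetical helpers that mimic the records produced by
// the two input formats. Long stands in for LongWritable, String for Text.
class InputFormatSketch {

    // TextInputFormat style: key = position of the line in the file (byte offset),
    // value = the whole line.
    static Map.Entry<Long, String> textInputRecord(long byteOffset, String line) {
        return new SimpleEntry<>(Long.valueOf(byteOffset), line);
    }

    // KeyValueTextInputFormat style: key = text before the first TAB,
    // value = text after it. Assumed: with no TAB the whole line is the key
    // and the value is empty.
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String line = "hadoop\t42";
        Map.Entry<Long, String> fromText = textInputRecord(0L, line);
        Map.Entry<String, String> fromKeyValue = keyValueRecord(line);
        // The key types differ, which is what the "Type mismatch in key from map" error is about.
        System.out.println(fromText.getKey().getClass().getSimpleName() + " key: " + fromText.getKey());
        System.out.println(fromKeyValue.getKey().getClass().getSimpleName() + " key: " + fromKeyValue.getKey());
    }
}
```

An identity mapper passes the key through unchanged, so with TextInputFormat the map output key is the numeric offset, not the Text key the job was configured to expect.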
Swetha Said,
January 4, 2010 @ 1:17 am
hello!
When I run the code I get the below error. I understand there is some change in the path where the job cache files are created; but I don’t know how to change it. Any clue??
Thanks in advance.
INFO mapred.JobClient: Task Id : attempt_201001041128_0006_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-MBS/mapred/local/taskTracker/jobcache/job_201001041128_0006/attempt_201001041128_0006_m_000006_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Abhishek Said,
February 27, 2010 @ 2:15 pm
Hi,
In step 5 when I hit the command explorer as shown in the tutorial I get an error
abc@ZULFI ~
$explorer
-bash: explorer: command not found
Anybody any ideas ??
vlad Said,
February 27, 2010 @ 5:25 pm
Hi,
What is your system? Is it Windows XP?
Also, type this command:
echo $PATH
and post the results here
vlad Said,
February 27, 2010 @ 9:06 pm
Abhishek,
Either your system is non-standard or your PATH variable is not set up correctly. Type this command in the Cygwin window and post the output here:
echo $PATH
Vlad
Keith Said,
March 2, 2010 @ 2:32 pm
Everything works great, except…
The Run As menu offers “On Hadoop”, but the Debug As menu does not. Obviously, the Run As options don’t trigger break points or otherwise offer debugging capability.
So, how do I debug?
Thanks.
Iris Said,
March 3, 2010 @ 10:39 am
vlad,
Thank you for the excellent tutorial.
I have a problem in the last step. After running the code, I got the error below:
10/03/04 01:06:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/04 01:06:01 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/04 01:06:01 INFO mapred.JobClient: Running job: job_201003040054_0001
10/03/04 01:06:02 INFO mapred.JobClient: map 0% reduce 0%
10/03/04 01:06:11 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:15 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:20 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:29 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:33 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 01:06:37 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:43)
I have no idea about what’s wrong with it.
Please help me!
Thank you in advance!
iris Said,
March 4, 2010 @ 4:06 am
vlad,
Thank you for your excellent tutorial. However, I get an error when running the last step. The output is below:
10/03/04 18:59:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/04 18:59:58 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/04 18:59:59 INFO mapred.JobClient: Running job: job_201003041848_0001
10/03/04 19:00:00 INFO mapred.JobClient: map 0% reduce 0%
10/03/04 19:00:14 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:18 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:24 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:33 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:37 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/04 19:00:43 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:44)
I have no idea about it, please help me!
Thank you in advance.
song Said,
March 7, 2010 @ 5:29 am
Thanks for your excellent tutorial! However, I followed step 9 (setting up the Hadoop plugin), but I can’t find Map/Reduce under “Open Perspective”. Why?
Thanks!
euqinoxia Said,
March 19, 2010 @ 1:43 am
Hi Vlad,
Thanks for the excellent tutorial.
Towards the last step I got the following error:
10/03/19 15:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/19 15:30:23 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/19 15:30:24 INFO mapred.JobClient: Running job: job_201003191529_0002
10/03/19 15:30:25 INFO mapred.JobClient: map 0% reduce 0%
10/03/19 15:30:31 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:35 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:39 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:48 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:52 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/03/19 15:30:56 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:41)
thanks and regards
equinoxia
vlad Said,
March 19, 2010 @ 5:43 am
Iris,
You get this error because your Map task is failing. Could you post your mapper code here?
Vlad
chefc17 Said,
March 20, 2010 @ 4:33 am
Hi Vlad,
Thanks for the excellent tutorial.
I am using Eclipse Galileo.
I have a problem at “Setup Hadoop Location”:
in Project Explorer / DFS Locations / localhost
it’s empty: “(0)”
rananjay Said,
March 26, 2010 @ 4:43 am
Hi
Thanks for this nice tutorial.
It is really good work and I have no words to describe your effort.
I followed every step of this tutorial.
But later, while running the Driver class file, I get these errors:
10/03/26 17:07:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/03/26 17:07:36 INFO mapred.FileInputFormat: Total input paths to process : 4
10/03/26 17:07:36 INFO mapred.JobClient: Running job: job_201003261700_0002
10/03/26 17:07:37 INFO mapred.JobClient: map 0% reduce 0%
10/03/26 17:07:45 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:07:50 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:07:55 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_2/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:06 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_0, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:12 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_1, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_1/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
10/03/26 17:08:19 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_2, Status : FAILED
java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_2/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at FirstDriver.main(FirstDriver.java:42)
Manish Said,
April 1, 2010 @ 2:37 am
Hi Vlad,
Thanks for such an excellent tutorial on Hadoop configuration on Windows.
I have followed each step in the tutorial. Every step went fine, but execution of the program is giving me trouble. The following is the message on the console:
10/04/01 15:05:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/04/01 15:05:25 INFO mapred.FileInputFormat: Total input paths to process : 4
10/04/01 15:05:27 INFO mapred.JobClient: Running job: job_201004011443_0001
10/04/01 15:05:30 INFO mapred.JobClient: map 0% reduce 0%
10/04/01 15:05:41 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:45 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:50 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:05:58 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:06:04 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/04/01 15:06:08 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:37)
Please let me know what could have gone wrong.
Thanks & R
jamo Said,
April 8, 2010 @ 8:18 am
I get “almost” all the way through the tutorial using version 0.19.2, but when running TestDriver, it throws several FileNotFound exceptions, as in Swetha’s Jan 4 2010 post above. I tried changing the mapred.job.tracker setting to c:/cygwin/tmp and restarting the jobtracker, but this didn’t change the error. Any idea what parameter needs to be changed?
thx,
jamo
Vaibhav Said,
April 13, 2010 @ 4:35 am
Hi Vlad,
Thanks for the tutorial. I set up my environment exactly as you specified in the tutorial. However, when I run my project from Eclipse (by selecting the “Run on Hadoop” option), nothing happens and it fails silently. It doesn’t give any error. What could be the issue?
Regards,
Vaibhav
princessayu Said,
April 13, 2010 @ 4:24 pm
Hi there
Nice tutorial. It helped me a lot with my assignment. Can you please tell me where the link to your new tutorial for hadoop-0.20.0 is?
Rim Moussa Said,
April 19, 2010 @ 3:13 am
excellent tutorial
please add the following imports to the last
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
kiwibird Said,
April 22, 2010 @ 11:35 am
Hello
Thank you for your excellent tutorial! However, at the last step, I cannot run the Hadoop project. I right-click the TestDriver class and choose “Run on Hadoop”, but nothing happens: no window comes up and no info is shown in the Console. I have just updated my Eclipse to the latest version.
Please help me.
thanks and regards
xuesf Said,
May 5, 2010 @ 8:20 pm
Thanks for your Hadoop on Windows tutorial.
I have the same problem that some other people reported.
I just copied the code of WordCount.java from hadoop-0.19.1. My Eclipse is 3.3.2.
I hope you can help me.
Thanks a lot.
10/05/06 11:03:23 INFO mapred.FileInputFormat: Total input paths to process : 4
10/05/06 11:03:23 INFO mapred.JobClient: Running job: job_201005061033_0003
10/05/06 11:03:24 INFO mapred.JobClient: map 0% reduce 0%
10/05/06 11:03:29 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:33 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:37 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:46 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:50 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
10/05/06 11:03:55 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at WordCount.run(WordCount.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at WordCount.main(WordCount.java:140)
Jim Said,
May 9, 2010 @ 9:22 am
I followed all the steps up to “Running Hadoop Project”, but I cannot run a Hadoop project. When I click “Run As” -> “Run on Hadoop”, nothing happens. There is no output on the Eclipse Console, and I am pretty sure one thread is running in the background.
I am using Windows Vista and Java 6 (latest 32-bit version). I started Eclipse from Windows. Everything else is running under Cygwin.
How do I debug a Hadoop application in Eclipse?
Jim
Senthil Said,
May 19, 2010 @ 8:04 am
Thanks for this tutorial.
I have a small issue. In TestDriver.java, JobConf is deprecated. I am using Hadoop 0.20.2:
JobClient client = new JobClient();
JobConf conf = new JobConf(TestDriver.class);
// TODO: specify output types
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
// TODO: specify input and output DIRECTORIES (not files)
//conf.setInputPath(new Path("src"));
//conf.setOutputPath(new Path("out"));
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("In"));
FileOutputFormat.setOutputPath(conf, new Path("Out"));
Which classes do I need to import to resolve JobConf? I am getting an error like “The setInputFormat in the type JobConf is not applicable for the arguments”. The same thing happens for setOutputFormat. Please help.
Jo Said,
May 25, 2010 @ 8:43 pm
Hi,
I followed the Eclipse part of your tutorial and managed to set up the plugin and browse the DFS directory.
But I am unable to use the plugin to run jobs on Hadoop. Clicking “Run on Hadoop” does not seem to do anything (i.e. there is no window asking me which Hadoop server to choose).
plugin version: 0.20.2
eclipse version: 3.5.2 galileo
os: ubuntu 10.04 desktop 64bit
any thoughts?
Shivam Sharma Said,
June 1, 2010 @ 1:31 am
I configured Hadoop on Windows + Cygwin according to your document. All my nodes and trackers are running fine. When I run the MapReduce program from Eclipse, it gives me the following exception:
10/06/01 13:58:59 INFO mapred.JobClient: Running job: job_201006011357_0002
10/06/01 13:59:00 INFO mapred.JobClient: map 0% reduce 0%
10/06/01 13:59:09 INFO mapred.JobClient: Task Id : attempt_201006011357_0002_m_000004_0, Status : FAILED
java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop-ssharma1/mapred/local/taskTracker/jobcache/job_201006011357_0002/attempt_201006011357_0002_m_000004_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
I have put the correct entries in all the configuration files.
I would be very grateful if you could help me solve this problem.
Arun Said,
June 3, 2010 @ 12:00 am
Hi,
That is a nice tutorial. Is there an update for the latest version, hadoop-0.20.2? Its structure is a bit different compared to the older version. What do we need to change in the code, for Eclipse, etc.?
Thanks in advance!
Arun.
vlad Said,
June 3, 2010 @ 12:03 am
I am planning to post an updated tutorial soon.
Siddharth prasad Said,
June 16, 2010 @ 8:54 pm
Hi
It seems everything is set up cleanly on Windows Vista, but when I run a job (a small word count problem),
I get this in my console:
10/06/17 09:15:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/06/17 09:15:37 INFO mapred.FileInputFormat: Total input paths to process : 1
10/06/17 09:15:37 INFO mapred.JobClient: Running job: job_201006170907_0002
10/06/17 09:15:38 INFO mapred.JobClient: map 0% reduce 0%
But from here it just gets stuck. When I check the job state in Eclipse it says running, but when I open localhost:50030, Hadoop says there is no running job.
I can’t understand what is going wrong. I will be glad if you can help me with this.
Thank you
Siddharth prasad.
Jony Blues Said,
June 22, 2010 @ 8:44 pm
I am working on a standalone server through Putty and I got the namenode and secondarynamenode working without errors. Yet when running the command “hadoop jobtracker”, I have the following errors:
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
For that reason, I cannot perform “fs -put” to insert the files into HDFS. I don’t think disk space is a problem because the file is really small (1MB), and I don’t think it is DNS, as I replaced localhost with the specific namenode IP address. How can I overcome this problem? I have run namenode -format many times, deleting hadoop_tmp_dir each time, but I still see the problem.
Thanks for your help
vlad Said,
June 22, 2010 @ 8:50 pm
Jony,
Check the space on the datanodes, and make sure they are reachable from the namenode.
Bejoy Said,
June 24, 2010 @ 3:31 am
Hi,
I’m new to Hadoop. I found your guide very useful for the initial setup. Unfortunately, I’m facing a challenge while configuring Cygwin for Hadoop development.
I generated the RSA key, and when I run
ssh localhost
it prompts me for
@localhost’s password
But I haven’t set any password before. I tried almost all options but none worked. Could you please help me out with this?
Hussain Said,
July 9, 2010 @ 4:43 am
Hi Vlad,
Thank you for the tutorial. I am facing a problem in step 5/6. When I enter the explorer command, the My Documents window opens (maybe it’s the home directory). I pasted the Hadoop archive there, and then, as described in step 6, I tried to unpack the archive, but it said no such file or directory. I tried the ls command and it came up empty as well. I ran the command
echo $PATH and the output was
/usr/local/bin:/usr/bin:/bin:/cygdrive/g/WINDOWS/system32:/cygdrive/g/WINDOWS:/cygdrive/g/WINDOWS/System32/Wbem:/cygdrive/c/MATLAB7/bin/win32:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin
What can be the problem?
Sven Said,
July 19, 2010 @ 2:16 am
Thanks for the nice tutorial, vlad!
Like others, I get the “java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop- …” exception.
Has anyone solved this problem already?
Sven Said,
July 19, 2010 @ 5:25 am
I found out what works for me:
1) In Eclipse: open the “localhost” location in “Map/Reduce Locations”. Open the Advanced tab. Set “mapred.child.tmp” to /tmp/hadoop-/mapred/mapred.child.tmp
2) Use the following text as TestDriver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class TestDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Applies generic Hadoop options (e.g. -D key=value) from args to conf.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = new Job(conf, "hadoop test");
        job.setJarByClass(TestDriver.class);
        // The base Mapper and Reducer classes pass records through unchanged.
        job.setMapperClass(Mapper.class);
        job.setCombinerClass(Reducer.class);
        job.setReducerClass(Reducer.class);
        // Key/value types match what the default TextInputFormat produces.
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // Input and output are DFS directories, not files.
        FileInputFormat.addInputPath(job, new Path("In"));
        FileOutputFormat.setOutputPath(job, new Path("Out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Sven Said,
July 19, 2010 @ 5:26 am
P.S.: I use 0.20.2 on eclipse europa (newer didn’t work)!
vlad Said,
July 19, 2010 @ 5:29 am
Hmm, did you add your rsa key to authorized_keys file as described in the tutorial?
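For anyone who wants to verify that point programmatically, here is a minimal sketch in Java. It assumes the key pair uses the default file name id_rsa.pub and that both files live in the same .ssh directory, which you pass on the command line (with a Windows JDK, the Java home directory is usually not the Cygwin home, so the path is not guessed). The class AuthorizedKeysCheck and its method are hypothetical helpers, not part of Hadoop or Cygwin. The check only looks for the key line; it does not inspect file permissions, which sshd may also object to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class AuthorizedKeysCheck {

    /** Returns true if the public key in pubKeyFile appears as a full line in authorizedKeysFile. */
    static boolean isKeyAuthorized(Path pubKeyFile, Path authorizedKeysFile) throws IOException {
        if (!Files.isRegularFile(pubKeyFile) || !Files.isRegularFile(authorizedKeysFile)) {
            return false;
        }
        String key = new String(Files.readAllBytes(pubKeyFile), StandardCharsets.UTF_8).trim();
        if (key.isEmpty()) {
            return false;
        }
        // Compare line by line, ignoring surrounding whitespace.
        for (String line : Files.readAllLines(authorizedKeysFile, StandardCharsets.UTF_8)) {
            if (line.trim().equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AuthorizedKeysCheck <path-to-.ssh-directory>");
            return;
        }
        Path sshDir = Paths.get(args[0]);
        boolean listed = isKeyAuthorized(sshDir.resolve("id_rsa.pub"),
                                         sshDir.resolve("authorized_keys"));
        System.out.println(listed
                ? "id_rsa.pub is listed in authorized_keys"
                : "id_rsa.pub is NOT listed in authorized_keys");
    }
}
```

If the key turns out not to be listed, repeating the authorized_keys step from the tutorial is the first thing to try.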
Guohua Liu Said,
August 18, 2010 @ 8:15 pm
It is a good tutorial, but in the last step the “Run on Hadoop” window for selecting a Hadoop location does not come up when I click “Run As” -> “Run on Hadoop”, so I can’t see console output similar to the one in your tutorial. Thank you!
vlad Said,
August 19, 2010 @ 2:00 pm
Which version of Eclipse are you running? The Eclipse plugin only works with the version specified in the tutorial; it is not compatible with newer versions of Eclipse. I am working on an upgrade to the plugin, but it is not available yet.
suka_hati Said,
August 22, 2010 @ 8:48 pm
Hi,
I’ve followed your tutorial on configuring Hadoop and Eclipse. However, I’m having a problem at step 6 of setting up the Hadoop location: I’m not able to get the folder. I got the error ‘Unknown protocol to job tracker’. Does anybody know how to resolve this issue?
Lady Di Said,
September 16, 2010 @ 6:11 am
Well done, Mr Vlad, thank you very much. Spasibo bolshoe.
Priya Said,
October 3, 2010 @ 8:02 am
Hi Vlad!
Thank you for the nice tutorial. I am having some issues in the last step of the tutorial, where we have to right-click on the Map/Reduce driver and “Choose existing Hadoop location”. The problem is that when I do this and select “Run on Hadoop”, nothing happens! The window that should come up asking whether I wish to use an existing Hadoop server or a new server does not come up.
The console remains blank too. The “Problems” tab just shows the warning:
“Description Resource Path Location Type
The import org.apache.hadoop.mapred.Mapper is never used testdriver.java /Hadoop Test/src line 8 Java Problem
”
I don’t know what the issue is. All the previous steps worked fine. I am using Hadoop 0.19.2 and Eclipse Helios!
Thanks!
junfeng_feng Said,
October 17, 2010 @ 11:51 pm
Could you please explain how to configure Eclipse on Linux to run Hadoop?
Erfan Said,
November 13, 2010 @ 5:40 am
Hi Vlad,
Great step-by-step tutorial. These days you don’t see this kind of tutorial very often.
I have an issue with step 11, “Start the local hadoop cluster”. In my jobtracker and tasktracker windows, I keep getting the following error:
Error mapred.TaskTracker: can not start task tracker because java.lang.RuntimeException: Not a host:port pair: local
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
and a bunch of other classes.
Sasha NorCal Said,
December 1, 2010 @ 8:15 pm
Great tutorial! Works.
Spasibo.
Otilia Said,
December 7, 2010 @ 6:20 am
Hi Vlad,
Like Priya, I have the same problem: on the TestDriver class, when I choose “Run As” -> “Run on Hadoop”, nothing happens! The window that should come up asking whether I wish to use an existing Hadoop server does not show up.
I use eclipse 3.5 Helios and Hadoop-0.18.3
Do you have any idea what could be the problem? (maybe, if the plug-in is not good for this version of eclipse, do you know any other plug-in ?)
Thanks very much,
Otilia
hth Said,
January 10, 2011 @ 11:39 am
Excellent tutorial
I got it working using Eclipse Helios + Hadoop 20.2 using the eclipse plugin here
https://issues.apache.org/jira/browse/MAPREDUCE-1280
Also ./start-all.ksh works fine if you set the jdk correctly in conf/hadoop-env.sh
export JAVA_HOME=C:\\Progra~1\\Java\\jdk1.6.0_19
Also configure the site using the cluster setup here (modify the 3 files)
http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html
and make the testdriver code change as per Arun’s suggestion earlier to get around the LongWritable issue
Happy hadooping!!
Arun K Said,
January 20, 2011 @ 11:32 pm
Hello,
Excellent Tutorial. I followed all the steps, created the TestDriver Program, but when I give Run as -> Run on Hadoop,
Nothing happens.
Can you please help me ?
I’m using Windows 7, Hadoop 0.19.2, and Java6
arunk786 Said,
January 24, 2011 @ 8:49 am
Hi V-lad !
This is the best tutorial to set up Hadoop cluster on a single node in windows.
1->
It would remain the best one for a long time if the setup also covered the Hadoop 0.20.x versions (where the XML configuration file is split into three) and the latest Eclipse versions.
2->
A setup tutorial for a multi-node cluster would also be of immense help for students and researchers like us.
Happy hadooping!
Niluk Said,
March 8, 2011 @ 8:16 am
Thanks for the great tutorial. It’s immensely helpful.
One issue though is that in the section that sets up SSH authorization key, once everything is done and I execute
ssh localhost
I’m prompted for the localhost password of the user. I don’t remember ever setting up such a password. Have I missed a step somewhere? Can someone tell me how I can get around it?
Thanks in advance for your time!
-Niluk
Vikas Gupta Said,
March 18, 2011 @ 11:23 am
Hello ,
I am Vikas Gupta. This is a really good tutorial. It works.
I want to learn Hadoop thoroughly. Can you suggest a way to do that? I am waiting for your reply.
Paulo Ramos Said,
March 29, 2011 @ 7:30 am
Good morning,
During the Cygwin installation, a list of several download sites appears. Which one should I choose? Whenever I pick the first option, an error occurs.
Thank you for your attention.
Sincerely,
Paulo Ramos.
Mandar Said,
April 8, 2011 @ 1:35 am
hello sir,
I am following your tutorial on Hadoop on Windows with Eclipse, but when formatting the namenode with the command bin/hadoop namenode -format I am not getting the result shown in your tutorial. I am getting the following warning:
$ bin/hadoop namenode -format
cygwin warning:
MS-DOS style path detected: C:\cygwin\home\Mandar\HADOOP~1.2\/build/native
Preferred POSIX equivalent is: /home/Mandar/HADOOP~1.2/build/native
CYGWIN environment variable option “nodosfilewarning” turns off this warning.
Consult the user’s guide for more details about POSIX paths:
http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
bin/hadoop: line 243: C:\Program: command not found
bin/hadoop: line 273: C:\Program Files\Java\jre6\bin/bin/java: No such file or directory
bin/hadoop: line 273: exec: C:\Program Files\Java\jre6\bin/bin/java: cannot execute: No such file or directory
Please help me out soon.
Regards-
Mandar Bedse
Bioinformatics Centre,
University of Pune.
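The two messages in that log hint at what is wrong with JAVA_HOME: "C:\Program: command not found" suggests the value contains a space, and the doubled "\bin/bin/java" suggests it points at the bin directory rather than the Java installation directory. Below is a minimal sketch in Java that flags exactly these two conditions. The class JavaHomeCheck and its method are hypothetical helpers for illustration, and the checks are assumptions drawn from this log, not an official Hadoop validation. The short-name form used in the second example (Progra~1) is the workaround another commenter reported earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class JavaHomeCheck {

    /** Returns a list of likely problems with the given JAVA_HOME value (empty if none found). */
    static List<String> problems(String javaHome) {
        List<String> found = new ArrayList<>();
        if (javaHome == null || javaHome.trim().isEmpty()) {
            found.add("JAVA_HOME is not set");
            return found;
        }
        // A space makes unquoted shell expansions split the path in two.
        if (javaHome.chars().anyMatch(Character::isWhitespace)) {
            found.add("JAVA_HOME contains whitespace");
        }
        // Normalize separators and drop trailing slashes before checking the last segment.
        String normalized = javaHome.replace('\\', '/');
        while (normalized.endsWith("/")) {
            normalized = normalized.substring(0, normalized.length() - 1);
        }
        if (normalized.endsWith("/bin")) {
            found.add("JAVA_HOME points at the bin directory instead of its parent");
        }
        return found;
    }

    public static void main(String[] args) {
        String fromLog = "C:\\Program Files\\Java\\jre6\\bin";
        String shortName = "C:\\Progra~1\\Java\\jdk1.6.0_19";
        System.out.println(fromLog + " -> " + problems(fromLog));
        System.out.println(shortName + " -> " + problems(shortName));
    }
}
```

A value that passes both checks can still be wrong (for example, a directory that does not exist), so treat an empty result as "nothing obvious", not as proof.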
Rakesh Jadhav Said,
April 8, 2011 @ 4:15 am
Hi,
I am getting error when I do following step… (creating hdfs file system)
$ bin/hadoop namenode -format
bash: bin/hadoop: /usr/bin/env: bad interpreter: Permission denied
Any clue? Appreciate help!
venkat Said,
April 19, 2011 @ 4:47 am
Hi, your tutorial is excellent. I have done all the steps you have given, but when I choose Run on Hadoop I don’t get any window in Eclipse and there is no response on the console. Please help me solve this problem.
Adarsh Said,
April 21, 2011 @ 1:06 am
Sir, I am following this tutorial and everything works fine, but when I run the TestDriver program via Run As > Run on Hadoop, nothing happens.
The next popup window doesn’t appear.
I don’t know what to do.
Thanks
tanvi Said,
April 27, 2011 @ 1:00 pm
I have one question. I followed all the steps to set up ssh, but
after executing the command ssh localhost it shows “Connection closed by ::1”.
How do I open the ssh connection?
myat kyaw Said,
May 15, 2011 @ 9:48 pm
hi Vlad,
Thanks for your tutorial.
When I run the TestDriver class, I get this error.
I cannot solve it.
Please give me some suggestions and help me.
The error message is below:
11/05/16 10:41:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/05/16 10:41:09 INFO mapred.FileInputFormat: Total input paths to process : 4
11/05/16 10:41:10 INFO mapred.JobClient: Running job: job_201105161029_0001
11/05/16 10:41:11 INFO mapred.JobClient: map 0% reduce 0%
11/05/16 10:41:26 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/05/16 10:41:31 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/05/16 10:41:33 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/05/16 10:41:38 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/05/16 10:41:40 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/05/16 10:41:43 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
vlad Said,
May 15, 2011 @ 10:00 pm
Congratulations, you managed to set up the cluster correctly. The problem with your job is a type mismatch between the key type the job is configured to expect from the map (Text) and the key type the mapper actually emits (LongWritable). The mappers and reducers generated by the Eclipse plug-in do not work well with newer versions of Hadoop. Just search the web for some Hadoop examples and try to run those.
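To make the "Type mismatch in key from map" message less mysterious, here is a minimal sketch in plain Java of the kind of check that produces it. It is not Hadoop's code: the nested Text and LongWritable classes are stand-ins for the real Hadoop types, and collect is a hypothetical method that only mimics the comparison between the key class the job declares and the class of the key the mapper emits. The stack traces above show an identity mapper, which passes the input key through unchanged, so a job that declares Text keys trips the check.

```java
import java.io.IOException;

final class MapOutputTypeCheck {

    // Stand-ins for Hadoop's Text and LongWritable, for illustration only.
    static final class Text {
        final String value;
        Text(String value) { this.value = value; }
    }

    static final class LongWritable {
        final long value;
        LongWritable(long value) { this.value = value; }
    }

    /** Rejects a map output key whose class differs from the class the job declared. */
    static void collect(Class<?> declaredKeyClass, Object key) throws IOException {
        if (key.getClass() != declaredKeyClass) {
            throw new IOException("Type mismatch in key from map: expected "
                    + declaredKeyClass.getName() + ", received " + key.getClass().getName());
        }
        // A real framework would serialize and buffer the record here.
    }

    public static void main(String[] args) {
        LongWritable offsetKey = new LongWritable(0L); // what an identity mapper passes through
        try {
            collect(LongWritable.class, offsetKey);    // declared type matches: accepted
            collect(Text.class, offsetKey);            // declared type differs: rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The fix is to make the two sides agree: either declare the key and value classes the mapper really emits, as the TestDriver posted earlier in this thread does with LongWritable keys and Text values, or use a mapper that emits the declared types.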
vlad Said,
May 15, 2011 @ 10:01 pm
Check your eclipse version.. the standard plug-in does not work well with recent versions of eclipse.
Gio Said,
May 27, 2011 @ 5:24 pm
Hi – I was very happy to find this tutorial but I’m stuck at the ssh step
with this error:
$ ssh localhost
Connection closed by ::1
If I remove the key files ‘ssh localhost’ works (i.e., I’m prompted for my password and successfully connect).
I’ve tried various tweaks of the /etc/ssh_config and /etc/sshd_config file with no luck yet… thank you in advance for your help.
A little more verbose output:
$ ssh -v localhost
OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011
debug1: Reading configuration data /etc/ssh_config
debug1: Applying options for *
debug1: Connecting to localhost [::1] port 22.
debug1: Connection established.
debug1: identity file /home/gio/.ssh/id_rsa type 1
debug1: identity file /home/gio/.ssh/id_rsa-cert type -1
debug1: identity file /home/gio/.ssh/id_dsa type -1
debug1: identity file /home/gio/.ssh/id_dsa-cert type -1
debug1: identity file /home/gio/.ssh/id_ecdsa type -1
debug1: identity file /home/gio/.ssh/id_ecdsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.8
debug1: match: OpenSSH_5.8 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.8
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 1b:57:34:0e:9a:a7:da:09:ae:62:7a:81:cf:0c:a9:2f
The authenticity of host ‘localhost (::1)’ can’t be established.
ECDSA key fingerprint is 1b:57:34:0e:9a:a7:da:09:ae:62:7a:81:cf:0c:a9:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘localhost’ (ECDSA) to the list of known hosts.
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: Roaming not allowed by server
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/gio/.ssh/id_rsa
Connection closed by ::1
Gio Said,
June 2, 2011 @ 11:10 pm
Browse HDFS Troubleshooting:
Hi Vlad – using your tutorial and several comments here, I too was able to get almost everything working
using Eclipse Helios 3.6 + Hadoop 20.2… thanks. I can launch jobs from within eclipse, but I can’t browse the local HDFS.
Under “Project Explorer” > “DFS Locations” > “localhost” > “(1)” I’m getting:
Error: Call to localhost/127.0.0.1:9100 failed on local exception: java.io.EOFException
From the command line I can confirm HDFS is working properly, i.e., I can issue commands such as “hadoop fs -ls /”, so it’s probably a setting in the “Advanced” tab of the location. Any suggestions for how to diagnose/fix this? Thanks.
vlad Said,
June 2, 2011 @ 11:18 pm
The Hadoop plugin described in this tutorial was written for Eclipse Europa 3.3; it does not work on Eclipse Helios. I had plans to update it and post it here, but never got around to actually doing it.
myat kyaw Said,
June 6, 2011 @ 10:24 pm
When I run the TestDriver on Hadoop, I don’t see the Hadoop
location box for choosing a Hadoop server. Why?
Please give me some suggestions.
praveen Said,
June 10, 2011 @ 3:03 am
I got the following error while running the namenode command. Could you please help me resolve the issue?
INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9100
INFO namenode.NameNode: Namenode up at: localhost/127.0.0.1:9100
INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
INFO namenode.FSNamesystem: fsOwner=EE205782,Domain,Users,root,Administrators,Users,Debugger,Users
INFO namenode.FSNamesystem: supergroup=supergroup
namenode.FSNamesystem: isPermissionEnabled=true
INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
INFO ipc.Server: Stopping server on 9100
ERROR namenode.NameNode: java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
sudhir Said,
June 16, 2011 @ 12:12 am
Excellent, I just started with hadoop, and was looking for exactly the same information.
BTW, how stable is the Hadoop Eclipse plugin?
Troy Said,
June 16, 2011 @ 7:31 am
I’m stuck in step 3 – setup ssh daemon.
This is the message i get:
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say “no” if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] ntsec
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it’s not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).
*** Info: If you want to enable that functionality, it’s required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.
*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use ‘cyg_server’.
*** Info: ‘cyg_server’ will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) no
*** Query: Create new privileged user account ‘cyg_server’? (yes/no) yes
*** Info: Please enter a password for new user cyg_server. Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.
*** Query: Please enter the password:
hansuksoo. Said,
June 18, 2011 @ 6:47 pm
hi Vlad,
Thanks for your tutorial.
When I run the TestDriver class, I get this error.
I cannot solve it.
Please give me some suggestions and help me.
The error message is below.
eclipse : 3.3
hadoop : hadoop-0.19.1
jdk : 1.6.0_26
JVM problem?
11/06/19 10:37:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/06/19 10:37:03 INFO mapred.FileInputFormat: Total input paths to process : 4
11/06/19 10:37:04 INFO mapred.JobClient: Running job: job_201106191031_0001
11/06/19 10:37:05 INFO mapred.JobClient: map 0% reduce 0%
11/06/19 10:37:09 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/06/19 10:37:13 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/06/19 10:37:17 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/06/19 10:37:24 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/06/19 10:37:27 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/06/19 10:37:31 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
vlad Said,
June 18, 2011 @ 7:53 pm
Everything is okay. It’s just that the code generated by the plug-in is not compatible with the newer versions of Hadoop. Try running the sample code that came with your Hadoop distribution.
vlad Said,
June 22, 2011 @ 7:56 pm
You need to enter a new password for the ‘sshd’ service account. Pick any password you want. Also make sure you are running this step as an Administrator.
vlad Said,
June 22, 2011 @ 8:00 pm
I found eclipse plugin to be pretty stable. But it only works with older versions of Eclipse. Also, hadoop itself is not considered stable on Windows. It works for me doing initial development, but for production usage consider running the real jobs on the linux system. If you don’t have a cluster at your disposal you could use Amazon Elastic MapReduce. I found it pretty good, and quite cheap for what it does. I was able to run humongous jobs processing 400G of data within a few hours.
Rahul Said,
June 24, 2011 @ 6:52 pm
Very useful tutorial. Without this it would not have been possible to install on Windows machine for me. Thank you Vlad.
vlad Said,
June 28, 2011 @ 8:54 am
Did you format your namenode? It looks like you either missed that step or something happened during formatting. Try redoing the steps, starting with the namenode format.
alex Said,
July 3, 2011 @ 8:42 am
Hi vlad..
Thanks for the awesome tutorial on Hadoop. It helped me a lot.
I did exactly what you described in your tutorial, but at the end, when I ran the test project, it gave an exception:
11/07/03 21:06:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/07/03 21:06:19 INFO mapred.FileInputFormat: Total input paths to process : 4
11/07/03 21:06:20 INFO mapred.JobClient: Running job: job_201107032031_0009
11/07/03 21:06:21 INFO mapred.JobClient: map 0% reduce 0%
11/07/03 21:06:27 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/07/03 21:06:32 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/07/03 21:06:37 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:42)
Please help me out. :)
Thanks.
vlad Said,
July 6, 2011 @ 11:12 am
It is probably not a JVM problem, but I can’t tell for sure because you did not give the exact exception information. Take a look at the TaskTracker logs to see the actual exception thrown by the task.
sunny Said,
July 12, 2011 @ 3:20 am
When I install Cygwin, it reports that cygncurses-9.dll was not found. After re-installing
the application, it shows the same problem.
Can anyone help me solve this problem?
sunny Said,
July 27, 2011 @ 1:45 am
Hi vlad,
Your tutorial helped me a lot. I did exactly what you described in your tutorial, but at the end, when I click New Hadoop location, the dialog box is not displayed.
Can you help me?
sunny Said,
August 16, 2011 @ 2:51 am
Hi vlad,
The above problem is solved. Now, when I run the TestDriver class, I get this error:
11/08/16 12:36:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/16 12:36:10 INFO mapred.FileInputFormat: Total input paths to process : 4
11/08/16 12:36:10 INFO mapred.JobClient: Running job: job_201108161203_0001
11/08/16 12:36:11 INFO mapred.JobClient: map 0% reduce 0%
11/08/16 12:36:17 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/08/16 12:36:21 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/08/16 12:36:25 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/08/16 12:36:34 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/08/16 12:36:37 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/08/16 12:36:42 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
Can you help me?
Vlad Said,
August 16, 2011 @ 7:28 am
Sunny,
Glad you got everything working. Regarding your last message, there is not enough information for me to help you. What I see is that your tasks are failing, so you need to look at the logs for the individual tasks. The best way to do that is through the job tracker’s web interface: when you click on the job you will see the list of all tasks (failed and successful), with links to the logs for each task.
You could probably discern from these logs what is wrong with your job. It could be anything, from not enough space on the drive to a runtime error in the tasks. If you get stuck, post the logs here and I will try to help you.
rohit Said,
August 17, 2011 @ 11:00 am
Hi Vlad,
When I run the TestDriver class, it shows this:
11/08/17 23:24:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/17 23:24:37 INFO mapred.FileInputFormat: Total input paths to process : 4
11/08/17 23:24:37 INFO mapred.JobClient: Running job: job_201108172316_0002
11/08/17 23:24:38 INFO mapred.JobClient: map 0% reduce 0%
11/08/17 23:24:47 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/08/17 23:24:51 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_1, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
11/08/17 23:24:55 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_2, Status : FAILED
java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.Child.main(Child.java:158)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:46)
Can you help me?
rohit Said,
August 17, 2011 @ 11:27 am
Vlad, just a note: the earlier output was from another machine (which you think may have memory problems), so I set up Hadoop at home as well to continue with my project. At home, unfortunately, I’m running into another problem, which is the second log I posted just above this comment.
vlad Said,
August 18, 2011 @ 12:03 am
Your cluster is functioning correctly. However, you have an error in your MapReduce code: the job declares IntWritable as the map output value type, but the mapper that actually runs (IdentityMapper, according to the stack trace) emits Text. Did you use the code generated by the plugin? It is not compatible with newer versions of Hadoop. Take a look at the WordCount example that came with your Hadoop distro and use it as your guide. The code changes are minimal.
John Said,
September 5, 2011 @ 1:55 am
Hi Vlad,
Please let me know when we need to configure the hadoop-env.sh file with the JAVA_HOME path.
In this tutorial, it seems that you have not configured it. Is it necessary to configure hadoop-env.sh, or is it optional?
Monk Said,
September 9, 2011 @ 12:33 pm
Hey great tutorial!
Maybe I can help out. I had a few problems myself. I did not set JAVA_HOME in Windows (that caused problems) but directly in the hadoop-env file. Also, I have my Java folder in the native folder. It worked for me. For people who have permission problems, it may be wise to create a new user account with admin rights or to run Cygwin as administrator.
I used hadoop 0.20.2
thanks for this tutorial vlad!
naveen Said,
September 12, 2011 @ 1:34 am
Thank you for your excellent tutorial. However, I get an error when running the last step. The output is below:
11/09/07 12:57:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/07 12:57:52 INFO mapred.FileInputFormat: Total input paths to process : 4
11/09/07 12:57:56 INFO mapred.JobClient: Running job: job_201109071219_0001
11/09/07 12:57:57 INFO mapred.JobClient: map 0% reduce 0%
11/09/07 12:58:06 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/09/07 12:58:10 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/09/07 12:58:14 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/09/07 12:58:24 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/09/07 12:58:28 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
11/09/07 12:58:32 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:41)
Michael Said,
September 23, 2011 @ 5:41 am
Hi Vlad,
I followed your tutorial, but with more recent versions of Hadoop (0.20.203.0) and Eclipse (3.3.0 and 3.7.0). The first problem that I encountered is not related to Eclipse. In the section “Start the local hadoop cluster” the command
bin/hadoop tasktracker
gives me the following errors:
11/09/21 08:28:25 ERROR mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: /tmp/hadoop-Michael/mapred/local/ttprivate to 0700
at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:635)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1328)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3430)
I tried to Google it but couldn’t find a way around it. Do you happen to know how to fix it? I’m getting the same error when running the test project in Eclipse as a Java Application. Run on Hadoop doesn’t work for me either, with the same result in v. 3.3 and 3.7: it brings up a list of classes to run as a Java App.
Thank you in advance.
Aditya Said,
September 28, 2011 @ 3:43 am
Hi Vlad,
I followed your tutorial. It was really helpful.
I’m stuck with the following.
I’m able to start the sshd service and log on to localhost, but when I execute start-dfs.sh or start-mapred.sh it shows me the error
“daemon wants to run as MY_USERNAME but not running as that user or root”
and in the log files it says that the storage directory “\hadoopBase\aditya\namenode\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.”
I looked for that directory. It was not even present. No matter how many times I run the command
“hadoop namenode -format”
it always says that the namenode was formatted successfully, but the directory and the files in it are never actually created.
I think that is the main problem. I would really appreciate it if you could help me with it.
Parita Said,
October 1, 2011 @ 10:36 pm
Hey, my Hadoop is not getting unpacked properly; it shows ‘reached the end of file’. As a result the namenode is not getting formatted. Please help!
Amit Said,
October 5, 2011 @ 1:53 am
Hi,
Can you please give a step-by-step guide on how to convert a MySQL table to Hadoop and how to create a table in Hadoop?
Prashant Said,
October 17, 2011 @ 11:41 pm
Hi Vlad,
I am getting error here
$ bin/hadoop fs -help
bin/hadoop: line 243: C:\Program: command not found
bin/hadoop: line 273: C:\Program File\Java\jdk1.6.0_27/bin/java: No such file or directory
bin/hadoop: line 273: exec: C:\Program File\Java\jdk1.6.0_27/bin/java: cannot execute: No such file or directory
kindly help..
jeris Said,
October 26, 2011 @ 2:32 pm
I tried to navigate to another directory using the cd command in the Cygwin command prompt, but the directory doesn’t change, so I am stuck in the middle of the configuration. Could you please say what the possible reason could be?
Harish Said,
November 7, 2011 @ 8:40 pm
Hi ,
Will running bin/hadoop namenode -format
actually format my physical drive? Will I lose the contents already on it?
vlad Said,
November 11, 2011 @ 6:59 am
Namenode format formats your HDFS filesystem, not the physical drive, so you will not lose your drive contents.
HOWEVER, if you previously had an HDFS filesystem on that drive, it will erase all your Hadoop data, so be careful with that. If it is your first Hadoop installation you obviously won’t have this problem. Otherwise, make arrangements to back up your data.
vlad Said,
November 11, 2011 @ 7:01 am
Sorry, this is way out of scope for this tutorial. Hadoop doesn’t even have tables in the SQL sense. It is designed to process data from streams of records.
vlad Said,
November 11, 2011 @ 7:02 am
Something is wrong with your archive. Try to download it again. Also check how much free space you have on your drive and if you have any disk quotas turned on.
vlad Said,
November 11, 2011 @ 7:04 am
This tutorial will not run with newer versions of Hadoop. Configuration files in Hadoop 0.20 have changed dramatically, and the Eclipse plugin used in this version of the tutorial will not work in newer versions of Eclipse.
Jim Said,
November 15, 2011 @ 2:57 pm
Which Eclipse: for C/C++, Java Devs, RCP/Plug-in or Java EE?
Bhavesh Said,
November 22, 2011 @ 2:45 am
Hi vlad,
I want to ask just one thing:
Will Hadoop be useful for data retrieval and analysis on a very large database?
Bhavesh Said,
November 22, 2011 @ 3:00 am
Do we need to install Hadoop separately through Cygwin, or will these steps do that?
omprakash Said,
November 22, 2011 @ 11:58 am
Thanks for the tutorial.
I followed all the steps properly, but when I try to start the jobtracker and tasktracker, they do not start.
Please help me solve this issue.
Thank you.
Paul Said,
December 1, 2011 @ 3:49 am
Thanks for the tutorial!!
When uploading data to HDFS (12th step) I type
bin/hadoop fs -mkdir In
and it shows
bin/hadoop: line 243: C:\Program: command not found
mkdir: cannot create directory In: File exists
What do I do now??
Thanks once again.
vlad Said,
December 1, 2011 @ 4:32 am
It looks like your Java is installed in the default location under C:\Program Files\. The space in that path breaks the Cygwin scripts. I suggest moving your Java install to something like C:\Java and adjusting your configuration accordingly.
You can also just install another copy of the JDK. It is not very big by modern standards, and it will solve your problem without breaking the rest of the system.
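A likely mechanism behind the “C:\Program: command not found” message is shell word splitting: when a variable holding a path with a space is expanded without quotes, the shell sees two words and tries to run the first one. The sketch below only illustrates that behaviour; the two JDK paths and the helper name count_words are made up for the example and are not part of Hadoop or Cygwin.

```shell
# Hypothetical JDK locations, used only to illustrate the failure mode.
BAD_HOME='C:\Program Files\Java\jdk1.6.0_27'
GOOD_HOME='C:\Java\jdk1.6.0_27'

# count_words (hypothetical helper) prints how many words the shell sees when
# a value is expanded WITHOUT quotes, which is how an unquoted $JAVA_HOME
# ends up on a command line inside a script.
count_words() {
  # Deliberately unquoted expansion: this is the behaviour being shown.
  set -- $1
  echo "$#"
}

count_words "$BAD_HOME"    # prints 2: the space splits the path in two
count_words "$GOOD_HOME"   # prints 1: no space, so the path stays whole
```

With the first path, the word before the space (C:\Program) is what the shell would try to execute, which matches the error text reported above.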
vlad Said,
December 1, 2011 @ 4:33 am
Java Devs is sufficient; Java EE might come in useful.
Paul Said,
December 1, 2011 @ 1:43 pm
Hi,
I installed the Java JDK again in c:\JAVA, but the error persists!!
Are there any other ways to get rid of this error?
Thanks.
Bhavesh Said,
December 1, 2011 @ 9:49 pm
When I enter this query in the Hive CLI, I get the errors below:
hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there’s no reduce operator
Starting Job = job_201111291547_0013, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201111291547_0013
Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2\/bin/hadoop job -Dmapred.job.tracker=localhost:9101 -kill job_201111291547_0013
2011-12-01 14:00:52,380 Stage-1 map = 0%, reduce = 0%
2011-12-01 14:01:19,518 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201111291547_0013 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
So my question is: how do I stop a job? In this case the job is job_201111291547_0013.
Please help me out so that I can remove these errors and move on.
Thanks.
Jacky Hou Said,
December 7, 2011 @ 3:54 am
Hi, Vlad!
Thank you for your tutorial.
I followed every step and everything went well, but when I created the TestDriver file, many errors appeared, many of them about deprecated code. And when I execute the final step, running the project, nothing happens.
my email is
Pls help…
vlad Said,
December 7, 2011 @ 4:19 pm
The code generated by the plugin is not compatible with newer versions of Hadoop, hence the deprecation errors. You need to make a few changes to the code to make it work with current versions.
Annamalai Said,
December 21, 2011 @ 8:10 pm
Thank you, Vlad, for such a nice tutorial.
Can you help me resolve the following error?
I understand that the Out directory already exists, but where can I find this directory so I can delete it before rerunning the job?
11/12/21 20:25:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9100/user/anna-pc/anna/Out already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
at TestDriver.main(TestDriver.java:43)
Yuriy Said,
December 23, 2011 @ 1:58 am
As far as I can see, the most common error after configuring Hadoop and Eclipse on Windows is as follows:
java.io.FileNotFoundException: File C:/tmp/hadoop/mapred/local/taskTracker/jobcache/job_201112221128_0001/attempt_201112221128_0001_m_000001_1/work/tmp does not exist.
As was already mentioned, it is likely caused by the incompatibility of the hadoop-eclipse plugin with versions of Hadoop and Eclipse newer than those mentioned in the tutorial.
However, I’ve got the following configuration working on Windows with hadoop-0.20.2, Eclipse Indigo 3.7.1 and hadoop-eclipse-plugin-0.20.3-SNAPSHOT:
Yuriy Said,
December 23, 2011 @ 1:59 am
1) jdk1.7.0_01
2) I configured paths by commands
echo 'export JAVA_HOME=/usr/local/java/jdk1.7.0_01' > /etc/profile.d/jdk.sh
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile.d/jdk.sh
source /etc/profile.d/jdk.sh
(I copied the jdk1.7.0_01 directory from C:\Program Files\Java to /usr/java to avoid spaces in paths)
3) hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.7.0_01
Yuriy Said,
December 23, 2011 @ 2:06 am
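4) add this to mapred-site.xml:
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.child.tmp</name>
<value>/tmp/hadoop/mapred/mapred.child.tmp</value>
</property>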
Yuriy Said,
December 23, 2011 @ 2:06 am
5) eclipse.ini: (modify the lines after -vmargs)
-Xms512m
-Xmx1024m
-XX:MaxPermSize=256m
-Dosgi.classloader.lock=classname
-Dosgi.requiredJavaVersion=1.7
6) replace hadoop-0.20.1-eclipse-plugin in eclipse/plugins directory by hadoop-eclipse-plugin-0.20.3-SNAPSHOT
I’m not convinced that all of these settings are obligatory for normal operation. I’ve only shared mine.
I also haven’t tested versions of Hadoop higher than 0.20.2.
Valon Said,
December 24, 2011 @ 9:29 am
Hi, thanks again for this tutorial. Could this plugin be used for the Standalone Mode? I can run the Hadoop-0.20.2 examples without starting the 5 separate Cygwin windows.
By the way, I am using Hadoop-0.20.2 + Cygwin + Standalone Mode and would like to use Eclipse Indigo. I found plugins for hadoop-0.20.1, .2 and .3, but I am not sure whether they were patched for newer versions of Eclipse.
MK Said,
December 27, 2011 @ 5:50 pm
Hi,
Thanks for the awesome tutorial!! I have MyEclipse 6.0 at my workplace and I have done all the steps that you have shown, but when I open Eclipse and try to open the Map/Reduce perspective, Eclipse flags an error saying “Problems opening perspective ‘org.apache.hadoop.eclipse.Perspective’”. Please help!!
Thanks!!
Faten Said,
December 30, 2011 @ 1:07 pm
Hi, Vlad!
I hope that you can help me. I am new to programming with Hadoop, and now that the installation is done I don’t know what to do next. All I want to know is how to make the Hadoop server receive data from and send data to the client.
Thank you for your tutorial.
I followed every step and everything went well. Thanks again.
Vikram Said,
January 9, 2012 @ 3:03 am
Hi,
I am getting the following error while configuring ssh.
cygrunsrv: Error installing a service: OpenSCManager: Win32 error 5:
Access is denied.
I have admin rights on my machine. I am running Windows 7.
Please suggest what the problem could be.
TIA
Priyanka Said,
January 11, 2012 @ 3:23 am
Hi,
Thank you for this tutorial
I have Hadoop version 0.20.203.0 and I cannot find the hadoop-site file inside the conf folder. What should I do now?
vlad Said,
January 11, 2012 @ 3:37 am
This tutorial is not compatible with hadoop version 18 and above. The newer versions of Hadoop are not compatible with the eclipse plugin.
Priyanka Said,
January 11, 2012 @ 3:45 am
As I have read, in Hadoop 0.20 hadoop-site.xml has been split into three files, namely core-site.xml, hdfs-site.xml and mapred-site.xml. Can I use the settings that you provided to configure Eclipse with all three files?
vlad Said,
January 11, 2012 @ 3:50 am
Still, it is not going to work with Eclipse. You can use this tutorial to get ideas on how to set up Hadoop on Windows, but it will not be compatible with the Eclipse plugin. There is an updated version of the plugin somewhere on the IBM developerWorks site; I am not sure exactly where it is. Also, keep in mind that newer versions of Hadoop use a lot of native Linux FS features, so your performance and stability are not going to be great on Windows.
Priyanka Said,
January 11, 2012 @ 3:57 am
Thanks a lot for the information. Now I will try to work with an older version of Hadoop.
Priyanka Said,
January 11, 2012 @ 4:26 am
$ bin/hadoop namenode -format
After typing this command, I get:
bin/hadoop: line 243: C:\Program: command not found
bin/hadoop: line 273: C:\Program Files\Java\jdk1.6.0\bin;/bin/java: No such file or directory
bin/hadoop: line 273: exec: C:\Program Files\Java\jdk1.6.0\bin;/bin/java: cannot execute: No such file or directory
My Java path is C:\Program Files\Java\jdk1.6.0\bin. What should I do?
Thanks in advance..
vlad Said,
January 11, 2012 @ 12:57 pm
You have your JDK installed in C:\Program Files. That drives Cygwin bash scripts crazy. The best way to fix that is to install the JDK into C:\Java and make sure that the path to C:\Java\bin comes first in your PATH variable settings. You can see how to configure your PATH variable in the tutorial itself.
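The error output quoted above also suggests that the JAVA_HOME value ended in “\bin;”, so the scripts looked for bin;/bin/java. A small sanity check along these lines can catch both problems before starting Hadoop. This is only a sketch: check_java_home is a hypothetical helper written for this example, not something shipped with Hadoop or Cygwin, and the checks are simple string tests.

```shell
# check_java_home: hypothetical helper. Prints one line per problem found in
# the given JAVA_HOME value and returns non-zero if it found any.
check_java_home() {
  cjh_value="$1"
  cjh_status=0
  case "$cjh_value" in
    *' '*) echo "contains a space"; cjh_status=1 ;;
  esac
  case "$cjh_value" in
    *';'*) echo "contains a semicolon"; cjh_status=1 ;;
  esac
  case "$cjh_value" in
    *bin|*bin/|*'bin\'|*'bin;')
      echo "points at the bin directory instead of the JDK root"
      cjh_status=1 ;;
  esac
  return "$cjh_status"
}

# Example calls (the paths are illustrative):
# check_java_home 'C:\Program Files\Java\jdk1.6.0\bin;'   reports three problems
# check_java_home 'C:\Java\jdk1.6.0'                      reports nothing
```

JAVA_HOME is meant to name the JDK root directory; the launcher script appends /bin/java itself, as the error message shows.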
vlad Said,
January 13, 2012 @ 3:39 am
The directory is in HDFS. Use the ‘hadoop fs’ command to manipulate it.
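For the “Output directory ... already exists” error, this usually means listing the HDFS home directory and removing the old output before rerunning the job. The sketch below wraps the two commands in functions; the function names are hypothetical, and bin/hadoop assumes you are in the Hadoop install directory, as in the tutorial’s other commands.

```shell
# HADOOP is the launcher script. Override it if Hadoop lives elsewhere.
HADOOP="${HADOOP:-bin/hadoop}"

# list_home_dir (hypothetical name): lists the current user's HDFS home
# directory, which is where a relative path such as Out resolves.
list_home_dir() {
  "$HADOOP" fs -ls
}

# remove_output_dir (hypothetical name): recursively deletes an HDFS
# directory so that the next job run can create it again.
remove_output_dir() {
  "$HADOOP" fs -rmr "$1"
}

# Typical use before rerunning the job. Left commented out because the
# delete is permanent:
# list_home_dir
# remove_output_dir Out
```

Note that -rmr deletes the directory inside HDFS only; it does not touch files on the Windows drive.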
Burcu Said,
January 18, 2012 @ 1:24 am
Hi,
I run the code and it works. I use this command to see the output file.
“bin/hadoop fs -cat Out2/*”
The input files are shown and, after the end of the files, it states “cat: Source must be a file.”
What is wrong here?
Thanks,
———-
(…)
294366 49. HADOOP-96. Logging improvements. Log files are now separate from
294437 standard output and standard error files. Logs are now rolled.
294505 Logging of all DFS state changes can be enabled, to facilitate
294572 debugging. (Hairong Kuang via cutting)
294616
294617
294618 Release 0.1.1 – 2006-04-08
294645
294646 1. Added CHANGES.txt, logging all significant changes to Hadoop. (cutt ing)
294723
294724 2. Fix MapReduceBase.close() to throw IOException, as declared in the
294795 Closeable interface. This permits subclasses which override this
294865 method to throw that exception. (cutting)
294911
294912 3. Fix HADOOP-117. Pathnames were mistakenly transposed in
294973 JobConf.getLocalFile() causing many mapred temporary files to not
295043 be removed. (Raghavendra Prabhu via cutting)
295093
295095 4. Fix HADOOP-116. Clean up job submission files when jobs complete.
295165 (cutting)
295179
295180 5. Fix HADOOP-125. Fix handling of absolute paths on Windows (cutting)
295252
295253 Release 0.1.0 – 2006-04-01
295280
295281 1. The first release of Hadoop.
295314
cat: Source must be a file.
vlad Said,
January 18, 2012 @ 6:23 am
You probably have some non-file entity (such as a subdirectory) next to the text files you are cat-ing.
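One way to avoid the message is to cat only the result files instead of everything in the directory. The sketch below assumes the job wrote its results using the usual part-NNNNN file names and that the non-file entry is a subdirectory such as a log folder; both are assumptions, so check with ‘hadoop fs -ls Out2’ first. The function name is hypothetical.

```shell
# HADOOP is the launcher script. Override it if Hadoop lives elsewhere.
HADOOP="${HADOOP:-bin/hadoop}"

# cat_part_files (hypothetical name): prints only the part-* files of an
# HDFS output directory, skipping any subdirectories next to them.
cat_part_files() {
  # The pattern is quoted so that the local shell passes it through
  # unchanged and Hadoop expands it against HDFS.
  "$HADOOP" fs -cat "$1/part-*"
}

# Typical use:
# cat_part_files Out2
```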
Krishna Kishore Vangavolu Said,
January 25, 2012 @ 1:07 am
Hi, here is the fix for this issue:
java.io.FileNotFoundException: File C:/tmp/hadoop-kvangavolu/mapred/local/taskTracker/jobcache/job_201201251217_0003/attempt_201201251217_0003_m_000006_3/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
Fix :
1. Put the following chunk in hadoop-site.xml:
<property>
<name>mapred.child.tmp</name>
<value>C:\tmp</value>
</property>
2. Also change the property value of ‘mapred.child.tmp’ to C:\tmp in Edit Hadoop Location -> Advanced parameters of eclipse.
This should fix the FileNotFound issues
Sandeep Said,
January 27, 2012 @ 10:14 pm
Vlad,
With your tutorial I was able to set up Hadoop, but I failed to run the ‘TestDriver’ example. I have been searching the web for a solution for over a week, with no luck. Can you please help me with the error below?
>>Err>>>>>>>>>>>>>>>
Meta VERSION=”1″ .
Job JOBID=”job_201201280957_0001″ JOBNAME=”Hadoop Test_TestDriver\.java-2996741353767810850\.jar” USER=”Administrator” SUBMIT_TIME=”1327725089199″ JOBCONF=”hdfs://localhost:9100/tmp/hadoop-Administrator/mapred/system/job_201201280957_0001/job\.xml” .
Job JOBID=”job_201201280957_0001″ JOB_PRIORITY=”NORMAL” .
Job JOBID=”job_201201280957_0001″ LAUNCH_TIME=”1327725089918″ TOTAL_MAPS=”5″ TOTAL_REDUCES=”1″ JOB_STATUS=”PREP” .
Task TASKID=”task_201201280957_0001_m_000006″ TASK_TYPE=”SETUP” START_TIME=”1327725092793″ SPLITS=”" .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_0″ START_TIME=”1327725093339″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_0″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725097996″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_1″ START_TIME=”1327725098089″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_1″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725101949″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_2″ START_TIME=”1327725102011″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_2″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725105855″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_3″ START_TIME=”1327725105918″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_3″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725109746″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
Task TASKID=”task_201201280957_0001_m_000006″ TASK_TYPE=”SETUP” TASK_STATUS=”FAILED” FINISH_TIME=”1327725109746″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” TASK_ATTEMPT_ID=”" .
Task TASKID=”task_201201280957_0001_m_000005″ TASK_TYPE=”CLEANUP” START_TIME=”1327725109746″ SPLITS=”" .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_0″ START_TIME=”1327725109793″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_0″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725113668″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_1″ START_TIME=”1327725113730″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_1″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725117543″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_2″ START_TIME=”1327725117605″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_2″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725122027″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_3″ START_TIME=”1327725122058″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_3″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725125949″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” .
Task TASKID=”task_201201280957_0001_m_000005″ TASK_TYPE=”CLEANUP” TASK_STATUS=”FAILED” FINISH_TIME=”1327725125949″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
” TASK_ATTEMPT_ID=”" .
Job JOBID=”job_201201280957_0001″ FINISH_TIME=”1327725125996″ JOB_STATUS=”FAILED” FINISHED_MAPS=”0″ FINISHED_REDUCES=”0″ .
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
At http://localhost:50060/logs/userlogs/attempt_201201261350_0001_m_000005_0/syslog, I see the following log:
2012-01-26 14:10:33,259 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.FileNotFoundException: File C:/tmp/hadoop-Administrator/mapred/local/taskTracker/jobcache/job_201201261350_0001/attempt_201201261350_0001_m_000005_0/work/tmp does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:527)
at org.apache.hadoop.mapred.Child.main(Child.java:143)
2012-01-26 14:10:33,275 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
Thanks!
Sandeep.
Antony Said,
January 30, 2012 @ 9:19 pm
Hello Vlad,
Thanks for your tutorial!!
I am running into issues when I try to set up the Hadoop plugin on eclipse.
The error is: problem opening perspective ‘org.apache.hadoop.eclipse.Perspective’.
My JDK is jdk1.6.0_25
Antony Said,
January 31, 2012 @ 12:44 pm
Nevermind, my path was set to JDK 1.5
Alex Said,
February 4, 2012 @ 8:23 pm
Thanx man, your tutorial is more than perfect.
It is not easy to find tutorials like this.
Everything worked for me, I tried to run WordCount example.
Keep up the fantastic work!
Alfan Said,
February 16, 2012 @ 7:29 am
First, thank you for such a nice and great tutorial.
I’ve followed your instructions step by step, and it was a success until the 10th page. The problem happened when I was trying to follow your instructions on the 11th page.
After step number 6 (page 11), the Project Explorer in my Eclipse doesn’t show the HDFS structure.
It showed:
DFS Locations
—> localhost
——-> (1)
————> (node: null)
Please kindly reply to my message.
Thank you very much for your attention,
usesay Said,
February 16, 2012 @ 8:49 pm
hi Vlad,
Many thanks for the tutorial.
I have Win 7 64-bit, Hadoop0.203.00 and Eclipse 3.7 and Plug-in for hadoop-0.203.jar
I had an issue with my Eclipse like this:
An internal error occurred during: “Connecting to DFS VMware server”. org/apache/commons/configuration/Configuration
I appreciate your help.
regards,
Unisa
vlad Said,
February 17, 2012 @ 1:47 pm
Hi,
The tutorial is written for Hadoop 18 and Eclipse 3.4. The Eclipse plugin does not work in newer versions of Eclipse.
vlad Said,
February 17, 2012 @ 1:48 pm
Check your Eclipse version. The plugin supplied with Hadoop is for Eclipse Galileo; it does not work with newer versions. The symptoms are exactly as you described above.
usesay Said,
February 18, 2012 @ 1:26 am
Hi Vlad,
Many thanks. In fact I crashed my laptop, so I had to run everything afresh. In doing so, I got into trouble again with Cygwin. I got this at the command prompt:
unisa@unisa-PC ~/hadoop-0.20.203.0
$ bin/hadoop namenode -format
bin/hadoop: line 297: c:/Program: No such file or directory
12/02/18 07:37:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = unisa-PC/192.168.163.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by ‘oom’ on Wed May 4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in \tmp\hadoop-unisa\dfs\name ? (Y or N) y
Format aborted in \tmp\hadoop-unisa\dfs\name
12/02/18 07:37:58 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at unisa-PC/192.168.163.1
Why line 297?
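A message like `c:/Program: No such file or directory` usually points to a path containing a space (for example a JDK installed under "Program Files") being expanded without quotes, so the shell cuts it at the space and tries to run `c:/Program`. The sketch below only illustrates that word splitting; the install path and the two function names are hypothetical, and it assumes bash or another POSIX shell.

```shell
# Hypothetical JDK location that contains a space.
JAVA_HOME_EXAMPLE="c:/Program Files/Java/jdk1.6.0_25"

# Unquoted expansion: the shell splits the value at the space,
# so only the first piece is treated as the command or path.
first_word_unquoted() {
    set -- $1
    printf '%s\n' "$1"
}

# Quoted expansion: the value stays in one piece.
first_word_quoted() {
    set -- "$1"
    printf '%s\n' "$1"
}

first_word_unquoted "$JAVA_HOME_EXAMPLE"   # prints c:/Program
first_word_quoted "$JAVA_HOME_EXAMPLE"     # prints c:/Program Files/Java/jdk1.6.0_25
```

A common workaround, offered here as a suggestion rather than a confirmed fix for this setup, is to point JAVA_HOME in conf/hadoop-env.sh at a location without spaces. Separately, the "Format aborted" line follows a lowercase `y`; this Hadoop release appears to accept only an uppercase `Y` at that prompt, so that may be worth retrying too.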
Vijay jadhav Said,
February 20, 2012 @ 4:21 am
Now I am facing this error. All the setup is done correctly, but at the last step, when I execute the job on Hadoop, it gives me this error… Please help me.
12/02/20 16:33:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/02/20 16:33:26 INFO mapred.FileInputFormat: Total input paths to process : 4
12/02/20 16:33:31 INFO mapred.JobClient: Running job: job_201202201507_0009
12/02/20 16:33:32 INFO mapred.JobClient: map 0% reduce 0%
12/02/20 16:33:48 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
12/02/20 16:33:56 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
12/02/20 16:34:04 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
12/02/20 16:34:20 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
12/02/20 16:34:27 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_1, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
12/02/20 16:34:36 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_2, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at TestDriver.main(TestDriver.java:40)
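"Task process exit with nonzero status of 1" only says that the child task process died; the real cause is normally in the per-attempt task logs rather than in the JobClient output. As a rough sketch, assuming the JobClient output has been saved or piped in, the failed attempt IDs can be pulled out like this (`failed_attempts` is a hypothetical helper name):

```shell
# Read JobClient output on stdin and print the ID of every attempt
# reported as FAILED, one per line.
failed_attempts() {
    sed -n 's/.*Task Id : \(attempt_[0-9_a-z]*\), Status : FAILED.*/\1/p'
}

# Example with one line in the format shown in the log above.
printf '%s\n' \
  '12/02/20 16:33:48 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_0, Status : FAILED' \
  'java.io.IOException: Task process exit with nonzero status of 1.' \
  | failed_attempts
```

With the IDs in hand, the stderr and syslog files for each attempt are the next place to look; in Hadoop releases of this generation they are typically kept under logs/userlogs/ in a directory named after the attempt ID, though the exact location depends on the installation.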