Hadoop tutorial for Windows and Eclipse.

Just posted a tutorial on how to configure a Hadoop environment for Windows using Cygwin. The tutorial explains how to set up a Hadoop cluster in pseudo-distributed mode and how to get it working with Eclipse.

If you have any questions, comments, or suggestions about this tutorial, post them here.

The tutorial is located here.

341 Comments

  1. Ben Said,

    March 29, 2009 @ 10:19 am

    Thanks for your excellent tutorial! I followed it this weekend and was able to get mostly up and running.

    One question I had was how to use it with EC2 — I set it up on EC2 rather than on localhost, and I’m wondering what I need to do in order to make it run… I’m getting weird unknown-host errors when I run, despite having set up a proxy server.

    Thanks for the very helpful tutorial!
    Ben

  2. vlad Said,

    March 29, 2009 @ 11:14 am

    No problem.

    Setting up Hadoop on EC2 can be tricky. I am going to post another tutorial about it in a few weeks.

  3. Rez Said,

    March 31, 2009 @ 5:17 pm

    Hey, this page in your tutorial (Unpacking Hadoop)

    http://v-lad.org/Tutorials/Hadoop/09%20-%20unpack%20hadoop.html

    is not working.

  4. vlad Said,

    April 9, 2009 @ 8:30 am

    Strange. It works for me; I can’t see what the problem is. Does anybody else have this problem?

  5. Jeff Said,

    April 9, 2009 @ 2:32 pm

    Thanks for the tutorial… it would have saved me a few hours of frustration.

    Have you tried it with other versions of Eclipse? The main distribution is 3.4 (Ganymede), which will shortly become 3.5 in May.

  6. vlad Said,

    April 9, 2009 @ 10:09 pm

    Jeff,

    I tried with other versions of Eclipse. It doesn’t work with 3.4, and it probably won’t work with 3.5 until somebody fixes the Hadoop plug-in, because the plug-in API has changed in the new versions of Eclipse. You can use the plug-in with 3.4 to browse HDFS, but you won’t be able to start the project.

  7. Joseph Said,

    April 15, 2009 @ 12:09 am

    Vlad,

    Thanks for the well-documented tutorial; it is good work.

    Towards the last step I got the following error:
    09/04/15 15:00:33 INFO mapred.JobClient: Task Id : attempt_200904151224_0004_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    Kindly advise; any clue would help.

    my code is as follows:
    // TODO: specify input and output DIRECTORIES (not files)
    //conf.setInputPath(new Path("src"));
    //conf.setOutputPath(new Path("out"));

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path("In"));
    FileOutputFormat.setOutputPath(conf, new Path("Out3"));

    thanks and regards
    Joseph

  8. vlad Said,

    April 15, 2009 @ 12:00 pm

    The error you are getting is actually expected. The mappers and reducers generated by the plug-in need some tweaking. I will post another tutorial about this sometime in May.

  9. ash Said,

    April 16, 2009 @ 11:40 pm

    Hi Vlad,

    Thanks for the excellent tutorial. In the last step, when I try to run the TestDriver class, I get this error.

    Please help…

    >>>>>>>>> START >>>>>>>

    09/04/17 11:58:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    09/04/17 11:58:40 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

    at org.apache.hadoop.ipc.Client.call(Client.java:697)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

    09/04/17 11:58:40 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar retries left 4
    [the same RemoteException and stack trace repeat for retries left 3, 2, and 1]
    09/04/17 11:58:46 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
    09/04/17 11:58:46 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
    09/04/17 11:58:46 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar" - Aborting...
    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-ashwath_kannan/mapred/system/job_200904171117_0002/job.jar could only be replicated to 0 nodes, instead of 1
    [same stack trace as above]

    >>>>>> END >>>>>

  10. vlad Said,

    April 17, 2009 @ 7:24 am

    I have seen this error before. Usually it is caused by not having enough free space on your workstation. Try to free up some space and recreate HDFS. Also check for error messages in the DataNode and NameNode windows.
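
    To recreate HDFS, stop the Hadoop processes and re-run the format command from the Hadoop directory; note that this erases everything stored in HDFS:

    bin/hadoop namenode -format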

  11. tony Said,

    April 19, 2009 @ 12:18 am

    Hi, I followed what you showed and also the quick start on the official Apache website.
    There is a big problem when I execute the command
    “bin/hadoop namenode -format”:
    it shows that bin/hadoop, the “hadoop” script, contains certain errors.
    When I installed it on Linux in a VM, it was OK.
    How can I run the hadoop script in Cygwin correctly?
    thanks

  12. vlad Said,

    April 20, 2009 @ 2:59 pm

    Could you post the error message that you are getting?

  13. Wen-Han Said,

    April 22, 2009 @ 1:26 pm

    Hi VLAD,

    May I know how recent your tutorial is? Is it updated for the most recent versions of Hadoop and Eclipse?

    Thank you,

    Wen-Han

  14. vlad Said,

    April 22, 2009 @ 4:16 pm

    The tutorial was written in April using the most recent version of Hadoop, 0.19.1. As for Eclipse, the newest version of Eclipse (Ganymede) is not compatible with the Hadoop plug-in that is supplied with version 0.19.1, so you have to use the previous version of Eclipse (Europa).

    I saw that the new version of Hadoop, 0.20, came out, so I will take a look at what has changed and update the tutorial if needed.

  15. Saurabh Said,

    April 23, 2009 @ 5:59 am

    Hi vlad, the tutorial is good.
    I am setting it up on my Mandriva machine, and whenever I run

    ssh localhost

    I get:

    [abc@localhost .ssh]$ ssh localhost
    ssh: connect to host localhost port 22: Connection refused

    Please help me.

  16. vlad Said,

    April 23, 2009 @ 6:23 am

    Hmm,

    This tutorial is written for Windows machines. To resolve your problem, check that you have sshd installed and running. Also check that you don’t have a firewall blocking port 22.

  17. Sid Said,

    April 25, 2009 @ 12:38 pm

    Hi, I am working with Hadoop and Eclipse on Linux. Everything was working fine until one day Hadoop started to ignore any code changes I made in my project; instead it just ran an old copy of the code from somewhere. Looking at the mapred.local folder, where the temporary source files are jarred together to run the job, the source code was indeed changed. I created another dummy project in Eclipse and ran it, and it ran just fine; changes were reflected every time. What could be the problem?

  18. vlad Said,

    April 25, 2009 @ 6:20 pm

    Sorry, I have never seen that happen. Maybe somebody else on this board will comment.

  19. Joe Said,

    May 1, 2009 @ 4:44 am

    Vlad,
    Thank you so much for this tutorial. I am having a problem when running: bin/hadoop namenode -format

    First it said “JAVA_HOME not set”, so I set my Windows environment variable to the correct path, which is c:\program files\Java\jdk1.6.0_06

    Then I closed and re-opened Cygwin and tried again. This time it appeared to work, but the first line of the output was “bin/hadoop: line 234: C:\Program: command not found”. The rest of the output looked like your screenshot. Is this normal?

    Thanks,
    Joe

  20. Wen-Han Said,

    May 1, 2009 @ 11:12 am

    Hi vlad,

    Thanks for your reply to my last one. I configured Eclipse Europa according to the Yahoo tutorial on Hadoop:
    http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html

    The instructions describe creating a new DFS Location:
    “…..Next, click on the “Advanced” tab. There are two settings here which must be changed.

    Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user.”

    I can’t find this attribute (hadoop.job.ugi) in the Advanced list in “Define Hadoop location” in Eclipse. Do you have any idea?

    Thank you; a fast reply will be much appreciated.

    Wen-Han

  21. Wen-Han Said,

    May 1, 2009 @ 11:15 am

    P.S. The Yahoo tutorial on Hadoop has Hadoop installed on VMware, not on localhost via Cygwin.

    Thanks,

  22. sneha Said,

    May 2, 2009 @ 8:52 am

    Hello!

    Thank you for the good Hadoop tutorial. I am setting up a Hadoop cluster of 4 systems. When I run the bin/start-dfs.sh command I get an error: JAVA_HOME not set. Can you please let me know the solution, and also how to set the Java home path in .bash_profile at the Cygwin prompt?
    Thank you!

  23. Muhammad Mudassar Said,

    May 5, 2009 @ 11:36 pm

    Hi,
    The tutorial is a helpful one. I want to know how to upload images or other structured data to HDFS using Cygwin and Eclipse on Windows.
    One more thing: after a restart of my PC, Hadoop was not working well, but once I restarted the Cygwin sshd service it worked again. Does the sshd service have to be restarted every time the PC is restarted?

    Thanks.

  24. vlad Said,

    May 8, 2009 @ 7:48 am

    First you have to ask yourself a question: what are you planning to do with your data? Depending on the answer, you could use the HDFS copy commands or use HBase.

    Note that if you are planning to use binary data you might have to write your own record readers.
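
    For plain files, the HDFS shell copy looks roughly like this (the paths here are only examples):

    bin/hadoop fs -put /cygdrive/c/data/images /user/hadoop/images
    bin/hadoop fs -ls /user/hadoop/images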

  25. vlad Said,

    May 8, 2009 @ 7:51 am

    As for your second comment. Make sure that in the Services window your sshd service is set to start automatically.

  26. vlad Said,

    May 8, 2009 @ 8:01 am

    The bin/start-dfs.sh script won’t work in the environment described in this tutorial; to start the DFS services, refer to section 10 of the tutorial. On the additional machines you have to start only the datanode and tasktracker processes.

    Remember that on the worker machines you have to edit the hadoop-site.xml file to configure the name of your namenode machine instead of localhost, as in the sketch below. Also make sure all necessary firewall ports are open.
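
    For example, a minimal hadoop-site.xml sketch for a worker machine (the hostname master is a placeholder, and the ports must match the ones you configured for localhost in the tutorial):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9100</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9101</value>
      </property>
    </configuration>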

  27. vlad Said,

    May 8, 2009 @ 8:02 am

    That’s right. But this way you will incur the penalty of running another operating system, and it is tricky to debug processes in VMware.

  28. vlad Said,

    May 8, 2009 @ 8:04 am

    Not sure what could be causing this. Check the dates on the files.

  29. vlad Said,

    May 8, 2009 @ 8:05 am

    It’s a problem with the scripts. Try installing your JDK in a directory whose path doesn’t contain a space. I use C:\Java\JDK1.6 for that.
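
    Alternatively, you can point conf/hadoop-env.sh at a space-free JDK path directly; a minimal sketch, assuming the JDK lives at C:\Java\jdk1.6 (adjust the path to your installation):

    # conf/hadoop-env.sh
    export JAVA_HOME=/cygdrive/c/Java/jdk1.6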

  30. Kim Said,

    May 13, 2009 @ 2:55 pm

    This tutorial is great. Hadoop is running perfectly in a VM (Windows XP).
    Just one question:
    Is there any way I can use “start-all.sh” instead of initiating “hadoop namenode”, “hadoop jobtracker”, etc. in multiple Cygwin windows?

    Thank you again, for your all efforts.

  31. vlad Said,

    May 13, 2009 @ 9:23 pm

    Not on Windows XP. The Hadoop start scripts are written for Linux machines, and for debugging purposes it is just easier to run each of the Hadoop components in its own window.
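
    For reference, the tutorial’s approach is one daemon per Cygwin window, started from the Hadoop directory roughly like this:

    bin/hadoop namenode
    bin/hadoop secondarynamenode
    bin/hadoop jobtracker
    bin/hadoop datanode
    bin/hadoop tasktracker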

  32. Mayank Said,

    May 21, 2009 @ 4:35 am

    Hi vlad, the tutorial is great.
    Currently I am facing a problem at the upload data step: in my Eclipse I get localhost->2->error, and I am unable to see the user and “In” folders and so on. Please suggest what I should do now.

  33. Charitha Said,

    May 28, 2009 @ 2:12 am

    I get an error in Eclipse Europa while running TestDriver.java….

    Please advise me; help will be appreciated.

    09/05/28 14:40:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9100/user/charitha/Out already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at TestDriver.main(TestDriver.java:41)

    Regards,
    Charitha Reddy.

  34. vlad Said,

    May 28, 2009 @ 8:27 am

    Looks like this is the second time you are trying to run the project. Every time you run the project it creates an “Out” directory to store the output. You have to delete that directory before you run your project, or change the code to create a new directory every time you run. Look at the Hadoop examples to see how to do the latter.
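
    For the first option done programmatically, here is a minimal sketch (assuming conf is your JobConf and the output directory is “Out”; it needs imports for org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path):

    // Delete the stale output directory, if any, before submitting the job.
    Path outPath = new Path("Out");
    FileSystem fs = FileSystem.get(conf);
    if (fs.exists(outPath)) {
        fs.delete(outPath, true); // true = delete recursively
    }
    FileOutputFormat.setOutputPath(conf, outPath);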

  35. vlad Said,

    May 28, 2009 @ 8:36 am

    Do you see any activity in the Cygwin windows when you are trying to connect? It could be the firewall blocking incoming ports.
    Run the following command from the command window and let me know what you get; note that Hadoop has to be started:

    telnet localhost 9100

  36. Joseph Said,

    May 28, 2009 @ 8:59 pm

    Vlad

    I would like to know whether you have an update on the following:
    >>snip>>
    The error you are getting is actually expected. The mappers and reducers generated by the plug-in need some tweaking. I will post another tutorial about this sometime in May.
    vlad – April 15th, 2009 at 12:00 pm
    >>end of snip>>

  37. vlad Said,

    May 28, 2009 @ 9:22 pm

    Sorry, been really busy lately.

  38. Martinus Said,

    June 6, 2009 @ 7:48 am

    Hello Vlad,

    Thanks for the tutorial. I still have a problem with the TestDriver class: when I run it, I get this error message in Eclipse:

    09/06/06 16:44:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    09/06/06 16:44:03 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/06/06 16:44:04 INFO mapred.JobClient: Running job: job_200906061639_0001
    09/06/06 16:44:05 INFO mapred.JobClient: map 0% reduce 0%
    09/06/06 16:44:14 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_0, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    09/06/06 16:44:18 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_1, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    09/06/06 16:44:22 INFO mapred.JobClient: Task Id : attempt_200906061639_0001_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

    I have no idea why; I am using all the programs you listed in the tutorial (Eclipse 3.3.2, Hadoop 0.19.1, etc.).

    Thanks

    Martinus

  39. Carspar Said,

    June 9, 2009 @ 12:49 am

    Hi vlad, the tutorial is great.

    I followed your tutorial and met a problem in step 11 – Setup Hadoop Location in Eclipse.

    At step 6: “In the Project Explorer tab on the left hand side of the Eclipse window, find the DFS Locations item. Open it using the ‘+’ icon on its left. Inside, you should see the localhost location reference with the blue elephant icon. Keep opening the items below it until you see something like the image below.”

    I used the “+” icon on the left. Inside, there is a folder with an empty name, like in your image. But when I keep opening, the following item is not “tmp(1)” but “Error: null”.

    thanks,

  40. Carspar Said,

    June 9, 2009 @ 1:41 am

    I solved the problem. It was because I had not set the Cygwin environment variables correctly.

    Thanks,

  41. kerenann Said,

    July 22, 2009 @ 12:37 am

    Hello vlad, your tutorial is very helpful.
    There is only one problem, in step 11 (Setup Hadoop Location in Eclipse).
    At step 6, in the Project Explorer tab on the left side of the Eclipse window, I found the DFS location and clicked the “+” icon. There is a folder named (1). When I keep opening, the following item is not “tmp(1)” but “Error: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information”.
    I think my Cygwin environment variables are right, so I don’t know what’s wrong.
    thanks

  42. Wylie van den Akker Said,

    July 27, 2009 @ 11:07 am

    Just thought I would mention that for hadoop-0.20.0+ under Cygwin you also need to install rsync (under the “Net” category) for filesystem replication to work. Additionally, the XML configuration is split up into 3 different files. Details on that can be found here: http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html

    Cheers,
    Wylie
    Collective Medical Technologies
    http://www.collectivemedicaltech.com

  43. vlad Said,

    August 5, 2009 @ 6:05 am

    Check if your cluster is running [no error messages in the command windows]. Also check whether you have a firewall installed that might be preventing the connections.

  44. Arun Jamwal Said,

    August 7, 2009 @ 4:55 pm

    To get rid of:
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    change the following lines in TestDriver.java:

    //conf.setOutputKeyClass(Text.class);
    //conf.setOutputValueClass(IntWritable.class);
    conf.setOutputKeyClass(LongWritable.class); // match the LongWritable key the default mapper passes through
    conf.setOutputValueClass(Text.class); // the line text becomes the value

    HTH,
    Arun Jamwal

  45. richilee Said,

    August 22, 2009 @ 1:08 pm

    For those who have the “bin/hadoop: line 234: C:\Program: command not found” problem: this is caused by the whitespace in “Program Files”. In other words, if your JAVA_HOME is “c:\Program Files\java”, there is a whitespace between “Program” and “Files”. One way to solve the problem is to put your JDK in a different folder. I put my JDK in c:\java\jdk and then everything works pretty well. Hope it helps.

  46. Charanjeet Said,

    September 16, 2009 @ 3:49 am

    Hi All,

    I was using the article for installing Hadoop.

    While running the command
    $ bin/hadoop namenode -format

    I found that there were errors because the installed JDK was in ‘C:\Program Files’ and the command was referring to it through the environment variable JAVA_HOME; since there is a space between ‘Program’ and ‘Files’ it was dying.

    I resolved it by creating a symbolic link:

    $ ln -s "/cygdrive/c/Program Files/java/jdk1.6.0_02" /java

    inside the ‘/’ folder through Cygwin, and made an entry in <>/conf/hadoop-env.sh like

    ‘export JAVA_HOME=/java’

    Regards
    Charanjeet singh
    Senior Engineer
    Impetus infotech India Pvt. Ltd.

  47. Ken Church Said,

    September 20, 2009 @ 1:46 pm

    Extremely useful. I’m thinking of pointing a bunch of students at this. One detail: the tutorial has some stale links to hadoop-0.19.1 (as well as a number of references to it elsewhere in the text). It would be good to write the tutorial in such a way that the text doesn’t need to be updated with each new version.

  48. Deng Wanyu Said,

    September 30, 2009 @ 1:17 am

    Hi,
    The tutorial is very helpful for me!
    My problem is: I uploaded a txt file by command, but I found the uploaded file is empty. Why?

  49. Azuryy Said,

    October 14, 2009 @ 6:50 am

    If I don’t open five separate Cygwin windows and instead run start-all.sh, I get a “Could not obtain block” error.

    But when I open five separate Cygwin windows as said in the tutorial, it does work.

  50. Azuryy Said,

    October 14, 2009 @ 7:06 pm

    My finding:

    If you want to run start-all.sh instead of opening five separate Cygwin windows as this tutorial says, please run
    hadoop fs -put before you run start-all.sh; if not, you will get a “Could not obtain block” error when you run your job.

  51. sam Said,

    October 22, 2009 @ 4:08 pm

    I get this error when I open the Map/Reduce perspective in Eclipse, and I don’t see the file under localhost->1 in DFS Locations. The errors below appeared in the NameNode window:

    09/10/22 15:58:08 INFO ipc.Server: IPC Server handler 4 on 9100, call getProtocolVersion(org.apache.hadoop.dfs.ClientProtocol, 35) from 127.0.0.1:3282: error: java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
    java.io.IOException: Unknown protocol to name node: org.apache.hadoop.dfs.ClientProtocol
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getProtocolVersion(NameNode.java:98)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
    [the same error and stack trace repeat]

  52. Ravi Said,

    October 25, 2009 @ 10:44 am

    Hi there, your tutorial is excellent. Very good job, and I don’t say that often.

    So I was trying to set up HBase using your Hadoop tutorial. I was able to follow up to step 12, but when I try to execute

    $ bin/hbase namenode -format
    : No such file or directory
    bin/hbase: line 45: $'\r': command not found

    Can you tell me what I am missing?
    Thanks

  53. Ravi Said,

    October 25, 2009 @ 12:10 pm

    Well, after a few internet searches and one hour later, I am able to execute it, but now I get this error:
    $ bin/hbase namenode -format
    Exception in thread "main" java.lang.NoClassDefFoundError: namenode

  54. Sharad Said,

    October 29, 2009 @ 4:14 am

    Is there an elegant way to stop DFS? Stopping with Ctrl-C seems to corrupt it, and bin/stop-dfs.sh doesn’t seem to work (I get an error message like: localhost: cat: cannot open file /dev/fs/C/tmp/hadoop-sk-secondarynamenode.pid : No such file or directory).

    Thanks!

  55. vlad Said,

    October 29, 2009 @ 8:30 am

    It should be bin/hadoop, not bin/hbase.

  56. vlad Said,

    October 29, 2009 @ 8:31 am

    Not sure; I have never had a problem with corruption.

  57. steve Said,

    November 2, 2009 @ 12:01 pm

    Great tutorial!
    I’ve almost got this working, but I’m having trouble connecting to localhost with ssh.
    If I do:
    ssh localhost -v
    the last two lines are:
    Offering public key: /home/user.name/.ssh/id_rsa
    Connection closed by xxx.x.x.x

    Any ideas what is going on?
    I also had to manually add ssh_server to administrators and change the password in order to get the sshd service to run.

    -Steve

  58. RezaMor Said,

    November 9, 2009 @ 8:05 pm

    Thanks for your excellent tutorial! However, in the last step I got the following error, and I noticed that two others posted the same error in the comments.
    Would you please answer me?

    09/11/10 12:53:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    09/11/10 12:53:01 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/11/10 12:53:02 INFO mapred.JobClient: Running job: job_200911101209_0003
    09/11/10 12:53:03 INFO mapred.JobClient: map 0% reduce 0%
    09/11/10 12:53:13 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_0, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    09/11/10 12:53:17 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_1, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    09/11/10 12:53:22 INFO mapred.JobClient: Task Id : attempt_200911101209_0003_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:41)

  59. vlad Said,

    November 18, 2009 @ 12:38 pm

    The reason you are getting this error is that the API has changed since Hadoop version 0.17, and the code generated by Eclipse needs some tweaking.

  60. Rill Said,

    November 24, 2009 @ 7:58 pm

    I got a problem in eclipse plugin.

    —————————————————-
    Cannot connect to the Map/Reduce location:localhost.
    Failed to get the current user’s information.
    —————————————————-

    The user on my Windows machine needs a password to log in.

    Please help me; thank you!

  61. Jason Venner Said,

    January 1, 2010 @ 10:18 am

    The prohadoop website has a lot of information on Hadoop and Hadoop setup, as well as a good community of people to ask and answer questions with.

    This particular error, java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable,

    happens because the input format for your job is TextInputFormat rather than KeyValueTextInputFormat.
    TextInputFormat provides a LongWritable as the key, which is the byte offset of the line in the input, and a Text as the value, which is the input line data.

    KeyValueTextInputFormat provides a Text key, the portion of the input line up to the first TAB character, and a Text value, the portion of the input line after the first TAB character.

    Alternatively, you can modify the definition of your Map class to accept a LongWritable as the input key type, as in the sketch below.
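
    A minimal sketch of that second option against the 0.19-era mapred API (the class name LineMapper and the empty value are illustrative choices, not from the tutorial):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Accepts the LongWritable byte-offset key that TextInputFormat produces
    // and re-keys each record by the text of the line itself.
    public class LineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(line, new Text("")); // emit Text key, empty Text value
        }
    }

    Register it with conf.setMapperClass(LineMapper.class), and make sure conf.setOutputKeyClass(Text.class) and conf.setOutputValueClass(Text.class) match what the mapper emits.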

  62. Swetha Said,

    January 4, 2010 @ 1:17 am

    Hello!
    When I run the code I get the error below. I understand there has been some change in the path where the job cache files are created, but I don’t know how to change it. Any clue?
    Thanks in advance.

    INFO mapred.JobClient: Task Id : attempt_201001041128_0006_m_000006_1, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-MBS/mapred/local/taskTracker/jobcache/job_201001041128_0006/attempt_201001041128_0006_m_000006_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)

  63. Abhishek Said,

    February 27, 2010 @ 2:15 pm

    Hi,
    In step 5, when I type the explorer command as shown in the tutorial, I get an error:

    abc@ZULFI ~
    $ explorer
    -bash: explorer: command not found

    Anybody have any ideas?

  64. vlad Said,

    February 27, 2010 @ 5:25 pm

    Hi,

    What is your system? Is it Windows XP?
    Also, type this command:

    echo $PATH

    and post the results here

  65. vlad Said,

    February 27, 2010 @ 9:06 pm

    Abhishek,

    Either your system is something non-standard or your PATH variable is not set up right. Type this command in the Cygwin window and post the output here:

    echo $PATH

    Vlad

  66. Keith Said,

    March 2, 2010 @ 2:32 pm

    Everything works great, except…
    The Run As menu offers “On Hadoop”, but the Debug As menu does not. Obviously, the Run As option doesn’t trigger breakpoints or otherwise offer debugging capability.

    So, how do I debug?

    Thanks.

  67. Iris Said,

    March 3, 2010 @ 10:39 am

    vlad,
    Thank you for the excellent tutorial.
    I have a problem in the last step: after running the code, it shows the error below:

    10/03/04 01:06:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/03/04 01:06:01 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/03/04 01:06:01 INFO mapred.JobClient: Running job: job_201003040054_0001
    10/03/04 01:06:02 INFO mapred.JobClient: map 0% reduce 0%
    10/03/04 01:06:11 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 01:06:15 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 01:06:20 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 01:06:29 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 01:06:33 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 01:06:37 INFO mapred.JobClient: Task Id : attempt_201003040054_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:43)

    I have no idea about what’s wrong with it.
    Please help me!
    Thank you in advance!

  68. iris Said,

    March 4, 2010 @ 4:06 am

    vlad,
    Thank you for your excellent tutorial; however, I get the following error when running the last step:

    10/03/04 18:59:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/03/04 18:59:58 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/03/04 18:59:59 INFO mapred.JobClient: Running job: job_201003041848_0001
    10/03/04 19:00:00 INFO mapred.JobClient: map 0% reduce 0%
    10/03/04 19:00:14 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 19:00:18 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 19:00:24 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 19:00:33 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 19:00:37 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/04 19:00:43 INFO mapred.JobClient: Task Id : attempt_201003041848_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:44)

    I have no idea about it, please help me!
    Thank you in advance.

  69. song Said,

    March 7, 2010 @ 5:29 am

    Thanks for your excellent tutorial! However, at step 9 (Setup Hadoop Plugin), I followed the instructions, but I cannot find Map/Reduce in “Open Perspective”. Why?

    Thanks!

  70. euqinoxia Said,

    March 19, 2010 @ 1:43 am

    Hi Vlad,

    thanks for the excellent tutorial.

    Towards the last step I got the following error:

    10/03/19 15:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/03/19 15:30:23 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/03/19 15:30:24 INFO mapred.JobClient: Running job: job_201003191529_0002
    10/03/19 15:30:25 INFO mapred.JobClient: map 0% reduce 0%
    10/03/19 15:30:31 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/19 15:30:35 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/19 15:30:39 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/19 15:30:48 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/19 15:30:52 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/03/19 15:30:56 INFO mapred.JobClient: Task Id : attempt_201003191529_0002_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:41)

    thanks and regards
    equinoxia

  71. vlad Said,

    March 19, 2010 @ 5:43 am

    Iris,

    You are getting this error because your map task is failing. Could you post your mapper code here?

    Vlad

  72. chefc17 Said,

    March 20, 2010 @ 4:33 am

    Hi Vlad,

    thanks for the excellent tutorial.
    I am using Eclipse Galileo and I have a problem at “Setup Hadoop Location”:
    in Project Explorer / DFS Locations / localhost
    it’s empty: “(0)”.

  73. rananjay Said,

    March 26, 2010 @ 4:43 am

    Hi,
    thanks for this nice tutorial.
    It is really good work and I have no words to describe your effort.
    I followed every step of this tutorial.
    But later, while running the driver class file, I am getting these errors:

    10/03/26 17:07:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/03/26 17:07:36 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/03/26 17:07:36 INFO mapred.JobClient: Running job: job_201003261700_0002
    10/03/26 17:07:37 INFO mapred.JobClient: map 0% reduce 0%
    10/03/26 17:07:45 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_0, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    10/03/26 17:07:50 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_1, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    10/03/26 17:07:55 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000006_2, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000006_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    10/03/26 17:08:06 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_0, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    10/03/26 17:08:12 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_1, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_1/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    10/03/26 17:08:19 INFO mapred.JobClient: Task Id : attempt_201003261700_0002_m_000005_2, Status : FAILED
    java.io.FileNotFoundException: File C:/tmp/hadoop-RananjayS/mapred/local/taskTracker/jobcache/job_201003261700_0002/attempt_201003261700_0002_m_000005_2/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:520)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at FirstDriver.main(FirstDriver.java:42)

  74. Manish Said,

    April 1, 2010 @ 2:37 am

    Hi Vlad,

    Thanks for such an excellent tutorial on Hadoop configuration on Windows.

    I have followed each step in the tutorial. Every step went fine, but execution of the program is giving me trouble. Following is the problem message on the console:

    10/04/01 15:05:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/04/01 15:05:25 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/04/01 15:05:27 INFO mapred.JobClient: Running job: job_201004011443_0001
    10/04/01 15:05:30 INFO mapred.JobClient: map 0% reduce 0%
    10/04/01 15:05:41 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/04/01 15:05:45 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/04/01 15:05:50 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/04/01 15:05:58 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/04/01 15:06:04 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/04/01 15:06:08 INFO mapred.JobClient: Task Id : attempt_201004011443_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:37)

    Please let me know what could have gone wrong.

    Thanks & Regards

  75. jamo Said,

    April 8, 2010 @ 8:18 am

    I get “almost” all the way through the tutorial using the 0.19.2 version, but when running TestDriver it throws several FileNotFound exceptions, as in Swetha’s Jan 4, 2010 post above. I tried changing the mapred.job.tracker setting to c:/cygwin/tmp and restarting the jobtracker, but this didn’t change the error. Any idea what parameter needs to be changed?
    thx,
    jamo

  76. Vaibhav Said,

    April 13, 2010 @ 4:35 am

    Hi Vlad,

    Thanks for the tutorial. I set up my environment exactly as you specified in the tutorial. However, when I run my project from Eclipse (by selecting the Run on Hadoop option), nothing happens and it fails silently; it doesn’t give any error. What could be the issue?

    Regards,
    Vaibhav

  77. princessayu Said,

    April 13, 2010 @ 4:24 pm

    Hi there

    Nice tutorial… it helped me a lot with my assignment. Can you please tell me where the link to your new tutorial with hadoop-0.20.0 is?

  78. Rim Moussa Said,

    April 19, 2010 @ 3:13 am

    excellent tutorial
    please add the following imports to the last step:
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;

  79. kiwibird Said,

    April 22, 2010 @ 11:35 am

    Hello
    Thank you for your excellent tutorial! However, at the last step, I cannot run the hadoop project. I right-click the TestDriver class and choose “Run on Hadoop”, but nothing happens: no window comes up, no info is shown in the Console. And I just updated my eclipse to the latest version.

    Please help me.

    thanks and regards

  80. xuesf Said,

    May 5, 2010 @ 8:20 pm

    Thanks for your hadoop on windows tutorial.
    I have the same problem some people mentioned above.
    I just copied the code of WordCount.java from hadoop-0.19.1; my eclipse is 3.3.2.
    I hope you can help me.
    Thanks a lot

    10/05/06 11:03:23 INFO mapred.FileInputFormat: Total input paths to process : 4
    10/05/06 11:03:23 INFO mapred.JobClient: Running job: job_201005061033_0003
    10/05/06 11:03:24 INFO mapred.JobClient: map 0% reduce 0%
    10/05/06 11:03:29 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/05/06 11:03:33 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/05/06 11:03:37 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/05/06 11:03:46 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/05/06 11:03:50 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    10/05/06 11:03:55 INFO mapred.JobClient: Task Id : attempt_201005061033_0003_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    Exception in thread “main” java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at WordCount.run(WordCount.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at WordCount.main(WordCount.java:140)

  81. Jim Said,

    May 9, 2010 @ 9:22 am

    I followed all the steps up to Running Hadoop Project. I can not run a Hadoop Project. Once I click “Run as” -> “Run on Hadoop”, nothing happens; there is no output on the Eclipse Console, and I am pretty sure a thread is running in the background.

    I am using Windows Vista, Java 6 (latest version for 32 bit). I started Eclipse from Windows. Everything is running under Cygwin.

    How do I debug a hadoop application in eclipse?

    Jim

  82. Senthil Said,

    May 19, 2010 @ 8:04 am

    Thanks for this tutorial.
    I have a small issue. In TestDriver.java, JobConf is deprecated. I am using Hadoop 0.20.2,

    JobClient client = new JobClient();
    JobConf conf = new JobConf(TestDriver.class);

    // TODO: specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // TODO: specify input and output DIRECTORIES (not files)
    //conf.setInputPath(new Path("src"));
    //conf.setOutputPath(new Path("out"));
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path("In"));
    FileOutputFormat.setOutputPath(conf, new Path("Out"));

    Which one do I need to import to resolve JobConf? I am getting an error like “The setInputFormat in the type JobConf is not applicable for the arguments”. Same thing for setOutputFormat. Kindly help.
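
    (For anyone hitting the same compile errors: with the old JobConf API the format classes have to come from org.apache.hadoop.mapred, not from the new org.apache.hadoop.mapreduce packages, otherwise setInputFormat/setOutputFormat report “not applicable for the arguments”. A minimal sketch, assuming Hadoop 0.20.x and the deprecated mapred API, with the same In/Out paths as above:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class TestDriver {
        public static void main(String[] args) throws Exception {
            // JobConf.setInputFormat expects a subclass of
            // org.apache.hadoop.mapred.InputFormat, so every format class
            // here must come from that same (old) package.
            JobConf conf = new JobConf(TestDriver.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path("In"));
            FileOutputFormat.setOutputPath(conf, new Path("Out"));
            // ...then set your mapper/reducer and submit with JobClient.runJob(conf)
        }
    }

    (The deprecation warning itself is harmless in 0.20.2; the old API still works.)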

  83. Jo Said,

    May 25, 2010 @ 8:43 pm

    Hi,

    I followed your tutorial for the eclipse part and managed to set up the plugin, and I am able to browse/access the dfs directory,
    but I am unable to use the plugin to run jobs on hadoop. Clicking “run on hadoop” does not seem to do anything… (i.e. there is no window to let me choose which hadoop server to use).

    plugin version: 0.20.2
    eclipse version: 3.5.2 galileo
    os: ubuntu 10.04 desktop 64bit

    any thoughts?

  84. Shivam Sharma Said,

    June 1, 2010 @ 1:31 am

    I configured Hadoop on windows + cygwin according to your document. All my nodes and trackers are running fine. When I run the map reduce program from eclipse, it gives me the following exception:

    10/06/01 13:58:59 INFO mapred.JobClient: Running job: job_201006011357_0002
    10/06/01 13:59:00 INFO mapred.JobClient: map 0% reduce 0%
    10/06/01 13:59:09 INFO mapred.JobClient: Task Id : attempt_201006011357_0002_m_000004_0, Status : FAILED
    java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop-ssharma1/mapred/local/taskTracker/jobcache/job_201006011357_0002/attempt_201006011357_0002_m_000004_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)

    I have put the correct entries in all the configuration files.

    It would be a great help if you could assist me in solving this problem.

  85. Arun Said,

    June 3, 2010 @ 12:00 am

    Hi,
    That is a nice tutorial. Is there any update for the latest version, hadoop-0.20.2? The structure is a bit different compared to the older version. What do we need to change in the code for eclipse, etc.?

    Thanks in advance!
    Arun.

  86. vlad Said,

    June 3, 2010 @ 12:03 am

    I am planning to post an updated tutorial soon.

  87. Siddharth prasad Said,

    June 16, 2010 @ 8:54 pm

    Hi

    it seems everything is set up cleanly on windows vista, but when I run a job, a small word count problem,
    I get this in my console:

    10/06/17 09:15:36 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    10/06/17 09:15:37 INFO mapred.FileInputFormat: Total input paths to process : 1
    10/06/17 09:15:37 INFO mapred.JobClient: Running job: job_201006170907_0002
    10/06/17 09:15:38 INFO mapred.JobClient: map 0% reduce 0%
    but from here it just gets stuck. The job state in eclipse says running, but when I go to localhost:50030 hadoop says there is no running job.
    I can’t understand what is going wrong; I would be glad if you could help me with this.

    Thankyou
    Siddharth prasad.

  88. Jony Blues Said,

    June 22, 2010 @ 8:44 pm

    I am working on a standalone server through Putty and I got the namenode and secondarynamenode working without errors. Yet when running the command “hadoop jobtracker”, I have the following errors:

    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at
    org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    For that reason, I can not perform “fs -put” to insert the files into hdfs. I don’t think disk space is a problem because the file is really small (1MB), and I don’t think it is DNS, as I use the direct IP address and I removed the localhost entry to use the specific Namenode IP address. How can I overcome this problem? I have performed namenode -format many times with a delete of hadoop_tmp_dir, but I still see the problem.

    Thanks for your help

  89. vlad Said,

    June 22, 2010 @ 8:50 pm

    Jony,

    Check the space on the datanodes, and make sure they are reachable from the namenode.
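
    (A quick way to check both from the namenode box is the dfsadmin report, which prints the configured and remaining capacity and the state of every datanode:)

    $ bin/hadoop dfsadmin -report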

  90. Bejoy Said,

    June 24, 2010 @ 3:31 am

    Hi,
    I’m new to Hadoop. I found your guide very useful and interactive in helping me with the initial set up. But unfortunately I’m facing a challenge while configuring cygwin for Hadoop development.
    I generated the rsa key, and when I run
    ssh localhost
    it prompts me for
    @localhost’s password
    But I haven’t set any password before. I tried almost all options but none worked. Could you please help me out with this?

  91. Hussain Said,

    July 9, 2010 @ 4:43 am

    Hi Vlad,

    Thank you for the tutorial. I was facing a problem in step 5/6. When I enter the explorer command, my My Documents window opens up (maybe that is the home directory). I pasted the hadoop archive there, and then, as mentioned in step 6, I tried to unpack the archive; it said no such file or directory. The ls command came up empty as well. I ran the command
    echo $PATH and the output was
    /usr/local/bin:/usr/bin:/bin:/cygdrive/g/WINDOWS/system32:/cygdrive/g/WINDOWS:/cygdrive/g/WINDOWS/System32/Wbem:/cygdrive/c/MATLAB7/bin/win32:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin
    What can be the problem?

  92. Sven Said,

    July 19, 2010 @ 2:16 am

    Thx for the nice tutorial, vlad!

    I have the same problem as others, with the “java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop- …” exception being thrown.

    Has anyone solved this problem already?

  93. Sven Said,

    July 19, 2010 @ 5:25 am

    I found out what works for me:

    1) In eclipse: open the “localhost” location in “map/reduce” locations. Open the Advanced tab. Set “mapred.child.tmp” to /tmp/hadoop-<username>/mapred/mapred.child.tmp

    2) Use the following text as TestDriver:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class TestDriver {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

            Job job = new Job(conf, "hadoop test");
            job.setJarByClass(TestDriver.class);
            job.setMapperClass(Mapper.class);
            job.setCombinerClass(Reducer.class);
            job.setReducerClass(Reducer.class);
            job.setMapOutputKeyClass(LongWritable.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path("In"));
            FileOutputFormat.setOutputPath(job, new Path("Out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
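
    (The same mapred.child.tmp override can also be set from the driver instead of the plug-in’s Advanced tab; a minimal sketch, assuming the path from step 1 above (the hadoop-<username> segment depends on the Windows account the cluster runs under), added right after the Configuration is created in the TestDriver above:)

    // Equivalent to editing mapred.child.tmp in the location's Advanced tab;
    // replace <username> with the Windows user name the cluster runs under.
    conf.set("mapred.child.tmp", "/tmp/hadoop-<username>/mapred/mapred.child.tmp");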

  94. Sven Said,

    July 19, 2010 @ 5:26 am

    P.S.: I use 0.20.2 on eclipse europa (newer didn’t work)!

  95. vlad Said,

    July 19, 2010 @ 5:29 am

    Hmm, did you add your RSA key to the authorized_keys file as described in the tutorial?
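
    (For reference, the whole key setup is only a few commands in the Cygwin shell; a minimal sketch, assuming the default ~/.ssh layout and an empty passphrase:)

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ ssh localhost        # should now log in without a password prompt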

  96. Guohua Liu Said,

    August 18, 2010 @ 8:15 pm

    It is a good tutorial, but in the last step, why doesn’t the “Run on Hadoop” window come up to let me select a Hadoop location to run on when I click “run as” -> “run on hadoop”? Because of this I can’t see console output similar to your tutorial. Thank you!

  97. vlad Said,

    August 19, 2010 @ 2:00 pm

    What is the version of eclipse you are running? The eclipse plugin only works with the version specified in the tutorial; it is not compatible with newer versions of eclipse. I am working on an upgrade to the plugin, but it is not available yet.

  98. suka_hati Said,

    August 22, 2010 @ 8:48 pm

    Hi,
    I’ve followed your tutorial on configuring hadoop and eclipse. However, I’m having a problem in Setup hadoop location, at step 6: I’m not able to get the folder. I got the error ‘Unknown protocol to job tracker’. Does anybody know how to resolve this issue?

  99. Lady Di Said,

    September 16, 2010 @ 6:11 am

    Well done Mr Vlad, thank you very much. Spasibo bolshoe (“thank you very much” in Russian).

  100. Priya Said,

    October 3, 2010 @ 8:02 am

    Hi Vlad!

    Thank you for the nice tutorial. I am having some issues in the last step of the tutorial, wherein we have to right-click on the Map/Reduce driver and “Choose existing Hadoop location”. The problem is that when I do this and select “run on Hadoop”, nothing happens! The window that should come up asking whether I wish to use an existing hadoop server or a new server does not come up.
    The console remains blank too. The “Problems” tab just shows the warning:
    “Description Resource Path Location Type
    The import org.apache.hadoop.mapred.Mapper is never used testdriver.java /Hadoop Test/src line 8 Java Problem

    I don’t know what the issue is. All the previous steps worked fine. I am using hadoop 0.19.2 and eclipse helios!
    Thanks!

  101. junfeng_feng Said,

    October 17, 2010 @ 11:51 pm

    Could you show how to configure Eclipse on linux to run hadoop, please?

  102. Erfan Said,

    November 13, 2010 @ 5:40 am

    Hi Vlad,

    Great step-by-step tutorial. These days you don’t see this kind of tutorial very often.
    I have an issue with step 11, “Start the local hadoop cluster”. In my jobtracker and tasktracker windows, I keep getting the following error:
    Error mapred.TaskTracker: can not start task tracker because java.lang.RuntimeException: Not a host:port pair: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
    followed by a bunch of other stack frames.

  103. Sasha NorCal Said,

    December 1, 2010 @ 8:15 pm

    Great tutorial! Works.
    Spasibo (thanks).

  104. Otilia Said,

    December 7, 2010 @ 6:20 am

    Hi Vlad,

    like Priya, I also have the same problem: on the TestDriver class, when I choose “Run As” -> “run on Hadoop”, nothing happens! The window that should come up asking whether I wish to use an existing hadoop server does not show up.

    I use eclipse 3.5 Helios and Hadoop-0.18.3
    Do you have any idea what could be the problem? (Maybe, if the plug-in is not good for this version of eclipse, do you know of any other plug-in?)

    Thanks very much,
    Otilia

  105. hth Said,

    January 10, 2011 @ 11:39 am

    Excellent tutorial

    I got it working using Eclipse Helios + Hadoop 0.20.2 using the eclipse plugin here:

    https://issues.apache.org/jira/browse/MAPREDUCE-1280

    Also ./start-all.sh works fine if you set the jdk correctly in conf/hadoop-env.sh

    export JAVA_HOME=C:\\Progra~1\\Java\\jdk1.6.0_19

    Also configure the site using the cluster setup here (modify the 3 files)

    http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html

    and make the testdriver code change as per Arun’s suggestion earlier to get around the LongWritable issue

    Happy hadooping!!

  106. Arun K Said,

    January 20, 2011 @ 11:32 pm

    Hello,

    Excellent Tutorial. I followed all the steps and created the TestDriver program, but when I choose Run as -> Run on Hadoop,
    nothing happens.

    Can you please help me ?

    I’m using Windows 7, Hadoop 0.19.2, and Java 6

  107. arunk786 Said,

    January 24, 2011 @ 8:49 am

    Hi V-lad!
    This is the best tutorial for setting up a Hadoop cluster on a single node on windows.
    1->
    It would remain the best one for times to come if the setup were also discussed for the HADOOP 0.2x versions (where the xml file is split into three) and for the latest Eclipse versions.
    2->
    Also, a setup tutorial for a MULTINODE cluster would be of immense help for students and research guys like us.

    Happy hadooping!

  108. Niluk Said,

    March 8, 2011 @ 8:16 am

    Thanks for the great tutorial. It’s immensely helpful.

    One issue though is that in the section that sets up SSH authorization key, once everything is done and I execute

    ssh localhost

    I’m prompted for the localhost password of the user. I don’t remember ever setting such a password up. Have I missed a step somewhere? Can someone tell me how I can get around it?

    Thanks in advance for your time!
    -Niluk

  109. Vikas Gupta Said,

    March 18, 2011 @ 11:23 am

    Hello ,
    I am Vikas Gupta. This is really a good tutorial. It works.

    I want to learn hadoop properly; can you suggest the way? Please, I am waiting for your reply.

  110. Paulo Ramos Said,

    March 29, 2011 @ 7:30 am

    Good morning,
    During the installation of Cygwin, at one point a list of several mirror sites appears; which one should I choose? Whenever I pick the first option an error occurs.

    Thank you for your attention.

    Sincerely,
    Paulo Ramos.

  111. Mandar Said,

    April 8, 2011 @ 1:35 am

    hello sir,
    I am following your tutorial on hadoop on windows/eclipse, but while formatting the namenode using the command bin/hadoop namenode -format I am not getting the expected result shown in your tutorial. I am getting the following warning:

    $ bin/hadoop namenode -format
    cygwin warning:
    MS-DOS style path detected: C:\cygwin\home\Mandar\HADOOP~1.2\/build/native
    Preferred POSIX equivalent is: /home/Mandar/HADOOP~1.2/build/native
    CYGWIN environment variable option “nodosfilewarning” turns off this warning.
    Consult the user’s guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    bin/hadoop: line 243: C:\Program: command not found
    bin/hadoop: line 273: C:\Program Files\Java\jre6\bin/bin/java: No such file or directory
    bin/hadoop: line 273: exec: C:\Program Files\Java\jre6\bin/bin/java: cannot execute: No such file or directory

    Please help me out soon.

    Regards-

    Mandar Bedse
    Bioinformatics Centre,
    University of Pune.
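
    (The “C:\Program: command not found” line above is the giveaway: JAVA_HOME in conf/hadoop-env.sh points at a path containing a space, and at the JRE with a trailing \bin rather than the JDK root. The usual workaround, as hth noted earlier in this thread, is the 8.3 short path; a sketch, assuming the JDK lives under Program Files, with the version directory adjusted to whatever is installed:)

    export JAVA_HOME=C:\\Progra~1\\Java\\jdk1.6.0_19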

  112. Rakesh Jadhav Said,

    April 8, 2011 @ 4:15 am

    Hi,
    I am getting an error when I do the following step… (creating the hdfs file system)

    $ bin/hadoop namenode -format
    bash: bin/hadoop: /usr/bin/env: bad interpreter: Permission denied

    Any clue? Appreciate help!

  113. venkat Said,

    April 19, 2011 @ 4:47 am

    Hi, your tutorial is excellent. I have done all the steps you have given, but when I run as Hadoop I did not get any window in eclipse and no response on the console. Please help me solve this problem.

  114. Adarsh Said,

    April 21, 2011 @ 1:06 am

    Sir, I am following this tutorial & everything works fine but when I Run the testdriver program as Run as > Run on Hadoop Nothing happens.

    The Next Popup window doesn’t appears.

    Don’t know what to do.

    Thanks

  115. tanvi Said,

    April 27, 2011 @ 1:00 pm

    I have one query. I followed all the steps to open ssh, but
    after executing the command ssh localhost it shows “connection closed by ::1”.
    How do I open the ssh connection?

  116. Misty Said,

    April 28, 2011 @ 1:48 am

    Hi…
    I am trying to configure Cygwin on windows 7 for eclipse, but after this step

    “You’ll see the script give you some information on your system and then it will ask you to create a privileged account with the default username “cyg_server”. The default works well, so type “no” when it asks you if you want to use a different account name, although you can change this if you really like”

    , when it asks for a password and I try to enter one,

    “Of course, you’ll have to enter a password for this account as well.”
    “Cygwin will show you your password in plain text for verification, so be sure you’re in a secure place. You’ll see some extra info come up and if all’s well, you’ll get a message that says it successfully.. ”

    It does not display the password, but asks me to re-enter it, and even though I re-enter it, it shows the error
    “Creating the user ‘cyg_server’ failed! Reason: System error 5 has occurred.”

    Can you tell me where I am going wrong?

  117. Ninh Said,

    May 8, 2011 @ 7:58 pm

    I’m very new to Hadoop,
    so I googled and got here
    and followed the instructions.
    Everything was successful, except for the following:

    I copied hadoop-0.20.2-eclipse-plugin.jar into eclipse/plugins.
    Then I started eclipse and set the MapReduce Perspective,
    and I successfully created a new hadoop location.
    But when I create a new MapReduce project and right-click and choose run as -> run on hadoop, I wait for a long time and nothing happens.
    What should I do now? What problem do I have?
    I ran Hadoop on Centos 5.6, then tried again with win 7 home premium, but nothing’s different.
    Please help me, I await your reply.

    Thanks in advance

  118. myat kyaw Said,

    May 15, 2011 @ 9:48 pm

    hi Vlad,
    Thanks for your tutorial.
    When I run the testDriver class, I get this error.
    I cannot solve it.
    Please give me some suggestions and help me.
    The error message is as below…
    11/05/16 10:41:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/05/16 10:41:09 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/05/16 10:41:10 INFO mapred.JobClient: Running job: job_201105161029_0001
    11/05/16 10:41:11 INFO mapred.JobClient: map 0% reduce 0%
    11/05/16 10:41:26 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_0, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/05/16 10:41:31 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_0, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/05/16 10:41:33 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_1, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/05/16 10:41:38 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_1, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/05/16 10:41:40 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/05/16 10:41:43 INFO mapred.JobClient: Task Id : attempt_201105161029_0001_m_000001_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

  119. vlad Said,

    May 15, 2011 @ 10:00 pm

    Congratulations, you managed to set up the cluster correctly. The problem with your job is that you have a type mismatch between what the Mapper expects and what comes to its input. The mappers and reducers generated by the Eclipse plug-in do not work well with newer versions of hadoop. Just google the web for some hadoop examples and try to run those.
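
    (Concretely: the generated driver declares Text/IntWritable output types, while the default IdentityMapper just passes through the (LongWritable offset, Text line) pairs that TextInputFormat produces, which is exactly the mismatch in the log above. A minimal corrected sketch of the driver, assuming the old mapred API and the In/Out paths used in the tutorial:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class TestDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TestDriver.class);
            // IdentityMapper/IdentityReducer are the defaults and pass the
            // (LongWritable, Text) input pairs straight through, so the
            // declared output types must match them; declaring
            // Text/IntWritable is what triggers "Type mismatch in key from map".
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path("In"));
            FileOutputFormat.setOutputPath(conf, new Path("Out"));
            JobClient.runJob(conf);
        }
    }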

  120. vlad Said,

    May 15, 2011 @ 10:01 pm

    Check your eclipse version; the standard plug-in does not work well with recent versions of eclipse.

  121. Gio Said,

    May 27, 2011 @ 5:24 pm

    Hi – I was very happy to find this tutorial but I’m stuck at the ssh step :-( with this error:

    $ ssh localhost
    Connection closed by ::1

    If I remove the key files ‘ssh localhost’ works (i.e., I’m prompted for my password and successfully connect).

    I’ve tried various tweaks of the /etc/ssh_config and /etc/sshd_config files with no luck yet… thank you in advance for your help.

    A little more verbose output:
    $ ssh -v localhost
    OpenSSH_5.8p1, OpenSSL 0.9.8r 8 Feb 2011
    debug1: Reading configuration data /etc/ssh_config
    debug1: Applying options for *
    debug1: Connecting to localhost [::1] port 22.
    debug1: Connection established.
    debug1: identity file /home/gio/.ssh/id_rsa type 1
    debug1: identity file /home/gio/.ssh/id_rsa-cert type -1
    debug1: identity file /home/gio/.ssh/id_dsa type -1
    debug1: identity file /home/gio/.ssh/id_dsa-cert type -1
    debug1: identity file /home/gio/.ssh/id_ecdsa type -1
    debug1: identity file /home/gio/.ssh/id_ecdsa-cert type -1
    debug1: Remote protocol version 2.0, remote software version OpenSSH_5.8
    debug1: match: OpenSSH_5.8 pat OpenSSH*
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_5.8
    debug1: SSH2_MSG_KEXINIT sent
    debug1: SSH2_MSG_KEXINIT received
    debug1: kex: server->client aes128-ctr hmac-md5 none
    debug1: kex: client->server aes128-ctr hmac-md5 none
    debug1: sending SSH2_MSG_KEX_ECDH_INIT
    debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
    debug1: Server host key: ECDSA 1b:57:34:0e:9a:a7:da:09:ae:62:7a:81:cf:0c:a9:2f
    The authenticity of host ‘localhost (::1)’ can’t be established.
    ECDSA key fingerprint is 1b:57:34:0e:9a:a7:da:09:ae:62:7a:81:cf:0c:a9:2f.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added ‘localhost’ (ECDSA) to the list of known hosts.
    debug1: ssh_ecdsa_verify: signature correct
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug1: SSH2_MSG_NEWKEYS received
    debug1: Roaming not allowed by server
    debug1: SSH2_MSG_SERVICE_REQUEST sent
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug1: Authentications that can continue: publickey,password
    debug1: Next authentication method: publickey
    debug1: Offering RSA public key: /home/gio/.ssh/id_rsa
    Connection closed by ::1

  122. Gio Said,

    June 2, 2011 @ 11:10 pm

    Browse HDFS Troubleshooting:

    Hi Vlad – using your tutorial and several comments here, I too was able to get almost everything working
    using Eclipse Helios 3.6 + Hadoop 0.20.2… thanks. I can launch jobs from within eclipse, but I can’t browse the local HDFS.

    Under “Project Explorer” > “DFS Locations” > “localhost” > “(1)” I’m getting:

    Error: Call to localhost/127.0.0.1:9100 failed on local exception: java.io.EOFException

    From the command line I can confirm HDFS is working properly, i.e. I can issue commands such as “hadoop fs -ls /”, so it’s probably a setting in the “location” “Advanced” tab. Any suggestions for how to diagnose/fix? Thanks.

  123. vlad Said,

    June 2, 2011 @ 11:18 pm

    The hadoop plugin described in this tutorial is written for Eclipse Europa 3.3; it does not work on Eclipse Helios. I had plans to update it and post it here, but never got around to actually doing it.

  124. myat kyaw Said,

    June 6, 2011 @ 10:24 pm

    When I run the testDriver on hadoop, I don’t see a hadoop
    location box to choose a hadoop server. Why?
    Please give me some suggestions.

  125. praveen Said,

    June 10, 2011 @ 3:03 am

    I got the following error while running the namenode command. Could you please help me resolve the issue?

    INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9100
    INFO namenode.NameNode: Namenode up at: localhost/127.0.0.1:9100
    INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
    metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    INFO namenode.FSNamesystem: fsOwner=EE205782,Domain,Users,root,Administrators,Users,Debugger,Users
    INFO namenode.FSNamesystem: supergroup=supergroup
    namenode.FSNamesystem: isPermissionEnabled=true
    INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
    INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
    ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
    java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
    INFO ipc.Server: Stopping server on 9100
    ERROR namenode.NameNode: java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:305)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
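
    (The trace says it directly: there is no HDFS image to load because the format step never completed, so the format has to be re-run, and confirmed, before starting the namenode. From the hadoop directory:)

    $ bin/hadoop namenode -format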

  126. praveen Said,

    June 14, 2011 @ 2:47 am

    Hi vlad,
    Getting the below error while configuring the jobtracker:

    Error: java.io.IOException: unknown protocol to job tracker: org.apache.hadoop.hdfs.protocol.ClientProtocol at org.apache.hadoop.mapred.JobTracker.getProtocolVersion(JobTracker.java:222)

  127. sudhir Said,

    June 16, 2011 @ 12:12 am

    Excellent, I just started with hadoop, and was looking for exactly the same information.

    BTW, how stable is the hadoop eclipse plugin?

  128. Troy Said,

    June 16, 2011 @ 7:31 am

    I’m stuck in step 3 – setup ssh daemon.

    This is the message i get:

    *** Query: Should privilege separation be used? (yes/no) no
    *** Info: Updating /etc/sshd_config file

    *** Warning: The following functions require administrator privileges!

    *** Query: Do you want to install sshd as a service?
    *** Query: (Say “no” if it is already installed as a service) (yes/no) yes
    *** Query: Enter the value of CYGWIN for the daemon: [] ntsec
    *** Info: On Windows Server 2003, Windows Vista, and above, the
    *** Info: SYSTEM account cannot setuid to other users — a capability
    *** Info: sshd requires. You need to have or to create a privileged
    *** Info: account. This script will help you do so.

    *** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
    *** Info: or later. On these systems, it’s not possible to use the LocalSyste
    *** Info: account for services that can change the user id without an
    *** Info: explicit password (such as passwordless logins [e.g. public key
    *** Info: authentication] via sshd).

    *** Info: If you want to enable that functionality, it’s required to create
    *** Info: a new account with special privileges (unless a similar account
    *** Info: already exists). This account is then used to run these special
    *** Info: servers.

    *** Info: Note that creating a new user requires that the current account
    *** Info: have Administrator privileges itself.

    *** Info: No privileged account could be found.

    *** Info: This script plans to use ‘cyg_server’.
    *** Info: ‘cyg_server’ will only be used by registered services.
    *** Query: Do you want to use a different name? (yes/no) no
    *** Query: Create new privileged user account ‘cyg_server’? (yes/no) yes
    *** Info: Please enter a password for new user cyg_server. Please be sure
    *** Info: that this password matches the password rules given on your system.
    *** Info: Entering no password will exit the configuration.
    *** Query: Please enter the password:

  129. hansuksoo. Said,

    June 18, 2011 @ 6:47 pm

    hi Vlad,
    Thanks for your tutorial.
    When I run the testDriver class, I get this error.
    I cannot solve it.
    Please give me some suggestions and help me.
    The error message is as below…
    eclipse : 3.3
    hadoop : hadoop-0.19.1
    jdk : 1.6.0_26
    JVM problem?

    11/06/19 10:37:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/06/19 10:37:03 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/06/19 10:37:04 INFO mapred.JobClient: Running job: job_201106191031_0001
    11/06/19 10:37:05 INFO mapred.JobClient: map 0% reduce 0%
    11/06/19 10:37:09 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:13 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:17 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:24 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:27 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:31 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

  130. vlad Said,

    June 18, 2011 @ 7:53 pm

    Everything is okay. It’s just that the code generated by the plug-in is not compatible with the newer versions of hadoop. Try to run the sample code that came with your hadoop distribution.

  131. vlad Said,

    June 22, 2011 @ 7:56 pm

    You need to enter a new password for the ‘sshd’ service account. Pick any password you want. Also make sure you are running this step as an Administrator.

  132. vlad Said,

    June 22, 2011 @ 8:00 pm

    I found the eclipse plugin to be pretty stable, but it only works with older versions of Eclipse. Also, hadoop itself is not considered stable on Windows. It works for me for initial development, but for production usage consider running the real jobs on a linux system. If you don’t have a cluster at your disposal you could use Amazon Elastic MapReduce. I found it pretty good, and quite cheap for what it does. I was able to run humongous jobs processing 400G of data within a few hours.

  133. Rahul Said,

    June 24, 2011 @ 6:52 pm

    Very useful tutorial. Without it, it would not have been possible for me to install on a Windows machine. Thank you Vlad.

  134. Cindy Said,

    June 27, 2011 @ 4:25 am

    Hi Vlad,
    thank you very much for your tutorial. I ran the whole process successfully except the “Tasktracker”, where I receive an error message that shuts down the tasktracker process: “failed to set permissions for tmp/hadoop-$user/mapred/local/ttprivate to 0700”. I’m using Eclipse 3.6 Helios and the latest stable version of Hadoop, 0.20.203. According to Hadoop they are compatible. Do you know how I can resolve this problem?
    Thank you in advance

  135. Vincent Said,

    June 28, 2011 @ 8:16 am

    Hi Vlad,
    Thanks for your tutorial.
    My hadoop cluster has 3 machines (all on linux), and I am using my laptop (eclipse + osx) to develop the hadoop app.
    But when I run my app, I get the message:

    11/06/28 11:03:48 WARN conf.Configuration: Could not make localRunner/job_local_0001.xml in local directories from mapred.local.dir
    11/06/28 11:03:48 WARN conf.Configuration: mapred.local.dir[0]=/home/hikaru/hadoop/hdfs/hadoop-hikaru/mapred/local
    Exception in thread “main” java.io.IOException: No valid local directories in property: mapred.local.dir

  136. vlad Said,

    June 28, 2011 @ 8:54 am

    Did you format your namenode? It looks like you either missed that step or something happened during formatting. Try to redo the steps starting with the namenode format.

  137. Lakshminarasu Said,

    June 29, 2011 @ 9:38 am

    Hi Vlad,

    Thanks a lot for this tutorial. But after entering the details for configuring the Map/Reduce Server locations, I am getting the following error:

    “Cannot connect to the Map/Reduce location: localhost
    Protocol org.apache.hadoop.mapred.JobSubmissionProtocol version mismatch. (client = 10, server = 16)”

    Any idea how to resolve this ?

  138. alex Said,

    July 3, 2011 @ 8:42 am

    Hi vlad..
    Thanks for the awesome tutorial on hadoop… it helped me a lot.
    I did exactly the same as you mentioned in your tutorial, but in the end when I ran the test project it’s giving some exception…
    11/07/03 21:06:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/07/03 21:06:19 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/07/03 21:06:20 INFO mapred.JobClient: Running job: job_201107032031_0009
    11/07/03 21:06:21 INFO mapred.JobClient: map 0% reduce 0%
    11/07/03 21:06:27 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_0, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/07/03 21:06:32 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_1, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/07/03 21:06:37 INFO mapred.JobClient: Task Id : attempt_201107032031_0009_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:558)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:42)
    …please help me out… :)
    Thanks.

  139. vlad Said,

    July 6, 2011 @ 11:12 am

    It is not a JVM problem. However, I can’t tell exactly, because you did not give the exact exception information. Take a look at the TaskTracker logs to see the actual exception given by the task.

  140. kamlesh Said,

    July 11, 2011 @ 5:40 am

    hi Vlad,
    Thanks for your tutorial.
    When I run the testDriver class, I get this error.
    I cannot solve it.
    Please give me some suggestions and help me.
    The error message is as below…

    11/06/19 10:37:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/06/19 10:37:03 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/06/19 10:37:04 INFO mapred.JobClient: Running job: job_201106191031_0001
    11/06/19 10:37:05 INFO mapred.JobClient: map 0% reduce 0%
    11/06/19 10:37:09 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:13 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:17 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:24 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:27 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/06/19 10:37:31 INFO mapred.JobClient: Task Id : attempt_201106191031_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

    I think this error occurs because when I run the Tasktracker it starts, but it shows the exception:

    mapred.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on linux.

    mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.

    (If anyone knows the answer please help!
    Your help will be appreciated!)

  141. sunny Said,

    July 12, 2011 @ 3:20 am

    When I am installing Cygwin it displays that cygncurses-9.dll was not found. After re-installing
    the application it shows the same problem.
    Can anyone help me solve this problem?

  142. sunny Said,

    July 27, 2011 @ 1:45 am

    Hi vlad,
    Your tutorial helped me a lot. I did exactly what you mentioned in your tutorial, but in the end, when I click the new hadoop location, it didn’t display the dialog box.
    Can you help me?

  143. jan Said,

    August 10, 2011 @ 10:27 am

    Hello vlad,
    thanks for this very detailed tutorial.

    Please note that for windows 7 it is necessary to start cygwin as administrator. It is best to start cmd as administrator (right click, start as administrator) and then call cygwin.bat from this cmd.

    However, I still have a problem with the tasktracker :-( .
    I can not start it. When I do, there is an error message saying:
    Can not start task tracker because java.io.IOException: Failed to set permissions of path: tmp/hadoop-jan/mapred/local/ttprivate to 0700.

    Can anyone help me with this error?

    Thanks in advance.

  144. rohit Said,

    August 12, 2011 @ 12:01 pm

    Hi Vlad,

    When running the Namenode command I got the following error:

    bin/hadoop: line 243: C:\Program: command not found
    11/08/13 00:02:09 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = HLBLSWSSS030612/192.168.2.4
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.19.1
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 745977; compiled by ‘ndaley’ on Fri Feb 20 00:16:34 UTC 2009
    ************************************************************/
    11/08/13 00:02:09 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9100
    11/08/13 00:02:09 INFO namenode.NameNode: Namenode up at: 127.0.0.1/127.0.0.1:9100
    11/08/13 00:02:09 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
    11/08/13 00:02:09 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    11/08/13 00:02:09 INFO namenode.FSNamesystem: fsOwner=hcltech\deven.panse,mkgroup,Users
    11/08/13 00:02:09 INFO namenode.FSNamesystem: supergroup=supergroup
    11/08/13 00:02:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
    11/08/13 00:02:09 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
    11/08/13 00:02:09 INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
    11/08/13 00:02:09 INFO common.Storage: Storage directory C:\tmp\hadoop-Deven.Panse\dfs\name does not exist.
    11/08/13 00:02:09 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
    org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory C:\tmp\hadoop-Deven.Panse\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:278)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
    11/08/13 00:02:09 INFO ipc.Server: Stopping server on 9100
    11/08/13 00:02:09 ERROR namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory C:\tmp\hadoop-Deven.Panse\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:278)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)

    11/08/13 00:02:09 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at HLBLSWSSS030612/192.168.2.4
    ************************************************************/

  145. sunny Said,

    August 16, 2011 @ 2:51 am

    Hi vlad,
    Above problem solved. When I run the TestDriver class, I get this error:

    11/08/16 12:36:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/08/16 12:36:10 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/08/16 12:36:10 INFO mapred.JobClient: Running job: job_201108161203_0001
    11/08/16 12:36:11 INFO mapred.JobClient: map 0% reduce 0%
    11/08/16 12:36:17 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/08/16 12:36:21 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/08/16 12:36:25 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/08/16 12:36:34 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/08/16 12:36:37 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/08/16 12:36:42 INFO mapred.JobClient: Task Id : attempt_201108161203_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

    Can you help me?

  146. Vlad Said,

    August 16, 2011 @ 7:28 am

    Sunny,

    Glad you got everything working. Regarding your last message, there is not enough information for me to help you. What I see is that your tasks are failing; you need to look at the logs for the individual tasks (the best way to do it is through the web interface to the job tracker: when you click on the job you will see the list of all tasks, failed and successful, and should be able to see the links to the logs for individual tasks).

    You could probably discern from these logs what is wrong with your job. It could be anything, from not enough space on the drive to a runtime error in the tasks. If you get stuck, post the logs here and I will try to help you.

  147. rohit Said,

    August 17, 2011 @ 11:00 am

    Hi Vlad,
    When I run the TestDriver class it shows this:
    11/08/17 23:24:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/08/17 23:24:37 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/08/17 23:24:37 INFO mapred.JobClient: Running job: job_201108172316_0002
    11/08/17 23:24:38 INFO mapred.JobClient: map 0% reduce 0%
    11/08/17 23:24:47 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_0, Status : FAILED
    java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/08/17 23:24:51 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_1, Status : FAILED
    java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    11/08/17 23:24:55 INFO mapred.JobClient: Task Id : attempt_201108172316_0002_m_000000_2, Status : FAILED
    java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:563)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:46)

    Can you help me…..

  148. rohit Said,

    August 17, 2011 @ 11:27 am

    Vlad, just a note … the earlier output was from another machine (which you think may have memory problems), so I set up Hadoop at home as well to continue with my project. And at home, unfortunately, I'm running into another problem, which is the log I posted just above this comment.

  149. vlad Said,

    August 18, 2011 @ 12:03 am

    Your cluster is functioning correctly. However, you have an error in your mapreduce code: judging by the log, your job is configured to expect IntWritable values from the map, but your mapper is emitting Text. Did you use the code generated by the plugin? It is not compatible with the newer versions of hadoop. Take a look at the WordCount example that came with your hadoop distro and use that as your guide. The code changes are very minimal.
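
    As a rough sketch, assuming the old org.apache.hadoop.mapred API used in this tutorial (TestMapper is a made-up name, adapt it to your project), a mapper whose declared output types match what it actually emits looks like this:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Declares <input key, input value, output key, output value> as
    // <LongWritable, Text, Text, IntWritable>.
    public class TestMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Emit (Text, IntWritable) pairs, matching what the driver
            // declares via setOutputKeyClass / setOutputValueClass.
            output.collect(value, one);
        }
    }

    Then register it in the driver with conf.setMapperClass(TestMapper.class), so that the default IdentityMapper (which just passes the input types straight through) is not used.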

  150. mayur Said,

    August 19, 2011 @ 8:39 am

    hey vlad..!
    thanx for this excellent tutorial..

    I'm new to hadoop and I'm getting an error on step 11, “Setup hadoop Location”:
    when I select DFS Locations in the Project Explorer tab, in localhost\…(1)…

    I get the following error

    “An internal error occurred during: “Connecting to DFS localhost”.”

    Please help me out..!

  152. richard Said,

    August 23, 2011 @ 8:33 pm

    hi, vlad:
    When I tried to run my map/reduce project on hadoop, this error happened.
    The error message said:
    Plug-in org.apache.hadoop.eclipse was unable to load class org.apache.hadoop.eclipse.launch.HadoopApplicationLaunchShortcut.
    and the eclipse UI logged this exception (I can connect to the hadoop server):
    java.lang.NoClassDefFoundError: org/eclipse/jdt/internal/debug/ui/launcher/JavaApplicationLaunchShortcut
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.defineClass(DefaultClassLoader.java:188)
    at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.defineClass(ClasspathManager.java:580)
    at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findClassImpl(ClasspathManager.java:550)
    at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClassImpl(ClasspathManager.java:481)
    at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass_LockClassLoader(ClasspathManager.java:469)
    at org.eclipse.osgi.baseadaptor.loader.ClasspathManager.findLocalClass(ClasspathManager.java:449)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.findLocalClass(DefaultClassLoader.java:216)
    at org.eclipse.osgi.internal.loader.BundleLoader.findLocalClass(BundleLoader.java:393)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:469)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
    at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at org.eclipse.osgi.internal.loader.BundleLoader.loadClass(BundleLoader.java:338)

  153. vamshi Said,

    August 25, 2011 @ 1:26 am

    Hi vlad, your tutorial is great..
    I am using eclipse helios service release 2. When I follow your steps, I do not find tmp(1) under the Project Explorer of the eclipse IDE; instead I get an error: localhost/127.0.0.1:9100 failed on connection exception: java.net.ConnectException: Connection refused.
    What is the reason for that? How can I get out of this error? please help me..

  154. Rudra Said,

    September 2, 2011 @ 6:58 am

    A very good end-to-end tutorial, with complete steps. Thanks for writing such a good one. I'm using eclipse-galileo 1.7.2; up to opening the Map/Reduce perspective everything is good, but the “New Hadoop location” dialog is not opening. Pls help me.

  155. John Said,

    September 5, 2011 @ 1:55 am

    Hi Vlad,

    Please let me know when we need to configure the hadoop-env.sh file with the JAVA_HOME path.

    In this tutorial, it seems that you have not configured it. Is it necessary to configure the hadoop-env.sh file, or is it optional?

  156. Monk Said,

    September 9, 2011 @ 12:33 pm

    Hey great tutorial!

    Maybe I can help out. I had a few problems myself. I did not set JAVA_HOME in Windows (it caused problems) but directly in the hadoop-env file. Also, I have my Java folder in the native folder. It worked for me. For people that have permission problems etc., maybe it's wise to create a new user account with admin rights or run cygwin as administrator.
    I used hadoop 0.20.2.

    thanks for this tutorial vlad!

  157. naveen Said,

    September 12, 2011 @ 1:34 am

    Thank you for your excellent tutorial; however, I get an error when running the last step. The output error is below:

    11/09/07 12:57:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    11/09/07 12:57:52 INFO mapred.FileInputFormat: Total input paths to process : 4
    11/09/07 12:57:56 INFO mapred.JobClient: Running job: job_201109071219_0001
    11/09/07 12:57:57 INFO mapred.JobClient: map 0% reduce 0%
    11/09/07 12:58:06 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/09/07 12:58:10 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/09/07 12:58:14 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/09/07 12:58:24 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/09/07 12:58:28 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    11/09/07 12:58:32 INFO mapred.JobClient: Task Id : attempt_201109071219_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:41)

  158. Michael Said,

    September 23, 2011 @ 5:41 am

    Hi Vlad,
    I followed your tutorial, however using more current versions of Hadoop (0.20.203.0) and Eclipse (3.3.0 and 3.7.0). The first problem that I encountered is not related to Eclipse. In the section “Start the local hadoop cluster” the command

    bin/hadoop tasktracker

    gives me the following errors:

    11/09/21 08:28:25 ERROR mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: /tmp/hadoop-Michael/mapred/local/ttprivate to 0700
    at org.apache.hadoop.fs.RawLocalFileSystem.checkReturnValue(RawLocalFileSystem.java:525)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:499)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:635)
    at org.apache.hadoop.mapred.TaskTracker.(TaskTracker.java:1328)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3430)

    I tried to Google it but couldn't find a way to get around it. Do you happen to know how to fix it? I'm getting the same error when running the test project in Eclipse as a Java Application. ‘Run on Hadoop’ doesn't work for me either, with the same results in v. 3.3 and 3.7: it just brings up a list of classes to run as a Java App.
    Thank you in advance.

  159. Aditya Said,

    September 28, 2011 @ 3:43 am

    Hi Vlad,

    I followed your tutorial. It was really helpful.
    I'm stuck with the following.

    I'm able to start the sshd service and log on to localhost, but when I execute start-dfs.sh or start-mapred.sh it shows me the error
    “daemon wants to run as MY_USERNAME but not running as that user or root”

    and in the log files it says that the storage directory “\hadoopBase\aditya\namenode\dfs\name is in an inconsistent state: storage directory does not exist or is not accessible.”

    I looked in that directory. It was not even present. No matter how many times I run the command
    “hadoop namenode -format”
    it always says that the namenode formatted successfully, but the directory and the files in it are never actually created.
    I think that is the main problem. I would really appreciate it if you could help me with it.

  160. Parita Said,

    October 1, 2011 @ 10:36 pm

    Hey, my hadoop is not getting unpacked properly; it is showing ‘reached the end of file’. And as a result the namenode is not getting formatted. Pls help!

  161. Amit Said,

    October 5, 2011 @ 1:53 am

    Hi,
    Can you please give a step-by-step guide on how to move a MySQL table into hadoop, and how to create a table in hadoop?

  162. Prashant Said,

    October 17, 2011 @ 11:41 pm

    Hi Vlad,
    I am getting error here

    $ bin/hadoop fs -help
    bin/hadoop: line 243: C:\Program: command not found
    bin/hadoop: line 273: C:\Program File\Java\jdk1.6.0_27/bin/java: No such file or directory
    bin/hadoop: line 273: exec: C:\Program File\Java\jdk1.6.0_27/bin/java: cannot execute: No such file or directory

    kindly help..

  163. jeris Said,

    October 26, 2011 @ 2:32 pm

    I tried to navigate to another directory using the cd command at the cygwin command prompt, but the directory doesn't change, as a result of which I am stuck in the middle of the configuration. Could you please say what the possible reason behind this could be?

  164. Harish Said,

    November 7, 2011 @ 8:40 pm

    Hi,

    Will running bin/hadoop namenode -format

    actually format my physical drive? Will I lose the contents already on it?

  165. vlad Said,

    November 11, 2011 @ 6:59 am

    Namenode format formats your hdfs filesystem, not the physical drive, so you will not lose your drive contents.

    HOWEVER, if you previously had an HDFS filesystem on that drive, it will erase all your hadoop data. So be careful with that. If it is your first Hadoop installation you obviously won't have this problem. Otherwise, make some arrangements to back up your data.
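
    For example, assuming the directory layout from this tutorial (substitute your own user name and backup location), something like bin/hadoop fs -copyToLocal /user/yourname /some/local/backup will pull your HDFS files down to the local disk before you format.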

  166. vlad Said,

    November 11, 2011 @ 7:01 am

    Sorry, this is way out of scope for this tutorial. Hadoop doesn't even have tables in the SQL sense. It is designed to process streams of records.

  167. vlad Said,

    November 11, 2011 @ 7:02 am

    Something is wrong with your archive. Try to download it again. Also check how much free space you have on your drive and whether you have any disk quotas turned on.
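
    As a quick sanity check (adjust the file name to the version you downloaded), you can list the archive without unpacking it: tar -tzf hadoop-0.19.1.tar.gz > /dev/null ; if the archive is truncated, tar will print an error instead of exiting silently.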

  168. vlad Said,

    November 11, 2011 @ 7:04 am

    This tutorial will not run with newer versions of hadoop. The configuration files in Hadoop 0.20 have changed dramatically, and the eclipse plugin used in this version of the tutorial will not work in newer versions of eclipse.

  169. Jim Said,

    November 15, 2011 @ 2:57 pm

    Which Eclipse: the one for C/C++, Java Devs, RCP/Plug-in, or Java EE?

  170. Bhavesh Said,

    November 22, 2011 @ 2:45 am

    Hi vlad,
    I want to ask just one thing:
    will Hadoop be useful for data retrieval and analysis
    from a very large database?

  171. Bhavesh Said,

    November 22, 2011 @ 3:00 am

    Do we need to install hadoop separately through cygwin, or will these steps do that?

  172. omprakash Said,

    November 22, 2011 @ 11:58 am

    thanks for the tutorial
    I followed all the steps properly, but when I try to start the jobtracker and the tasktracker, they do not start.

    plz help me to solve this issue
    thank u..

  173. Paul Said,

    December 1, 2011 @ 3:49 am

    Thanks for the tutorial!!
    When uploading data into HDFS (12th step) I type
    bin/hadoop fs -mkdir In
    and it shows
    bin/hadoop: line 243: C:\Program: command not found
    mkdir: cannot create directory In: File exists
    What do I do now??
    Thanks once again.

  174. vlad Said,

    December 1, 2011 @ 4:32 am

    Looks like your Java is installed in the default location in C:\Program Files\; the space in that path breaks the cygwin scripts. I suggest moving your Java install to something like C:\Java and adjusting your configuration accordingly.

    You can just install another copy of the JDK; it is not very big by modern standards, and it will solve your problem without breaking the rest of the system.
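
    (If you cannot move Java, a workaround that some people use, and which I have not verified here, is to point hadoop-env.sh at the Windows 8.3 short name for the directory, e.g. export JAVA_HOME=/cygdrive/c/Progra~1/Java/jdk1.6.0_27; still, a path without spaces such as C:\Java is the more reliable fix.)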

  175. vlad Said,

    December 1, 2011 @ 4:33 am

    Java Devs is sufficient; Java EE might come in useful.

  176. Paul Said,

    December 1, 2011 @ 1:43 pm

    Hi,

    I installed the Java JDK in c:\JAVA again, but the error persists!!

    Any other way to get rid of this error??

    Thanks.

  177. Bhavesh Said,

    December 1, 2011 @ 9:49 pm

    When I enter a query on the Hive CLI, I get the errors below:

    hive> SELECT a.foo FROM invites a WHERE a.ds=’2008-08-15′;

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there’s no reduce operator
    Starting Job = job_201111291547_0013, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201111291547_0013
    Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2\/bin/hadoop job -Dmapred.job.tracker=localhost:9101 -kill job_201111291547_0013
    2011-12-01 14:00:52,380 Stage-1 map = 0%, reduce = 0%
    2011-12-01 14:01:19,518 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201111291547_0013 with errors
    FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

    So my question is: how do I stop a job? In this case the job is job_201111291547_0013.
    Pls help me out so that I can remove these errors and try the next step.
    Thanks.

  178. Jacky Hou Said,

    December 7, 2011 @ 3:54 am

    Hi, Vlad!
    Thank you for your tutorial.
    I followed every step and everything went well, but when I created the TestDriver file many errors emerged, many of them deprecation warnings; and when I executed the final step, running the project, nothing happened.
    my email is
    Pls help…

  179. vlad Said,

    December 7, 2011 @ 4:19 pm

    The code generated by the plugin is not compatible with newer versions of hadoop, hence the deprecation warnings. You need a few changes to the code to make it work with current versions.
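
    As a rough sketch of the direction, using the newer org.apache.hadoop.mapreduce API from 0.20 (TestMapper and TestReducer here are placeholders for your own classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TestDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "test");          // replaces the old JobConf
            job.setJarByClass(TestDriver.class);
            job.setMapperClass(TestMapper.class);     // your mapper class
            job.setReducerClass(TestReducer.class);   // your reducer class
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path("In"));
            FileOutputFormat.setOutputPath(job, new Path("Out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    The WordCount example shipped with 0.20 follows this same shape, so diffing your generated driver against it is the quickest way to see what needs to change.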

  180. Annamalai Said,

    December 21, 2011 @ 8:10 pm

    Thank you Vlad for such a nice tutorial,
    Can you help me resolve the following error?

    I understand the Out directory already exists. But where can I check for this directory and delete it before rerunning the job?

    11/12/21 20:25:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9100/user/anna-pc/anna/Out already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
    at TestDriver.main(TestDriver.java:43)

  181. Yuriy Said,

    December 23, 2011 @ 1:58 am

    As I see, the most common error after configuring Hadoop and Eclipse on Windows is the following:
    java.io.FileNotFoundException: File C:/tmp/hadoop/mapred/local/taskTracker/jobcache/job_201112221128_0001/attempt_201112221128_0001_m_000001_1/work/tmp does not exist.
    As was already mentioned, it is likely caused by the incompatibility of the hadoop-eclipse plugin with versions of Hadoop and Eclipse newer than those mentioned in the tutorial.

    However, I've got the following configuration working on Windows with hadoop-0.20.2, Eclipse Indigo 3.7.1 and hadoop-eclipse-plugin-0.20.3-SNAPSHOT:

  182. Yuriy Said,

    December 23, 2011 @ 1:59 am

    1) jdk1.7.0_01
    2) I configured the paths with these commands:
    echo 'export JAVA_HOME=/usr/local/java/jdk1.7.0_01' > /etc/profile.d/jdk.sh
    echo 'export PATH=$JAVA_HOME/bin:$PATH' >> /etc/profile.d/jdk.sh
    source /etc/profile.d/jdk.sh
    (I copied the jdk1.7.0_01 directory from C:\Program Files\Java to /usr/local/java to avoid spaces in paths)
    3) hadoop-env.sh:
    export JAVA_HOME=/usr/local/java/jdk1.7.0_01

  183. Yuriy Said,

    December 23, 2011 @ 2:06 am

    4) add this to mapred-site.xml:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>1</value>
    </property>
    <property>
      <name>mapred.child.tmp</name>
      <value>/tmp/hadoop/mapred/mapred.child.tmp</value>
    </property>

  184. Yuriy Said,

    December 23, 2011 @ 2:06 am

    5) eclipse.ini (modify the lines after -vmargs):
    -Xms512m
    -Xmx1024m
    -XX:MaxPermSize=256m
    -Dosgi.classloader.lock=classname
    -Dosgi.requiredJavaVersion=1.7
    6) replace hadoop-0.20.1-eclipse-plugin in the eclipse/plugins directory with hadoop-eclipse-plugin-0.20.3-SNAPSHOT

    I'm not strongly convinced that all of these settings are obligatory for normal processing; I've only shared mine.
    I also haven't tested versions of hadoop higher than 0.20.2.

  187. Valon Said,

    December 24, 2011 @ 9:29 am

    Hi, thanks again for this tutorial. Could this plugin be used in Standalone Mode? I can run the Hadoop-0.20.2 examples without starting the 5 separate Cygwin windows.

    By the way, I am using Hadoop-0.20.2 + Cygwin + Standalone Mode and would like to use Eclipse Indigo. I found plugins for hadoop-0.20.1, .2 and .3, but I am not sure if they were patched for newer versions of Eclipse.

  188. MK Said,

    December 27, 2011 @ 5:50 pm

    Hi,

    Thanks for the awesome tutorial!! I have MyEclipse 6.0 at my workplace and I have done all the steps that you have shown, but when I open Eclipse and try to open the Map/Reduce perspective, Eclipse flags an error saying “Problems opening perspective ‘org.apache.hadoop.eclipse.Perspective’”. Please Help!!

    Thanks!!

  189. Faten Said,

    December 30, 2011 @ 1:07 pm

    Hi, Vlad!
    Thank you for your tutorial.
    I followed every step and everything went well, thanks again :) . I hope that you can help me: I am new to programming with Hadoop, and now that the installation is done I don't know what to do next. All I want to know is how to make the hadoop server receive data from and send data to the client.

  190. Vikram Said,

    January 9, 2012 @ 3:03 am

    Hi,
    I am getting the following error while configuring ssh:
    cygrunsrv: Error installing a service: OpenSCManager: Win32 error 5:
    Access is denied.
    I have admin rights on my machine. I have Windows 7 OS.

    Please suggest what the problem could be.
    TIA :)

  191. Priyanka Said,

    January 11, 2012 @ 3:23 am

    Hi,
    Thank you for this tutorial.
    I have the hadoop 0.20.203.0 version, and I am not finding the hadoop-site file inside the conf folder. What should I do now?

  192. vlad Said,

    January 11, 2012 @ 3:37 am

    This tutorial is not compatible with hadoop versions 0.20 and above; those newer versions of Hadoop are not compatible with the eclipse plugin.

  193. Priyanka Said,

    January 11, 2012 @ 3:45 am

    As I read about hadoop, in hadoop 0.20 hadoop-site.xml has been divided into three files, namely core-site.xml, hdfs-site.xml and mapred-site.xml. Can I use the settings that you provided to configure eclipse with all three files?

  194. vlad Said,

    January 11, 2012 @ 3:50 am

    Still, it is not going to work with Eclipse. You can use this tutorial to get ideas on how to set up hadoop on Windows, but it will not be compatible with the eclipse plugin. There is an updated version of the plugin somewhere on the IBM developerWorks site; I am not sure exactly where. Also, keep in mind that newer versions of hadoop use a lot of native Linux FS features, so your performance and stability are not going to be great on Windows.

  195. Priyanka Said,

    January 11, 2012 @ 3:57 am

    Thanks a lot for the information..Now i will try to work with hadoop older version.

  196. Priyanka Said,

    January 11, 2012 @ 4:26 am

    $ bin/hadoop namenode -format
    After typing this command, I am getting:

    bin/hadoop: line 243: C:\Program: command not found
    bin/hadoop: line 273: C:\Program Files\Java\jdk1.6.0\bin;/bin/java: No such file or directory
    bin/hadoop: line 273: exec: C:\Program Files\Java\jdk1.6.0\bin;/bin/java: cannot execute: No such file or directory
    My Java path is C:\Program Files\Java\jdk1.6.0\bin. What should I do?
    Thanks in advance..

  197. vlad Said,

    January 11, 2012 @ 12:57 pm

    You have your JDK installed in C:\Program Files; that drives the cygwin bash scripts crazy. The best way to fix it is to install the JDK into C:\Java, and also make sure that the path to C:\Java\bin comes first in your PATH variable settings. You can see how to configure your PATH variable in the tutorial itself.

  198. vlad Said,

    January 13, 2012 @ 3:39 am

    The directory is in HDFS. Use the ‘hadoop fs’ command to manipulate it.
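
    For example, assuming the Out directory from the tutorial (paths are relative to your HDFS home directory): bin/hadoop fs -ls shows what is there, and bin/hadoop fs -rmr Out deletes the directory so the job can recreate it.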

  199. Burcu Said,

    January 18, 2012 @ 1:24 am

    Hi,

    I ran the code and it works. I use this command to see the output file:

    “bin/hadoop fs -cat Out2/*”

    The input files are shown, and after the end of the files it states “cat: Source must be a file.”
    What is wrong here?

    Thanks,
    ———-
    (…)
    294366 49. HADOOP-96. Logging improvements. Log files are now separate from
    294437 standard output and standard error files. Logs are now rolled.
    294505 Logging of all DFS state changes can be enabled, to facilitate
    294572 debugging. (Hairong Kuang via cutting)
    294616
    294617
    294618 Release 0.1.1 – 2006-04-08
    294645
    294646 1. Added CHANGES.txt, logging all significant changes to Hadoop. (cutt ing)
    294723
    294724 2. Fix MapReduceBase.close() to throw IOException, as declared in the
    294795 Closeable interface. This permits subclasses which override this
    294865 method to throw that exception. (cutting)
    294911
    294912 3. Fix HADOOP-117. Pathnames were mistakenly transposed in
    294973 JobConf.getLocalFile() causing many mapred temporary files to not
    295043 be removed. (Raghavendra Prabhu via cutting)
    295093
    295095 4. Fix HADOOP-116. Clean up job submission files when jobs complete.
    295165 (cutting)
    295179
    295180 5. Fix HADOOP-125. Fix handling of absolute paths on Windows (cutting)
    295252
    295253 Release 0.1.0 – 2006-04-01
    295280
    295281 1. The first release of Hadoop.
    295314
    cat: Source must be a file.

  200. vlad Said,

    January 18, 2012 @ 6:23 am

    Probably you have some non-file entry (for example, a subdirectory that the job left behind) alongside the text files you are cat-ing.
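
    You can check with bin/hadoop fs -ls Out2; if a directory (such as _logs) shows up in the listing, cat only the part files instead, e.g. bin/hadoop fs -cat Out2/part-* (assuming the default part-NNNNN output file names).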

  201. Krishna Kishore Vangavolu Said,

    January 25, 2012 @ 1:07 am

    Hi, here is the fix for:
    java.io.FileNotFoundException: File C:/tmp/hadoop-kvangavolu/mapred/local/taskTracker/jobcache/job_201201251217_0003/attempt_201201251217_0003_m_000006_3/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)

    Fix :
    1. Put the following chunk in the hadoop-site.xml:

    <property>
      <name>mapred.child.tmp</name>
      <value>C:\tmp</value>
    </property>

    2. Also change the property value of ‘mapred.child.tmp’ to C:\tmp in Edit Hadoop Location -> Advanced parameters of eclipse.

    This should fix the FileNotFound issues

  202. Sandeep Said,

    January 27, 2012 @ 10:14 pm

    Vlad,

    With your tutorial I was able to set up hadoop, but I failed to run the ‘TestDriver’ example. I have been searching the web for over a week for a solution, but no luck. Can you please help me with the error below?

    >>Err>>>>>>>>>>>>>>>

    Meta VERSION=”1″ .
    Job JOBID=”job_201201280957_0001″ JOBNAME=”Hadoop Test_TestDriver\.java-2996741353767810850\.jar” USER=”Administrator” SUBMIT_TIME=”1327725089199″ JOBCONF=”hdfs://localhost:9100/tmp/hadoop-Administrator/mapred/system/job_201201280957_0001/job\.xml” .
    Job JOBID=”job_201201280957_0001″ JOB_PRIORITY=”NORMAL” .
    Job JOBID=”job_201201280957_0001″ LAUNCH_TIME=”1327725089918″ TOTAL_MAPS=”5″ TOTAL_REDUCES=”1″ JOB_STATUS=”PREP” .
    Task TASKID=”task_201201280957_0001_m_000006″ TASK_TYPE=”SETUP” START_TIME=”1327725092793″ SPLITS=”" .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_0″ START_TIME=”1327725093339″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_0″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725097996″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_1″ START_TIME=”1327725098089″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_1″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725101949″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_2″ START_TIME=”1327725102011″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_2″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725105855″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_3″ START_TIME=”1327725105918″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”SETUP” TASKID=”task_201201280957_0001_m_000006″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000006_3″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725109746″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    Task TASKID=”task_201201280957_0001_m_000006″ TASK_TYPE=”SETUP” TASK_STATUS=”FAILED” FINISH_TIME=”1327725109746″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” TASK_ATTEMPT_ID=”" .
    Task TASKID=”task_201201280957_0001_m_000005″ TASK_TYPE=”CLEANUP” START_TIME=”1327725109746″ SPLITS=”" .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_0″ START_TIME=”1327725109793″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_0″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725113668″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_1″ START_TIME=”1327725113730″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_1″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725117543″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_2″ START_TIME=”1327725117605″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_2″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725122027″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_3″ START_TIME=”1327725122058″ TRACKER_NAME=”tracker_192\.168\.65\.129:localhost/127\.0\.0\.1:1276″ HTTP_PORT=”50060″ .
    MapAttempt TASK_TYPE=”CLEANUP” TASKID=”task_201201280957_0001_m_000005″ TASK_ATTEMPT_ID=”attempt_201201280957_0001_m_000005_3″ TASK_STATUS=”FAILED” FINISH_TIME=”1327725125949″ HOSTNAME=”tracker_192\.168\.65\.129″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” .
    Task TASKID=”task_201201280957_0001_m_000005″ TASK_TYPE=”CLEANUP” TASK_STATUS=”FAILED” FINISH_TIME=”1327725125949″ ERROR=”java\.io\.IOException: Task process exit with nonzero status of 1\.
    at org\.apache\.hadoop\.mapred\.TaskRunner\.run(TaskRunner\.java:425)
    ” TASK_ATTEMPT_ID=”" .
    Job JOBID=”job_201201280957_0001″ FINISH_TIME=”1327725125996″ JOB_STATUS=”FAILED” FINISHED_MAPS=”0″ FINISHED_REDUCES=”0″ .

    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    At http://localhost:50060/logs/userlogs/attempt_201201261350_0001_m_000005_0/syslog, I see the following log:

    2012-01-26 14:10:33,259 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
    java.io.FileNotFoundException: File C:/tmp/hadoop-Administrator/mapred/local/taskTracker/jobcache/job_201201261350_0001/attempt_201201261350_0001_m_000005_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:527)
    at org.apache.hadoop.mapred.Child.main(Child.java:143)
    2012-01-26 14:10:33,275 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task

    Thanks!
    Sandeep.

  203. Antony Said,

    January 30, 2012 @ 9:19 pm

    Hello Vlad,

    Thanks for your tutorial!!
    I am running into issues when I try to set up the Hadoop plugin on eclipse.
    The error is: problem opening perspective ‘org.apache.hadoop.eclipse.Perspective’.
    My JDK is jdk1.6.0_25

  204. Antony Said,

    January 31, 2012 @ 12:44 pm

    Never mind, my path was set to JDK 1.5.

  205. Alex Said,

    February 4, 2012 @ 8:23 pm

    Thanx man, your tutorial is more than perfect.
    It is not easy to find tutorials like this.
    Everything worked for me; I ran the WordCount example.

    Keep up the fantastic work!

  206. sina Said,

    February 15, 2012 @ 5:26 am

    hi, can I run hadoop on Windows XP with NetBeans??
    How can I do it (run on WinXP with NetBeans)?
    I installed the hadoop plugin in my NetBeans.

  207. Alfan Said,

    February 16, 2012 @ 7:29 am

    First, thank you for giving such nice and great tutorial :)

    I've followed your instructions step-by-step, and it was a success until the 10th page. The problem happened when I was trying to follow your instructions on the 11th page.

    After step number 6 (page 11), the Project Explorer in my eclipse does not show the HDFS structure…

    it shows..
    DFS Locations
    —> localhost
    ——-> (1)
    ————> (node: null)

    Please kindly reply my message :)

    thank you very much for your attention,

  208. usesay Said,

    February 16, 2012 @ 8:49 pm

    hi Vlad,

    Many thanks for the tutorial.
    I have Win 7 64-bit, Hadoop 0.20.203.0 and Eclipse 3.7, with the plug-in for hadoop-0.203.jar.
    I had issues with my eclipse like this:

    An internal error occurred during: “Connecting to DFS VMware server”. org/apache/commons/configuration/Configuration

    I appreciate your help.

    regards,

    Unisa

  209. vlad Said,

    February 17, 2012 @ 1:47 pm

    Hi,

    The tutorial is written for Hadoop 0.18 and Eclipse 3.3. The eclipse plugin does not work in newer versions of eclipse.

  210. vlad Said,

    February 17, 2012 @ 1:48 pm

    Check your eclipse version. The plugin supplied with hadoop is for Eclipse Galileo; it does not work with newer versions, and the symptoms are exactly as you described above.

  211. usesay Said,

    February 18, 2012 @ 1:26 am

    Hi Vlad,

    Many thanks. In fact I crashed my laptop, so I had to redo everything afresh. In doing so, I got into trouble again with Cygwin; I got this at the command prompt:
    unisa@unisa-PC ~/hadoop-0.20.203.0
    $ bin/hadoop namenode -format
    bin/hadoop: line 297: c:/Program: No such file or directory
    12/02/18 07:37:56 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = unisa-PC/192.168.163.1
    STARTUP_MSG: args = [-format]
    STARTUP_MSG: version = 0.20.203.0
    STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by ‘oom’ on Wed May 4 07:57:50 PDT 2011
    ************************************************************/
    Re-format filesystem in \tmp\hadoop-unisa\dfs\name ? (Y or N) y
    Format aborted in \tmp\hadoop-unisa\dfs\name
    12/02/18 07:37:58 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at unisa-PC/192.168.163.1

    Why line 297?

  212. Vijay jadhav Said,

    February 19, 2012 @ 10:11 pm

    First I would like to say thank you for all these steps………
    but I have a problem when I enter the following command in the 8th step … I get this error and not the expected result.
    What is wrong? plz help me.
    Administrator@nile ~
    $ cd hadoop-0.19.1

    Administrator@nile ~/hadoop-0.19.1
    $ bin/hadoop namenode -format
    cygwin warning:
    MS-DOS style path detected: C:\cygwin\home\ADMINI~1\HADOOP~1.1\/build/native
    Preferred POSIX equivalent is: /home/ADMINI~1/HADOOP~1.1/build/native
    CYGWIN environment variable option “nodosfilewarning” turns off this warning.
    Consult the user’s guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    bin/hadoop: line 243: C:\Program: command not found
    bin/hadoop: line 273: C:\Program Files\Java\jdk1.7.0_02;C:\cygwin\bin;C:\cygwin\usr\bin/bin/java: No such file or directory
    bin/hadoop: line 273: exec: C:\Program Files\Java\jdk1.7.0_02;C:\cygwin\bin;C:\cygwin\usr\bin/bin/java: cannot execute: No such file or directory

    Administrator@nile ~/hadoop-0.19.1
    $

  213. Vijay jadhav Said,

    February 20, 2012 @ 4:21 am

    Now I am facing this error: all the setup steps were done correctly, but at the last step, when I execute on Hadoop, I get this error ………plz help me…….

    12/02/20 16:33:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/02/20 16:33:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    12/02/20 16:33:31 INFO mapred.JobClient: Running job: job_201202201507_0009
    12/02/20 16:33:32 INFO mapred.JobClient: map 0% reduce 0%
    12/02/20 16:33:48 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/02/20 16:33:56 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/02/20 16:34:04 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/02/20 16:34:20 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/02/20 16:34:27 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/02/20 16:34:36 INFO mapred.JobClient: Task Id : attempt_201202201507_0009_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

  214. kavya Said,

    February 20, 2012 @ 6:36 am

    This tutorial is really helpful. But I encountered an error: after entering the map/reduce locations, I am not getting the folders shown within the DFS locations. Why is that so??

  215. Salma Said,

    February 22, 2012 @ 4:50 pm

    I have been following your tutorial and I believe it's excellent, but I'm having trouble starting the taskTracker and the jobTracker; they end up shutting down and I don't know why. Plus, when I configure the MapReduce locations and try to expand the localhost, it gives me 0. Where did it go wrong?
    Thank you again

  216. ZB Said,

    February 23, 2012 @ 5:49 am

    Hi Vlad,

    Thanks for excellent tutorial.

    My hadoop is running on a server, say 192.168.132.128. The mapred.job.tracker value is 192.168.132.128:8021. I am submitting the job via JobClient, setting the property
    conf.set("mapred.job.tracker", "192.168.132.128:8021");

    I am getting the following logs, and the system does not give any response after them. It seems it is waiting on some socket operation.

    Please give me the solution if you are aware. Many Thanks in Advance

    12/02/23 18:14:43 DEBUG ipc.Client: Connecting to /192.168.132.128:8021
    12/02/23 18:14:43 DEBUG ipc.Client: IPC Client (47) connection to /192.168.132.128:8021 from zb sending #1
    12/02/23 18:14:43 DEBUG ipc.Client: IPC Client (47) connection to /192.168.132.128:8021 from zb: starting, having connections 2
    12/02/23 18:14:48 DEBUG ipc.Client: IPC Client (47) connection to /192.168.132.128:8022 from zb: closed
    12/02/23 18:14:48 DEBUG ipc.Client: IPC Client (47) connection to /192.168.132.128:8022 from zb: stopped, remaining connections 1

  217. yuvraj Said,

    February 29, 2012 @ 2:25 am

    Hi, I am getting an error when typing the
    $ cat id_rsa.pub >> authorized_keys
    $ ssh localhost
    commands at the cygwin prompt. Do you have any solution?
    The error is
    Connection closed by ::1

  218. yuvraj Said,

    February 29, 2012 @ 11:28 pm

    I am getting an error even after I disabled the firewall settings:
    error: call to localhost/127.0.0.1 failed on connection exception java.net.ConnectException: Connection refused

    please help me out

  219. vlad Said,

    March 1, 2012 @ 11:29 am

    Hi,

    This error usually happens when nothing is listening on the port or a firewall is rejecting the connections. To check for the first, run ‘netstat -na’ at the command prompt.

    If you do see active ports there, try to telnet or point your browser at localhost and see what happens. Don't forget to specify the port number.
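
    For example, with the port numbers from this tutorial, telnet localhost 9100 should connect if the namenode is up, and netstat -na should show a line for port 9100 in LISTENING state.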

  220. Shruthi Said,

    March 4, 2012 @ 10:35 pm

    Hi,
    I am trying to run an example program on hadoop on ubuntu and I am getting the following exception. Kindly help me.

    12/03/05 10:56:14 INFO input.FileInputFormat: Total input paths to process : 1
    12/03/05 10:56:15 INFO mapred.JobClient: Running job: job_201203051040_0002
    12/03/05 10:56:16 INFO mapred.JobClient: map 0% reduce 0%
    12/03/05 10:56:37 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000002_0, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:56:54 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000002_1, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:57:13 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000002_2, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:57:52 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000001_0, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:58:10 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000001_1, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:58:29 INFO mapred.JobClient: Task Id : attempt_201203051040_0002_m_000001_2, Status : FAILED
    java.net.ConnectException: Call to shru-virtual-machine/192.168.56.101:9000 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at org.apache.hadoop.mapred.Child.main(Child.java:168)
    Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    … 13 more

    12/03/05 10:58:47 INFO mapred.JobClient: Job complete: job_201203051040_0002
    12/03/05 10:58:47 INFO mapred.JobClient: Counters: 0

  221. Salma Said,

    March 6, 2012 @ 4:45 am

    I am trying to execute a project on hadoop-0.20.2, but the Eclipse plugin provided with it was not working properly (when I run on hadoop, the Select Hadoop Location window is not displayed), so I replaced it with the one for hadoop-0.20.3. The window is displayed now, but the project does not give the right output; this is what it gives me (below). Do you have any idea about the nature of this problem?

    12/03/06 11:45:04 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    12/03/06 11:45:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/03/06 11:45:04 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/Users/Salma/workspace/HadoopTest/In
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at TestDriver.main(TestDriver.java:43)
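
    For what it’s worth, the file:/C:/Users/… prefix in the trace above means the job ran against the local filesystem, i.e. the HDFS configuration was not picked up. A minimal sketch of setting it in the driver, assuming the 0.20-era mapred API and the ports used in this tutorial:

    import org.apache.hadoop.mapred.JobConf;

    public class HdfsConfigSketch {
        public static void main(String[] args) {
            // Sketch only: point the job at HDFS and the JobTracker explicitly,
            // instead of the local file: filesystem that was picked up by default.
            JobConf conf = new JobConf(HdfsConfigSketch.class);
            conf.set("fs.default.name", "hdfs://localhost:9100"); // NameNode, tutorial port
            conf.set("mapred.job.tracker", "localhost:9101");     // JobTracker, tutorial port
        }
    }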

  222. Kirankumar Shendage Said,

    March 6, 2012 @ 4:55 am

    How can I integrate hadoop-0.20.203.0 with Eclipse Helios?
    While integrating them I got the following error while adding the DFS location in Eclipse:

    “An internal error occurred during: “Map/Reduce location status updater”.
    org/codehaus/jackson/map/JsonMappingException”

    and the Project Explorer in my Eclipse doesn’t want to show the HDFS structure…

    it was..
    DFS Locations
    —> localhost
    ——-> (1)
    ————> (failure to login )

    Can anybody tell me the solution for that?

  223. Devansh Baghel Said,

    March 9, 2012 @ 4:39 am

    Hi Vlad,

    Thanks for writing such an engaging tutorial.

    I am having problems with running the task tracker and job tracker. I get the below error on the cygwin window. It would be great if you could point me in the right direction to resolve the error.

    12/03/09 16:37:37 INFO mapred.JobTracker: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting JobTracker
    STARTUP_MSG: host = 01hw407915/172.17.69.81
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.2
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by ‘chrisdo’ on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    12/03/09 16:37:38 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
    12/03/09 16:37:38 FATAL mapred.JobTracker: java.lang.RuntimeException: Not a host:port pair: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1579)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)

    12/03/09 16:37:38 INFO mapred.JobTracker: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down JobTracker at 01hw407915/172.17.69.81

  224. Devansh Baghel Said,

    March 9, 2012 @ 4:42 am

    Hi Erfan,

    I noticed that you also got the same error (no. 102) as me. Could you let me know how to resolve it?

    Thanks

  225. yerriswamy Said,

    March 19, 2012 @ 2:51 am

    Hi Vlad, many thanks for providing good installation steps.
    I followed every step correctly, but after executing the wordcount example program I got this error. Please help me resolve it.
    12/03/19 14:59:32 INFO mapred.FileInputFormat: Total input paths to process : 1
    12/03/19 14:59:32 INFO mapred.JobClient: Running job: job_201203191430_0002
    12/03/19 14:59:33 INFO mapred.JobClient: map 0% reduce 0%
    12/03/19 14:59:38 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000003_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/03/19 14:59:43 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000003_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/03/19 14:59:47 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000003_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/03/19 14:59:55 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000002_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/03/19 14:59:59 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000002_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/03/19 15:00:03 INFO mapred.JobClient: Task Id : attempt_201203191430_0002_m_000002_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    Exception in thread “main” java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at hadoopproject.WordCount.run(WordCount.java:149)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at hadoopproject.WordCount.main(WordCount.java:155)

  226. yuvraj Said,

    March 19, 2012 @ 5:30 am

    Hi Vlad
    Really an excellent tutorial; I learned a lot of new things. But the window is not populating when I execute Run on Hadoop in Eclipse, and the program is not executing. Is there any other way I can run my test program?
    Please also provide links for developing mini/small projects on hadoop, to help me enhance my knowledge.

    Regards
    Yuvraj

  227. venkat Said,

    March 20, 2012 @ 7:51 am

    I am getting the following error when executing the command below. Please help me.

    $ bin/hadoop jobtracker
    bin/hadoop: line 258: C:\Program: command not found
    12/03/20 20:19:06 INFO mapred.JobTracker: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting JobTracker
    STARTUP_MSG: host = dell-PC/106.79.170.69
    STARTUP_MSG: args = []
    STARTUP_MSG: version = 0.20.2
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by ‘chrisdo’ on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    12/03/20 20:19:06 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
    12/03/20 20:19:06 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=9101
    12/03/20 20:19:07 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
    12/03/20 20:19:07 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
    12/03/20 20:19:07 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
    12/03/20 20:19:07 INFO http.HttpServer: Jetty bound to port 50030
    12/03/20 20:19:07 INFO mortbay.log: jetty-6.1.14
    12/03/20 20:19:07 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
    12/03/20 20:19:07 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    12/03/20 20:19:07 INFO mapred.JobTracker: JobTracker up at: 9101
    12/03/20 20:19:07 INFO mapred.JobTracker: JobTracker webserver: 50030
    12/03/20 20:19:08 INFO mapred.JobTracker: Cleaning up the system directory
    12/03/20 20:19:08 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
    12/03/20 20:19:08 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-dell/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy4.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy4.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

    12/03/20 20:19:08 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
    12/03/20 20:19:08 WARN hdfs.DFSClient: Could not get block locations. Source file “/tmp/hadoop-dell/mapred/system/jobtracker.info” – Aborting…
    12/03/20 20:19:08 WARN mapred.JobTracker: Writing to file hdfs://localhost:9100/tmp/hadoop-dell/mapred/system/jobtracker.info failed!
    12/03/20 20:19:08 WARN mapred.JobTracker: FileSystem is not ready yet!
    12/03/20 20:19:08 WARN mapred.JobTracker: Failed to initialize recovery manager.
    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-dell/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

  228. Mei Said,

    March 20, 2012 @ 12:55 pm

    Hi Vlad,

    Thanks so much for this detailed tutorial! This is way beyond what I could hope for.

    I followed everything as in the tutorial, including the versions of jdk, eclipse, and hadoop. Now I’m at the very last step, but I got the following error:
    12/03/20 12:47:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/03/20 12:47:33 INFO mapred.FileInputFormat: Total input paths to process : 4
    12/03/20 12:47:33 INFO mapred.JobClient: Running job: job_201203201247_0001
    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:37)

    The error message is so short that I’m guessing something is totally wrong. Can you please help?

    Many many thanks!

  229. Mei Said,

    March 20, 2012 @ 4:06 pm

    Oh, I just found what could be the cause, but I don’t know how to fix it… Below is the error message from the job tracker window. Apparently folder paths in Windows use “\” which is not recognized, but I don’t know how and where to fix it. Can Vlad or someone here help? Thanks!

    12/03/20 12:47:33 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
    java.util.regex.PatternSyntaxException: Illegal hexadecimal escape sequence near index 45
    localhost_[0-9]+_job_201203201247_0001_corp\xiaomezhu_\QHadoopTest_TestDriver.java-4615229745122367962.jar\E+
    ^
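
    That reading of the trace is right: in a Java regex, “\x” starts a hexadecimal escape (\xhh), so a Windows path segment like corp\xiaomezhu is rejected because ‘i’ is not a hex digit. A small runnable demonstration of the same exception:

    import java.util.regex.Pattern;

    public class EscapeDemo {
        public static void main(String[] args) {
            // Throws PatternSyntaxException: Illegal hexadecimal escape sequence,
            // exactly as in the job tracker log above.
            Pattern.compile("corp\\xiaomezhu");
        }
    }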

  230. Sushma Said,

    March 30, 2012 @ 5:25 am

    Hi Vlad,
    I am able to connect to the local HDFS. I need to connect through my Eclipse plugin to a remote machine (HDFS). I have given the IP address and the port correctly but am unable to connect.
    Error: Call to /ipaddress:port failed on local exception: java.io.EOFException

  231. yuvraj Said,

    April 5, 2012 @ 6:22 am

    How can I load a CSV into HDFS?
    When I load simple.xlsx using the $ bin/hadoop fs -put simple.xlsx In command, it gives me an unreadable file (In is a directory in HDFS, listed under DFS in Eclipse).
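
    Note that HDFS stores whatever bytes it is given: an .xlsx file is a binary (zipped) format, so browsing it as text looks unreadable, while a CSV is plain text. A sketch, assuming the sheet is first saved as simple.csv:

    $ bin/hadoop fs -put simple.csv In    # upload the plain-text CSV
    $ bin/hadoop fs -cat In/simple.csv    # now readable as text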

  232. Ravisankar Said,

    April 6, 2012 @ 6:02 pm

    Your tutorial is amazing. I deployed Hadoop successfully in my local system. Thanks for your work.

  233. Veerababu K Said,

    April 9, 2012 @ 12:44 am

    I am getting the following exception when I run the command:
    veerababuk@4csid046 ~/hadoop-1.0.2
    $ bin/hadoop jobtracker

    java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local

    Can anyone guide me on this issue.

    Thanks,
    Veerababu K

  234. Veerababu K Said,

    April 9, 2012 @ 2:34 am

    Hi,
    I was able to resolve the above issue, but I have run into another one.

    When I run the commands
    cd hadoop-0.19.1
    bin/hadoop tasktracker

    I am getting the following Exception:
    Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-veerababuk\mapred\local\ttprivate to 0700

    Can anyone guide me on this issue.

    Thanks,
    Veerababu K

  235. bobvar Said,

    April 14, 2012 @ 8:45 pm

    I got the following error messages when I run TestDriver on hadoop:

    12/04/14 23:37:03 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 0 time(s).
    12/04/14 23:37:05 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 1 time(s).
    12/04/14 23:37:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 2 time(s).
    12/04/14 23:37:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 3 time(s).
    12/04/14 23:37:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 4 time(s).
    12/04/14 23:37:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 5 time(s).
    12/04/14 23:37:15 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 6 time(s).
    12/04/14 23:37:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 7 time(s).
    12/04/14 23:37:19 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 8 time(s).
    12/04/14 23:37:21 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:50040. Already tried 9 time(s).
    Exception in thread “main” java.lang.RuntimeException: java.net.ConnectException: Call to localhost/127.0.0.1:50040 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:323)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:295)
    at TestDriver.main(TestDriver.java:32)
    Caused by: java.net.ConnectException: Call to localhost/127.0.0.1:50040 failed on connection exception: java.net.ConnectException: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:724)
    at org.apache.hadoop.ipc.Client.call(Client.java:700)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:176)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:75)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:319)
    … 2 more
    Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:801)
    at org.apache.hadoop.ipc.Client.call(Client.java:686)
    … 14 more

  236. Mahendran Said,

    April 18, 2012 @ 7:52 am

    HI Yuvraj / Vlad,

    I’m getting the following error when I run the job in Cygwin and from Eclipse.

    I changed the permissions on the directory C:/cygwin/tmp/ to 1777 and am still getting the error. Any clue how to get it fixed?

    $ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /tmp/hadoop-MXP8369/input /tmp/hadoop-MXP8369/output
    12/04/18 10:49:51 INFO input.FileInputFormat: Total input paths to process : 1
    12/04/18 10:49:52 INFO mapred.JobClient: Running job: job_201204181037_0001
    12/04/18 10:49:53 INFO mapred.JobClient: map 0% reduce 0%
    12/04/18 10:50:08 INFO mapred.JobClient: Task Id : attempt_201204181037_0001_m_000002_0, Status : FAILED
    java.io.FileNotFoundException: File C:/cygwin/tmp/hadoop-MXP8369/mapred/local/taskTracker/jobcache/job_201204181037_0001/attempt_201204181037_0001_m_000002_0/work/tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)

  237. Mahendran Said,

    April 18, 2012 @ 8:57 am

    Hi guys, I’m able to run hadoop successfully in my local Windows environment without any issues.

  238. Yixi Said,

    April 24, 2012 @ 8:10 am

    Hi, I am getting “ERROR namenode.NameNode: java.io.IOException: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami” when I try to run step #8, “format namenode”.
    Is there anything I can do about it?

  239. Amit Said,

    April 24, 2012 @ 10:50 pm

    Hi All,

    I am also getting an exception saying
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
    12/04/24 22:50:20 ERROR security.UserGroupInformation: PriviledgedActionException as:NEHA cause:java.io.IOException: File /tmp/hadoop-NEHA/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    12/04/24 22:50:20 INFO ipc.Server: IPC Server handler 1 on 9100, call addBlock(/tmp/hadoop-NEHA/mapred/system/jobtracker.info, DFSClient_1779551535, null) from 127.0.0.1:52300: error: java.io.IOException: File /tmp/hadoop-NEHA/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    java.io.IOException: File /tmp/hadoop-NEHA/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

  240. krishna reddy Said,

    April 27, 2012 @ 12:26 am

    Hi guys,
    this was a nice tutorial.

    When I start the job tracker in the third window (bin/hadoop jobtracker) I get the following error:
    FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)

    I hope you guys can suggest a solution for this exception as soon as possible.
    Thanks,
    krishna reddy

  241. Andrei Said,

    May 1, 2012 @ 4:15 pm

    Hi

    The version of Cygwin available from http://www.cygwin.com/ is now different from the one you described, and the following command does not run as per your notes:

    ssh-host-config

    Consequently, under Services and Applications I cannot find “CYGWIN sshd”.

    Could you help with the new version please?

    many thanks

    Andrei

  242. FlanCay Said,

    May 1, 2012 @ 5:47 pm

    ssh: connect to host localhost port 22: Connection refused

  243. Div Said,

    May 8, 2012 @ 2:50 am

    Nice tutorial!!!!
    I am getting this error :

    Problems opening perspective “org.apache.hadoop.eclipse.Perspective”

    The versions used are:
    jdk 1.6
    Eclipse europa 3.3
    hadoop 0.19.1

  244. Emma Said,

    May 14, 2012 @ 12:35 pm

    I have the same problem as in comment 128: “I’m stuck in step 3 – setup ssh daemon” (by Troy).
    Can someone suggest what I should do in order to get past this step?

    Thanks!!!

  245. Ram Said,

    May 23, 2012 @ 5:54 am

    I have a problem with the Eclipse plugin.
    Once I configure the Map/Reduce location in Eclipse:

    Location Name — localhost

    Map/Reduce Master
    Host — localhost
    Port — 9101

    DFS Master
    Check “Use M/R Master Host”
    Port — 9100
    User name — RAM

    After clicking localhost, below is the error:
    —————————————————-
    Cannot connect to the Map/Reduce location:localhost.
    Failed to get the current user’s information.
    —————————————————-

    Please help me solve the issue, thank you!

  246. eddy Said,

    May 23, 2012 @ 8:14 am

    Hi… while selecting packages during the Cygwin installation I’m unable to select openssh; there’s no box to check it. It says n/a instead of a box to select. Can you help me as soon as possible?

  247. Ram Said,

    May 23, 2012 @ 12:24 pm

    Hi Vlad,
    I followed your suggestions for the second-to-last step and set up a new Map/Reduce location in Eclipse with the following details, but I got an error.

    Location Name — localhost
    Map/Reduce Master
    Host — localhost
    Port — 9101
    DFS Master
    Check “Use M/R Master Host”
    Port — 9100
    User name — User

    Below is the error I am getting. Please respond ASAP.

    Cannot connect to the Map/Reduce location: localhost
    Failed to get the current user’s information.

    Thanks,
    Ram

  248. R.Padmapriya Said,

    May 28, 2012 @ 1:00 am

    Sir,

    Thanks for helping me use Eclipse to execute the map reduce job in hadoop.

    Your work is really remarkable.

  249. VahidSanei Said,

    June 4, 2012 @ 12:42 am

    Hi, I added the WordCount code in Eclipse,
    but I’m getting this error:
    Exception in thread “main” Exception in thread “main” java.lang.ExceptionInInitializerError
    at WordCount.main(WordCount.java:16)
    Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class ‘org.apache.commons.logging.impl.Log4JLogger’ cannot be found or is not useable.
    at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:704)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:153)
    … 1 more
    What should I do?
    Thanks in advance

  250. Indra vikas Said,

    June 7, 2012 @ 1:07 am

    There is no folder named eclipse-plugin inside hadoop-1.0.3/contrib. Please help me with this.

  251. hannan Said,

    June 7, 2012 @ 7:01 am

    I have tried several computers; when I enter “ssh localhost” nothing happens, and I don’t know why.
    This problem has been bothering me for two days…

  252. Raj Said,

    June 19, 2012 @ 1:16 pm

    Can’t seem to find the hadoop-site.xml… plus I have a question: do we need hadoop installed as well as hbase?
    Thanks!

  253. Raj Said,

    June 19, 2012 @ 1:41 pm

    I get this error when executing “$ bin/hadoop namenode -format”

    srajabha@NY180W15T3500C ~/hadoop-0.20.2
    $ bin/hadoop namenode -format
    bin/hadoop: line 2: $’\r’: command not found
    bin/hadoop: line 17: $’\r’: command not found
    bin/hadoop: line 18: $’\r’: command not found
    bin/hadoop: line 43: $’\r’: command not found
    : No such file or directoryn
    bin/hadoop: line 46: $’\r’: command not found
    : No such file or directorysrajabha/hadoop-0.20.2
    bin/hadoop: line 48: $’\r’: command not found
    bin/hadoop: line 50: syntax error near unexpected token `$’in\r”
    ‘in/hadoop: line 50: `case “`uname`” in
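
    The “$’\r’: command not found” messages are the signature of Windows (CRLF) line endings in the bin/hadoop script, typically introduced by unpacking the archive with a Windows tool. One common fix, assuming Cygwin’s dos2unix/d2u utility (from the cygutils package) is installed:

    $ d2u bin/hadoop              # strip the carriage returns from the script
    $ bin/hadoop namenode -format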

  254. alex Said,

    June 27, 2012 @ 12:50 am

    I downloaded the version hadoop-0.20.204.0.
    Everything works fine till I get to the “start local hadoop cluster” section.

    I then try to run the command
    bin/hadoop jobtracker and get “no such file or directory”. I tried this for datanode and tasktracker and got the same result. I am wondering if I downloaded and installed the right package. Could you please explain where I went wrong? I am using Windows 7, if that helps.

  255. Sébastien Bouffard Said,

    July 15, 2012 @ 9:19 am

    Hello all,

    Does this tutorial, written 4 years ago, still work with more recent versions of hadoop? I run into problems when trying to start the JobTracker with version 1.0.3, which is the stable release as of now.

    FATAL mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2560)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2200)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2192)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2186)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978)

  256. Sébastien Bouffard Said,

    July 15, 2012 @ 9:37 am

    Actually, it’s a great tutorial, but it definitely needs some updating for newer versions of hadoop, especially at step 7, where the config is now broken down into several files (see the sketch below).

    Thanks anyway, I’ll make another attempt at getting this to run on Windows when I have more free time.
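
    As a hedged illustration of that step-7 change: in the 1.x line, the single hadoop-site.xml is split across three files under conf/, roughly as below (ports as used in this tutorial). Leaving mapred.job.tracker at its default of “local” is what produces the “Does not contain a valid host:port authority: local” error quoted above.

    <!-- conf/core-site.xml -->
    <configuration>
      <property><name>fs.default.name</name><value>hdfs://localhost:9100</value></property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
      <property><name>mapred.job.tracker</name><value>localhost:9101</value></property>
    </configuration>

    <!-- conf/hdfs-site.xml -->
    <configuration>
      <property><name>dfs.replication</name><value>1</value></property>
    </configuration>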

  257. MHF Said,

    July 17, 2012 @ 1:35 am

    Thanks for your tutorial! I’m installing Hadoop step by step. I want to know which category openssh is under when installing Cygwin. I used the link to Cygwin in the tutorial.
    Thank you.

  258. rashmi Said,

    July 18, 2012 @ 10:31 pm

    Hi,

    I installed the hadoop stable version successfully, but am confused while installing the hadoop-2.0.0 version.

    I want to install hadoop-2.0.0-alpha on two nodes, using federation on both machines. “rsi-1″ and “rsi-2″ are the hostnames.

    What should the values of the properties below be for an implementation of federation? Both machines are also used as datanodes.

    fs.defaultFS
    dfs.federation.nameservices
    dfs.namenode.name.dir
    dfs.datanode.data.dir
    yarn.nodemanager.localizer.address
    yarn.resourcemanager.resource-tracker.address
    yarn.resourcemanager.scheduler.address
    yarn.resourcemanager.address

    One more point: in the stable version of hadoop I have the configuration files under the conf folder in the installation directory.

    But in the 2.0.0-alpha version there is an etc/hadoop directory, and it doesn’t have mapred-site.xml or hadoop-env.sh. Do I need to copy the conf folder under the share folder into the hadoop home directory, or do I need to copy these files from the share folder into the etc/hadoop directory?

    Regards,
    Rashmi

  259. MHF Said,

    July 22, 2012 @ 2:47 am

    hi vlad,
    I followed every step, and after executing TestDriver I got this error. Please help me resolve it. Thank you.

    12/07/22 14:08:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/07/22 14:08:07 INFO mapred.FileInputFormat: Total input paths to process : 4
    12/07/22 14:08:07 INFO mapred.JobClient: Running job: job_201207221120_0014
    12/07/22 14:08:08 INFO mapred.JobClient: map 0% reduce 0%
    12/07/22 14:08:16 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000007_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/07/22 14:08:20 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000007_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/07/22 14:08:23 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000007_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/07/22 14:08:31 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/07/22 14:08:35 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    12/07/22 14:08:38 INFO mapred.JobClient: Task Id : attempt_201207221120_0014_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:41)

  260. Suresh S Said,

    August 1, 2012 @ 10:12 pm

    I followed the steps properly.

    But the following command is not working…
    “explorer .”

    It gives the following error:
    “bash: explorer: command not found”

  261. awei Said,

    August 3, 2012 @ 2:06 am

    I have a problem… win7 + hadoop1.0.3 + eclipse-jee-helios-SR2-win32.
    When I run wordcount, I get the following error:
    12/08/03 16:51:04 INFO input.FileInputFormat: Total input paths to process : 4
    12/08/03 16:51:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
    12/08/03 16:51:04 WARN snappy.LoadSnappy: Snappy native library not loaded
    12/08/03 16:51:05 INFO mapred.JobClient: Running job: job_201208031641_0004
    12/08/03 16:51:06 INFO mapred.JobClient: map 0% reduce 0%
    12/08/03 16:51:13 INFO mapred.JobClient: Task Id : attempt_201208031641_0004_m_000005_0, Status : FAILED
    java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
    Caused by: java.io.IOException: Task process exit with nonzero status of -1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

    12/08/03 16:51:13 WARN mapred.JobClient: Error reading task outputhttp://awei:50060/tasklog?plaintext=true&attemptid=attempt_201208031641_0004_m_000005_0&filter=stdout
    12/08/03 16:51:13 WARN mapred.JobClient: Error reading task outputhttp://awei:50060/tasklog?plaintext=true&attemptid=attempt_201208031641_0004_m_000005_0&filter=stderr
    12/08/03 16:51:19 INFO mapred.JobClient: Task Id : attempt_201208031641_0004_m_000005_1, Status : FAILED
    java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
    Caused by: java.io.IOException: Task process exit with nonzero status of -1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

  262. EE Said,

    August 10, 2012 @ 5:10 pm

    Hi,
    Thanks for the tutorial!
    I can’t browse the DFS in eclipse, I get the error: Error: Call to localhost/127.0.0.1:9100 failed on local exception: java.io.EOFException
    HDFS is running and the text files are there.
    I am using eclipse Europa and Hadoop 0.20.2 cloudera cdh3u1.
    Why do you set the username to ‘User’?

  263. Kannibala Said,

    August 16, 2012 @ 3:46 am

    Hi Vlad,

    Thanks for the tutorial. I set up my environment exactly as you specified in the tutorial. However, when I run my project from Eclipse (by selecting the Run on Hadoop option), nothing happens and it fails silently. It doesn’t give any error. What could be the issue? Please reply.

    Regards,
    kannibala

  264. Pavan Said,

    September 9, 2012 @ 9:34 am

    Hi all,

    I’m stuck with following permissions related error while running in windows7.

    java.io.IOException: Failed to set permissions of path: \tmp\hadoop-pavan\mapred\local\ttprivate to 0700

    Any workaround or patch to resolve this issue ? I tried running from cygwin & eclipse, still same issue.

    Thanks,
    Pavan

  265. Tathagat Said,

    September 12, 2012 @ 1:28 am

    Hi Vlad,

    I came across this tutorial a bit late and have already set up hadoop following the procedure mentioned on the apache site. I am using a Windows 7 system with Cygwin. However, I am facing an issue; if possible, can you please suggest some way out of this? The issue is that I am able to start hadoop, and the 50070 link also opens for me, but the number of live nodes shows as zero. Also, when I do hadoop fs -ls I get the below:

    -rw-r–r– 2 localhost supergroup 107 2012-09-07 18:29 /user/User

    I cannot use mkdir because it tries to create a directory inside /user/User, which is itself a file. Can you please suggest something?
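
    One way out, assuming the file sitting at /user/User is disposable: remove it so the path can be recreated as a directory, for example:

    $ bin/hadoop fs -rm /user/User      # the home path is currently a file
    $ bin/hadoop fs -mkdir /user/User   # recreate it as a directory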

  266. Vin Said,

    September 21, 2012 @ 10:32 am

    Hello, how do I get cygwin setup.exe version 2.573.2.2?
    The latest version that I downloaded does not have an openssh package that I can select for install.

    Thanks,
    Vin

  267. mc Said,

    October 3, 2012 @ 9:51 am

    Thank you for the tutorial!
    I have completed all the steps except the last one; when I try to “run as hadoop”, the execution does not start. Can someone give me a hand resolving this problem?

    Thank you so much,
    Best regards

  268. Paresh Shah Said,

    October 3, 2012 @ 11:50 am

    Can anyone please clarify why the following is explicitly stated


    Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

    Does the above mean that there are problems running a Hadoop cluster on Windows? I was interested in knowing if there are folks out there who have actually deployed a Hadoop cluster on Windows and what kinds of issues they have faced.

    thanks

  269. Luis Said,

    October 7, 2012 @ 12:35 am

    Hello,
    Thanks for the tutorial!! I was trying to understand how hadoop works and with this tutorial I achieved it… I had some problems, but with the passing of time I will be able to understand more of hadoop!!!

  270. parmesh Said,

    October 7, 2012 @ 10:20 am

    Thank you for the tutorial!
    I got the following problem:

    $ ssh localhost yes
    ssh: connect to host localhost port 22: Connection refused

  271. vlad Said,

    October 7, 2012 @ 10:22 am

    Most likely your ssh service is not running.
    Check your process list to see if there is an ‘sshd’ process.
    If it is running, check your firewall settings (try disabling it completely first).
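
    A quick check from the Cygwin prompt might look like this (service name assumed to be sshd, as created by ssh-host-config):

    $ ps -ef | grep sshd       # is an sshd process running?
    $ cygrunsrv --query sshd   # state of the CYGWIN sshd service
    $ net start sshd           # start it if it is stopped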

  272. Balaswamy Said,

    October 10, 2012 @ 2:47 am

    really good one

  273. Suresh V Said,

    October 10, 2012 @ 4:56 am

    Can you make another tutorial for the latest version of hadoop, please?

  274. vaibhav Said,

    October 12, 2012 @ 12:18 am

    Hello vlad, your tutorial is very helpful.
    There is only one problem, at step 11, Setup Hadoop Location in Eclipse.
    At step 6, in the Project Explorer tab on the left side of the Eclipse window, I found the DFS location and clicked the “+” icon. There is a folder named (1). When I keep opening it, what follows is not “tmp(1)” but “Error: call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information”.
    I think my Cygwin environment variables are right,
    so I don’t know what’s wrong with it.
    Thanks

  276. Nelson Said,

    October 14, 2012 @ 11:07 pm

    Thank you for the tutorial!

  277. Sagar Deep Singh Said,

    October 19, 2012 @ 1:23 pm

    Hey very nice tutorial.
    I am trying it on my system, but when I run ssh-host-config I get the error message “csih-0.9.6 requires WinNT or above”. I am using Windows XP. Please let me know whether this works on WinXP.

  278. Mouthgalya Ganapathy Said,

    October 27, 2012 @ 7:44 pm

    Hi Vlad,
    Big thanks for the tutorials!!! They were of great help…
    I could set up everything without any errors. In the end, when I run the TestDriver class, I get the following error in the Cygwin window:

    12/10/27 22:07:14 ERROR mapred.EagerTaskInitializationListener: Job initialization failed:
    java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 45
    localhost_[0-9]+_job_201210272120_0001_infor\mganapathy_\QHadoop Test_TestDriver.java-631490389041719967.jar\E+
    ^
    at java.util.regex.Pattern.error(Pattern.java:1924)
    at java.util.regex.Pattern.escape(Pattern.java:2416)
    at java.util.regex.Pattern.atom(Pattern.java:2164)
    at java.util.regex.Pattern.sequence(Pattern.java:2097)
    at java.util.regex.Pattern.expr(Pattern.java:1964)
    at java.util.regex.Pattern.compile(Pattern.java:1665)
    at java.util.regex.Pattern.<init>(Pattern.java:1337)
    at java.util.regex.Pattern.compile(Pattern.java:1022)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:803)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:360)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:55)
    at java.lang.Thread.run(Thread.java:722)

    Exception in thread “initJobs” java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 45
    localhost_[0-9]+_job_201210272120_0001_infor\mganapathy_\QHadoop Test_TestDriver.java-631490389041719967.jar\E+
    ^
    at java.util.regex.Pattern.error(Pattern.java:1924)
    at java.util.regex.Pattern.escape(Pattern.java:2416)
    at java.util.regex.Pattern.atom(Pattern.java:2164)
    at java.util.regex.Pattern.sequence(Pattern.java:2097)
    at java.util.regex.Pattern.expr(Pattern.java:1964)
    at java.util.regex.Pattern.compile(Pattern.java:1665)
    at java.util.regex.Pattern.<init>(Pattern.java:1337)
    at java.util.regex.Pattern.compile(Pattern.java:1022)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:638)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
    at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1549)
    at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2320)
    at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2004)
    at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2019)
    at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2095)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
    at java.lang.Thread.run(Thread.java:722)

    I think because of this, the job in the output console doesn’t go to completion… it kind of hangs with the following output. Any ideas to resolve this? Can I give the user name mganapathy in Map/Reduce Locations, or should it be only “User”?

    12/10/27 22:25:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    12/10/27 22:25:02 INFO mapred.FileInputFormat: Total input paths to process : 4
    12/10/27 22:25:03 INFO mapred.JobClient: Running job: job_201210272120_0004
    12/10/27 22:25:04 INFO mapred.JobClient: map 0% reduce 0%

  279. Kiran Said,

    November 5, 2012 @ 12:20 pm

    I am using the hadoop-1.0.4 version, but the Eclipse plug-in is missing in this hadoop version. Can you please tell us how to create an Eclipse plug-in JAR from the source code in hadoop-1.0.4\src\contrib\eclipse-plugin?

  280. Kiran Panga Said,

    November 8, 2012 @ 11:45 am

    The eclipse-plugin is not available with the hadoop-1.0.4 version. Can you please tell us how to create the Eclipse plugin JAR from the source which is available in the hadoop-1.0.4 version?

  281. shretha Said,

    November 11, 2012 @ 6:19 am

    Thanks for the tutorial,
    but I am having trouble running Cygwin on Windows 7.
    It is installed as described in the tutorial, but when I enter the first command,
    “ssh-host-config”,
    it says command not found.

  282. PKS Said,

    November 21, 2012 @ 4:19 pm

    When I run the commands in windows 3, 4 and 5, it says that job tracker, datanode and task tracker…

  283. Mahi Said,

    November 22, 2012 @ 11:38 am

    Thanks for the tutorial. It’s very clear. I was able to set up the environment in just a few hours :)

  284. tushar sarde Said,

    December 1, 2012 @ 4:03 am

    ssh: connect to host localhost port 22: Connection refused

    Help me please… I got this error after the authorized_keys step.

  285. Rahul Said,

    December 7, 2012 @ 3:58 am

    You have mentioned 3 ‘openssh’ entries, but I found only 2, i.e.:

    1) the OpenSSH server and client programs (under debug, default)
    2) debug info for openssh (under net, default)

  286. Aniket Said,

    December 9, 2012 @ 1:00 pm

    I could not find the C:/cygwin directory… help please.

  287. Rahul Said,

    December 12, 2012 @ 7:20 am

    @Aniket, you might not have installed it in this directory. Try installing again.

  288. Rahul Said,

    December 12, 2012 @ 7:22 am

    The error I get after executing step no. 8 is “JAVA_HOME not set”. What should I do?

  289. David Harris Said,

    January 3, 2013 @ 12:08 pm

    Thank you for going through the effort of putting this tutorial up; it helped me out a lot.

  290. Karthik Said,

    January 9, 2013 @ 9:39 am

    Hi vlad
    I followed your tutorial and everything seemed to be working fine. But while running I got the following error. Any help would be greatly appreciated.

    13/01/09 11:32:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    13/01/09 11:32:53 INFO mapred.FileInputFormat: Total input paths to process : 4
    13/01/09 11:32:54 INFO mapred.JobClient: Running job: job_201301091108_0005
    13/01/09 11:32:55 INFO mapred.JobClient: map 0% reduce 0%
    13/01/09 11:33:08 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/01/09 11:33:15 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/01/09 11:33:21 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/01/09 11:33:35 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/01/09 11:33:41 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/01/09 11:33:48 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at Test.main(Test.java:41)

    Thanks
    Karthik

  291. Sachin Said,

    January 13, 2013 @ 12:13 am

    Thank you for such a nice tutorial!! I’m new to hadoop; can you please suggest some tutorials for learning it!

  292. kalyani Said,

    January 22, 2013 @ 2:50 am

    Hi,
    I have been trying to set up a hadoop single-node cluster on Windows, but I am getting the error “WARN fs.FileSystem: “localhost:9100″ is a deprecated filesystem name. Use “hdfs://localhost:9100/” instead.” when trying a sample check with the command:
    “bin/hadoop dfs -mkdir input”, i.e., simple creation of an input directory.
    Can anyone suggest a solution?
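
    The warning itself names the fix: the filesystem URI needs the hdfs:// scheme. A sketch of the corrected property (file layout and port as in this tutorial’s setup):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9100</value> <!-- not the bare localhost:9100 -->
    </property>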

  293. Somayeh Said,

    January 28, 2013 @ 2:46 am

    Thanks for your perfect tutorial. I have installed Cygwin, Java and Eclipse, but in the tutorial you said to run the command “explorer .”; when I type it and press the enter key this message is shown:
    hamyari 01@hamyari-01 ~
    $ explorer .
    -bash: explorer: command not found

    hamyari 01@hamyari-01 ~
    $
    Please help me solve this problem.
    thank you so much

  294. vlad Said,

    January 28, 2013 @ 8:08 am

    Hi,

    Check your PATH variable using the command below

    echo $PATH

    Make sure that WINDOWS/System32 is in your path…
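
    If it is missing, it can be appended for the current session as below (and made permanent in ~/.bashrc); the path assumes Windows is installed on C::

    $ export PATH=$PATH:/cygdrive/c/Windows/System32
    $ explorer .    # should now open an Explorer window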

  295. samba Said,

    January 29, 2013 @ 8:16 am

    Hi,
    I am unable to configure the hadoop location in Eclipse.
    Could you please help me configure it?
    Since I installed Windows HDInsight my port numbers are different. Where exactly can I see the port number?
    Is it in the core-site.xml, hdfs-site.xml or mapred-site.xml file in the conf folder?

  296. Somayeh Said,

    February 2, 2013 @ 5:07 am

    Hi,
    thanks for your kindness. As you said, I used the command below:
    echo $PATH
    but the WINDOWS/System32 root was not on the path. Now, how can I manage this problem?
    Best regards

  297. Somayeh Said,

    February 2, 2013 @ 5:23 am

    Hi,
    another question:
    to install Cygwin, is it necessary to be connected to the internet?

  298. JaanVivek Said,

    February 13, 2013 @ 11:32 pm

    Thanks for your blog.

    While setting up the ssh daemon I am facing a problem.

    Here is a snippet of the problem lines:

    admin@admin-PC ~/.ssh
    $ ssh localhost
    ssh: connect to host localhost port 22: Connection refused

    I have followed all steps prior to this as mentioned in the blog.

    Please advise in this case.

    Thanks.
    JaanVivek

  299. Valentina Said,

    February 17, 2013 @ 11:15 am

    Hi!
    I cannot start the CYGWIN sshd service… “CYGWIN sshd” starts and then stops automatically…
    Help please :)

  300. jaanvivek Said,

    February 18, 2013 @ 4:54 am

    I am having an issue with the CYGWIN installation on Windows 7 Ultimate 32-bit.

    $ ssh localhost
    Connection closed by ::1

    Could anyone help in this.

  301. Marwa Said,

    February 19, 2013 @ 1:36 pm

    Thanks for this tutorial. I followed it step by step, but now I have the following error at the configure-ssh step: when I typed ssh localhost, Cygwin wrote “Warning: Permanently added ‘localhost’ to the list of known hosts. Connection closed by 127.0.0.1”. How can I make the connection succeed?
    Please help me.

  302. Avinash Said,

    February 20, 2013 @ 7:37 am

    Thanks for this nice blog. I executed those steps and hadoop runs on Windows without any problem. Can you enlighten us on how to install Hive on this Windows hadoop cluster?

  303. JaanVivek Said,

    February 25, 2013 @ 3:00 am

    I am having an issue while adding the Map/Reduce location in Eclipse.

    I followed the steps but got an error at the following step:
    Step 6. In the Project Explorer tab on the left-hand side of the eclipse window, find the DFS Locations item. Open it up using the “+” icon on the left side of it; inside of it you should see the localhost location reference with the blue elephant icon. Keep opening up the items below it until you see something like the image below.

    Error: “Call to localhost 127.0.0.1:9100 failed on connection exception”

    Does anyone have any idea about it?

    I installed it on Windows 7 32-bit, with Cygwin.

    Thanks.

  304. sachin Said,

    February 26, 2013 @ 4:39 am

    Sir, after entering ssh localhost I am getting this error:
    SACHIN@SACHIN-PC ~/.ssh
    $ ssh localhost
    ssh: connect to host localhost port 22: Connection refused

    Please tell me an alternative.

  305. vlad Said,

    February 26, 2013 @ 9:20 am

    The most common reason for such an error is that your SSH service is not running. Check the Services tab in the Control Panel to verify that it is indeed running. Also use ‘netstat -na’ to verify that your SSH server is indeed listening on port 22.
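
    For example, a listener on port 22 should show up in state LISTENING (the grep filter is just one way to narrow the output):

    $ netstat -na | grep ":22"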

  306. vlad Said,

    February 26, 2013 @ 9:22 am

    Most likely your hadoop version or Eclipse version is too new… this is a pretty old tutorial, and things have changed a lot since then. To my knowledge there is no up-to-date plugin for the current versions of Eclipse…

  307. JaanVivek Said,

    February 26, 2013 @ 1:11 pm

    Thanks for your reply. I am getting this error in Cygwin:

    $ ssh localhost
    Connection closed by ::1

    Could anyone help with this?

    ** CYGWIN sshd is running.

  308. JaanVivek Said,

    March 4, 2013 @ 1:53 pm

    I am facing issues with the “upload data” step as mentioned in the blog.
    Here is a script snippet:

    vivek@lenovo-963b3048 ~/hadoop-0.19.1
    $ bin/hadoop fs -mkdir In
    bin/hadoop: line 243: C:\Program: command not found
    mkdir: cannot create directory In: File exists

    I have searched on Google, but I cannot figure out how to remove the space when setting JAVA_HOME=c:\Program Files\javjre…

    Please help.

    Thanks
    JaanVivek

  309. Vlad Said,

    March 5, 2013 @ 9:42 am

    Vivek,

    The reason you are seeing this is that your JDK is installed into C:\Program Files, and a lot of Java software doesn’t like that. The easiest way to fix this is to install the JDK into C:\Java. You can pretty much copy it to that location and set your JAVA_HOME to C:\Java.
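
    The setting lives in conf/hadoop-env.sh; with the JDK copied to a space-free location, the line might look like this (the exact JDK folder name below is an assumption):

    # conf/hadoop-env.sh
    export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0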

  310. Micheal Said,

    March 6, 2013 @ 11:07 pm

    Hi,Vlad

    Hadoop has had many releases since hadoop-0.19.1, and Eclipse has also released many new versions since 3.3.2. So, are there any newer combinations that work OK on Windows by now? For example, do Hadoop-0.20.2 and Eclipse 3.5 work well together? Would you please be so kind as to list all the working combinations for us to choose from and reference?

    Micheal

  311. Anuradha Dharapuram Said,

    March 11, 2013 @ 10:54 pm

    Hi,

    When I try to install Cygwin, I do not see the option to select OpenSSH as mentioned in your instructions.

    Please let me know what needs to be done.

    Thanks,
    Anuradha

  312. Prashant Said,

    March 12, 2013 @ 6:31 am

    Thanks for this tutorial; it is very nice. I followed its steps and completed my hadoop setup on my system through the last step; I created the test project, and its output is also the same as in this tutorial.

  313. Mohammad Asad Ansari Said,

    March 17, 2013 @ 10:56 pm

    I have implemented the 13 steps in Eclipse successfully and run the application successfully, but I am unable to run any other example. Can you help me with how to implement an example here?

  314. divya Said,

    March 17, 2013 @ 11:40 pm

    How do I establish the localhost connection?

  315. Shibaji Said,

    March 19, 2013 @ 11:29 pm

    Hi,
    I am getting an error when trying to start the server,
    something like this: “13/03/20 11:42:31 ERROR namenode.NameNode: java.io.IOException: Unexpected version of the file system log file: -32. Current version = -18.”

    Total Stack Trace
    ——————-
    13/03/20 11:42:31 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9100
    13/03/20 11:42:31 INFO namenode.NameNode: Namenode up at: 127.0.0.1/127.0.0.1:9100
    13/03/20 11:42:31 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
    13/03/20 11:42:31 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
    13/03/20 11:42:31 INFO namenode.FSNamesystem: fsOwner=shisanya\shibaji,None,root,Administrators,Users,SDB,Operators
    13/03/20 11:42:31 INFO namenode.FSNamesystem: supergroup=supergroup
    13/03/20 11:42:31 INFO namenode.FSNamesystem: isPermissionEnabled=true
    13/03/20 11:42:31 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
    13/03/20 11:42:31 INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
    13/03/20 11:42:31 INFO common.Storage: Number of files = 1
    13/03/20 11:42:31 INFO common.Storage: Number of files under construction = 0
    13/03/20 11:42:31 INFO common.Storage: Image file of size 113 loaded in 0 seconds.
    13/03/20 11:42:31 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
    java.io.IOException: Unexpected version of the file system log file: -32. Current version = -18.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:508)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:973)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:793)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:352)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
    13/03/20 11:42:31 INFO ipc.Server: Stopping server on 9100
    13/03/20 11:42:31 ERROR namenode.NameNode: java.io.IOException: Unexpected version of the file system log file: -32. Current version = -18.
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:508)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:973)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:793)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:352)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:288)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)

    13/03/20 11:42:31 INFO namenode.NameNode: SHUTDOWN_MSG:

    Can anyone help me out?
    Thanks a lot in advance… :-)

  316. keerthi Said,

    March 25, 2013 @ 3:55 am

    I am facing problems with ssh localhost, sir. I really don’t know how to establish it. Please help me, sir.

  317. Hardik Shah Said,

    March 30, 2013 @ 5:44 pm

    I am getting an error while starting the CYGWIN sshd service:
    the service starts and then stops automatically.
    What should I do?

  318. Hardik Shah Said,

    March 30, 2013 @ 5:51 pm

    Sometimes it does start,
    but then I still get an error on ssh localhost:
    connect to host localhost port 22: Connection refused

  319. robin Said,

    April 8, 2013 @ 4:42 am

    I’m using hadoop-1.0.4.

    When I give the command
    “$ bin/hadoop namenode -format”
    I get the error below… please help me.

    “/usr/local/hadoop-1.0.4/libexec/../conf/hadoop-env.sh: line 9: export: `FilesJavajdk1.7.0_13′: not a valid identifier
    bin/hadoop: line 320: C:Program/bin/java: No such file or directory
    bin/hadoop: line 390: C:Program/bin/java: No such file or directory
    bin/hadoop: line 390: exec: C:Program/bin/java: cannot execute: No such file or directory”

  320. Nidhi Said,

    April 14, 2013 @ 1:35 am

    Hello Vlad Korolev,

    Thanks for the excellent tutorial.
    What happens if I am behind a proxy? Is there any problem running it that way, or will it still work?

    I can’t connect to the localhost… why?

  321. Piyush Srivastava Said,

    April 16, 2013 @ 5:32 am

    I’m using hadoop-1.0.4.
    When I give the command
    scp ~/.ssh/id_dsa.pub hadoop@hadoop-pc:~/.ssh/master-key.pub

    I get the error below… please help me.

    ssh: connect to host ruchi-pc port 22: Connection timed out
    lost connection

  322. koyison Said,

    April 17, 2013 @ 6:54 pm

    Thanks for a great tutorial.
    I am using hadoop-0.20.2 with Eclipse Classic 4.4.2.

    DFS Location error: I can’t load the files; I am getting the following error: [Call to localhost/127.0.0.1 failed on connection exception: java.net.ConnectException: Connection refused: no further information]

    Also, when I try to ssh localhost:
    $ ssh localhost
    Connection closed by ::1
    Please help… deny_tonny_83@yahoo.com

  323. Siddharth Said,

    April 18, 2013 @ 5:52 pm

    Hi Vlad,

    Great Tutorial for beginners like me.

    But I get stuck when I try to format the namenode for HDFS.

    When I enter the following command
    “bin/hadoop namenode -format”

    I get the following error:

    cygwin warning:
    MS-DOS style path detected: C:\cygwin\hadoop-0.20.2\/build/native
    Preferred POSIX equivalent is: /hadoop-0.20.2/build/native
    CYGWIN environment variable option “nodosfilewarning” turns off this warning.
    Consult the user’s guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    bin/hadoop: line 258: C:\Java\jdk1.7.0_13/bin/java: No such file or directory
    bin/hadoop: line 289: C:\Java\jdk1.7.0_13/bin/java: No such file or directory
    bin/hadoop: line 289: exec: C:\Java\jdk1.7.0_13/bin/java: cannot execute: No such file or directory

    Is there a problem with my environment variables or something?

    Please help.

    Thanks

  324. Aravinth Said,

    April 23, 2013 @ 6:43 am

    Hi,
    I tried to install Cygwin and ran the ssh-host-config command in the terminal, but I am getting the error below.

    Error Log:
    /usr/bin/ssh-host-config: line 49: /usr/share/csih/cygwin-service-installation-helper.sh: No such file or directory

    /usr/bin/ssh-host-config: line 667: csih_make_dir: command not found
    /usr/bin/ssh-host-config: line 680: csih_make_dir: command not found
    /usr/bin/ssh-host-config: line 710: csih_make_dir: command not found
    /usr/bin/ssh-host-config: line 713: csih_warning: command not found
    /usr/bin/ssh-host-config: line 718: csih_warning: command not found
    /usr/bin/ssh-host-config: line 726: csih_install_config: command not found
    /usr/bin/ssh-host-config: line 738: csih_install_config: command not found
    /usr/bin/ssh-host-config: line 282: csih_install_config: command not found

    Do you want to install sshd as a service?
    /usr/bin/ssh-host-config: line 432: csih_request: command not found

    /usr/bin/ssh-host-config: line 754: csih_warning: command not found
    /usr/bin/ssh-host-config: line 755: csih_warning: command not found
    /usr/bin/ssh-host-config: line 756: csih_warning: command not found

    I looked into a few forums and tried to install csih and diffutils, but I am still getting the same error.
    I would really appreciate help on this. Thanks in advance!

  325. Shrikant Said,

    April 28, 2013 @ 8:32 am

    Hello,

    You have explained so nicely how to install Hadoop on Windows with Eclipse. The tutorial is arranged stepwise with good explanations & screenshots.

    As I am a beginner to Hadoop & currently working on it, this tutorial helped me a lot. I implemented all the steps in your tutorial & it is working.

    But after the last step, when we create & run a test project, what final output should I get? I got a console window in Eclipse similar to yours, but your tutorial does not show the complete console window. So how can I confirm whether I got the correct output or not?

    It would be very helpful if you could provide a complete screenshot of the final output.

    Awaiting your reply.

    Thanks & Regards.

  326. sushant Said,

    April 30, 2013 @ 6:06 am

    I tried to connect to hadoop-1.0.4 (Ubuntu 12.10) from Eclipse Juno. During configuration of localhost in Map/Reduce Locations I get the following error:
    org/codehaus/jackson/map/JsonMappingException

    I have configured the core-site.xml port as 9000 and the mapred-site.xml port as 9001. Can I use 9010, as you mentioned, while configuring the DFS location in Eclipse?

    How do I add jar files to the Eclipse classpath? Do we configure the classpath in Eclipse? Are HADOOP_HOME & HADOOP_CLASSPATH the same?
    Please help.
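
    For reference, the entries I mean are these (sketched from memory; the ports are simply the values I chose):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>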

  327. Anuradha G Said,

    May 15, 2013 @ 10:28 am

    I followed your instructions in the tutorial for installing Hadoop with Eclipse, but when I typed ssh-host-config I got a “command not found” error.
    How should I sort this out?

  328. Revathy Ranganathan Said,

    May 22, 2013 @ 8:11 am

    Hi Vlad,
    When I package the word-count Hadoop program into a jar and run it on a single-node Hadoop cluster, it works fine. But when I try to run the same jar on a multi-node Hadoop cluster, it gives the following error:
    13/05/22 20:26:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    13/05/22 20:26:26 INFO mapred.FileInputFormat: Total input paths to process : 1
    13/05/22 20:26:34 INFO mapred.JobClient: Running job: job_201305222023_0002
    13/05/22 20:26:35 INFO mapred.JobClient: map 0% reduce 0%
    13/05/22 20:26:50 INFO mapred.JobClient: Task Id : attempt_201301091108_0005_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

  329. Sumit Said,

    May 31, 2013 @ 8:08 am

    I followed the instructions given in the tutorial, but the sshd service is not starting. Please help.

  330. Dorian G Said,

    June 3, 2013 @ 7:22 pm

    If you’re having trouble getting sshd to start up on Cygwin (especially on Windows 8), try this:

    http://www.davemalpass.com/words/cygwin-install-on-windows-8-ssh-host-config-as-well/

    Pay attention to the CHANGE PERMISSIONS bit at the bottom!
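
    In my case the permission fixes boiled down to something like this (a rough sketch; the linked page has the full details, and “cyg_server” is just the default privileged account that ssh-host-config creates):

    # from an elevated Cygwin terminal
    chown cyg_server /var/empty
    chmod 755 /var/empty
    chown cyg_server /etc/ssh*
    net start sshd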

  331. Rahul Said,

    June 7, 2013 @ 4:48 am

    Hi,
    Thanks for this post.
    I could successfully configure Hadoop on my system, but after pasting the eclipse-plugin.jar file into the Eclipse plugins folder I cannot get it configured. I am not getting the Map/Reduce option that you show on the
    “Eclipse Map/Reduce Perspective” page at this link:
    http://v-lad.org/Tutorials/Hadoop/13.5%20-%20copy%20hadoop%20plugin.html

    Please give me a solution for this.
    Thank You.

  332. Rushi Soni Said,

    June 10, 2013 @ 10:33 pm

    Hi,

    Thanks for these tutorials.
    I am getting the following error while doing this step:

    bin/hadoop namenode -format

    cygpath: can’t convert empty path
    cygwin warning:
    MS-DOS style path detected: C:\cygwin\home\OIS\hadoop-0.23.8/build/native
    Preferred POSIX equivalent is: /home/OIS/hadoop-0.23.8/build/native
    CYGWIN environment variable option “nodosfilewarning” turns off this warning.
    Consult the user’s guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    which: no hdfs in (./C:\cygwin\home\OIS\hadoop-0.23.8/bin)
    dirname: missing operand
    Try `dirname –help’ for more information.
    C:\cygwin\home\OIS\hadoop-0.23.8/bin/hdfs: line 24: /home/OIS/hadoop-0.23.8/../libexec/hdfs-config.sh: No such file or directory
    cygpath: can’t convert empty path
    C:\cygwin\home\OIS\hadoop-0.23.8/bin/hdfs: line 142: exec: : not found

    Please help me to resolve this.

  333. Bijendra Singh Said,

    July 23, 2013 @ 10:17 am

    Hi, thanks for the tutorial. Everything is working fine for me, but when I execute the TestDriver class via Run On Hadoop, it does not execute: there are no errors in the console and no logging is going on. As mentioned in the tutorial, performing the run should ask me to choose localhost from a dialog box, but even the dialog box is not appearing. Can anybody help me with this? Using Eclipse Juno, JDK 1.6.

  334. Prasanna B Said,

    July 26, 2013 @ 2:24 am

    Hi,
    Thanks for the good tutorials.

    I am facing a problem executing the Hadoop project.

    Kindly help me overcome it:

    13/07/26 14:39:57 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
    Exception in thread “main” org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/asiapac/pb/In
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at TestDriver.main(TestDriver.java:66)

    Thanks in Advance

  335. Rajul Anand Said,

    July 26, 2013 @ 1:39 pm

    Hi,
    Thank you very much for the comprehensive tutorial. I have experience installing hadoop (0.20.2) on Ubuntu, wanted to try it on Windows (Windows 7 SP1 64-bit), and found your article to be an excellent resource.

    I installed hadoop 0.20.2 but faced a few problems getting ssh to work. Primarily, ssh on newer Windows systems requires password-based authentication by default. I was able to circumvent the problem using this resource:

    http://docs.oracle.com/cd/E24628_01/install.121/e22624/preinstall_req_cygwin_ssh.htm

    For the changes to the files in the conf folder, I used my earlier files, which came from another excellent tutorial by Michael Noll:
    http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

    I hope this helps someone trying to install the same or a different distribution on newer Windows systems.
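
    The gist of the key-based setup is just this (a sketch; the file names assume the ssh-keygen defaults):

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa         # passphrase-less key pair
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys  # authorize it for localhost
    ssh localhost                                    # should log in without a password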

  336. Chetan Said,

    August 1, 2013 @ 1:24 am

    While opening the Cygwin terminal, I’m getting a fatal error like this:
    1 [main] -bash 4788 C:\cygwin\bin\bash.exe: *** fatal error – prefork: couldn’t create pipe process tracker, Win32 error 161
    1 [main] -bash 2008 C:\cygwin\bin\bash.exe: *** fatal error – prefork: couldn’t create pipe process tracker, Win32 error 161
    1 [main] -bash 5884 C:\cygwin\bin\bash.exe: *** fatal error – prefork: couldn’t create pipe process tracker, Win32 error 161
    How do I resolve this error? Please suggest.

  337. Chetan Said,

    August 14, 2013 @ 12:47 am

    I am getting this warning message while defining the Hadoop location:

    The job tracker information (mapred.job.tracker) is invalid. This usually looks like “host:port”

  338. Hay Said,

    August 28, 2013 @ 11:39 pm

    In the Project Explorer tab on the left hand side of the Eclipse window, find the DFS Locations item. Open it using the “+” icon on its left.

    I used the “+” icon on the left. Inside, under localhost->tmp->hadoop-xxx->mapred->system, I see: Error: org.apache.hadoop.security.AccessControlException: Permission denied: user=xxxx, access=READ_EXECUTE, inode=”system”:xxxx-pc\xxx:supergroup:rwx-wx-wx…….
    This error shows up, so what should I do? Please help me solve this. Thank you very much.

  339. Sourabh Said,

    September 16, 2013 @ 8:01 pm

    I am getting this error while running TestDriver. Please help!

    13/09/16 19:30:17 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000006_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/09/16 19:30:22 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000006_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/09/16 19:30:25 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000006_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/09/16 19:30:33 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000005_0, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/09/16 19:30:37 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000005_1, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    13/09/16 19:30:40 INFO mapred.JobClient: Task Id : attempt_201309161918_0001_m_000005_2, Status : FAILED
    java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:425)

    java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at TestDriver.main(TestDriver.java:40)

  340. sridhar Said,

    September 17, 2013 @ 11:39 am

    $ bin/hadoop namenode
    bin/hadoop: line 2: $’\r’: command not found
    bin/hadoop: line 17: $’\r’: command not found
    bin/hadoop: line 18: $’\r’: command not found
    bin/hadoop: line 43: $’\r’: command not found
    : No such file or directoryn
    bin/hadoop: line 46: $’\r’: command not found
    : No such file or directoryocal/hadoop0
    bin/hadoop: line 48: $’\r’: command not found
    bin/hadoop: line 50: syntax error near unexpected token `$’in\r”
    ‘in/hadoop: line 50: `case “`uname`” in

  341. sridhar Said,

    September 17, 2013 @ 7:42 pm

    bin/hadoop: line 2: $’\r’: command not found
    bin/hadoop: line 17: $’\r’: command not found
    bin/hadoop: line 18: $’\r’: command not found
    bin/hadoop: line 43: $’\r’: command not found
    : No such file or directoryn
    bin/hadoop: line 46: $’\r’: command not found
    : No such file or directoryocal/hadoop0
    bin/hadoop: line 48: $’\r’: command not found
    bin/hadoop: line 50: syntax error near unexpected token `$’in\r”
    ‘in/hadoop: line 50: `case “`uname`” in
    Please help me.

Leave a Comment