Running Hadoop Locally on Linux

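Running Hadoop on Linux feels different from running it on Windows with Cygwin emulating a Linux environment. On Linux, if your user lacks sufficient permissions, Hadoop simply will not run.

Running as root avoids the problem, of course. Here is the process I followed, using hadoop-0.18.0.

First, edit the Hadoop configuration file hadoop-env.sh (in the conf directory) and set JAVA_HOME: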

# The java implementation to use. Required.
export JAVA_HOME="/usr/java/jdk1.6.0_07"
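
Before moving on, it can help to confirm that the configured path really points at a JDK. The following is a minimal sketch, assuming a POSIX shell; the function name check_java_home is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): succeeds only if the given
# directory contains an executable bin/java.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "OK: $1/bin/java is executable"
  else
    echo "JAVA_HOME looks wrong: $1/bin/java not found or not executable" >&2
    return 1
  fi
}

# Example call with the path used in this article. On a machine where that
# JDK is not installed, this only prints the warning.
check_java_home "/usr/java/jdk1.6.0_07" || true
```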
 

Next, switch to the root user and log in to 127.0.0.1 (localhost) over ssh:

[@www.linuxidc.com hadoop-0.18.0]$ su root
Password:
[root@ hadoop-0.18.0]# ssh localhost
root@localhost's password:
Last login: Wed Sep 24 19:25:21 2008 from localhost.localdomain
[root@ ~]#
 

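Then prepare the input data: under the hadoop-0.18.0 directory, create a new directory named my-input containing seven TXT files, each holding English words separated by spaces.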
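
The input preparation can be sketched as follows, run from the hadoop-0.18.0 directory. The file names a.txt through g.txt match the ones that appear in the job log, but the words written into them here are placeholder data, not the files actually used for this run.

```shell
# Create the input directory and seven small text files (a.txt .. g.txt).
# The words below are placeholder data chosen for illustration.
mkdir -p my-input
for name in a b c d e f g; do
  printf 'hadoop linux hadoop cygwin hadoop linux\n' > "my-input/$name.txt"
done

# List the files to confirm that all seven were created.
ls my-input
```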

Then change to the hadoop-0.18.0 directory and run the WordCount example, which counts word frequencies:

[root@ hadoop-0.18.0]# bin/hadoop jar hadoop-0.18.0-examples.jar wordcount my-input my-output  

The run produces the following output:

08/09/25 16:32:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.JobClient: Running job: job_local_0001
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:41 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:42 INFO mapred.JobClient: map 0% reduce 0%
08/09/25 16:32:44 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:44 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:45 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:45 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:45 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:45 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:45 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:45 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/e.txt:0+1957
08/09/25 16:32:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
08/09/25 16:32:45 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000000_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:46 INFO mapred.JobClient: map 100% reduce 0%
08/09/25 16:32:46 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:46 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:46 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:46 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:46 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:46 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:46 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:46 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/a.txt:0+1957
08/09/25 16:32:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
08/09/25 16:32:46 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000001_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:47 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:47 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:47 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:47 INFO mapred.MapTask: bufstart = 0; bufend = 16845; bufvoid = 99614720
08/09/25 16:32:47 INFO mapred.MapTask: kvstart = 0; kvend = 1684; length = 327680
08/09/25 16:32:47 INFO mapred.MapTask: Index: (0, 42, 42)
08/09/25 16:32:47 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:47 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/b.txt:0+10109
08/09/25 16:32:47 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
08/09/25 16:32:47 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000002_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:47 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:47 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:48 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:48 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:48 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:48 INFO mapred.MapTask: bufstart = 0; bufend = 3312; bufvoid = 99614720
08/09/25 16:32:48 INFO mapred.MapTask: kvstart = 0; kvend = 331; length = 327680
08/09/25 16:32:48 INFO mapred.MapTask: Index: (0, 72, 72)
08/09/25 16:32:48 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:48 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/d.txt:0+1987
08/09/25 16:32:48 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
08/09/25 16:32:48 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000003_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:48 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:48 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:49 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:49 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:49 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/g.txt:0+1957
08/09/25 16:32:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
08/09/25 16:32:49 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000004_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:49 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:49 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/c.txt:0+1957
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000005_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000005_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:50 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:50 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:50 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:50 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:50 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:50 INFO mapred.MapTask: bufstart = 0; bufend = 3306; bufvoid = 99614720
08/09/25 16:32:50 INFO mapred.MapTask: kvstart = 0; kvend = 330; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 50, 50)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/f.txt:0+1985
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000006_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000006_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.ReduceTask: Initiating final on-disk merge with 7 files
08/09/25 16:32:51 INFO mapred.Merger: Merging 7 sorted segments
08/09/25 16:32:51 INFO mapred.Merger: Down to the last merge-pass, with 7 segments left of total size: 268 bytes
08/09/25 16:32:51 INFO mapred.LocalJobRunner: reduce > reduce
08/09/25 16:32:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
08/09/25 16:32:51 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.JobClient: Job complete: job_local_0001
08/09/25 16:32:51 INFO mapred.JobClient: Counters: 11
08/09/25 16:32:51 INFO mapred.JobClient:   File Systems
08/09/25 16:32:51 INFO mapred.JobClient:     Local bytes read=953869
08/09/25 16:32:51 INFO mapred.JobClient:     Local bytes written=961900
08/09/25 16:32:51 INFO mapred.JobClient:   Map-Reduce Framework
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce input groups=7
08/09/25 16:32:51 INFO mapred.JobClient:     Combine output records=21
08/09/25 16:32:51 INFO mapred.JobClient:     Map input records=7
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce output records=7
08/09/25 16:32:51 INFO mapred.JobClient:     Map output bytes=36511
08/09/25 16:32:51 INFO mapred.JobClient:     Map input bytes=21909
08/09/25 16:32:51 INFO mapred.JobClient:     Combine input records=3649
08/09/25 16:32:51 INFO mapred.JobClient:     Map output records=3649
08/09/25 16:32:51 INFO mapred.JobClient:     Reduce input records=21
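
In the counters above, "Reduce output records=7" is the number of distinct words the job found, and "Map output records=3649" is the total number of words it read. One way to cross-check such numbers without Hadoop is to count the words locally. The sketch below assumes plain space-separated text, for which awk's default field splitting should agree with the example's tokenization; the function name local_wordcount and the demo files are hypothetical.

```shell
# Hypothetical helper: count whitespace-separated words in every .txt file
# of a directory and print "word<TAB>count", sorted by word in byte order.
local_wordcount() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$1"/*.txt |
    LC_ALL=C sort
}

# Small self-contained demo on two throwaway files.
demo=$(mktemp -d)
printf 'b a b\n' > "$demo/x.txt"
printf 'a c\n' > "$demo/y.txt"
local_wordcount "$demo"
# Prints three lines: a 2, b 2, c 1 (word and count separated by a tab).
```

Running local_wordcount my-input from the hadoop-0.18.0 directory should then print one line per distinct word, which can be compared with the job's output.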
 

Finally, check the results of the job:
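
A minimal sketch for viewing the output, assuming the job wrote its results into files named part-00000 and so on inside my-output. That naming is usual for this generation of Hadoop, but it is an assumption here, and the helper name show_job_output is hypothetical.

```shell
# Hypothetical helper: print every part-* file found in a job output
# directory. Prints nothing if the directory has no such files.
show_job_output() {
  for f in "$1"/part-*; do
    if [ -f "$f" ]; then
      cat "$f"
    fi
  done
}

# From the hadoop-0.18.0 directory, after the job has finished:
show_job_output my-output
```

Each line of the output should hold a word and its count.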
