Hadoop computations are implemented with MapReduce: you write a Java program, package it as a jar, and submit it for execution, so a solid Java background makes writing MR programs much easier.
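The core logic of WordCount — the mapper emits (word, 1) for every token, then the combiner/reducer sums the counts per key — can be sketched locally in Python. This is a simplified illustration of the model, not the actual Java MR code shipped with Hadoop:

```python
from collections import Counter

def map_phase(lines):
    # Mapper: emit (word, 1) for every whitespace-separated token
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Combiner/Reducer: sum the counts for each distinct word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

text = ["hello hadoop", "hello world"]
print(reduce_phase(map_phase(text)))  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

In a real job the framework shuffles the mapper output so that all pairs with the same key reach the same reducer; here the single `Counter` plays both the combiner and reducer roles.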
Here we run wordcount from the bundled example jar to demonstrate this workflow:
/home/cndba/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar
Upload a test file to HDFS:
[cndba@hadoopmaster hadoop]$ ls
bin dfs etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share tmp var
[cndba@hadoopmaster hadoop]$ hdfs dfs -put LICENSE.txt /dave
[cndba@hadoopmaster hadoop]$
[cndba@hadoopmaster hadoop]$ hdfs dfs -ls /
Found 3 items
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:16 /dave
drwxr-xr-x - cndba supergroup 0 2019-01-23 21:33 /oracle
drwxr-xr-x - cndba supergroup 0 2019-01-23 22:36 /system
[cndba@hadoopmaster hadoop]$ hdfs dfs -ls -R /
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:16 /dave
-rw-r--r-- 2 cndba supergroup 147144 2019-01-23 23:16 /dave/LICENSE.txt
-rw-r--r-- 2 cndba supergroup 0 2019-01-23 21:51 /dave/www.cndba.cn.txt
drwxr-xr-x - cndba supergroup 0 2019-01-23 21:33 /oracle
drwxr-xr-x - cndba supergroup 0 2019-01-23 21:33 /oracle/mysql
drwxr-xr-x - cndba supergroup 0 2019-01-23 22:36 /system
The example jar lives in the following directory:
[cndba@hadoopmaster mapreduce]$ pwd
/home/cndba/hadoop/share/hadoop/mapreduce
Run the Hadoop MR job:
[cndba@hadoopmaster mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.1.1.jar wordcount /dave/LICENSE.txt output
2019-01-23 23:55:14,527 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.20.80:8032
2019-01-23 23:55:14,944 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0003
2019-01-23 23:55:15,344 INFO input.FileInputFormat: Total input files to process : 1
2019-01-23 23:55:15,461 INFO mapreduce.JobSubmitter: number of splits:1
2019-01-23 23:55:15,538 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-01-23 23:55:15,749 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1548242934753_0003
2019-01-23 23:55:15,751 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-01-23 23:55:15,967 INFO conf.Configuration: resource-types.xml not found
2019-01-23 23:55:15,967 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-01-23 23:55:16,050 INFO impl.YarnClientImpl: Submitted application application_1548242934753_0003
2019-01-23 23:55:16,106 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1548242934753_0003/
2019-01-23 23:55:16,107 INFO mapreduce.Job: Running job: job_1548242934753_0003
2019-01-23 23:55:23,242 INFO mapreduce.Job: Job job_1548242934753_0003 running in uber mode : false
2019-01-23 23:55:23,244 INFO mapreduce.Job: map 0% reduce 0%
2019-01-23 23:55:28,328 INFO mapreduce.Job: map 100% reduce 0%
2019-01-23 23:55:34,369 INFO mapreduce.Job: map 100% reduce 100%
2019-01-23 23:55:34,380 INFO mapreduce.Job: Job job_1548242934753_0003 completed successfully
2019-01-23 23:55:34,524 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=46271
FILE: Number of bytes written=521743
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=147250
HDFS: Number of bytes written=34795
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3346
Total time spent by all reduces in occupied slots (ms)=3103
Total time spent by all map tasks (ms)=3346
Total time spent by all reduce tasks (ms)=3103
Total vcore-milliseconds taken by all map tasks=3346
Total vcore-milliseconds taken by all reduce tasks=3103
Total megabyte-milliseconds taken by all map tasks=3426304
Total megabyte-milliseconds taken by all reduce tasks=3177472
Map-Reduce Framework
Map input records=2746
Map output records=21463
Map output bytes=228869
Map output materialized bytes=46271
Input split bytes=106
Combine input records=21463
Combine output records=2965
Reduce input groups=2965
Reduce shuffle bytes=46271
Reduce input records=2965
Reduce output records=2965
Spilled Records=5930
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=102
CPU time spent (ms)=2300
Physical memory (bytes) snapshot=518160384
Virtual memory (bytes) snapshot=5637390336
Total committed heap usage (bytes)=431489024
Peak Map Physical memory (bytes)=314851328
Peak Map Virtual memory (bytes)=2815950848
Peak Reduce Physical memory (bytes)=203309056
Peak Reduce Virtual memory (bytes)=2821439488
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=147144
File Output Format Counters
Bytes Written=34795
[cndba@hadoopmaster mapreduce]$
Note: you may hit the following error at this point:
Hadoop 3.1.1 "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" — for the fix, see:
https://www.cndba.cn/dave/article/3259
The results are written to the output directory in HDFS (a relative path, which resolves to /user/cndba/output):
[cndba@hadoopmaster mapreduce]$ hdfs dfs -ls -R /
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:16 /dave
-rw-r--r-- 2 cndba supergroup 147144 2019-01-23 23:16 /dave/LICENSE.txt
-rw-r--r-- 2 cndba supergroup 0 2019-01-23 21:51 /dave/www.cndba.cn.txt
drwxr-xr-x - cndba supergroup 0 2019-01-23 21:33 /oracle
drwxr-xr-x - cndba supergroup 0 2019-01-23 21:33 /oracle/mysql
drwxr-xr-x - cndba supergroup 0 2019-01-23 22:36 /system
drwx------ - cndba supergroup 0 2019-01-23 23:25 /tmp
drwx------ - cndba supergroup 0 2019-01-23 23:25 /tmp/hadoop-yarn
drwx------ - cndba supergroup 0 2019-01-23 23:39 /tmp/hadoop-yarn/staging
drwx------ - cndba supergroup 0 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba
drwx------ - cndba supergroup 0 2019-01-23 23:55 /tmp/hadoop-yarn/staging/cndba/.staging
drwx------ - cndba supergroup 0 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0001
-rw-r--r-- 10 cndba supergroup 316297 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0001/job.jar
-rw-r--r-- 10 cndba supergroup 113 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0001/job.split
-rw-r--r-- 2 cndba supergroup 42 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0001/job.splitmetainfo
-rw-r--r-- 2 cndba supergroup 182479 2019-01-23 23:25 /tmp/hadoop-yarn/staging/cndba/.staging/job_1548242934753_0001/job.xml
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:39 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - cndba supergroup 0 2019-01-23 23:39 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx--- - cndba supergroup 0 2019-01-23 23:55 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba
-rwxrwx--- 2 cndba supergroup 22444 2019-01-23 23:40 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0002-1548257983080-cndba-word+count-1548258000513-1-1-SUCCEEDED-default-1548257988877.jhist
-rwxrwx--- 2 cndba supergroup 440 2019-01-23 23:40 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0002.summary
-rwxrwx--- 2 cndba supergroup 211968 2019-01-23 23:40 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0002_conf.xml
-rwxrwx--- 2 cndba supergroup 22442 2019-01-23 23:55 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0003-1548258916005-cndba-word+count-1548258932719-1-1-SUCCEEDED-default-1548258921387.jhist
-rwxrwx--- 2 cndba supergroup 440 2019-01-23 23:55 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0003.summary
-rwxrwx--- 2 cndba supergroup 211968 2019-01-23 23:55 /tmp/hadoop-yarn/staging/history/done_intermediate/cndba/job_1548242934753_0003_conf.xml
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:39 /user
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:55 /user/cndba
drwxr-xr-x - cndba supergroup 0 2019-01-23 23:55 /user/cndba/output
-rw-r--r-- 2 cndba supergroup 0 2019-01-23 23:55 /user/cndba/output/_SUCCESS
-rw-r--r-- 2 cndba supergroup 34795 2019-01-23 23:55 /user/cndba/output/part-r-00000
[cndba@hadoopmaster mapreduce]$
[cndba@hadoopmaster mapreduce]$ hdfs dfs -cat /user/cndba/output/part-r-00000|more
""AS 2
"AS 22
"AS-IS" 1
"Adaptation" 1
"COPYRIGHTS 1
"Collection" 1
"Collective 1
"Contribution" 2
"Contributor" 2
"Creative 1
"Derivative 2
"Distribute" 1
"French 2
"JDOM" 2
"JDOM", 1
"Java 1
"LICENSE"). 2
"Legal 1
"License" 1
"License"); 2
"Licensed 1
"Licensor" 3
"Losses") 1
"NOTICE" 1
"Not 1
"Object" 1
"Original 2
"Program" 1
"Publicly 1
……
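Each line of part-r-00000 above is a tab-separated `word<TAB>count` pair, so the results are easy to post-process once fetched locally (e.g. with `hdfs dfs -get`). A minimal Python sketch, using hypothetical sample lines in the same format:

```python
def top_words(lines, n=3):
    # Parse "word\tcount" lines and return the n most frequent words
    pairs = []
    for line in lines:
        word, count = line.rstrip("\n").rsplit("\t", 1)
        pairs.append((word, int(count)))
    return sorted(pairs, key=lambda p: -p[1])[:n]

sample = ['"AS\t22', '"Licensor"\t3', '"AS-IS"\t1']
print(top_words(sample, 2))  # [('"AS', 22), ('"Licensor"', 3)]
```

`rsplit("\t", 1)` splits on the last tab only, which keeps words intact even if a token itself contained a tab-like character.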
Note that the output directory must not already exist; if it does, the job fails with the following error:
[cndba@hadoopmaster mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.1.1.jar wordcount /dave/LICENSE.txt output
2019-01-23 23:42:16,728 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.20.80:8032
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoopmaster:9000/user/cndba/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:164)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:280)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:146)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
Delete the directory and the job can be run again:
[cndba@hadoopmaster mapreduce]$ hdfs dfs -rm -r output
Deleted output
Copyright notice: this is an original article by the author and may not be reproduced without permission.