Hadoop Environment Setup


1. Configuration

Make the following configuration changes:

  • Edit hosts:
    Edit the file /etc/hosts on the machine and add the hostname, e.g.:
192.168.137.128 skadoosh
192.168.137.128 localhost
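
A quick sanity check that the mapping is in effect (hostname and IP taken from the example above; substitute your own):

$ getent hosts skadoosh    # should print: 192.168.137.128 skadoosh
$ ping -c 1 skadoosh       # should reach 192.168.137.128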
  • Prepare to Start the Hadoop Cluster:
    Edit the file etc/hadoop/hadoop-env.sh to define JAVA_HOME as follows:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
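
If you are unsure where the JDK actually lives, one way to resolve the real installation path behind the java command (assuming java is on the PATH and readlink -f is available, as on GNU systems):

$ readlink -f $(which java)
# e.g. /usr/java/jdk1.6.0_45/bin/java -> set JAVA_HOME=/usr/java/jdk1.6.0_45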
  • etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/test/tmp/</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>127.0.0.1:2181</value>
  </property>
</configuration>
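
After saving core-site.xml, the effective values can be confirmed with hdfs getconf; the keys below are exactly the ones set above:

$ bin/hdfs getconf -confKey fs.defaultFS      # expect hdfs://localhost:9000
$ bin/hdfs getconf -confKey hadoop.tmp.dir    # expect file:/home/test/tmp/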
  • etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- hadoop.tmp.dir above already carries the file: scheme,
       so no extra file:/ prefix is needed here -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>

Note: be careful with this value: there should be a single "/" after file: (e.g. file:/home/test/tmp).
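
It also helps to create the underlying directories up front and make sure the user who will run the daemons owns them; the paths below follow the hadoop.tmp.dir value chosen above:

$ mkdir -p /home/test/tmp/dfs/name /home/test/tmp/dfs/data
$ chown -R $(whoami) /home/test/tmp    # the daemons must be able to write here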

2. Execution

  • Format the filesystem:
    $ bin/hdfs namenode -format
  • Start NameNode daemon and DataNode daemon:
    $ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
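
A minimal check that the daemons actually came up, assuming a JDK that ships the jps tool:

$ jps    # expect NameNode, DataNode and SecondaryNameNode in the list
$ tail -n 20 logs/hadoop-*-namenode-*.log    # inspect the NameNode log if anything fails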

3. FileSystem Shell

mkdir

Usage: hadoop fs -mkdir [-p] <paths>

Takes path URIs as arguments and creates directories.

Options:

The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.

Example:

$ hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
$ hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.
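
The exit code makes the command easy to script; a small sketch (the path is illustrative):

$ hadoop fs -mkdir -p /user/hadoop/dir1
$ echo $?    # 0 on success; the documented -1 surfaces as 255 in the shell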


mv

Usage: hadoop fs -mv URI [URI …] <dest>

Moves files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Moving files across file systems is not permitted.

Example:

$ hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
$ hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

Exit Code:

Returns 0 on success and -1 on error.


put

Usage: hadoop fs -put <localsrc> ... <dst>

Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.

$ hadoop fs -put localfile /user/hadoop/hadoopfile
$ hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir
$ hadoop fs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
$ hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile    # reads the input from stdin

Exit Code:

Returns 0 on success and -1 on error.


rm

Usage: hadoop fs -rm [-f] [-r |-R] [-skipTrash] URI [URI …]

Delete files specified as args.

If trash is enabled, the file system instead moves the deleted file to a trash directory (given by FileSystem#getTrashRoot).

Currently, the trash feature is disabled by default. User can enable trash by setting a value greater than zero for parameter fs.trash.interval (in core-site.xml).

See expunge about deletion of files in trash.
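
For example, trash could be enabled by adding a property like the following to core-site.xml (1440 is an illustrative value: deleted files are kept for 1440 minutes, i.e. one day, before being purged):

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>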

Options:

The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
The -R option deletes the directory and any content under it recursively.
The -r option is equivalent to -R.
The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.

Example:

$ hadoop fs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

Exit Code:

Returns 0 on success and -1 on error.


After installing Hadoop, startup fails with:
Error: JAVA_HOME is not set and could not be found.

[Solution]: set JAVA_HOME in /etc/hadoop/hadoop-env.sh, using an absolute path. The start scripts launch the daemons over ssh, so an interactive shell's $JAVA_HOME may not be visible when hadoop-env.sh is sourced.

export JAVA_HOME=$JAVA_HOME             # wrong: do not set it this way

export JAVA_HOME=/usr/java/jdk1.6.0_45  # correct: use the absolute path