HDFS Trash Notes

Cloudera Manager configuration steps:

Configuring HDFS Trash

The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.

  Important:
  • The trash feature is disabled by default. Cloudera recommends that you enable it on all production clusters.
  • The trash feature works by default only for files and directories deleted using the Hadoop shell. Files or directories deleted programmatically using other interfaces (WebHDFS or the Java APIs, for example) are not moved to trash, even if trash is enabled, unless the program has implemented a call to the trash functionality. (Hue, for example, implements trash as of CDH 4.4.)

    Users can bypass trash when deleting files using the shell by specifying the -skipTrash option to the hadoop fs -rm -r command. This can be useful when it is necessary to delete files that are too large for the user's quota.
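The bypass described above looks like this in practice (the path is illustrative, and the commands assume a running cluster with trash enabled):

```shell
# Permanent delete, bypassing trash -- the data is NOT recoverable:
hadoop fs -rm -r -skipTrash /user/root/big_dataset

# For comparison, a normal delete with trash enabled moves the data to
# /user/root/.Trash/Current/user/root/big_dataset instead of removing it:
hadoop fs -rm -r /user/root/big_dataset
```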

Configuring HDFS Trash Using Cloudera Manager

Required Role:   

Enabling and Disabling Trash

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > Gateway.
  4. Select or deselect the Use Trash checkbox.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.

  5. Click Save Changes to commit the changes.
  6. Restart the cluster and deploy the cluster client configuration.
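After deploying the client configuration, you can check from a gateway host that the setting took effect (a sketch; the test file path is illustrative):

```shell
# A non-zero value means trash is enabled for this client:
hdfs getconf -confKey fs.trash.interval

# With trash enabled, a shell delete should report that the file was
# moved into .Trash rather than deleted outright:
hadoop fs -rm /user/root/test.txt
```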

Setting the Trash Interval

  1. Go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > NameNode.
  4. Specify the Filesystem Trash Interval property, which controls the number of minutes after which a trash checkpoint directory is deleted and the number of minutes between trash checkpoints. For example, to enable trash so that deleted files are deleted after 24 hours, set the value of the Filesystem Trash Interval property to 1440.
      Note: The trash interval is measured from the point at which the files are moved to trash, not from the last time the files were modified.

    If more than one role group applies to this configuration, edit the value for the appropriate role group. See Modifying Configuration Properties.

  5. Click Save Changes to commit the changes.
  6. Restart all NameNodes.
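The Filesystem Trash Interval property corresponds to fs.trash.interval in the deployed core-site.xml. After restarting the NameNodes, you can confirm the value on a NameNode host (the config path assumes a standard /etc/hadoop/conf deployment; adjust for your environment):

```shell
# Show the deployed property and its value:
grep -A1 'fs.trash.interval' /etc/hadoop/conf/core-site.xml
```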

Command-line configuration steps:

Enabling Trash

Trash is configured with the following properties in the core-site.xml file:

fs.trash.interval (minutes, or 0)

  The number of minutes after which a trash checkpoint directory is deleted. This option can be configured both on the server and the client.

  • If trash is enabled in the server configuration, then the value configured on the server is used and the client configuration is ignored.
  • If trash is disabled in the server configuration, then the client-side configuration is checked.
  • If the value of this property is zero (the default), then the trash feature is disabled.

fs.trash.checkpoint.interval (minutes, or 0)

  The number of minutes between trash checkpoints. Every time the checkpointer runs on the NameNode, it creates a new checkpoint of the "Current" directory and removes checkpoints older than fs.trash.interval minutes. This value should be smaller than or equal to fs.trash.interval. This option is configured on the server. If configured to zero (the default), then the value is set to the value of fs.trash.interval.

For example, to enable trash so that files deleted using the Hadoop shell are not deleted for 24 hours, set the value of the fs.trash.interval property in the server's core-site.xml file to 1440.
  Note:

The period during which a file remains in the trash starts when the file is moved to the trash, not when the file is last modified.
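The 24-hour example above would correspond to entries like the following in the server's core-site.xml (a sketch; the checkpoint interval of 60 minutes is an illustrative choice, not a requirement):

```xml
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>  <!-- keep deleted files for 24 hours -->
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>60</value>  <!-- checkpoint hourly; 0 would default to fs.trash.interval -->
</property>
```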

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_hdfs_cluster_deploy.html#topic_11_2_8_unique_1


This command looks like it empties the trash, but it does not actually delete everything immediately: it creates a new trash checkpoint (and removes only those checkpoints older than fs.trash.interval). The session below demonstrates this.

expunge

Usage: hadoop fs -expunge

Empty the Trash. Refer to the HDFS Architecture Guide for more information on the Trash feature.

[root@gc2 hadoop]# hadoop fs -ls /user/root/.Trash/Current/user/root
Found 2 items
-rw-r--r--   1 root supergroup         10 2015-07-24 10:39 /user/root/.Trash/Current/user/root/slaves
-rw-r--r--   1 root supergroup         72 2015-07-24 10:40 /user/root/.Trash/Current/user/root/sum.sh
[root@gc2 hadoop]# hadoop fs -expunge
15/07/28 00:07:11 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
15/07/28 00:07:11 INFO fs.TrashPolicyDefault: Created trash checkpoint: /user/root/.Trash/150728000711
[root@gc2 hadoop]# hadoop fs -ls -R  /user/root/.Trash/Current
ls: `/user/root/.Trash/Current': No such file or directory
[root@gc2 hadoop]# hadoop fs -ls -R  /user/root/.Trash
drwx------   - root supergroup          0 2015-07-26 12:16 /user/root/.Trash/150728000711
drwx------   - root supergroup          0 2015-07-26 13:43 /user/root/.Trash/150728000711/snap
-rw-r--r--   1 root supergroup         72 2015-07-26 11:55 /user/root/.Trash/150728000711/snap/sum.sh
drwx------   - root supergroup          0 2015-07-24 10:41 /user/root/.Trash/150728000711/user
drwx------   - root supergroup          0 2015-07-28 00:05 /user/root/.Trash/150728000711/user/root
-rw-r--r--   1 root supergroup         10 2015-07-24 10:39 /user/root/.Trash/150728000711/user/root/slaves
-rw-r--r--   1 root supergroup         72 2015-07-24 10:40 /user/root/.Trash/150728000711/user/root/sum.sh
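As the introduction notes, files in the trash can be restored simply by moving them out of .Trash. Continuing the session above, restoring sum.sh from the checkpoint might look like this:

```shell
# Move the file out of the checkpoint directory back to the home directory:
hadoop fs -mv /user/root/.Trash/150728000711/user/root/sum.sh /user/root/sum.sh

# Verify it is back in place:
hadoop fs -ls /user/root/sum.sh
```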
