SeaweedFS Deployment and Usage (Hadoop Compatible)

Software versions:

Software    Version          Archive
seaweedfs   seaweedfs-1.11   linux_amd64.tar.gz

GitHub:

https://github.com/chrislusf/seaweedfs

Terminology:

Term         Description
master       Provides the volume => location mapping service and file id sequence numbers
Node         Abstract node in the system; specialized into DataCenter and Rack
DataNode     Storage node that manages and stores logical volumes
DataCenter   Data center; corresponds to a physical site. A data center can contain multiple racks.
Rack         Rack; corresponds to a physical cabinet. A rack belongs to exactly one data center.
Volume       Logical volume, the logical unit of storage. A volume stores Needles; a VolumeServer contains one Store.
Needle       An object inside a logical volume, i.e. a stored file. Needle file size is limited to 4GB for now.
Collection   A set of files that may span multiple logical volumes. If no collection is specified on upload, the default "" is used.
Filer        File manager. The filer uploads data to the weed volume servers, splits large files into chunks, and writes the metadata and chunk information to the filer store.
Mount        User-space mount. When the filer is used together with mount, the filer only serves file metadata lookups; actual file content is read and written directly between mount and the volume servers, so multiple filers are not required.

Run $ ./weed -h to list commands and their descriptions.
Run $ ./weed [command] -h to list a command's flags and their descriptions.

Deployment plan:

Node   master   volume   filer
cdh1   ✓        ✓        ✓
cdh2   ✓        ✓
cdh3   ✓        ✓        ✓

Extract:

$ tar -zxvf ./linux_amd64.tar.gz
This yields the weed binary.

Startup commands:

Create the data directories:

$ mkdir seaweedfd_master
$ mkdir seaweedfd_data

Start the masters:

$ ./weed master -ip cdh1 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001
$ ./weed master -ip cdh2 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001
$ ./weed master -ip cdh3 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001

To avoid split-brain, run an odd number of masters (only odd numbers are supported).

Run in the background: $ nohup ./weed master -ip cdh3 -maxCpu 1 -mdir ./seaweedfd_master -peers cdh1:9333,cdh2:9333,cdh3:9333 -port 9333 -pulseSeconds 5 -defaultReplication 001 > weed_master.out &

At least two of the three masters must be alive (a Raft majority) for the cluster to serve requests.
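The -defaultReplication 001 value above is a three-digit placement code: the first digit is the number of extra copies in other data centers, the second in other racks of the same data center, the third on other servers of the same rack. A small sketch of the arithmetic (decode_replication is a hypothetical helper for illustration, not part of the weed CLI):

```shell
# Decode a SeaweedFS replication code "xyz":
#   x = extra copies in other data centers
#   y = extra copies in other racks of the same data center
#   z = extra copies on other servers of the same rack
decode_replication() {
  code="$1"
  x=$(printf %s "$code" | cut -c1)
  y=$(printf %s "$code" | cut -c2)
  z=$(printf %s "$code" | cut -c3)
  echo "dc=$x rack=$y server=$z total_copies=$((1 + x + y + z))"
}

decode_replication 001   # one extra copy on another server in the same rack
```

So 001 keeps two copies of every file in total, which is why the plan above needs at least two volume servers in rack1.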

Start the volume servers:

$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh1 -ip.bind cdh1 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh1 -rack rack1
$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh2 -ip.bind cdh2 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh2 -rack rack1
$ ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh3 -ip.bind cdh3 -maxCpu 1 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh3 -rack rack1

dataCenter: data center name
rack: rack name
Run in the background: $ nohup ./weed volume -dataCenter dc1 -dir ./seaweedfd_data -ip cdh1 -ip.bind cdh1 -maxCpu 1 -max 200 -mserver cdh1:9333,cdh2:9333,cdh3:9333 -port 9222 -port.public 9222 -publicUrl cdh1 -rack rack1 > weed_volume.out &

Access the master web UI:

http://cdh3:9333/

Upload a directory of files from the command line:

$ ./weed upload -dataCenter dc1 -master=cdh3:9333 -dir="./dir/"

Assign a file key:

# Basic usage:
$ curl http://cdh1:9333/dir/assign
# Assign with a specific replication type:
$ curl "http://cdh1:9333/dir/assign?replication=001"
# Reserve multiple file ids in one call:
$ curl "http://cdh1:9333/dir/assign?count=5"
# Assign within a specific data center:
$ curl "http://cdh1:9333/dir/assign?dataCenter=dc1"

Upload example:

# Get a file key
$ curl "http://cdh1:9333/dir/assign?dataCenter=dc1"
# JSON response
{"fid":"2,016beb339d","url":"cdh2:9222","publicUrl":"cdh2","count":1}
# Upload a file to the assigned fid
$ curl -F file=@./file http://cdh2:9222/2,016beb339d
# JSON response
{"name":"file","size":41629428}
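The two steps above (assign a fid, then upload to the returned volume server) can be scripted. This is a sketch: the sed-based JSON parsing and the RUN_LIVE guard are assumptions added here (use jq in practice if it is available), and the host cdh1:9333 follows the examples above.

```shell
# Extract a top-level string field from a one-line JSON response.
json_field() {
  # $1 = field name; reads JSON on stdin
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Assign a fid from the master, then upload the file to the returned volume server.
upload_one() {
  file="$1"
  resp=$(curl -s "http://cdh1:9333/dir/assign")
  fid=$(echo "$resp" | json_field fid)
  url=$(echo "$resp" | json_field url)
  curl -s -F "file=@$file" "http://$url/$fid"
  echo "stored as fid $fid"
}

# Only hit the cluster when explicitly asked to.
if [ "${RUN_LIVE:-0}" = "1" ]; then
  upload_one ./file
fi
```

Run with RUN_LIVE=1 against a live cluster; without it the script only defines the helpers.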

Retrieve the file:

$ curl http://cdh2:9222/2,016beb339d
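A fid such as 2,016beb339d has internal structure: the part before the comma is the volume id, and the rest is the file key plus cookie in hex. A read resolves the volume id to a volume server, then fetches the fid from it. A parsing sketch (split_fid is a name invented here for illustration):

```shell
# Split a fid into its volume id and key+cookie parts.
split_fid() {
  fid="$1"
  volume="${fid%%,*}"     # text before the first comma
  keycookie="${fid#*,}"   # text after the first comma
  echo "volume=$volume keycookie=$keycookie"
}

split_fid "2,016beb339d"   # -> volume=2 keycookie=016beb339d
```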

Configure and start the filer:

# Print the default filer.toml configuration
$ ./weed scaffold filer

By default, leveldb is used to manage file metadata.

# Generate the configuration file
$ ./weed scaffold -config filer -output="."
# Example: use postgres as the metadata store
# Create the table
=========================================
CREATE TABLE IF NOT EXISTS filemeta (
dirhash     BIGINT,
name        VARCHAR(1000),
directory   VARCHAR(4096),
meta        bytea,
PRIMARY KEY (dirhash, name)
);
=========================================
# Configure the [postgres] section in filer.toml
$ vi filer.toml

Start:

$ ./weed filer -master cdh1:9333,cdh2:9333,cdh3:9333 -port 8888 -port.public 8889

Run in the background: $ nohup ./weed filer -master cdh1:9333,cdh2:9333,cdh3:9333 -port 8888 > weed_filer.out &

Running multiple filer instances that share one database is recommended.

Upload a file:

$ curl -F "[email protected]" "http://cdh1:8888/path/to/sources/"

Access the filer web UI:

http://cdh1:8888/

Hadoop compatibility:

# Download the latest version from Maven Central
https://mvnrepository.com/artifact/com.github.chrislusf/seaweedfs-hadoop-client

# Make sure a mapred-site.xml file exists

# Test: ls
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -ls /
# Output
Found 2 items
drwxrwx---   -          0 2018-12-13 10:29 /path
drwxrwx---   -          0 2018-12-13 14:17 /weed

# Test: upload a file
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -put ./slaves /

# Test: download a directory
==================================================================================
../../bin/hdfs dfs -Dfs.defaultFS=seaweedfs://cdh1:8888 \
           -Dfs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
           -libjars ./seaweedfs-hadoop-client-1.0.2.jar \
           -get /path

Configure Hadoop:

$ vi core-site.xml

<property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://cdh1:8888</value>
</property>
# Deploy the SeaweedFS HDFS client jar to all nodes
$ bin/hadoop classpath
$ cp ./seaweedfs-hadoop-client-1.0.2.jar /hadoop/share/hadoop/common/lib/
$ scp ./seaweedfs-hadoop-client-1.0.2.jar cdh2:/hadoop/share/hadoop/common/lib
$ scp ./seaweedfs-hadoop-client-1.0.2.jar cdh3:/hadoop/share/hadoop/common/lib
$ scp ./core-site.xml cdh2:/hadoop/etc/hadoop/
$ scp ./core-site.xml cdh3:/hadoop/etc/hadoop/
# Verify
$ ../../bin/hdfs dfs -ls seaweedfs://cdh3:8888/
# Output
Found 3 items
drwxrwx---   -                        0 2018-12-13 10:29 seaweedfs://cdh3:8888/path
-rw-r--r--   1 dpnice dpnice         15 2018-12-13 14:41 seaweedfs://cdh3:8888/slaves
drwxrwx---   -                        0 2018-12-13 14:17 seaweedfs://cdh3:8888/weed

API:

Master Server API:

Assign a file key:

# Basic Usage:
curl http://localhost:9333/dir/assign
# To assign with a specific replication type:
curl "http://localhost:9333/dir/assign?replication=001"
# To specify how many file ids to reserve
curl "http://localhost:9333/dir/assign?count=5"
# To assign a specific data center
curl "http://localhost:9333/dir/assign?dataCenter=dc1"

Look up a volume's location:

curl "http://localhost:9333/dir/lookup?volumeId=3&pretty=y"
{
  "locations": [
    {
      "publicUrl": "localhost:8080",
      "url": "localhost:8080"
    }
  ]
}
# Other usages:
# You can actually use the file id to lookup, if you are lazy to parse the file id.
curl "http://localhost:9333/dir/lookup?volumeId=3,01637037d6"
# If you know the collection, specify it since it will be a little faster
curl "http://localhost:9333/dir/lookup?volumeId=3&collection=turbo"
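Lookup and fetch can be chained: extract the volume id from a fid, ask the master where that volume lives, then build the download URL. The sed-based parsing and the RUN_LIVE guard are assumptions of this sketch; it expects the compact (non-pretty) single-line lookup response.

```shell
# Extract a volume location URL from a /dir/lookup JSON response (stdin).
lookup_url() {
  sed -n 's/.*"url":"\([^"]*\)".*/\1/p'
}

# Given a fid, resolve its volume server via the master and print a full URL.
fid_to_url() {
  fid="$1"
  volume="${fid%%,*}"
  loc=$(curl -s "http://localhost:9333/dir/lookup?volumeId=$volume" | lookup_url)
  echo "http://$loc/$fid"
}

if [ "${RUN_LIVE:-0}" = "1" ]; then
  fid_to_url "3,01637037d6"
fi
```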

Garbage collection (vacuum):

curl "http://localhost:9333/vol/vacuum"
curl "http://localhost:9333/vol/vacuum?garbageThreshold=0.4"

Vacuuming creates new copies of the .dat and .idx files while skipping deleted entries, then keeps the copies and removes the originals.
garbageThreshold is optional.
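The threshold is the ratio of deleted bytes to volume size. A sketch of the per-volume decision, using the DeletedByteCount and Size fields that appear in the volume status output later in this document (should_vacuum itself is invented here for illustration):

```shell
# Decide whether a volume is worth vacuuming: deleted/size >= threshold.
should_vacuum() {
  deleted="$1"; size="$2"; threshold="${3:-0.4}"
  awk -v d="$deleted" -v s="$size" -v t="$threshold" \
      'BEGIN { if (s > 0 && d / s >= t) print "yes"; else print "no" }'
}

should_vacuum 500000 1000000   # ratio 0.5 >= 0.4, prints yes
should_vacuum 100000 1000000   # ratio 0.1 <  0.4, prints no
```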

Pre-allocate volumes:

# specify a specific replication
curl "http://localhost:9333/vol/grow?replication=000&count=4"
{"count":4}
# specify a collection
curl "http://localhost:9333/vol/grow?collection=turbo&count=4"
# specify data center
curl "http://localhost:9333/vol/grow?dataCenter=dc1&count=4"
# specify ttl
curl "http://localhost:9333/vol/grow?ttl=5d&count=4"

count is the number of empty logical volumes to create.
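Since each logical volume is replicated according to its placement code, the physical footprint of a grow call is count times the number of copies. A quick sketch of that arithmetic (grow_footprint is a made-up name, not a weed command):

```shell
# Physical volume files created by /vol/grow:
#   count logical volumes x (1 + sum of the replication code's digits) copies
grow_footprint() {
  count="$1"; code="$2"
  x=$(printf %s "$code" | cut -c1)
  y=$(printf %s "$code" | cut -c2)
  z=$(printf %s "$code" | cut -c3)
  echo $((count * (1 + x + y + z)))
}

grow_footprint 4 000   # -> 4
grow_footprint 4 001   # -> 8
```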

Delete a collection:

# delete a collection
curl "http://localhost:9333/col/delete?collection=benchmark&pretty=y"

Check system status:

# Cluster status
curl "http://10.0.2.15:9333/cluster/status?pretty=y"
{
"IsLeader": true,
"Leader": "10.0.2.15:9333",
"Peers": [
"10.0.2.15:9334",
"10.0.2.15:9335"
    ]
}
# Topology status
curl "http://localhost:9333/dir/status?pretty=y"
{
"Topology": {
"DataCenters": [
    {
    "Free": 567,
    "Id": "dc1",
    "Max": 600,
    "Racks": [
      {
        "DataNodes": [
          {
            "Free": 190,
            "Max": 200,
            "PublicUrl": "cdh2",
            "Url": "cdh2:9222",
            "Volumes": 10
          },
          {
            "Free": 190,
            "Max": 200,
            "PublicUrl": "cdh1",
            "Url": "cdh1:9222",
            "Volumes": 10
          },
          {
            "Free": 187,
            "Max": 200,
            "PublicUrl": "cdh3",
            "Url": "cdh3:9222",
            "Volumes": 13
          }
        ],
        "Free": 567,
        "Id": "rack1",
        "Max": 600
      }
    ]
  }
],
"Free": 567,
"Max": 600,
"layouts": [
  {
    "collection": "",
    "replication": "001",
    "ttl": "5d",
    "writables": [
      15,
      16,
      17,
      18
    ]
  },
  {
    "collection": "",
    "replication": "000",
    "ttl": "",
    "writables": [
      13,
      14,
      10,
      11,
      12,
      19,
      20,
      21,
      22
    ]
  },
  {
    "collection": "",
    "replication": "001",
    "ttl": "",
    "writables": [
      6,
      3,
      7,
      2,
      4,
      5
    ]
  },
  {
    "collection": "turbo",
    "replication": "001",
    "ttl": "",
    "writables": [
      8,
      9
    ]
  }
]
},
"Version": "1.11"
}

Volume Server API:

# Upload a file
curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6

This requires a file key assigned by the master beforehand.

# Or upload directly and let the master assign a key (note: this uses the master's port)
curl -F file=@/home/chris/myphoto.jpg http://localhost:9333/submit
{"fid":"3,01fbe0dc6f1f38","fileName":"myphoto.jpg","fileUrl":"localhost:8080/3,01fbe0dc6f1f38","size":68231}
# Delete a file
curl -X DELETE http://127.0.0.1:8080/3,01637037d6
# View the chunk manifest of a large chunked file
curl http://127.0.0.1:8080/3,01637037d6?cm=false
# Check the volume server's status
curl "http://localhost:8080/status?pretty=y"
{
  "Version": "0.34",
  "Volumes": [
    {
      "Id": 1,
      "Size": 1319688,
      "RepType": "000",
      "Version": 2,
      "FileCount": 276,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 2,
      "Size": 1040962,
      "RepType": "000",
      "Version": 2,
      "FileCount": 291,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 3,
      "Size": 1486334,
      "RepType": "000",
      "Version": 2,
      "FileCount": 301,
      "DeleteCount": 2,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 4,
      "Size": 8953592,
      "RepType": "000",
      "Version": 2,
      "FileCount": 320,
      "DeleteCount": 2,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 5,
      "Size": 70815851,
      "RepType": "000",
      "Version": 2,
      "FileCount": 309,
      "DeleteCount": 1,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 6,
      "Size": 1483131,
      "RepType": "000",
      "Version": 2,
      "FileCount": 301,
      "DeleteCount": 1,
      "DeletedByteCount": 0,
      "ReadOnly": false
    },
    {
      "Id": 7,
      "Size": 46797832,
      "RepType": "000",
      "Version": 2,
      "FileCount": 292,
      "DeleteCount": 0,
      "DeletedByteCount": 0,
      "ReadOnly": false
    }
  ]
}

Filer Server API:

# Basic Usage:
# create or overwrite the file, the directories /path/to will be automatically created
curl -F [email protected] "http://localhost:8888/path/to"
{"name":"report.js","size":866,"fid":"7,0254f1f3fd","url":"http://localhost:8081/7,0254f1f3fd"}
# get the file content
curl  "http://localhost:8888/javascript/report.js"   
# upload the file with a different name
curl -F [email protected] "http://localhost:8888/javascript/new_name.js"    
{"name":"report.js","size":866,"fid":"3,034389657e","url":"http://localhost:8081/3,034389657e"}
# list all files under /javascript/
curl  -H "Accept: application/json" "http://localhost:8888/javascript/?pretty=y"           
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "new_name.js",
      "fid": "3,034389657e"
    },
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ],
  "Subdirectories": null
}
# Paginated file listing
curl  "http://localhost:8888/javascript/?pretty=y&lastFileName=new_name.js&limit=2"
{
  "Directory": "/javascript/",
  "Files": [
    {
      "name": "report.js",
      "fid": "7,0254f1f3fd"
    }
  ]
}
# Delete a file
curl -X DELETE "http://localhost:8888/javascript/report.js"
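The lastFileName/limit pair above supports walking a large directory page by page. A sketch of that loop (the sed parsing, the RUN_LIVE guard, and the localhost:8888 filer address are assumptions; the Accept: application/json header is needed so the filer returns JSON rather than HTML):

```shell
# Print the last "name" value from one page of a filer directory listing (stdin).
last_name() {
  sed -n 's/.*"name": *"\([^"]*\)".*/\1/p' | tail -n 1
}

# Walk a directory page by page, using lastFileName as the cursor.
list_all() {
  dir="$1"; last=""
  while :; do
    page=$(curl -s -H "Accept: application/json" \
      "http://localhost:8888$dir?limit=100&lastFileName=$last")
    name=$(echo "$page" | last_name)
    [ -z "$name" ] && break   # empty page: no more files
    echo "$page"
    last="$name"
  done
}

if [ "${RUN_LIVE:-0}" = "1" ]; then
  list_all /javascript/
fi
```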
