seaweedfs

{
FastDFS
Files can only be read and written through its proprietary API. It does not offer a POSIX interface and cannot be mounted.

Seaweedfs (https://github.com/chrislusf/seaweedfs)
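The design of SeaweedFS is based on Haystack, a Facebook paper describing its photo storage system. On that note, Mao Jian is also writing bfs from the same paper; it is still under development, so you can follow it as it grows step by step. File systems such as FastDFS and SeaweedFS are designed specifically for storing huge numbers of small files, which makes them a good fit as the backend file store for an app. Which one to use is a personal choice; I would lean towards SeaweedFS.
Qiniu
Its file hash algorithm is public and worth a look: https://github.com/qiniu/qetag Weigh your own requirements when choosing a solution.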

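The most popular open-source solutions for PB-scale storage today are mainly two: Swift and Ceph.

Swift
(Python) As one of the OpenStack components it is very widely used; I recall someone saying that Baidu Netdisk is also built on it.
Ceph
(mostly C/C++) It suddenly took off in the last year or two, and many companies (such as Yahoo) have replaced their earlier solutions with Ceph.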

}

{
Why use SeaweedFS
https://blog.csdn.net/github_37459410/article/details/81141365

./weed -h # to check available options

The commands are:

benchmark   benchmark on writing millions of files and read out
backup      incrementally backup a volume to local folder
compact     run weed tool compact on volume file
filer.copy  copy one or a list of files to a filer folder
fix         run weed tool fix on index file if corrupted
server      start a server, including volume server, and automatically elect a master server
master      start a master server
filer       start a file server that points to a master server
upload      upload one or a list of files
download    download files by file id
shell       run interactive commands, now just echo
version     print SeaweedFS version
volume      start a volume server
export      list or export files from one volume data file
mount       mount weed filer to a directory as file system in userspace(FUSE)

Use "weed help [command]" for more information about a command.

For Logging, use "weed [logging_options] [command]". The logging options are:
-alsologtostderr
log to standard error as well as files (default true)
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-logtostderr
log to standard error instead of files
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
}

master
{
weed master -h    # provides the volume=>location mapping service and the sequence numbers for file ids
Example: weed master -port=9333
Default Usage:
-cpuprofile string
cpu profile output file
-defaultReplication string
Default replication type if not specified. (default "000")
-garbageThreshold string
threshold to vacuum and reclaim spaces (default "0.3")
-ip string
master <ip>|<server> address (default "localhost")
-ip.bind string
ip address to bind to (default "0.0.0.0")
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-mdir string    (sets the folder that stores the generated file id sequence)
data directory to store meta data (default "C:\Users\wangz\AppData\Local\Temp")
-peers string
other master nodes in comma separated ip:port list, example: 127.0.0.1:9093,127.0.0.1:9094
-port int
http listen port (default 9333)
-pulseSeconds int
number of seconds between heartbeats (default 5)
-secure.secret string
secret to encrypt Json Web Token(JWT)
-volumePreallocate
Preallocate disk space for volumes.
-volumeSizeLimitMB uint
Master stops directing writes to oversized volumes. (default 30000)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
Description:
start a master server to provide volume=>location mapping service
and sequence number of file ids
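
The -defaultReplication value above is a three-digit string. The sketch below decodes it under an assumption taken from my reading of the SeaweedFS documentation, not from the help text itself: the digits count extra copies in other data centers, on other racks in the same data center, and on other servers in the same rack. The names ReplicaPlacement and parse_replication are hypothetical helpers, not part of any SeaweedFS client.

```python
from typing import NamedTuple


class ReplicaPlacement(NamedTuple):
    # Assumed meaning of the three digits, left to right.
    other_data_centers: int  # extra copies in other data centers
    other_racks: int         # extra copies on other racks, same data center
    same_rack: int           # extra copies on other servers, same rack

    @property
    def total_copies(self) -> int:
        # The original copy plus every extra replica.
        return 1 + self.other_data_centers + self.other_racks + self.same_rack


def parse_replication(code: str) -> ReplicaPlacement:
    """Decode a three-digit replication string such as "000" or "110"."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected three ASCII digits, got {code!r}")
    return ReplicaPlacement(*(int(ch) for ch in code))
```

Under that reading, the default "000" means a single copy with no replication, and "001" asks for one extra copy on another server in the same rack.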

}


volume
{
weed volume -h    # provides storage space

Example: weed volume -port=8080 -dir=/tmp -max=5 -ip=server_name -mserver=localhost:9333
Default Usage:
-cache.enable
direct cache instead of OS cache, cost more memory.
-dataCenter string
current volume server’s data center name
-dir string
directories to store data files. dir[,dir]… (default "C:\Users\wangz\AppData\Local\Temp")
-idleTimeout int
connection idle seconds (default 30)
-images.fix.orientation
Adjust jpg orientation when uploading. (default true)
-index string
Choose [memory|leveldb|boltdb|btree] mode for memory~performance balance. (default "memory")
-ip string
ip or server name
-ip.bind string
ip address to bind to (default "0.0.0.0")
-max string
maximum numbers of volumes, count[,count]… (default "7")
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-mserver string
master server location (default "localhost:9333")
-port int
http listen port (default 8080)
-port.public int
port opened to public
-publicUrl string
Publicly accessible address
-pulseSeconds int
number of seconds between heartbeats, must be smaller than or equal to the master’s setting (default 5)
-rack string
current volume server’s rack name
-read.redirect
Redirect moved or non-local volumes. (default true)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
Description:
start a volume server to provide storage spaces

}


server
{
weed server -h    # a convenient way to start a volume server and a master server together; the servers are exactly the same as when started separately
Example: weed server -port=8080 -dir=/tmp -volume.max=5 -ip=server_name
Default Usage:
-cpuprofile string
cpu profile output file
-dataCenter string
current volume server’s data center name
-dir string
directories to store data files. dir[,dir]… (default "C:\Users\wangz\AppData\Local\Temp")
-filer
whether to start filer
-filer.cassandra.keyspace string
keyspace of the cassandra server (default "seaweed")
-filer.cassandra.server string
host[:port] of the cassandra server
-filer.collection string
all data will be stored in this collection
-filer.confFile string
json encoded filer conf file
-filer.defaultReplicaPlacement string
Default replication type if not specified during runtime.
-filer.dir string
directory to store meta data, default to a 'filer' sub directory of what -dir is specified
-filer.disableDirListing
turn off directory listing
-filer.master string
default to current master server
-filer.maxMB int
split files larger than the limit
-filer.port int
filer server http listen port (default 8888)
-filer.port.public int
filer server public http listen port
-filer.redirectOnRead
whether proxy or redirect to volume server during file GET request
-filer.redis.database int
the database on the redis server
-filer.redis.password string
redis password in clear text
-filer.redis.server string
host:port of the redis server, e.g., 127.0.0.1:6379
-garbageThreshold string
threshold to vacuum and reclaim spaces (default "0.3")
-idleTimeout int
connection idle seconds (default 30)
-ip string
ip or server name (default "localhost")
-ip.bind string
ip address to bind to (default "0.0.0.0")
-master.defaultReplicaPlacement string
Default replication type if not specified. (default "000")
-master.dir string
data directory to store meta data, default to same as -dir specified
-master.peers string
other master nodes in comma separated ip:masterPort list
-master.port int
master server http listen port (default 9333)
-master.volumePreallocate
Preallocate disk space for volumes.
-master.volumeSizeLimitMB uint
Master stops directing writes to oversized volumes. (default 30000)
-maxCpu int
maximum number of CPUs. 0 means all available CPUs
-pulseSeconds int
number of seconds between heartbeats (default 5)
-rack string
current volume server’s rack name
-secure.secret string
secret to encrypt Json Web Token(JWT)
-volume.cache.enable
direct cache instead of OS cache, cost more memory.
-volume.images.fix.orientation
Adjust jpg orientation when uploading. (default true)
-volume.index string
Choose [memory|leveldb|boltdb|btree] mode for memory~performance balance. (default "memory")
-volume.max string
maximum numbers of volumes, count[,count]… (default "7")
-volume.port int
volume server http listen port (default 8080)
-volume.port.public int
volume server public port
-volume.publicUrl string
publicly accessible address
-volume.read.redirect
Redirect moved or non-local volumes. (default true)
-whiteList string
comma separated Ip addresses having write permission. No limit if empty.
Description:
start both a volume server to provide storage spaces
and a master server to provide volume=>location mapping service and sequence number of file ids

This is provided as a convenient way to start both volume server and master server.
The servers are exactly the same as starting them separately.

So other volume servers can use this embedded master server also.

Optionally, one filer server can be started. Logically, filer servers should not be in a cluster.
They run with meta data on disk, not shared. So each filer server is different.
}


master
{
Master Server API

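You can append &pretty=y to any HTTP API call to see formatted JSON output.
1. Assign a file key
curl http://localhost:9333/dir/assign
To specify the replication type, add ?replication=001
To specify how many file ids to reserve, add ?count=5
To specify the data center, add ?dataCenter=dc1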
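
The optional parameters listed above (replication, count, dataCenter) can be combined in one query string. As a minimal sketch, the helper below only builds the /dir/assign URL and sends nothing; the function name assign_url and its keyword arguments are hypothetical, and the default master address is the localhost:9333 used in these notes.

```python
from urllib.parse import urlencode


def assign_url(master="localhost:9333", replication=None, count=None,
               data_center=None):
    """Build a /dir/assign URL, including only the options that were given."""
    params = {}
    if replication is not None:
        params["replication"] = replication
    if count is not None:
        params["count"] = count
    if data_center is not None:
        params["dataCenter"] = data_center
    base = f"http://{master}/dir/assign"
    query = urlencode(params)
    return f"{base}?{query}" if query else base
```

For example, assign_url(replication="001", count=5, data_center="dc1") combines all three options from the list above into a single request URL.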

2. Look up a volume
curl "http://localhost:9333/dir/lookup?volumeId=3&pretty=y"
Specifying the collection makes the lookup faster: curl "http://localhost:9333/dir/lookup?volumeId=3&collection=turbo"

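3. Force garbage collection
Deleting a file does not free disk space immediately. To reclaim unused space right away, run
curl "http://localhost:9333/vol/vacuum"
curl "http://localhost:9333/vol/vacuum?garbageThreshold=0.4"
The garbageThreshold=0.4 parameter is optional and does not change the default threshold. To use a different default, start the master with a different garbageThreshold.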

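4. Pre-allocate volumes
A volume server writes to only one volume at a time. If you need more concurrency, you can pre-allocate many volumes. Examples follow; the options can also be combined.
Each of these creates 4 empty volumes:

specify a specific replication

curl "http://localhost:9333/vol/grow?replication=000&count=4"
{"count":4}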

specify a collection

curl "http://localhost:9333/vol/grow?collection=turbo&count=4"

specify data center

curl "http://localhost:9333/vol/grow?dataCenter=dc1&count=4"

specify ttl

curl "http://localhost:9333/vol/grow?ttl=5d&count=4"

5. Delete Collection
curl "http://localhost:9333/col/delete?collection=benchmark&pretty=y"

6. Check System Status
curl "http://10.0.2.15:9333/cluster/status?pretty=y"
curl "http://localhost:9333/dir/status?pretty=y"

}


Volume
{
Volume Server API

1. Upload a file
curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"size": 43234}
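
The last path segment of the upload URL above, 3,01637037d6, is the file id. As I recall the SeaweedFS README (an assumption, since these notes do not spell it out), the number before the comma is the volume id, and the hex string after it is a file key followed by an 8-hex-digit cookie. A minimal sketch of splitting it, with FileId and parse_fid as hypothetical names:

```python
from typing import NamedTuple


class FileId(NamedTuple):
    volume_id: int  # which volume holds the file
    key: int        # file key within the volume
    cookie: int     # assumed: last 8 hex digits, a guard against guessed URLs


def parse_fid(fid: str) -> FileId:
    """Split a file id such as "3,01637037d6" into its numeric parts."""
    volume_part, sep, rest = fid.partition(",")
    if not sep or len(rest) <= 8:
        raise ValueError(f"not a valid file id: {fid!r}")
    # Everything before the final 8 hex digits is the key; the rest is the cookie.
    return FileId(int(volume_part), int(rest[:-8], 16), int(rest[-8:], 16))
```

With that layout, 3,01637037d6 reads as volume 3, key 0x01, cookie 0x637037d6.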

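2. Upload a file directly
This API exists only for convenience. The master server obtains a file id and stores the file on the right volume server. Because it is a convenience API, it does not support the various parameters available when assigning a file id (or you can add that support and send a pull request).
curl -F file=@/home/chris/myphoto.jpg http://localhost:9333/submit
{"fid":"3,01fbe0dc6f1f38","fileName":"myphoto.jpg","fileUrl":"localhost:8080/3,01fbe0dc6f1f38","size":68231}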

3. Delete a file
curl -X DELETE http://127.0.0.1:8080/3,01637037d6

4. View the manifest content of a large chunked file
curl "http://127.0.0.1:8080/3,01637037d6?cm=false"

5. Check Volume Server Status
curl "http://localhost:8080/status?pretty=y"

}


Filer
{
Filer Server API

Basic Usage:

curl -F filename=@report.js "http://localhost:8888/javascript/"
{"name":"report.js","size":866,"fid":"7,0254f1f3fd","url":"http://localhost:8081/7,0254f1f3fd"}
curl "http://localhost:8888/javascript/report.js" # get the file content

curl -F filename=@report.js "http://localhost:8888/javascript/new_name.js" # upload the file with a different name
{"name":"report.js","size":866,"fid":"3,034389657e","url":"http://localhost:8081/3,034389657e"}
curl -H "Accept: application/json" "http://localhost:8888/javascript/?pretty=y" # list all files under /javascript/
{
"Directory": "/javascript/",
"Files": [
{
"name": "new_name.js",
"fid": "3,034389657e"
},
{
"name": "report.js",
"fid": "7,0254f1f3fd"
}
],
"Subdirectories": null
}

1. List all files under a directory
curl "http://localhost:8888/javascript/?pretty=y&lastFileName=new_name.js&limit=2"
{
"Directory": "/javascript/",
"Files": [
{
"name": "report.js",
"fid": "7,0254f1f3fd"
}
]
}
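
The lastFileName and limit parameters shown above make it possible to walk a large directory one page at a time. The sketch below is one way to do that, assuming the response shape in the sample output (a "Files" list whose entries carry a "name") and assuming the filer returns an empty list once the last page has been passed. fetch_page and iter_files are hypothetical helper names; the fetch argument exists so the loop can be exercised without a running filer.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_page(base_url, last_file_name, limit):
    """Request one page of a filer directory listing as parsed JSON."""
    params = {"limit": limit}
    if last_file_name:
        params["lastFileName"] = last_file_name
    req = Request(f"{base_url}?{urlencode(params)}",
                  headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def iter_files(base_url, limit=50, fetch=fetch_page):
    """Yield every file entry under base_url, one page at a time."""
    last = ""
    while True:
        page = fetch(base_url, last, limit)
        files = page.get("Files") or []
        if not files:
            return  # an empty page means the listing is exhausted
        yield from files
        # Resume after the last name seen on this page.
        last = files[-1]["name"]
```

A call such as `for entry in iter_files("http://localhost:8888/javascript/", limit=2): print(entry["name"])` would walk the directory used in the examples above; it costs one extra request at the end to confirm that no files remain.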

curl -X DELETE "http://localhost:8888/assets/report.js"

}


Client library: https://github.com/linxGnu/goseaweedfs


filer
C:\Users\wangz>weed filer -h
Example: weed filer -port=8888 -dir=/tmp -master=ip:port
Default Usage:
-cassandra.keyspace string
keyspace of the cassandra server (default "seaweed")
-cassandra.server string
host[:port] of the cassandra server
-collection string
all data will be stored in this collection
-confFile string
json encoded filer conf file
-defaultReplicaPlacement string
default replication type if not specified (default "000")
-dir string
directory to store meta data (default "C:\Users\wangz\AppData\Local\Temp")
-disableDirListing
turn off directory listing
-ip string
filer server http listen ip address
-master string
master server location (default "localhost:9333")
-maxMB int
split files larger than the limit
-port int
filer server http listen port (default 8888)
-port.public int
port opened to public
-redirectOnRead
whether proxy or redirect to volume server during file GET request
-redis.database int
the database on the redis server
-redis.password string
password in clear text
-redis.server string
host:port of the redis server, e.g., 127.0.0.1:6379
-secure.secret string
secret to encrypt Json Web Token(JWT)
Description:
start a file server which accepts REST operation for any files.

    //create or overwrite the file, the directories /path/to will be automatically created
    POST /path/to/file
    //get the file content
    GET /path/to/file
    //create or overwrite the file, the filename in the multipart request will be used
    POST /path/to/
    //return a json format subdirectory and files listing
    GET /path/to/

Current mapping metadata store is local embedded leveldb.
It should be highly scalable to hundreds of millions of files on a modest machine.

Future we will ensure it can avoid of being SPOF.

Filer Setup
weed scaffold -config filer -output="."
To view the configuration:
weed scaffold filer
[leveldb]
enabled = true
dir = "." # directory to store level db files

weed filer

POST a file and read it back

curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"

POST a file with a new name and read it back

curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"

list sub folders and files

visit "http://localhost:8888/path/to/sources/"

if there are lots of files under this folder, here is a way to efficiently paginate through all of them

visit "http://localhost:8888/path/to/sources/?lastFileName=abc.txt&limit=50"
