Requesting data from AWS with the CLI and SDK

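Abstract: The AWS CLI offers a basic yet flexible way to fetch data from S3 (Amazon Simple Storage Service), but advanced features such as resumable downloads have to be implemented by the user. Basic retrieval can be done with CLI commands, while advanced implementations rely on the SDKs for the various languages, such as Java or C#.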

1 Requesting data with the AWS CLI: s3api get-object

cmd: aws s3api get-object

https://docs.aws.amazon.com/cli/latest/reference/s3api/get-object.html

 

The example below demonstrates the use of --range to download a specific byte range from an object. Note that the byte range needs to be prefixed with "bytes=":

aws s3api get-object --bucket text-content --key dir/my_data --range bytes=8888-9999 my_data_range

 

Synopsis

  get-object
  --bucket <value>
  [--if-match <value>]
  [--if-modified-since <value>]
  [--if-none-match <value>]
  [--if-unmodified-since <value>]
  --key <value>
  [--range <value>]
  [--response-cache-control <value>]
  [--response-content-disposition <value>]
  [--response-content-encoding <value>]
  [--response-content-language <value>]
  [--response-content-type <value>]
  [--response-expires <value>]
  [--version-id <value>]
  [--sse-customer-algorithm <value>]
  [--sse-customer-key <value>]
  [--sse-customer-key-md5 <value>]
  [--request-payer <value>]
  [--part-number <value>]
  <outfile>

 

Description (dou):

Basic cmd:

aws s3api get-object --bucket text-content --key dir/my_data my_data_range

--bucket (string): the data bucket, i.e. the owner-defined data pool

--key (string): full path (key) of the requested data in the bucket

outfile: name of the output file to save, user-defined

 

Partial download:

Method 1:

aws s3api get-object --bucket text-content --key dir/my_data --range bytes=8888-9999 my_data_range

--range (string): Downloads the specified range of bytes of an object.

Method 2:

aws s3api get-object --bucket text-content --key dir/my_data --part-number 1 my_data_range

--part-number (integer): Part number of the object being read. This is a positive integer between 1 and 10,000. Effectively performs a 'ranged' GET request for the part specified. Useful for downloading just a part of an object.
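Method 1 lends itself to fetching a large object in fixed-size chunks, one --range request per chunk. The sketch below only computes the value to pass to --range. range_for_chunk is a hypothetical helper (not an AWS CLI feature), and the chunk size and object length are made-up numbers. It assumes the object length is already known, for example from the ContentLength field reported by aws s3api head-object.

```shell
# range_for_chunk INDEX SIZE TOTAL
# Prints the --range value for the INDEX-th chunk (0-based) of an object
# that is TOTAL bytes long and cut into SIZE-byte chunks.
# Hypothetical helper, not part of the AWS CLI.
range_for_chunk() {
  rfc_start=$(( $1 * $2 ))
  rfc_end=$(( rfc_start + $2 - 1 ))
  # No such chunk: it would start past the end of the object.
  if [ "$rfc_start" -ge "$3" ]; then
    return 1
  fi
  # The final chunk is usually shorter than SIZE.
  if [ "$rfc_end" -ge "$3" ]; then
    rfc_end=$(( $3 - 1 ))
  fi
  echo "bytes=${rfc_start}-${rfc_end}"
}

range_for_chunk 0 1000 2500   # prints bytes=0-999
range_for_chunk 2 1000 2500   # prints bytes=2000-2499 (last, shorter chunk)
```

A chunk could then be fetched with something like aws s3api get-object --bucket text-content --key dir/my_data --range "$(range_for_chunk 0 1000 2500)" my_data.part0, where the output file name is again only an example.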

 

For more AWS CLI command reference:

https://docs.aws.amazon.com/cli/latest/reference/

2 Request data with SDK (e.g. C#)

2.1 Getting Started with the AWS SDK for .NET

https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-setup.html

  • Create an AWS Account and Credentials
    • Create a profile and save it to the .NET credentials file

// Needs: using Amazon; using Amazon.Runtime.CredentialManagement;
var options = new CredentialProfileOptions
{
    AccessKey = "access_key",
    SecretKey = "secret_key"
};
var profile = new Amazon.Runtime.CredentialManagement.CredentialProfile("basic_profile", options);
profile.Region = RegionEndpoint.USWest1;
var netSDKFile = new NetSDKCredentialsFile();
netSDKFile.RegisterProfile(profile);

The RegisterProfile method is used to register a new profile. Your application typically calls this method only once for each profile.

  • Install the .NET Development Environment
    • Microsoft .NET Framework 3.5 or later
    • Microsoft Visual Studio 2010 or later
  • Install AWSSDK Assemblies
    • Go to AWS SDK for .NET (this is for VS2013+; for VS2010-2012 use this).
    • In the Downloads section, choose Download MSI Installer to download the installer.
    • To start installation, run the downloaded installer and follow the on-screen instructions.
  • Start a New Project
    • Create a new project from Template

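(Pitfall 1: create the new project in VS from the AWS template, rather than creating an ordinary project and adding the corresponding DLLs.)

(Pitfall 2: if the new project fails to compile because the namespace Amazon cannot be found, check the project's .NET version and manually add the DLLs of the matching version from the AWS SDK installation directory, which is usually under Program Files (x86).)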

  • Platforms Supported by the AWS SDK for .NET

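2.2 Code for continued (resumable) requests

The main concern here is a large file on an unstable network: a download that breaks off partway through is painful. The idea is to download the object part by part, so that the download can be continued from the part where it stopped.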
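One way to cope with a flaky connection is to retry each request a bounded number of times before giving up. The sketch below is shown in shell, using the CLI from section 1, and is a generic wrapper rather than something the AWS CLI provides. retry and the RETRY_DELAY variable are hypothetical names, and the fixed pause between attempts is an assumption, not a recommendation.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
# Hypothetical helper; RETRY_DELAY (seconds, default 1) is the pause
# between two attempts.
retry() {
  retry_max=$1
  shift
  retry_attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    retry_attempt=$(( retry_attempt + 1 ))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Local demonstration with a command that always succeeds.
retry 3 true && echo "ok"
```

Wrapping the download of one part would look like retry 5 aws s3api get-object --bucket text-content --key dir/my_data --part-number 1 my_data.1. Unlike an endless retry loop, this stops once the attempt limit is reached.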

(1) VS 2013 (my version). Create a new project from the AWS S3 sample template.

(2) Configure the profile.

Press Ctrl+K, and then press A.

Choose the New (or Edit) Account Profile icon to the right of the Profile list.

https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/credentials.html

 

(3) Set the region

The region refers to the location of the bucket. The region must be correct, otherwise the request fails with the error "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint." (Pitfall 3: you must specify the correct region, i.e. the endpoint.)

Endpoints currently do not support cross-region requests, so ensure that you create your endpoint in the same region as your bucket. You can find the location of your bucket by using the Amazon S3 console, or by using the get-bucket-location command.

Detail: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints-s3.html

Synopsis

  get-bucket-location
  --bucket <value>
  [--cli-input-json <value>]
  [--generate-cli-skeleton <value>]

https://docs.aws.amazon.com/cli/latest/reference/s3api/get-bucket-location.html

Example:

The following command retrieves the location constraint for a bucket named my-bucket, if a constraint exists:

aws s3api get-bucket-location --bucket my-bucket

Output:

{

    "LocationConstraint":"us-west-2"

}

In my case, I get null. (Pitfall 4: for us-east-1, i.e. US East (N. Virginia), the returned region is null.)

aws s3api get-bucket-location --bucket spacenet-dataset

Output:

{

    "LocationConstraint":"null"

}

According to the service documentation, S3 returns a null location if the bucket is in the US East (N. Virginia) region, so this is expected behavior. If you are trying to use such a bucket, you need to construct the client with the RegionEndpoint.USEast1 region.
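Since a null location stands for US East (N. Virginia), a script that feeds the result of get-bucket-location into later commands has to map it to us-east-1 itself. The helper below is a hypothetical sketch: normalize_region is not an AWS CLI command, and treating an empty value or the literal strings null and None as us-east-1 is an assumption about how the value was captured (None is what the CLI appears to print for a null value with --output text).

```shell
# normalize_region LOCATION
# Maps the LocationConstraint of a bucket to a usable region name.
# Hypothetical helper: an empty, "null" or "None" value is treated
# as us-east-1; anything else is passed through unchanged.
normalize_region() {
  case "$1" in
    ""|null|None) echo "us-east-1" ;;
    *)            echo "$1" ;;
  esac
}

normalize_region null        # prints us-east-1
normalize_region us-west-2   # prints us-west-2
```

It could be combined with the command above as region=$(normalize_region "$(aws s3api get-bucket-location --bucket my-bucket --query LocationConstraint --output text)").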

(Pitfall 5: setting the region through the selection in VS has no effect; change the region by editing App.config instead.)

App.config

Other ways to select the AWS region (endpoint):

https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-region-selection.html

Other: the China (Beijing) region endpoint is cn-north-1.

(4) Modify the code

My cmd: aws s3api get-object --bucket spacenet-dataset --key SpaceNet_Roads_Competition/AOI_2_Vegas_Roads_Train.tar.gz --request-payer requester --part-number 1 AOI_2_Vegas_Roads_Train.tar.gz.1

Code reference:

https://docs.aws.amazon.com/AmazonS3/latest/dev/AuthUsingAcctOrUserCredDotNet.html

 

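// In Main()
// bucketName, keyName, outPath, RP and PartNum are static fields of the
// sample class, so that ReadingAnObject() can read them.
bucketName = "spacenet-dataset";
keyName = "SpaceNet_Roads_Competition/AOI_3_Paris_Roads_Test_Public.tar.gz";
outPath = "E:\\data\\";
RP = RequestPayer.Requester;

// Loop over the parts. PartNum starts at 17 here; set it to the first part
// that still needs downloading. 10,000 is the largest valid part number.
for (PartNum = 17; PartNum < 10001; PartNum++)
{
    bool flag = false;
    // Retry the same part until it downloads successfully.
    // Note: once PartNum exceeds the object's real part count, the requests
    // are expected to keep failing and this loop never ends, so stop the
    // program manually at that point.
    do
    {
        flag = ReadingAnObject();
    }
    while (flag == false);
}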

// Updated ReadingAnObject(): downloads part PartNum and returns true on success.
static bool ReadingAnObject()
{
    bool flag = false;
    try
    {
        GetObjectRequest request = new GetObjectRequest()
        {
            BucketName = bucketName,
            Key = keyName,
            RequestPayer = RP,
            PartNumber = PartNum
        };

        using (GetObjectResponse response = client.GetObject(request))
        {
            string title = response.Metadata["x-amz-meta-title"];
            Console.WriteLine("The object's title is {0}", title);
            // string dest = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), keyName);
            // Save each part to its own file: <outPath><keyName>.<PartNum>
            string dest = Path.Combine(outPath, keyName) + "." + PartNum.ToString();

            // if (!File.Exists(dest))
            {
                response.WriteResponseStreamToFile(dest);
            }
        }
        flag = true;
    }
    // Only S3 service errors are caught here; other exceptions
    // (e.g. a connection dropped while streaming) still propagate.
    catch (AmazonS3Exception amazonS3Exception)
    {
        if (amazonS3Exception.ErrorCode != null &&
            (amazonS3Exception.ErrorCode.Equals("InvalidAccessKeyId") ||
             amazonS3Exception.ErrorCode.Equals("InvalidSecurity")))
        {
            Console.WriteLine("Please check the provided AWS Credentials.");
            Console.WriteLine("If you haven't signed up for Amazon S3, please visit http://aws.amazon.com/s3");
        }
        else
        {
            Console.WriteLine("An error occurred with the message '{0}' when reading an object", amazonS3Exception.Message);
        }
    }
    return flag;
}
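The loop above leaves one file per part on disk, each named with a numeric suffix (.1, .2, and so on). To get the original archive back, the pieces have to be concatenated in numeric order; a plain wildcard would sort .10 before .2. The sketch below does this with a counter. join_parts is a hypothetical helper, and it assumes the parts are numbered from 1 without gaps and sit next to the target file.

```shell
# join_parts BASE COUNT
# Concatenates BASE.1 ... BASE.COUNT into BASE, in numeric order.
# Hypothetical helper; fails without writing anything if a part is missing.
join_parts() {
  jp_i=1
  # Check every part first so that a gap does not leave a truncated BASE behind.
  while [ "$jp_i" -le "$2" ]; do
    if [ ! -f "$1.$jp_i" ]; then
      echo "missing part: $1.$jp_i" >&2
      return 1
    fi
    jp_i=$(( jp_i + 1 ))
  done
  : > "$1"   # create or empty the target file
  jp_i=1
  while [ "$jp_i" -le "$2" ]; do
    cat "$1.$jp_i" >> "$1"
    jp_i=$(( jp_i + 1 ))
  done
}
```

For the download above this would be called as join_parts AOI_3_Paris_Roads_Test_Public.tar.gz N, where N is the number of parts actually downloaded.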

 

 
