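Introduction:
Xarray is a high-performance tensor manipulation library, often used for processing multichannel time-series signals (for example, sensor signals). When working with this kind of data, you most likely reach for numpy's np.ndarray. However, np.ndarray is only a plain matrix (or tensor), so the meaning of each axis and index has to be tracked separately, and complicated indexing on high-dimensional data easily leads to errors.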
"However, real-world datasets are usually more than just raw numbers; they have labels which encode information about how the array values map to locations in space, time, etc."
This introduction from the official website [1] describes the pain point of numpy users very aptly…
import numpy as np
import xarray as xr
data = xr.DataArray(np.random.randn(2, 3))
print(data)
The printed output is as follows:
<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[-0.06620569, -0.01929077, 1.44195805],
[-0.14480076, 0.97707183, -0.22340199]])
Dimensions without coordinates: dim_0, dim_1
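The default names dim_0 and dim_1 in the output above carry no meaning. As a minimal sketch, the dimensions can be named when the array is created, and a reduction can then refer to a dimension by name instead of by axis number. The names "channel" and "time" and the sample values are assumptions chosen only for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical signal: 2 channels, 3 time steps
raw = np.arange(6.0).reshape(2, 3)
signal = xr.DataArray(raw, dims=("channel", "time"))

# Reduce over a named dimension instead of a positional axis
mean_over_time = signal.mean(dim="time")
print(mean_over_time.dims)    # ('channel',)
print(mean_over_time.values)  # [1. 4.]
```

With plain numpy the same reduction would be raw.mean(axis=1), which only works as long as you remember that axis 1 is time.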
import xarray as xr
import numpy as np
# Create a 2D numpy array with temperature data
temp_data = np.array([[25.0, 26.2, 24.8], [28.5, 27.6, 26.4], [23.7, 25.1, 26.8]])
# Create coordinate arrays for time and value dimensions
time_coords = np.array(['2023-08-01', '2023-08-02', '2023-08-03'])
value_coords = np.array(['V1', 'V2', 'V3'])
# Create an xarray DataArray object with the temperature data and coordinates
temp_data_array = xr.DataArray(temp_data, dims=('time', 'value'), coords={'time': time_coords, 'value': value_coords})
# Print the DataArray object
print(temp_data_array)
The result is as follows:
array([[25. , 26.2, 24.8],
[28.5, 27.6, 26.4],
[23.7, 25.1, 26.8]])
Coordinates:
* time (time) <U10 '2023-08-01' '2023-08-02' '2023-08-03'
* value (value) <U2 'V1' 'V2' 'V3'
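Once coordinates are attached, values can be looked up by label rather than by integer position. The following is a small sketch that rebuilds the same array as above and uses .sel for label-based and .isel for position-based selection; the variable names are chosen here only for illustration.

```python
import numpy as np
import xarray as xr

temp = xr.DataArray(
    np.array([[25.0, 26.2, 24.8], [28.5, 27.6, 26.4], [23.7, 25.1, 26.8]]),
    dims=("time", "value"),
    coords={
        "time": ["2023-08-01", "2023-08-02", "2023-08-03"],
        "value": ["V1", "V2", "V3"],
    },
)

# Label-based lookup of a single element
single = temp.sel(time="2023-08-02", value="V2")
print(single.item())  # 27.6

# All values for one day, selected by label rather than by row index
day = temp.sel(time="2023-08-03")
print(day.values)  # [23.7 25.1 26.8]

# Position-based indexing is still available through isel
first_row = temp.isel(time=0)
print(first_row.values)  # [25.  26.2 24.8]
```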
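An xr.Dataset is an object that contains multiple xr.DataArrays. It can have multiple axes, and it keeps track of which axes each piece of data corresponds to.
Taking weather data as the example again, the data to be stored this time is weather data at locations with different longitudes and latitudes.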
import xarray as xr
import numpy as np
# Create a 3D numpy array with temperature data
# Generate a tensor with shape [3, 4, 5] and elements from [20, 30]
tensor = np.random.uniform(low=20.0, high=30.0, size=(3, 4, 5))
# Round the tensor to one decimal point
temp_data = np.round(tensor, decimals=1)
# Create coordinate arrays for time, latitude, and longitude dimensions
time_coords = np.array(['2023-08-01', '2023-08-02', '2023-08-03'])
lat_coords = np.array([40.0, 41.0, 42.0, 43.0])
lon_coords = np.array([-110.0, -109.0, -108.0, -107.0, -106.0])
# Create an xarray DataArray object with the temperature data and associated coordinates
temp_data_array = xr.DataArray(temp_data, dims=('time', 'lat', 'lon'), coords={'time': time_coords, 'lat': lat_coords, 'lon': lon_coords})
# Create a new xarray Dataset object with the temperature data array and associated metadata
temp_dataset = xr.Dataset({'temperature': temp_data_array})
# Print the Dataset object
print(temp_dataset)
The result is as follows:
Dimensions: (time: 3, lat: 4, lon: 5)
Coordinates:
* time (time)
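In the example above, we have successfully stored a tensor in which each channel holds the weather data for one day at locations with different latitudes and longitudes: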
print(temp_data[0])
[[22.1 27.8 23.8 24.9 26.8]
[25.7 24.7 26.3 22.3 24.3]
[21.9 27. 23.3 29.7 24.4]
[21.6 27.4 28.1 23.2 27.7]]
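To illustrate that a Dataset can hold several DataArrays that share the same coordinates, here is a minimal sketch. The "humidity" variable and the constant placeholder values are assumptions made for this example (constants are used instead of random numbers so that the printed results are reproducible).

```python
import numpy as np
import xarray as xr

dims = ("time", "lat", "lon")
coords = {
    "time": ["2023-08-01", "2023-08-02", "2023-08-03"],
    "lat": [40.0, 41.0, 42.0, 43.0],
    "lon": [-110.0, -109.0, -108.0, -107.0, -106.0],
}

# Placeholder data (assumption): constant values instead of random ones
temperature = xr.DataArray(np.full((3, 4, 5), 25.0), dims=dims, coords=coords)
humidity = xr.DataArray(np.full((3, 4, 5), 0.5), dims=dims, coords=coords)  # hypothetical variable

ds = xr.Dataset({"temperature": temperature, "humidity": humidity})

# Selecting one day by label slices every variable in the Dataset at once
day = ds.sel(time="2023-08-01")
print(dict(day.sizes))  # {'lat': 4, 'lon': 5}

# Time series of one variable at a single location, selected by label
series = ds["temperature"].sel(lat=41.0, lon=-108.0)
print(series.values)  # [25. 25. 25.]
```

Because both variables share the time, lat and lon coordinates, a single .sel call keeps them aligned, which is the bookkeeping you would otherwise do by hand with separate numpy arrays.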
[1] Xarray official website