tensorflow imagenet数据集转化

脚本为 https://github.com/tensorflow/models/tree/master/research/slim/datasets/download_and_convert_imagenet.sh

set -e

if [ -z "$1" ]; then
  echo "usage download_and_convert_imagenet.sh [data dir]"
  exit
fi

# Create the output and temporary directories. 
# 创建输出和临时目录。
DATA_DIR="${1%/}"
SCRATCH_DIR="${DATA_DIR}/raw-data/"
mkdir -p "${DATA_DIR}"
mkdir -p "${SCRATCH_DIR}"
WORK_DIR="$0.runfiles/__main__"

# Download the ImageNet data. 
# 下载ImageNet数据。
LABELS_FILE="${WORK_DIR}/datasets/imagenet_lsvrc_2015_synsets.txt"
DOWNLOAD_SCRIPT="${WORK_DIR}/datasets/download_imagenet.sh"
"${DOWNLOAD_SCRIPT}" "${SCRATCH_DIR}" "${LABELS_FILE}"

# Note the locations of the train and validation data.
# 请注意训练数据集和验证数据集的位置。
TRAIN_DIRECTORY="${SCRATCH_DIR}train/"
VALIDATION_DIRECTORY="${SCRATCH_DIR}validation/"

# Preprocess the validation data by moving the images into the appropriate
# sub-directory based on the label (synset) of the image
# 通过将图像移动到基于图像的标签(synset)的适当的子目录来预处理验证数据。
echo "Organizing the validation data into sub-directories."
PREPROCESS_VAL_SCRIPT="${WORK_DIR}/datasets/preprocess_imagenet_validation_data.py"
VAL_LABELS_FILE="${WORK_DIR}/datasets/imagenet_2012_validation_synset_labels.txt"

"${PREPROCESS_VAL_SCRIPT}" "${VALIDATION_DIRECTORY}" "${VAL_LABELS_FILE}"

# Convert the XML files for bounding box annotations into a single CSV.
# 将边界框注释的XML文件转换为单个CSV。
echo "Extracting bounding box information from XML."
BOUNDING_BOX_SCRIPT="${WORK_DIR}/datasets/process_bounding_boxes.py"
BOUNDING_BOX_FILE="${SCRATCH_DIR}/imagenet_2012_bounding_boxes.csv"
BOUNDING_BOX_DIR="${SCRATCH_DIR}bounding_boxes/"

"${BOUNDING_BOX_SCRIPT}" "${BOUNDING_BOX_DIR}" "${LABELS_FILE}" \
 | sort >"${BOUNDING_BOX_FILE}"
echo "Finished downloading and preprocessing the ImageNet data."

# Build the TFRecords version of the ImageNet data.
# 构建ImageNet数据的TFRecords版本。
BUILD_SCRIPT="${WORK_DIR}/build_imagenet_data"
OUTPUT_DIRECTORY="${DATA_DIR}"
IMAGENET_METADATA_FILE="${WORK_DIR}/datasets/imagenet_metadata.txt"

"${BUILD_SCRIPT}" \
  --train_directory="${TRAIN_DIRECTORY}" \
  --validation_directory="${VALIDATION_DIRECTORY}" \
  --output_directory="${OUTPUT_DIRECTORY}" \
  --imagenet_metadata_file="${IMAGENET_METADATA_FILE}" \
  --labels_file="${LABELS_FILE}" \
  --bounding_box_file="${BOUNDING_BOX_FILE}"



其中涉及到下载数据的脚本 download_imagenet.sh

set -e

if [ "x$IMAGENET_ACCESS_KEY" == x -o "x$IMAGENET_USERNAME" == x ]; then
  cat <


最后在网上看到一位博主写了将图像转化为TFRecords的过程(已经下载了数据集)

地址是:http://blog.csdn.net/hustlx/article/details/76585843

首先工作目录为:models/inception/inception/data   将imagenet的原始数据ILSVRC2012_img_train.tar和ILSVRC2012_img_val.tar放在这个目录下,

然后可运行如下脚本:

mkdir -p ILSVRC2012  
mkdir -p ILSVRC2012/raw-data  
mkdir -p ILSVRC2012/raw-data/imagenet-data  
mkdir -p ILSVRC2012/raw-data/imagenet-data/bounding_boxes  

mv ILSVRC2012_bbox_train_v2.tar.gz ILSVRC2012/raw-data/imagenet-data/bounding_boxes/  
tar xvf ILSVRC2012/raw-data/imagenet-data/bounding_boxes/ILSVRC2012_bbox_train_v2.tar.gz -C ILSVRC2012/raw-data/imagenet-data/bounding_boxes/  

NUM_XML=$(ls -1 ILSVRC2012/raw-data/imagenet-data/bounding_boxes/* | wc -l)  
echo "Identified ${NUM_XML} bounding box annotations."  
mkdir -p ILSVRC2012/raw-data/imagenet-data/validation/  
tar xf ILSVRC2012_img_val.tar -C ILSVRC2012/raw-data/imagenet-data/validation/  

mkdir -p ILSVRC2012/raw-data/imagenet-data/train/  
mv ILSVRC2012_img_train.tar ILSVRC2012/raw-data/imagenet-data/train/ && cd ILSVRC2012/raw-data/imagenet-data/train/  
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar   

find . -name "*.tar" | while read NAE ; do mkdir -p "${NAE%.tar}"; tar -xvf "${NAE}" -C "${NAE%.tar}"; rm -f "${NAE}"; done   
cd .. && cd .. && cd .. && cd ..  
  
python preprocess_imagenet_validation_data.py ILSVRC2012/raw-data/imagenet-data/validation/ imagenet_2012_validation_synset_labels.txt  

#这一步应将 process_bounding_boxes 里的 JPEG 改为 jpeg  
python process_bounding_boxes.py ILSVRC2012/raw-data/imagenet-data/bounding_boxes/ imagenet_lsvrc_2015_synsets.txt | sort > ILSVRC2012/raw-data/imagenet_2012_bounding_boxes.csv  

#这一步需要将 build_imagenet_data.py 里面的 JPEG 换成 jpeg  
python build_imagenet_data.py --train_directory=ILSVRC2012/raw-data/imagenet-data/train/ --validation_directory=ILSVRC2012/raw-data/imagenet-data/validation/ --output_directory=ILSVRC2012/ --imagenet_metadata_file=imagenet_metadata.txt --labels_file=imagenet_lsvrc_2015_synsets.txt --bounding_box_file=ILSVRC2012/raw-data/imagenet_2012_bounding_boxes.csv  





你可能感兴趣的:(TensorFlow)