This article shows how to classify fruit images with Google's pretrained Inception-v3 model. It covers the following topics:
-------------------------------------------------------------------
Project layout:
|--baidu_search.py   # crawls images from Baidu
|--ulibs.py          # utility functions for data cleaning etc.
|--inception-v3.py   # model training script
|--data/             # data directory
|  |--model/         # pretrained and trained models
|  |--fruit_photos/  # crawled images
|  |--tmp/           # temporary files (bottleneck cache)
Download the pretrained Inception-v3 model from:
https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
and unzip it into ./data/model/inception_dec_2015/, the path the training script expects as MODEL_DIR.
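If you would rather script this step, here is a minimal sketch. The extracted file name tensorflow_inception_graph.pb comes from the MODEL_FILE constant used later; the local zip name is an assumption:

import os
import zipfile
from urllib.request import urlretrieve

url = "https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip"
target_dir = "./data/model/inception_dec_2015"
os.makedirs(target_dir, exist_ok=True)

zip_path = os.path.join(target_dir, "inception_dec_2015.zip")
urlretrieve(url, zip_path)            # download the pretrained model
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target_dir)         # yields tensorflow_inception_graph.pb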
baidu_search.py:
# -*- coding: utf-8 -*-
"""
Created on Tue Feb 27 11:10:45 2018
@author: mc.meng
"""
import re
import os
import requests
from urllib.request import urlretrieve


def download1(url, filename, filepath):
    full_name = os.path.join(filepath, filename)
    if os.path.exists(full_name):
        print("[info] file already exists:", full_name)
        return  # skip duplicates instead of downloading again
    try:
        pic = requests.get(url, timeout=5)
    except Exception:
        print("[error] cannot download this image")
        return
    try:
        with open(full_name, 'wb') as wf:
            wf.write(pic.content)
    except Exception:
        print("[error] write failed")


def download2(url, filename, filepath):
    full_name = os.path.join(filepath, filename)
    if os.path.exists(full_name):
        print("[info] file already exists:", full_name)
        return  # skip duplicates
    try:
        urlretrieve(url, full_name)
    except Exception:
        print("[error] cannot download this image")


def search(word="美女", local_path="./data/down/", page=0, keep_original_name=True):
    local_path += word
    os.makedirs(local_path, exist_ok=True)
    url = ('http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8'
           '&word={word}&pn={pn}&gsm={gsm:x}&ct=&ic=0&lm=-1&width=0&height=0'
           .format(word=word, pn=20 * page, gsm=40 + 20 * page))
    print("HHHC:0====>page=%d,url=\"%s\"" % (page, url))
    html = requests.get(url).text
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
    i = 0
    for url in pic_url:
        print(url)
        i += 1
        filename = os.path.split(url)[1].split('?')[0]
        filename_split = filename.split('.')
        if len(filename_split) != 2:
            print("[error] unexpected file name: " + filename)
            continue
        #print("HHHA:0====>", filename_split[1])
        if filename_split[1] not in ('jpg', 'JPG', 'png', 'PNG'):
            print("[error] unexpected file type: " + filename)
            continue
        if not keep_original_name:
            filename = (filename_split[0].strip() + "-" + str(page) + "-" +
                        str(i) + "." + filename_split[1].strip())
        download1(url, filename, local_path)


def search_50_page(word, local_path="./data/down/"):
    for i in range(50):
        search(word, local_path, i)


def search_50_page_test():
    search_50_page("美女")


def search_list_test():
    obj_list = ["苹果", "香蕉", "桔子", "桃子", "樱桃", "龙眼", "荔枝"]
    #obj_list = ["苹果", "香蕉", "桔子", "橙子", "桃子", "樱桃", "龙眼", "荔枝", "雪梨", "草莓", "葡萄", "猕猴桃", "菠萝", "番石榴", "青梅"]
    #obj_list = ["菊花", "蒲公英", "玫瑰", "向日葵", "郁金香"]
    for obj in obj_list:
        search_50_page(obj, "./data/fruit_photos/")


if __name__ == '__main__':
    search_list_test()
(PS: there is a hidden bonus in the source code, but I'm not telling ^V^)
This is equivalent to switching Baidu Images to the "traditional paged version" as shown in the figure below and then downloading the first 50 pages by hand.
If you have ever tried downloading by hand, you will have noticed that many of the images are identical, with the same file name and URL. This crawler saves each image under its original file name and checks whether the file already exists before saving it, which avoids duplicate files.
If you replace "苹果" with "apple", you will see:
This is clearly not what we want. Today we need fruit photos, so we first crawl with Chinese keywords and then rename the folders to English afterwards (by hand, or with a script as sketched below):
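Renaming by hand works fine; the sketch below just automates it. The Chinese-to-English mapping is an assumption based on the seven fruit keywords used above:

import os

# hypothetical mapping from crawled Chinese folder names to English ones
NAME_MAP = {"苹果": "apple", "香蕉": "banana", "桔子": "tangerine",
            "桃子": "peach", "樱桃": "cherry", "龙眼": "longan", "荔枝": "litchi"}

def rename_label_dirs(root="./data/fruit_photos/"):
    for name in os.listdir(root):
        src = os.path.join(root, name)
        if os.path.isdir(src) and name in NAME_MAP:
            os.rename(src, os.path.join(root, NAME_MAP[name]))

rename_label_dirs()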
The images crawled from Baidu come in png, jpg, jpeg and other formats. To simplify processing, we first convert them all to jpg
(create ulibs.py to hold our cleaning functions):
import os
import cv2


def png_to_jpg(path):
    """Convert images under `path` into jpg format."""
    print("[info] converting images to jpg:", path)
    for root, sub_dir, files in os.walk(path):
        print("[info] entering directory: %s" % root)
        if root == path or not files:
            continue
        for file in files:
            if file.split('.')[1] != 'jpg':
                print("[info] not jpg:", file)
                old_file = os.path.join(root, file)
                img = cv2.imread(old_file)
                if img is None:  # corrupt or undecodable file
                    print("[error] cannot read, removing:", old_file)
                    os.remove(old_file)
                    continue
                new_file = os.path.join(root, file.split('.')[0] + ".jpg")
                print("converting to:", new_file)
                cv2.imwrite(new_file, img)
                os.remove(old_file)
    print("[info] conversion done")


def png_to_jpg_test():
    png_to_jpg("./data/fruit_photos/")
【Manually delete images that cannot be previewed or are obviously wrong】:
The file names of the crawled images are inconsistent: many contain "%" and the lengths vary wildly. For tidiness we also clean up the file names into the form
label + index:
import os


def rename_files(path):
    """Rename files under `path` to <label>-<index>.jpg."""
    for root, sub_dir, files in os.walk(path):
        if root == path or not files:
            continue
        print("will rename files under [%s]" % root)
        count = 1
        for file in files:
            os.rename(os.path.join(root, file),
                      os.path.join(root, os.path.basename(root) + "-" + str(count) + ".jpg"))
            count += 1


def rename_files_test():
    rename_files("./data/fruit_photos/")
Result:
Now for the main event. First the full code, then the results, and finally a detailed walkthrough:
'''
data: http://download.tensorflow.org/example_images/flower_photos.tgz
model: https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
inception-v4: http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
'''
import os
import random

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile
from tensorflow.python.framework import graph_util

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

BOTTLENECK_TENSOR_SIZE = 2048
BOTTLENECK_TENSOR_NAME = 'pool_3/_reshape:0'
JPEG_DATA_TENSOR_NAME = 'DecodeJpeg/contents:0'
MODEL_DIR = './data/model/inception_dec_2015'
MODEL_FILE = 'tensorflow_inception_graph.pb'
THIS_MODEL_DIR = "./data/model/inception/"
THIS_MODEL_FILE = "inception.pb"
CACHE_DIR = './data/tmp/bottleneck/inception'
#INPUT_DATA = './data/flower_photos'
#INPUT_DATA = './data/animal_photos'
INPUT_DATA = './data/fruit_photos'

VALIDATION_PERCENTAGE = 10
TEST_PERCENTAGE = 10
LEARNING_RATE = 0.01
STEPS = 1000
BATCH = 100


def create_image_lists(file_dir):
    """Split images under file_dir into training/validation/testing dicts (8:1:1)."""
    training = {}
    validation = {}
    testing = {}
    if not os.path.exists(file_dir):
        print("No such path:", file_dir)
        return None, None, None
    for this_dir, sub_dirs, files in os.walk(file_dir):
        if this_dir == file_dir or not files:
            continue
        np.random.shuffle(files)
        percent10 = int(len(files) * 0.1)
        this_dir = os.path.basename(this_dir.lower())
        training[this_dir] = files[:percent10 * 8]
        validation[this_dir] = files[percent10 * 8:percent10 * 9]
        testing[this_dir] = files[percent10 * 9:]
    return training, validation, testing


def get_or_create_bottleneck(sess_mod, image_path):
    # split portably; the original split on '\\', which only works on Windows
    path_seg = os.path.normpath(image_path).split(os.sep)
    label_name = path_seg[-2]
    os.makedirs(os.path.join(CACHE_DIR, label_name), exist_ok=True)
    bottleneck_path = os.path.join(CACHE_DIR, path_seg[-2], path_seg[-1]) + ".txt"
    if not os.path.exists(bottleneck_path):
        image_data = gfile.FastGFile(image_path, 'rb').read()
        bottleneck_values = sess_mod['sess'].run(sess_mod['premod_bottleneck'],
                                                 feed_dict={sess_mod['premod_input']: image_data})
        bottleneck_values = np.squeeze(bottleneck_values)
        print("HHHA:0====>", image_path)
        print(bottleneck_values)
        bottleneck_string = ','.join(str(x) for x in bottleneck_values)
        with open(bottleneck_path, 'w') as bottleneck_file:
            bottleneck_file.write(bottleneck_string)
    else:
        with open(bottleneck_path, 'r') as bottleneck_file:
            bottleneck_string = bottleneck_file.read()
        bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
    return bottleneck_values


def get_cached_bottleneck(sess_mod, images, label=None, index=None):
    label_list = list(images.keys())
    label_list.sort()
    if label is None:
        label = label_list[random.randrange(len(label_list))]
    if index is None:
        index = random.randrange(len(images[label]))
    image_path = os.path.join(INPUT_DATA, label, images[label][index])
    bottleneck = get_or_create_bottleneck(sess_mod, image_path)
    ground_truth = np.zeros(len(label_list), dtype=np.float32)
    ground_truth[label_list.index(label)] = 1.0
    return bottleneck, ground_truth, image_path


def fill_feed_dict(sess_mod, image_lists, amount=None):
    bottlenecks = []
    ground_truths = []
    this_paths = []
    if amount is None:
        # no amount given: use every image in the list
        for label in list(image_lists.keys()):
            for index, file in enumerate(image_lists[label]):
                bottleneck, ground_truth, path = get_cached_bottleneck(sess_mod, image_lists, label, index)
                bottlenecks.append(bottleneck)
                ground_truths.append(ground_truth)
                this_paths.append(path)
    else:
        # sample `amount` images at random
        for _ in range(amount):
            bottleneck, ground_truth, path = get_cached_bottleneck(sess_mod, image_lists)
            bottlenecks.append(bottleneck)
            ground_truths.append(ground_truth)
            this_paths.append(path)
    feed_dict = {
        sess_mod['placeholder_input']: bottlenecks,
        sess_mod['placeholder_labels']: ground_truths,
    }
    return feed_dict, this_paths


def inference(inputs, n_classes):
    this_input = tf.reshape(inputs, [-1, BOTTLENECK_TENSOR_SIZE], name='input_images')
    weights = tf.get_variable("weights", [BOTTLENECK_TENSOR_SIZE, n_classes],
                              initializer=tf.truncated_normal_initializer(stddev=0.001))
    biases = tf.get_variable("biases", [n_classes], initializer=tf.constant_initializer(0.0))
    logits = tf.add(tf.matmul(this_input, weights), biases, "logits")
    return logits


def loss(logits, labels):
    # labels are float32 one-hot vectors, which is what softmax_cross_entropy_with_logits expects
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
    return tf.reduce_mean(cross_entropy)


def training(loss, learning_rate):
    tf.summary.scalar('loss', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op


def evaluation(logits, labels):
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return evaluation_step


def model_save(sess, model_path, input_tensor_name, bottleneck_tensor_name):
    graph_def = tf.get_default_graph().as_graph_def()
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, graph_def, [input_tensor_name, bottleneck_tensor_name])
    with tf.gfile.GFile(model_path, "wb") as wf:
        wf.write(output_graph_def.SerializeToString())


def model_restore(model_path, input_tensor_name, bottleneck_tensor_name):
    with gfile.FastGFile(model_path, 'rb') as rf:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(rf.read())
    in_tensor, out_tensor = tf.import_graph_def(
        graph_def, return_elements=[input_tensor_name, bottleneck_tensor_name])
    return in_tensor, out_tensor


def run_training(epoch=STEPS):
    imgs_training, imgs_validation, imgs_testing = create_image_lists(INPUT_DATA)
    n_classes = len(imgs_training.keys())
    m1_input, m1_bottleneck = model_restore(os.path.join(MODEL_DIR, MODEL_FILE),
                                            JPEG_DATA_TENSOR_NAME, BOTTLENECK_TENSOR_NAME)
    placeholder_input = tf.placeholder(tf.float32, [None, BOTTLENECK_TENSOR_SIZE], name='in_images')
    placeholder_labels = tf.placeholder(tf.float32, [None, n_classes])
    logits = inference(placeholder_input, n_classes)
    this_loss = loss(logits, placeholder_labels)
    train_step = training(this_loss, LEARNING_RATE)
    evaluation_step = evaluation(logits, placeholder_labels)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        sess_mod = {
            'sess': sess,
            'premod_input': m1_input,
            'premod_bottleneck': m1_bottleneck,
            'placeholder_input': placeholder_input,
            'placeholder_labels': placeholder_labels
        }
        for step in range(epoch):
            feed_dict, image_path = fill_feed_dict(sess_mod, imgs_training, BATCH)
            sess.run(train_step, feed_dict=feed_dict)
            if step % 100 == 0 or step + 1 == epoch:
                feed_dict, image_path = fill_feed_dict(sess_mod, imgs_validation, BATCH)
                accuracy = sess.run(evaluation_step, feed_dict=feed_dict)
                print("Step %d: Validation accuracy on random sampled %d examples = %.2f%%"
                      % (step, BATCH, accuracy * 100))
        accuracy = sess.run(evaluation_step, feed_dict=fill_feed_dict(sess_mod, imgs_testing)[0])
        print("Final test accuracy = %.1f%%" % (accuracy * 100))
        # make sure the output directory exists before saving
        os.makedirs(THIS_MODEL_DIR, exist_ok=True)
        model_save(sess, os.path.join(THIS_MODEL_DIR, THIS_MODEL_FILE), "in_images", 'logits')


def predict_test():
    imgs_training, imgs_validation, imgs_testing = create_image_lists(INPUT_DATA)
    m1_input, m1_bottleneck = model_restore(os.path.join(MODEL_DIR, MODEL_FILE),
                                            JPEG_DATA_TENSOR_NAME, BOTTLENECK_TENSOR_NAME)
    m2_input, m2_bottleneck = model_restore(os.path.join(THIS_MODEL_DIR, THIS_MODEL_FILE),
                                            "in_images:0", "logits:0")
    placeholder_labels = tf.placeholder(tf.float32, [None, len(imgs_training.keys())])
    evaluation_step = evaluation(m2_bottleneck, placeholder_labels)
    placeholder_logits = tf.placeholder(tf.float32, [None, len(imgs_training.keys())])
    final_tensor = tf.nn.softmax(placeholder_logits)
    final_index = tf.argmax(final_tensor, 1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess_mod = {
            'sess': sess,
            'premod_input': m1_input,
            'premod_bottleneck': m1_bottleneck,
            'placeholder_input': m2_input,
            'placeholder_labels': placeholder_labels
        }
        feed_dict, image_path = fill_feed_dict(sess_mod, imgs_testing)
        accuracy = sess.run(evaluation_step, feed_dict=feed_dict)
        print("Final test accuracy = %.1f%%" % (accuracy * 100))
        while True:
            feed_dict, image_path = fill_feed_dict(sess_mod, imgs_testing, 1)
            this_logits = sess.run(m2_bottleneck, feed_dict=feed_dict)
            f_tensor, f_index = sess.run([final_tensor, final_index],
                                         feed_dict={placeholder_logits: this_logits})
            image_path = image_path[0]
            f_tensor = f_tensor[0]
            f_index = f_index[0]
            print("image_path:", image_path)
            print("f_tensor:", f_tensor)
            print("f_index:", f_index)
            label_list = list(imgs_testing.keys())
            label_list.sort()
            f_predict = label_list[f_index]
            print("f_predict:", f_predict)
            img = cv2.imread(image_path)
            if img is None:
                print("File not found:", image_path)
                continue
            img = cv2.resize(img, (500, 500))
            cv2.putText(img, os.path.basename(image_path), (50, 50),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (255, 0, 0), 1)
            cv2.putText(img, f_predict, (50, 150),
                        cv2.FONT_HERSHEY_COMPLEX, 3, (255, 0, 255), 5)
            cv2.imshow("predict", img)
            key = cv2.waitKey()
            if key & 0xFF == ord('q'):
                break
            elif key & 0xFF == ord('d'):
                print("removing:", image_path)
                os.remove(image_path)


def main(argv=None):
    run_training(STEPS)
    #predict_test()


if __name__ == "__main__":
    tf.app.run()
If errors occur while running, it is usually because an image file cannot be opened (corrupted file, a gif saved under a jpg name, and so on); just delete the offending file. A sketch for weeding such files out in bulk follows.
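If you would rather not delete these files one by one, a small sweep with cv2.imread can do it: any file OpenCV cannot decode gets removed. A minimal sketch, assuming all images live under ./data/fruit_photos/:

import os
import cv2

def remove_unreadable_images(path="./data/fruit_photos/"):
    # cv2.imread returns None for files it cannot decode (corrupt jpg, gif renamed to jpg, ...)
    for root, _, files in os.walk(path):
        for file in files:
            full = os.path.join(root, file)
            if cv2.imread(full) is None:
                print("removing unreadable file:", full)
                os.remove(full)

remove_unreadable_images()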
Output:
94.2% accuracy, which is not bad.
Modify the main function as follows and run again:
def main(argv=None):
    #run_training(500)
    predict_test()
Press q to quit, d to delete the current file, and any other key to move on to the next image:
Starting from the main function:
def main(argv=None):
    run_training(STEPS)
    #predict_test()
As you can see, our model has two phases, training and prediction:
run_training() transfers Inception-v3 to our fruit classification task, trains the new model, and saves it;
predict_test() uses the new model to make predictions and visualizes the results.
【Saving and restoring the model】:
model_save() and model_restore() save and restore the model, respectively:
def model_save(sess, model_path, input_tensor_name, bottleneck_tensor_name):
    graph_def = tf.get_default_graph().as_graph_def()
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, graph_def, [input_tensor_name, bottleneck_tensor_name])
    with tf.gfile.GFile(model_path, "wb") as wf:
        wf.write(output_graph_def.SerializeToString())


def model_restore(model_path, input_tensor_name, bottleneck_tensor_name):
    with gfile.FastGFile(model_path, 'rb') as rf:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(rf.read())
    in_tensor, out_tensor = tf.import_graph_def(
        graph_def, return_elements=[input_tensor_name, bottleneck_tensor_name])
    return in_tensor, out_tensor
Parameters:
model_path: path to the model file;
input_tensor_name: name of the model's input tensor;
bottleneck_tensor_name: name of the model's bottleneck tensor;
sess: the current session, needed when saving.
model_restore() is used in both run_training() and predict_test(): run_training() restores only the Inception-v3 model, while predict_test() restores not just Inception-v3 but also our freshly trained model, so it calls model_restore() twice, as recapped below.
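For reference, these are the two calls from predict_test(). Note that the pretrained Inception-v3 tensor names already carry the ':0' suffix inside the constants, while the new model's names get it explicitly:

# restore the pretrained Inception-v3 graph
m1_input, m1_bottleneck = model_restore(os.path.join(MODEL_DIR, MODEL_FILE),
                                        JPEG_DATA_TENSOR_NAME, BOTTLENECK_TENSOR_NAME)
# restore our newly trained classifier
m2_input, m2_bottleneck = model_restore(os.path.join(THIS_MODEL_DIR, THIS_MODEL_FILE),
                                        "in_images:0", "logits:0")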
【The big four】: model, loss, training, evaluation
def inference(inputs, n_classes):
    this_input = tf.reshape(inputs, [-1, BOTTLENECK_TENSOR_SIZE], name='input_images')
    weights = tf.get_variable("weights", [BOTTLENECK_TENSOR_SIZE, n_classes],
                              initializer=tf.truncated_normal_initializer(stddev=0.001))
    biases = tf.get_variable("biases", [n_classes], initializer=tf.constant_initializer(0.0))
    logits = tf.add(tf.matmul(this_input, weights), biases, "logits")
    return logits


def loss(logits, labels):
    # labels are float32 one-hot vectors, as softmax_cross_entropy_with_logits expects
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)
    return tf.reduce_mean(cross_entropy)


def training(loss, learning_rate):
    tf.summary.scalar('loss', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op


def evaluation(logits, labels):
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return evaluation_step
All four functions apply to our new model:
inference: the forward pass of the model;
loss: computes the loss;
training: trains the model parameters by minimizing the loss;
evaluation: computes prediction accuracy.
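As a sanity check of what evaluation() computes, here is the same argmax comparison in plain NumPy on a made-up batch (the values below are invented purely for illustration):

import numpy as np

logits = np.array([[0.1, 2.0, 0.3],   # predicted class 1
                   [1.5, 0.2, 0.1]])  # predicted class 0
labels = np.array([[0., 1., 0.],      # true class 1 -> correct
                   [0., 0., 1.]])     # true class 2 -> wrong

accuracy = np.mean(np.argmax(logits, 1) == np.argmax(labels, 1))
print(accuracy)  # 0.5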
【Computing the bottleneck tensor】
def create_image_lists(file_dir):
    """Split images under file_dir into training/validation/testing dicts (8:1:1)."""
    training = {}
    validation = {}
    testing = {}
    if not os.path.exists(file_dir):
        print("No such path:", file_dir)
        return None, None, None
    for this_dir, sub_dirs, files in os.walk(file_dir):
        if this_dir == file_dir or not files:
            continue
        np.random.shuffle(files)
        percent10 = int(len(files) * 0.1)
        this_dir = os.path.basename(this_dir.lower())
        training[this_dir] = files[:percent10 * 8]
        validation[this_dir] = files[percent10 * 8:percent10 * 9]
        testing[this_dir] = files[percent10 * 9:]
    return training, validation, testing


def get_or_create_bottleneck(sess_mod, image_path):
    # split portably; the original split on '\\', which only works on Windows
    path_seg = os.path.normpath(image_path).split(os.sep)
    label_name = path_seg[-2]
    os.makedirs(os.path.join(CACHE_DIR, label_name), exist_ok=True)
    bottleneck_path = os.path.join(CACHE_DIR, path_seg[-2], path_seg[-1]) + ".txt"
    if not os.path.exists(bottleneck_path):
        image_data = gfile.FastGFile(image_path, 'rb').read()
        bottleneck_values = sess_mod['sess'].run(sess_mod['premod_bottleneck'],
                                                 feed_dict={sess_mod['premod_input']: image_data})
        bottleneck_values = np.squeeze(bottleneck_values)
        print("HHHA:0====>", image_path)
        print(bottleneck_values)
        bottleneck_string = ','.join(str(x) for x in bottleneck_values)
        with open(bottleneck_path, 'w') as bottleneck_file:
            bottleneck_file.write(bottleneck_string)
    else:
        with open(bottleneck_path, 'r') as bottleneck_file:
            bottleneck_string = bottleneck_file.read()
        bottleneck_values = [float(x) for x in bottleneck_string.split(',')]
    return bottleneck_values


def get_cached_bottleneck(sess_mod, images, label=None, index=None):
    label_list = list(images.keys())
    label_list.sort()
    if label is None:
        label = label_list[random.randrange(len(label_list))]
    if index is None:
        index = random.randrange(len(images[label]))
    image_path = os.path.join(INPUT_DATA, label, images[label][index])
    bottleneck = get_or_create_bottleneck(sess_mod, image_path)
    ground_truth = np.zeros(len(label_list), dtype=np.float32)
    ground_truth[label_list.index(label)] = 1.0
    return bottleneck, ground_truth, image_path
create_image_lists():
Understanding this function requires knowing our directory layout: under fruit_photos, each fruit's images sit in a subdirectory named after that fruit:
The file_dir parameter receives the path to fruit_photos. We walk the directory with os.walk and split all images into training, validation and testing sets at an 8:1:1 ratio. Each set is a dictionary keyed by fruit name, with the list of image file names as the value; a quick way to inspect the split is sketched below.
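A minimal inspection sketch (the class names and counts printed depend entirely on what your crawl produced):

training, validation, testing = create_image_lists("./data/fruit_photos")
for label in sorted(training):
    print(label, len(training[label]), len(validation[label]), len(testing[label]))
# e.g.  apple 480 60 62   (numbers depend on your crawl)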
get_or_create_bottleneck():
Gets or creates a bottleneck vector:
It computes the bottleneck vector of a given image with a given model. What does that mean? Concretely, it obtains the output of running image A through the Inception-v3 model. The sess_mod parameter bundles Inception-v3's input and output tensors together with the session used to run them:
bottleneck_values = sess_mod['sess'].run(sess_mod['premod_bottleneck'],
                                         feed_dict={sess_mod['premod_input']: image_data})
Compare this with the classic TensorFlow pattern to see what is going on: sess.run(z, feed_dict={x: a, y: b}).
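For readers new to this pattern, here is a self-contained toy example of the classic form (purely illustrative, not part of the project):

import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x * y  # any graph built from x and y

with tf.Session() as sess:
    print(sess.run(z, feed_dict={x: 3.0, y: 4.0}))  # 12.0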
Computing bottleneck vectors is time-consuming. To avoid recomputation, each result is cached under CACHE_DIR/<fruit name>/ as <image name>.txt. Each lookup first tries that directory; only if the file does not exist is the vector computed with the model and saved.
The image_path parameter specifies which image to get the bottleneck vector for.
get_cached_bottleneck():
A wrapper around get_or_create_bottleneck(). Parameters:
images: an image list, i.e. one of the training, validation, testing sets produced by create_image_lists;
label: a fruit name; if not given, a fruit is picked at random;
index: a file index; if not given, an index is picked at random.
For example, get_cached_bottleneck(sess_mod, training, "apple", 0) fetches the bottleneck vector of image 0 of the apples in the training set;
and get_cached_bottleneck(sess_mod, training) fetches the bottleneck vector of a random image from the training set.
【Building the training feed dict】
def fill_feed_dict(sess_mod, image_lists, amount=None):
    bottlenecks = []
    ground_truths = []
    this_paths = []
    if amount is None:
        # no amount given: use every image in the list
        for label in list(image_lists.keys()):
            for index, file in enumerate(image_lists[label]):
                bottleneck, ground_truth, path = get_cached_bottleneck(sess_mod, image_lists, label, index)
                bottlenecks.append(bottleneck)
                ground_truths.append(ground_truth)
                this_paths.append(path)
    else:
        # sample `amount` images at random
        for _ in range(amount):
            bottleneck, ground_truth, path = get_cached_bottleneck(sess_mod, image_lists)
            bottlenecks.append(bottleneck)
            ground_truths.append(ground_truth)
            this_paths.append(path)
    feed_dict = {
        sess_mod['placeholder_input']: bottlenecks,
        sess_mod['placeholder_labels']: ground_truths,
    }
    return feed_dict, this_paths
This function ultimately produces a dictionary used to drive the new model:
feed_dict = {
sess_mod['placeholder_input']: bottlenecks,
sess_mod['placeholder_labels']: ground_truths,
}
bottlenecks holds the outputs of running the images through Inception-v3, and serves as the input of the new model. sess_mod['placeholder_input'] is the new model's input placeholder tensor; sess_mod['placeholder_labels'] holds the images' ground-truth labels, generated as a by-product when the bottleneck vectors were computed.
Now look at the amount parameter: training uses BATCH, evaluation leaves it unspecified (equivalent to None), and predict_test() uses 1. Why?
amount specifies how many randomly sampled images to fill in; when it is None, the entire image list passed in is used. Since predict_test() shows images to the user, it fills in only one at a time. The three call sites are recapped below.
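The three usages side by side (taken from the code above):

# training: BATCH randomly sampled images per step
feed_dict, _ = fill_feed_dict(sess_mod, imgs_training, BATCH)
# final evaluation: the whole testing set
feed_dict, _ = fill_feed_dict(sess_mod, imgs_testing)
# interactive prediction: one image at a time
feed_dict, image_path = fill_feed_dict(sess_mod, imgs_testing, 1)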
【Running the training】
def run_training(epoch=STEPS):
    imgs_training, imgs_validation, imgs_testing = create_image_lists(INPUT_DATA)
    n_classes = len(imgs_training.keys())
    m1_input, m1_bottleneck = model_restore(os.path.join(MODEL_DIR, MODEL_FILE),
                                            JPEG_DATA_TENSOR_NAME, BOTTLENECK_TENSOR_NAME)
    placeholder_input = tf.placeholder(tf.float32, [None, BOTTLENECK_TENSOR_SIZE], name='in_images')
    placeholder_labels = tf.placeholder(tf.float32, [None, n_classes])
    logits = inference(placeholder_input, n_classes)
    this_loss = loss(logits, placeholder_labels)
    train_step = training(this_loss, LEARNING_RATE)
    evaluation_step = evaluation(logits, placeholder_labels)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        sess_mod = {
            'sess': sess,
            'premod_input': m1_input,
            'premod_bottleneck': m1_bottleneck,
            'placeholder_input': placeholder_input,
            'placeholder_labels': placeholder_labels
        }
        for step in range(epoch):
            feed_dict, image_path = fill_feed_dict(sess_mod, imgs_training, BATCH)
            sess.run(train_step, feed_dict=feed_dict)
            if step % 100 == 0 or step + 1 == epoch:
                feed_dict, image_path = fill_feed_dict(sess_mod, imgs_validation, BATCH)
                accuracy = sess.run(evaluation_step, feed_dict=feed_dict)
                print("Step %d: Validation accuracy on random sampled %d examples = %.2f%%"
                      % (step, BATCH, accuracy * 100))
        accuracy = sess.run(evaluation_step, feed_dict=fill_feed_dict(sess_mod, imgs_testing)[0])
        print("Final test accuracy = %.1f%%" % (accuracy * 100))
        # make sure the output directory exists before saving
        os.makedirs(THIS_MODEL_DIR, exist_ok=True)
        model_save(sess, os.path.join(THIS_MODEL_DIR, THIS_MODEL_FILE), "in_images", 'logits')
This is the backbone of the training process. After the explanations of the smaller functions above, there is not much left to say: it simply calls the functions introduced earlier, one after another.
The reason for bundling things into sess_mod is that sess, m1_input and m1_bottleneck are passed down through several layers of functions before finally being used; putting them in one dictionary reduces the number of parameters of the intermediate functions and improves readability.
【Image display snippet】
while True:
    feed_dict, image_path = fill_feed_dict(sess_mod, imgs_testing, 1)  # one image at a time
    this_logits = sess.run(m2_bottleneck, feed_dict=feed_dict)
    f_tensor, f_index = sess.run([final_tensor, final_index],
                                 feed_dict={placeholder_logits: this_logits})
    image_path = image_path[0]
    f_tensor = f_tensor[0]
    f_index = f_index[0]
    print("image_path:", image_path)
    print("f_tensor:", f_tensor)
    print("f_index:", f_index)
    label_list = list(imgs_testing.keys())
    label_list.sort()
    f_predict = label_list[f_index]
    print("f_predict:", f_predict)
    img = cv2.imread(image_path)
    if img is None:
        print("File not found:", image_path)
        continue
    img = cv2.resize(img, (500, 500))
    cv2.putText(img, os.path.basename(image_path), (50, 50),
                cv2.FONT_HERSHEY_COMPLEX, 1, (255, 0, 0), 1)
    cv2.putText(img, f_predict, (50, 150),
                cv2.FONT_HERSHEY_COMPLEX, 3, (255, 0, 255), 5)
    cv2.imshow("predict", img)
    key = cv2.waitKey()
    if key & 0xFF == ord('q'):
        break
    elif key & 0xFF == ord('d'):
        print("removing:", image_path)
        os.remove(image_path)
Using OpenCV:
cv2.imread(): reads an image;
cv2.resize(): resizes the image to 500x500, since the originals come in all sizes; try removing this line and see what happens;
cv2.putText(): draws text on the image;
cv2.imshow(): displays the image;
cv2.waitKey(): waits for a key press:
if the user presses q: exit the loop;
if the user presses d: delete the current image, which is very handy for weeding out bad images.
Move model_save()/model_restore() into ulibs.py and call them like this (arguments as used earlier; the model path and tensor names come from the constants in the training script):

import ulibs

ulibs.model_save(sess, "./data/model/inception/inception.pb", "in_images", "logits")
in_tensor, out_tensor = ulibs.model_restore("./data/model/inception/inception.pb",
                                            "in_images:0", "logits:0")
Reference: 《TensorFlow实战Google深度学习框架》 (TensorFlow: Practical Google Deep Learning Framework), by 郑泽宇 and 顾思宇.