Chapter 12 Distributing TensorFlow Across Devices and Servers

Reading notes for O'Reilly's Hands-On Machine Learning with Scikit-Learn and TensorFlow

12.1 Multiple Devices on a Single Machine

12.1.1 Installation

Check GPU compatibility: https://developer.nvidia.com/cuda-gpus

Detailed instructions for setting up TensorFlow on an Amazon AWS GPU instance are available in Žiga Avsec’s helpful blog post.

Google also released a cloud service called Cloud Machine Learning to run TensorFlow graphs.

Tim Dettmers wrote a great blog post to help you choose, and he updates it fairly regularly.

You must then download and install the appropriate version of the CUDA (Compute Unified Device Architecture) and cuDNN (CUDA Deep Neural Network) libraries, and set a few environment variables so TensorFlow knows where to find CUDA and cuDNN.

You can use the nvidia-smi command to check that CUDA is properly installed. It lists the GPU cards, as well as processes running on each card:

$ nvidia-smi

Create an isolated environment using virtualenv (if you have not done so already), and activate it:

$ cd $ML_PATH # Your ML working directory (e.g., $HOME/ml)
$ source env/bin/activate

Install the GPU-enabled version of TensorFlow:

$ pip3 install --upgrade tensorflow-gpu

Now you can open up a Python shell and check that TensorFlow detects and uses CUDA and cuDNN properly by importing TensorFlow and creating a session:

>>> import tensorflow as tf
>>> sess=tf.Session()
2019-03-13 13:44:36.870279: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-13 13:44:38.591268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:03:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-03-13 13:44:38.591331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-03-13 13:45:08.050299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-13 13:45:08.050344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-03-13 13:45:08.050354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-03-13 13:45:08.050730: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10404 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)

12.1.2 Managing the GPU RAM

By default TensorFlow automatically grabs all the RAM in all available GPUs the first time you run a graph, so you will not be able to start a second TensorFlow program while the first one is still running.

To run each process on different GPU cards, the simplest option is to set the CUDA_VISIBLE_DEVICES environment variable so that each process only sees the appropriate GPU cards.

$ CUDA_VISIBLE_DEVICES=0,1 python3 program_1.py
# and in another terminal:
$ CUDA_VISIBLE_DEVICES=3,2 python3 program_2.py

Another option is to tell TensorFlow to grab only a fraction of the memory. For example, to make TensorFlow grab only 40% of each GPU’s memory, you must create a ConfigProto object, set its gpu_options.per_process_gpu_memory_fraction option to 0.4, and create the session using this configuration:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config) 
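Yet another option (not shown in the notes above, but part of the standard ConfigProto API) is to let TensorFlow grab GPU memory only when it actually needs it, by setting gpu_options.allow_growth. A minimal sketch:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
session = tf.Session(config=config)
# Note: once TensorFlow has grabbed memory it does not release it (to avoid fragmentation).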

12.1.3 Placing Operations on Devices

The TensorFlow whitepaper presents a friendly dynamic placer algorithm that automagically distributes operations across all available devices, taking into account things like the measured computation time in previous runs of the graph, estimations of the size of the input and output tensors to each operation, the amount of RAM available in each device, communication delay when transferring data in and out of devices, hints and constraints from the user, and more.

Simple placement

The simple placer respects the following rules:

  • If a node was already placed on a device in a previous run of the graph, it is left on that device.
  • Else, if the user pinned a node to a device (described next), the placer places it on that device.
  • Else, it defaults to GPU #0, or the CPU if there is no GPU.

To pin nodes onto a device, you must create a device block using the device() function. For
example, the following code pins the variable a and the constant b on the CPU, but the multiplication node c is not pinned on any device, so it will be placed on the default device:

with tf.device("/cpu:0"):
    a=tf.Variable(3.0)
    b=tf.constant(4.0)
c=a*b

Logging placements

You can set the log_device_placement option to True to tell the placer to log a message whenever it places a node. For example:

config = tf.ConfigProto()
config.log_device_placement = True
sess = tf.Session(config=config)
init=tf.global_variables_initializer()
init.run(session=sess)
sess.run(c)

The lines starting with "I" for Info are the log messages. When we create a session,
TensorFlow logs a message to tell us that it has found a GPU card. Then the first time we run the graph (in this case when initializing the variable a), the simple placer is run and places each node on the device it was assigned to. As expected, the log messages show that all nodes are placed on "/cpu:0" except the multiplication node, which ends up on the default device "/gpu:0". Notice that the second time we run the graph (to compute c), the placer is not used since all the nodes TensorFlow needs to compute c are already placed.

Dynamic placement function

When you create a device block, you can specify a function instead of a device name.
TensorFlow will call this function for each operation it needs to place in the device block, and the function must return the name of the device to pin the operation on. For example, the following code pins all the variable nodes to "/cpu:0" (in this case just the variable a) and all other nodes to "/gpu:0":

def variables_on_cpu(op):
    if op.type=="Variable":
        return "/cpu:0"
    else:
        return "/gpu:0"
with tf.device(variables_on_cpu):
    a=tf.Variable(3.0)
    b=tf.constant(4.0)
    c=a*b

Operations and kernels

For a TensorFlow operation to run on a device, it needs to have an implementation for that device; this is called a kernel. Many operations have kernels for both CPUs and GPUs, but not all of them. For example, TensorFlow does not have a GPU kernel for integer variables, so the following code will fail when TensorFlow tries to place the variable i on GPU #0:

with tf.device("/gpu:0"):
    i = tf.Variable(3)
sess.run(i.initializer)

Soft placement

By default, if you try to pin an operation on a device for which the operation has no kernel, you get the exception shown earlier when TensorFlow tries to place the operation on the device. If you prefer TensorFlow to fall back to the CPU instead, you can set the allow_soft_placement configuration option to True:

with tf.device("/gpu:0"):
    i=tf.Variable(3)
config=tf.ConfigProto()
config.allow_soft_placement = True
sess=tf.Session(config=config)
sess.run(i.initializer)# the placer runs and falls back to /cpu:0 

12.1.4 Parallel Execution

When TensorFlow runs a graph, it starts by finding out the list of nodes that need to be evaluated, and it counts how many dependencies each of them has. TensorFlow then starts evaluating the nodes with zero dependencies (i.e., source nodes). If these nodes are placed on separate devices, they obviously get evaluated in parallel. If they are placed on the same device, they get evaluated in different threads, so they may run in parallel too (in separate GPU threads or CPU cores).

TensorFlow manages a thread pool on each device to parallelize operations (see Figure 12-5). These are called the inter-op thread pools. Some operations have multi‐threaded kernels: they can use other thread pools (one per device) called the intra-op thread pools.

Operations D and E depend on C. As soon as operation C finishes, the dependency counters of operations D and E will be decremented and will both reach 0, so both operations will be sent to the inter-op thread pool to be executed.

You can control the number of threads per inter-op pool by setting the inter_op_parallelism_threads option. Note that the first session you start creates the inter-op thread pools. All other sessions will just reuse them unless you set the use_per_session_threads option to True. You can control the number of threads per intra-op pool by setting the intra_op_parallelism_threads option.
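As a rough sketch (using the ConfigProto options just described), you can set these thread-pool sizes when creating a session:

config = tf.ConfigProto()
config.inter_op_parallelism_threads = 4   # threads used to run independent operations in parallel
config.intra_op_parallelism_threads = 8   # threads available to multithreaded kernels
config.use_per_session_threads = True     # give this session its own inter-op thread pool
sess = tf.Session(config=config)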

12.1.5 Control Dependencies

To postpone evaluation of some nodes, a simple solution is to add control dependencies. For example, the following code tells TensorFlow to evaluate x and y only after a and b have been evaluated:

a=tf.constant(1.0)
b=a+2.0

with tf.control_dependencies([a,b]):
    x=tf.constant(3.0)
    y=tf.constant(4.0)
z=x+y   

12.2 Multiple Devices Across Multiple Servers

To run a graph across multiple servers, you first need to define a cluster. A cluster is composed of one or more TensorFlow servers, called tasks, typically spread across several machines. Each task belongs to a job. A job is just a named group of tasks that typically have a common role, such as keeping track of the model parameters (such a job is usually named "ps" for parameter server), or performing computations (such a job is usually named "worker").

cluster_spec=tf.train.ClusterSpec({
    "ps":[
        "machine-a.example.com:2221",# /job:ps/task:0
    ],
    "worker":[
        "machine-a.example.com:2222",# /job:worker/task:0
        "machine-b.example.com:2222",# /job:worker/task:1
    ]
})

To start a TensorFlow server, you must create a Server object, passing it the cluster
specification (so it can communicate with other servers) and its own job name and task number. For example, to start the first worker task, you would run the following code on machine A:

server = tf.train.Server(cluster_spec, job_name="worker", task_index=0)

It is usually simpler to just run one task per machine, but the previous example demonstrates that TensorFlow allows you to run multiple tasks on the same machine if you want. If you have several servers on one machine, you will need to ensure that they don’t all try to grab all the RAM of every GPU, as explained earlier. For example, in Figure 12-6 the “ps” task does not see the GPU devices, since presumably its process was launched with CUDA_VISIBLE_DEVICES="". Note that the CPU is shared by all tasks located on the same machine.

If you want the process to do nothing other than run the TensorFlow server, you can block the main thread by telling it to wait for the server to finish using the join() method (otherwise the server will be killed as soon as your main thread exits). Since there is currently no way to stop the server, this will actually block forever:

server.join() # blocks until the server stops (i.e., never) 
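Putting these pieces together, here is a hypothetical launcher for the "ps" task on machine A (the cluster spec is the one defined earlier). Hiding the GPUs must happen before TensorFlow initializes CUDA, which is why the environment variable is set before the import:

# ps_task.py (hypothetical launcher for /job:ps/task:0)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # this process sees no GPUs, only the shared CPU

import tensorflow as tf

cluster_spec = tf.train.ClusterSpec({
    "ps": ["machine-a.example.com:2221"],
    "worker": ["machine-a.example.com:2222", "machine-b.example.com:2222"],
})
server = tf.train.Server(cluster_spec, job_name="ps", task_index=0)
server.join()  # serve forever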

12.2.1 Opening a Session

Once all the tasks are up and running (doing nothing yet), you can open a session on any of the servers, from a client located in any process on any machine (even from a process running one of the tasks), and use that session like a regular local session. For example:

a=tf.constant(1.0)
b=a+2
c=a*3
with tf.Session("grpc://machine-b.example.com:2222") as sess:
    print(c.eval()) # 3.0

12.2.2 The Master and Worker Services

The client uses the gRPC protocol (Google Remote Procedure Call) to communicate with the server. Data is transmitted in the form of protocol buffers, another open source Google technology. This is a lightweight binary data interchange format.

Every TensorFlow server provides two services: the master service and the worker service. The master service allows clients to open sessions and use them to run graphs. It coordinates the computations across tasks, relying on the worker service to actually execute computations on other tasks and get their results.

This architecture gives you a lot of flexibility. One client can connect to multiple servers by opening multiple sessions in different threads. One server can handle multiple sessions simultaneously from one or more clients. You can run one client per task (typically within the same process), or just one client to control all tasks. All options are open.

12.2.3 Pinning Operations Across Tasks

You can use device blocks to pin operations on any device managed by any task, by specifying the job name, task index, device type, and device index. For example, the following code pins a to the CPU of the first task in the “ps” job (that’s the CPU on machine A), and it pins b to the second GPU managed by the first task of the “worker” job (that’s GPU #1 on machine A). Finally, c is not pinned to any device, so the master places it on its own default device (machine B’s GPU #0 device).

with tf.device("/job:ps/task:0/cpu:0"):
    a=tf.constant(1.0)

with tf.device("job:worker/task:0/gpu:1"):
    b=a+2

c=a+b

12.2.4 Sharding Variables Across Multiple Parameter Servers

TensorFlow provides the replica_device_setter() function, which distributes variables across all the "ps" tasks in a round-robin fashion. For example, the following code pins five
variables to two parameter servers:

with tf.device(tf.train.replica_device_setter(ps_tasks=2)):
    v1 = tf.Variable(1.0) # pinned to /job:ps/task:0
    v2 = tf.Variable(2.0) # pinned to /job:ps/task:1
    v3 = tf.Variable(3.0) # pinned to /job:ps/task:0
    v4 = tf.Variable(4.0) # pinned to /job:ps/task:1
    v5 = tf.Variable(5.0) # pinned to /job:ps/task:0

Instead of passing the number of ps_tasks, you can pass the cluster spec cluster=cluster_spec and TensorFlow will simply count the number of tasks in the "ps"
job.
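A minimal sketch of this variant, reusing the cluster_spec defined earlier:

with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
    v1 = tf.Variable(1.0)  # still distributed round-robin across the "ps" tasks
    v2 = tf.Variable(2.0)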

If you create other operations in the block, beyond just variables, TensorFlow automatically pins them to "/job:worker", which will default to the first device managed by the first task in the "worker" job. You can pin them to another device by setting the worker_device parameter, but a better approach is to use embedded device blocks. An inner device block can override the job, task, or device defined in an outer block. For example:

with tf.device(tf.train.replica_device_setter(ps_tasks=2)):
    v1=tf.Variable(1.0) # pinned to /job:ps/task:0 (+ defaults to /cpu:0)
    v2=tf.Variable(2.0) # pinned to /job:ps/task:1 (+ defaults to /cpu:0)
    v3=tf.Variable(3.0) # pinned to /job:ps/task:0 (+ defaults to /cpu:0)
    s=v1+v2             # pinned to /job:worker (+ defaults to task:0/gpu:0)   
    with tf.device("/gpu:1"):
        p1=2*s          # pinned to /job:worker/gpu:1 (+ defaults to /task:0)
        with tf.device("/task:1"):
            p2=3*s      # pinned to /job:worker/task:1/gpu:1

12.2.5 Sharing State Across Sessions Using Resource Containers

When you are using distributed sessions, variable state is managed by resource containers located on the cluster itself, not by the sessions. So if you create a variable named x using one client session, it will automatically be available to any other session on the same cluster (even if the two sessions are connected to different servers). For example, consider the following client code:

#simple_client.py
import tensorflow as tf
import sys

x=tf.Variable(0.0,name="x")
increment_x=tf.assign(x,x+1)

with tf.Session(sys.argv[1]) as sess:
    if sys.argv[2:] == ["init"]:
        sess.run(x.initializer)
    sess.run(increment_x)
    print(x.eval())

$ python3 simple_client.py grpc://machine-a.example.com:2222 init
1.0
$ python3 simple_client.py grpc://machine-b.example.com:2222
2.0

If you want to run completely independent computations on the same cluster you will have to be careful not to use the same variable names by accident. One way to ensure that you won’t have name clashes is to wrap all of your construction phase inside a variable scope with a unique name for each computation, for example:

with tf.variable_scope("my_problem_1"):
    [...] # Construction phase of problem 1

A better option is to use a container block:

with tf.container("my_problem_1"):
    [...] # Construction phase of problem 1
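As a hypothetical sketch of how this keeps independent jobs from colliding, each job could wrap its whole construction phase in its own container (the script mirrors simple_client.py above; the container name and server address are passed on the command line):

# container_client.py (hypothetical)
import sys
import tensorflow as tf

container_name = sys.argv[1]   # e.g., "my_problem_1"
server_target = sys.argv[2]    # e.g., "grpc://machine-a.example.com:2222"

with tf.container(container_name):
    x = tf.Variable(0.0, name="x")        # lives in this job's own resource container
    increment_x = tf.assign(x, x + 1)

with tf.Session(server_target) as sess:
    sess.run(x.initializer)
    print(sess.run(increment_x))  # jobs using different containers never share this x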

The following command will connect to the server on machine A and ask it to reset the container named “my_problem_1”, which will free all the resources this container used (and also close all sessions open on the server). Any variable managed by this container must be initialized before you can use it again:

tf.Session.reset("grpc://machine-a.example.com:2222", ["my_problem_1"]) 

12.2.6 Asynchronous Communication Using TensorFlow Queues

Queues are another great way to exchange data between multiple sessions; for example, one common use case is to have a client create a graph that loads the training data and pushes it into a queue, while another client creates a graph that pulls the data from the queue and trains a model (see Figure 12-8). This can speed up training considerably because the training operations don’t have to wait for the next mini-batch at every step.

The following code creates a FIFO queue that can store up to 10 tensors containing two float values each:

q=tf.FIFOQueue(capacity=10,dtypes=[tf.float32],shapes=[[2]],
               name="q",shared_name="shared_q")

Enqueueing data

#training_data_loader.py
import tensorflow as tf
with tf.container("sharedqueue"):
    # note: the author reports an error when "shared_queue" is used as the container name
    q=tf.FIFOQueue(capacity=10,dtypes=[tf.float32],shapes=[[2]],
                  name="q",shared_name="shared_q")
    training_instance=tf.placeholder(tf.float32,shape=(2))
    enqueue =q.enqueue([training_instance])

with tf.container("sharedqueue"):
    with tf.Session("grpc://127.0.0.1:2222") as sess:
        sess.run(enqueue, feed_dict={training_instance: [1., 2.]})
        sess.run(enqueue, feed_dict={training_instance: [3., 4.]})
        sess.run(enqueue, feed_dict={training_instance: [5., 6.]})        

Instead of enqueuing instances one by one, you can enqueue several at a time using an enqueue_many operation:

[...]
training_instances = tf.placeholder(tf.float32, shape=(None, 2))
enqueue_many = q.enqueue_many([training_instances])
with tf.container("sharedqueue"):
    with tf.Session("grpc://127.0.0.1:2222") as sess:
        sess.run(enqueue_many,feed_dict={training_instances: [[1., 2.], [3., 4.], [5., 6.]]})

Dequeuing data

# trainer.py
import tensorflow as tf
with tf.container("sharedqueue"):
    # re-create the same queue (same shared_name) so this graph can access it
    q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[2]],
                     name="q", shared_name="shared_q")
    dequeue = q.dequeue()
    with tf.Session("grpc://127.0.0.1:2222") as sess:
        print(sess.run(dequeue)) # [1., 2.]
        print(sess.run(dequeue)) # [3., 4.]
        print(sess.run(dequeue)) # [5., 6.]

or

[...]
with tf.container("sharedqueue"):
    batch_size = 2
    dequeue_mini_batch= q.dequeue_many(batch_size)
    with tf.Session("grpc://127.0.0.1:2222") as sess:
        print(sess.run(dequeue_mini_batch)) # [[1., 2.], [3., 4.]]
        print(sess.run(dequeue_mini_batch)) # blocked waiting for another instance

Queues of tuples

Each item in a queue can be a tuple of tensors (of various types and shapes) instead of
just a single tensor. For example, the following queue stores pairs of tensors, one of
type int32 and shape (), and the other of type float32 and shape [3,2]:

q=tf.FIFOQueue(capacity=10,dtypes=[tf.int32,tf.float32],shapes=[[],[3,2]],name="q",shared_name="shared_q")

The enqueue operation must be given pairs of tensors (note that each pair represents only one item in the queue):

a=tf.placeholder(tf.int32,shape=())
b=tf.placeholder(tf.float32,shape=(3,2))
enqueue=q.enqueue((a,b))

with tf.Session("grpc://127.0.0.1:2221") as sess:
    sess.run(enqueue,feed_dict={a:10,b:[[1.,2.],[3.,4.],[5.,6.]]})
    sess.run(enqueue, feed_dict={a: 11, b:[[2., 4.], [6., 8.], [0., 2.]]})
    sess.run(enqueue, feed_dict={a: 12, b:[[3., 6.], [9., 2.], [5., 8.]]})
dequeue_a, dequeue_b = q.dequeue()
with tf.Session("grpc://127.0.0.1:2222") as sess:
    a_val, b_val = sess.run([dequeue_a, dequeue_b])
    print(a_val) # 10
    print(b_val) # [[1., 2.], [3., 4.], [5., 6.]]
batch_size = 2
dequeue_as, dequeue_bs = q.dequeue_many(batch_size)
with tf.Session("grpc://127.0.0.1:2222") as sess:
    a, b = sess.run([dequeue_as, dequeue_bs])
    print(a) # [10, 11]
    print(b) # [[[1., 2.], [3., 4.], [5., 6.]], [[2., 4.], [6., 8.], [0., 2.]]]
    a, b = sess.run([dequeue_as, dequeue_bs]) # blocked waiting for another pair

Closing a queue

You can close a queue to signal to the other sessions that no more data will be enqueued; pending or subsequent dequeue requests will then fail with an OutOfRangeError once the queue no longer holds enough items:

close_q = q.close()
with tf.Session("grpc://127.0.0.1:2222") as sess:
    [...]
    sess.run(close_q)

RandomShuffleQueue

import tensorflow as tf
tf.reset_default_graph()
q = tf.RandomShuffleQueue(capacity=50, min_after_dequeue=10,
                          dtypes=[tf.float32], shapes=[()],
                          name="q", shared_name="shared_q")
x = tf.placeholder(dtype=tf.float32, shape=())
enqueue_instance = q.enqueue([x])
dequeue = q.dequeue_many(5)
with tf.Session() as sess:
    for i in range(22):
        sess.run(enqueue_instance, feed_dict={x: i})
    print(sess.run(dequeue)) # e.g., [ 20. 15. 11. 12. 4.] (17 items left)
    print(sess.run(dequeue)) # e.g., [ 5. 13. 6. 0. 17.] (12 items left)
    print(sess.run(dequeue)) # 12 - 5 < 10: blocked waiting for 3 more instances

PaddingFIFOQueue

A PaddingFIFOQueue accepts tensors of variable sizes along any dimension (but with a fixed rank).

q = tf.PaddingFIFOQueue(capacity=50, dtypes=[tf.float32], shapes=[(None, None)],name="q", shared_name="shared_q")
v = tf.placeholder(tf.float32, shape=(None, None))
enqueue = q.enqueue([v])
with tf.Session("grpc://127.0.0.1:2222") as sess:
    sess.run(enqueue, feed_dict={v: [[1., 2.], [3., 4.], [5., 6.]]}) # 3x2
    sess.run(enqueue, feed_dict={v: [[1.]]}) # 1x1
    sess.run(enqueue, feed_dict={v: [[7., 8., 9., 5.], [6., 7., 8., 9.]]}) # 2x4
dequeue = q.dequeue_many(3)
with tf.Session("grpc://127.0.0.1:2222") as sess:
    print(sess.run(dequeue))
# output
[[[ 1.  2.  0.  0.]
  [ 3.  4.  0.  0.]
  [ 5.  6.  0.  0.]]

 [[ 1.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 7.  8.  9.  5.]
  [ 6.  7.  8.  9.]
  [ 0.  0.  0.  0.]]]

12.2.7 Loading Data Directly from the Graph

Preload the data into a variable

For datasets that can fit in memory, a better option is to load the training data once and assign it to a variable, then just use that variable in your graph. This is called preloading the training set.

tf.reset_default_graph()
n_features=2
training_set_init=tf.placeholder(tf.float32,shape=(None,n_features))
training_set=tf.Variable(training_set_init,trainable=False,
                         collections=[],name="training_set",
                        validate_shape=False)
with tf.Session() as sess:
    data=[[1,2],[3,4],[5,6],[7.,8.],[9.,10.]]
    sess.run(training_set.initializer,feed_dict={training_set_init:data})

This example assumes that all of your training set (including the labels) consists only of float32 values. If that’s not the case, you will need one variable per type.

Reading the training data directly from the graph

Reader operations: operations capable of reading data directly from the filesystem. This way the training data never needs to flow through the clients at all. TensorFlow provides readers for various file formats:

  • CSV
  • Fixed-length binary records
  • TensorFlow’s own TFRecords format, based on protocol buffers

Let’s look at a simple example reading from a CSV file. Suppose you have a file named my_test.csv that contains training instances, and you want to create operations to read it.

x1, x2, target
1. , 2. , 0
4. , 5 , 1
7. , , 0

First, let’s create a TextLineReader to read this file. A TextLineReader opens a file (once we tell it which one to open) and reads lines one by one. It is a stateful operation, like variables and queues: it preserves its state across multiple runs of the graph, keeping track of which file it is currently reading and what its current position is in this file.

reader=tf.TextLineReader(skip_header_lines=1)

Next, we create a queue that the reader will pull from to know which file to read next. We also create an enqueue operation and a placeholder to push any filename we want to the queue, and we create an operation to close the queue once we have no more files to read:

#enqueue filenames
filename_queue = tf.FIFOQueue(capacity=10, dtypes=[tf.string], shapes=[()])
filename=tf.placeholder(tf.string)
enqueue_filename= filename_queue.enqueue([filename])
close_filename_queue=filename_queue.close()

Create a read operation that will read one record (i.e., a line) at a time and return a key/value pair. The key is the record’s unique identifier—a string composed of the filename, a colon (:), and the line number—and the value is simply a string containing the content of the line:

# read each line of the files whose names are contained in filename_queue
key,value =reader.read(filename_queue)

Parse this string to get the features and target:

x1,x2,target= tf.decode_csv(value,record_defaults=[[-1.],[-1.],[-1]])
features =tf.stack([x1,x2])

Finally, we can push this training instance and its target to a RandomShuffleQueue that we will share with the training graph (so it can pull mini-batches from it), and we create an operation to close that queue when we are done pushing instances to it:

# enqueue each line into instance_queue
instance_queue = tf.RandomShuffleQueue(
    capacity=10, min_after_dequeue=2, dtypes=[tf.float32, tf.int32],
    shapes=[[2], []], name="instance_q", shared_name="shared_instance_q")
enqueue_instance = instance_queue.enqueue([features, target])
close_instance_queue = instance_queue.close()

Run the graph:

minibatch_instances, minibatch_targets=instance_queue.dequeue_up_to(2)
with tf.Session() as sess:
    sess.run(enqueue_filename,feed_dict={filename:"my_test.csv"})
    sess.run(close_filename_queue)
    try:
        while True:
            sess.run(enqueue_instance)
    except tf.errors.OutOfRangeError as ex:
        print("No more files to read")
    sess.run(close_instance_queue)
    try:
        while True:
            print(sess.run([minibatch_instances,minibatch_targets]))
    except tf.errors.OutOfRangeError as ex:
        print("No more training instances")

Multithreaded readers using a Coordinator and a QueueRunner

minibatch_instances, minibatch_targets=instance_queue.dequeue_up_to(2)

n_threads=5
queue_runner = tf.train.QueueRunner(instance_queue, [enqueue_instance] * n_threads)
coord= tf.train.Coordinator()

with tf.Session() as sess:
    sess.run(enqueue_filename,feed_dict={filename:"my_test.csv"})
    sess.run(close_filename_queue)
    enqueue_threads= queue_runner.create_threads(sess,coord=coord,start=True)
    try:
        while True:
            print(sess.run([minibatch_instances,minibatch_targets]))
    except tf.errors.OutOfRangeError as ex:
        print("No more training instances")

Note that QueueRunner and these queue-based input pipelines are deprecated in later TensorFlow versions; use the tf.data API (tf.data.Dataset) instead.

Reading simultaneously from multiple files

def read_and_push_instance(filename_queue, instance_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(filename_queue)
    x1, x2, target = tf.decode_csv(value, record_defaults=[[-1.], [-1.], [-1]])
    features = tf.stack([x1, x2])
    enqueue_instance = instance_queue.enqueue([features, target])
    return enqueue_instance
filename_queue=tf.FIFOQueue(capacity=10,dtypes=[tf.string],shapes=[()])
filename=tf.placeholder(tf.string)
enqueue_filename=filename_queue.enqueue([filename])
close_filename_queue=filename_queue.close()

instance_queue=tf.RandomShuffleQueue(
    capacity=10, min_after_dequeue=2,dtypes=[tf.float32,tf.int32],
    shapes=[[2],[]],name="instance_q",shared_name="shared_instance_q")

read_and_enqueue_ops=[
    read_and_push_instance(filename_queue,instance_queue)
    for i in range(5)]
queue_runner=tf.train.QueueRunner(instance_queue,read_and_enqueue_ops)

minibatch_instances,minibatch_targets=instance_queue.dequeue_up_to(2)

with tf.Session() as sess:
    sess.run(enqueue_filename,feed_dict={filename:"my_test.csv"})
    sess.run(close_filename_queue)
    coord=tf.train.Coordinator()
    enqueue_threads=queue_runner.create_threads(sess,coord=coord,start=True)
    try:
        while True:
            print(sess.run([minibatch_instances,minibatch_targets]))
    except tf.errors.OutOfRangeError as ex:
        print("No more training instances")

Other convenience functions

The string_input_producer() function takes a 1D tensor containing a list of filenames, creates a thread that pushes one filename at a time to the filename queue, and then closes the queue. If you specify a number of epochs, it will cycle through the filenames once per epoch before closing the queue. By default, it shuffles the filenames at each epoch. It creates a QueueRunner to manage its thread, and adds it to the GraphKeys.QUEUE_RUNNERS collection. To start every QueueRunner in that collection, you can call the tf.train.start_queue_runners() function. Note that if you forget to start the QueueRunners, the filename queue will be open and empty, and your readers will be blocked forever.
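A minimal sketch of this, reusing the my_test.csv file and TextLineReader from above (num_epochs is stored in a local variable, hence the local-variables initializer):

import tensorflow as tf

filename_queue = tf.train.string_input_producer(
    ["my_test.csv"], num_epochs=1, shuffle=True)  # creates the queue + its QueueRunner
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # starts the producer's thread
    try:
        while True:
            print(sess.run(value))  # one CSV line per run
    except tf.errors.OutOfRangeError:
        print("No more lines")
    coord.request_stop()
    coord.join(threads)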

There are a few other producer functions that similarly create a queue and a corresponding QueueRunner for running an enqueue operation (e.g., input_producer(), range_input_producer(), and slice_input_producer()).

The shuffle_batch() function takes a list of tensors (e.g., [features, target]) and
creates:

  • A RandomShuffleQueue
  • A QueueRunner to enqueue the tensors to the queue (added to the GraphKeys.QUEUE_RUNNERS collection)
  • A dequeue_many operation to extract a mini-batch from the queue

This makes it easy to manage in a single process a multithreaded input pipeline feeding a queue and a training pipeline reading mini-batches from that queue. Also check out the batch(), batch_join(), and shuffle_batch_join() functions that provide similar functionality.
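Here is a rough sketch of how these pieces could fit together (parameter values are arbitrary; the CSV parsing mirrors the earlier examples):

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["my_test.csv"], num_epochs=1)
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)
x1, x2, target = tf.decode_csv(value, record_defaults=[[-1.], [-1.], [-1]])
features = tf.stack([x1, x2])

# shuffle_batch() creates the RandomShuffleQueue, its QueueRunner, and the dequeue op.
minibatch_instances, minibatch_targets = tf.train.shuffle_batch(
    [features, target], batch_size=2, capacity=10,
    min_after_dequeue=2, num_threads=5)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while True:
            print(sess.run([minibatch_instances, minibatch_targets]))
    except tf.errors.OutOfRangeError:
        print("No more training instances")
    coord.request_stop()
    coord.join(threads)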

12.3 Parallelizing Neural Networks on a TensorFlow Cluster

12.3.1 One Neural Network per Device

The most trivial way to train and run neural networks on a TensorFlow cluster is to take the exact same code you would use for a single device on a single machine, and specify the master server’s address when creating the session. You can change the device that will run your graph simply by putting your code’s construction phase within a device block.
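A hedged sketch of this idea (the device name and server address are assumptions reusing the earlier cluster; the tiny network is only for illustration):

import tensorflow as tf

n_inputs = 2

# Same construction code as on a single machine, wrapped in a device block.
with tf.device("/job:worker/task:1/gpu:0"):
    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    hidden = tf.layers.dense(X, 10, activation=tf.nn.relu)
    output = tf.layers.dense(hidden, 1)

# Open the session against the cluster's master instead of a local session.
with tf.Session("grpc://machine-b.example.com:2222") as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output, feed_dict={X: [[1., 2.]]}))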

Another option is to serve your neural networks using [TensorFlow Serving](https://tensorflow.github.io/serving/).

12.3.2 In-Graph Versus Between-Graph Replication

You can also parallelize the training of a large ensemble of neural networks by simply placing every neural network on a different device.

There are two major approaches to handling a neural network ensemble:

  • You can create one big graph, containing every neural network, each pinned to a
    different device, plus the computations needed to aggregate the individual predictions from all the neural networks (see Figure 12-12). Then you just create one session to any server in the cluster and let it take care of everything (including waiting for all individual predictions to be available before aggregating them). This approach is called in-graph replication.
  • Alternatively, you can create one separate graph for each neural network and handle synchronization between these graphs yourself. This approach is called between-graph replication. One typical implementation is to coordinate the execution of these graphs using queues (see Figure 12-13). A set of clients handles one neural network each, reading from its dedicated input queue, and writing to its dedicated prediction queue. Another client is in charge of reading the inputs and pushing them to all the input queues (copying all inputs to every queue). Finally, one last client is in charge of reading one prediction from each prediction queue and aggregating them to produce the ensemble’s prediction.
With between-graph replication, a dequeue operation blocks as long as its queue is empty, so if one client crashes the others may wait forever. One way to avoid this is to set the operation_timeout_in_ms configuration option, so that a blocked operation fails with a DeadlineExceededError instead. For example:

tf.reset_default_graph()

q=tf.FIFOQueue(capacity=10,dtypes=[tf.float32],shapes=[()])
v=tf.placeholder(tf.float32)
enqueue=q.enqueue([v])
dequeue=q.dequeue()
output=dequeue+1

config=tf.ConfigProto()
config.operation_timeout_in_ms=1000

with tf.Session(config=config) as sess:
    sess.run(enqueue,feed_dict={v:1.0})
    sess.run(enqueue,feed_dict={v:2.0})
    sess.run(enqueue,feed_dict={v:3.0})
    print(sess.run(output))
    print(sess.run(output,feed_dict={dequeue:5}))
    print(sess.run(output))
    print(sess.run(output))
    try:
        print(sess.run(output))
    except tf.errors.DeadlineExceededError as ex:
        print("Timed out while dequeuing")    

12.3.3 Model Parallelism

Model Parallelism: chopping your model into separate chunks and running each chunk on a different device.
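A minimal sketch (assuming a machine with two GPUs; the layer sizes are arbitrary): the lower layer runs on one GPU and its output is transferred to the second GPU for the upper layers.

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(None, 20), name="X")

with tf.device("/gpu:0"):
    hidden1 = tf.layers.dense(X, 100, activation=tf.nn.relu)

with tf.device("/gpu:1"):
    hidden2 = tf.layers.dense(hidden1, 50, activation=tf.nn.relu)  # hidden1 is copied across devices
    outputs = tf.layers.dense(hidden2, 10)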

12.3.4 Data Parallelism

Another way to parallelize the training of a neural network is to replicate it on each device, run a training step simultaneously on all replicas using a different mini-batch for each, and then aggregate the gradients to update the model parameters. This is called data parallelism.

There are two variants of this approach: synchronous updates and asynchronous updates.

Synchronous updates

With synchronous updates, the aggregator waits for all gradients to be available before
computing the average and applying the result.

Asynchronous updates

With asynchronous updates, whenever a replica has finished computing the gradients, it immediately uses them to update the model parameters. There is no aggregation (remove the “mean” step in Figure 12-17), and no synchronization. Replicas just work independently of the other replicas. Since there is no waiting for the other replicas, this approach runs more training steps per minute. Moreover, although the parameters still need to be copied to every device at every step, this happens at different times for each replica so the risk of bandwidth saturation is reduced.

By the time a replica has finished computing the gradients based on some parameter values, these parameters will have been updated several times by other replicas (on average N – 1 times if there are N replicas) and there is no guarantee that the computed gradients will still be pointing in the right direction (see Figure 12-18). When gradients are severely out-of-date, they are called stale gradients: they can slow down convergence, introducing noise and wobble effects (the learning curve may contain temporary oscillations), or they can even make the training algorithm diverge.

There are a few ways to reduce the effect of stale gradients:

  • Reduce the learning rate.
  • Drop stale gradients or scale them down.
  • Adjust the mini-batch size.
  • Start the first few epochs using just one replica (this is called the warmup phase). Stale gradients tend to be more damaging at the beginning of training, when gradients are typically large and the parameters have not settled into a valley of the cost function yet, so different replicas may push the parameters in quite different directions.

A paper published by the Google Brain team in April 2016 benchmarked various approaches and found that data parallelism with synchronous updates using a few spare replicas was the most efficient, not only converging faster but also producing a better model. However, this is still an active area of research, so you should not rule out asynchronous updates quite yet.

Bandwidth saturation

For some models, typically relatively small and trained on a very large training set, you are often better off training the model on a single machine with a single GPU.

Here are a few simple steps you can take to reduce the saturation problem:

  • Group your GPUs on a few servers rather than scattering them across many servers. This will avoid unnecessary network hops.
  • Shard the parameters across multiple parameter servers (as discussed earlier).
  • Drop the model parameters’ float precision from 32 bits (tf.float32) to 16 bits (tf.bfloat16). This will cut in half the amount of data to transfer, without much impact on the convergence rate or the model’s performance.

You can actually drop down to 8-bit precision after training to reduce the size of the model and speed up computations. This is called quantizing the neural network. It is particularly useful for deploying and running pretrained models on mobile phones. See Pete Warden’s great post on the subject.

TensorFlow implementation

To implement data parallelism using TensorFlow, you first need to choose whether you want in-graph replication or between-graph replication, and whether you want synchronous updates or asynchronous updates.

With in-graph replication + synchronous updates, you build one big graph containing all the model replicas (placed on different devices), and a few nodes to aggregate all their gradients and feed them to an optimizer. Your code opens a session to the cluster and simply runs the training operation repeatedly.

With in-graph replication + asynchronous updates, you also create one big graph, but with one optimizer per replica, and you run one thread per replica, repeatedly running the replica’s optimizer.

With between-graph replication + asynchronous updates, you run multiple independent clients (typically in separate processes), each training the model replica as if it were alone in the world, but the parameters are actually shared with other replicas (using a resource container).

With between-graph replication + synchronous updates, once again you run multiple clients, each training a model replica based on shared parameters, but this time you wrap the optimizer (e.g., a MomentumOptimizer) within a SyncReplicasOptimizer. Each replica uses this optimizer as it would use any other optimizer, but under the hood this optimizer sends the gradients to a set of queues (one per variable), which is read by one of the replica’s SyncReplicasOptimizer, called the chief. The chief aggregates the gradients and applies them, then writes a token to a token queue for each replica, signaling it that it can go ahead and compute the next gradients. This approach supports having spare replicas.
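A hedged sketch of that last combination (the toy model and the replica counts are assumptions): each replica wraps its optimizer in a SyncReplicasOptimizer, and the chief additionally creates the hook that manages the token queue.

import tensorflow as tf

# hypothetical toy model replica
X = tf.placeholder(tf.float32, shape=(None, 2))
y = tf.placeholder(tf.float32, shape=(None, 1))
y_pred = tf.layers.dense(X, 1)
loss = tf.reduce_mean(tf.square(y_pred - y))

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
sync_optimizer = tf.train.SyncReplicasOptimizer(
    optimizer,
    replicas_to_aggregate=3,   # gradients aggregated before each update
    total_num_replicas=4)      # i.e., one spare replica
training_op = sync_optimizer.minimize(loss, global_step=global_step)

# On the chief replica (typically used with a MonitoredTrainingSession):
sync_hook = sync_optimizer.make_session_run_hook(is_chief=True)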

To sum up, a cluster is a set of TensorFlow servers, called tasks. A job is a named group of tasks that have a common role. A machine may contain several devices, including CPUs and GPUs, and may run several tasks, each of which can grab all or part of the RAM of every GPU. Every TensorFlow server provides two services: the master service and the worker service. The master service allows clients to open sessions and use them to run graphs. It coordinates the computations across tasks, relying on the worker service to actually execute computations on other tasks and get their results. In a distributed environment, an operation can be pinned to a device. You can open a session on any of the servers, from a client located in any process on any machine (even from a process running one of the tasks), and use the session like a regular local session. One client can connect to multiple servers by opening multiple sessions in different threads. One server can handle multiple sessions simultaneously from one or more clients. You can run one client per task (typically within the same process), or just one client to control all tasks. If you create a variable named x using one client session, it will automatically be available to any other session on the same cluster (even if the two sessions are connected to different servers).
