distributed tensorflow - tensorflow dev summit 2017

Compiled from the TensorFlow Dev Summit 2017 "Distributed TensorFlow" YouTube video; the text comes from the English subtitles.
How to use distributed TensorFlow through the low-level APIs.

Goals

1) Model replicas
2) How to place variables on different devices
3) Sessions and servers
4) Fault tolerance

A quick comparison of single-machine and multi-machine setups


This shows how TensorFlow goes from a single machine to a distributed setup.
On a single machine, tf.device splits the graph across devices, and the runtime executes the resulting subgraphs automatically.

I'm going to show you how the core concepts you might be used to in single-process TensorFlow translate to the distributed world, and I'll give you some ideas for how to deal with the complexity that ensues. So I just claimed that distributed TensorFlow has a minimalist core. What did I mean by that? Well, let's say I've got just one computer, and it's got a CPU device and a GPU device in it. If I want to write a TensorFlow program to use these devices, I can put these little with tf.device annotations in my code, so that for this example the variables go on the CPU and the math goes on the GPU, where it's going to run faster. Then, when I come to run this program, TensorFlow splits up the graph between the devices and puts in the necessary DMA between the devices to run it for me.
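
A minimal sketch of that single-machine placement in TF 1.x style; the variable shapes and the tiny model are made up for illustration:

```python
import tensorflow as tf

# Variables live on the CPU; the math runs on the GPU.
with tf.device("/cpu:0"):
    weights = tf.get_variable("weights", shape=[784, 100])
    biases = tf.get_variable("biases", shape=[100])

with tf.device("/gpu:0"):
    inputs = tf.placeholder(tf.float32, shape=[None, 784])
    layer = tf.nn.relu(tf.matmul(inputs, weights) + biases)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # sess.run(layer, feed_dict={inputs: some_batch})
```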


In the multi-machine case, splitting the graph across machines is transparent to the developer: the device names in tf.device just look slightly different, and the machines exchange tensors with each other over gRPC.

So what happens if you have multiple machines? Let's say, for reasons that will become apparent later, that we want to take those variables and put them on the CPU device of a different process. Well, TensorFlow treats remote devices exactly the same as local ones. All I have to do is add a little bit of information to these device names, and the runtime will put the variables in a different process, splitting up the graph between the devices in the different processes and adding the necessary communication. In this case, it will be using gRPC to transfer tensors between the processes, instead of DMA to and from the GPU device. So there you have it: using distributed TensorFlow is just a simple matter of getting all of your device placements exactly right. And yeah, I heard a wry chuckle. I'm sure you know exactly how easy that can be, if you've ever written a TensorFlow program.
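
The same sketch with the variables moved to the CPU of a parameter-server task in another process; only the device strings change, and the job/task names here are illustrative:

```python
import tensorflow as tf

# Same model, but the variables now live on the CPU of a PS task in another
# process; gRPC moves the tensors between the processes at run time.
with tf.device("/job:ps/task:0/cpu:0"):
    weights = tf.get_variable("weights", shape=[784, 100])
    biases = tf.get_variable("biases", shape=[100])

with tf.device("/job:worker/task:0/gpu:0"):
    inputs = tf.placeholder(tf.float32, shape=[None, 784])
    layer = tf.nn.relu(tf.matmul(inputs, weights) + biases)
```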

graph replication

In-graph replication and between-graph replication are two important concepts in distributed TensorFlow. Both are ways of executing a model for data-parallel training; in practice, between-graph replication is the more scalable approach when training on a large number of machines.

in-graph replication


So the first idea that works pretty well for distributed training, particularly when a single model will fit in a single machine, is replication. Just like in DistBelief, we take the compute-intensive part of the model training, the forward pass and the backprop, and we make a copy of it in multiple worker tasks, so that each task works on a different subset of the data. This is data-parallel training, like we were doing for that Inception example back at the start.

And the simplest thing, simplest way we can achieve this is by doing something I’m going to call in-graph replication. The reason for this name will hopefully become self-explanatory when I-- or maybe just explanatory when I tell you what the code does.

  1. We start by putting the variables on a PS task, like the earlier example. This is just so that they're in a central location where they can be accessed by all of the workers.
  2. And then the easiest way to do the in-graph replication is just to split up a batch of input data into equal-sized chunks, loop over the worker tasks, and use this tf.device string here to put a subgraph on each worker to compute a partial result.
  3. And then finally, we combine all of the partial results into a single loss value that we optimize using a standard TensorFlow optimizer (a rough sketch of this pattern follows below). And sure enough, when you tell it to compute the loss, TensorFlow will split up the graph across the workers, and it will run across these worker tasks and the PS all in parallel.
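
A rough sketch of that in-graph replication pattern, assuming a small made-up model, three worker tasks, and one PS task:

```python
import tensorflow as tf

NUM_WORKERS = 3

# Variables sit on a PS task so every worker can reach them.
with tf.device("/job:ps/task:0"):
    weights = tf.get_variable("weights", shape=[784, 10])
    biases = tf.get_variable("biases", shape=[10])

inputs = tf.placeholder(tf.float32, shape=[None, 784])
labels = tf.placeholder(tf.float32, shape=[None, 10])
input_chunks = tf.split(inputs, NUM_WORKERS)
label_chunks = tf.split(labels, NUM_WORKERS)

# One sub-graph per worker, each computing a partial loss on its chunk.
partial_losses = []
for i in range(NUM_WORKERS):
    with tf.device("/job:worker/task:%d" % i):
        logits = tf.matmul(input_chunks[i], weights) + biases
        partial_losses.append(tf.reduce_sum(
            tf.nn.softmax_cross_entropy_with_logits(
                labels=label_chunks[i], logits=logits)))

# Combine the partial results and optimize as usual.
loss = tf.add_n(partial_losses)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```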

So in-graph replication is pretty easy to achieve. It's not a big modification to your existing programs, and it works pretty well up to a small number of replicas. If you want to replicate across all the GPUs in a single machine, then maybe in-graph replication is the way to go. But one of the things we found out when we tried to apply this technique to a large model like Inception, and scale it up to a couple of hundred machines, was that the graph gets really big if you have to materialize all the replicas in it, and the client gets bogged down trying to coordinate the computation and build this whole graph.

between-graph replication

And that's why we came up with an alternative approach, called between-graph replication. This is currently what we use most internally, it's what we recommend in Cloud ML, and it's also what our high-level APIs are designed to do. So if you've ever written an MPI program, between-graph replication should be a kind of familiar concept.

So instead of running one all-powerful client program that knows about all of the worker replicas, we run a smaller client program on each task, and that client program just builds up the graph for a single replica of the model. Each client program is essentially doing the same thing, with one key difference in the device placement: it takes the non-parameter part of the graph and puts it on the devices that are local to that worker replica. So now when you run it, each program is running its smaller graph independently, and they get mapped to different subsets of the devices that intersect on the PS task in the middle.

There's a little bit of magic here which I should probably explain, in the interest of full disclosure. Each replica places its variables on the same PS task, and when you're running in distributed mode, by default, any two clients that create a variable with the same name on the same device will share the same backing storage for that variable. If you're doing replicated training, this is exactly what you want: the updates from task 0, when it's applying its gradients, will be visible in task 1, and vice versa, so you'll train faster. But this brings us to another issue, which is how do you decide where to put the variables?
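
A sketch of the between-graph version: every worker process runs this same client program and builds only its own replica, pinning the variables to the PS task so they resolve to the same backing storage. The task_index value and the model are stand-ins:

```python
import tensorflow as tf

task_index = 0  # each worker process gets its own index, e.g. from flags

# Every replica creates the variables on the same PS device with the same
# names, so they share storage across the worker processes.
with tf.device("/job:ps/task:0"):
    weights = tf.get_variable("weights", shape=[784, 10])
    biases = tf.get_variable("biases", shape=[10])

# The non-parameter part of the graph goes on this replica's own worker.
with tf.device("/job:worker/task:%d" % task_index):
    inputs = tf.placeholder(tf.float32, shape=[None, 784])
    labels = tf.placeholder(tf.float32, shape=[None, 10])
    logits = tf.matmul(inputs, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```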


variable placement

round-robin variables


So far, I've just been putting them on /job:ps/task:0 and setting that as the explicit device string for the variables. Device strings are all very good if you want to put all of your ops in one particular location, but you often want to do things like have more than one PS task, say if you want to distribute the work of updating the variables or distribute the network load of fetching them to the workers. So instead of passing a device string to tf.device, you can also pass a device function, and we use these to build more sophisticated placement strategies. Device functions are pretty general and very powerful: they let you customize the placement of every single op in the graph. That also makes them a little bit tricky to use, so one of the things we've done is to provide a few pre-canned device functions to make your life a bit easier.

The simplest of these is called tf.train.replica_device_setter, and its job is to assign variables to PS tasks in a round-robin fashion as they're created. One nice thing about this device function is that you can wrap your entire model-building code in one with block: it only affects the variables, putting them on PS tasks, and the rest of the ops in the graph go on a worker. So if you're doing between-graph replication, this takes care of it all for you. For this program, it will assign devices to the parameters as they're created: the first weight matrix goes on task 0, the first bias vector on task 1, the second weights on task 2, and the second bias back on task 0.
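
A sketch of that heuristic in code; the variable names and shapes are invented, and the comments show the round-robin assignment described above:

```python
import tensorflow as tf

with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
    weights_1 = tf.get_variable("weights_1", shape=[784, 100])  # -> /job:ps/task:0
    biases_1 = tf.get_variable("biases_1", shape=[100])         # -> /job:ps/task:1
    weights_2 = tf.get_variable("weights_2", shape=[100, 10])   # -> /job:ps/task:2
    biases_2 = tf.get_variable("biases_2", shape=[10])          # -> /job:ps/task:0
    inputs = tf.placeholder(tf.float32, shape=[None, 784])      # stays on the worker
    hidden = tf.nn.relu(tf.matmul(inputs, weights_1) + biases_1)
    logits = tf.matmul(hidden, weights_2) + biases_2
```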

load balancing variables


This is obviously not an optimally balanced load for these variables, either in terms of memory usage or the amount of work needed to update them. And just think: if we only had two PS tasks here-- I guess I should have drawn a diagram-- we'd end up with an even worse case, because all the weights would be on task 0 and all the biases would be on the other, an even bigger imbalance between those tasks. So clearly, we can do better here. One of the ways we've provided to achieve a more balanced load is to use something called a load balancing strategy, which is an optional argument to the replica device setter. Right now we only have, I think, a simple greedy strategy that does a kind of online bin packing based on the number of bytes in your parameters, and it leads to this more balanced outcome: each of the weight matrices is put on a separate PS task, and the biases get packed together on task 0. It's a lot more balanced, and this should give you a lot better performance.
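
A sketch of plugging in that greedy strategy as the optional ps_strategy argument; in TF 1.x these helpers lived under tf.contrib.training, and exact module paths may differ between releases:

```python
import tensorflow as tf

# Greedy online bin-packing of variables over 3 PS tasks, by parameter size.
greedy = tf.contrib.training.GreedyLoadBalancingStrategy(
    3, tf.contrib.training.byte_size_load_fn)

with tf.device(tf.train.replica_device_setter(ps_tasks=3, ps_strategy=greedy)):
    weights_1 = tf.get_variable("weights_1", shape=[784, 100])
    biases_1 = tf.get_variable("biases_1", shape=[100])
    weights_2 = tf.get_variable("weights_2", shape=[100, 10])
    biases_2 = tf.get_variable("biases_2", shape=[10])
```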

partitioned variables


And so far, I’ve only talked about relatively small variables that all fit in a single task. But what about these outrageously large model parameters, things like the large embeddings that might be tens of gigabytes in size? Well, to deal with these, we have something called partitioners. They do exactly what it sounds like they would do. If you create a variable with a partitioner, like this one here, TensorFlow will split the large logical variable into three concrete parts and assign them to different PS tasks. And one nice thing about doing things this way is if you take that embedding variable and then you pass it to some of the embedding-related functions, like embedding look-up, for example, it will take advantage of the knowledge about the fact that it’s going to be partitioned, and it will offload some of the embedding look-up computation and the gradients to the parameter server devices themselves.
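
A sketch of a partitioned embedding variable; the vocabulary size, embedding width, and number of shards are made up for illustration:

```python
import tensorflow as tf

with tf.device(tf.train.replica_device_setter(ps_tasks=3)):
    # One logical embedding variable, stored as three pieces on the PS tasks.
    embedding = tf.get_variable(
        "embedding", shape=[1000000, 128],
        partitioner=tf.fixed_size_partitioner(num_shards=3))
    ids = tf.placeholder(tf.int64, shape=[None])
    # embedding_lookup knows about the partitioning and can push part of the
    # lookup (and its gradients) onto the parameter server devices.
    embedded = tf.nn.embedding_lookup(embedding, ids)
```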

variables summary


All right. That was a lot of detail about device placement, but it is a very big topic; I guess you could say it's a combinatorially large problem domain that we have to solve. The main thing I want you to take away is that tf.train.replica_device_setter is a simple heuristic that works for a lot of distributed training use cases, and you can customize it if you need to by providing these optional strategies for things like load balancing and partitioning, or even by writing your own device functions that override the simple policies we provide. This is definitely an area where we're always on the lookout for new and better device placement policies. So if you do end up implementing your own and you think it might be generally useful, we really encourage you to send it as a pull request, and we can add it to the set of tools that people have to make their lives easier.

sessions and servers


All right. So you've built your graph, you've got all of the devices set, and now you want to go ahead and run it. But when you create a TensorFlow session using code like this, that session will only know about the devices in the local machine. So let's say you have computers sitting idle over there: how do you make it use those computers? The answer is that you create this thing called a TensorFlow server in each of those machines, and you configure those servers in a cluster so that they can communicate over the network.

Now, looking at it in more detail, the first thing we need to do is to provide a cluster spec. This is something that tells TensorFlow about the machines that you want to run on. A cluster spec is really just a dictionary that maps the names of jobs, things like worker and PS in this example, to a list of one or more network addresses that correspond to the tasks in each job. Now, we don't actually expect you to type in all of these addresses by hand; that gets kind of error prone. In the next talk, Jonathan is going to show you how a cluster manager like Kubernetes or Mesos will do this for you. Internally we use a cluster manager called Borg, which a lot of this design was inspired by.

So typically, your cluster manager will run an instance of your program on each machine in the cluster, giving it the same cluster spec. And then it’ll start a TensorFlow server in each program. It’ll pass it a particular job name and task index that matches the address of the local machine in that cluster. And then finally, when you create your session, you specify the local server’s address as the target, which is what enables it to connect through that server to any of the machines mentioned in the cluster spec. And then you’re good to go. So your session run call can run code on any device in the entire cluster. And typically, what the worker will do is it’ll start a training loop that just iterates over its partition of the data, running a training op over and over again. OK. So that’s what a worker looks like.
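
A sketch of that worker-side boilerplate; the host addresses, ports, and job_name/task_index values are placeholders that a cluster manager would normally supply:

```python
import tensorflow as tf

# The cluster spec maps job names to the addresses of their tasks.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
})

# Each process starts a server for its own job name / task index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# ... build this replica's graph here, e.g. under replica_device_setter ...

# The session targets the local server, which can reach every device
# mentioned in the cluster spec.
with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    # for batch in my_partition_of_the_data:
    #     sess.run(train_op, feed_dict=...)
```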

ps tasks


The PS task is much simpler. PS tasks have basically no client code; they just respond to incoming bits of graph that are sent to them by the workers. So in this case, you just build the server, say its job name is PS, and then you call join on the server. All that does is block, waiting for connections to come in from other nodes in the cluster.

And actually, it's a bit weird. I deal with a lot of questions on Stack Overflow, and one of the common ones is: can you show me where, in the implementation of server.join, the parameter server code lives? If you go and look at it, it's like five lines of C++. It just does some error checking and then blocks on a couple of threads. It joins a couple of threads; that's why it's called join. But it's a reasonable misconception, because this is your PS task, so what is it actually doing? Well, this highlights something quite important about how distributed TensorFlow works. All the behavior of a parameter server is not implemented at the low level in these servers or in the execution engine; instead it's built out of TensorFlow programming primitives, as little bits of data flow graph that a worker ships to a server to say, manage some parameters for me and update them this way. Representing the parameter servers as little fragments of data flow graph is what gives us the flexibility. So without changing anything about how the servers themselves are implemented, if you want to customize the optimization algorithm, or maybe change the synchronization scheme for applying the updates, you can do that just by changing the graph. And we've gained a lot of performance by doing that in some of our algorithms.
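
And the corresponding PS task, using the same placeholder cluster spec as the worker sketch above:

```python
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
})

server = tf.train.Server(cluster, job_name="ps", task_index=0)
server.join()  # block forever, serving graph fragments sent by the workers
```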

Fault tolerance

saving models


All right. So now that your servers are listening, your sessions are all created, and your training loop is running, are we done? Well, whenever you run some long-running training job on a set of machines, I hope you take the wise words of Leslie Lamport to heart, and always use a saver in any long-running training process.

So a saver, for those of you who haven’t seen that before, is just what you use to write out a checkpoint of your model parameters to disk. The same saver can be used for local and distributed training, but it has a couple of useful features in distributed mode that I want to highlight.
The first one is that you'll almost certainly want to set sharded=True when you create your saver. In this example, we have three PS tasks, and sharded=True tells TensorFlow to write the checkpoint in three shards, each containing all the variables from one particular parameter server. That means the parameter servers can write directly to the file system, and we don't have to collect all of the values in one place in order to write them out. The default behavior is actually kind of pessimal: if you set sharded=False, it will bring all of the variables into one save op, meaning they have to be materialized in memory in one process before it writes the first byte, and if you have really big models, that's a sure way to hit a memory error. So always use sharded=True; if you remember one thing, remember that.

OK, second, if you are using between-graph replication, you now have a choice of several worker tasks that could be responsible for writing this checkpoint. By convention, we give some extra responsibilities to worker task 0; we just picked that one because there's always a worker task 0 in our jobs, since that's where the numbering starts. We call it the chief worker. The chief worker has a few important maintenance tasks to carry out: writing checkpoints, in this case, but also things like initializing the parameters at the start of the day and logging summaries for TensorBoard. These are all done by the chief.

And then the last thing to note is that savers now support a variety of distributed file systems. So instead of writing to a local file, you can write to Google Cloud Storage, say if you're running on Cloud ML, or if you're running on top of a Hadoop cluster, you can write straight to HDFS. Using a distributed file system here is a smart idea, because it gives you more durable storage: you're not going to lose your model checkpoints if one of the machines catches fire. It also makes it easier to read a checkpoint from another machine. One common pattern we use is a separate evaluation task running on a separate machine, which just picks up the latest checkpoint of the model and evaluates the test set on it. A distributed file system makes that kind of loosely-coupled coordination a bit easier.
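
A minimal sketch that pulls those three points together: a sharded saver, checkpoints written only by the chief, and a path that could just as well point at a distributed filesystem. The variable, the paths, and the is_chief flag are placeholders:

```python
import tensorflow as tf

v = tf.get_variable("v", shape=[10])  # stand-in for the real model variables
saver = tf.train.Saver(sharded=True)  # one checkpoint shard per parameter device

is_chief = True  # in a real job: task_index == 0
# A local path is shown; "gs://my-bucket/train_logs/model.ckpt" or
# "hdfs://namenode/train_logs/model.ckpt" (made-up locations) work the same way.
checkpoint_path = "/tmp/train_logs/model.ckpt"

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    if is_chief:  # by convention only the chief worker writes checkpoints
        saver.save(sess, checkpoint_path, global_step=0)
```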

fault tolerance


All right. So you're writing checkpoints now. That's a good step; it guards against failure. But what actually happens when you experience a fault? There are a few cases to consider. The best case is that one of your non-chief workers fails, and the reason this is the best case is that these workers are effectively stateless. When a non-chief worker comes back up, all it has to do is contact the PS tasks again and carry on as if nothing happened. And usually we have a cluster manager that will restart a failed process as soon as it crashes.

If a PS task fails, that's a little worse, because PS tasks are stateful: all the workers are relying on them to send their gradients and get back the new values of the parameters. So in this case, the chief is responsible for noticing the failure; it interrupts training on all of the workers and restores all the PS tasks from the last checkpoint.

The trickiest case is when the chief itself fails, because we've given it all of these extra responsibilities, and we need to make sure we get back to a good state when it comes back up. Training could continue while the chief is down, but when it comes back up we don't know whether, for example, another task has also failed, so it has to go and restore state from the last checkpoint. So the thing we do is simply to interrupt training when the chief fails, and when it comes back up, we restore from a checkpoint, just as if a parameter server had failed. This is a pretty simple and pretty conservative scheme, and it only really works if your machines are reliable enough that they aren't failing all the time, but it keeps the logic for recovery really simple.

Depending on how common failures are in your cluster environment, it might be beneficial to use a different policy that tries to keep the training job running without interruption. I just want to provoke some thinking here: one thing you could do is use a configuration management service, something like Apache ZooKeeper or etcd, to choose the chief by leader election, rather than saying it always has to be worker 0. Then you could pass on the responsibility of being the chief when one of them fails, and do a failover without interrupting the training process. If somebody wanted to try that out and contribute it back, that would be just great.

MonitoredTrainingSession


But if you are happy with that simple policy, we've got something for you: you can use the recently added MonitoredTrainingSession class to automate the recovery process. So going back to our simple, single-process example, when you create a TensorFlow session, usually what you do is explicitly initialize the variables, or restore them from a checkpoint, before you start training. A MonitoredTrainingSession looks a lot like a regular tf.Session, but you'll see that it takes a little bit more information in its constructor: it needs to know whether it's the chief or not, and it needs to know what server you're using. You'll also note that there's no init op in the program anymore. Instead, the MonitoredTrainingSession will automatically ensure that all of the variables have been initialized, either from their initial values (like the randomly initialized variables) or by restoring them from the latest checkpoint if one is available, before it returns control back to the user.

And it does something slightly different depending on if you’re the chief or not. So if you’re the chief, it’s going to go and look and see if there is a checkpoint, or it’s going to run the initializers. If you’re not the chief, it just sits there waiting until the chief does its work. And then as soon as it’s done, it can carry on running the training loop. There’s a lot of potential for customization here. So we support these installable hooks that you can attach either when you create the session or before or after doing a session run call.

And the MonitoredTrainingSession is sort of a tastefully curated set of these hooks that makes distributed training a bit easier. It comes with standard hooks for writing a checkpoint every few minutes and summaries every so many steps, and if you want to, you can customize its behavior by adding more hooks or replacing the ones that are there.
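
A sketch of that pattern; the cluster, the tiny stand-in model, and the hyperparameters are placeholders, and a matching PS process (like the earlier sketch) would need to be running for this to execute:

```python
import tensorflow as tf

task_index = 0
cluster = tf.train.ClusterSpec({"worker": ["localhost:2222"],
                                "ps": ["localhost:2223"]})
server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.train.get_or_create_global_step()
    weights = tf.get_variable("weights", shape=[10])  # stand-in parameters
    train_op = tf.assign_add(global_step, 1)          # stand-in training op

hooks = [tf.train.StopAtStepHook(last_step=1000)]

# No explicit init op: the chief initializes (or restores from checkpoint_dir),
# and non-chief workers wait until that has happened.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(task_index == 0),
                                       checkpoint_dir="/tmp/train_logs",
                                       hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)
```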
