Robin_Pi

Python 音频处理：wave

wave模块

1. 概述
- 1.1 Wave_read对象
- 1.2 Wave_write 对象
2. 实际使用中的问题
- 2.1 音频保存
3. 源代码：`Lib/wave.py`

1. 概述

wave 模块提供了一个处理 WAV 声音格式的便利接口。它不支持压缩/解压，但是支持单声道/立体声。

用法：wave.open(file, mode=None)，其中，mode为rb 和 wb：

rb：生成 wav_read 对象
wb：生成 wav_write 对象
注意不支持同时读写。

注：关于 r、w、rb、wb
r和w是普通读和写文件（简单理解为人工编写的文件）；
rb和wb是读写二进制文件（简单理解为可以操作图片等非手工编写的文件）

1.1 Wave_read对象

import wave

# wave.read 对象方法

wr = wave.open('/Users/robin/Desktop/WavTest.wav', 'rb')

print(wr.getnchannels()) # 返回声道数量（1 为单声道，2 为立体声）
print(wr.getsampwidth()) # 返回采样字节长度
print(wr.getframerate()) # 返回采样频率
print(wr.getnframes()) # 返回音频总的帧数

print(wr.getparams()) # 返回一个 namedtuple() (nchannels, sampwidth, framerate, nframes, comptype, compname)，与 get*() 方法的输出相同。

print(wr.readframes(n))

1
2
16000
103765
_wave_params(nchannels=1, sampwidth=2, framerate=16000, nframes=103765, comptype='NONE', compname='not compressed')
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
...

1.2 Wave_write 对象

# wave.write 对象方法
#（1）open 创建文件
ww = wave.open('/Users/robin/Desktop/WavTest.wav', 'wb')

#（2）set 设置参数
ww.setchannels(n) # 设置通道数
ww.setsampwidth(n) # 设置采样字节长度为 n
ww.setframerate(n) # 设置采样频率为 n

ww.setparams(n) # 设置所有形参，(nchannels, sampwidth, framerate, nframes, comptype, compname)，每项的值应可用于 set*() 方法。

# (3) writeframes 写入数据流
ww.writeframes(data) # 写入音频帧并确保 nframes 是正确的

# (4) close 关闭
ww.close()

2. 实际使用中的问题

2.1 音频保存

1、setsampwidth：采样字节长度 or 采样位数、量化位数

表示用多少bit表达一次采样所采集的数据，通常有8bit、16bit、24bit和32bit等几种，
需要注意的是，在设置setsampwidth时，采用的单位是bytes而不是bit。比如，16 bits per sample (= 2 bytes)，应该设置setsampwidth=2而不是setsampwidth=16，否则可能会报错：raise Error('sample width not specified')

代码示例：

channel = 1
bits = 2  # 16 bits = 2 bytes(wave模块中使用byte！！！注意除以8进行转换)
rates = 16000

# read（or resample） wav and save
def write_wav(file_name, input_data):
    wavfile = wave.open(file_name, 'wb')  # 创建（空）文件
    wavfile.setparams((channel, bits,  rates, 0, 'NONE', 'NONE'))
    print("wav文件信息：", wavfile.getparams()[:4])
    wavfile.writeframes(input_data)  # 写入数据
    wavfile.close()

3. 源代码：`Lib/wave.py`

"""Stuff to parse WAVE files.
Usage.
Reading WAVE files:
      f = wave.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
When the setpos() and rewind() methods are not used, the seek()
method is not  necessary.
This returns an instance of a class with the following public methods:
      getnchannels()  -- returns number of audio channels (1 for
                         mono, 2 for stereo)
      getsampwidth()  -- returns sample width in bytes
      getframerate()  -- returns sampling frequency
      getnframes()    -- returns number of audio frames
      getcomptype()   -- returns compression type ('NONE' for linear samples)
      getcompname()   -- returns human-readable version of
                         compression type ('not compressed' linear samples)
      getparams()     -- returns a namedtuple consisting of all of the
                         above in the above order
      getmarkers()    -- returns None (for compatibility with the
                         aifc module)
      getmark(id)     -- raises an error since the mark does not
                         exist (for compatibility with the aifc module)
      readframes(n)   -- returns at most n frames of audio
      rewind()        -- rewind to the beginning of the audio stream
      setpos(pos)     -- seek to the specified position
      tell()          -- return the current position
      close()         -- close the instance (make it unusable)
The position returned by tell() and the position given to setpos()
are compatible and have nothing to do with the actual position in the
file.
The close() method is called automatically when the class instance
is destroyed.
Writing WAVE files:
      f = wave.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
      setnchannels(n) -- set the number of channels
      setsampwidth(n) -- set the sample width
      setframerate(n) -- set the frame rate
      setnframes(n)   -- set the number of frames
      setcomptype(type, name)
                      -- set the compression type and the
                         human-readable compression type
      setparams(tuple)
                      -- set all parameters at once
      tell()          -- return current position in output file
      writeframesraw(data)
                      -- write audio frames without patching up the
                         file header
      writeframes(data)
                      -- write audio frames and patch up the file header
      close()         -- patch up the file header and close the
                         output file
You should set the parameters before the first writeframesraw or
writeframes.  The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, perhaps possibly the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes(b'') or
close() to patch up the sizes in the header.
The close() method is called automatically when the class instance
is destroyed.
"""

from chunk import Chunk
from collections import namedtuple
import audioop
import builtins
import struct
import sys


__all__ = ["open", "Error", "Wave_read", "Wave_write"]

class Error(Exception):
    pass

WAVE_FORMAT_PCM = 0x0001

_array_fmts = None, 'b', 'h', None, 'i'

_wave_params = namedtuple('_wave_params',
                     'nchannels sampwidth framerate nframes comptype compname')

class Wave_read:
    """Variables used in this class:
    These variables are available to the user though appropriate
    methods of this class:
    _file -- the open file with methods read(), close(), and seek()
              set through the __init__() method
    _nchannels -- the number of audio channels
              available through the getnchannels() method
    _nframes -- the number of audio frames
              available through the getnframes() method
    _sampwidth -- the number of bytes per audio sample
              available through the getsampwidth() method
    _framerate -- the sampling frequency
              available through the getframerate() method
    _comptype -- the AIFF-C compression type ('NONE' if AIFF)
              available through the getcomptype() method
    _compname -- the human-readable AIFF-C compression type
              available through the getcomptype() method
    _soundpos -- the position in the audio stream
              available through the tell() method, set through the
              setpos() method
    These variables are used internally only:
    _fmt_chunk_read -- 1 iff the FMT chunk has been read
    _data_seek_needed -- 1 iff positioned correctly in audio
              file for readframes()
    _data_chunk -- instantiation of a chunk class for the DATA chunk
    _framesize -- size of one frame in the file
    """

    def initfp(self, file):
        self._convert = None
        self._soundpos = 0
        self._file = Chunk(file, bigendian = 0)
        if self._file.getname() != b'RIFF':
            raise Error('file does not start with RIFF id')
        if self._file.read(4) != b'WAVE':
            raise Error('not a WAVE file')
        self._fmt_chunk_read = 0
        self._data_chunk = None
        while 1:
            self._data_seek_needed = 1
            try:
                chunk = Chunk(self._file, bigendian = 0)
            except EOFError:
                break
            chunkname = chunk.getname()
            if chunkname == b'fmt ':
                self._read_fmt_chunk(chunk)
                self._fmt_chunk_read = 1
            elif chunkname == b'data':
                if not self._fmt_chunk_read:
                    raise Error('data chunk before fmt chunk')
                self._data_chunk = chunk
                self._nframes = chunk.chunksize // self._framesize
                self._data_seek_needed = 0
                break
            chunk.skip()
        if not self._fmt_chunk_read or not self._data_chunk:
            raise Error('fmt chunk and/or data chunk missing')

    def __init__(self, f):
        self._i_opened_the_file = None
        if isinstance(f, str):
            f = builtins.open(f, 'rb')
            self._i_opened_the_file = f
        # else, assume it is an open file object already
        try:
            self.initfp(f)
        except:
            if self._i_opened_the_file:
                f.close()
            raise

    def __del__(self):
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    #
    # User visible methods.
    #
    def getfp(self):
        return self._file

    def rewind(self):
        self._data_seek_needed = 1
        self._soundpos = 0

    def close(self):
        self._file = None
        file = self._i_opened_the_file
        if file:
            self._i_opened_the_file = None
            file.close()

    def tell(self):
        return self._soundpos

    def getnchannels(self):
        return self._nchannels

    def getnframes(self):
        return self._nframes

    def getsampwidth(self):
        return self._sampwidth

    def getframerate(self):
        return self._framerate

    def getcomptype(self):
        return self._comptype

    def getcompname(self):
        return self._compname

    def getparams(self):
        return _wave_params(self.getnchannels(), self.getsampwidth(),
                       self.getframerate(), self.getnframes(),
                       self.getcomptype(), self.getcompname())

    def getmarkers(self):
        return None

    def getmark(self, id):
        raise Error('no marks')

    def setpos(self, pos):
        if pos < 0 or pos > self._nframes:
            raise Error('position not in range')
        self._soundpos = pos
        self._data_seek_needed = 1

    def readframes(self, nframes):
        if self._data_seek_needed:
            self._data_chunk.seek(0, 0)
            pos = self._soundpos * self._framesize
            if pos:
                self._data_chunk.seek(pos, 0)
            self._data_seek_needed = 0
        if nframes == 0:
            return b''
        data = self._data_chunk.read(nframes * self._framesize)
        if self._sampwidth != 1 and sys.byteorder == 'big':
            data = audioop.byteswap(data, self._sampwidth)
        if self._convert and data:
            data = self._convert(data)
        self._soundpos = self._soundpos + len(data) // (self._nchannels * self._sampwidth)
        return data

    #
    # Internal methods.
    #

    def _read_fmt_chunk(self, chunk):
        try:
            wFormatTag, self._nchannels, self._framerate, dwAvgBytesPerSec, wBlockAlign = struct.unpack_from(', chunk.read(14))
        except struct.error:
            raise EOFError from None
        if wFormatTag == WAVE_FORMAT_PCM:
            try:
                sampwidth = struct.unpack_from(', chunk.read(2))[0]
            except struct.error:
                raise EOFError from None
            self._sampwidth = (sampwidth + 7) // 8
            if not self._sampwidth:
                raise Error('bad sample width')
        else:
            raise Error('unknown format: %r' % (wFormatTag,))
        if not self._nchannels:
            raise Error('bad # of channels')
        self._framesize = self._nchannels * self._sampwidth
        self._comptype = 'NONE'
        self._compname = 'not compressed'

class Wave_write:
    """Variables used in this class:
    These variables are user settable through appropriate methods
    of this class:
    _file -- the open file with methods write(), close(), tell(), seek()
              set through the __init__() method
    _comptype -- the AIFF-C compression type ('NONE' in AIFF)
              set through the setcomptype() or setparams() method
    _compname -- the human-readable AIFF-C compression type
              set through the setcomptype() or setparams() method
    _nchannels -- the number of audio channels
              set through the setnchannels() or setparams() method
    _sampwidth -- the number of bytes per audio sample
              set through the setsampwidth() or setparams() method
    _framerate -- the sampling frequency
              set through the setframerate() or setparams() method
    _nframes -- the number of audio frames written to the header
              set through the setnframes() or setparams() method
    These variables are used internally only:
    _datalength -- the size of the audio samples written to the header
    _nframeswritten -- the number of frames actually written
    _datawritten -- the size of the audio samples actually written
    """

    def __init__(self, f):
        self._i_opened_the_file = None
        if isinstance(f, str):
            f = builtins.open(f, 'wb')
            self._i_opened_the_file = f
        try:
            self.initfp(f)
        except:
            if self._i_opened_the_file:
                f.close()
            raise

    def initfp(self, file):
        self._file = file
        self._convert = None
        self._nchannels = 0
        self._sampwidth = 0
        self._framerate = 0
        self._nframes = 0
        self._nframeswritten = 0
        self._datawritten = 0
        self._datalength = 0
        self._headerwritten = False

    def __del__(self):
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    #
    # User visible methods.
    #
    def setnchannels(self, nchannels):
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        if nchannels < 1:
            raise Error('bad # of channels')
        self._nchannels = nchannels

    def getnchannels(self):
        if not self._nchannels:
            raise Error('number of channels not set')
        return self._nchannels

    def setsampwidth(self, sampwidth):
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        if sampwidth < 1 or sampwidth > 4:
            raise Error('bad sample width')
        self._sampwidth = sampwidth

    def getsampwidth(self):
        if not self._sampwidth:
            raise Error('sample width not set')
        return self._sampwidth

    def setframerate(self, framerate):
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        if framerate <= 0:
            raise Error('bad frame rate')
        self._framerate = int(round(framerate))

    def getframerate(self):
        if not self._framerate:
            raise Error('frame rate not set')
        return self._framerate

    def setnframes(self, nframes):
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        self._nframes = nframes

    def getnframes(self):
        return self._nframeswritten

    def setcomptype(self, comptype, compname):
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        if comptype not in ('NONE',):
            raise Error('unsupported compression type')
        self._comptype = comptype
        self._compname = compname

    def getcomptype(self):
        return self._comptype

    def getcompname(self):
        return self._compname

    def setparams(self, params):
        nchannels, sampwidth, framerate, nframes, comptype, compname = params
        if self._datawritten:
            raise Error('cannot change parameters after starting to write')
        self.setnchannels(nchannels)
        self.setsampwidth(sampwidth)
        self.setframerate(framerate)
        self.setnframes(nframes)
        self.setcomptype(comptype, compname)

    def getparams(self):
        if not self._nchannels or not self._sampwidth or not self._framerate:
            raise Error('not all parameters set')
        return _wave_params(self._nchannels, self._sampwidth, self._framerate,
              self._nframes, self._comptype, self._compname)

    def setmark(self, id, pos, name):
        raise Error('setmark() not supported')

    def getmark(self, id):
        raise Error('no marks')

    def getmarkers(self):
        return None

    def tell(self):
        return self._nframeswritten

    def writeframesraw(self, data):
        if not isinstance(data, (bytes, bytearray)):
            data = memoryview(data).cast('B')
        self._ensure_header_written(len(data))
        nframes = len(data) // (self._sampwidth * self._nchannels)
        if self._convert:
            data = self._convert(data)
        if self._sampwidth != 1 and sys.byteorder == 'big':
            data = audioop.byteswap(data, self._sampwidth)
        self._file.write(data)
        self._datawritten += len(data)
        self._nframeswritten = self._nframeswritten + nframes

    def writeframes(self, data):
        self.writeframesraw(data)
        if self._datalength != self._datawritten:
            self._patchheader()

    def close(self):
        try:
            if self._file:
                self._ensure_header_written(0)
                if self._datalength != self._datawritten:
                    self._patchheader()
                self._file.flush()
        finally:
            self._file = None
            file = self._i_opened_the_file
            if file:
                self._i_opened_the_file = None
                file.close()

    #
    # Internal methods.
    #

    def _ensure_header_written(self, datasize):
        if not self._headerwritten:
            if not self._nchannels:
                raise Error('# channels not specified')
            if not self._sampwidth:
                raise Error('sample width not specified')
            if not self._framerate:
                raise Error('sampling rate not specified')
            self._write_header(datasize)

    def _write_header(self, initlength):
        assert not self._headerwritten
        self._file.write(b'RIFF')
        if not self._nframes:
            self._nframes = initlength // (self._nchannels * self._sampwidth)
        self._datalength = self._nframes * self._nchannels * self._sampwidth
        try:
            self._form_length_pos = self._file.tell()
        except (AttributeError, OSError):
            self._form_length_pos = None
        self._file.write(struct.pack(',
            36 + self._datalength, b'WAVE', b'fmt ', 16,
            WAVE_FORMAT_PCM, self._nchannels, self._framerate,
            self._nchannels * self._framerate * self._sampwidth,
            self._nchannels * self._sampwidth,
            self._sampwidth * 8, b'data'))
        if self._form_length_pos is not None:
            self._data_length_pos = self._file.tell()
        self._file.write(struct.pack(', self._datalength))
        self._headerwritten = True

    def _patchheader(self):
        assert self._headerwritten
        if self._datawritten == self._datalength:
            return
        curpos = self._file.tell()
        self._file.seek(self._form_length_pos, 0)
        self._file.write(struct.pack(', 36 + self._datawritten))
        self._file.seek(self._data_length_pos, 0)
        self._file.write(struct.pack(', self._datawritten))
        self._file.seek(curpos, 0)
        self._datalength = self._datawritten

def open(f, mode=None):
    if mode is None:
        if hasattr(f, 'mode'):
            mode = f.mode
        else:
            mode = 'rb'
    if mode in ('r', 'rb'):
        return Wave_read(f)
    elif mode in ('w', 'wb'):
        return Wave_write(f)
    else:
        raise Error("mode must be 'r', 'rb', 'w', or 'wb'")

参考：

wave — 读写WAV格式文件
python 3x - about writing with the python wave module

SpringBoot一键提取身份证与营业执照信息一名技术极客 #java相关工具类 spring boot 后端 java
SpringBoot一键提取身份证与营业执照信息使用的工具和库步骤和代码示例添加依赖图像预处理和文字识别信息提取使用OpenCV对图像进行预处理OpenCV图像预处理示例集成到OCR服务中在SpringBoot中实现图片中的身份证号、营业执照等信息的识别，可以分为以下几个步骤：图像预处理：为了提高识别的准确性，首先对图片进行预处理，如调整大小、对比度、亮度等。文字检测：使用图像处理算法或框架来定位
javascript取随机数_javaScript中的随机数方法 weixin_39827905 javascript取随机数
随机数方法是javaScript中经常使用的一种方法。例如，需要在屏幕上的一个随机位置显示一幅图像，编写的小游戏要扔骰子等。javaScript中Math对象的random()方法生成0-1之间的随机数，它的随机数种子采用系统时间，因此可以基本保证每次调用random()方法时都会采用不同的伪随机数序列。下面我们对javaScript中的各种随机数方法做一个小结。基本的随机数在javaScript
好用的算法推荐工具全解析 CodeJourney. 算法
一、引言在当今数字化时代，算法广泛应用于各个领域，从搜索引擎优化到金融风险预测，从图像识别到自然语言处理。对于算法学习者、研究者以及开发者而言，合适的算法推荐工具至关重要。它们不仅能帮助理解算法原理，还能在实际应用中提供高效的解决方案。接下来，我们将详细介绍多种好用的算法推荐工具。二、算法可视化工具（一）VisuAlgo功能特点-动态演示：VisuAlgo能够以动态的方式展示各类算法的执行过程。例
Qt小例子学习53 - 使用resizeEvent调整窗口大小时调整Qlabel的图像大小虾球xz
Qt小例子学习53-使用resizeEvent调整窗口大小时调整Qlabel的图像大小testsize.h#ifndefTESTSIZE_H#defineTESTSIZE_H#includeclassQLabel;classtestsize:publicQWidget{Q_OBJECTpublic:explicittestsize(QWidget*parent=0);~testsize();pri
AlexNet：开启深度学习图像识别新纪元池央深度学习人工智能
一、引言在深度学习的璀璨星空中，AlexNet无疑是一颗极为耀眼的明星。它于2012年横空出世，并在ImageNet竞赛中一举夺冠，这一历史性的突破彻底改变了计算机视觉领域的发展轨迹，让全世界深刻认识到深度卷积神经网络在图像识别任务中的巨大潜力，从而掀起了深度学习研究与应用的热潮。二、AlexNet网络架构详解（一）输入层AlexNet的输入图像通常为224x224x3的彩色图像。这一尺寸的确定是
python 爬虫学习 lally. python 爬虫学习
目录requst库访问HTML语言常用HTML标签结构性标签文本格式化标签超链接与图像列表标签HTML练习BeautifulSoup处理数据requst库访问fromrequestsimport*response=get("https://19j.tv/")print(response)若访问成功，状态码为200，访问失败，则查询状态码，http和https的状态码是一样的http状态码可以采取伪
Docker入门系列之三：如何将dockerfile制作好的镜像发布到Docker hub上
在多模态模型的架构上，ChatGPT的绘图能力主要依赖以下几个核心组件：跨模态编码器（Cross-ModalEncoder）：跨模态编码器的作用是将文本和图像的特征进行对齐。GPT可以将用户输入的文本描述转换为文本特征表示，然后利用跨模态编码器将这些特征映射到图像特征空间。这种方式确保模型能够理解描述性语言中不同细节是如何与图像特征对应的。
成功
在多模态模型的架构上，ChatGPT的绘图能力主要依赖以下几个核心组件：跨模态编码器（Cross-ModalEncoder）：跨模态编码器的作用是将文本和图像的特征进行对齐。GPT可以将用户输入的文本描述转换为文本特征表示，然后利用跨模态编码器将这些特征映射到图像特征空间。这种方式确保模型能够理解描述性语言中不同细节是如何与图像特征对应的。
内核详细知识「已注销」基础知识
支持这个网站。捐。Search内核（操作系统）有关其他用途，请参阅内核（消歧）。“内核（计算）”重定向到这里。有关其他用途，请参阅内核（消歧）。“核心（计算机科学）”重定向到这里。不要与Compute内核，内核方法或内核（图像处理）混淆。该内核是一个计算机程序是计算机的核心操作系统，拥有系统的一切完全控制。[1]在大多数系统中，它是启动时加载的第一个程序之一（在引导加载程序之后）。它处理剩余的启动
OpenCV相机标定与3D重建(54)解决透视 n 点问题（Perspective-n-Point, PnP）函数solvePnP()的使用 jndingxin OpenCV opencv 3d
操作系统：ubuntu22.04OpenCV版本：OpenCV4.9IDE:VisualStudioCode编程语言：C++11算法描述根据3D-2D点对应关系找到物体的姿态。cv::solvePnP是OpenCV库中的一个函数，用于解决透视n点问题（Perspective-n-Point,PnP），即通过已知的3D点及其对应的2D图像点来估计物体的姿态（旋转和平移）。这个函数可以处理任意数量的点
神经架构搜索在大模型效率优化中的应用 AI大模型应用之禅计算机软件编程原理与应用实践 java python javascript kotlin golang 架构人工智能
神经架构搜索，大模型，效率优化，自动机器学习，深度学习1.背景介绍近年来，深度学习模型取得了令人瞩目的成就，在图像识别、自然语言处理、语音识别等领域展现出强大的能力。然而，随着模型规模的不断扩大，训练和部署这些大模型也带来了巨大的挑战。计算资源消耗巨大:大模型的训练需要大量的计算资源，例如高性能GPU和TPU，这导致训练成本高昂，难以普及。内存占用量大:大模型的参数量庞大，需要大量的内存进行存储和
卷积神经网络（CNN）：深度学习中的核心模型任义礼智信深度学习 cnn 人工智能
引言卷积神经网络（ConvolutionalNeuralNetworks,CNNs）是深度学习领域的一种重要模型，广泛应用于图像处理、计算机视觉、自然语言处理等多个领域。CNN凭借其卓越的特征提取能力和参数共享机制，已成为计算机视觉任务中最主流的算法之一。本文将深入探讨CNN的基本原理、结构组件、应用场景及其发展方向。CNN的基本原理CNN是一种特殊的前馈神经网络（FeedforwardNeura
深度学习图像算法中的网络架构：Backbone、Neck 和 Head 详解肥猪猪爸 #深度学习深度学习算法人工智能数据结构神经网络计算机视觉机器学习
深度学习已经成为图像识别领域的核心技术，特别是在目标检测、图像分割等任务中，深度神经网络的应用取得了显著进展。在这些任务的网络架构中，通常可以分为三个主要部分：Backbone、Neck和Head。这些部分在整个网络中扮演着至关重要的角色，它们各自处理不同的任务，从特征提取到最终的预测输出，形成了一个完整的图像处理流程。本文将详细介绍这三部分的作用以及它们在目标检测和图像分割中的应用，帮助大家更好
开源多模态推理模型QVQ：视觉推理能力的突破与未来展望前端
近年来，AI代码生成器等人工智能技术飞速发展，多模态推理模型作为其中一个重要分支，正展现出越来越强大的能力。它能够理解和处理多种类型的数据，例如图像、文本、音频等，并进行复杂的推理和决策。阿里云通义千问团队近日发布的QVQ-72B-Preview模型，就是一个极具代表性的例子。该模型开源且在视觉推理方面表现突出，为多模态模型的发展树立了新的里程碑。QVQ模型的核心能力与突破QVQ-72B-Prev
遗传算法与深度学习实战（25）——使用Keras构建卷积神经网络盼小辉丶遗传算法与深度学习实战深度学习 keras cnn
遗传算法与深度学习实战（25）——使用Keras构建卷积神经网络0.前言1.卷积神经网络基本概念1.1卷积1.2步幅1.3填充1.4激活函数1.5池化2.使用Keras构建卷积神经网络3.CNN层的问题4.模型泛化小结系列链接0.前言卷积神经网络(ConvolutionalNeuralNetwork,CNN)的提出是为了解决传统神经网络的缺陷。即使对象位于图片中的不同位置或其在图像中具有不同占比，
【精选】基于RFCBAMConv与YOLOv8优化的杂草分割系统农业智能检测平台、深度学习图像分割与注意力机制融合杂草智能识别与分类系统、深度学习目标分割优化改、进型YOLOv8杂草图像分割系统程序员阿龙深度学习实战案例 Python精选毕业设计 YOLO 感受野注意力卷积图像分割与分类智能农业图像分析农业智能检测系统农作物生长环境监测
博主介绍：✌我是阿龙，一名专注于Java技术领域的程序员，全网拥有10W+粉丝。作为CSDN特邀作者、博客专家、新星计划导师，我在计算机毕业设计开发方面积累了丰富的经验。同时，我也是掘金、华为云、阿里云、InfoQ等平台的优质作者。通过长期分享和实战指导，我致力于帮助更多学生完成毕业项目和技术提升。技术范围：我熟悉的技术领域涵盖SpringBoot、Vue、SSM、HLMT、Jsp、PHP、Nod
C++：实现聚类算法（附带源码） Katie。 c c++实现算法算法聚类支持向量机
项目介绍聚类是无监督学习中一种常用的算法，用于将数据集中的对象分组（称为簇），使得同一簇中的对象相似度较高，而不同簇之间的对象相似度较低。在许多领域，如数据挖掘、图像处理和模式识别等，聚类算法都有广泛应用。在本项目中，我们将实现最常见的聚类算法之一——K均值聚类（K-MeansClustering）。该算法的目标是通过迭代的方式将数据集划分为K个簇，每个簇由其中心（均值）表示。项目实现思路输入参数
Java - 文字识别；示例代码基于SpringAI和国产大模型沈询-阿里 microsoft 机器学习人工智能后端
文字识别在Java开发中的应用在Java开发中，将图像中的文字进行识别能力被广泛应用于多种场景，比如自动审核图片内容、商品搜索分析等。过去，这类需求主要通过OCR（光学字符识别）技术来实现，但其对于复杂图像的处理效果往往不尽人意。如今，随着大模型技术的发展，利用这些先进的AI模型进行文字识别成为可能，不仅大大提升了识别精度和速度，还能更好地理解图像中的复杂信息，为用户提供更加准确可靠的服务。本文采
《AI赋能光追：开启图形渲染新时代》人工智能深度学习
光线追踪技术是图形渲染领域的重大突破，能够通过模拟光的传播路径，精准渲染反射、折射、阴影和间接光照等效果，实现高度逼真的场景呈现。而人工智能的加入，更是为光线追踪技术带来了前所未有的变革，主要体现在以下几个方面：降噪传统光线追踪为减少计算量，向场景发射少量光线样本，会产生带噪点的斑点图像，需人工设计降噪器通过多帧累积或空间插值来处理，但存在增加开发成本、降低帧率等问题。AI驱动的降噪技术则引入神经
VLM 系列——Qwen2 VL——论文解读——前瞻（源码解读） TigerZ* AIGC算法 AIGC 人工智能 transformer 计算机视觉图像处理
一、概述1、是什么是一系列多模态大型语言模型（MLLM），其中包括2B、7B、72B三个版本，整体采用视觉编码器+LLM形式（可以认为没有任何投射层）。比较创新的是图像缩放方式+3DLLM位置编码+（预估后面的训练方式也不太一样）。能够处理包括文本、图像在内的多种数据类型，具备图片描述、单图文问答、多图问对话、视频理解对话、json格式、多语言、agent、高清图理解（代码编写和debug论文暂时
YOLOv11改进策略【Neck】| TPAMI 2024 FreqFusion 频域感知特征融合模块解决密集图像预测问题 Limiiiing YOLOv11改进专栏 YOLO 深度学习计算机视觉目标检测
一、本文介绍本文主要利用FreqFusion结构改进YOLOv11的目标检测网络模型。FreqFusion结构针对传统特征融合在密集图像预测中存在的问题，创新性地引入自适应低通滤波器生成器、偏移量生成器和自适应高通滤波器生成器。将FreqFusion应用于YOLOv11的改进过程中，能够使模型在处理复杂场景图像时，更精准地聚焦目标物体边界，减少背景噪声干扰，显著强化目标物体边界特征表达，进而提升模
人工智能在医疗领域的应用人工智能
人工智能在医疗领域的应用前景广阔。医疗机器人是其中之一，如智能假肢、外骨骼等可修复受损身体，IBM的达・芬奇手术系统等则能承担手术或医疗保健功能.智能药物研发借助深度学习技术，可快速准确挖掘筛选化合物或生物，缩短新药研发周期、降低成本、提高成功率，在心血管药、抗肿瘤药等研发中已取得突破.智能诊疗让计算机学习专家医疗知识，模拟思维和诊断推理，给出可靠诊断与治疗方案.智能影像识别可对医学影像进行图像识
主体分割技术，提升图像信息提取能力
在智能设备普及和AI技术进步的推动下，用户对线上互动的质量、个性化以及沉浸式体验的追求日益增强。例如，对于热衷于图片编辑或视频制作的用户来说，他们需要一种快速而简便的方法来将特定主体从背景中分离出来。HarmonyOSSDK基础视觉服务（CoreVisionKit）提供主体分割能力，可以检测出图片中区别于背景的前景物体或区域（即"显著主体"），并将其从背景中分离出来，适用于需要识别和提取图像主要信
Python小项目：利用U-net完成细胞图像分割
利用U-Net完成细胞图像分割的详细指南在生物医学领域，细胞图像分割是一个关键步骤，能够帮助研究人员分析细胞结构和功能。U-Net作为一种强大的卷积神经网络结构，广泛应用于医学图像分割任务。本文将详细介绍如何利用U-Net完成细胞图像分割项目，涵盖从数据准备到模型部署的各个步骤。项目步骤概览数据准备数据预处理构建U-Net模型训练模型模型评估图像分割结果可视化调优和优化部署和应用1.数据准备收集数
jupyter notebook练手项目：线性回归——学习时间与成绩的关系橙意满满的西瓜大侠机器学习 jupyter 线性回归机器学习
线性回归——学习时间与学习成绩的关系第1步：导入工具库pandas——数据分析库，提供了数据结构（如DataFrame和Series）和数据操作方法，方便对数据集进行读取、清洗、转换等操作。matplotlib——绘图库，pyplot提供了一系列简单易用的绘图函数，用于创建各种类型的图表，如折线图、散点图、柱状图等。%matplotlibinline——使matplotlib绘制的图像嵌入在Jup
CSS浮动：概念、特性与应用平常心cyk css 前端
CSS浮动是网页设计和开发中常见的布局技术之一，以下是CSS浮动相关的所有重要知识点：一、浮动的定义与语法浮动（float）属性可以指定一个元素应沿其容器的左侧或右侧放置，允许文本和内联元素环绕它。浮动属性最初只用于在一段文本内浮动图像，实现文字环绕的效果。作用让多个盒子(div)水平排列成一行，使得浮动成为布局的重要手段。可以实现盒子的左右对齐等等。浮动最早是用来控制图片，实现文字环绕图片的效果
如何用JavaScript判断前端应用运行环境（移动平台还是桌面环境）
在多模态模型的架构上，ChatGPT的绘图能力主要依赖以下几个核心组件：跨模态编码器（Cross-ModalEncoder）：跨模态编码器的作用是将文本和图像的特征进行对齐。GPT可以将用户输入的文本描述转换为文本特征表示，然后利用跨模态编码器将这些特征映射到图像特征空间。这种方式确保模型能够理解描述性语言中不同细节是如何与图像特征对应的。
基于社交网络算法优化的二维最大熵图像分割智能算法研学社（Jack旭）智能优化算法应用图像分割算法 php 开发语言
智能优化算法应用：基于社交网络优化的二维最大熵图像阈值分割-附代码文章目录智能优化算法应用：基于社交网络优化的二维最大熵图像阈值分割-附代码1.前言2.二维最大熵阈值分割原理3.基于社交网络优化的多阈值分割4.算法结果：5.参考文献：6.Matlab代码摘要：本文介绍基于最大熵的图像分割，并且应用社交网络算法进行阈值寻优。1.前言阅读此文章前，请阅读《图像分割：直方图区域划分及信息统计介绍》htt
使用LLaVa和Ollama实现多模态RAG示例 llzwxh888 python 人工智能开发语言
本文将详细介绍如何使用LLaVa和Ollama实现多模态RAG（检索增强生成），通过提取图像中的结构化数据、生成图像字幕等功能来展示这一技术的强大之处。安装环境首先，您需要安装以下依赖包：!pipinstallllama-index-multi-modal-llms-ollama!pipinstallllama-index-readers-file!pipinstallunstructured!p
番茄西红柿叶子病害分类数据集12882张11类别 futureflsl 数据集分类数据挖掘人工智能
数据集类型：图像分类用，不可用于目标检测无标注文件数据集格式：仅仅包含jpg图片，每个类别文件夹下面存放着对应图片图片数量(jpg文件个数)：12882分类类别数：11类别名称:["Bacterial_Spot_Bacteria","Early_Blight_Fungus","Healthy","Late_Blight_Water_Mold","Leaf_Mold_Fungus","Powdery
redis学习笔记——不仅仅是存取数据 Everyday都不同 returnSource expire/del incr/lpush 数据库分区 redis
最近项目中用到比较多redis，感觉之前对它一直局限于get/set数据的层面。其实作为一个强大的NoSql数据库产品，如果好好利用它，会带来很多意想不到的效果。（因为我搞java，所以就从jedis的角度来补充一点东西吧。PS：不一定全，只是个人理解，不喜勿喷） 1、关于JedisPool.returnSource(Jedis jeids) 这个方法是从red
SQL性能优化-持续更新中。。。。。。 atongyeye oracle sql
1 通过ROWID访问表--索引你可以采用基于ROWID的访问方式情况,提高访问表的效率, , ROWID包含了表中记录的物理位置信息..ORACLE采用索引(INDEX)实现了数据和存放数据的物理位置(ROWID)之间的联系. 通常索引提供了快速访问ROWID的方法,因此那些基于索引列的查询就可以得到性能上的提高. 2 共享SQL语句--相同的sql放入缓存 3 选择最有效率的表
[JAVA语言]JAVA虚拟机对底层硬件的操控还不完善 comsci JAVA虚拟机
如果我们用汇编语言编写一个直接读写CPU寄存器的代码段，然后利用这个代码段去控制被操作系统屏蔽的硬件资源，这对于JVM虚拟机显然是不合法的，对操作系统来讲，这样也是不合法的，但是如果是一个工程项目的确需要这样做，合同已经签了，我们又不能够这样做，怎么办呢？那么一个精通汇编语言的那种X客，是否在这个时候就会发生某种至关重要的作用呢？ &n
lvs- real 男人50 LVS
#!/bin/bash # # Script to start LVS DR real server. # description: LVS DR real server # #. /etc/rc.d/init.d/functions VIP=10.10.6.252 host='/bin/hostname' case "$1" in sta
生成公钥和私钥 oloz DSA 安全加密
package com.msserver.core.util; import java.security.KeyPair; import java.security.PrivateKey; import java.security.PublicKey; import java.security.SecureRandom; public class SecurityUtil {
UIView 中加入的cocos2d，背景透明 374016526 cocos2d glClearColor
要点是首先pixelFormat:kEAGLColorFormatRGBA8，必须有alpha层才能透明。然后view设置为透明glView.opaque = NO;[director setOpenGLView:glView];[self.viewController.view setBackgroundColor:[UIColor clearColor]];[self.viewControll
mysql常用命令香水浓 mysql
连接数据库 mysql -u troy -ptroy 备份表 mysqldump -u troy -ptroy mm_database mm_user_tbl > user.sql 恢复表（与恢复数据库命令相同） mysql -u troy -ptroy mm_database < user.sql 备份数据库 mysqldump -u troy -ptroy
我的架构经验系列文章 - 后端架构 - 系统层面 agevs JavaScript jquery css html5
系统层面：高可用性所谓高可用性也就是通过避免单独故障加上快速故障转移实现一旦某台物理服务器出现故障能实现故障快速恢复。一般来说，可以采用两种方式，如果可以做业务可以做负载均衡则通过负载均衡实现集群，然后针对每一台服务器进行监控，一旦发生故障则从集群中移除；如果业务只能有单点入口那么可以通过实现Standby机加上虚拟IP机制，实现Active机在出现故障之后虚拟IP转移到Standby的快速
利用ant进行远程tomcat部署 aijuans tomcat
在javaEE项目中，需要将工程部署到远程服务器上，如果部署的频率比较高，手动部署的方式就比较麻烦，可以利用Ant工具实现快捷的部署。这篇博文详细介绍了ant配置的步骤（http://www.cnblogs.com/GloriousOnion/archive/2012/12/18/2822817.html），但是在tomcat7以上不适用，需要修改配置，具体如下： 1.配置tomcat的用户角色
获取复利总收入 baalwolf 获取
public static void main(String args[]){ int money=200; int year=1; double rate=0.1; &
eclipse.ini解释 BigBird2012 eclipse
大多数java开发者使用的都是eclipse，今天感兴趣去eclipse官网搜了一下eclipse.ini的配置，供大家参考，我会把关键的部分给大家用中文解释一下。还是推荐有问题不会直接搜谷歌，看官方文档，这样我们会知道问题的真面目是什么，对问题也有一个全面清晰的认识。 Overview 1、Eclipse.ini的作用 Eclipse startup is controlled by th
AngularJS实现分页功能 bijian1013 JavaScript AngularJS 分页
对于大多数web应用来说显示项目列表是一种很常见的任务。通常情况下，我们的数据会比较多，无法很好地显示在单个页面中。在这种情况下，我们需要把数据以页的方式来展示，同时带有转到上一页和下一页的功能。既然在整个应用中这是一种很常见的需求，那么把这一功能抽象成一个通用的、可复用的分页（Paginator）服务是很有意义的。 &nbs
[Maven学习笔记三]Maven archetype bit1129 ArcheType
archetype的英文意思是原型，Maven archetype表示创建Maven模块的模版，比如创建web项目，创建Spring项目等等. mvn archetype提供了一种命令行交互式创建Maven项目或者模块的方式， mvn archetype 1.在LearnMaven-ch03目录下，执行命令mvn archetype:gener
【Java命令三】jps bit1129 Java命令
jps很简单，用于显示当前运行的Java进程，也可以连接到远程服务器去查看 [hadoop@hadoop bin]$ jps -help usage: jps [-help] jps [-q] [-mlvV] [<hostid>] Definitions: <hostid>: <hostname>[:
ZABBIX2.2 2.4 等各版本之间的兼容性 ronin47
zabbix更新很快，从2009年到现在已经更新多个版本，为了使用更多zabbix的新特性，随之而来的便是升级版本，zabbix版本兼容性是必须优先考虑的一点客户端AGENT兼容 zabbix1.x到zabbix2.x的所有agent都兼容zabbix server2.4：如果你升级zabbix server，客户端是可以不做任何改变，除非你想使用agent的一些新特性。 Zabbix代理（p
unity 3d还是cocos2dx哪个适合游戏？ brotherlamp unity自学 unity教程 unity视频 unity资料 unity
unity 3d还是cocos2dx哪个适合游戏？问：unity 3d还是cocos2dx哪个适合游戏？答：首先目前来看unity视频教程因为是3d引擎，目前对2d支持并不完善，unity 3d 目前做2d普遍两种思路，一种是正交相机，3d画面2d视角，另一种是通过一些插件，动态创建mesh来绘制图形单元目前用的较多的是2d toolkit，ex2d，smooth moves，sm2，
百度笔试题：一个已经排序好的很大的数组，现在给它划分成m段，每段长度不定，段长最长为k，然后段内打乱顺序，请设计一个算法对其进行重新排序 bylijinnan java 算法面试百度招聘
import java.util.Arrays; /** * 最早是在陈利人老师的微博看到这道题： * #面试题#An array with n elements which is K most sorted，就是每个element的初始位置和它最终的排序后的位置的距离不超过常数K * 设计一个排序算法。It should be faster than O(n*lgn)。
获取checkbox复选框的值 chiangfai checkbox
<title>CheckBox</title> <script type = "text/javascript"> doGetVal: function doGetVal() { //var fruitName = document.getElementById("apple").value;//根据
MySQLdb用户指南 chenchao051 mysqldb
原网页被墙，放这里备用。 MySQLdb User's Guide Contents Introduction Installation _mysql MySQL C API translation MySQL C API function mapping Some _mysql examples MySQLdb
HIVE 窗口及分析函数 daizj hive 窗口函数分析函数
窗口函数应用场景：（1）用于分区排序（2）动态Group By （3）Top N （4）累计计算（5）层次查询一、分析函数用于等级、百分点、n分片等。函数说明 RANK() &nbs
PHP ZipArchive 实现压缩解压Zip文件 dcj3sjt126com PHP zip
PHP ZipArchive 是PHP自带的扩展类，可以轻松实现ZIP文件的压缩和解压，使用前首先要确保PHP ZIP 扩展已经开启，具体开启方法就不说了，不同的平台开启PHP扩增的方法网上都有，如有疑问欢迎交流。这里整理一下常用的示例供参考。一、解压缩zip文件 01 02 03 04 05 06 07 08 09 10 11
精彩英语贺词 dcj3sjt126com 英语
I'm always here 我会一直在这里支持你 &nb
基于Java注解的Spring的IoC功能 e200702084 java spring bean IOC Office
java模拟post请求 geeksun java
一般API接收客户端（比如网页、APP或其他应用服务）的请求，但在测试时需要模拟来自外界的请求，经探索，使用HttpComponentshttpClient可模拟Post提交请求。此处用HttpComponents的httpclient来完成使命。 import org.apache.http.HttpEntity ; import org.apache.http.HttpRespon
Swift语法之 ---- ?和!区别 hongtoushizi ?swift !
转载自： http://blog.sina.com.cn/s/blog_71715bf80102ux3v.html Swift语言使用var定义变量，但和别的语言不同，Swift里不会自动给变量赋初始值，也就是说变量不会有默认值，所以要求使用变量之前必须要对其初始化。如果在使用变量之前不进行初始化就会报错： var stringValue : String //
centos7安装jdk1.7 jisonami jdk centos
安装JDK1.7 步骤1、解压tar包在当前目录 [root@localhost usr]#tar -xzvf jdk-7u75-linux-x64.tar.gz 步骤2：配置环境变量在etc/profile文件下添加 export JAVA_HOME=/usr/java/jdk1.7.0_75 export CLASSPATH=/usr/java/jdk1.7.0_75/lib
数据源架构模式之数据映射器 home198979 PHP 架构数据映射器 datamapper
前面分别介绍了数据源架构模式之表数据入口、数据源架构模式之行和数据入口数据源架构模式之活动记录，相较于这三种数据源架构模式，数据映射器显得更加“高大上”。一、概念数据映射器（Data Mapper）：在保持对象和数据库（以及映射器本身）彼此独立的情况下，在二者之间移动数据的一个映射器层。概念永远都是抽象的，简单的说，数据映射器就是一个负责将数据映射到对象的类数据。 &nb
在Python中使用MYSQL pda158 mysql python
缘由　　近期在折腾一个小东西须要抓取网上的页面。然后进行解析。将结果放到数据库中。　　了解到 Python在这方面有优势，便选用之。　　由于我有台 server上面安装有 mysql，自然使用之。在进行数据库的这个操作过程中遇到了不少问题，这里记录一下，大家共勉。　　 python中mysql的调用　　百度之后能够通过MySQLdb进行数据库操作。
单例模式 hxl1988_0311 java 单例设计模式单件
package com.sosop.designpattern.singleton; /* * 单件模式：保证一个类必须只有一个实例，并提供全局的访问点 * * 所以单例模式必须有私有的构造器，没有私有构造器根本不用谈单件 * * 必须考虑到并发情况下创建了多个实例对象 * */ /** * 虽然有锁，但是只在第一次创建对象的时候加锁，并发时不会存在效率
27种迹象显示你应该辞掉程序员的工作 vipshichg 工作
1、你仍然在等待老板在2010年答应的要提拔你的暗示。 2、你的上级近10年没有开发过任何代码。 3、老板假装懂你说的这些技术，但实际上他完全不知道你在说什么。 4、你干完的项目6个月后才部署到现场服务器上。 5、时不时的，老板在检查你刚刚完成的工作时，要求按新想法重新开发。 6、而最终这个软件只有12个用户。 7、时间全浪费在办公室政治中，而不是用在开发好的软件上。 8、部署前5分钟才开始测试。