Notes on using YAML and interacting with environment variables

hydra,yaml

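  • Using YAML with hydra and omegaconf
    • Using environment variables
    • Viewing system environment variables
    • Loading environment variables in a notebook
  • The hydra configuration framework in detail
    • 1
    • 2
    • 3
    • 4
    • 5 Tab completion
    • 6 ConfigStore
    • 7 ConfigStore config groups
    • 8 Config inheritance
    • 9 Read-only config
    • 10 [Structure of the hydra config](https://github.com/facebookresearch/hydra/blob/main/hydra/conf/__init__.py)
    • 11 Help
    • 12 Plugins
    • 13 Hydra terminology
    • 14 Instantiating objects with hydra
    • 15 Composition
    • 16 Example of an installable app
    • 17 Callbacks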

Using YAML with hydra and omegaconf

Using environment variables

name: &name "Citrinet-512-8x-Stride"

model:
  sample_rate: &sample_rate 16000

  train_ds:
    manifest_filepath: "${oc.env:train_data_dir}/train_manifest.json"

Viewing system environment variables

Run the following command in a terminal:

printenv

View the PATH environment variable:

printenv PATH

View shell variables:

set

Create a new shell variable on the command line:

TEST_VAR='Hello World!'

View the variable you just created:

set | grep TEST_VAR

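Check whether the variable also appears among the environment variables:

printenv | grep TEST_VAR  # no output: TEST_VAR is a shell variable that was never exported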
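The difference between a shell variable and an environment variable is that only exported variables are passed on to child processes. The following sketch checks this from Python by running small sh scripts; it assumes a POSIX sh is on the PATH, and the helper name child_output is introduced here only for illustration.

```python
import os
import subprocess

# Start from an environment that definitely lacks TEST_VAR.
clean_env = {k: v for k, v in os.environ.items() if k != "TEST_VAR"}


def child_output(script: str) -> str:
    """Run `script` with sh and return what it printed."""
    result = subprocess.run(
        ["sh", "-c", script],
        env=clean_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The inner `sh -c` is a child process. Single quotes keep $TEST_VAR
# unexpanded until the child itself evaluates it.
probe = "sh -c 'echo \"child sees: [$TEST_VAR]\"'"

plain = child_output("TEST_VAR='Hello World!'; " + probe)
exported = child_output("export TEST_VAR='Hello World!'; " + probe)
print(plain)     # the un-exported variable is invisible to the child
print(exported)  # the exported variable is inherited
```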

Loading environment variables in a notebook

import os
%load_ext dotenv
%dotenv -v
print(os.environ.get('train_data_dir'))

Or point to a specific env file:

%dotenv /ntt/path/to/.env

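Environment variables loaded by python-dotenv exist only inside that Python process (and any processes it starts); the shell that launched it cannot read them.

To set environment variables in the shell instead, create env.sh with the following content: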

export qqqq=123123

Then run source env.sh in the terminal.
To join paths in a shell script: echo "$train_data_dir/train_manifest.json"

Run a Python script with a specific environment variable:

train_data_dir=/ntt/aldata python app.py
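The VAR=value prefix sets the variable for that one command only. A rough Python equivalent, as a sketch, passes a modified copy of the environment to a child process; the inline child script stands in for app.py, which is not shown in these notes.

```python
import os
import subprocess
import sys

# Stand-in for app.py: print the variable as the child process sees it.
child_code = "import os; print(os.environ.get('train_data_dir'))"

# Make sure the parent itself does not have the variable set.
os.environ.pop("train_data_dir", None)

child = subprocess.run(
    [sys.executable, "-c", child_code],
    # Copy the current environment and add one variable for this run only.
    env={**os.environ, "train_data_dir": "/ntt/aldata"},
    capture_output=True,
    text=True,
    check=True,
)

seen_in_child = child.stdout.strip()
seen_in_parent = os.environ.get("train_data_dir")
print(seen_in_child)
print(seen_in_parent)
```

The parent process is left untouched, which mirrors how the shell prefix form does not change the calling shell.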

The hydra configuration framework in detail

1

Create config.yaml:

model:
  sample_rate: &sample_rate 16000

  train_ds:
    manifest_filepath: ???
    sample_rate: 16000
    batch_size: 32
    trim_silence: false
    max_duration: 16.7
    shuffle: true
    is_tarred: false
    tarred_audio_filepaths: null
    use_start_end_token: false
    
defaults:
  - _self_  # placed before "- foo: bar": values from foo/bar.yaml override the values in this file
  - foo: bar
  #- _self_ # placed after "- foo: bar": values in this file override those from foo/bar.yaml

Create test.py in the same directory:

from omegaconf import DictConfig, OmegaConf
import hydra
from nemo.core.config import hydra_runner

# @hydra.main(config_path='.',config_name="config")
@hydra_runner(config_path='.',config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()

Run from the command line:

python3 test.py model.train_ds.sample_rate=$sample_rate model.train_ds.manifest_filepath=/home/hydra/train_manifest.json +model.val_ds.sample_rate=44444 ++model.train_ds.batch_size=545 ~foo

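Explanation: + adds a key that does not exist in the YAML file, ~ deletes a key that exists in the YAML file, and ++ overrides the key if it exists in the YAML file or adds it if it does not.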
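The prefix rules can be made concrete with a toy model. The sketch below is not Hydra's parser: apply_override is a hypothetical helper that works on a plain nested dict, keeps every value as a string, and ignores the defaults list (so it does not cover ~foo on a config group).

```python
def apply_override(cfg: dict, override: str) -> None:
    """Apply one override string to a nested dict in place (toy model only)."""
    if override.startswith("~"):
        mode, rest = "delete", override[1:]
    elif override.startswith("++"):
        mode, rest = "upsert", override[2:]
    elif override.startswith("+"):
        mode, rest = "add", override[1:]
    else:
        mode, rest = "change", override

    key, _, value = rest.partition("=")
    *parents, leaf = key.split(".")

    node = cfg
    for part in parents:
        # Adding may create missing parents; changing or deleting may not.
        node = node.setdefault(part, {}) if mode in ("add", "upsert") else node[part]

    exists = leaf in node
    if mode == "delete":
        del node[leaf]  # KeyError if the key is absent
    elif mode == "add" and exists:
        raise KeyError(f"'{key}' already exists; use ++ to override it")
    elif mode == "change" and not exists:
        raise KeyError(f"'{key}' does not exist; use + to add it")
    else:
        node[leaf] = value


cfg = {
    "model": {
        "train_ds": {
            "manifest_filepath": "???",
            "batch_size": "32",
            "trim_silence": "false",
        }
    }
}
for item in [
    "model.train_ds.manifest_filepath=/home/hydra/train_manifest.json",
    "+model.val_ds.sample_rate=44444",
    "++model.train_ds.batch_size=545",
    "~model.train_ds.trim_silence",
]:
    apply_override(cfg, item)
print(cfg)
```

In this model a bare key=value on a missing key, or + on an existing key, is rejected, which is the reason ++ exists as the "either way" form.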

2

import os

import hydra
from hydra.utils import get_original_cwd, to_absolute_path
from omegaconf import DictConfig

@hydra.main()
def my_app(_cfg: DictConfig) -> None:
    print(f"Current working directory : {os.getcwd()}")
    print(f"Orig working directory    : {get_original_cwd()}")
    print(f"to_absolute_path('foo')   : {to_absolute_path('foo')}")
    print(f"to_absolute_path('/foo')  : {to_absolute_path('/foo')}")

3

Passing hydra.verbose=true on the command line enables DEBUG-level log output.
Similarly, hydra/job_logging=disabled turns off log output.

import logging
from omegaconf import DictConfig
import hydra

# A logger for this file
log = logging.getLogger(__name__)

@hydra.main()
def my_app(_cfg: DictConfig) -> None:
    log.info("Info level message")
    log.debug("Debug level message")

if __name__ == "__main__":
    my_app()

Run it with DEBUG logging for all loggers, or only for selected loggers:

python3 test.py hydra.verbose=true
python my_app.py hydra.verbose=[__main__,hydra]
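The effect of the two hydra.verbose forms can be approximated with the standard logging module alone. This is a rough model of the behaviour, not Hydra's code; apply_verbose is a hypothetical helper name.

```python
import logging
from typing import List, Union


def apply_verbose(verbose: Union[bool, str, List[str]]) -> None:
    """Raise the selected loggers to DEBUG (rough model of hydra.verbose)."""
    if verbose is True:
        names = [""]  # the empty name selects the root logger
    elif isinstance(verbose, str):
        names = [verbose]
    elif isinstance(verbose, list):
        names = verbose
    else:
        names = []  # False: leave every level untouched
    for name in names:
        logging.getLogger(name).setLevel(logging.DEBUG)


# Equivalent in spirit to hydra.verbose=[__main__,hydra]
apply_verbose(["__main__", "hydra"])
hydra_level = logging.getLogger("hydra").getEffectiveLevel()
print(logging.getLevelName(hydra_level))
```

Selecting loggers by name keeps third-party libraries quiet while the application's own debug messages are shown.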

4

--cfg prints the config instead of running the app; it accepts job, hydra, or all:

python my_app.py --cfg job

Related flags:
Use --package (or -p) to display a subset of the config.
The --info flag prints information about various aspects of Hydra and your application:

  • --info all
  • --info config
  • --info defaults
  • --info defaults-tree
  • --info plugins

5 Tab completion

There are many shells, such as bash, zsh, csh, ksh, sh, and tcsh.
Zsh can reuse the existing bash completion once bashcompinit is loaded.

eval "$(python my_app.py -sc install=bash)"

6 ConfigStore

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306

cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="config", node=MySQLConfig)

@hydra.main(config_path=None, config_name="config")
def my_app(cfg: MySQLConfig) -> None:
    # pork should be port!
    if cfg.pork == 80:
        print("Is this a webserver?!")

if __name__ == "__main__":
    my_app()

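7 ConfigStore config groups

from dataclasses import dataclass
from typing import Any

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

@dataclass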
class MySQLConfig:
    driver: str = "mysql"
    host: str = "localhost"
    port: int = 3306

@dataclass
class PostGreSQLConfig:
    driver: str = "postgresql"
    host: str = "localhost"
    port: int = 5432
    timeout: int = 10

@dataclass
class Config:
    # We will populate db using composition.
    db: Any

# Create config group `db` with options 'mysql' and 'postgresql'
cs = ConfigStore.instance()
cs.store(name="config", node=Config)
cs.store(group="db", name="mysql", node=MySQLConfig)
cs.store(group="db", name="postgresql", node=PostGreSQLConfig)

@hydra.main(config_path=None, config_name="config")
def my_app(cfg: Config) -> None:
    print(OmegaConf.to_yaml(cfg))

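8 Config inheritance

Assign MISSING to a field to indicate that it has no default value. This is equivalent to ??? in a YAML file.

from dataclasses import dataclass

from omegaconf import MISSING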

@dataclass
class DBConfig:
    host: str = "localhost"
    port: int = MISSING
    driver: str = MISSING

@dataclass
class MySQLConfig(DBConfig):
    driver: str = "mysql"
    port: int = 3306

@dataclass
class PostGreSQLConfig(DBConfig):
    driver: str = "postgresql"
    port: int = 5432
    timeout: int = 10

@dataclass
class Config:
    # We can now annotate db as DBConfig which
    # improves both static and dynamic type safety.
    db: DBConfig

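9 Read-only config

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore


@dataclass(frozen=True)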
class SerialPort:
    baud_rate: int = 19200
    data_bits: int = 8
    stop_bits: int = 1


cs = ConfigStore.instance()
cs.store(name="config", node=SerialPort)


@hydra.main(config_name="config")
def my_app(cfg: SerialPort) -> None:
    print(cfg)


if __name__ == "__main__":
    my_app()
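The read-only behaviour comes from frozen=True. The sketch below shows what that flag does on a plain dataclass, without Hydra involved; with Hydra the config object is an OmegaConf wrapper, so the exact exception type differs from the one caught here.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SerialPort:
    baud_rate: int = 19200
    data_bits: int = 8
    stop_bits: int = 1


port = SerialPort()

# Assigning to a field of a frozen dataclass is rejected.
try:
    port.baud_rate = 115200
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# To get a different value, build a modified copy instead.
faster = replace(port, baud_rate=115200)
print(assignment_failed, port.baud_rate, faster.baud_rate)
```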

10 Structure of the hydra config

Setting environment variables

hydra:
  job:
    env_copy:  # copy existing environment variables into the job
      - AWS_KEY
    env_set:
      RANK: ${hydra:job.num} # set the environment variable RANK
defaults:
  - override hydra/job_logging: custom

Configuring the output directory

hydra:
  run:
    dir: ./outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
    #dir: outputs/${hydra.job.name}/${now:%Y-%m-%d_%H-%M-%S}
    #dir: outputs/${now:%Y-%m-%d_%H-%M-%S}/opt:${optimizer.type}
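The now resolver formats the current time with a strftime pattern. As a sketch of what the first dir pattern above expands to, the same format codes can be applied with the standard library; run_dir is a hypothetical helper and the timestamp is fixed only so the result is reproducible.

```python
from datetime import datetime


def run_dir(now: datetime) -> str:
    """Mimic ./outputs/${now:%Y-%m-%d}/${now:%H-%M-%S} with strftime codes."""
    return f"./outputs/{now:%Y-%m-%d}/{now:%H-%M-%S}"


example = run_dir(datetime(2024, 1, 2, 3, 4, 5))
print(example)
```

One directory per day with one subdirectory per start time keeps runs from overwriting each other.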

11 Help

python my_app.py --help
python my_app.py --hydra-help

12 Plugins

Colored logs

pip install hydra_colorlog --upgrade

Then select it in the defaults list:

defaults:
  - override hydra/job_logging: colorlog
  - override hydra/hydra_logging: colorlog

Other plugins

13 Hydra terminology

1. Input Configs
2. Config files
3. Structured Config
4. Other configs

  • Primary Config
  • Output Config

5. Overrides
6. Defaults List
7. Config Group
8. Config Group Option
9. Package
10. Config Search Path (similar to PYTHONPATH)
11. Plugins

14 Instantiating objects with hydra

1. Define a class in my_app.py:

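class Optimizer:
    algo: str
    lr: float

    def __init__(self, algo: str, lr: float) -> None:
        self.algo = algo
        self.lr = lr

    def __repr__(self) -> str:
        # Makes print(opt) show the fields instead of the default object repr
        return f"Optimizer(algo={self.algo},lr={self.lr})"

2. In the YAML file, point _target_ at the class: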

optimizer:
  _target_: my_app.Optimizer
  algo: SGD
  lr: 0.01

3. Instantiate:

from hydra.utils import instantiate

opt = instantiate(cfg.optimizer)
print(opt)
# Optimizer(algo=SGD,lr=0.01)

# override parameters on the call-site
opt = instantiate(cfg.optimizer, lr=0.2)
print(opt)
# Optimizer(algo=SGD,lr=0.2)

4. For recursive instantiation, see the Hydra documentation on instantiating objects.
5. Disable recursion with _recursive_=False:

trainer = instantiate(cfg.trainer, _recursive_=False)
print(trainer)
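The idea behind _target_ in the steps above is a dotted import path plus keyword arguments. The following toy version illustrates that idea with the standard library only; it is not how hydra.utils.instantiate is implemented, toy_instantiate is a hypothetical name, and datetime.timedelta is used as the target simply because it is importable everywhere.

```python
import importlib
from typing import Any, Mapping


def toy_instantiate(node: Mapping[str, Any], **overrides: Any) -> Any:
    """Import the class named by _target_ and call it with the other keys."""
    kwargs = {k: v for k, v in node.items() if k != "_target_"}
    kwargs.update(overrides)  # call-site arguments win over config values
    module_path, _, attr = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


node = {"_target_": "datetime.timedelta", "hours": 1, "minutes": 30}

delta = toy_instantiate(node)
overridden = toy_instantiate(node, minutes=0)
print(delta.total_seconds())
print(overridden.total_seconds())
```

The real function additionally handles nested targets, positional arguments, and partial instantiation, none of which this sketch attempts.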

15 Composition

from hydra import compose, initialize
from omegaconf import OmegaConf

if __name__ == "__main__":
    # context initialization
    with initialize(config_path="conf", job_name="test_app"):
        cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
        print(OmegaConf.to_yaml(cfg))

    # global initialization
    initialize(config_path="conf", job_name="test_app")
    cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
    print(OmegaConf.to_yaml(cfg))

16 Example of an installable app

$ python examples/advanced/hydra_app_example/hydra_app/main.py
dataset:
  name: imagenet
  path: /datasets/imagenet
$ pip install examples/advanced/hydra_app_example
...
Successfully installed hydra-app-0.1
$ hydra_app
dataset:
  name: imagenet
  path: /datasets/imagenet

17 Callbacks

from typing import Any

import hydra
from hydra.experimental.callback import Callback
from omegaconf import DictConfig, OmegaConf

class MyCallback(Callback):
    def __init__(self, bucket: str, file_path: str) -> None:
        self.bucket = bucket
        self.file_path = file_path

    def on_job_end(self, config: DictConfig, **kwargs: Any) -> None:
        print("Job ended, uploading...")
        # uploading...

@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: DictConfig) -> None:
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()
