Python library with Neural Networks for Image Segmentation based on PyTorch.
The main features of this library are:
High-level API (just two lines to create a neural network)
5 model architectures for binary and multi-class segmentation (including the legendary Unet)
56 available encoders for each architecture
All encoders have pre-trained weights for faster and better convergence
Quick start
Since the library is built on the PyTorch framework, a created segmentation model is just a PyTorch nn.Module, which can be created as easily as:
import segmentation_models_pytorch as smp
model = smp.Unet()
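Since the result is an ordinary nn.Module, it can be called directly on image batches. A minimal sketch (assuming a 3-channel input whose height and width are divisible by 32):

import torch
import segmentation_models_pytorch as smp

model = smp.Unet()
images = torch.randn(2, 3, 256, 256)  # dummy batch: N x C x H x W
masks = model(images)                 # -> torch.Size([2, 1, 256, 256]), one channel per class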
Depending on the task, you can change the network architecture by choosing backbones with fewer or more parameters and use pretrained weights to initialize it:
model = smp.Unet('resnet34', encoder_weights='imagenet')
Change the number of output classes in the model:
model = smp.Unet('resnet34', classes=3, activation='softmax')
All models have pretrained encoders, so you have to prepare your data the same way as during weights pretraining:
from segmentation_models_pytorch.encoders import get_preprocessing_fn
preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')
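As an illustration, the returned callable normalizes image arrays the same way the encoder weights were pretrained. A minimal sketch (the image here is a dummy H x W x 3 array; in practice it would come from your dataset):

import numpy as np
from segmentation_models_pytorch.encoders import get_preprocessing_fn

preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')

image = np.random.randint(0, 256, size=(256, 256, 3)).astype('float32')  # dummy RGB image in [0, 255]
image = preprocess_input(image)  # scaled and normalized with the pretraining mean/std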
Examples
Training a model for car segmentation on the CamVid dataset: here.
Training an SMP model with Catalyst (a high-level framework for PyTorch), TTAch (a TTA library for PyTorch) and Albumentations (a fast image augmentation library): here.
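For readers who want the shape of such a pipeline without opening the notebooks, here is a minimal, self-contained training-loop sketch (dummy tensors stand in for a real Dataset/DataLoader with augmentations):

import torch
import segmentation_models_pytorch as smp

model = smp.Unet('resnet34', encoder_weights='imagenet', classes=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.BCEWithLogitsLoss()  # model activation defaults to None, so outputs are logits

images = torch.randn(4, 3, 256, 256)                    # dummy image batch
target = torch.randint(0, 2, (4, 1, 256, 256)).float()  # dummy binary masks

model.train()
for step in range(10):
    optimizer.zero_grad()
    loss = criterion(model(images), target)
    loss.backward()
    optimizer.step()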
Models
Architectures
Encoders
| Encoder | Weights | Params, M |
|---|---|---|
| resnet18 | imagenet | 11M |
| resnet34 | imagenet | 21M |
| resnet50 | imagenet | 23M |
| resnet101 | imagenet | 42M |
| resnet152 | imagenet | 58M |
| resnext50_32x4d | imagenet | 22M |
| resnext101_32x8d | imagenet | 86M |
| resnext101_32x16d | instagram | 191M |
| resnext101_32x32d | instagram | 466M |
| resnext101_32x48d | instagram | 826M |
| dpn68 | imagenet | 11M |
| dpn68b | imagenet+5k | 11M |
| dpn92 | imagenet+5k | 34M |
| dpn98 | imagenet | 58M |
| dpn107 | imagenet+5k | 84M |
| dpn131 | imagenet | 76M |
| vgg11 | imagenet | 9M |
| vgg11_bn | imagenet | 9M |
| vgg13 | imagenet | 9M |
| vgg13_bn | imagenet | 9M |
| vgg16 | imagenet | 14M |
| vgg16_bn | imagenet | 14M |
| vgg19 | imagenet | 20M |
| vgg19_bn | imagenet | 20M |
| senet154 | imagenet | 113M |
| se_resnet50 | imagenet | 26M |
| se_resnet101 | imagenet | 47M |
| se_resnet152 | imagenet | 64M |
| se_resnext50_32x4d | imagenet | 25M |
| se_resnext101_32x4d | imagenet | 46M |
| densenet121 | imagenet | 6M |
| densenet169 | imagenet | 12M |
| densenet201 | imagenet | 18M |
| densenet161 | imagenet | 26M |
| inceptionresnetv2 | imagenet, imagenet+background | 54M |
| inceptionv4 | imagenet, imagenet+background | 41M |
| efficientnet-b0 | imagenet | 4M |
| efficientnet-b1 | imagenet | 6M |
| efficientnet-b2 | imagenet | 7M |
| efficientnet-b3 | imagenet | 10M |
| efficientnet-b4 | imagenet | 17M |
| efficientnet-b5 | imagenet | 28M |
| efficientnet-b6 | imagenet | 40M |
| efficientnet-b7 | imagenet | 63M |
| mobilenet_v2 | imagenet | 2M |
| xception | imagenet | 22M |
| timm-efficientnet-b0 | imagenet, advprop, noisy-student | 4M |
| timm-efficientnet-b1 | imagenet, advprop, noisy-student | 6M |
| timm-efficientnet-b2 | imagenet, advprop, noisy-student | 7M |
| timm-efficientnet-b3 | imagenet, advprop, noisy-student | 10M |
| timm-efficientnet-b4 | imagenet, advprop, noisy-student | 17M |
| timm-efficientnet-b5 | imagenet, advprop, noisy-student | 28M |
| timm-efficientnet-b6 | imagenet, advprop, noisy-student | 40M |
| timm-efficientnet-b7 | imagenet, advprop, noisy-student | 63M |
| timm-efficientnet-b8 | imagenet, advprop | 84M |
| timm-efficientnet-l2 | noisy-student | 474M |
Models API
model.encoder - pretrained backbone to extract features of different spatial resolutions
model.decoder - depends on the model's architecture (Unet/Linknet/PSPNet/FPN)
model.segmentation_head - last block to produce the required number of mask channels (also includes optional upsampling and activation)
model.classification_head - optional block which creates a classification head on top of the encoder
model.forward(x) - sequentially pass x through the model's encoder, decoder and segmentation head (and the classification head if specified)
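To make the division of labor concrete, here is a minimal sketch showing that the encoder alone returns feature maps at several spatial resolutions, while the full forward pass produces the mask (shapes assume the default configuration):

import torch
import segmentation_models_pytorch as smp

model = smp.Unet('resnet34')
x = torch.randn(1, 3, 256, 256)

features = model.encoder(x)             # list of feature maps at decreasing resolutions
print([f.shape[-1] for f in features])  # e.g. [256, 128, 64, 32, 16, 8]

mask = model(x)                         # encoder -> decoder -> segmentation head
print(mask.shape)                       # torch.Size([1, 1, 256, 256])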
Input channels
The in_channels parameter allows you to create models that process tensors with an arbitrary number of channels. If you use pretrained weights from imagenet, the weights of the first convolution will be reused for 1- or 2-channel inputs; for more than 4 input channels, the weights of the first convolution will be initialized randomly.
model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))
Auxiliary classification output
All models support the aux_params parameter, which is set to None by default. If aux_params = None, the auxiliary classification output is not created; otherwise the model produces not only a mask, but also a label output with shape (N, C). The classification head consists of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be configured by aux_params as follows:
aux_params=dict(
    pooling='avg',         # one of 'avg', 'max'
    dropout=0.5,           # dropout ratio, default is None
    activation='sigmoid',  # activation function, default is None
    classes=4,             # define number of output labels
)
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
mask, label = model(x)
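Here the label output has shape (N, 4), matching classes=4 in aux_params above.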
Depth
The encoder_depth parameter specifies the number of downsampling operations in the encoder, so you can make your model lighter by specifying a smaller depth.
model = smp.Unet('resnet34', encoder_depth=4)
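One caveat worth checking against your installed version: the Unet decoder builds one block per downsampling stage, so its decoder_channels argument must match the chosen depth; with encoder_depth=4 you would pass four entries instead of the default five:

model = smp.Unet('resnet34', encoder_depth=4, decoder_channels=[256, 128, 64, 32])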
Installation
PyPI version:
$ pip install segmentation-models-pytorch
Latest version from source:
$ pip install git+https://github.com/qubvel/segmentation_models.pytorch
Competitions won with the library
The Segmentation Models package is widely used in image segmentation competitions. Here you can find competitions, names of the winners, and links to their solutions.
Contributing
Run tests
$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev pytest -p no:cacheprovider
Generate table
$ docker build -f docker/Dockerfile.dev -t smp:dev . && docker run --rm smp:dev python misc/generate_table.py
Citing
@misc{Yakubovskiy:2019,
  Author = {Pavel Yakubovskiy},
  Title = {Segmentation Models Pytorch},
  Year = {2020},
  Publisher = {GitHub},
  Journal = {GitHub repository},
  Howpublished = {\url{https://github.com/qubvel/segmentation_models.pytorch}}
}
License
The project is distributed under the MIT License.