wingbeats.modelling package
Submodules
wingbeats.modelling.builds module
Library for factory functions to define model architectures within the Functional API
-
wingbeats.modelling.builds.build_embedder(in_shape, out_shape, f_extractor, reg_param=0.001, input_name='input_signal', model_name='Embedder', training=None)
Build model for learning hierarchical class embeddings.
- Architectures
f_extractor(Layer) + Dense + L2
- Outputs
predicted embedding
- Parameters
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Embedder’.
training (bool, optional) – Whether to run model in training mode. Particularly relevant for layers such as Batch Normalization and Dropout. Defaults to None.
- Returns
Embedder
-
wingbeats.modelling.builds.build_hiera_classifier(in_shape, out_shapes, f_extractor, reg_param=0.001, taxonomic_levels=['genus', 'species'], parallel=True, input_name='input_signal', model_name='Hiera_Classifier', training=None)
Build model for classifying signals according to more than one taxonomic level.
- Architectures (layers in brackets are branched out)
(series) f_extractor(Layer) + Dense(+ Softmax) + DenseBlock(+ Softmax) … + DenseBlock + Softmax
(parallel) f_extractor(Layer) (+ Dense + Softmax) (+ Dense + Softmax) …
- Outputs
predicted class probabilities
- Parameters
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shapes (tuple) – Model output shapes (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. The model only predicts one but multiple superior levels can be inferred from the predicted one and penalized in the loss. Defaults to [‘genus’, ‘species’].
parallel (bool) – Whether to attach parallel Dense layers for every prediction. Otherwise, they are connected one after another. Defaults to True.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hiera_Classifier’.
training (bool, optional) – Whether to run model in training mode. Particularly relevant for layers such as Batch Normalization and Dropout. Defaults to None.
- Returns
Hierarchical Classifier
-
wingbeats.modelling.builds.build_hiera_embedder_classifier(in_shape, out_shapes, f_extractor, reg_param=0.001, taxonomic_levels=['genus', 'species'], parallel=True, input_name='input_signal', model_name='Hiera_Embedder_Classifier', training=None)
Build model for learning embeddings and classifying signals according to more than one taxonomic level.
- Architectures (layers in brackets are branched out)
(series) f_extractor(Layer) + Dense (+ L2) + DenseBlock(+ Softmax) + … + DenseBlock + Softmax
(parallel) f_extractor(Layer) + Dense (+ L2) (+ DenseBlock + Softmax) …
- Outputs
predicted embedding and class probabilities
- Parameters
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shapes (tuple) – Model output shapes (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. The model only predicts one but multiple superior levels can be inferred from the predicted one and penalized in the loss. Defaults to [‘genus’, ‘species’].
parallel (bool) – Whether to attach parallel Dense layers for every prediction. Otherwise, they are connected one after another. Defaults to True.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hiera_Embedder_Classifier’.
training (bool, optional) – Whether to run model in training mode. Particularly relevant for layers such as Batch Normalization and Dropout. Defaults to None.
- Returns
Hierarchical Embedder Classifier
-
wingbeats.modelling.builds.build_simple_classifier(in_shape, out_shape, f_extractor, reg_param=0.001, taxonomic_levels=['species'], input_name='input_signal', model_name='Simple_Classifier', training=None)
Build model for classifying signals according to only one taxonomic level, e.g. genus or species. The loss function can be extended to also penalize the model for wrong predictions at higher taxonomic levels (just add them to the taxonomic_levels list).
- Architectures
f_extractor(Layer) + Dense + Softmax
- Outputs
predicted class probabilities
- Parameters
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. The model only predicts one but multiple superior levels can be inferred from the predicted one and penalized in the loss. Defaults to [‘species’].
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Simple_Classifier’.
training (bool, optional) – Whether to run model in training mode. Particularly relevant for layers such as Batch Normalization and Dropout. Defaults to None.
- Returns
Simple Classifier
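The taxonomic_levels description says that superior levels can be inferred from the single predicted level. One plausible way to do this, sketched below, is to sum the species probabilities that belong to each genus. The function name and the summation rule are assumptions made for illustration; the loss used by the package may infer higher levels differently.

```python
def genus_probs_from_species(species_probs, genus_mapping):
    """Aggregate species probabilities into genus probabilities.

    genus_mapping[i] is the genus index of species i (assumed layout).
    """
    n_genera = max(genus_mapping) + 1
    genus_probs = [0.0] * n_genera
    for species_idx, prob in enumerate(species_probs):
        # Every species contributes its probability mass to its genus.
        genus_probs[genus_mapping[species_idx]] += prob
    return genus_probs


species_probs = [0.5, 0.25, 0.125, 0.125]   # softmax output over 4 species
genus_mapping = [0, 0, 1, 1]                # species 0-1 -> genus 0, species 2-3 -> genus 1
genus_probs = genus_probs_from_species(species_probs, genus_mapping)
```

Because each species belongs to exactly one genus, the aggregated vector still sums to one and can be fed to the same loss as the species prediction.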
-
wingbeats.modelling.builds.build_simple_embedder_classifier(in_shape, out_shapes, f_extractor, reg_param=0.001, taxonomic_levels=['species'], input_name='input_signal', model_name='Simple_Embedder_Classifier', training=None)
Build model for learning the embedding of one taxonomic level and classifying signals according to only one taxonomic level (which does not have to coincide with the embedded level). The loss function can be extended to also penalize the model for wrong predictions at higher taxonomic levels (just add them to the taxonomic_levels list).
- Architectures (layers in brackets are branched out)
f_extractor(Layer) + Dense(+ L2) + DenseBlock + Softmax
- Outputs
predicted embedding and class probabilities
- Parameters
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shapes (tuple) – Model output shapes (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. The model only predicts one but multiple superior levels can be inferred from the predicted one and penalized in the loss. Defaults to [‘species’].
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Simple_Embedder_Classifier’.
training (bool, optional) – Whether to run model in training mode. Particularly relevant for layers such as Batch Normalization and Dropout. Defaults to None.
- Returns
Simple Embedder Classifier
-
wingbeats.modelling.builds.embed(x, out_shape, reg_param=0.001, apply_l2=False)
Compute the embedding of a vector by passing it through a Dense layer and, if apply_l2 is set, l2-normalizing the result (meant to be used as a module in models with embedding layers).
- Parameters
x (Tensor) – Input features.
out_shape (int) – Dense layer output shape.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
apply_l2 (bool) – Whether to l2-normalize the features output by the Dense layer. Defaults to False.
- Returns
Predicted embedding.
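The two steps described for embed (a Dense projection followed by optional l2-normalization) can be sketched without TensorFlow as follows. The explicit weights and bias arguments and the name embed_sketch are hypothetical; the real function creates a regularized Keras Dense layer instead.

```python
import math


def embed_sketch(x, weights, bias, apply_l2=False):
    """Dense projection of x, optionally scaled to unit l2 norm (illustrative)."""
    # Dense layer: one output per weight row, plus bias.
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    if apply_l2:
        # Guard against division by zero for an all-zero output.
        norm = math.sqrt(sum(v * v for v in out)) or 1.0
        out = [v / norm for v in out]
    return out


features = [1.0, 2.0]
weights = [[3.0, 0.0], [0.0, 2.0]]
bias = [0.0, 0.0]
raw = embed_sketch(features, weights, bias)                   # [3.0, 4.0]
unit = embed_sketch(features, weights, bias, apply_l2=True)   # unit-length vector
```

Normalizing to unit length makes the dot product between two embeddings equal to their cosine similarity, which is convenient for similarity-based losses.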
-
wingbeats.modelling.builds.predict_prob(x, out_shape, reg_param=0.001, taxonomic_level='species', add_softmax=True, as_block=False)
Compute (normalized) probabilities of signal belonging to different classes of specified taxonomic level.
- Parameters
x (Tensor) – Input features.
out_shape (int) – Dense layer output shape.
reg_param (float) – Regularization parameter for the weights in the Dense layer. Defaults to 0.001.
taxonomic_level (str) – Predicted taxonomic level. Defaults to ‘species’.
add_softmax (bool) – Whether to normalize the probabilities with a Softmax layer. Defaults to True.
as_block (bool) – Whether to pass x through a simple Dense layer or a Dense block. See wingbeats.modelling.layers. Defaults to False.
- Returns
Predicted probability vector.
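For readers unfamiliar with the normalization that add_softmax enables, the following is a small, numerically stable softmax in plain Python. It only illustrates the operation itself; predict_prob presumably uses the Keras Softmax layer, and the function name here is made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    # Subtracting the maximum avoids overflow in exp() for large scores.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
large = softmax([1000.0, 1000.0])   # would overflow without the shift
```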
wingbeats.modelling.callbacks module
Library for model callbacks to be used during training
-
wingbeats.modelling.callbacks.lr_exp(epoch, max_lr, max_ep=30)
Define an exponentially decaying learning rate schedule.
- Parameters
epoch (int) – Current epoch.
max_lr (float) – Maximum learning rate.
max_ep (int) – How many epochs the pattern should repeat before the learning rate stays constant. Defaults to 30.
- Returns
Learning rate
- Return type
float
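A minimal sketch of a schedule with this shape, assuming the rate is multiplied by a fixed factor every epoch and frozen once max_ep is reached. The decay argument and the name lr_exp_sketch are hypothetical; the packaged lr_exp may use a different decay formula.

```python
def lr_exp_sketch(epoch, max_lr, max_ep=30, decay=0.9):
    """Exponentially decaying learning rate (illustrative only).

    `decay` is an assumed per-epoch factor, not a documented parameter.
    """
    # Freeze the schedule at max_ep so the rate stays constant afterwards.
    effective_epoch = min(epoch, max_ep)
    return max_lr * decay ** effective_epoch


schedule = [lr_exp_sketch(e, max_lr=0.01) for e in range(40)]
```

A function like this is typically handed to tf.keras.callbacks.LearningRateScheduler after fixing max_lr, for example with functools.partial.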
-
wingbeats.modelling.callbacks.lr_triangle(epoch, max_lr, max_ep=30, step_size=15)
Define a decreasing triangular learning rate schedule. Each isosceles triangle represents a new cycle with length 2 * step_size.
- Parameters
epoch (int) – Current epoch.
max_lr (float) – Maximum learning rate.
max_ep (int) – How many epochs the pattern should repeat before the learning rate stays constant. Defaults to 30.
step_size (int) – Number of epochs until the direction of the learning rate change reverses (half of a triangle). Defaults to 15.
- Returns
Learning rate
- Return type
float
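The triangular pattern can be sketched as below. Two points are assumptions rather than documented behavior: "decreasing" is interpreted as halving the triangle height with each new cycle, and the floor value base_lr is a hypothetical extra argument. The name lr_triangle_sketch is likewise made up.

```python
def lr_triangle_sketch(epoch, max_lr, max_ep=30, step_size=15, base_lr=0.0):
    """Triangular schedule whose peak halves every cycle (illustrative only)."""
    # After max_ep the pattern stops repeating and the rate stays constant.
    epoch = min(epoch, max_ep)
    cycle = epoch // (2 * step_size)            # 0-based cycle index
    pos = epoch - cycle * 2 * step_size         # position inside the cycle
    # Relative height: 0 at both ends of the cycle, 1 at pos == step_size.
    height = 1.0 - abs(pos / step_size - 1.0)
    amplitude = (max_lr - base_lr) / (2 ** cycle)
    return base_lr + amplitude * height


lrs = [lr_triangle_sketch(e, max_lr=0.01, max_ep=60, step_size=15, base_lr=0.001)
       for e in range(70)]
```

With these settings the rate climbs for 15 epochs, falls for 15, then repeats once with half the amplitude before staying at base_lr.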
wingbeats.modelling.hypertuning module
Library for hyperparameter optimization functions
-
wingbeats.modelling.hypertuning.build_hyper_embedder(hp, in_shape, out_shape, f_extractor, lr_values, reg_values, emb_matrix, input_name='input_signal', model_name='Hyper_Embedder', strategy=None)
Build Embedder for Hyperband-optimization.
- Parameters
hp (kerastuner.hyperband) – Hyperband object.
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
lr_values (list) – Discrete learning rate values.
reg_values (list) – Discrete regularization parameter values.
emb_matrix (array) – Matrix of hierarchical embeddings.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hyper_Embedder’.
strategy (Strategy from tf.distribute, optional) – Distribution strategy (CPU, GPU, TPU). Defaults to None (CPU).
- Returns
Hyperband-optimizable Embedder
-
wingbeats.modelling.hypertuning.build_hyper_hiera_classifier(hp, in_shape, out_shape, f_extractor, lr_values, reg_values, taxonomic_levels=['genus', 'species'], parallel=True, input_name='input_signal', model_name='Hyper_Hiera_Classifier', strategy=None)
Build Hierarchical Classifier for Hyperband-optimization.
- Parameters
hp (kerastuner.hyperband) – Hyperband object.
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
lr_values (list) – Discrete learning rate values.
reg_values (list) – Discrete regularization parameter values.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. Defaults to [‘genus’, ‘species’].
parallel (bool) – Whether to attach parallel Dense layers for every prediction. Otherwise, they are connected one after another. Defaults to True.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hyper_Hiera_Classifier’.
strategy (Strategy from tf.distribute, optional) – Distribution strategy (CPU, GPU, TPU). Defaults to None (CPU).
- Returns
Hyperband-optimizable Hierarchical Classifier
-
wingbeats.modelling.hypertuning.build_hyper_hiera_embedder_classifier(hp, in_shape, out_shape, f_extractor, lr_values, reg_values, emb_matrix, taxonomic_levels=['genus', 'species'], parallel=True, input_name='input_signal', model_name='Hyper_Hiera_Embedder_Classifier', strategy=None)
Build Hierarchical Embedder-Classifier for Hyperband-optimization.
- Parameters
hp (kerastuner.hyperband) – Hyperband object.
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
lr_values (list) – Discrete learning rate values.
reg_values (list) – Discrete regularization parameter values.
emb_matrix (array) – Matrix of hierarchical embeddings.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. Defaults to [‘genus’, ‘species’].
parallel (bool) – Whether to attach parallel Dense layers for every prediction. Otherwise, they are connected one after another. Defaults to True.
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hyper_Hiera_Embedder_Classifier’.
strategy (Strategy from tf.distribute, optional) – Distribution strategy (CPU, GPU, TPU). Defaults to None (CPU).
- Returns
Hyperband-optimizable Hierarchical Embedder-Classifier
-
wingbeats.modelling.hypertuning.build_hyper_simple_classifier(hp, in_shape, out_shape, f_extractor, lr_values, reg_values, taxonomic_levels=['species'], input_name='input_signal', model_name='Hyper_Simple_Classifier', strategy=None)
Build Simple Classifier for Hyperband-optimization.
- Parameters
hp (kerastuner.hyperband) – Hyperband object.
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
lr_values (list) – Discrete learning rate values.
reg_values (list) – Discrete regularization parameter values.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. Defaults to [‘species’].
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hyper_Simple_Classifier’.
strategy (Strategy from tf.distribute, optional) – Distribution strategy (CPU, GPU, TPU). Defaults to None (CPU).
- Returns
Hyperband-optimizable Simple Classifier
-
wingbeats.modelling.hypertuning.build_hyper_simple_embedder_classifier(hp, in_shape, out_shape, f_extractor, lr_values, reg_values, emb_matrix, taxonomic_levels=['species'], input_name='input_signal', model_name='Hyper_Simple_Embedder_Classifier', strategy=None)
Build Simple Embedder Classifier for Hyperband-optimization.
- Parameters
hp (kerastuner.hyperband) – Hyperband object.
in_shape (tuple) – Input shape. No need to specify batch dimension.
out_shape (tuple) – Model output shape (here equal to the size of embedded taxonomic level).
f_extractor (tf.Layer) – Feature extractor.
lr_values (list) – Discrete learning rate values.
reg_values (list) – Discrete regularization parameter values.
emb_matrix (array) – Matrix of hierarchical embeddings.
taxonomic_levels (list) – Taxonomic levels to include in the loss function. Defaults to [‘species’].
input_name (str) – Name of the input. Defaults to ‘input_signal’.
model_name (str) – Name of the architecture. Defaults to ‘Hyper_Simple_Embedder_Classifier’.
strategy (Strategy from tf.distribute, optional) – Distribution strategy (CPU, GPU, TPU). Defaults to None (CPU).
- Returns
Hyperband-optimizable Simple Embedder Classifier
-
wingbeats.modelling.hypertuning.kfold_cv(X, y, models, genus_mapping, emb_matrix=None, n_splits=4, epochs=30, batch_size=64, sm=None, model_callbacks=None, sampling_rate=16000, window=None, nperseg=None, noverlap=None, cutoff=None)
Execute K-fold cross-validation on the dataset (X, y), which is split into n_splits stratified folds. The user should provide n_splits models, one for each fold. Each model name should follow the pattern architecture_inputFormat (e.g. HieraCls_spectro). If sm is not None, the n_splits-1 folds held for training are also augmented using SMOTE.
- Parameters
X (list) – Matrix of signals.
y (list) – Label vector.
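models – The n_splits models to be trained. Each model name should follow the pattern architecture_inputFormat.
genus_mapping (list) – List containing the genus index of every species.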
emb_matrix (array) – Matrix of hierarchical embeddings. Defaults to None. Only needed for Embedder models.
n_splits (int) – Number of folds. Defaults to 4.
epochs (int) – Number of epochs to train models each fold. Defaults to 30.
batch_size (int) – Size of one signal batch in tf.Dataset. Defaults to 64.
sm (imblearn.smote, optional) – SMOTE object to augment data. Defaults to None.
model_callbacks (list) – Callbacks for model training (e.g. Early Stopping, Model Checkpoint, Learning Rate Schedules). Defaults to None.
sampling_rate (int) – Sampling frequency. Defaults to 16000.
window (str (for psd) or function pointer (for spectrograms)) – Window function to multiply each segment with, e.g. ‘hann’ (for psd) or tf.signal.hann_window (for spectrograms). Defaults to None.
nperseg (int) – Length of a segment for applying the Welch-Transform or STFT. Defaults to None.
noverlap (int) – Length of the overlapping region between segments. Defaults to None.
cutoff (int, optional) – How many PSD frequencies should be kept. Defaults to None.
- Returns
Mean and std. dev. confusion matrices over all folds for genus and species and list of training histories for every fold.
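The stratified splitting that kfold_cv describes can be illustrated with a small pure-Python sketch. The round-robin assignment and the function name stratified_folds are assumptions chosen for brevity; the actual function presumably relies on a library splitter and additionally handles preprocessing, SMOTE and model training.

```python
from collections import defaultdict


def stratified_folds(y, n_splits=4):
    """Assign sample indices to folds so each fold mirrors the class balance."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


y = [0] * 8 + [1] * 4
folds = stratified_folds(y, n_splits=4)

splits = []
for k, test_idx in enumerate(folds):
    # The remaining n_splits - 1 folds form the training set of fold k;
    # this is where oversampling and model fitting would take place.
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    splits.append((train_idx, test_idx))
```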
wingbeats.modelling.layers module
Library for custom layers
-
class wingbeats.modelling.layers.CNN1D(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a CNN made up of 5 ConvBlock1d’s of increasing filter sizes.
- Parameters
drop_rate (float) – Dropout rate (between 0 and 1).
mcdrop (bool, optional) – Whether to apply Monte-Carlo Dropout. Defaults to False.
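The mcdrop flag refers to Monte-Carlo Dropout, i.e. keeping dropout active at prediction time and aggregating several stochastic forward passes. The sketch below shows that idea on a single linear unit in plain Python. All names are hypothetical and the code is not how the layers in this module implement it.

```python
import random
import statistics


def dropout(values, drop_rate, rng):
    """Inverted dropout: zero each value with probability drop_rate."""
    keep = 1.0 - drop_rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def mc_dropout_predict(features, weights, drop_rate, n_passes, rng):
    """Average several stochastic passes; the spread hints at uncertainty."""
    outputs = []
    for _ in range(n_passes):
        dropped = dropout(features, drop_rate, rng)
        outputs.append(sum(w * f for w, f in zip(weights, dropped)))
    return statistics.mean(outputs), statistics.pstdev(outputs)


rng = random.Random(0)
features, weights = [1.0, 2.0, 3.0], [0.5, -0.25, 1.0]
mean, spread = mc_dropout_predict(features, weights, 0.5, 200, rng)
exact, no_spread = mc_dropout_predict(features, weights, 0.0, 5, rng)
```

With a drop rate of zero every pass is identical, so the spread collapses to zero and the mean equals the deterministic output.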
-
class wingbeats.modelling.layers.CNN2D(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a CNN made up of 5 ConvBlock2d’s of increasing filter sizes.
- Parameters
drop_rate (float) – Dropout rate (between 0 and 1).
mcdrop (bool, optional) – Whether to apply Monte-Carlo Dropout. Defaults to False.
-
class wingbeats.modelling.layers.CNN_Efficient(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a CNN made up of the feature extractor of an EfficientNet.
- Parameters
in_shape (tuple) – Input shape.
drop_rate (float) – Dropout rate (between 0 and 1).
mcdrop (bool, optional) – Whether to apply Monte-Carlo Dropout. Defaults to False.
-
class wingbeats.modelling.layers.CNN_Mobile(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a CNN made up of the feature extractor of a MobileNet.
- Parameters
in_shape (tuple) – Input shape.
alpha (float) – Network width parameter. If alpha < 1.0, the number of filters proportionally decreases in each layer. For alpha > 1.0, it increases.
drop_rate (float) – Dropout rate (between 0 and 1).
mcdrop (bool, optional) – Whether to apply Monte-Carlo Dropout. Defaults to False.
-
class wingbeats.modelling.layers.ConvBlock1d(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a simple one-dimensional convolutional block. The block consists of Conv1D(num_filters, kernel_size = 3, strides = 1) + BatchNorm + Relu + MaxPool1D(pool_size = ", strides = 2).
- Parameters
num_filters (int) – Number of convolutional filters.
-
class wingbeats.modelling.layers.ConvBlock2d(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a simple two-dimensional convolutional block. The block consists of Conv2D(num_filters, kernel_size = 3, strides = 1) + BatchNorm + Relu + MaxPool2D(pool_size = ", strides = 2).
- Parameters
num_filters (int) – Number of convolutional filters.
-
class wingbeats.modelling.layers.DenseBlock(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Class for a simple Dense block. The block consists of BatchNorm + Relu + Dense(units).
- Parameters
units (int) – Number of nodes in the Dense layer.
reg_param (float) – L2-Regularization factor.
-
class wingbeats.modelling.layers.Identity(*args, **kwargs)
Bases: tensorflow.python.keras.engine.base_layer.Layer
Identity layer (just returns input unmodified)
wingbeats.modelling.metrics module
Library for custom metrics and losses
-
wingbeats.modelling.metrics.embedding_loss(emb_matrix)
Compute embedding loss of current batch as 1.0 - embedding_similarity.
- Parameters
emb_matrix (array) – Matrix of hierarchical embeddings.
- Returns
Function that computes the embedding loss w.r.t. true and predicted embeddings.
-
wingbeats.modelling.metrics.embedding_similarity(emb_matrix)
Compute embedding similarity of current batch.
- Parameters
emb_matrix (array) – Matrix of hierarchical embeddings.
- Returns
Function that computes similarity between true and predicted embeddings.
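Both embedding_loss and embedding_similarity return a function of the true and predicted values. The closure pattern and the relation loss = 1.0 - similarity can be sketched as follows, assuming that true labels are integer class indices into emb_matrix and that similarity means the batch-averaged cosine similarity. Those two assumptions and all function names are illustrative, not taken from the package.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def embedding_similarity_sketch(emb_matrix):
    """Return a function computing mean cosine similarity over a batch."""
    rows = [l2_normalize(row) for row in emb_matrix]

    def similarity(y_true, y_pred):
        sims = []
        for label, pred in zip(y_true, y_pred):
            pred = l2_normalize(pred)
            # Cosine similarity between the true and the predicted embedding.
            sims.append(sum(a * b for a, b in zip(rows[label], pred)))
        return sum(sims) / len(sims)

    return similarity


def embedding_loss_sketch(emb_matrix):
    """Return a function computing 1.0 - similarity."""
    similarity = embedding_similarity_sketch(emb_matrix)
    return lambda y_true, y_pred: 1.0 - similarity(y_true, y_pred)


emb_matrix = [[1.0, 0.0], [0.0, 1.0]]
loss = embedding_loss_sketch(emb_matrix)
perfect = loss([0, 1], [[2.0, 0.0], [0.0, 3.0]])   # predictions point the right way
wrong = loss([0, 1], [[0.0, 1.0], [1.0, 0.0]])     # predictions are orthogonal
```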
-
wingbeats.modelling.metrics.focal_loss(gamma=2.0)
Compute focal loss as a modified cross-entropy loss. The goal is to penalize hard examples more harshly.
- Parameters
gamma (float) – Penalty exponent. If gamma is 0.0, the focal loss becomes the standard cross entropy loss. Defaults to 2.0.
- Returns
Focal loss function w.r.t. true and predicted probabilities.
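The role of gamma is easiest to see in a small worked example. The sketch below uses the common focal loss form, in which each cross-entropy term is scaled by (1 - p) ** gamma. The clipping constant eps and the name focal_loss_sketch are assumptions; the packaged version operates on tensors and may differ in details such as reduction.

```python
import math


def focal_loss_sketch(gamma=2.0, eps=1e-7):
    """Return a focal loss for one-hot y_true and probability vector y_pred."""
    def loss(y_true, y_pred):
        total = 0.0
        for t, p in zip(y_true, y_pred):
            p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
            # (1 - p) ** gamma shrinks the loss of confident, correct predictions.
            total += -t * (1.0 - p) ** gamma * math.log(p)
        return total
    return loss


y_true = [0.0, 1.0, 0.0]
easy = [0.05, 0.90, 0.05]   # confident and correct
hard = [0.45, 0.10, 0.45]   # mostly wrong

cross_entropy = focal_loss_sketch(gamma=0.0)
focal = focal_loss_sketch(gamma=2.0)
easy_ratio = focal(y_true, easy) / cross_entropy(y_true, easy)
hard_ratio = focal(y_true, hard) / cross_entropy(y_true, hard)
```

With gamma set to 0.0 the scaling factor is 1 and the result is the plain cross entropy. With gamma = 2.0 the easy example keeps far less of its cross-entropy loss than the hard one, so hard examples dominate the gradient.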
-
wingbeats.modelling.metrics.get_species_from_embeddings(pred_embs, genus_mapping, emb_matrix)
Infer predicted species from predicted embeddings.
- Parameters
pred_embs (array) – Predicted embeddings to be compared via nearest neighbor to the true embeddings.
genus_mapping (list) – List that maps the index of the species to the index of the genus.
emb_matrix (array) – Matrix of hierarchical embeddings. Only needed for Emb.
- Returns
Predicted species.
- Return type
list
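The nearest-neighbor lookup mentioned for pred_embs can be sketched as follows, assuming that the rows of emb_matrix are unit-length class embeddings and that "nearest" means the largest dot product. The function name is hypothetical, and how the real function uses genus_mapping internally is not documented here; the sketch only shows how a genus could be read off afterwards.

```python
def nearest_species_sketch(pred_embs, emb_matrix):
    """Return, for each predicted embedding, the index of the closest class row."""
    species = []
    for pred in pred_embs:
        scores = [sum(a * b for a, b in zip(row, pred)) for row in emb_matrix]
        # The class with the highest similarity score is the nearest neighbor.
        species.append(max(range(len(scores)), key=scores.__getitem__))
    return species


emb_matrix = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
genus_mapping = [0, 0, 1]   # species 0 and 1 share genus 0
pred_embs = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9]]

species = nearest_species_sketch(pred_embs, emb_matrix)
genus = [genus_mapping[s] for s in species]
```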
-
wingbeats.modelling.metrics.predict_gen_spec(model, X, model_name, genus_mapping, emb_matrix)
Make genus and species predictions on dataset X according to model architecture.
- Parameters
model (tf.Model) – Pretrained classifier.
X (tf.Dataset) – Matrix of signals.
model_name (str) – Name of the architecture. Currently only allowed: SimpleCls, Emb, SimpleEmbCls, HieraCls, HieraEmbCls.
genus_mapping (list) – List that maps the index of the species to the index of the genus.
emb_matrix (array) – Matrix of hierarchical embeddings. Only needed for Emb.
- Returns
Predicted genus and species
- Return type
tuple