NAM

class stream_topic.NAM.DownstreamModel(trained_topic_model, target_column, dataset=None, task='regression', batch_size=128, lr=0.0005, hidden_units=None, feature_dropout=0.0, hidden_dropout=0.3, activation='relu', out_activation=None)

PyTorch Lightning module for downstream modeling using a trained topic model.

Parameters:
  • trained_topic_model (AbstractModel) – Trained topic model.

  • target_column (str) – Name of the target column.

  • dataset (AbstractDataset, optional) – Dataset object (default is None).

  • structured_data (pd.DataFrame, optional) – Structured data (default is None).

  • task (str, optional) – Type of task, either ‘regression’ or ‘classification’ (default is ‘regression’).

  • batch_size (int, optional) – Batch size for training (default is 128).

  • lr (float, optional) – Learning rate for optimization (default is 0.0005).

  • hidden_units (List[int], optional) – List of hidden layer sizes for the Neural Additive Model (default is None).

  • feature_dropout (float, optional) – Dropout probability for input features (default is 0.0).

  • hidden_dropout (float, optional) – Dropout probability for hidden layers (default is 0.3).

  • activation (str, optional) – Activation function for hidden layers (default is ‘relu’).

  • out_activation (nn.Module, optional) – Activation function for output layer (default is None).

trained_topic_model

Trained topic model.

Type:

AbstractModel

task

Type of task, either ‘regression’ or ‘classification’.

Type:

str

batch_size

Batch size for training.

Type:

int

lr

Learning rate for optimization.

Type:

float

loss_fn

Loss function for the task.

Type:

nn.Module
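The page does not say which loss goes with which task. As a minimal sketch, assuming the common convention of mean squared error for 'regression' and cross-entropy for 'classification', the selection could look like the plain-Python code below. The helper names `mse_loss`, `cross_entropy_loss`, and `select_loss` are hypothetical and are not part of stream_topic.

```python
import math
from typing import Callable, List


def mse_loss(preds: List[float], targets: List[float]) -> float:
    """Mean squared error over a batch of scalar predictions."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def cross_entropy_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over a batch of unnormalized class scores."""
    total = 0.0
    for row, label in zip(logits, labels):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def select_loss(task: str) -> Callable:
    """Assumed mapping from task name to loss (hypothetical helper)."""
    if task == "regression":
        return mse_loss
    if task == "classification":
        return cross_entropy_loss
    raise ValueError(f"unsupported task: {task!r}")


regression_loss = select_loss("regression")([1.0, 2.0], [1.0, 4.0])
classification_loss = select_loss("classification")([[0.0, 0.0]], [0])
```

With two equal logits, the classification loss equals the natural log of 2, which is a quick sanity check for the cross-entropy helper.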

structured_data

Structured data used for downstream modeling.

Type:

pd.DataFrame

target_column

Name of the target column.

Type:

str

combined_data

Combined DataFrame containing structured data and topic probabilities.

Type:

pd.DataFrame

model

Neural Additive Model for downstream modeling.

Type:

NeuralAdditiveModel

configure_optimizers()

Configure optimizer for training.

Returns:

Optimizer.

Return type:

torch.optim.Optimizer

define_nam_model(hidden_units, feature_dropout, hidden_dropout, activation, out_activation)

Define the Neural Additive Model architecture.

Parameters:
  • hidden_units (List[int]) – List of hidden layer sizes for the Neural Additive Model.

  • feature_dropout (float) – Dropout probability for input features.

  • hidden_dropout (float) – Dropout probability for hidden layers.

  • activation (str) – Activation function for hidden layers.

  • out_activation (nn.Module) – Activation function for output layer.

Returns:

Initialized Neural Additive Model.

Return type:

NeuralAdditiveModel
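As a rough illustration of the additive structure behind a Neural Additive Model, the sketch below gives each feature its own tiny network and sums the outputs. It is plain Python and not the library's `NeuralAdditiveModel`: the names `FeatureNet` and `additive_predict`, the hand-picked weights, and the single hidden layer are assumptions made for illustration only.

```python
from typing import List


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


class FeatureNet:
    """Hypothetical one-hidden-layer network applied to a single scalar feature."""

    def __init__(self, w_in: List[float], b_in: List[float], w_out: List[float]):
        self.w_in = w_in
        self.b_in = b_in
        self.w_out = w_out

    def __call__(self, x: float) -> float:
        # Hidden layer: one ReLU unit per (weight, bias) pair.
        hidden = [relu(w * x + b) for w, b in zip(self.w_in, self.b_in)]
        # Output: weighted sum of the hidden units.
        return sum(w * h for w, h in zip(self.w_out, hidden))


def additive_predict(nets: List[FeatureNet], features: List[float], bias: float = 0.0) -> float:
    """Sum the per-feature contributions; features never interact."""
    return bias + sum(net(x) for net, x in zip(nets, features))


nets = [
    FeatureNet(w_in=[1.0, -1.0], b_in=[0.0, 0.0], w_out=[1.0, 1.0]),  # behaves like |x|
    FeatureNet(w_in=[2.0], b_in=[0.0], w_out=[0.5]),                  # behaves like relu(x)
]
prediction = additive_predict(nets, [-3.0, 4.0], bias=1.0)
```

Because the prediction is a plain sum, each feature's contribution can be read off on its own, which is what makes the per-feature plots meaningful.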

forward(x)

Forward pass of the model.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output tensor.

Return type:

torch.Tensor

get_feature_names()

Get names of input features.

Returns:

List of feature names.

Return type:

List[str]

plot_feature_nns()

Plot the learned functions for each feature-specific neural network.
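Plotting a learned per-feature function usually comes down to evaluating it on a grid of input values and drawing the resulting curve. The sketch below shows only that evaluation step in plain Python. The helper `shape_curve` and the stand-in function are hypothetical, and the actual method may choose its grid and do its plotting differently.

```python
from typing import Callable, List, Tuple


def shape_curve(
    fn: Callable[[float], float], lo: float, hi: float, num_points: int
) -> Tuple[List[float], List[float]]:
    """Evaluate a single-feature function on an evenly spaced grid."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    step = (hi - lo) / (num_points - 1)
    xs = [lo + i * step for i in range(num_points)]
    ys = [fn(x) for x in xs]
    return xs, ys


# Stand-in for one trained feature network (an assumption for illustration).
def feature_fn(x: float) -> float:
    return max(0.0, x)


xs, ys = shape_curve(feature_fn, -1.0, 1.0, 5)
```

The `xs` and `ys` lists can then be passed to any plotting library, one curve per feature.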

prepare_combined_data()

Prepare combined DataFrame containing structured data and topic probabilities.

Returns:

Combined DataFrame.

Return type:

pd.DataFrame
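To make the idea of a combined table concrete, the sketch below joins structured rows with per-document topic probabilities, row by row. It uses plain dictionaries rather than a DataFrame so that it runs without dependencies. The helper `combine_rows` and the `topic_0`, `topic_1` column names are assumptions, not the names the library necessarily uses.

```python
from typing import Dict, List


def combine_rows(
    structured: List[Dict[str, float]], topic_probs: List[List[float]]
) -> List[Dict[str, float]]:
    """Append one column per topic to each structured row (hypothetical helper)."""
    if len(structured) != len(topic_probs):
        raise ValueError("structured rows and topic rows must align one-to-one")
    combined = []
    for row, probs in zip(structured, topic_probs):
        merged = dict(row)  # copy so the input rows are left untouched
        for k, p in enumerate(probs):
            merged[f"topic_{k}"] = p
        combined.append(merged)
    return combined


structured = [{"price": 10.0, "rating": 4.5}, {"price": 7.5, "rating": 3.0}]
topic_probs = [[0.7, 0.3], [0.2, 0.8]]
combined = combine_rows(structured, topic_probs)
```

Each resulting row then carries both the original structured features and the topic probabilities, so both kinds of column can serve as model inputs.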

preprocess_structured_data(data)

Preprocess structured data.

Parameters:

data (pd.DataFrame) – Structured data.

Returns:

Preprocessed structured data.

Return type:

pd.DataFrame

setup(stage=None)

Set up datasets for training and validation.

train_dataloader()

DataLoader for training dataset.

Returns:

Training DataLoader.

Return type:

DataLoader
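For intuition about what `batch_size` controls, the sketch below splits a dataset into consecutive batches in plain Python. It is not how a PyTorch `DataLoader` is implemented (there is no shuffling, collation, or worker handling), and `iter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int = 128) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# 300 rows with the default batch size of 128 give two full batches and a remainder.
batch_sizes = [len(batch) for batch in iter_batches(list(range(300)), batch_size=128)]
```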

val_dataloader()

DataLoader for validation dataset.

Returns:

Validation DataLoader.

Return type:

DataLoader