Hi everyone,

I implemented a sort of Deep Markov Model (as in your tutorial) and used `MaskedDistribution` to cope with sequences of different lengths in the batch. Here is the code snippet:

```python
for t in range(1, T_max + 1):
    k = pyro.sample(
        "obs_x_%d" % t,
        dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t]),
        obs=one_hot(target[:, t - 1], self.embedding.num_embeddings),
    )
```

Suppose that `T_max` is the maximum sequence length in the batch (e.g. 41), `probs` is a three-dimensional tensor of shape [batch_size, max_seq_length, cat_probs] = [16, 41, 40], `mini_batch_mask` is a two-dimensional boolean tensor of shape [batch_size, max_seq_length] = [16, 41], and finally `target` is a two-dimensional tensor of shape [16, 40].

From the code above, I would have expected the shape of `dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t])` to be [16, 40], but it comes out as [16, 16, 40] instead.
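For what it's worth, I suspect the extra dimension comes from broadcasting: the base distribution has batch_shape [16], while the mask slice `mini_batch_mask[:, t - 1 : t]` has shape [16, 1], and (if I understand `MaskedDistribution` correctly, this is an assumption on my part) the resulting batch_shape is the broadcast of the two. A minimal sketch with plain torch, using the shapes from my setup:

```python
import torch

batch_size = 16  # from my mini-batch

# OneHotCategorical(probs[:, t - 1, :]) has batch_shape (16,);
# mini_batch_mask[:, t - 1 : t] has shape (16, 1).
base_batch_shape = (batch_size,)
mask_2d_shape = (batch_size, 1)

# Broadcasting (16,) against (16, 1) yields (16, 16), which would
# explain the [16, 16, 40] shape I am seeing (batch_shape + event_shape).
print(torch.broadcast_shapes(base_batch_shape, mask_2d_shape))  # torch.Size([16, 16])

# A 1-D mask slice like mini_batch_mask[:, t - 1] would instead
# broadcast to (16,), leaving the batch shape intact.
print(torch.broadcast_shapes(base_batch_shape, (batch_size,)))  # torch.Size([16])
```

So it looks like slicing with `[:, t - 1]` rather than `[:, t - 1 : t]` might be what I want, but please correct me if that reasoning is wrong.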

Am I wrong? My goal is to prevent, at each time step, the padding symbol (0) from "polluting" (to use the same word as in your tutorial) the model computation.

Thank you in advance