TransformerEncoderLayer is made up of a self-attention block and a feedforward network.
**Parameters**

| Parameter | Description |
| --- | --- |
| `d_model` | the number of expected features in the input (required) |
| `nhead` | the number of heads in the multi-head attention model (required) |
| `dim_feedforward` | the dimension of the feedforward network model (default=2048) |
| `dropout` | the dropout value (default=0.1) |
| `activation` | the activation function of the intermediate layer, `"relu"` or `"gelu"` (default=`"relu"`) |

**Example**

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
src = torch.rand(10, 32, 512)  # (seq_len, batch, d_model)
out = encoder_layer(src)
```

Reference: https://pytorch.org/docs/master/generated/torch.nn.TransformerEncoderLayer.html#torch.nn.TransformerEncoderLayer
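In practice a single encoder layer is rarely used alone; it is typically cloned and stacked with `nn.TransformerEncoder`. The sketch below shows that common pattern, including an optional `src_key_padding_mask`; the all-False mask tensor is a hypothetical placeholder meaning no positions are padded.

```python
import torch
import torch.nn as nn

# Build one layer, then stack 6 copies of it with nn.TransformerEncoder.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Default input layout is (seq_len, batch, d_model).
src = torch.rand(10, 32, 512)

# Optional padding mask of shape (batch, seq_len); True marks padded
# positions to ignore. All-False here means nothing is padded (placeholder).
src_key_padding_mask = torch.zeros(32, 10, dtype=torch.bool)

out = encoder(src, src_key_padding_mask=src_key_padding_mask)
print(out.shape)  # torch.Size([10, 32, 512])
```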