I'm trying to implement a CNN from scratch, and I'd like to understand how many 2D matrices are produced during a single stage of the convolution.
If the image components (RGB) are split into three 2D matrices (W x H x Component), and a convolution operation is applied to each of them (with a 3 x 3 2D kernel, for example), does that mean that for each kernel I would get three 2D output matrices?
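To make the scheme I'm describing concrete, here is a minimal pure-Python sketch. It assumes stride 1, no padding ("valid" convolution), and the cross-correlation form most CNN code uses (no kernel flip). The function names `conv2d_valid` and `per_channel_conv`, the tiny 4 x 4 image and the kernel values are all made up for illustration. It applies one 3 x 3 kernel to each of the R, G and B matrices separately, which yields three 2D outputs for that one kernel:

```python
def conv2d_valid(channel, kernel):
    """2D 'valid' cross-correlation with stride 1 (no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [
            sum(channel[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def per_channel_conv(image, kernel):
    """Apply the same 2D kernel to every channel separately (one output per channel)."""
    return [conv2d_valid(channel, kernel) for channel in image]


# Hypothetical 3-channel image, 4 x 4 pixels per channel.
R = [[1, 2, 3, 4] for _ in range(4)]
G = [[0, 1, 2, 0] for _ in range(4)]
B = [[2, 2, 2, 2] for _ in range(4)]
image = [R, G, B]

# Hypothetical 3 x 3 kernel (a simple horizontal-gradient filter).
kernel = [[1, 0, -1] for _ in range(3)]

outputs = per_channel_conv(image, kernel)
for m in outputs:
    print(m)
# [[-6, -6], [-6, -6]]
# [[-6, 3], [-6, 3]]
# [[0, 0], [0, 0]]
```

Each 4 x 4 channel shrinks to 2 x 2, and one kernel gives three matrices under this scheme. Summing the three outputs element-wise (the usual convention in multi-channel convolution layers) would instead give a single matrix for the kernel.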
If the above is true, and each convolution layer can have multiple kernels (let's say 8 kernels per layer), does that mean that the first layer would produce 8 * 3 = 24 matrices, and each subsequent layer would produce 8 times as many matrices as the previous one (the 2nd layer produces 8 * 24 = 192 matrices, the 3rd produces 8 * 192 = 1536, etc.)?
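The arithmetic above can be checked with a short sketch. The function `matrices_per_layer` and its `summed` flag are hypothetical names. With `summed=False` it follows the assumption in my question (every kernel is applied separately to every incoming matrix, so the count is multiplied by the number of kernels at each layer). With `summed=True` it assumes each kernel's per-channel results are added together, so every layer outputs exactly one matrix per kernel:

```python
def matrices_per_layer(input_channels, kernels_per_layer, num_layers, summed=False):
    """Count the 2D output matrices of each layer under two assumed schemes."""
    counts = []
    current = input_channels
    for _ in range(num_layers):
        if summed:
            # One output matrix per kernel, regardless of how many inputs there are.
            current = kernels_per_layer
        else:
            # Every kernel applied to every incoming matrix separately.
            current = kernels_per_layer * current
        counts.append(current)
    return counts


print(matrices_per_layer(3, 8, 3))               # [24, 192, 1536]
print(matrices_per_layer(3, 8, 3, summed=True))  # [8, 8, 8]
```

The first scheme grows geometrically with depth, while the second stays constant at the number of kernels per layer.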
I would appreciate it if someone could clarify this for me.
Thanks a lot,