I've also tried something similar. The main problem with these approaches, IMO, is that you need to use the same feature set for both models: you can't continue training (e.g. via LightGBM's init_model parameter) and change the feature set, and in a lot of cases the actual feature importance and effect can change drastically between the two scenarios.
I've also done this, and my conclusion was that transfer learning also doesn't work very well with neural networks, especially on tabular datasets like these. It only seems to work for images and text, perhaps because of the similarity between tasks in those domains.