Does the code support end-to-end fine-tuning, including the retriever?

The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM model with this code?

If yes, we might be able to compare results with RAG-end2end:

https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever