At the moment the indexer decide which files to extract content from based on their file name. This assume something about the content in dokumentobjekt.referanseDokumentfil that is not specified in Noark 5, and I have run into extractions where the file names did not include file extentions.
It would be better if values in dokumentobjekt.format were consulted in addition to looking at file suffixes. According to Arkivverket, the values in this field is now standardized as PRONOM codes, so those values should at least be recognized.