Index makes emergency shutdown #19

Open
@go4webch

When running

messenger:consume dynamic_search_queue -vv
dynamic-search:run -v

I get an index with only 1 or 2 documents. It seems the emergency shutdown removes the documents that were indexed earlier. Any help would be appreciated. To make my config work, I manually modified line 109 in DefaultResourceNormalizer:

// existing line
return new ResourceMeta($documentId, $resourceId, $resourceCollectionType, $resourceType, null);
// my overwritten line
return new ResourceMeta($documentId, $resourceId, $resourceCollectionType, $resourceType, null, ['path' => parse_url($crawler->getUri(), PHP_URL_PATH)]);
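
For reference, a minimal sketch of what the added `parse_url()` call yields. The URL below is hypothetical and only stands in for whatever `$crawler->getUri()` returns; the `ResourceMeta` class itself is not used here.

```php
<?php
// Hypothetical URL, standing in for the value of $crawler->getUri().
$uri = 'https://www.mydomain.ch/some/page?foo=bar#top';

// PHP_URL_PATH extracts only the path component, without query string or fragment.
$path = parse_url($uri, PHP_URL_PATH);

// Same array shape as the sixth argument in the overwritten line.
$options = ['path' => $path];

echo $options['path'], PHP_EOL; // /some/page

// A URL without any path component yields null rather than "/".
$noPath = parse_url('https://www.mydomain.ch', PHP_URL_PATH);
var_dump($noPath); // NULL
```

Note that the path can be `null` when the crawled URI has no path component, so the options array may carry `['path' => null]` for such a URI.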


Error log:

Adding document with id document_7675 to lucene index "my_lucene_storage"
ERROR: Error while executing data provider. Error was: RecursiveDirectoryIterator::__construct(localpath.../var/bundles/DsWebCrawlerBundle/persistence-store/cc20be2bf046e8632c814343ac4755c1): Failed to open directory: No such file or directory [Line: 43, File localpath..../vendor/symfony/finder/Iterator/RecursiveDirectoryIterator.php]. FailOver has been initiated
DEBUG: executing provider emergency shutdown
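
The exception in the log comes from a directory iterator being constructed on a path that does not exist. The sketch below reproduces that class of error in plain PHP, assuming nothing about the bundle's internals: it uses PHP's built-in `RecursiveDirectoryIterator` (which the Symfony Finder class in the log extends) and a hypothetical temporary path in place of the persistence-store directory.

```php
<?php
// Hypothetical path that does not exist; it stands in for the missing
// persistence-store directory from the log.
$missing = sys_get_temp_dir() . '/ds-missing-' . uniqid('', true);

$caught = null;
try {
    // Constructing the iterator on a non-existent directory throws.
    new RecursiveDirectoryIterator($missing);
} catch (UnexpectedValueException $e) {
    $caught = $e;
}

if ($caught !== null) {
    echo 'caught: ', get_class($caught), PHP_EOL;
}

// A plain existence check avoids the exception when the directory is absent.
$exists = is_dir($missing);
var_dump($exists); // bool(false)
```

This only illustrates where the "Failed to open directory" message originates; it does not explain why the persistence-store directory is missing while the provider is still running.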

Configuration:

  AppBundle\DynamicSearch\IndexDefinition\CrawlerDocumentDefinition:
    tags:
      - { name: dynamic_search.document_definition_builder }

dynamic_search:
  context:
    default:
      index_provider:
        service: 'lucene'
        options:
          database_name: 'my_lucene_storage'
      data_provider:
        service: 'web_crawler'
        options:
          always:
            own_host_only: true
            content_max_size: 4
            allowed_schemes: ['http','https']
          single_dispatch:
            host: 'https://www.mydomain.ch/'
          full_dispatch:
            seed: 'https://www.mydomain.ch/'
            valid_links:
              - '@^https://www\.mydomain\.ch/@i'
        normalizer:
          service: 'web_crawler_default_resource_normalizer'
