Alfresco Connector for Hyland Experience Insight: Sends ACS events to Hx Insight and updates the Repository with the predictions that it generates

Alfresco/hxinsight-connector

Alfresco Connector for Content Intelligence

The Alfresco Connector for Content Intelligence provides knowledge retrieval capabilities by connecting your content repository, Alfresco Content Services (ACS), to Knowledge Discovery. Knowledge Discovery allows you to apply machine learning to your content repository.

Documentation

| Document | Description |
| --- | --- |
| Documentation Index | Index page for documentation |
| Compatibility | Supported Alfresco versions and requirements |
| Component Overview | Status and description of all components |
| Installation Guide | JAR, Docker, and Kubernetes deployment |
| Live Ingester Config | Real-time event processing configuration |
| Bulk Ingester Config | Batch ingestion configuration |
| Knowledge Discovery JAR Module | Alfresco repository module configuration |
| Nucleus User Sync | User synchronization (WIP) |
| Prediction Applier | Prediction application (Deprecated) |
| ACS Private APIs | Internal API documentation |

Development Environment

To run tests in IntelliJ IDEA, first build the application with mvn clean install -DskipTests -Pdistribution

To set up a local developer environment, build the JAR and the Docker images, then start the Docker Compose environment:

mvn clean install -DskipTests -Pdistribution && \
./scripts/ci/buildDockerImages.sh && \
cd distribution/src/main/resources/docker-compose && \
docker compose --project-name dev up

You can also set up a local developer environment in which the Live Ingester runs outside a Docker container. To do so, run the following command:

mvn clean install -DskipTests -Pdistribution && \
./scripts/ci/buildDockerImages.sh && \
cd distribution/src/main/resources/docker-compose && \
docker compose --file docker-compose-ingesterless.yml --project-name dev up

The tests in the OpenApiTckRequestValidationTest.java class validate Alfresco event requests against the OpenAPI specification of Insight Ingestion. To run them, you need to clone the private Ingestion Connector Technology Compatibility Kit (TCK) repository from the HylandSoftware organisation and build its Docker images. This requires a personal access token (PAT) authorised for the HylandSoftware organisation, set as a local environment variable as described here: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-with-a-personal-access-token-classic

Once the token is set, clone and build the TCK images by running the following commands:

./scripts/ci/cloneTCK.sh
cd ingestion-connector-tck
docker compose up -d

Code Quality

This project uses spotless, which enforces alfresco-formatter.xml, to ensure code quality.

To check for code-style violations, run:

mvn spotless:check

To reformat files, run:

mvn spotless:apply

Secret Detection

We use detect-secrets to help avoid accidentally publishing secret keys. If you have pre-commit installed, it should run automatically when you make a commit. Usually there are no issues, but if it finds a potential secret (e.g. a high-entropy string), you will see the following:

Detect secrets...........................................................Failed
- hook id: detect-secrets
- exit code: 1

ERROR: Potential secrets about to be committed to git repo!

Secret Type: Secret Keyword
Location:    test.txt:1

If this is a false positive and you actually want to commit the string then run these two commands:

detect-secrets scan --baseline .secrets.baseline
detect-secrets audit .secrets.baseline

This will update the baseline file to include your new code and then allow you to review the detected secret and mark it as a false positive. Once you are finished then you can add .secrets.baseline to the staged changes and you should be able to create a commit.

Live Ingester configuration

Retry

If a call to an external endpoint fails, the call is retried. The default retry settings are:

  • attempts: 10
  • initial delay: 500 ms
  • delay multiplier: 2

These defaults can be overridden per endpoint with a custom specification, for example:

  • authentication request:
hyland-experience:
  authentication:
    retry:
      attempts: 5
      initial-delay: 1000
      delay-multiplier: 1.5
  • file download from shared file store:
alfresco:
  transform:
    shared-file-store:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5
  • storage location request:
hyland-experience:
  storage:
    location:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5
  • file upload to obtained storage location:
hyland-experience:
  storage:
    upload:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5
  • ingest request:
hyland-experience:
  ingester:
    retry:
      attempts: 5
      initial-delay: 1000
      delay-multiplier: 1.5
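
To get a feel for what the retry settings above imply, the following Python sketch computes the wait before each retry under plain exponential backoff. It is only an illustration: the function name `retry_delays` is hypothetical, and it assumes that `attempts` counts the initial call (so N attempts mean N-1 waits), that delays are in milliseconds, and that no jitter or maximum delay is applied. The connector's actual behaviour may differ on any of these points.

```python
def retry_delays(attempts=10, initial_delay_ms=500, multiplier=2.0):
    """Return the assumed wait (in ms) before each retry.

    Assumption: the first attempt runs immediately, so `attempts`
    attempts produce `attempts - 1` waits, each `multiplier` times
    longer than the previous one.
    """
    delays = []
    delay = float(initial_delay_ms)
    for _ in range(attempts - 1):
        delays.append(delay)
        delay *= multiplier
    return delays


# Defaults listed in this section: 10 attempts, 500 ms, multiplier 2
default_schedule = retry_delays()

# Custom values used in the YAML examples: 5 attempts, 1000 ms, multiplier 1.5
custom_schedule = retry_delays(attempts=5, initial_delay_ms=1000, multiplier=1.5)

print("default waits (ms):", default_schedule)
print("custom waits (ms): ", custom_schedule)
print("total default wait (s):", sum(default_schedule) / 1000)
print("total custom wait (s): ", sum(custom_schedule) / 1000)
```

Under these assumptions the default settings keep retrying for several minutes in total, while the custom example gives up after a few seconds, which is worth keeping in mind when tuning an individual endpoint.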

Bulk Ingester configuration

Namespace prefixes

Namespace prefixes are not available in the database, so you have to specify the namespace-to-prefix mapping in a configuration file. By default, the prefix mappings are read from the namespace-prefixes.json file; you can change this via the alfresco.bulk.ingest.namespace-prefixes-mapping property.

The namespaces-to-namespace-prefixes-file-generator.py script can generate namespace-prefixes.json automatically, covering all types in your repository. See its help output for usage:

python3 scripts/utils/namespaces-to-namespace-prefixes-file-generator.py --help

User Group Mapping (WIP)

The nucleus-sync application is a long-running Spring Boot app that periodically loads data from a running Alfresco instance via the REST API and publishes it to Nucleus. The following information is published:

  • Mappings between Alfresco users and Nucleus users (obtained from IAM), matched on the user's email address.
  • Alfresco groups, if their member users are mapped.
  • Alfresco group memberships for those users and groups which have been mapped.
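
The mapping rules above can be sketched in a few lines of Python. This is not the nucleus-sync implementation (which is a Spring Boot app); the function `build_mappings`, the input shapes and the sample IDs are all hypothetical. It also makes two assumptions the text does not confirm: emails are compared case-insensitively, and a group is published when at least one of its members is mapped.

```python
def build_mappings(alfresco_users, nucleus_users, alfresco_groups):
    """Derive user and group-membership mappings.

    alfresco_users:  dict of Alfresco user id -> email (or None)
    nucleus_users:   dict of Nucleus user id -> email (or None)
    alfresco_groups: dict of Alfresco group id -> list of Alfresco user ids
    """
    # Index Nucleus users by lower-cased email (assumption: case-insensitive match).
    nucleus_by_email = {
        email.lower(): user_id
        for user_id, email in nucleus_users.items()
        if email
    }

    # Map each Alfresco user whose email is known to Nucleus.
    user_map = {}
    for alfresco_id, email in alfresco_users.items():
        if email and email.lower() in nucleus_by_email:
            user_map[alfresco_id] = nucleus_by_email[email.lower()]

    # Keep only groups with at least one mapped member (assumption),
    # and only the memberships of mapped users.
    memberships = {}
    for group_id, members in alfresco_groups.items():
        mapped_members = [m for m in members if m in user_map]
        if mapped_members:
            memberships[group_id] = mapped_members

    return user_map, memberships


# Hypothetical sample data
alfresco_users = {"alice": "alice@example.com", "bob": "bob@example.com", "carol": None}
nucleus_users = {"n-1": "Alice@Example.com", "n-2": "dave@example.com"}
alfresco_groups = {"GROUP_editors": ["alice", "bob"], "GROUP_reviewers": ["bob", "carol"]}

user_map, memberships = build_mappings(alfresco_users, nucleus_users, alfresco_groups)
print(user_map)
print(memberships)
```

With this sample data only alice has a Nucleus counterpart, so only her user mapping and her membership of GROUP_editors would be published; GROUP_reviewers is dropped because none of its members are mapped.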
