Alfresco Connector for Content Intelligence

The Alfresco Connector for Content Intelligence provides knowledge retrieval capabilities by connecting your content repository, Alfresco Content Services (ACS), to Knowledge Discovery. Knowledge Discovery allows you to apply machine learning to your content repository.

Documentation

| Document | Description |
| --- | --- |
| Documentation Index | Index page for documentation |
| Compatibility | Supported Alfresco versions and requirements |
| Component Overview | Status and description of all components |
| Installation Guide | JAR, Docker, and Kubernetes deployment |
| Live Ingester Config | Real-time event processing configuration |
| Bulk Ingester Config | Batch ingestion configuration |
| Knowledge Discovery JAR Module | Alfresco repository module configuration |
| Nucleus User Sync | User synchronization (WIP) |
| Prediction Applier | Prediction application (Deprecated) |
| ACS Private APIs | Internal API documentation |

Development Environment

To run tests in IntelliJ IDEA, first build the application with `mvn clean install -DskipTests -Pdistribution`.

To set up a local developer environment, build the JAR, then the Docker images, and finally start the docker-compose environment:

mvn clean install -DskipTests -Pdistribution && \
./scripts/ci/buildDockerImages.sh && \
cd distribution/src/main/resources/docker-compose && \
docker compose --project-name dev up

You can also set up a local developer environment in which Live Ingester runs outside a Docker container. To do so, run the following command:

mvn clean install -DskipTests -Pdistribution && \
./scripts/ci/buildDockerImages.sh && \
cd distribution/src/main/resources/docker-compose && \
docker compose --file docker-compose-ingesterless.yml --project-name dev up

The tests in the `OpenApiTckRequestValidationTest.java` class validate Alfresco event requests against the OpenAPI specification of Insight Ingestion. To run them, you need to clone the private Ingestion Connector Technology Compatibility Kit (TCK) repository from the HylandSoftware organisation and build its Docker images. This requires a personal access token (PAT) authorised for the HylandSoftware organisation, set as a local environment variable as described at: https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-with-a-personal-access-token-classic

Once the token is in place, clone the repository and build the TCK images by running the following commands:

./scripts/ci/cloneTCK.sh
cd ingestion-connector-tck
docker compose up -d

### Code Quality
This project uses `spotless`, which enforces the `alfresco-formatter.xml` formatting rules, to keep the code style consistent.

To check code-style violations you can use:
```bash
mvn spotless:check
```

To reformat files you can use:

```bash
mvn spotless:apply
```

Secret Detection

We are using detect-secrets to try to avoid accidentally publishing secret keys. If you have pre-commit installed then this should run automatically when making a commit. Usually there should be no issues, but if it finds a potential issue (e.g. a high entropy string) then you will see the following:

Detect secrets...........................................................Failed
- hook id: detect-secrets
- exit code: 1

ERROR: Potential secrets about to be committed to git repo!

Secret Type: Secret Keyword
Location:    test.txt:1

If this is a false positive and you actually want to commit the string then run these two commands:

detect-secrets scan --baseline .secrets.baseline
detect-secrets audit .secrets.baseline

This will update the baseline file to include your new code and then allow you to review the detected secret and mark it as a false positive. Once you are finished then you can add .secrets.baseline to the staged changes and you should be able to create a commit.

Live Ingester configuration

Retry

If a call to an external endpoint fails, the call is retried. The default retry specification is:

attempts: 10
initial delay: 500 ms
delay multiplier: 2
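
To get a feel for what these defaults mean in practice, the following minimal sketch computes the waits between attempts. It assumes plain exponential backoff (each wait is the previous one times the multiplier) with no jitter and no upper bound; the source does not state whether the connector applies either. The function name `retry_delays` is hypothetical and not part of the connector.

```python
def retry_delays(attempts=10, initial_delay_ms=500, multiplier=2.0):
    """Return the waits (in milliseconds) between consecutive attempts.

    Assumption: plain exponential backoff, without jitter or a maximum delay.
    """
    # N attempts leave N - 1 gaps in which the caller waits.
    gaps = max(attempts - 1, 0)
    return [initial_delay_ms * multiplier ** i for i in range(gaps)]


default_delays = retry_delays()
print(default_delays)
print(sum(default_delays))
```

Under these assumptions, the defaults yield nine waits growing from 500 ms to 128000 ms, or 255500 ms in total before the last attempt is made.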

The default values can be overridden with a custom specification for individual endpoints, for example:

authentication request:

hyland-experience:
  authentication:
    retry:
      attempts: 5
      initial-delay: 1000
      delay-multiplier: 1.5

file download from shared file store:

alfresco:
  transform:
    shared-file-store:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5

storage location request:

hyland-experience:
  storage:
    location:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5

file upload to obtained storage location:

hyland-experience:
  storage:
    upload:
      retry:
        attempts: 5
        initial-delay: 1000
        delay-multiplier: 1.5

ingest request:

hyland-experience:
  ingester:
    retry:
      attempts: 5
      initial-delay: 1000
      delay-multiplier: 1.5
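
The override behaviour described above can be sketched as a lookup with a fallback. The example below is only an illustration: the connector is a Spring Boot application and binds these properties itself, so `retry_spec`, `DEFAULT_RETRY`, and the nested dictionary (standing in for the parsed YAML) are hypothetical. It also assumes that a field omitted from an override keeps its default value, which the source does not confirm.

```python
DEFAULT_RETRY = {"attempts": 10, "initial-delay": 500, "delay-multiplier": 2}


def retry_spec(config, path):
    """Return the retry settings found under `<path>.retry`, or the defaults."""
    node = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            # Nothing configured for this endpoint: use the defaults.
            return dict(DEFAULT_RETRY)
        node = node[key]
    overrides = node.get("retry", {}) if isinstance(node, dict) else {}
    # Assumption: fields missing from the override keep their default value.
    return {**DEFAULT_RETRY, **overrides}


# Stand-in for the parsed YAML of the authentication override shown above.
config = {
    "hyland-experience": {
        "authentication": {
            "retry": {"attempts": 5, "initial-delay": 1000, "delay-multiplier": 1.5}
        }
    }
}

auth_retry = retry_spec(config, "hyland-experience.authentication")
upload_retry = retry_spec(config, "hyland-experience.storage.upload")
print(auth_retry)
print(upload_retry)
```

Here the authentication request picks up the custom values, while the upload request, which has no override in this sample, falls back to the defaults.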

Bulk Ingester configuration

Namespace prefixes

Namespace prefixes are not available in the database, so you have to specify the namespace-to-prefix mapping in a configuration file. By default, the prefix mappings are read from the `namespace-prefixes.json` file; you can change this via the `alfresco.bulk.ingest.namespace-prefixes-mapping` property.

The `namespaces-to-namespace-prefixes-file-generator.py` script can automatically generate `namespace-prefixes.json` for all types in your repository:

python3 scripts/utils/namespaces-to-namespace-prefixes-file-generator.py --help
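
To illustrate why such a mapping is needed, here is a minimal sketch that turns a fully qualified Alfresco name of the form `{namespace-uri}localName` into its short `prefix:localName` form. The layout of `namespace-prefixes.json` is an assumption here (a flat JSON object mapping each namespace URI to its prefix); check the generated file for the actual structure. The function `to_prefixed_name` is hypothetical and not part of the connector.

```python
import json


def to_prefixed_name(qname, namespace_prefixes):
    """Convert '{namespace-uri}localName' into 'prefix:localName'."""
    if not qname.startswith("{") or "}" not in qname:
        raise ValueError(f"Not a fully qualified name: {qname!r}")
    namespace, local_name = qname[1:].split("}", 1)
    if namespace not in namespace_prefixes:
        raise KeyError(f"No prefix configured for namespace {namespace!r}")
    return f"{namespace_prefixes[namespace]}:{local_name}"


# Assumed file content: a flat object mapping namespace URI to prefix.
mapping = json.loads('{"http://www.alfresco.org/model/content/1.0": "cm"}')

short_name = to_prefixed_name(
    "{http://www.alfresco.org/model/content/1.0}name", mapping
)
print(short_name)
```

A namespace that is missing from the mapping raises an error in this sketch, which mirrors the reason the mapping has to be complete for the types in your repository.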

User Group Mapping (WIP)

The nucleus-sync application is a long-lived Spring Boot application that periodically loads data from a running Alfresco instance via the REST API and publishes it to Nucleus. The following information is published:

Mappings of Alfresco users to Nucleus users (obtained from IAM), matched by the user's email.
Alfresco groups, if their member users are mapped.
Alfresco group memberships for the users and groups that have been mapped.
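
The three items above can be sketched as a small matching routine. This is an illustration only, not the nucleus-sync implementation: the record shapes (`id`, `email`), the function `map_users_and_groups`, and the sample data are hypothetical. It also makes two assumptions the source does not confirm: emails are compared case-insensitively, and a group is published when at least one of its members is mapped.

```python
def map_users_and_groups(alfresco_users, nucleus_users, alfresco_groups):
    """Return (user_mapping, memberships) for users matched by email."""
    # Index Nucleus users by lower-cased email (assumed case-insensitive match).
    nucleus_by_email = {
        user["email"].lower(): user["id"]
        for user in nucleus_users
        if user.get("email")
    }

    # Alfresco user id -> Nucleus user id, for users with a matching email.
    user_mapping = {}
    for user in alfresco_users:
        email = (user.get("email") or "").lower()
        if email in nucleus_by_email:
            user_mapping[user["id"]] = nucleus_by_email[email]

    # Keep only groups with at least one mapped member (assumption),
    # and only the memberships of those mapped members.
    memberships = {}
    for group_id, members in alfresco_groups.items():
        mapped_members = [m for m in members if m in user_mapping]
        if mapped_members:
            memberships[group_id] = mapped_members
    return user_mapping, memberships


alfresco_users = [
    {"id": "alice", "email": "Alice@example.com"},
    {"id": "bob", "email": "bob@example.com"},
    {"id": "carol", "email": None},
]
nucleus_users = [{"id": "n-1", "email": "alice@example.com"}]
alfresco_groups = {"GROUP_editors": ["alice", "bob"], "GROUP_guests": ["bob"]}

user_mapping, memberships = map_users_and_groups(
    alfresco_users, nucleus_users, alfresco_groups
)
print(user_mapping)
print(memberships)
```

In this sample only `alice` has a Nucleus counterpart, so only `GROUP_editors` and her membership in it would be published.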
