OF-2954: New feature: Spam Reporting #2661

guusdk · 2025-01-14T22:17:07Z

This commit provides a basic implementation of XEP-0377: Spam Reporting

The changes include:

persistent storage of spam reports
a provider implementation to allow other spam reporting persistence providers
an event listening mechanism
an optional notification of admins

guusdk · 2025-01-14T22:19:32Z

This isn't quite ready yet, but seems to be functional.

Pending work:

~~database install script modification for all databases except HSQLDB~~
~~database update scripts for all databases~~
~~make the provider implementation non-static (it hard-codes to the 'default' provider now)~~
forward to originating server (note: warn about end-user consent to do this)
forward to xmppbl (note: warn about end-user consent to do this)
add support for including the original, offending stanza (eg: urn:xmpp:forward:0)

stokito · 2025-01-15T17:08:56Z

distribution/src/database/openfire_mysql.sql

+  reporter            VARCHAR(1024) NOT NULL,
+  reported            VARCHAR(1024) NOT NULL,
+  reason              VARCHAR(255)  NOT NULL,
+  created             BIGINT        NOT NULL,


we may use timestamp

That's a good idea, but not for now: We've had trouble before finding a consistent way to represent a timestamp in all of the databases that we support, which is why we use a number instead. I'm not sure if this is still as impossible as it was in 2004, by the way, but I'd like Openfire to be consistent.

If we do change number for timestamp (which would be a good thing), we should do it for all columns that currently use a number.

Also see the remark at the top of https://download.igniterealtime.org/openfire/docs/latest/documentation/database-guide.html which describes this.

good. If the migration is anyway planed and you know how it will be implemented, then for the new code it may better to start using it. Just my 2c.

stokito · 2025-01-15T17:10:24Z

distribution/src/database/openfire_db2.sql

+  reporter            VARCHAR(1024) NOT NULL,
+  reported            VARCHAR(1024) NOT NULL,
+  reason              VARCHAR(255)  NOT NULL,
+  created             BIGINT        NOT NULL,


the db2 has the TIMESTAMP type. In other places of the file used char for date, which it very strange

(as above: we currently never use a TIMESTAMP field, which is certainly something I'd like to improve on. If we do improve on that, we should do it consistently, everywhere).

stokito · 2025-01-15T17:14:36Z

distribution/src/database/openfire_sybase.sql

+  reported            NVARCHAR(1024) NOT NULL,
+  reason              NVARCHAR(255)  NOT NULL,
+  created             INTEGER        NOT NULL,
+  "raw"               LONG VARCHAR   NOT NULL


maybe rename the raw field?
It contains the "XML representation of the report" but maybe we just need only the reported message so that we can easily parse the spam and train spamd.

That column is not great, indeed. I'm in the middle of a refactoring of the database structure. I think this column will be dropped (and instead we'll have a table that can be JOINed with, to have zero-to-many reported stanzas. More on this later!

xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReport.java

stokito · 2025-01-15T17:24:31Z

xmppserver/src/main/java/org/jivesoftware/openfire/handler/IQBlockingHandler.java

                }

                final List<JID> toBlocks = new ArrayList<>();
+                final Set<SpamReport> reports = new HashSet<>();


It looks like the SpamReport doesn't have equals() so the Set won't check for duplicates. Maybe we can use a List here?

In general the HashSet eats a lot of memory and if it's possible to avoid it and use the ArrayList then it will reduce pressure on GC.

Meh. Resizing ArrayLists when they are populated can be costly too. All such arguments border on premature optimization, which I'd like to avoid.

Instead, I prefer to look at the semantics:

usage of Set, unlike List, suggests that the entities are all different.

usage of List, unlike Set, suggests that entities are ordered.

In that sense, Set is a better fit for this use-case than List.

Yes, you a right and even more: this is a typical usage in all Java projects on which I worked, so this is kind of "industrial standard".
Java, as a "common denominator language", is both bad at business logic semantics and on the Mechanical Sympathy.

The HashSet is very memory expensive, so that it can be even used for a DDoS. When it's used inside of a method like here this is more or less fine because it will be fried. But when the (Hash)Set is used in a field (e.g. SpamReport's Set<Text> context) then the eaten memory remains used. So I used a few times a HashSet to deduplicate and then conversion to an ArrayList for a long storage.
The Android has the ArraySet but the OpenJDK still doesn't offer it out of the box.

Please note: the SpamReport doesn't have the equals() methods so the Set here is useless now.

stokito · 2025-01-15T17:44:16Z

xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReport.java

+    public synchronized Set<Text> getContext()
+    {
+        if (context == null) {
+            context = new HashSet<>(Text.allFromChildren(reportElement));


here we have an allocation and copy, two heavy operations

stokito · 2025-01-15T17:45:49Z

xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReport.java

+
+        public static List<Text> allFromChildren(final Element parentElement)
+        {
+            final List<Text> result = new ArrayList<>();


we can set an initial capacity to texts.size()

stokito · 2025-01-15T17:49:06Z

xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReportManager.java

+        listeners.forEach(listener -> {
+            try {
+                listener.receivedSpamReport(report);
+            } catch (Throwable t) {


maybe catch an Exception and not the Throwable

I'm interested in learning the rationale for that. I've always been going back and forth between the two without a clear reason. I sometimes wish to catch Error instances, I think - but maybe that's not always appropriate.

In 99% the Error would be the OutOfMemoryError. Generally this considered like a panic situation while an Exception is something that is recoverable.
A common practice is to catch Exception while leaving the Error only in critical situations https://stackoverflow.com/questions/2274102/difference-between-using-throwable-and-exception-in-a-try-catch

stokito · 2025-01-15T17:50:34Z

xmppserver/src/main/java/org/jivesoftware/openfire/spamreporting/SpamReportAdminNotifier.java

+            notification.setBody(body);
+            notification.getElement().add(reportElement);
+
+            XMPPServer.getInstance().getAdmins().forEach(jid -> {


(minor, style) you may use a for each loop that is easier to understand and faster to compile and analyze for linters

Style is in the eye of the beholder. :) The linter in Intellij is perfectly able to reason about this.

stokito · 2025-01-15T17:53:07Z

xmppserver/src/main/java/org/jivesoftware/openfire/stanzaid/StanzaIDUtil.java

-        return null;
+        final List<StanzaID> sids = StanzaID.allFromChildren(packet.getElement());
+        return sids.stream()
+            .filter(stanzaID -> stanzaID.getBy().toString().equals(by))


the filter will be executed for all elements so this will work slower than previously.

guusdk · 2025-01-15T19:37:13Z

Thanks for your feedback, @stokito! I've commented on most of them. Most of the others for me fall in similar categories of what I wrote elsewhere. This PR is still very much a work in progress, so do expect changes!

guusdk · 2025-01-15T21:35:26Z

I have now applied a major refactoring.

guusdk · 2025-01-15T21:40:24Z

Another thing that I can't get working yet is for Conversations (the client that I'm using to report spam) to include a stanza-ID that identifies a spam message.

This should be done by long-pressing a message from a stranger, as implemented here: https://codeberg.org/iNPUTmice/Conversations/src/branch/master/src/main/java/eu/siacs/conversations/ui/ConversationFragment.java#L1323-L1328

That fails for me. My theory is that Openfire doesn't include a stanza-ID to the messages. Does this need (changing in) the monitoring plugin?

akrherz · 2025-03-21T15:04:22Z

@guusdk I have assigned this to a 5.1.0 milestone as I suspect you are not intending it to be ready for 5.0.0 ?

This commit provides a basic implementation of XEP-0377: Spam Reporting The changes include: - persistent storage of spam reports - a provider implementation to allow other spam reporting persistence providers - an event listening mechanism - an optional notification of admins

The XML element representation that's in a SpamReport should be a copy, and detached. This allows it to be used elsewhere, without one consumer's modifications affected another's.

When admins are notified of a new spam report, the report itself should be included. Supporting clients can then render it.

The data structure defined in XEP-0377 is only usable in context of IQ Blocking. It's not really usable for more generic incident reporting. The refactoring moves the database structure towards a more re-usable structure: - it no longer contains the XEP-0377 element as raw XML - it references 0 to many stanzas in a different table - the stanzas are minimally represented by a StanzaID, but may include the full XMPP stanza.

This adds a simple page to the admin console on which spam reports can be viewed. The page is heavily based on the audit log viewer.

The new version of the XEP introduces two flags that can be used to define if a spam report can be shared with others. The admin console now has an additional details page for spam reports.

guusdk force-pushed the OF-2954_Spam-Reporting branch from abbd110 to d4f9b04 Compare January 14, 2025 22:18

guusdk force-pushed the OF-2954_Spam-Reporting branch 3 times, most recently from b6e74e7 to 83fb015 Compare January 15, 2025 10:22

stokito approved these changes Jan 15, 2025

View reviewed changes

akrherz added this to the 5.1.0 milestone Mar 21, 2025

guusdk added 7 commits April 16, 2025 12:20

OF-2954: Make SpamReportProvider configurable

11c37c7

OF-2954: Database install and update scripts

64cf3c8

OF-2954: SpamReport should retain copy of detached element

f1d280e

The XML element representation that's in a SpamReport should be a copy, and detached. This allows it to be used elsewhere, without one consumer's modifications affected another's.

OF-2954: Include spam report in admin notification

a7b2a0d

When admins are notified of a new spam report, the report itself should be included. Supporting clients can then render it.

OF-2954: Add Spam Reporting admin console panel

4e38ff3

This adds a simple page to the admin console on which spam reports can be viewed. The page is heavily based on the audit log viewer.

guusdk force-pushed the OF-2954_Spam-Reporting branch from 8ec5fa3 to 4e38ff3 Compare April 16, 2025 10:33

OF-2954: Add support for version 0.4.0 of XEP-0377: Spam Reporting

543b1e7

The new version of the XEP introduces two flags that can be used to define if a spam report can be shared with others. The admin console now has an additional details page for spam reports.

guusdk force-pushed the OF-2954_Spam-Reporting branch from 39b5d0a to 543b1e7 Compare April 16, 2025 16:16

Uh oh!

OF-2954: New feature: Spam Reporting #2661

Are you sure you want to change the base?

OF-2954: New feature: Spam Reporting #2661

Uh oh!

Conversation

guusdk commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

guusdk commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guusdk commented Jan 15, 2025

Uh oh!

guusdk commented Jan 15, 2025

Uh oh!

guusdk commented Jan 15, 2025

Uh oh!

akrherz commented Mar 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

guusdk commented Jan 14, 2025 •

edited

Loading

guusdk commented Jan 14, 2025 •

edited

Loading