Skip to content

chrisbeach/discourse-civility

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Summary

Measure the civility of posts on your Discourse forum using Google Perspective API

See discussion here: https://meta.discourse.org/t/evaluating-googles-perspective-api-on-your-discourse-forum/79971

Notes:

  • Requires Docker to be installed
  • You'll need an API key (see https://www.perspectiveapi.com/)
  • Only users with a post_count of 10+ are considered
  • Posts are truncated to fit the Perspective API's 3,000 character limit

1. Start Database Server, Populated From Discourse Backup

Browse to /admin/backups on your forum and click Backup. Select "Yes (do not include files)." Click Download and you'll be mailed a link to the backup file.

Now run the following command in the root of this project:

./db.sh [ABSOLUTE PATH TO YOUR BACKUP FILE].sql

Wait as your database is imported. The message "PostgreSQL init process complete; ready for start up" will then be displayed. Postgres is ready to go (it's a foreground task, please leave it running for now).

2. Run App

In another terminal, run the following command in the root of this project:

./run.sh [YOUR API KEY]

This will create a table in the running Postgres database, query the Google Perspective API and populate the table with the results. Given the rate limits (10/sec), this may take some time.

3. Analyse Results

Connect to the Postgres database using the SQL client of your choice:

  • Host:port/database: localhost:5432/postgres
  • Username: postgres
  • Password: [no password]

Try one of the sample queries below:

Example query for most toxic posters

SELECT
    '@' || u.username as "Username",
    round(avg(pe.toxicity)::numeric, 3) as "Average Toxicity",
    count(*) as "Number of Posts"
FROM
    posts p
    JOIN users u ON p.user_id = u.id
    JOIN perspective pe ON pe.post_id = p.id
GROUP BY u.username
ORDER BY avg(pe.toxicity) desc
LIMIT 100

Example query for most toxic posts

SELECT
    '@' || u.username as "Username",
    round(pe.toxicity::numeric, 3) as "Toxicity",
    p.raw
FROM
    posts p
    JOIN users u ON p.user_id = u.id
    JOIN perspective pe ON pe.post_id = p.id
ORDER BY pe.toxicity desc
LIMIT 20

Example query for overall site stats

SELECT
  round(avg(e.attack_on_author)::numeric, 3) as "Attack on Author",
  round(avg(e.attack_on_commenter)::numeric, 3) as "Attack on Commenter",
  round(avg(e.incoherent)::numeric, 3) as "Incoherent",
  round(avg(e.inflammatory)::numeric, 3) as "Inflammatory",
  round(avg(e.likely_to_reject)::numeric, 3) as "Likely to Reject",
  round(avg(e.obscene)::numeric, 3) as "Obscene",
  round(avg(e.severe_toxicity)::numeric, 3) as "Severe Toxicity",
  round(avg(e.spam)::numeric, 3) as "Spam",
  round(avg(e.toxicity)::numeric, 3) as "Toxicity",
  round(avg(e.unsubstantial)::numeric, 3) as "Unsubstantial"
FROM perspective e

4. Shutdown

To tidy up when finished (removes Docker container):

./shutdown.sh

Troubleshooting

Rate limit errors

Despite our best efforts to stay within Google's 1000 requests / 100 s rate limits, sending bursty parallel requests sometimes causes occasional rate limit breaches. To resolve, reduce the value of parallelism in src/main/resources/application.conf

Further Reference

https://github.com/conversationai/perspectiveapi/blob/master/api_reference.md#models

https://conversationai.github.io/

Contributing

I'm new to Akka-HTTP, Docker, Doobie and the Perspective API. I'd be grateful for your feedback and/or PRs.

License

Apache 2.0

About

Measure the civility of posts on your Discourse forum using Google Perspective API

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published