Skip to content

Conversation

@achintha-weerasinghe
Copy link

@achintha-weerasinghe achintha-weerasinghe commented Apr 17, 2025

Hi, here I have a reproduction.
#22

  1. Prepare the database using schema.sql
  2. Try the following GraphQL query
query MyQuery {
  shopsByIds(ids: ["1", "2"]) {
    id
    image {
      original
    }
    options {
      pickup
    }
    owner {
      id
    }
    info {
      petInfo
      customersInfo
    }
  }
}

If you explain the generated SQL query, you will see that there are two calls to the same function and each looped 3 times. We have total of 4 rows in shop table, and only 3 of them are visible = true. The function was executed for all the filtered visible shops. Instead it should have been executed only 2 times, because in the query we are only requesting for 2 items. In our actual environments, we have thousands of visible rows returning, hence the queries were very slow.

  • If you remove any of the fields excluding id from GQL query
  • If the v_metadata view did not have the lateral join

Looping issue won't occur.

@achintha-weerasinghe achintha-weerasinghe changed the title SQL function duplications SQL function duplications and unnecessary calls Apr 17, 2025
@achintha-weerasinghe achintha-weerasinghe changed the title SQL function duplications and unnecessary calls SQL function duplication and unnecessary calls Apr 17, 2025
@benjie
Copy link
Owner

benjie commented Apr 18, 2025

Okay so your issue seems to be that when you explicitly tell the system to execute the function for each of the fields getInfo and customersInfo, it does so. Previously, it deduplicated these which meant that it covered up this inefficiency in the way that you had specified these plans. It may or may not be possible to restore this deduplication, but I don't really understand why you've done it this way, rather than

      Shop: {
        info($parent) {
          return $parent;
        },
      },
      ShopInfo: {
        petInfo($parent) {
          const $fnData = get_shop_info.execute([{ step: $parent.get('id'), pgCodec: TYPES.int, name: 'id_param' }], 'normal')

          return $fnData;
        },
        customersInfo($parent) {
          const $fnData = get_shop_info.execute([{ step: $parent.get('id'), pgCodec: TYPES.int, name: 'id_param' }], 'normal')

          return $fnData;
        }
      },

I would expect:

      Shop: {
        info($parent) {
          const $fnData = get_shop_info.execute([{ step: $parent.get('id'), pgCodec: TYPES.int, name: 'id_param' }], 'normal');
          return $fnData
        },
      },
      ShopInfo: {
        petInfo($fnData) {
          return $fnData;
        },
        customersInfo($fnData) {
          return $fnData;
        }
      },

This isn't really a bug in PostGraphile (though it is a potential optimisation that we could (re-)add), it's an issue with the way that you have built your query plans. Had you done this in GraphQL.js resolvers, that also would have resulted in the function being called multiple times.

@benjie
Copy link
Owner

benjie commented Apr 18, 2025

The other issue that you describe:

If you explain the generated SQL query, you will see that there are two calls to the same function and each looped 3 times. We have total of 4 rows in shop table, and only 3 of them are visible = true. The function was executed for all the filtered visible shops. Instead it should have been executed only 2 times, because in the query we are only requesting for 2 items.

This is an issue with your PostgreSQL query planning, essentially Postgres is doing the join before it filters on visible. This is likely because it's not able to filter by visible using indexes and it's not expecting this to be a Big Deal, but of course for you it is a Big Deal hence the issues. Fortunately in V5 you can work around this by forcing this function to not inline:

      ShopInfo: {
        petInfo($parent) {
          const $fnData = get_shop_info.execute([{ step: $parent.get('id'), pgCodec: TYPES.text, name: 'id_param' }], 'normal')
+         forbidInlining($fnData);

          return $fnData;
        },
        customersInfo($parent) {
          const $fnData = get_shop_info.execute([{ step: $parent.get('id'), pgCodec: TYPES.text, name: 'id_param' }], 'normal')
+         forbidInlining($fnData);

          return $fnData;
        }
      },

with this function:

import {
  PgClassExpressionStep,
  PgSelectSingleStep,
  PgSelectStep,
} from "postgraphile/@dataplan/pg";

/** @param $step { import('postgraphile/grafast').Step } */
function forbidInlining($step) {
  if ($step instanceof PgClassExpressionStep) {
    $step = $step.getParentStep();
  }
  if ($step instanceof PgSelectSingleStep) {
    $step = $step.getClassStep();
  }
  if ($step instanceof PgSelectStep) {
    $step.setInliningForbidden();
  }
}

(Note: I also changed TYPES.int to TYPES.text since TYPES.int was wrong and would result in Postgres failing to find the relevant function to execute. It only works when inlining because the type specifier is not used in that case.)

@benjie
Copy link
Owner

benjie commented Apr 18, 2025

Another option I just remembered is you can use explicit step caching via operationPlan().cacheStep which I don't think is documented yet, but is a handy workaround:

export const myExtensions = makeExtendSchemaPlugin((build) => {
  const {
    input: {
      pgRegistry: {
        pgResources: { shop, get_shop_info },
      },
    },
    grafast: { each, constant, operationPlan },
  } = build;

  function getShopInfo($parent) {
    return operationPlan().cacheStep($parent, "getShopInfo", null, () => {
      const $id = $parent.get("id");
      return get_shop_info.execute([
        { step: $id, pgCodec: TYPES.text, name: "id_param" },
      ]);
    });
  }

  return {
    typeDefs: gql`
      extend type Query {
        shopsByIds(ids: [String!]!): [Shop]!
      }

      extend type Shop {
        info: ShopInfo
      }

      type ShopInfo {
        petInfo: JSON
        customersInfo: JSON
      }
    `,
    plans: {
      Query: {
        shopsByIds(_, { $ids }) {
          return each($ids, ($id) =>
            shop.find({ id: $id, visible: constant(true) }).single()
          );
        },
      },
      Shop: {
        info($parent) {
          return $parent;
        },
      },
      ShopInfo: {
        petInfo($parent) {
          return getShopInfo($parent);
        },
        customersInfo($parent) {
          return getShopInfo($parent);
        },
      },
    },
  };
});

@benjie
Copy link
Owner

benjie commented Apr 18, 2025

The root cause of this is that the related PgFromExpressionSteps for get_shop_info cannot deduplicate because their placeholder symbols don't match. They don't match because making them match would require looking up what they relate to in their children which is not allowed, because essentially I've created a pseudo-cycle between them:

flowchart TD
  PgFromExpression["PgFromExpression (unary)"] --> PgSelect
  PgSelect -..-> PgFromExpression
Loading

This is not ideal and will need reworking likely through the removal of PgFromExpression entirely, or by redoing how arguments integrate with PgFromExpression. But since this is an optimization task I'm not going to let it hold up the release of V5 any further, so for now you'll need to use one of the three approaches I outlined above that can solve this.

@benjie benjie moved this to 🌳 Triage in V5.X Apr 18, 2025
@benjie benjie added this to V5.X Apr 18, 2025
@achintha-weerasinghe
Copy link
Author

Actually the issue I think is the two lateral joins which makes the postgres planner confused for some reason. The one generated by postgraphile at top-level and the other from our v_metadata view. By changing postgraphile's to use WHERE IN or changing our's to a CTE, postgres was able to optimise and run the function only for the returned rows. So I changed our view to use CTE. And also, for the duplicated functions issue, I used the approach you suggested and it worked. Thank you! ❤️

@benjie
Copy link
Owner

benjie commented Jun 23, 2025

I've also created a plugin to enable you to @forbidInlining of a view/table:

graphile/crystal#2590

@achintha-weerasinghe
Copy link
Author

Our code has evolved a bit and the view is now a materialized view. Anyway view was not a problem anymore at least for now. But after upgrading to beta-45, this issue re-occurred (SQL function duplication) taking too long query execution times and loading the db. But I was able to resolve it using forbidInlining helper on function call steps as you have suggested above. Caching approach didn't work. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants