[Good first issue | Enhance] Add column description info in LLM prompt #142

@MooooCat

🚅Search before asking

I have searched for issues similar to this one.

🚅Description

This good first issue (GFI) asks sdgx to derive a description of each column from raw_data, or from data sampled from raw_data, and to return it as a plain-text string that an LLM can understand. The text should include, but not be limited to:

  • Column type: float, int, category, datetime, etc.;
  • For numeric columns: maximum, minimum, mean, standard deviation, distribution, etc.;
  • For datetime columns: start and end dates, datetime format, etc.;
  • For category columns: the specific category values, the number of distinct categories, etc.;
  • For ID columns: ID category, format, etc.;
  • For other column types: any further information the developer considers necessary.

🏕Solution

Implement the _form_columns_description method of sdgx.models.LLM.single_table.base.LLMBaseModel:

def _form_columns_description(self):

This method returns a string.

For implementation ideas, developers can refer to the existing _form_message_with_offtable_features and _form_dataset_description methods.
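As a starting point, here is a minimal, standard-library-only sketch of the kind of text such a method might produce. It is not the sdgx implementation: the helper names `describe_column` and `form_columns_description`, the constant `MAX_CATEGORIES_SHOWN`, the dict-of-lists input, and the type-detection rules are all assumptions made for illustration. Inside LLMBaseModel the data would more likely come from the model's own data or metadata objects; a pandas DataFrame could be converted to this input shape with `df.to_dict(orient="list")`.

```python
import statistics
from datetime import date

# Hypothetical limit on how many category values are listed in the prompt.
MAX_CATEGORIES_SHOWN = 10


def describe_column(name, values):
    """Return a one-line description of one column (illustrative helper)."""
    present = [v for v in values if v is not None]
    if not present:
        return f"Column '{name}': no non-missing values."

    # Numeric columns: bool is a subclass of int, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        kind = "int" if all(isinstance(v, int) for v in present) else "float"
        std = statistics.stdev(present) if len(present) > 1 else 0.0
        return (
            f"Column '{name}': {kind} type, min {min(present)}, max {max(present)}, "
            f"mean {statistics.mean(present):.2f}, standard deviation {std:.2f}."
        )

    # Date or datetime columns: require one concrete type so min/max are comparable.
    if all(isinstance(v, date) for v in present) and len({type(v) for v in present}) == 1:
        return (
            f"Column '{name}': datetime type, from {min(present).isoformat()} "
            f"to {max(present).isoformat()}."
        )

    # Everything else is treated as a category column in this sketch.
    categories = sorted({str(v) for v in present})
    shown = ", ".join(categories[:MAX_CATEGORIES_SHOWN])
    suffix = ", ..." if len(categories) > MAX_CATEGORIES_SHOWN else ""
    return (
        f"Column '{name}': category type, {len(categories)} distinct values "
        f"({shown}{suffix})."
    )


def form_columns_description(columns):
    """Join the per-column descriptions into one string, one column per line."""
    return "\n".join(describe_column(name, values) for name, values in columns.items())


if __name__ == "__main__":
    sample = {
        "age": [18, 30, 42],
        "city": ["Paris", "Rome", "Paris"],
        "joined": [date(2020, 1, 1), date(2021, 6, 15), None],
    }
    print(form_columns_description(sample))
```

ID columns, datetime format strings, and distribution summaries are left out of the sketch; they would need rules specific to the dataset and to whatever metadata sdgx already tracks.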

