[FEA]: Clean up returned results objects #1768

### Is this a new feature, an improvement, or a change to existing functionality?

New Feature

### How would you describe the priority of this feature request

Currently preventing usage

### Please provide a clear description of problem this feature solves

Currently, `ingestor.ingest` (in batch mode) returns a very large amount of data from the Ray workers.

Returning it takes a long time, and the results include far more fields than needed.

### Describe the feature, and optionally a solution or implementation and any alternatives

Proposed fields for each returned result:

1. source_name - document name (filename)
2. source_location - fully qualified path to ingested file
3. raw_location - fully qualified path to the related page image, cropped images, or audio/video chunks or frames (related to #1714)
5. element_type - text / image / table / chart / infographic / audio / custom (can be populated by UDFs)
6. sequence_number - for citation references: the page number for PDFs, and the audio/video/text chunk number for other content types. "sequence_number" is less clear than "page_number" for PDFs, but "page_number" is unclear for non-PDF content types :)
7. bounding box - [x1, y1, x2, y2] coordinates, if applicable for image/table overlays
8. page dimensions - W/H for bbox normalization
9. content_type - top-level file/content type
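
To make the proposal concrete, here is a minimal sketch of what a trimmed result record could look like. It is an illustration only, not the library's actual API: the class name `SlimResult`, the snake_case names `bounding_box` and `page_dimensions`, the field types, and the sample paths are all assumptions. The `normalized_bbox` helper shows how page dimensions could be used for bbox normalization.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SlimResult:
    """Hypothetical trimmed result record; fields follow the list above."""

    source_name: str                 # document name (filename)
    source_location: str             # fully qualified path to the ingested file
    raw_location: Optional[str]      # path to related page image, crop, or chunk
    element_type: str                # text / image / table / chart / ...
    sequence_number: int             # page number for PDFs, chunk number otherwise
    content_type: str                # top-level file/content type
    # Assumed names and types for "bounding box" and "page dimensions".
    bounding_box: Optional[Tuple[float, float, float, float]] = None  # x1, y1, x2, y2
    page_dimensions: Optional[Tuple[float, float]] = None             # width, height

    def normalized_bbox(self) -> Optional[Tuple[float, float, float, float]]:
        """Scale the bbox into the 0..1 range, or return None if not applicable."""
        if self.bounding_box is None or self.page_dimensions is None:
            return None
        width, height = self.page_dimensions
        if width <= 0 or height <= 0:
            return None
        x1, y1, x2, y2 = self.bounding_box
        return (x1 / width, y1 / height, x2 / width, y2 / height)


# Sample values are made up for illustration.
example = SlimResult(
    source_name="report.pdf",
    source_location="/data/in/report.pdf",
    raw_location="/data/out/report/page_3.png",
    element_type="table",
    sequence_number=3,
    content_type="pdf",
    bounding_box=(100.0, 200.0, 300.0, 400.0),
    page_dimensions=(1000.0, 2000.0),
)

print(asdict(example))
print(example.normalized_bbox())
```

A flat record like this keeps the payload returned from each worker small and easy to serialize with `asdict`. Elements without an overlay, such as plain text chunks, can leave both optional fields as `None`.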
