Add support for Text fragment feature (#1545) #1600

Open: wants to merge 12 commits into base: master

@thiru-appitap thiru-appitap commented Dec 25, 2024

Text Fragment feature implementation pull request. The feature follows the published URL Fragment Text Directives specification (https://wicg.github.io/scroll-to-text-fragment/).

lychee -vv --include-text-fragments  https://developer.mozilla.org/en-US/docs/Web/URI/Fragment/Text_fragments
[DEBUG] tdirective: "text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] status: Completed
[DEBUG] result: "From the foregoing remarks we may gather an idea of the importance"
[200] https://mdn.github.io/css-examples/target-text/index.html#:~:text=From%20the%20foregoing%20remarks%20we%20may%20gather%20an%20idea%20of%20the%20importance
[DEBUG] tdirective: "text=linked%20URL,-'s%20format"
[DEBUG] status: Completed
[DEBUG] result: "linked URL"
[DEBUG] tdirective: "text=Deprecated-,attributes,attribute"
[DEBUG] status: Completed
[DEBUG] result: "attributes     charset Deprecated   Hinted at the character encoding of the linked URL.   Note:This attribute"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute
[DEBUG] tdirective: "text=downgrade:-,The%20Referer,be%20sent,-to%20origins"
[DEBUG] status: Completed
[DEBUG] result: "The Referer header will not be sent"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=downgrade:-,The%20Referer,be%20sent,-to%20origins
[DEBUG] tdirective: "text=linked%20URL,defining%20a%20value"
[DEBUG] status: Completed
[DEBUG] result: "linked URL as a download. Can be used with or without a filename value:    Without a value, the browser will suggest a filename/extension, generated from various sources:   The Content-Disposition HTTP header  The final segment in the URL path  The media type (from the Content-Type header, the start of a data: URL, or Blob.type for a blob: URL)     filename: defining a value"
[200] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#:~:text=linked%20URL,defining%20a%20value

If the fragment directive is not found, a TextDirectiveNotFound error will be returned.

The following changes are complete:

  1. The fragment directive parser uses fancy-regex
  • this package was added as a dependency
  2. A new flag, include-text-fragments, was added to support the feature
  • this deviates from the original feature request (which asked for using the text-fragments flag itself)
  3. The fragment (text) directive feature has been tested on LTR sites only
  4. A new UrlExt trait extends Url to support fragment directives
  5. Multiple text fragment directives are supported (for example, #:~:text=linked%20URL,-'s%20format&text=Deprecated-,attributes,attribute)
  6. Tests were added to validate the feature
  7. cargo clippy and cargo test were executed
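
As an illustration of how a URL with several text directives breaks down, here is a minimal Rust sketch using only the standard library. It is not the PR's implementation (which relies on fancy-regex and a UrlExt trait): the function name `raw_text_directives` is hypothetical, and the two constants only mirror the delimiter values quoted later in this thread. It splits off the fragment directive after `:~:` and returns the still percent-encoded value of each `text=` directive.

```rust
/// Separates the ordinary fragment from the fragment directive.
const FRAGMENT_DIRECTIVE_DELIMITER: &str = ":~:";
/// Marks a text directive inside the fragment directive.
const TEXT_DIRECTIVE_DELIMITER: &str = "text=";

/// Returns the raw (percent-encoded) value of every `text=` directive in `url`.
/// Hypothetical helper for illustration; not part of lychee.
fn raw_text_directives(url: &str) -> Vec<&str> {
    // Everything after the first '#' is the fragment.
    let Some((_, fragment)) = url.split_once('#') else {
        return Vec::new();
    };
    // Everything after the first ":~:" is the fragment directive.
    let Some((_, directive)) = fragment.split_once(FRAGMENT_DIRECTIVE_DELIMITER) else {
        return Vec::new();
    };
    // Directives are separated by '&'; keep only the `text=` ones.
    directive
        .split('&')
        .filter_map(|part| part.strip_prefix(TEXT_DIRECTIVE_DELIMITER))
        .collect()
}
```

Called on the multi-directive URL from the list above, this yields two entries, one per `text=` directive. A URL with a plain fragment (or none) yields an empty list.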

@thiru-appitap
Author

I missed running clippy on the test modules. The related lint failures are now fixed, and the PR is ready for review!

);
match url {
Ok(url) => {
eprintln!(
Member

This could be an assertion

Author

Modified the tests to use asserts instead of the earlier manual checks.

Comment on lines 108 to 125
let mut status = Status::new(&response, self.accepted.clone());
if self.validate_text_fragments && has_fragment_directive {
    if let Ok(res) = response.text().await {
        info!("checking fragment directive...");
        if let Some(fd) = req_url.fragment_directive() {
            info!("directive: {:?}", fd.text_directives);
            match fd.check(&res) {
                Ok(stat) => {
                    status = stat;
                }
                Err(e) => {
                    return e.into();
                }
            }
        }
    }
}
status
Member

Maybe we can move that into a function/method?

Member

some tests for that part would also be nice

Author

The code now moves this into a separate function. In addition, the bulk of the logic is abstracted into the textfrag crate itself, so that the website checker only deals with error responses from the text fragment checker function.

assert!(res.status().is_success());

// start with suffix
println!("\ntesting start with suffix...");
Member

you'll probably remove the println!s right?

Author

yes - the println's are removed now


use crate::types::TextDirective;

const BLOCK_ELEMENTS: &[&str] = &[
Member

This is a long list. Does that mean we'd have to maintain the HTML keywords here? Maybe we can avoid that as it would be an uphill battle.

Author

I have yet to explore an alternative approach and am open to suggestions. For now, the list is retained (reserving this change for a near-future commit).

Member

I like that this functionality is isolated in its own module. But it's a looot of code. 😅 Not sure what to do here, but at least the ratio of code/tests could be improved.

Member

Perhaps we could move it into a separate crate or use an upstream crate for that? I think it would be a nice library to maintain individually as more applications could profit from it

Author

Text fragment support has now moved into its own crate, textfrag, within the main lychee tree. As I am new to the ecosystem, I'll need help moving it into a separate crate. Please advise!

let mut all_directives_found = false;
let directive = td.directive.borrow();

'directive_loop: while !all_directives_found {
Member

The labels make it quite hard to read. Have you considered any alternatives?

Author

Yes, the logic was cumbersome and I've now rewritten it. Please share your comments!

pub(crate) const FRAGMENT_DIRECTIVE_DELIMITER: &str = ":~:";
pub(crate) const TEXT_DIRECTIVE_DELIMITER: &str = "text=";

pub(crate) const TEXT_DIRECTIVE_REGEX: &str = r"(?s)^text=(?:\s*(?P<prefix>[^,&-]*)-\s*[,$]?\s*)?(?:\s*(?P<start>[^-&,]*)\s*)(?:\s*,\s*(?P<end>[^,&-]*)\s*)?(?:\s*,\s*-(?P<suffix>[^,&-]*)\s*)?$";
Member

Where does that regex come from? Did you write it yourself? If there's an "official" regex for those text fragments, we could perhaps add a link to the reference.

Author

I built this regex myself. I searched for an equivalent when I started the implementation but couldn't find one. I am open to revisiting this.

In fact, I want to replace the regex with a simple parser. The specification is not particular about the order of the directives, whereas the regex imposes a fixed order; with a parser, we might be able to get away from the ordering constraints.
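
A minimal sketch of what such a parser could look like, under stated assumptions: it takes the value of one `text=` directive, classifies each comma-separated term by its dash marker (a trailing `-` marks the prefix, a leading `-` marks the suffix) rather than by position alone, and leaves percent-decoding to the caller. `TextDirectiveParts` and `parse_text_directive` are hypothetical names, not part of the PR.

```rust
/// Components of one text directive, still percent-encoded.
/// Hypothetical type for illustration only.
#[derive(Debug, Default, PartialEq)]
struct TextDirectiveParts {
    prefix: Option<String>,
    start: Option<String>,
    end: Option<String>,
    suffix: Option<String>,
}

/// Parses the value of a `text=` directive (the part after `text=`).
/// Returns `None` if a term is repeated, if there are more than two
/// unmarked terms, or if the mandatory start term is missing or empty.
fn parse_text_directive(value: &str) -> Option<TextDirectiveParts> {
    let mut parts = TextDirectiveParts::default();
    for term in value.split(',') {
        if let Some(prefix) = term.strip_suffix('-') {
            // A trailing dash marks the prefix; reject a second one.
            if parts.prefix.replace(prefix.to_string()).is_some() {
                return None;
            }
        } else if let Some(suffix) = term.strip_prefix('-') {
            // A leading dash marks the suffix; reject a second one.
            if parts.suffix.replace(suffix.to_string()).is_some() {
                return None;
            }
        } else if parts.start.is_none() {
            // The first unmarked term is the start of the match.
            parts.start = Some(term.to_string());
        } else if parts.end.is_none() {
            // The second unmarked term is the end of the match.
            parts.end = Some(term.to_string());
        } else {
            return None;
        }
    }
    // The start term is the only mandatory component.
    if parts.start.as_deref().map_or(true, str::is_empty) {
        return None;
    }
    Some(parts)
}
```

Literal `-`, `,` and `&` characters inside a term are expected to arrive percent-encoded, which is why splitting on the raw characters is safe in this sketch.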

@@ -23,10 +26,61 @@ pub(crate) fn find_links(input: &str) -> impl Iterator<Item = linkify::Link> {
LINK_FINDER.links(input)
}

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
Member

Nice idea!

Author

Thank you - this has now moved into textfrag crate

@@ -23,10 +26,61 @@ pub(crate) fn find_links(input: &str) -> impl Iterator<Item = linkify::Link> {
LINK_FINDER.links(input)
}

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
Member

Suggested change
- /// we will use the extension trait pattern to extend the Url to support Text Fragment feature
+ /// We will use the extension trait pattern to extend [`url::Url`] to support the text fragment feature

Author

Incorporated this comment into the textfrag::utils::url file

/// Fragment Directive feature trait
/// we will use the extension trait pattern to extend the Url to support Text Fragment feature
pub(crate) trait UrlExt {
/// Checks if the url has a fragment and if the fragment is has the fragment directive delimiter embedded
Member

Suggested change
- /// Checks if the url has a fragment and if the fragment is has the fragment directive delimiter embedded
+ /// Checks if the url has a fragment and if the fragment has the fragment directive delimiter embedded

Author

Updated the comments section (with slight rewording) in the textfrag crate

}

impl UrlExt for Url {
/// Returns whether the URL has fragment directive or not
Member

Suggested change
- /// Returns whether the URL has fragment directive or not
+ /// Checks whether the URL has fragment directive or not

Author

Addressed this in textfrag::utils::url[:16]

Member

@mre left a comment

Left some comments. I like the overall structure. Good work so far!

@thiru-appitap
Author

@mre
I'll take a look at each of the comments and work to address them. Thank you!

@mre
Member

mre commented Feb 5, 2025

@thiru, any updates? Let me know in case you need any help. 😃

@thiru-appitap
Author

@mre, I will submit the fixes by this weekend for your review. I had started moving the feature into a separate crate while addressing your review comments, but got pulled into a few other tasks and couldn't get back to it sooner :-(.

@mre
Member

mre commented Feb 6, 2025

Thanks, sounds good!

@thiru-appitap
Author

@mre my apologies for the delay. While refactoring into a separate crate, I encountered a few corner-case issues, and it took more time than planned. I'm running the tests now and hope to re-raise the pull request by tomorrow!

@thiru-appitap
Author

I am committing the changes into my fork and will initiate a pull-request shortly!

lib's website checker continues to have the logic to validate text fragments
clean-up of the tests was done
review feedback incorporated - addressed structural, logic, tests and document comments
@thiru-appitap
Author

@mre I'd like your help with the CI / publish-check failure. Please let me know if I need to make any changes on my end. Thank you!

@@ -85,10 +94,51 @@ impl WebsiteChecker {
status
}

fn check_text_fragments(site_data: &str, url: &Url, mut status: Status) -> Status {
let res = check_text_fragments(site_data, url);
if res.is_err() {
Member

You don't need the if condition here. You're matching on res.err() below, and in the case where res.err() is not set, it will be None. This case is already covered in your _res match arm since _res is just a placeholder, which also includes the value being None.
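
To make the point above concrete, here is a small sketch of matching on `res.err()` without a preceding `is_err()` check. The `Status` enum, the `TextDirectiveNotFound` struct, and the function name `apply_fragment_check` are hypothetical stand-ins, assumed only for illustration; the real types in lychee differ.

```rust
/// Hypothetical stand-in for lychee's `Status` type.
#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Error(String),
}

/// Hypothetical stand-in for the error returned when a directive is not found.
#[derive(Debug)]
struct TextDirectiveNotFound(String);

/// Folds the outcome of a text fragment check into the response status.
fn apply_fragment_check(res: Result<(), TextDirectiveNotFound>, status: Status) -> Status {
    // No `if res.is_err()` guard is needed: on success `res.err()` is `None`,
    // and the `None` arm simply keeps the status we already had.
    match res.err() {
        Some(TextDirectiveNotFound(directive)) => {
            Status::Error(format!("text directive not found: {directive}"))
        }
        None => status,
    }
}
```

Because the `match` is exhaustive over `Option`, the success path is handled in the same expression and the outer condition becomes redundant.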

Author

Yes - I've updated the code

///
/// # Errors
/// - `TextDirectiveNotFound`, if text directive match fails
// fn check_fragment_directive(&self, buf: &str) -> Result<TextFragmentStatus, TextFragmentError> {
Member

Suggested change
- // fn check_fragment_directive(&self, buf: &str) -> Result<TextFragmentStatus, TextFragmentError> {

Author

clean-up done!

buf: &str,
) -> Result<FragmentDirectiveStatus, FragmentDirectiveError> {
let mut map = HashMap::new();
let fd_checker = FragmentDirectiveTokenizer::new(self.text_directives()); // self.text_directives().clone());
Member

Suggested change
- let fd_checker = FragmentDirectiveTokenizer::new(self.text_directives()); // self.text_directives().clone());
+ let fd_checker = FragmentDirectiveTokenizer::new(self.text_directives());

Author

clean-up done!

for td in &tok.sink.get_text_directives() {
let directive = td.raw_directive().to_string();
log::debug!("text directive: {:?}", directive);
println!("text directive: {:?}", directive);
Member

Suggested change
- println!("text directive: {:?}", directive);

Author

removed unnecessary println!'s


let _status = status.to_string();
log::debug!("search status: {:?}", status);
println!("search status: {:?}", status);
Member

Suggested change
- println!("search status: {:?}", status);

Author

removed unnecessary println!'s


let res_str = td.get_result_str();
log::debug!("search result: {:?}", res_str);
println!("search result: {:?}", res_str);
Member

Suggested change
- println!("search result: {:?}", res_str);

Author

removed unnecessary println!'s

Comment on lines 199 to 200
// assert_eq!(results.len(), 1);
// assert_eq!(results[FRAGMENT], Ok(TextDirectiveStatus::Completed));
Member

Suggested change
- // assert_eq!(results.len(), 1);
- // assert_eq!(results[FRAGMENT], Ok(TextDirectiveStatus::Completed));

Author

clean-up done!

Comment on lines 223 to 224
// assert_eq!(res.len(), 1);
// assert_eq!(res[FRAGMENT], TextDirectiveStatus::Completed);
Member

Suggested change
- // assert_eq!(res.len(), 1);
- // assert_eq!(res[FRAGMENT], TextDirectiveStatus::Completed);

Author

clean-up done!

Comment on lines 247 to 248
// assert_eq!(results.len(), 1);
// assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);
Member

Suggested change
- // assert_eq!(results.len(), 1);
- // assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);

Author

clean-up done!

Comment on lines 271 to 272
// assert_eq!(results.len(), 1);
// assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);
Member

Suggested change
- // assert_eq!(results.len(), 1);
- // assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);

Author

clean-up done!

Comment on lines 295 to 296
// assert_eq!(results.len(), 1);
// assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);
Member

Suggested change
- // assert_eq!(results.len(), 1);
- // assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);

Author

clean-up done!

Comment on lines 320 to 321
// assert!(results.len() == 1);
// assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);
Member

Suggested change
- // assert!(results.len() == 1);
- // assert_eq!(results[FRAGMENT], TextDirectiveStatus::Completed);

Author

clean-up done!


/// [Internal] the use of regular expression does not comply with the specification
/// To be used for testing purposes only
fn _check(&self, input: &str) -> Result<FragmentDirectiveStatus, TextFragmentError> {
Member

Using underscores for internal methods is rather uncommon. It's clear enough that there's no pub in front.

Suggested change
- fn _check(&self, input: &str) -> Result<FragmentDirectiveStatus, TextFragmentError> {
+ fn check(&self, input: &str) -> Result<FragmentDirectiveStatus, TextFragmentError> {

Author

Changed the function name; to suppress the resulting warning, I've also added the #[allow(dead_code)] attribute.

Author

renamed the function per feedback

mod error;
mod status;
mod url;
// mod frag_directive;
Member

Suggested change
- // mod frag_directive;

Author

clean-up done!

pub use error::*;
pub use status::*;
pub use url::*;
// pub use frag_directive::*;
Member

Suggested change
- // pub use frag_directive::*;

Author

clean-up done

Comment on lines 19 to 20
// FragmentDirectiveStatus::PartialOk(m) => write!(f, "Partial Ok {:?}", m),
// FragmentDirectiveStatus::Error(e) => write!(f, "Error: {:?}", e),
Member

What about these?

Author

Taken care of now. I've made sure that all the unwanted code has been removed entirely.

@@ -0,0 +1,78 @@
/// Defines the status of the Text Fragment search and extraction/search operation status
Member

Suggested change
- /// Defines the status of the Text Fragment search and extraction/search operation status
+ //! Defines the status of the Text Fragment search and extraction/search operation status

Author

The refactored code takes care of this now!


use crate::types::{FragmentDirective, FRAGMENT_DIRECTIVE_DELIMITER};

/// Fragment Directive feature trait
Member

Suggested change
- /// Fragment Directive feature trait
+ /// Fragment Directive extension trait

Author

updated the comment

Member

Maybe we don't need that file and can move the code closer to where it's used? (Assuming it's only used in one place.)

Author

removed the folder as part of the refactoring exercise

Member

This module is so large, I'd assume a relatively large doc-block as well.
In there, I'd answer a few questions:

  • What does the module do?
  • How is it supposed to be used (including an example)?
  • What were the design tradeoffs?
  • What are the possible error conditions?
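
One possible shape for such a doc-block, sketched with inner doc comments (`//!`). The module name `textfrag_sketch`, the section wording, and the `has_fragment_directive` helper are assumptions for illustration; the trade-offs and the error name are taken from statements elsewhere in this thread, not from the documentation that was eventually committed.

```rust
// Hypothetical skeleton: the module name, section wording, and helper below are
// assumptions for illustration, not the documentation that was committed.
pub mod textfrag_sketch {
    //! Validates URL text fragments (`#:~:text=...`) against fetched page content.
    //!
    //! # What it does
    //! Extracts the text directives from a URL fragment and checks whether each
    //! one can be found in the page body.
    //!
    //! # Usage
    //! Call [`has_fragment_directive`] on a URL fragment before running the
    //! (more expensive) text search.
    //!
    //! # Design trade-offs
    //! - Block-level HTML elements are tracked in a hand-maintained list.
    //! - The feature has been tested on left-to-right (LTR) sites only.
    //!
    //! # Errors
    //! A directive that cannot be located is reported as `TextDirectiveNotFound`.

    /// Returns `true` if `fragment` contains the fragment directive delimiter `:~:`.
    pub fn has_fragment_directive(fragment: &str) -> bool {
        fragment.contains(":~:")
    }
}
```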

Author

working on it - will commit this documentation for review!


use crate::types::{TextDirective, TextDirectiveKind, TextDirectiveStatus};

const BLOCK_ELEMENTS: &[&str] = &[
Member

I'm still not too happy that we have to maintain this list.
Looked around for alternatives, and I found https://github.com/servo/html5ever/blob/main/html5ever/src/tree_builder/tag_sets.rs.
Not sure if it can be used, but I wanted to mention it.
If it's too much hassle to integrate, we can keep the current implementation.

Author

Unfortunately, html5ever does not expose these macros for us to consume, so I couldn't use it. I don't want to keep a copy of that file inside the lychee repository, so I'm ruling that out. I also searched for alternative approaches and found that we would either need a headless browser to identify the element type (as block) or stay with the current approach.
My recommendation, for now, is to keep this code. If html5ever makes these macros public in the future, we could be freed from maintaining this list.
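
For readers who have not seen the code, the approach being kept amounts to a constant list plus a membership check. The sketch below assumes a small, abbreviated subset of tag names and a hypothetical `is_block_element` helper; the actual list in the PR is longer and its contents are not reproduced here.

```rust
/// Abbreviated, illustrative subset of block-level HTML element names.
/// The real list in the PR is longer; these entries are an assumption.
const BLOCK_ELEMENTS: &[&str] = &[
    "article", "aside", "blockquote", "div", "footer", "header", "li", "ol", "p", "pre",
    "section", "table", "ul",
];

/// Returns `true` if `tag` names one of the listed block-level elements.
/// The comparison ignores ASCII case because HTML tag names are case-insensitive.
fn is_block_element(tag: &str) -> bool {
    BLOCK_ELEMENTS
        .iter()
        .any(|block| block.eq_ignore_ascii_case(tag))
}
```

The maintenance cost the reviewer mentions comes from the constant itself: any element missing from the list is silently treated as inline.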

@mre
Member

mre commented Feb 14, 2025

I've added a few more comments.
Most of them are minor. I think it should be fairly easy to go through them and accept/reject my suggestions from the GitHub UI.

As for the error, it's currently failing because there is no textfrag crate on crates.io. That is to be expected, because we haven't published it yet. However, the question is if we do want to publish it at all.
cargo publish expects all dependencies to be published on crates.io. We have three options:

  1. Publish the crate under lychee-textfrag. This would require renaming the crate to fit our naming scheme.
  2. Publish the crate as a separate project, e.g. textfrag. Then we'd have to move the code out into a separate repo and perhaps add a CI/CD process for releases. We could keep it in the lycheeverse namespace, or you publish it under your own name. The license has to be compatible (Apache/MIT).
  3. Don't publish the crate. In that case, we'd have to move the textfrag crate as a submodule into the lychee-lib crate. The advantage is that we don't have to set up a separate repo, which will make future changes easier.

I'd vote for option 3, which is the easiest right now and allows us to keep maintaining the code inside lychee. It's still reasonably well encapsulated in its own module and we can always make it a separate crate later once the code is mature enough. The downside is longer compile-times because it would be in the same codegen unit as the rest of the library code.

Detailed instructions for option 3

To merge the textfrag code directly into lychee-lib, here's what you could do:

  1. Move the textfrag code into lychee-lib, perhaps in a submodule like lychee-lib/src/textfrag/
  2. Remove textfrag from your workspace members in the root Cargo.toml
  3. Update any imports in lychee-lib to reference the new module location instead of the external crate

This approach has several benefits:

  • Simplifies your publishing process - no need to maintain and publish a separate crate
  • Makes it clear that this is internal implementation code
  • Gives you more flexibility to change the code without worrying about breaking other potential users

Let me know what you think.

@thiru-appitap
Author

@mre I understand and agree with your recommendation. I am addressing this along with the other review comments and will republish for review. Thanks for your patience!

@mre
Member

mre commented Feb 20, 2025

@thiru-appitap, I saw that you did some work. Can you close the resolved conversations already? It becomes a bit hard to keep track of the open TODOs. 😉
Also, did you use the "commit suggestion" feature from GitHub? For some of the changes, I made suggestions, which are easy to merge with a single click. This way, changes should be much easier to handle on your end. Of course, you have to git pull the changes locally to be up-to-date.

thiru-appitap and others added 3 commits February 21, 2025 12:55
- moved (back) the textfrag as a module into the lychee-lib crate
- added documentation and ran doctests to verify it works
- added a cli test to validate the text fragment functionality
@thiru-appitap
Author

@mre I was juggling a couple of priorities along with local travel, so I couldn't resolve and commit the changes earlier. From now on I should be able to turn feedback around much faster.
The changes based on the feedback are now all done (fingers crossed). I have also tried to adopt an idiomatic style as much as I could. I'm willing to learn more, so please keep the feedback coming!

@almereyda

This branch kindly asks for a rebase.

Running cargo install --branch text-fragment --git https://github.com/thiru-appitap/lychee.git and then using lychee --include-text-fragments shows that this works in principle.

Would be nice to see this arrive.

@thiru-appitap
Author

I have rebased the branch onto the latest changes but am facing lint failures. I'm working on fixing them and will recommit by this weekend.

3 participants