Skip to content

Carry anchor text with link for parsing #6329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

uranusjr
Copy link
Member

Close #6272. The idea is to pass the text along with the collected link for parsing.

@cjerdonek
Copy link
Member

Thanks for the PR! Could you include a reduced failing test that would pass with this change (e.g. one where the anchor text is different)?

I think HTMLPage.iter_links() (HTMLPage.iter_anchors() in your PR) would be a good candidate as it would let you start with HTML and check that the correct info is being parsed.

It looks like you also need to update some tests, anyways.

@uranusjr
Copy link
Member Author

Tests fixed to match new internal function names. I updated the fixtures to make it explicit that the URL does not matter during link searching.

@cjerdonek
Copy link
Member

@uranusjr I made my comment above before seeing @dstufft's comment here: #6272 (comment)

Does that comment mean we should be changing the PEP to match pip's behavior rather than vice versa? Is there consensus yet on how to proceed? (I gather this PR is in the direction of changing pip's behavior.)

@uranusjr
Copy link
Member Author

I am not sure. There are no real consequences either way (alternative PEP 503 servers and clients will likely continue to work either way). Conceptually I feel the restriction is unnecessary and prefer to change pip’s behaviour. It would be a lot easier to change the PEP, however, so I’m fine with that too.

@cjerdonek
Copy link
Member

Conceptually I feel the restriction is unnecessary

By this, you mean you think it should be okay for the anchor text to differ from the filename portion of the URL?

On the PR itself, would some sort of validation need to be done on the anchor text (anchor.text) to ensure e.g. that it isn't a path with more than one path component? (When the URL is used, this aspect would get satisfied automatically.)

@uranusjr
Copy link
Member Author

By this, you mean you think it should be okay for the anchor text to differ from the filename portion of the URL?

Yes. IMO there’s no reason why this shouldn’t be allowed:

<a href="/mypackage/dl/sdist/">mypackage-1.0.tar.gz</a>

On the PR itself, would some sort of validation need to be done on the anchor text (anchor.text) to ensure e.g. that it isn't a path with more than one path component? (When the URL is used, this aspect would get satisfied automatically.)

The text is parsed directly as either sdist or wheel (depending on the suffix), and a path would automatically fail and be skipped. I guess there is no harm performing explicit checks, but technically it is not needed. Maybe I should add a few tests for those cases?

@cjerdonek
Copy link
Member

Maybe I should add a few tests for those cases?

The only thing is I wouldn't want you to do extra work if we're not sure yet if people would be okay with the change. (Personally, I was just asking some questions since I'm not an expert in this area.)

@uranusjr
Copy link
Member Author

After some time to rethink I believe the best way to avoid unnecessary work is to revise the PEP instead. The behaviour does not have much practical effects either way, and it’s the safest to modify the spec to fit the implementation before anyone else notices :p

@uranusjr uranusjr closed this Mar 29, 2019
@lock
Copy link

lock bot commented May 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 28, 2019
@uranusjr uranusjr deleted the simple-api-compliance branch September 28, 2020 14:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTMLPage PEP 503 compliance
2 participants