Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream of Portage (Gentoo) Builder #40

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lwshanbd
Copy link
Contributor

No description provided.

Copy link

vercel bot commented Aug 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
llvm-ir-dataset-utils ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 22, 2024 9:15pm

@lwshanbd lwshanbd marked this pull request as ready for review August 22, 2024 21:15
@boomanaiden154 boomanaiden154 self-requested a review August 22, 2024 22:31
Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.


Unlike the above builders, Portage-related builders must be built in a Gentoo or Gentoo container.

If you are using a Gentoo container, please use `-v`(docker or podman) to map the host path where you want to store the data as `/data`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try and wrap lines at 80 characters.

Also, is there anything additional that needs to be done here for singularity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea for using gentoo container with singularity since I cannot use --usern on LC.

@@ -0,0 +1,15 @@
app-office_skrooge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask is a flag in Portage, which means this packages is "masked" so that you cannot build/install it.
Will remove it with the pkg.list since we have corresponding scripts.

@@ -0,0 +1,4135 @@
x11-libs/vte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a script to generate this and this file should not be checked in.

@@ -46,6 +46,27 @@ def generate_emerge_command(package_to_build, threads, build_dir):
return command_vector


def perform_build_again(package_name, assembled_build_command, corpus_dir,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a separate function to perform the build again? Why can't we just have build call itself recursively?

Also, why does this exist in the first place? Are you seeing build non-determinism? That should be fixed in upstream Gentoo (assuming feasible) rather than here.

logging.info(f"Finished build portage package {package_name}")
return True


def extract_ir(package_spec, corpus_dir, build_dir, threads):
# Not using the tmp directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the comment here explaining why?



if __name__ == "__main__":
if len(sys.argv) != 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull everything here into an actual function rather than leaving all the code under the if statement.


with open('../../corpus_descriptions_test/portage_pkg.list', 'r') as file:
for i, package_name in enumerate(file):
if i < a:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use more descriptive variable names than a and b.

print("Please provide integer values for <a> and <b>.")
sys.exit(1)

with open('../../corpus_descriptions_test/portage_pkg.list', 'r') as file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file path should be a flag value.

break
package_name = package_name.strip()
package_name = package_name.replace('_', '/', 1)
build(package_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this doesn't do any build distribution and just builds everything serially?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Portage locks its database while performing operations to prevent multiple processes from making changes simultaneously. If two parallel emerge processes attempt to access the database at the same time, it could result in a deadlock or other issues.
Again, because I don't have the correct container permissions (usern), I have to use the root user for system level installations of software using Gentoo. This can have unpredictable effects if parallelized.

# neither.json is a log analyzing the results of a package whose installation result type is neither.
# 'Y' means it has been built successfully.
# 'N' means its build failed.
with open('./portage-lists/neither.json') as neither_pkgs_files:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these file arguments be CLI flags? Or are they static on all Gentoo installations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the N/Y analysis here does not exist in checkin. Here is the N/Y that I got from analyzing the build log externally using jupyter-notebook+OpenAI API. maybe I should just delete these analyses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants