The pprof report shows high memory usage in the following areas:
- 281.63MB 25.71% 25.71% 349.70MB 31.92% github.com/anchore/syft/internal/task.finalizePkgCatalogerResults
- 223.01MB 20.36% 46.06% 281.51MB 25.70% github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest
(pprof) top10
Showing nodes accounting for 853MB, 77.86% of 1095.56MB total
Dropped 452 nodes (cum <= 5.48MB)
Showing top 10 nodes out of 143
flat flat% sum% cum cum%
281.63MB 25.71% 25.71% 349.70MB 31.92% github.com/anchore/syft/internal/task.finalizePkgCatalogerResults
223.01MB 20.36% 46.06% 281.51MB 25.70% github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest
104.57MB 9.54% 55.61% 104.57MB 9.54% github.com/anchore/syft/syft/file.(*LocationSet).Add
66.50MB 6.07% 61.68% 66.50MB 6.07% bufio.(*Scanner).Text (inline)
55.55MB 5.07% 66.75% 63.06MB 5.76% github.com/anchore/syft/syft/pkg.(*LicenseSet).Add
38.51MB 3.52% 70.26% 38.51MB 3.52% github.com/anchore/syft/syft/file.Location.WithAnnotation
30.21MB 2.76% 73.02% 30.21MB 2.76% github.com/google/licensecheck/internal/match.(*dfaBuilder).add
23.50MB 2.15% 75.17% 24MB 2.19% fmt.Sprintf
17.01MB 1.55% 76.72% 36.11MB 3.30% github.com/anchore/syft/syft/pkg.(*Collection).addToIndex
12.50MB 1.14% 77.86% 12.50MB 1.14% github.com/klauspost/compress/zip.readDirectoryHeader
Looking at github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest, the string concatenation into sections on line 55 (continuation lines) accounts for roughly 155MB:
(pprof) list github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest
Total: 1.07GB
ROUTINE ======================== github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest in pkg/cataloger/java/parse_java_manifest.go
223.01MB 281.51MB (flat, cum) 25.70% of Total
. . 20:func parseJavaManifest(path string, reader io.Reader) (*pkg.JavaManifest, error) {
1.50MB 1.50MB 21: var manifest pkg.JavaManifest
. . 22: sections := make([]pkg.KeyValues, 0)
. . 23:
. . 24: currentSection := func() int {
. . 25: return len(sections) - 1
. . 26: }
. . 27:
. . 28: var lastKey string
. . 29: scanner := bufio.NewScanner(reader)
. . 30:
. . 31: for scanner.Scan() {
. 58.50MB 32: line := scanner.Text()
. . 33:
. . 34: // empty lines denote section separators
. . 35: if line == "" {
. . 36: // we don't want to allocate a new section map that won't necessarily be used, do that once there is
. . 37: // a non-empty line to process
. . 38:
. . 39: // do not process line continuations after this
. . 40: lastKey = ""
. . 41:
. . 42: continue
. . 43: }
. . 44:
. . 45: if line[0] == ' ' {
. . 46: // this is a continuation
. . 47:
. . 48: if lastKey == "" {
. . 49: log.Debugf("java manifest %q: found continuation with no previous key: %q", path, line)
. . 50: continue
. . 51: }
. . 52:
. . 53: lastSection := sections[currentSection()]
. . 54:
155.38MB 155.38MB 55: sections[currentSection()][len(lastSection)-1].Value += strings.TrimSpace(line)
. . 56:
. . 57: continue
. . 58: }
. . 59:
. . 60: // this is a new key-value pair
. . 61: idx := strings.Index(line, ":")
. . 62: if idx == -1 {
. . 63: log.Debugf("java manifest %q: unable to split java manifest key-value pairs: %q", path, line)
. . 64: continue
. . 65: }
. . 66:
. . 67: key := strings.TrimSpace(line[0:idx])
. . 68: value := strings.TrimSpace(line[idx+1:])
. . 69:
. . 70: if key == "" {
. . 71: // don't attempt to add new keys or sections unless there is a non-empty key
. . 72: continue
. . 73: }
. . 74:
. . 75: if lastKey == "" {
. . 76: // we're entering a new section
4.58MB 4.58MB 77: sections = append(sections, make(pkg.KeyValues, 0))
. . 78: }
. . 79:
61.55MB 61.55MB 80: sections[currentSection()] = append(sections[currentSection()], pkg.KeyValue{
. . 81: Key: key,
. . 82: Value: value,
. . 83: })
. . 84:
. . 85: // keep track of key for potential future continuations
In github.com/anchore/syft/internal/task.finalizePkgCatalogerResults, roughly 281MB is attributed to line 89, which appends generated CPEs to each package's CPE slice:
(pprof) list github.com/anchore/syft/internal/task.finalizePkgCatalogerResults
Total: 1.07GB
ROUTINE ======================== github.com/anchore/syft/internal/task.finalizePkgCatalogerResults in task/package_task_factory.go
281.63MB 349.70MB (flat, cum) 31.92% of Total
. . 75:func finalizePkgCatalogerResults(cfg CatalogingFactoryConfig, resolver file.PathResolver, catalogerName string, pkgs []pkg.Package, relationships []artifact.Relationship) ([]pkg.Package, []artifact.Relationship) {
. . 76: for i, p := range pkgs {
. . 77: if p.FoundBy == "" {
. . 78: p.FoundBy = catalogerName
. . 79: }
. . 80:
. . 81: if cfg.DataGenerationConfig.GenerateCPEs && !hasAuthoritativeCPE(p.CPEs) {
. . 82: // generate CPEs (note: this is excluded from package ID, so is safe to mutate)
. . 83: // we might have binary classified CPE already with the package so we want to append here
. 3.51MB 84: dictionaryCPEs, ok := cpeutils.DictionaryFind(p)
. . 85: if ok {
. . 86: log.Tracef("used CPE dictionary to find CPEs for %s package %q: %s", p.Type, p.Name, dictionaryCPEs)
. . 87: p.CPEs = append(p.CPEs, dictionaryCPEs...)
. . 88: } else {
281.63MB 289.63MB 89: p.CPEs = append(p.CPEs, cpeutils.Generate(p)...)
. . 90: }
. . 91: }
. . 92:
. . 93: // if we were not able to identify the language we have an opportunity
. . 94: // to try and get this value from the PURL. Worst case we assert that
. . 95: // we could not identify the language at either stage and set UnknownLanguage
. . 96: if p.Language == "" {
. . 97: p.Language = pkg.LanguageFromPURL(p.PURL)
. . 98: }
. . 99:
. . 100: if cfg.RelationshipsConfig.PackageFileOwnership {
. . 101: // create file-to-package relationships for files owned by the package
. . 102: owningRelationships, err := packageFileOwnershipRelationships(p, resolver)
. . 103: if err != nil {
. . 104: log.Debugf("unable to create any package-file relationships for package name=%q type=%q: %v", p.Name, p.Type, err)
. . 105: } else {
. . 106: relationships = append(relationships, owningRelationships...)
. . 107: }
. . 108: }
. . 109:
. . 110: // we want to know if the user wants to preserve license content or not in the final SBOM
. . 111: // note: this looks incorrect, but pkg.License.Content is NOT used to compute the Package ID
. . 112: // this does NOT change the reproducibility of the Package ID
. 56.55MB 113: applyLicenseContentRules(&p, cfg.LicenseConfig)
. . 114:
. . 115: pkgs[i] = p
. . 116: }
. . 117: return pkgs, relationships
. . 118:}
PR #4624 reduces the number of string allocations in github.com/anchore/syft/syft/pkg/cataloger/java.parseJavaManifest.