Commit e9e8ade

spatten and csasarak authored
[ANE-1967] Recursive jars in containers (#1478)
Co-authored-by: Christopher Sasarak <[email protected]>
1 parent 666372e commit e9e8ade

11 files changed

+261
-29
lines changed

Cargo.lock

+1
Some generated files are not rendered by default.

Changelog.md

+1
@@ -13,6 +13,7 @@
 - Microsoft SQL Server 2019 Developer, 2019 Evaluation, and 2019 Express
 - Microsoft SQL Server 2022 Enterprise, Standard, Web
 - Viskoe.dk Terms of Use
+- Container scanning: Recursively find jars within jars ([#1478](https://github.com/fossas/fossa-cli/pull/1478))

 ## 3.9.37

docs/references/subcommands/container/scanner.md

+28-26
@@ -1,14 +1,14 @@
 # FOSSA's container scanner

-- [FOSSA's container scanner](#fossas-new-container-scanner)
-- [What's new in this scanner?](#whats-new-in-this-scanner)
+- [FOSSA's container scanner](#fossas-container-scanner)
+- [What's supported in FOSSA's container scanner?](#whats-supported-in-fossas-container-scanner)
 - [Documentation](#documentation)
 - [Container image source](#container-image-source)
 - [1) Exported docker archive](#1-exported-docker-archive)
 - [2) From Docker Engine](#2-from-docker-engine)
 - [3) From registries](#3-from-registries)
 - [Container image analysis](#container-image-analysis)
-- [Container Jar analysis](#container-jar-analysis)
+- [Container JAR analysis](#container-jar-analysis)
 - [Distroless Containers](#distroless-containers)
 - [Supported Container Package Managers](#supported-container-package-managers)
 - [View detected projects](#view-detected-projects)
@@ -19,7 +19,7 @@
 - [How do I scan multi-platform container images with `fossa-cli`?](#how-do-i-scan-multi-platform-container-images-with-fossa-cli)
 - [How can I only scan for system dependencies (alpine, dpkg, rpm)?](#how-can-i-only-scan-for-system-dependencies-alpine-dpkg-rpm)
 - [How do I exclude specific projects from container scanning?](#how-do-i-exclude-specific-projects-from-container-scanning)
-- [Limitations & Workarounds](#limitations--workarounds)
+- [Limitations \& Workarounds](#limitations--workarounds)

 ## What's supported in FOSSA's container scanner?

@@ -50,9 +50,9 @@ To scan a container image with `fossa-cli`, use the `container analyze` command:
 # This command uses the repository name as project name, and image digest as the revision.
 # Like standard FOSSA analysis, the project name is customizable via `--project` and revision via `--revision`:
 #
-# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
+# >> fossa container analyze <IMAGE> --project <PROJECT-NAME> --revision <REVISION-VALUE>
 #
-fossa container analyze <IMAGE>
+fossa container analyze <IMAGE>

 # Similar to the above, but instead of uploading the results they are instead written to the terminal in JSON format.
 #
@@ -89,13 +89,13 @@ By default `fossa-cli` attempts to identify `<IMAGE>` source in the following or

 ```bash
 docker save redis:alpine > redis_alpine.tar
-fossa container analyze redis_alpine.tar
+fossa container analyze redis_alpine.tar
 ```

 ### 2) From Docker Engine

 ```bash
-fossa container analyze redis:alpine
+fossa container analyze redis:alpine
 ```

 For this image source to work, `fossa-cli` requires docker to be running and accessible.
@@ -118,7 +118,7 @@ curl --unix-socket /var/run/docker.sock -X GET "http://localhost/v1.28/images/re
 ### 3) From registries

 ```bash
-fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
+fossa container analyze ghcr.io/fossas/haskell-dev-tools:9.0.2
 ```

 This step works even if you do not have docker installed or have docker engine accessible.
@@ -138,17 +138,17 @@ If `<IMAGE>` is not a docker image archive and is not accessible via the docker
 | `quay.io/org/image:tag` | `quay.io` | `org/image` | `tag` |

 Note:
-- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
-- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
-- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
-- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
+- When the domain is not present, `fossa-cli` defaults to the registry `index.docker.io`.
+- When digest or tag is not present, `fossa-cli` defaults to the tag `latest`.
+- When the registry is `index.docker.io`, and repository does not contain the literal `/`, `fossa-cli` infers that this is official image stored under `library/<image>`.
+- When a multi-platform image is provided (e.g. `ghcr.io/graalvm/graalvm-ce:ol7-java11-21.3.3`), `fossa-cli` defaults to selecting image artifacts for current runtime platform.
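
Reviewer note (illustrative, not part of this commit): the defaulting rules in the note above can be sketched in Rust roughly as follows. This is a minimal sketch, not `fossa-cli`'s actual parser. `ImageRef` and `parse_image_ref` are hypothetical names, the host heuristic (a first path segment that contains `.` or `:`, or equals `localhost`) is an assumption, and credentials in the URL and multi-platform selection are deliberately not handled.

```rust
/// Parsed pieces of an image reference (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct ImageRef {
    pub registry: String,
    pub repository: String,
    /// Either a tag such as `latest` or a digest such as `sha256:...`.
    pub reference: String,
}

/// Apply the documented defaults to an image string such as `redis:alpine`.
pub fn parse_image_ref(input: &str) -> ImageRef {
    // Split off a digest first, because digests themselves contain `:`.
    let (name, digest) = match input.split_once('@') {
        Some((n, d)) => (n, Some(d)),
        None => (input, None),
    };

    // Assumption: the first segment is a registry host only if it looks like one.
    // Otherwise the registry defaults to `index.docker.io`.
    let (registry, rest) = match name.split_once('/') {
        Some((host, rest))
            if host.contains('.') || host.contains(':') || host == "localhost" =>
        {
            (host.to_string(), rest)
        }
        _ => ("index.docker.io".to_string(), name),
    };

    // A tag is whatever follows the last `:` in the remaining repository path.
    let (repo, tag) = match rest.rsplit_once(':') {
        Some((r, t)) if !t.contains('/') => (r, Some(t)),
        _ => (rest, None),
    };

    // Prefer the digest, then the tag, then the default tag `latest`.
    let reference = digest.or(tag).unwrap_or("latest").to_string();

    // Official Docker Hub images live under `library/<image>`.
    let repository = if registry == "index.docker.io" && !repo.contains('/') {
        format!("library/{repo}")
    } else {
        repo.to_string()
    };

    ImageRef { registry, repository, reference }
}
```

Under these assumptions, `parse_image_ref("redis")` resolves to registry `index.docker.io`, repository `library/redis`, and reference `latest`, while `quay.io/org/image:tag` keeps its explicit registry, repository, and tag.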

 Analyzing the container image for a platform other than the one currently running is possible by specifying the digest for the image on a different platform.

 For example, the following command analyzes the `arm64` platform image of `ghcr.io/graalvm/graalvm-ce@sha256` regardless of the platform running `fossa container analyze`:

 ```bash
-fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
+fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
 ```

 **Private registries**
@@ -171,18 +171,18 @@ This is done in following steps:
 }
 ```

-If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.
-
+If any of the steps above fail, `fossa-cli` defaults to connecting without user credentials.
+
 To explicitly provide a username and password, use HTTP-style authentication in the image URL.
 For this to work the host value must be present in the image URL:

 ```bash
-fossa container analyze user:[email protected]/org/image:tag
+fossa container analyze user:[email protected]/org/image:tag
 ```

 **Retrieving image from registry**

-`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
+`fossa-cli` uses `/v2/` registry api (per OCI distribution spec) for retrieving
 image manifests, and image artifacts from registry. It does so in following manner:

 1) `HEAD <repository>/manifests/<tag-or-digest>` (to see if the manifests exists)
@@ -194,20 +194,22 @@ image manifests, and image artifacts from registry. It does so in following mann
 4) Download all blobs using `GET /v2/<repository>/blobs/<digest>` (if blobs are tar.gzip, they will be gzip extracted)
 5) From artifacts downloaded representative image tarball will be created.

-All `GET` request from step 2 to step 5, will make `HEAD` call prior to confirm existence of resource. If
+All `GET` request from step 2 to step 5, will make a `HEAD` call prior to confirm existence of resource. If
 401 status is received new access token will be generated using auth flow mentioned in step (1).

 ## Container image analysis

 The container scanner scans in two steps:
 1. The base layer.
-2. The rest of the layers, squashed.
+2. The rest of the layers, squashed.

 ### Container JAR analysis

 The container analyzer will try to find Java Archive (Jar) files inside each layer.
 It will then report them to FOSSA which will try to match the Jar files to the project they are a build artifact from.

+The container analyzer will also expand each Jar file that it encounters and report any Jar files that it finds in the expanded Jar file. This is done recursively.
+
 This process relies on there being a back-end that can perform that analysis.
 SaaS customers should have this functionality available but on-prem customers may need to contact FOSSA support to have it enabled.
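
Reviewer note (illustrative, not part of this commit): the recursive expansion described above amounts to a depth-limited walk that reports every nested jar under the path of the jar containing it. The sketch below shows only that shape, using the standard library. `Archive` is a hypothetical in-memory stand-in for an unzipped jar and `discover_nested` is a hypothetical name. The real change reads zip entries and fingerprints each jar, which this sketch omits.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for an archive entry: a file name plus
/// the entries found if that file is itself expanded as an archive.
pub struct Archive {
    pub name: String,
    pub entries: Vec<Archive>,
}

/// Walk the entries of `archive`, reporting every `.jar` entry at a path
/// nested under `containing`, and recurse into each one. Recursion stops
/// once `depth` exceeds `max_depth`, which bounds pathological nesting.
pub fn discover_nested(
    archive: &Archive,
    containing: &Path,
    depth: u32,
    max_depth: u32,
) -> Vec<PathBuf> {
    if depth > max_depth {
        return Vec::new();
    }

    let mut found = Vec::new();
    for entry in &archive.entries {
        // Only entries that look like jars are reported or expanded.
        if !entry.name.ends_with(".jar") {
            continue;
        }

        // The nested jar is reported "inside" the path of its container.
        let joined = containing.join(&entry.name);
        found.push(joined.clone());

        // Expand the nested jar one level deeper.
        found.extend(discover_nested(entry, &joined, depth + 1, max_depth));
    }
    found
}
```

For a `top.jar` that contains `middle.jar`, which in turn contains `deepest.jar`, starting the walk at `jars/top.jar` with depth `0` yields the two nested paths in discovery order. The outer jar is reported separately by the caller, which mirrors how the commit fingerprints the layer-level jar before recursing.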

@@ -264,7 +266,7 @@ and if desired can inform [analysis target configuration](../../files/fossa-yml.

 Example output:
 ```bash
-; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable
+; fossa container list-targets ghcr.io/tcort/markdown-link-check:stable

 [ INFO] Discovered image for: ghcr.io/tcort/markdown-link-check:stable (of 137610196 bytes) via docker engine api.
 [ INFO] Exporting docker image to temp file: /private/var/folders/hb/pg5d0r196kq1qdswr6_79hzh0000gn/T/fossa-docker-engine-tmp-f7af2b5d1ec5173d/image.tar! This may take a while!
@@ -296,7 +298,7 @@ exclude:

 ### Debugging

-`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.
+`fossa-cli` supports the `--debug` flag and debug bundle generation with the container scanner.

 ```bash
 fossa container analyze redis:alpine --debug
@@ -315,7 +317,7 @@ Images can be exported to archives using Docker:
 docker pull <IMAGE>:<TAG> # or docker pull <IMAGE>@<DIGEST>
 docker save <IMAGE>:<TAG> > image.tar

-fossa container analyze image.tar --container scanner
+fossa container analyze image.tar --container scanner

 rm image.tar
 ```
@@ -328,7 +330,7 @@ By default when `fossa-cli` is analyzing multi-platform image it prefers using t
 If a specific platform is desired, use the digest for that platform:

 ```bash
-fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
+fossa container analyze ghcr.io/graalvm/graalvm-ce@sha256:bdcba07acb11053fea0026b807ecf94550ace7df27b10596ca4c765165243cef
 ```

 ### How can I only scan for system dependencies (alpine, dpkg, rpm)?
@@ -342,7 +344,7 @@ fossa container analyze <IMAGE> --only-system-deps
 ### How do I exclude specific projects from container scanning?

 Use a FOSSA configuration file to perform exclusion of projects or paths.
-Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.
+Refer to the [configuration file](./../../files/fossa-yml.md) documentation for more details.

 As an example, the following configuration file only analyzes `setuptools`, and `alpine` packages:

@@ -371,7 +373,7 @@ The recommended workaround is to export the image to an archive, then analyze th
 docker pull quay.io/org/image:tag
 docker save quay.io/org/image:tag > img.tar

-fossa container analyze img.tar
+fossa container analyze img.tar
 rm img.tar
 ```

extlib/millhone/Cargo.toml

+1
@@ -38,6 +38,7 @@ tracing-subscriber = { version = "0.3.17", features = ["json"] }
 lazy-regex = { version = "3.0.2", features = ["std", "regex"] }
 fingerprint = { git = "https://github.com/fossas/lib-fingerprint.git", tag = "v3.0.0", default-features = false, features = ["fp-content-serialize-base64"] }
 tar = "0.4.41"
+zip = "2.1.3"

 [dev-dependencies]
 maplit = "1.0.2"

extlib/millhone/src/cmd/analyze_container.rs

+153-3
@@ -2,7 +2,7 @@ use std::{
     collections::{HashMap, HashSet},
     fs::File,
     io::{BufWriter, Read},
-    path::PathBuf,
+    path::{Path, PathBuf},
 };

 use clap::Parser;
@@ -125,10 +125,15 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
         debug!("fingerprinting");
         let entry = buffer(entry).context("read jar file")?;

-        match Combined::from_buffer(entry) {
-            Ok(fingerprints) => discoveries.push(DiscoveredJar::new(path, fingerprints)),
+        match Combined::from_buffer(entry.clone()) {
+            Ok(fingerprints) => {
+                discoveries.push(DiscoveredJar::new(path.clone(), fingerprints))
+            }
             Err(e) => warn!("failed to fingerprint: {e:?}"),
         }
+        let mut discovered_in_jars =
+            recursive_jars_in_jars(&entry, path, 0).context("recursively discover jars")?;
+        discoveries.append(&mut discovered_in_jars);

         Ok(())
     })?;
@@ -137,6 +142,56 @@ fn jars_in_layer(entry: Entry<'_, impl Read>) -> Result<Vec<DiscoveredJar>> {
     Ok(discoveries)
 }

+const MAX_JAR_DEPTH: u32 = 100;
+
+#[tracing::instrument(skip(jar_contents))]
+fn recursive_jars_in_jars(
+    jar_contents: &[u8],
+    containing_jar_path: PathBuf,
+    depth: u32,
+) -> Result<Vec<DiscoveredJar>> {
+    if depth > MAX_JAR_DEPTH {
+        return Ok(vec![]);
+    }
+    let mut discoveries = Vec::new();
+    let mut archive =
+        zip::ZipArchive::new(std::io::Cursor::new(jar_contents)).context("unzipping jar")?;
+    for path in archive.clone().file_names() {
+        debug!("file_name: {path}");
+        if !path.ends_with(".jar") {
+            continue;
+        }
+
+        debug!(?path, "jar file found");
+        let mut zip_file = archive
+            .by_name(path)
+            .context("getting zip file info by path")?;
+        if !zip_file.is_file() {
+            debug!(?path, "skipped: not a file");
+            continue;
+        }
+        let mut buffer: Vec<u8> = Vec::new();
+        zip_file
+            .read_to_end(&mut buffer)
+            .context("reading jar from zip into buffer")?;
+        let joined_path = Path::new(&containing_jar_path).join(path);
+
+        // fingerprint the jar
+        match Combined::from_buffer(buffer.clone()) {
+            Ok(fingerprints) => {
+                discoveries.push(DiscoveredJar::new(joined_path.clone(), fingerprints))
+            }
+            Err(e) => warn!("failed to fingerprint: {e:?}"),
+        }
+
+        // recursively find more jars
+        let mut discovered_in_jars = recursive_jars_in_jars(&buffer, joined_path, depth + 1)
+            .context("recursively discover jars")?;
+        discoveries.append(&mut discovered_in_jars);
+    }
+    Ok(discoveries)
+}
+
 #[tracing::instrument]
 fn list_container_layers(layer_path: &PathBuf) -> Result<HashSet<PathBuf>> {
     let mut layers = HashSet::new();
@@ -250,4 +305,99 @@ mod tests {
         let expected: Value = serde_json::from_str(MILLHONE_OUT).expect("Parse expected json");
         pretty_assertions::assert_eq!(expected, res);
     }
+
+    // This container contains top.jar which contains middle.jar, which contains deepest.jar
+    // It also includes middle.jar and deepest.jar
+    // So we should find 6 total jars: three from top.jar and its nested jars, two from middle.jar and its nested jar and then deepest.jar
+    // We are also testing that the fingerprints from the nested jars are equal to the fingerprints when they are at top-level
+    // See test/App/Fossa/Container/testdata/nested-jar/README.md for info on how nested_jars.tar was made
+    #[test]
+    fn it_finds_nested_jars() {
+        let nested_jars_millhone_out: String = format!(
+            r#"
+{{
+  "discovered_jars": {{
+    "blobs/sha256/3af1c7e331a4b6791c25101e0c862125a597d8d75d786aead62de19f78a5a992": [
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/deepest.jar",
+        "fingerprints": {{
+          "sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
+          "v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
+          "v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM="
+        }}
+      }}
+    ],
+    "blobs/sha256/5ee98bff2cf0e70d115677fc37f734d26848435eef5fe52e905229ff7a7d87fb": [
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/middle.jar",
+        "fingerprints": {{
+          "sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
+          "v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
+          "v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI="
+        }}
+      }},
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/middle.jar{separator}deepest.jar",
+        "fingerprints": {{
+          "v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
+          "sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
+          "v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
+        }}
+      }}
+    ],
+    "blobs/sha256/6979b741102e5c5c787f94ad8bfdebeee561b1b89f21139d38489e1b3d6f9096": [],
+    "blobs/sha256/931c525b52485e01ab5e2926a4b3c884f1c7325782dca13bd11e345f46cc34c3": [],
+    "blobs/sha256/10bb0e91eb016af401369ecaadccfea9f4768776e54d46ad4e9a0309c82f1d7f": [
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/top.jar",
+        "fingerprints": {{
+          "v1.raw.jar": "TNW7ezd3fqw3MULVTrexg68Q1x2PTDGk2DkltAqUefk=",
+          "v1.mavencentral.jar": "TtwsgEXwLd/8UFTohsFhJqYMJ74=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
+          "sha_256": "l9XTA5PwWJhnFlz9t0SWKvr2cHDmcytIVvPsr6vqFis="
+        }}
+      }},
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/top.jar{separator}middle.jar",
+        "fingerprints": {{
+          "v1.mavencentral.jar": "2XA3GFJJkvvpEbAM9nLnAypojEo=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=",
+          "v1.raw.jar": "36i3JNvrLMWCMfjB2c9bjQt4Vhmvfq29cb+Hqrb6XeI=",
+          "sha_256": "nKFXVngFtkHIv4FC/rr5o4k+v/KSKzWJ0B9uBuRb+4k="
+        }}
+      }},
+      {{
+        "kind": "v1.discover.binary.jar",
+        "path": "jars/top.jar{separator}middle.jar{separator}deepest.jar",
+        "fingerprints": {{
+          "v1.raw.jar": "UMQ1yS7xM6tF4YMvAWz8UP6+qAIRq3JauBoiTlVUNkM=",
+          "sha_256": "LsXfP24XYFIZnkS3Z7RaNim1o8/TtGnueThkZv9hCok=",
+          "v1.mavencentral.jar": "1+4xPh5QS5IW0H6lfbxamjtVVdk=",
+          "v1.class.jar": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
+        }}
+      }}
+    ]
+  }}
+}}
+"#,
+            separator = std::path::MAIN_SEPARATOR_STR.replace("\\", "\\\\")
+        );
+        let image_tar_file =
+            PathBuf::from("../../test/App/Fossa/Container/testdata/nested_jars.tar");
+        let res = jars_in_container(&image_tar_file)
+            .expect("Read jars out of container image.")
+            .pipe(serde_json::to_value)
+            .expect("encode as json");
+        let expected: Value =
+            serde_json::from_str(&nested_jars_millhone_out).expect("Parse expected json");
+        pretty_assertions::assert_eq!(expected, res);
+    }
 }
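
Reviewer note (illustrative, not part of this commit): the `{separator}` placeholder in the test above exists because nested jar paths are built with `Path::join`, so the separator between the outer and inner jar is platform-specific, and a Windows `\` must appear as `\\` inside a JSON string. The sketch below isolates that idea using only the standard library. The helper names `nested_jar_path`, `json_escaped_separator`, and `expected_path_json` are hypothetical and do not exist in the repository.

```rust
use std::path::{Path, PathBuf, MAIN_SEPARATOR_STR};

/// Path reported for a jar nested inside another jar: the inner entry name
/// is joined onto the outer jar's path with the platform separator.
pub fn nested_jar_path(outer: &Path, inner: &str) -> PathBuf {
    outer.join(inner)
}

/// The platform separator, escaped for use inside a JSON string literal.
/// On Windows the separator is `\`, which JSON requires to be written `\\`.
/// On Unix the separator is `/` and is returned unchanged.
pub fn json_escaped_separator() -> String {
    MAIN_SEPARATOR_STR.replace('\\', "\\\\")
}

/// Build the quoted JSON string expected for a chain of nested jars,
/// for example `["jars/top.jar", "middle.jar", "deepest.jar"]`.
pub fn expected_path_json(segments: &[&str]) -> String {
    let separator = json_escaped_separator();
    format!("\"{}\"", segments.join(separator.as_str()))
}
```

Keeping the escaping in one place means the expected JSON stays valid on both Unix and Windows, which is the same reason the test interpolates `separator` into its raw string rather than hard-coding `/`.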
