| 1 | +# Proposal: Security Hub |
| 2 | + |
| 3 | +Author: stonezdj |
| 4 | + |
| 5 | +## Discussion |
| 6 | + |
| 7 | +[#10496](https://github.com/goharbor/harbor/issues/10496) |
| 8 | +[#13](https://github.com/goharbor/pluggable-scanner-spec/issues/13) |
| 9 | + |
| 10 | +## Abstract |
| 11 | + |
| 12 | +The Security Hub feature gives administrators a flexible way to search and report vulnerability information for a project or application. |
| 13 | + |
| 14 | +## Background |
| 15 | + |
| 16 | +Harbor 2.7.0 can scan images and export CVE information by project, but it has several gaps. It cannot search for the images affected by a given CVE ID. It cannot list all CVEs at the system level or project level. It provides no summary of the CVEs that currently exist. It cannot present vulnerabilities in a holistic view at different levels, such as system level, project level, or any free-form (label) level. With the CVE search and report feature, administrators can search and report vulnerability information in a flexible way. |
| 17 | + |
| 18 | +## User stories |
| 19 | + |
| 20 | +1. As an administrator, I can check the total number of vulnerabilities found in the current view (system/project/label). The report includes the vulnerability count by severity level (Critical, High, Medium, Low, None and Unknown), the most dangerous vulnerabilities, and the most dangerous artifacts. |
| 21 | +1. As an administrator, I can filter vulnerabilities by score range in the current scope; the scope is either project level or system level. |
| 22 | +1. As an administrator, I can filter vulnerabilities by image tag in the current scope. |
| 23 | +1. As an administrator, I can filter vulnerabilities by package in the current scope. |
| 24 | +1. As an administrator, I can filter vulnerabilities by CVE ID in the current scope. |
| 25 | + |
| 26 | +## Personas |
| 27 | + |
| 28 | +Only system administrators can access the Security Hub feature. |
| 29 | + |
| 30 | +## Non-Goals |
| 31 | + |
| 32 | +This proposal does not cover the following items: |
| 33 | + |
| 34 | +1. The Security Hub feature does not provide a way to search and report vulnerability information by image layer. |
| 35 | +2. It does not provide a way to fix a vulnerability or take action on it. |
| 36 | + |
| 37 | +## Compatibilities |
| 38 | + |
| 39 | +The implementation of this feature should be compatible with the [pluggable scanner spec v1.0](https://github.com/goharbor/pluggable-scanner-spec) and supports only the Trivy adapter. |
| 40 | + |
| 41 | +Some table schemas must be changed to enable this feature. |
| 42 | + |
| 43 | +The `scan_report` table needs the following new columns: |
| 44 | + |
| 45 | +| Column Name | Description | |
| 46 | +| ------------- | ------------- | |
| 47 | +| critical_cnt | Number of critical CVEs in the report | |
| 48 | +| high_cnt | Number of high CVEs in the report | |
| 49 | +| medium_cnt | Number of medium CVEs in the report | |
| 50 | +| low_cnt | Number of low CVEs in the report | |
| 51 | +| none_cnt | Number of CVEs with severity None in the report | |
| 52 | +| unknown_cnt | Number of CVEs with unknown severity in the report | |
| 53 | +| fixable_cnt | Number of fixable CVEs in the report | |
| 54 | + |
| 55 | + |
| 56 | +## Implementation |
| 57 | + |
| 58 | +The following REST APIs are provided: |
| 59 | + |
| 60 | +1. Retrieve the vulnerability summary of the system |
| 61 | +``` |
| 62 | +GET /api/v2.0/security/summary?q=xxx&with_cve=true&with_artifact=true |
| 63 | +``` |
| 64 | +Response data |
| 65 | +``` |
| 66 | +{ |
| 67 | +  "total_count": 240, |
| 68 | +  "high_count": 20, |
| 69 | +  "critical_count": 35, |
| 70 | +  "medium_count": 90, |
| 71 | +  "low_count": 23, |
| 72 | +  "none_count": 0, |
| 73 | +  "fixable_count": 103, |
| 74 | +  "scanned": 1032, |
| 75 | +  "not_scanned": 59, |
| 76 | +  "most_dangerous_cve": [ |
| 77 | +    {"cve_id": "CVE-2022-32221", "package": "curl", "version": "2.3.2", "cvss_score_v3": 9.8}, |
| 78 | +    ... |
| 79 | +  ], |
| 80 | +  "most_dangerous_artifact": |
| 81 | +  [ |
| 82 | +    {"artifact_id": 2377, "artifact_repository": "library/nuxas", "digest": "sha256:7027e69a2172e38cef8ac2cb1f046025895c9fcf3160e8f70ffb26446f680e4d", |
| 83 | +     "severity_above_high": 23}, |
| 84 | +    ... |
| 85 | +  ] |
| 86 | +} |
| 87 | +``` |
| 88 | + |
| 89 | +To query the security summary of a project: |
| 90 | + |
| 91 | +``` |
| 92 | +GET /api/v2.0/projects/{project_name_or_id}/security/summary?with_cve=true&with_artifact=true |
| 93 | +``` |
| 94 | +The response data format is the same as for the system-level query. |
| 95 | + |
| 96 | +If `with_cve` and `with_artifact` are true, the response includes the `most_dangerous_cve` and `most_dangerous_artifact` information. |
| 97 | + |
| 98 | + |
| 99 | +The SQL queries for this API: |
| 100 | + |
| 101 | +``` |
| 102 | +-- query all the vulnerability count by severity level |
| 103 | +select sum(s.critical_cnt) critical_cnt, |
| 104 | + sum(s.high_cnt) high_cnt, |
| 105 | + sum(s.medium_cnt) medium_cnt, |
| 106 | + sum(s.low_cnt) low_cnt, |
| 107 | + sum(s.none_cnt) none_cnt, |
| 108 | + sum(s.unknown_cnt) unknown_cnt, |
| 109 | + sum(s.fixable_cnt) fixable_cnt |
| 110 | +from artifact a |
| 111 | + left join scan_report s on a.digest = s.digest |
| 112 | + where s.registration_uuid = ? |
| 113 | +
|
| 114 | +-- query total artifact count |
| 115 | +
|
| 116 | +SELECT COUNT(1) |
| 117 | +FROM artifact A |
| 118 | +WHERE NOT EXISTS (select 1 from artifact_accessory acc WHERE acc.artifact_id = a.id) |
| 119 | + AND (EXISTS (SELECT 1 FROM tag WHERE tag.artifact_id = a.id) |
| 120 | + OR NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = a.id)) |
| 121 | +
|
| 122 | +-- query scanned count |
| 123 | +SELECT COUNT(1) |
| 124 | +FROM artifact a |
| 125 | +WHERE EXISTS (SELECT 1 |
| 126 | + FROM scan_report s |
| 127 | + WHERE a.digest = s.digest |
| 128 | + AND s.registration_uuid = ?) |
| 129 | + -- exclude artifact accessory |
| 130 | + AND NOT EXISTS (SELECT 1 FROM artifact_accessory acc WHERE acc.artifact_id = a.id) |
| 131 | + -- exclude artifact without tag and part of the image index |
| 132 | + AND EXISTS (SELECT 1 |
| 133 | + FROM tag |
| 134 | + WHERE tag.artifact_id = a.id |
| 135 | + OR (NOT EXISTS (SELECT 1 FROM artifact_reference ref WHERE ref.child_id = a.id))) |
| 136 | + -- include image index which is scanned |
| 137 | + OR EXISTS (SELECT 1 |
| 138 | + FROM scan_report s, |
| 139 | + artifact_reference ref |
| 140 | + WHERE s.digest = ref.child_digest |
| 141 | + AND ref.parent_id = a.id AND s.registration_uuid = ? AND NOT EXISTS (SELECT 1 |
| 142 | + FROM scan_report s |
| 143 | + WHERE s.digest = a.digest and s.registration_uuid = ?)) -- scanned count |
| 144 | +
|
| 145 | +-- query top 5 of the most dangerous cve |
| 146 | +SELECT vr.id, |
| 147 | + vr.cve_id, |
| 148 | + vr.package, |
| 149 | + vr.cvss_score_v3, |
| 150 | + vr.description, |
| 151 | + vr.fixed_version, |
| 152 | + vr.severity, |
| 153 | + CASE vr.severity |
| 154 | + WHEN 'Critical' THEN 5 |
| 155 | + WHEN 'High' THEN 4 |
| 156 | + WHEN 'Medium' THEN 3 |
| 157 | + WHEN 'Low' THEN 2 |
| 158 | + WHEN 'None' THEN 1 |
| 159 | + WHEN 'Unknown' THEN 0 END AS severity_level |
| 160 | +FROM vulnerability_record vr |
| 161 | +WHERE EXISTS (SELECT 1 FROM report_vulnerability_record WHERE vuln_record_id = vr.id) |
| 162 | + AND vr.cvss_score_v3 IS NOT NULL |
| 163 | + AND vr.registration_uuid = ? |
| 164 | +ORDER BY vr.cvss_score_v3 DESC, severity_level DESC |
| 165 | +LIMIT 5 |
| 166 | +
|
| 167 | +-- query top 5 of the most dangerous artifact |
| 168 | +select a.project_id project, a.repository_name repository, a.digest, s.critical_cnt, s.high_cnt, s.medium_cnt, s.low_cnt |
| 169 | +from artifact a, |
| 170 | + scan_report s |
| 171 | +where a.digest = s.digest |
| 172 | + and s.registration_uuid = ? |
| 173 | +order by s.critical_cnt desc, s.high_cnt desc, s.medium_cnt desc, s.low_cnt desc |
| 174 | +limit 5 |
| 175 | +
|
| 176 | +``` |
| 177 | +If a label is specified in the query condition, it can be translated into a filter on artifact ID in the SQL query. |
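The severity aggregation and the label-to-artifact-ID filter described above can be sketched as follows. This is a minimal illustration, not Harbor's implementation: it uses Python's built-in `sqlite3` instead of PostgreSQL, a reduced version of the `artifact` and `scan_report` tables, and a hypothetical helper named `severity_summary`. The artifact IDs are assumed to have been resolved from the label beforehand.

```python
import sqlite3

def severity_summary(conn, registration_uuid, artifact_ids=None):
    """Sum the per-report severity counters for one scanner registration.

    artifact_ids is an optional list of artifact IDs (for example, the
    artifacts carrying a given label); when present it narrows the scope.
    """
    sql = (
        "SELECT COALESCE(SUM(s.critical_cnt), 0), COALESCE(SUM(s.high_cnt), 0), "
        "COALESCE(SUM(s.medium_cnt), 0), COALESCE(SUM(s.low_cnt), 0) "
        "FROM artifact a JOIN scan_report s ON a.digest = s.digest "
        "WHERE s.registration_uuid = ?"
    )
    params = [registration_uuid]
    if artifact_ids:
        # One placeholder per ID keeps the query parameterized.
        placeholders = ",".join("?" for _ in artifact_ids)
        sql += f" AND a.id IN ({placeholders})"
        params.extend(artifact_ids)
    critical, high, medium, low = conn.execute(sql, params).fetchone()
    return {
        "critical_count": critical,
        "high_count": high,
        "medium_count": medium,
        "low_count": low,
    }

# Synthetic data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact (id INTEGER PRIMARY KEY, digest TEXT);
CREATE TABLE scan_report (digest TEXT, registration_uuid TEXT,
    critical_cnt INT, high_cnt INT, medium_cnt INT, low_cnt INT);
INSERT INTO artifact VALUES (1, 'sha256:aaa'), (2, 'sha256:bbb');
INSERT INTO scan_report VALUES ('sha256:aaa', 'scanner-1', 2, 3, 5, 1),
                               ('sha256:bbb', 'scanner-1', 1, 0, 4, 2);
""")

system_summary = severity_summary(conn, "scanner-1")
label_summary = severity_summary(conn, "scanner-1", artifact_ids=[2])
```

Because the counters are stored per scan report, both the system-wide and the label-scoped summary reduce to a single `SUM` over `scan_report` without touching the per-CVE tables.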
| 178 | + |
| 179 | +2. Search vulnerability information |
| 180 | + |
| 181 | +A vulnerability here is a security issue found in an artifact. It includes the package information and the CVE information, but it is not the CVE itself. |
| 182 | +``` |
| 183 | +GET /api/v2.0/security/vul?q=xxx&tune_count=true |
| 184 | +``` |
| 185 | +Response data |
| 186 | +``` |
| 187 | +[{ |
| 188 | +  "project": "library", |
| 189 | +  "repository": "library/nuxas", |
| 190 | +  "digest": "sha256:7027e69a2172e38cef8ac2cb1f046025895c9fcf3160e8f70ffb26446f680e4d", |
| 191 | +  "tags": ["v2.3.0", "latest"], |
| 192 | +  "cvss_v3_score": 8.9, |
| 193 | +  "cve_id": "CVE-2022-32221", |
| 194 | +  "package": "nfs-utils", |
| 195 | +  "package_version": "v3.1.0", |
| 196 | +  "fix_version": "2.3.1", |
| 197 | +  "description": "The package nuxas before 2.3.0 for Python allows Directory Traversal via a crafted tar file.", |
| 198 | +  "urls": "https://nvd.nist.gov/vuln/detail/CVE-2022-32221" |
| 199 | +}, |
| 200 | +{ |
| 201 | +  //another cve record |
| 202 | +} |
| 203 | +] |
| 204 | +``` |
| 205 | + |
| 206 | +The `tune_count` option limits the cost of computing the total count. If the total count is greater than 1000, the query reports that the total is more than 1000 and `x-total-count` is set to -1. The response body is the same as for a query without the `tune_count` option. |
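One way to realize the `tune_count` behavior is to count over a capped subquery, so the database never scans more than 1001 matching rows. The sketch below is an assumption about how this could be done, not the proposal's stated implementation. It uses Python's `sqlite3` with a reduced `vulnerability_record` table, synthetic rows, and a hypothetical helper named `total_count`.

```python
import sqlite3

COUNT_CAP = 1000  # threshold stated in the proposal

def total_count(conn, tune_count):
    """Return the value to place in the x-total-count header."""
    if not tune_count:
        # Exact count: may be expensive on very large tables.
        return conn.execute(
            "SELECT COUNT(1) FROM vulnerability_record"
        ).fetchone()[0]
    # Capped count: stop after COUNT_CAP + 1 rows, which is enough to
    # know whether the result set exceeds the cap.
    capped = conn.execute(
        "SELECT COUNT(1) FROM (SELECT 1 FROM vulnerability_record LIMIT ?)",
        (COUNT_CAP + 1,),
    ).fetchone()[0]
    return -1 if capped > COUNT_CAP else capped

# Synthetic rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vulnerability_record (id INTEGER PRIMARY KEY, cve_id TEXT)"
)
conn.executemany(
    "INSERT INTO vulnerability_record (cve_id) VALUES (?)",
    ((f"CVE-0000-{i:05d}",) for i in range(1500)),
)

tuned = total_count(conn, tune_count=True)
exact = total_count(conn, tune_count=False)
```

A client that receives -1 can display "more than 1000" instead of an exact total.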
| 207 | + |
| 208 | +The `q` parameter follows the query syntax of `lib/q` and accepts the following conditions: |
| 209 | + |
| 210 | +| Query condition | Description | |
| 211 | +| ------------- |--------------------------------------------------------------------------| |
| 212 | +| cve_id | Search vulnerability information by CVE ID, supports exact match | |
| 213 | +| severity | Search vulnerability information by severity level | |
| 214 | +| cvss_v3_score | Search vulnerability information by CVSS v3 score | |
| 215 | +| project_id | Search vulnerability information by project ID | |
| 216 | +| digest | Search vulnerability information by artifact digest, supports exact match | |
| 217 | +| repository | Search vulnerability information by repository name, supports exact match | |
| 218 | +| package | Search vulnerability information by package name, supports exact match | |
| 219 | +| tag | Search vulnerability information by tag name, supports exact match | |
| 220 | + |
| 221 | +An example of the query condition: |
| 222 | + |
| 223 | +``` |
| 224 | +GET /api/v2.0/security/vul?q=cve_id=CVE-2023-12345,cvss_v3_score=[7.0~10.0],severity=Critical,project_id=1,repository=library/nuxas,package=nfs-utils,tag=v2.3.0 |
| 225 | +``` |
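To make the structure of the example query concrete, the sketch below splits a `q` string into exact-match and range conditions. It is a simplified illustration, not the `lib/q` parser: the function name `parse_q` is hypothetical, and it handles only the two forms that appear in the example above (`key=value` and `key=[min~max]`).

```python
def parse_q(q):
    """Split a q string into {key: ("exact", value) | ("range", low, high)}.

    Simplified: assumes conditions are comma separated and that range
    bounds are numeric, as in the cvss_v3_score example.
    """
    conditions = {}
    for part in q.split(","):
        key, _, value = part.partition("=")
        if value.startswith("[") and value.endswith("]") and "~" in value:
            low, _, high = value[1:-1].partition("~")
            conditions[key] = ("range", float(low), float(high))
        else:
            conditions[key] = ("exact", value)
    return conditions

example = (
    "cve_id=CVE-2023-12345,cvss_v3_score=[7.0~10.0],severity=Critical,"
    "project_id=1,repository=library/nuxas,package=nfs-utils,tag=v2.3.0"
)
parsed = parse_q(example)
```

Each parsed condition would then map to one additional predicate in the `WHERE` clause of the search query.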
| 226 | + |
| 227 | +The SQL query for this API: |
| 228 | + |
| 229 | +``` |
| 230 | +select vr.cve_id, vr.cvss_score_v3, vr.package, a.repository_name, a.id artifact_id, a.digest, vr.package_version, vr.severity, vr.fixed_version, vr.description, vr.urls, a.project_id |
| 231 | +from artifact a, |
| 232 | + scan_report s, |
| 233 | + report_vulnerability_record rvr, |
| 234 | + vulnerability_record vr |
| 235 | +where a.digest = s.digest |
| 236 | + and s.uuid = rvr.report_uuid |
| 237 | + and rvr.vuln_record_id = vr.id |
| 238 | + and rvr.report_uuid is not null |
| 239 | + and vr.registration_uuid = ? |
| 240 | +
|
| 241 | +``` |
| 242 | + |
| 243 | +Database schema change: |
| 244 | + |
| 245 | + |
| 246 | +scan_report: |
| 247 | +``` |
| 248 | +alter table scan_report add column IF NOT EXISTS critical_cnt int; |
| 249 | +alter table scan_report add column IF NOT EXISTS high_cnt int; |
| 250 | +alter table scan_report add column IF NOT EXISTS medium_cnt int; |
| 251 | +alter table scan_report add column IF NOT EXISTS low_cnt int; |
| 252 | +alter table scan_report add column IF NOT EXISTS none_cnt int; |
| 253 | +alter table scan_report add column IF NOT EXISTS unknown_cnt int; |
| 254 | +alter table scan_report add column IF NOT EXISTS fixable_cnt int; |
| 255 | +``` |
| 256 | + |
| 257 | +Besides the APIs above, some refactoring work is needed: |
| 258 | + |
| 259 | +1. To improve performance, refactor the scan report to store summary information (total, high, medium, low and fixable counts) in each scan report. When querying the summary, this data can then be aggregated without joining other tables. |
| 260 | +2. Refactor the process that inserts CVEs from a scan report and normalize the data written to the table. The `cvss_v3_score` column is currently empty; the score needs to be extracted from the vendor attribute data and stored in the `cvss_v3_score` column. |
| 261 | +3. Existing scan report rows do not contain critical_cnt, high_cnt, medium_cnt, low_cnt, none_cnt, unknown_cnt or fixable_cnt information. This information needs to be extracted from the vendor attribute data and stored in the vendor_attribute column. |
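The per-report summary in item 1 amounts to a single pass over the report's vulnerability items. The sketch below is a minimal illustration under stated assumptions: each item is a dict with `severity` and `fix_version` keys (names borrowed from the response examples in this proposal), an item counts as fixable when `fix_version` is non-empty, and any unrecognized severity is counted as unknown. The helper name `summarize` is hypothetical.

```python
# Maps severity labels to the new scan_report columns.
SEVERITY_COLUMNS = {
    "Critical": "critical_cnt",
    "High": "high_cnt",
    "Medium": "medium_cnt",
    "Low": "low_cnt",
    "None": "none_cnt",
    "Unknown": "unknown_cnt",
}

def summarize(vulnerabilities):
    """Compute the counter columns for one scan report."""
    summary = {column: 0 for column in SEVERITY_COLUMNS.values()}
    summary["fixable_cnt"] = 0
    for vuln in vulnerabilities:
        # Assumption: severities outside the known set count as unknown.
        column = SEVERITY_COLUMNS.get(vuln.get("severity"), "unknown_cnt")
        summary[column] += 1
        if vuln.get("fix_version"):
            summary["fixable_cnt"] += 1
    return summary

# Synthetic report items for illustration only.
report_items = [
    {"severity": "Critical", "fix_version": "2.3.1"},
    {"severity": "High", "fix_version": ""},
    {"severity": "Negligible"},
]
counters = summarize(report_items)
```

The same routine could be applied to existing reports during migration, so that old and new rows carry identical counters.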
| 262 | + |
| 263 | +## UI work |
| 264 | + |
| 265 | +The draft UI of the Security Hub: |
| 266 | +Summary: |
| 267 | + |
| 268 | +Search vulnerability: |
| 269 | + |
| 270 | + |
| 271 | + |
| 272 | +## Open Questions |
| 273 | + |
| 274 | + 1. The current Trivy adapter report doesn't contain the `preferred_cvss` attribute. As a workaround, until the Trivy adapter provides this information in the scan report, we need to extract the score from the vendor attribute data and store it in `cvss_v3_score`. The final solution is to update the pluggable-scanner-spec to add the `cvss_v3_score` attribute. There may be score information from other vendors, but we only support these two vendors' score information when searching. The score information is stored in the `vendor_attribute` column of the `vulnerability_record` table. |
| 275 | + |
| 276 | + 2. Performance consideration: a typical registry might have 10000+ artifacts, and each artifact might have 1000+ CVEs, so the `report_vulnerability_record` table will have 10000000+ records. Query performance is a big concern. We need to refactor the SQL queries for better performance and add indexes to the table. Furthermore, we will limit the records returned by a query to 100 and add the total count to the response header. All queries should return within 1 minute. |
| 277 | + |
| 278 | + 3. The current implementation is based on the database. It is possible to use other storage in the future, such as Elasticsearch. In that case, we need to add Elasticsearch support in the post-scan job to index each CVE record, and add Elasticsearch support in the query API. |
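The workaround in open question 1 (extracting the score from the vendor attribute data) can be sketched as follows. The nested shape used here, `{"CVSS": {"<vendor>": {"V3Score": ...}}}`, and the vendor names `nvd` and `redhat` are assumptions made for illustration; the proposal does not spell out the layout of the vendor attributes. The helper name `extract_cvss_v3_score` is hypothetical.

```python
def extract_cvss_v3_score(vendor_attributes, vendors=("nvd", "redhat")):
    """Return the first CVSS v3 score found for the preferred vendors.

    Assumed layout: {"CVSS": {"nvd": {"V3Score": 9.8}, ...}}.
    Returns None when no usable score is present.
    """
    cvss = (vendor_attributes or {}).get("CVSS") or {}
    for vendor in vendors:
        score = (cvss.get(vendor) or {}).get("V3Score")
        if isinstance(score, (int, float)):
            return float(score)
    return None

# Synthetic vendor attribute payloads for illustration only.
nvd_score = extract_cvss_v3_score({"CVSS": {"nvd": {"V3Score": 9.8}}})
fallback_score = extract_cvss_v3_score({"CVSS": {"redhat": {"V3Score": 7.5}}})
missing_score = extract_cvss_v3_score({})
```

Returning `None` when no score is available keeps such records distinguishable from a score of 0 when filtering by score range.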