GSoC Final Report: Integration of Software Heritage in FOSSology
FOSSology is an open-source license compliance software system and toolkit. As a toolkit, it lets a user run license, copyright and export control scans from the command line. As a system, it lets users import, store and track their files as part of a compliance workflow. The license, copyright and export control scanners are the tools FOSSology provides to help with the user's compliance.
The FOSSology system is a combination of agents that run in series, each performing a specific task. FOSSology has several agents, such as unpacking, license analysis and copyright analysis. Agents perform analysis or management tasks on anything in the database, and each agent performs one task.
In this project, I worked on an agent that integrates Software Heritage into FOSSology. The work is divided into four stages. The stages depend on each other, and the overall flow is shown in the diagram below.
Basic Details
As the picture above shows, the project went through four stages:
- Calculate the hash values
- Database Schema creation for software heritage
- Agent creation and Data storing
- Display the fetched records
I worked on this section during my first evaluation period. It deals with calculating the sha256 of a file. Previously, when a package was uploaded to [FOSSology](https://www.fossology.org/), the md5 and sha1 values of each file in the package were calculated and inserted into the pfile table by the ununpack agent (which runs while a package is uploaded). My job was to alter the pfile table by adding a pfile_sha256 column, calculate the sha256 of each file, and store it in that new column. The development involved several code changes, a database schema alteration, and migration commands to run on the existing data in databases. The structure of the new column is:
| Column | Type |
|---|---|
| pfile_sha256 | character(64) |
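The width of 64 follows from the digest size: a SHA-256 digest is 32 bytes, and each byte is written as two hexadecimal characters. A minimal sketch of that encoding is below; `sha256_to_hex` and the two constants are hypothetical names chosen for illustration, not part of FOSSology.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SHA256_RAW_LEN 32                    /* digest size in bytes */
#define SHA256_HEX_LEN (2 * SHA256_RAW_LEN)  /* 64 hex characters */

/* Encode a raw 32-byte digest as 64 lowercase hex characters plus a NUL. */
void sha256_to_hex(const unsigned char raw[SHA256_RAW_LEN],
                   char hex[SHA256_HEX_LEN + 1])
{
  assert(raw != NULL && hex != NULL);
  for (size_t i = 0; i < SHA256_RAW_LEN; i++) {
    /* Each byte becomes exactly two hex digits; snprintf adds the NUL. */
    snprintf(hex + 2 * i, 3, "%02x", (unsigned int) raw[i]);
  }
}
```

The resulting string always has exactly 64 characters, which is why a fixed-width `character(64)` column fits it.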
The code for calculating the hash values is in the utils.c file of the ununpack agent, and the process is similar to the code shown below.
```c
snprintf(command, PATH_MAX + 13, "sha256sum '%s'", CI->Source);
FILE* file = popen(command, "r");
if (file != (FILE*) NULL)
{
  read = fscanf(file, "%64s", SHA256);
  retcode = WEXITSTATUS(pclose(file));
}
if (file == (FILE*) NULL || retcode != 0 || read != 1)
{
  LOG_FATAL("Unable to calculate SHA256 of %s\n", CI->Source);
  SafeExit(56);
}
```
The SHA256 value is required to look up a file's record in Software Heritage.
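The fragment above depends on FOSSology internals (`CI->Source`, `LOG_FATAL`, `SafeExit`). As a rough standalone sketch of the same technique, the hypothetical helper below runs `sha256sum` through `popen` and keeps the first field of its output. It assumes a POSIX system with `sha256sum` on the PATH. The name `sha256_of_file`, the buffer size, and the return-code convention are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

#define SHA256_HEX_LEN 64

/*
 * Hypothetical helper: store the hex SHA-256 of `path` in `digest`.
 * Returns 0 on success, -1 on any failure.
 * As in the snippet above, the path is wrapped in single quotes, so a
 * path that itself contains a single quote would break the command.
 */
int sha256_of_file(const char *path, char digest[SHA256_HEX_LEN + 1])
{
  char command[4096];
  assert(path != NULL && digest != NULL);

  int len = snprintf(command, sizeof(command), "sha256sum '%s'", path);
  if (len < 0 || (size_t) len >= sizeof(command)) {
    return -1;  /* path too long for the command buffer */
  }

  FILE *pipe = popen(command, "r");
  if (pipe == NULL) {
    return -1;
  }

  /* sha256sum prints "<64 hex chars>  <file name>"; keep the first field. */
  int fields = fscanf(pipe, "%64s", digest);
  int status = pclose(pipe);

  if (fields != 1 || status == -1 || !WIFEXITED(status) ||
      WEXITSTATUS(status) != 0 || strlen(digest) != SHA256_HEX_LEN) {
    return -1;
  }
  return 0;
}
```

Unlike the agent code, which exits on failure, this sketch returns -1 and leaves the decision to the caller.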
- #d364193 feat(db): Calculate the sha256 value of the uploading file and store it in database
I worked on this feature during my second evaluation period. It deals with creating the schema that stores Software Heritage data. The data fetched from Software Heritage is saved in a table named software_heritage. My mentors and I decided to store two types of data (origin and license) from the Software Heritage archive, along with two more columns (the primary key and pfile_fk). The motive is to relate each pfile record to the software_heritage table. The table structure is:
**Table Structure**
| Column | Type | Modifiers |
|---|---|---|
| software_heritage_pk | integer | not null default nextval('software_heritage_pk_seq'::regclass) |
| pfile_fk | integer | not null |
| license | text | |
| origin | text | |
Foreign-key constraints:
"software_heritage_pfile_fk_fkey" FOREIGN KEY (pfile_fk) REFERENCES pfile(pfile_pk) ON DELETE CASCADE
As shown above, software_heritage_pk is the primary key, pfile_fk is the foreign key, and the other two columns, license and origin, hold the records from Software Heritage.
This is the most exciting part and the heart of my project. I worked on this section during my second evaluation period of GSoC. It covers calling the API, all the features the agent performs, and the following parts:
- Redundancy check feature
- API calling feature
- Storing the value in Software Heritage Table
- Inserting the License Info in license_file table
- Registering the agent
- Basic Visualization of the data

Let's briefly discuss each of these.
If Software Heritage returns an HTTP 404 response for a record, that record is not inserted into the software_heritage table. A user can run the softwareHeritage agent as many times as they want, but the agent only runs on those files of a package that have no record in the software_heritage table yet. As a result, no redundant records are created. Two pieces make this happen: one in SoftwareHeritageDao and one in the softwareHeritage agent. In SoftwareHeritageDao, getSoftwareHeritagePfileFk takes uploadId as a parameter and returns the pfile ids of the files of a package that already have records in the software_heritage table. In the softwareHeritage agent, the API is called only for the files that have no record there.
SoftwareHeritageDao
```php
/**
 * @brief Get all the pfile_fk stored in software heritage table
 * @param Integer $uploadId
 * @return array
 */
public function getSoftwareHeritagePfileFk($uploadId)
{
  $uploadTreeTableName = $this->uploadDao->getUploadtreeTableName($uploadId);
  $stmt = __METHOD__.$uploadTreeTableName;
  $sql = "SELECT software_heritage.pfile_fk AS pfile_fk
          FROM $uploadTreeTableName
          JOIN software_heritage
          ON $uploadTreeTableName.upload_fk = $1
          AND software_heritage.pfile_fk = $uploadTreeTableName.pfile_fk";
  $rows = $this->dbManager->getRows($sql,array($uploadId),$stmt);
  $results = [];
  foreach ($rows as $row) {
    $results[] = $row['pfile_fk'];
  }
  return $results;
}
```
softwareHeritageAgent
```php
/*codes*/
// Getting the pfile FKs
$pfileFks = $this->shDao->getSoftwareHeritagePfileFk($uploadId);
/*codes*/
foreach(/*codes*/)
{
  // Process only the files that have no record in software_heritage yet
  if(!in_array($pfileDetail['pfile_pk'],$pfileFks))
  {
    /*codes*/
  }
  $this->heartbeat(1);
}
```
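The effect of the check can be sketched on its own, independent of the agent code. The hypothetical C helper below takes the pfile ids of an upload and the ids already stored in software_heritage, and keeps only the ids that still need an API call. The function names and the plain integer arrays are assumptions for illustration, since the agent itself works with PHP arrays and `in_array`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Linear search, the C counterpart of PHP's in_array(). */
static bool contains(const int *values, size_t count, int needle)
{
  for (size_t i = 0; i < count; i++) {
    if (values[i] == needle) {
      return true;
    }
  }
  return false;
}

/*
 * Copy into `todo` every id from `upload` that is absent from `stored`.
 * `todo` must have room for `upload_count` entries.
 * Returns the number of ids copied.
 */
size_t select_unprocessed(const int *upload, size_t upload_count,
                          const int *stored, size_t stored_count,
                          int *todo)
{
  assert(upload != NULL && todo != NULL);
  size_t n = 0;
  for (size_t i = 0; i < upload_count; i++) {
    if (!contains(stored, stored_count, upload[i])) {
      todo[n++] = upload[i];
    }
  }
  return n;
}
```

With upload ids {101, 102, 103, 104, 105} and stored ids {102, 104}, only 101, 103 and 105 remain to be queried. Once every id of an upload is stored, a repeated run has nothing left to do.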
The next step is to call the API for the files and get the values from Software Heritage. The API endpoint is stored in the agent/softwareHeritage.conf file.
```
api[url] = "https://archive.softwareheritage.org"
api[uri] = "/api/1/content/sha256:"
api[content] = "/license"
```
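The agent builds the request URL by joining these three values around the file's sha256, in the order url, uri, sha256, content. A minimal sketch of that concatenation follows; `build_license_url` is a hypothetical helper for illustration and not FOSSology code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Join the three configured parts around the digest.
 * Returns 0 on success, -1 if `out` is too small for the full URL.
 */
int build_license_url(char *out, size_t out_size,
                      const char *base_url, const char *uri,
                      const char *sha256, const char *content)
{
  assert(out != NULL && base_url != NULL && uri != NULL &&
         sha256 != NULL && content != NULL);
  int len = snprintf(out, out_size, "%s%s%s%s",
                     base_url, uri, sha256, content);
  return (len < 0 || (size_t) len >= out_size) ? -1 : 0;
}
```

For a file whose digest is `ba78...15ad`, the result has the form `https://archive.softwareheritage.org/api/1/content/sha256:ba78...15ad/license`.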
We use Guzzle (GuzzleHttp) to call the API. The API is called for each file of the package and the result is returned for further processing. If the API returns an HTTP 404 response, an empty license array is returned and no values are stored in the database. Otherwise, the values are stored in the software_heritage table and the matching license_file data is stored in the database too.
API Calling and Data Storing
```php
/**
 * @brief Get the license details from software heritage
 * @param String $sha256
 *
 * @return array
 */
protected function getSoftwareHeritageLicense($sha256)
{
  $client = new Client(['http_errors' => false]);
  $response = $client->get($this->configuration['api']['url'].$this->configuration['api']['uri'].$sha256.$this->configuration['api']['content']);
  $statusCode = $response->getStatusCode();
  if(200 === $statusCode)
  {
    $responseContent = json_decode($response->getBody()->getContents(),true);
    $licenseRecord = $responseContent["facts"][0]["licenses"];
    return $licenseRecord;
  }
  else
  {
    return [];
  }
}

/**
 * @brief Insert the License Details in softwareHeritage table
 * @param int $pfileId
 * @param array $licenses
 * @param int $agentId
 * @return boolean True if finished
 */
protected function insertSoftwareHeritageRecord($pfileId,$licenses,$agentId)
{
  foreach($licenses as $license)
  {
    $this->shDao->setshDetails($pfileId, $license);
    $l = $this->licenseDao->getLicenseByShortName($license);
    if($l != NULL)
    {
      $this->dbManeger->insertTableRow('license_file',['agent_fk' => $agentId,'pfile_fk' => $pfileId,'rf_fk'=> $l->getId()]);
    }
  }
  return true;
}
```
The agent is registered through the agent plugin mechanism, as was done for the previous agents. agent-shagent.php registers the agent, and the agent then shows up in the user interface. The basic view is shown on the license listing and file-browser pages.
- #63cfa7 feat(software-heritage): Create a software heritage agent and add the functionality
- #b807e4 feat(db): Make table of software heritage to store information
- #38f51a feat(software-heritage): Make the ui section of software heritage and register the agent
- #b9c1fc feat(software-heritage): Make softwareHeritage dao function and add all the functionality related software_heritage table to it
- #da4806 feat(softwareHeritageView): Show the details of software heritage in the license list page
I worked on this section during my third evaluation period. We decided to have a separate view for the data under a software_heritage section. The basic idea was to display the results in a tabular view, as in the file-browser section. The development of this section has two steps:
- creating the backend file structure
- creating the frontend file structure
The backend file structure consists of two files (softwareHeritage-plugin.php and AjaxSHDetailsBrowser.php). softwareHeritagePlugin is the basic request handler, which takes care of basic frontend details such as registering the menu and getting the total number of records in a package. AjaxSHDetailsBrowser is an API that returns the file tree view along with the hash value and license details of each file. On the frontend, softwareHeritage.html.twig displays the records, whereas softwareHeritage.js.twig calls the API and fills the table contents with the help of datatable.js.
- #1718e2 feat(softwareHeritage): Make the view for softwareHeritage records for a package
Add Software Heritage Agent

Software Heritage Table
License List Page
File Info Page
Apart from the main deliverables above, I also contributed a few other patches.
- #e304e4 fix(vscode): Add vscode editor file to gitignore
- #e514dc feat(ui):Add user description of available user in group management page
What tasks were accomplished
| Task | Planned | Completed |
|---|---|---|
| Calculate the SHA256 value of the files of a package | yes | yes |
| Make the migration file to insert sha256 for previous files | yes | yes |
| Create the software heritage agent | yes | yes |
| Create a database table for software heritage details | yes | yes |
| Register the agent | yes | yes |
| Run the Agent | yes | yes |
| Redundancy check of the details | yes | yes |
| Store the details in software heritage table | yes | yes |
| Store the license details in license_file table | no | yes |
| Display the details fetched from software heritage | yes | yes |
Currently, we are not getting the origin value from the Software Heritage archive. Once the Software Heritage archive makes that data public, that section needs to be added. The work involves adding the API call and the same storage functionality for the origin.
During my time in Google Summer of Code, I learned many things, such as:
- Understanding the code base of FOSSology
- Exploring new features of PHP and C while working on the various features
- Creating an agent and working with databases in the FOSSology application
- Handling user interface features in FOSSology
- Improving my approach to solving problems
- Sharpening my knowledge of debugging and error correction




