A platform for the world's largest open datasets, stored on a decentralized network

FilecoinFoundationWeb/open-panda

Open Panda

Open Panda graph image

This is the repository for the Open Panda statically-generated frontend. Open Panda is a platform for data researchers, analysts, students, and enthusiasts to interact with the largest open datasets in the world, stored on Filecoin's decentralized network.

The static branch contains the statically-generated frontend. The develop and main branches contain the non-static application, which has more features but requires a database and is now legacy.

This repository is structured as a monorepo using npm workspaces.

Requirements

  • This README assumes a device running macOS
  • Node 16.x or higher is required
    • NVM can be used to install and switch between multiple Node versions

Local development

Create a .env file in packages/fe and populate with the following:

NODE_ENV=development
SERVER_ENV=development

Generate a localhost SSL cert:

cd ~/.ssh
brew install mkcert # on Linux, use your distribution's package manager instead
brew install nss # provides certutil, which `mkcert -install` needs to install the CA in Firefox automatically

# at this point, open any HTTPS website in Firefox before running the commands below

mkcert -install
mkcert -key-file localhost_key.pem -cert-file localhost_cert.pem localhost 127.0.0.1
cat localhost_cert.pem > localhost_fullchain.pem
cat "$(mkcert -CAROOT)/rootCA.pem" >> localhost_fullchain.pem

Copy the generated PEM files (localhost_cert.pem and localhost_key.pem) to the root open-panda project directory.

Install dependencies in CI mode and start the server:

npm ci
npm run dev -w fe

Open the URL in a browser:

https://localhost:13010/

Editing content

Content on this website can be edited directly in this repository, across JSON, Markdown, and media files. This lets you edit the written content on the site as well as the datasets, their CIDs, and their sources. The sections below describe how to edit each content type.

Images

Before images can be used in the site, they need to be added to the packages/fe/static directory. Once added, they can be referenced inside content files. For example, if you added the following image: packages/fe/static/new-folder/fancy-image.jpeg, then inside content files you can reference the file by using the following path: /new-folder/fancy-image.jpeg.
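The path mapping described above can be sketched as a small function. This is only an illustration: toContentPath and STATIC_ROOT are hypothetical names and are not part of the repository.

```javascript
// Hypothetical helper, not part of open-panda.
// Directory (relative to the repository root) that holds static assets.
const STATIC_ROOT = 'packages/fe/static'

// Converts a repository path inside STATIC_ROOT into the path used
// to reference the file from content files.
function toContentPath (repoPath) {
  const normalized = repoPath.replace(/\\/g, '/') // tolerate Windows separators
  if (!normalized.startsWith(STATIC_ROOT + '/')) {
    throw new Error(`${repoPath} is not inside ${STATIC_ROOT}`)
  }
  return normalized.slice(STATIC_ROOT.length)
}

console.log(toContentPath('packages/fe/static/new-folder/fancy-image.jpeg'))
```

The call at the end uses the same file as the example in the paragraph above.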

Structured content

Page text and images can be modified by editing the corresponding page JSON file in packages/fe/content/pages. This applies to all pages except those written in Markdown, which is currently used for only two pages on the site (Privacy Policy and Terms).

Full text pages

Unlike the structured content throughout the rest of the site, the Privacy Policy and Terms pages are written in Markdown and can be edited in packages/fe/content/markdown.

Categories

Datasets may be assigned a category, which is shown in the categories slider on the home page. Categories can be edited in packages/fe/content/categories.json.

Adding a dataset

The file packages/fe/content/data/dataset-list.json contains the dataset metadata. If a dataset is added to or removed from this file, it is added to or removed from the frontend. The file is a JSON array of objects, with each object representing one dataset. Datasets are displayed on the home page in the same order as they appear in this array.

The schema of all available keys for creating a dataset:

{
  slug: String,
  name: String,
  replication: Number,
  size: Number,
  total: Number,
  storage: Number,
  fileExtensions: [String],
  locations: [{
    full: String,
    country_code: String
  }],
  authors: [String],
  funders: [String],
  categories: [String],
  createdAt: String,
  description: String,
  availableUntil: String,
  downloadLinks: [{
    label: String,
    url: String
  }]
}

Here is an example dataset with all keys populated:

{
  "slug": "arpa-e-perform",
  "name": "ARPA-E Performance Data",
  "replication": 5.12,
  "size": 143254216451,
  "total": 124685412,
  "storage": 86,
  "fileExtensions": ["xml"],
  "locations": [
    {
      "full": "Japan",
      "country_code": "JP"
    }, {
      "full": "Canada",
      "country_code": "CA"
    }, {
      "full": "United States",
      "country_code": "US"
    }, {
      "full": "United Kingdom",
      "country_code": "GB"
    }
  ],
  "authors": [
    "John Doe",
    "Jane Doe"
  ],
  "funders": [
    "John Doe",
    "Jane Doe"
  ],
  "categories": [
    "Genetics",
    "Biology",
    "Genome"
  ],
  "createdAt": "May 10, 2024",
  "description": "<h5>Genome in a Bottle is an academic consortium hosted by NIST to develop reference materials and standards for clinical sequencing.</h5><p>The Genome in a Bottle Consortium is a public-private-academic consortium hosted by NIST to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice and innovations in technologies.</p><p>The priority of GIAB is authoritative characterization of human genomes for use in benchmarking, including analytical validation and technology development, optimization, and demonstration. Current work in the GIAB Analysis Team is focused on establishing assembly-based benchmarks for challenging medically relevant genes and other difficult regions. GIAB is also exploring expanding to additional samples consented for release of WGS and redistribution of commercial products: increasing the diversity of germline reference samples and developing paired tumor-normal cell lines.</p>",
  "availableUntil": "Nov 24, 2027",
  "downloadLinks": [
    {
      "label": "Entire dataset",
      "url": "https://www.genomeinabottle.org"
    }, {
      "label": "North America",
      "url": "https://www.genomeinabottle.org"
    }, {
      "label": "Europe",
      "url": "https://www.genomeinabottle.org"
    }, {
      "label": "Asia",
      "url": "https://www.genomeinabottle.org"
    }
  ]
}
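Before committing a new entry, it can help to check it against the schema above. The following is a minimal sketch under stated assumptions: validateDataset is a hypothetical helper that does not exist in the repository, and it treats every key as optional because the README does not say which keys are required.

```javascript
// Hypothetical helper, not part of open-panda.
// Expected primitive type for each scalar key in the schema.
const FIELD_TYPES = {
  slug: 'string',
  name: 'string',
  replication: 'number',
  size: 'number',
  total: 'number',
  storage: 'number',
  createdAt: 'string',
  description: 'string',
  availableUntil: 'string'
}
// Keys whose value should be an array of strings.
const STRING_ARRAYS = ['fileExtensions', 'authors', 'funders', 'categories']
// Keys whose value should be an array of objects with these string fields.
const OBJECT_ARRAYS = {
  locations: ['full', 'country_code'],
  downloadLinks: ['label', 'url']
}

// Returns a list of problems; an empty list means every key that is present
// has the expected shape. Absent keys are skipped (assumption: optional).
function validateDataset (entry) {
  const errors = []
  for (const [key, type] of Object.entries(FIELD_TYPES)) {
    if (key in entry && typeof entry[key] !== type) {
      errors.push(`${key} should be a ${type}`)
    }
  }
  for (const key of STRING_ARRAYS) {
    if (key in entry) {
      const ok = Array.isArray(entry[key]) && entry[key].every(v => typeof v === 'string')
      if (!ok) errors.push(`${key} should be an array of strings`)
    }
  }
  for (const [key, fields] of Object.entries(OBJECT_ARRAYS)) {
    if (key in entry) {
      const ok = Array.isArray(entry[key]) && entry[key].every(
        item => item !== null && typeof item === 'object' &&
          fields.every(f => typeof item[f] === 'string')
      )
      if (!ok) errors.push(`${key} should be an array of { ${fields.join(', ')} } objects`)
    }
  }
  return errors
}
```

Calling validateDataset on each object parsed from dataset-list.json and printing any non-empty result is one way to catch a mistyped value, such as a size given as a string, before the site is generated.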

Adding dataset CIDs

Dataset CIDs, as shown on the individual dataset pages, are added separately and are optional. If they are not included, the CID table is hidden.

Dataset CIDs can be added to packages/fe/content/datasets/**. Each dataset must be in a separate JSON file and the file must have the following structure:

{
  "pieces": [
    {
      "PieceCID": "baga6ea4seaqhprnghduw2bqjnszoqr5jy2hhke6lez6lsacviw7axa2f2pvl4dq",
      "PieceSize": "34359738368",
      "RootCID": "bafkreibruulyjd3x4xtfqjbhb5f5wohmlyb5quo237zhmjyww5w3jyqquq",
      "FileSize": "33758601502",
      "StoragePath": "baga6ea4seaqhprnghduw2bqjnszoqr5jy2hhke6lez6lsacviw7axa2f2pvl4dq.car"
    },
    {
      "PieceCID": "baga6ea4seaqfash33myagq7mgn6uruby4e2bfszyflov5jd6ru6sxn6xhqsaqba",
      "PieceSize": "34359738368",
      "RootCID": "bafkreiflviphjemjff3kexkkgppenfhybmzfsn7n37nzqrekgpcalsxea4",
      "FileSize": "33702690691",
      "StoragePath": "baga6ea4seaqfash33myagq7mgn6uruby4e2bfszyflov5jd6ru6sxn6xhqsaqba.car"
    },
    ...
  ]
}

All other keys in the file are ignored.

❗️ The filename must match the slug of the corresponding dataset entry in packages/fe/content/data/dataset-list.json. For example, if you added a new dataset to dataset-list.json like so:

{
  "slug": "arpa-e-perform",
  "name": "ARPA-E Performance Data",
  "replication": 5.12,
  "size": 143254216451,
  ...
}

Then you must add a file called arpa-e-perform.json (same as slug property) to packages/fe/content/datasets.
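The filename rule above is easy to break with a typo, so a quick check can be useful. This is a sketch only: findUnmatchedCidFiles is a hypothetical helper that is not part of the repository, and the commented usage assumes the CID files sit directly in packages/fe/content/datasets rather than in nested folders.

```javascript
const path = require('path')

// Hypothetical helper, not part of open-panda.
// Returns the CID file names that have no dataset entry with a matching slug.
function findUnmatchedCidFiles (fileNames, datasets) {
  const slugs = new Set(datasets.map(d => d.slug))
  return fileNames
    .filter(name => path.extname(name) === '.json') // ignore non-JSON files
    .filter(name => !slugs.has(path.basename(name, '.json')))
}

// Possible usage from the repository root (paths taken from this README):
//   const fs = require('fs')
//   const datasets = JSON.parse(
//     fs.readFileSync('packages/fe/content/data/dataset-list.json', 'utf8'))
//   const files = fs.readdirSync('packages/fe/content/datasets')
//   console.log(findUnmatchedCidFiles(files, datasets))
```

An empty result means every CID file has a dataset entry with the same slug.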

To generate the correct structure automatically, take the output file produced by Singularity:

AttachmentID  SourceStorageID  
1             2                
    SourceStorage
        ID  Name        Type  Path            
        2   cesmlenns2  s3    ncar-cesm-lens  
    Pieces
        PieceCID                                                          PieceSize    RootCID                                                      FileSize     StoragePath                                                           
        baga6ea4seaqhkxpnckwlp7hpiqviinlgphkslp3q6224m6kqyl44ri4wb34e6oa  34359738368  bafkreibvpthpy6hoebk7tibilqg7k3zr7rfo7j4k4msedface4pnv3grim  33803063476  baga6ea4seaqhkxpnckwlp7hpiqviinlgphkslp3q6224m6kqyl44ri4wb34e6oa.car  
        baga6ea4seaqh5ltpeufrejrha7pcdxz5nomuh4dbjcqwhbj35qqkkagev6wmqja  34359738368  bafkreibdfsyjiomahfceoydmggnrrjkdaoemzzggffkfb6cepkfvbayaau  33801808545  baga6ea4seaqh5ltpeufrejrha7pcdxz5nomuh4dbjcqwhbj35qqkkagev6wmqja.car  
        baga6ea4seaqpu27bvbkjj2ok5pwe6gatxbi72lilamdn7p2d2krrctjlkh72eai  34359738368  bafkreiekkhcz22a6lzh7oum3nmzvrhfa4dvhq67dl4kv3pui2c3qy7tmky  33770978605  baga6ea4seaqpu27bvbkjj2ok5pwe6gatxbi72lilamdn7p2d2krrctjlkh72eai.car  
        baga6ea4seaqolkvfezbcgt53hm3evxhsopdue6pbxykwrgjk3bd3e7v55dyqmmi  34359738368  bafkreihy3qqh2g74hojd5cjieq2uwzvpq7je4z65kc6qrrhpfaglgtmxpy  33807189849  baga6ea4seaqolkvfezbcgt53hm3evxhsopdue6pbxykwrgjk3bd3e7v55dyqmmi.car

and run this awk script in bash, from the directory that contains the Singularity output files (requires jq):

for f in *; do
  # skip directories and any .json files written by a previous run
  [[ -f $f && $f != *.json ]] || continue
  awk '
  NR==2 { a=$1; s=$2 }                # AttachmentID and SourceStorageID values
  NR==5 { id=$1; n=$2; t=$3; p=$4 }   # SourceStorage row: ID, Name, Type, Path
  NR>7 && NF>0 {                      # piece rows start on line 8, after the Pieces header
      x = x "{\"PieceCID\":\"" $1 "\",\"PieceSize\":\"" $2 "\",\"RootCID\":\"" $3 "\",\"FileSize\":\"" $4 "\",\"StoragePath\":\"" $5 "\"},"
  }
  END {
      sub(/,$/, "", x)                # drop the trailing comma
      print "{"
      print "  \"AttachmentID\": " a ","
      print "  \"SourceStorageID\": " s ","
      print "  \"SourceStorage\": {"
      print "    \"ID\": " id ","
      print "    \"Name\": \"" n "\","
      print "    \"Type\": \"" t "\","
      print "    \"Path\": \"" p "\""
      print "  },"
      print "  \"pieces\": [" x "]"
      print "}"
  }' "$f" | jq . > "$f.json"
done
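For readers who prefer Node over awk, the same conversion can be sketched in JavaScript. This is an assumption-laden alternative, not the project's tooling: parsePieces is a hypothetical function, it assumes the whitespace-separated layout of the Singularity output shown above, and it emits only the pieces key because all other keys in the file are ignored.

```javascript
// Hypothetical helper, not part of open-panda.
// Column names of the Pieces table, in the order they appear in the output.
const PIECE_KEYS = ['PieceCID', 'PieceSize', 'RootCID', 'FileSize', 'StoragePath']

// Parses the text of a Singularity output file and returns { pieces: [...] }.
function parsePieces (text) {
  const pieces = []
  let inPieces = false
  for (const line of text.split('\n')) {
    const cols = line.trim().split(/\s+/)
    if (cols[0] === 'PieceCID') { // header row of the Pieces table
      inPieces = true
      continue
    }
    // Ignore everything before the Pieces header, blank lines, and short rows.
    if (!inPieces || cols.length < PIECE_KEYS.length) continue
    const piece = {}
    PIECE_KEYS.forEach((key, i) => { piece[key] = cols[i] })
    pieces.push(piece)
  }
  return { pieces }
}
```

Locating the header row by name, rather than by line number, means the function keeps working if the number of lines above the Pieces table changes. The result can be written out with JSON.stringify(parsePieces(text), null, 2) to a file named after the dataset slug.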

Generating the static site

To generate the static site files for production, or for further local testing, run the following:

npm ci && npm run generate -w fe

The newly created packages/fe/dist directory contains the entire site and can be deployed anywhere.

For services such as Cloudflare Pages, Vercel, or Fleek, set the output directory to packages/fe/dist.