<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">

  <head>
    <meta charset='utf-8' />
    <meta http-equiv="X-UA-Compatible" content="chrome=1" />
    <meta name="description" content="Open-data-standards.github.com : Open data standards" />

    <link rel="stylesheet" type="text/css" media="screen" href="stylesheets/stylesheet.css">

    <title>Community Driven Open Data Interoperability and Data Portability Standards</title>
  </head>

  <body>

    <!-- HEADER -->
    <div id="header_wrap" class="outer">
        <header class="inner">
          <a id="forkme_banner" href="https://github.com/open-data-standards">View on GitHub</a>

          <h1 id="project_title">The Open Data Substrate</h1>
          <h2 id="project_tagline">Community Driven Open Data Interoperability and Data Portability Standards</h2>

        </header>
    </div>

    <!-- MAIN CONTENT -->
    <div id="main_content_wrap" class="outer">
      <section id="main_content" class="inner">
        <h3>What is an Open Data "Substrate"?</h3>

<p>The goal of this industry organization is to provide a forum for quickly establishing standards, so that Open Data
    implementors can be sure the solutions they develop interoperate. This community-driven
    initiative to promote vendor interoperability and data portability stems from a fundamental belief that
    "data is more useful when more people can use it". To achieve that goal, we need industry standards and
    multi-lateral agreements, so that organizations disseminating data through any conforming Open Data solution reap
    the benefits of data federation and data portability across heterogeneous Open Data solutions from all vendors.</p>

    <p>We call this a <strong>substrate</strong> because we believe these are the standards required
to promote vendor interoperability and data portability.  They provide the foundation on which higher-level knowledge networks
        and applications can be built.</p>

<p>The current challenge in the market is the risk of API fragmentation, lack of interoperability, and concerns about data portability.  As more companies
  and products enter the market, these problems are likely to get harder to solve, not easier.</p>

<p>Imagine a few years down the line, when there are 4 different vendors and 6 open source projects.
    If none of them share the same APIs and none can federate, we
    go back to creating a bunch of data stovepipes instead of an open network of data.  In short, we will be making it harder for people to
    use this data, and therefore making the data less valuable.</p>

<p>We need something that looks more like an Open Data Network, rather than thousands of Open Data sites.</p>

<h3>Why Not a Traditional Standards Organization?</h3>

<p>We want everyone to have a voice, but decisions to be made as a meritocracy.  Sound like how Open Source Software works?  That's not by accident.
Many of the people working on Open Data also work in Open Source.  In addition, many don't have the money required to join traditional standards organizations.
    Here's the approach we suggest:</p>
    <ul>
          <li>Specifications are written and stored in Git.</li>
          <li>Problems with a standard are filed as issues, then debated and resolved the way software issues would be.</li>
          <li>Use the <a href="https://groups.google.com/forum/#!forum/opendatastandards">Open Data Standards Google Group</a> for discussion.</li>
          <li>In general, operate like an open source community.</li>
    </ul>

<p>We're hoping this will allow the largest number of people to give input and direction, and that once we have a few implementations we can go to a standards body to make this official.</p>

          <h6>Catalog Federation for browsing diverse datasets</h6>  <a href="http://open-data-standards.github.com/data-catalog-schema/">See catalog.xml in data-catalog-schema for related efforts</a>
          <p>Datasets can be organized in many different ways.  Take a dataset like Chicago Crimes, which is published
              by the City of Chicago and surfaced by Chicago's data catalog.  Several other communities of researchers or citizens may also be interested in it.  Many of these
              communities want to create more specialized data catalogs that don't necessarily host the data, but point their members to all the relevant datasets located elsewhere.
              For example, the groups interested in a Chicago Crime dataset may include researchers studying crime around the US, researchers studying big cities around the world,
              people living in Chicago, organizations trying to determine "livability" for cities, etc.</p>

          <p>These organizations should be able to create their own catalogs that include other datasets, while keeping in sync with changes in metadata
              (such as new formats, views, derived datasets, etc.).  We would like to support protocols so that anyone creating a super catalog can federate and see changes
              in a uniform way (regardless of vendor or version).</p>

          <h6>API lists for discovering which APIs are exposed</h6>  <a href="http://open-data-standards.github.com/data-catalog-schema/">See apis.xml in data-catalog-schema for related efforts</a>
          <p>Organizations like the federal government are pushing towards an API-first approach to building out their infrastructure.  Many of these
              APIs are built primarily to expose data.  Given the large number of datasets, it is critically important that these APIs can be gathered and managed in an automated fashion across a diverse set of
              implementations.  We will support protocols and formats for describing the APIs, so that they can be programmatically gathered for any organization, aggregated, put in catalogs, etc.</p>

          <h6>Data Querying</h6>
          <p>As the number of APIs on top of data increases, we would like to standardize the languages and protocols used to query and update data.  So, if Yelp begins using an
              API on Restaurant Inspection Data, it should be able to use a standard query language that works across any Open Data implementation.</p>

          <h6>Data Federation</h6>
          <p>Some organizations or people may be interested in having some datasets stored locally.  They may have expensive analytics to run, they may want to
              experiment with "learning" approaches, or they may simply want a backup.  We would like to provide the protocols so that anyone looking to federate data can keep in sync with changing
              datasets.</p>

          <ul>
            <li><a href="architecture.html">Link to abstractions and architectures</a></li>
            <li><a href="efforts.html">Link to current standardization efforts</a></li>
          </ul>
      </section>
    </div>

    <!-- FOOTER  -->
    <div id="footer_wrap" class="outer">
      <footer class="inner">
        <p>Published with <a href="http://pages.github.com">GitHub Pages</a></p>
      </footer>
    </div>

    

  </body>
</html>