Update Landing Page #15

Open
wants to merge 7 commits into
base: master
4 changes: 2 additions & 2 deletions _config.yml
@@ -1,11 +1,11 @@
title: Legion Programming System
tagline: A data-centric approach to parallel programming
tagline: High Productivity High Performance Computing
description: Home page for the Legion parallel programming system

# Owner/author information
owner:
name: Legion
bio: A Data-Centric Parallel Programming System
bio: High Productivity High Performance Computing
email: #[email protected]
# Social networking links are used in author-bio sidebar. Update and remove as you like.
twitter:
Binary file added images/analogy.png
Binary file added images/hphpc.png
96 changes: 79 additions & 17 deletions index.md
@@ -1,22 +1,84 @@
---
layout: page
layout: page
---

Legion is a data-centric parallel programming system for
writing portable high performance programs targeted at
distributed heterogeneous architectures. Legion presents
abstractions which allow programmers to describe properties
of program data (e.g. independence, locality). By making the
Legion programming system aware of the structure of
program data, it can automate many of the tedious tasks
programmers currently face, including correctly extracting
task- and data-level parallelism and moving data around
complex memory hierarchies. A novel mapping interface
provides explicit programmer controlled placement of data
in the memory hierarchy and assignment of tasks to processors
in a way that is orthogonal to correctness, thereby enabling
easy porting and tuning of Legion applications to new
architectures.
## Legion: High-Productivity High-Performance Computing ##

The vast majority of all programs are sequential. Programmers are inherently
productive when developing sequential code because they can construct more
powerful programs simply by composing functionality from one or more software modules (e.g. libraries)
in serial without worrying about parallelism, data coherence, or synchronization.
The productivity engendered by this facet of sequential programming is vital to the
success of many popular software ecosystems such as Python, R, and MATLAB.
However, the implementations of these environments struggle to achieve high performance
on parallel and distributed hardware without resorting to explicit parallelism.
Ideally users want to write programs in a high
productivity sequential programming model and have those programs automatically executed with high performance on
parallel hardware. Achieving this end requires the development of a nuanced programming model and
sophisticated programming systems capable of analyzing and transforming sequential programs into parallel programs.

Similar to my comment below, we should have a blurb about what writing programs in Legion can do for you. Things that come to mind:

  • Easily scale to large machines (of GPUs and CPUs) while writing sequential programs
  • Automatic parallelization, synchronization, and communication (no more writing buggy MPI programs with deadlocks, stale data, underutilization of the network, etc.)
  • Easily port to different machines and processor kinds through the mapping interface. Share experiences of moving from Summit to Crusher, for example
  • Low-level and high-level APIs (raw Legion, Regent, Legate)
  • Programmers can still use all of their favorite libraries on individual devices (cuBLAS, BLAS, etc.)
  • Interop with MPI is possible for old applications (e.g. the S3D porting story)

Contributor Author

I strongly disagree with this one (in fact it is exactly what I was told was wrong with the current website). We cannot under any circumstances have a laundry list of things that Legion is/does. There must be one and only one idea of what Legion is and how it works, so that anyone can understand what it is doing. I have a laundry list of things that are novel about Legion in the section below this and we can add to it, but anything in this top section cannot detract from the singular story about what Legion is.


> in fact it is exactly what I was told was wrong with the current website). We cannot under any circumstances have a laundry list of things that Legion is/does

I'm pretty surprised about that. I'm not suggesting to replace this story about Legion, but I think it should be somewhere. Most mature software projects I've interacted with have something like this: https://github.com/cockroachdb/cockroach#what-is-cockroachdb, https://arrow.apache.org/, https://www.sqlite.org/index.html, the list goes on. I'm speaking a bit from experience, but when I first saw the Legion project after talking to Alex when applying, I didn't really understand what was gained from the fancy abstractions that Legion provides. If I had a better understanding then of what users actually get from Legion, then a story of how those features come out of the programming model would make more sense.

Contributor Author

All those projects you listed have comparable systems, i.e. everyone knows what a database system is, so they can easily communicate what it is their system does and how it is slightly different. The problem with Legion is that there is nothing else that is even remotely like it (in software anyway). Therefore the 'why' has to come before the 'what', otherwise you have no context for understanding. For example, we could say: "Legion is a software implementation of a superscalar out-of-order seven-stage pipelined processor for sequential task programs that executes as a distributed system so it can run on everything from your desktop to a supercomputer." but nobody would have any idea what we are talking about or why they need that and we'd lose them right away.

Contributor

@elliottslaughter elliottslaughter Jan 23, 2023

I think we need to try harder to summarize Legion in a handful of sentences. We can follow that with the "why", but the current text loses the reader really quickly. The irony of a long "why" section is that I'm left wondering why I care, while I'm reading about why I should care. We need to hook readers faster than that.

I personally like Rohan's bullets above. I don't think they suffer from the "N ideas" problem. They're consequences of the one idea that is central to the system. Plus, they're also fast to read. The cost of a set of short bullets is a lot lower than dense paragraphs of text.

It's also worth thinking about who we're trying to target with this. I'm assuming it's mostly people who are technical but not necessarily experts in parallel/distributed programming. I would assume even less in the way of hardware experience.

I personally think that the SOOP analogy doesn't work for new users, because most of the people we're trying to reach don't have the necessary hardware experience to grasp it intuitively. I think this causes more confusion than it helps. Instead I'd focus on sequential semantics, dynamic analysis, and parallel/distributed/accelerated execution. Maybe we can add a second page (right now we call this the "overview", maybe there's a better name for it) that goes into more details about the programming model, deferred/asynchronous execution, the layers of the software stack and runtime organization, etc. That would be an appropriate place to talk about the SOOP. But not here.

One possible structure would be:

Home page:

  • The Legion pitch. 50 words max.
  • Bullets, per Rohan above (or similar)
  • The "why it matters"
  • Link to overview with more details

Overview page:

  • The "what" section: SOOP analogy, program analysis, core data abstractions, etc.

Contributor Author

> We can follow that with the "why", but the current text loses the reader really quickly. The irony of a long "why" section is that I'm left wondering why I care, while I'm reading about why I should care. We need to hook readers faster than that.
>
> I personally think that the SOOP analogy doesn't work for new users, because most of the people we're trying to reach don't have the necessary hardware experience to grasp it intuitively.

You guys are trying to write the Legate pitch, and not the Legion pitch. The pitch for Legion should be for the expert users who are going to be building the things on top, not for the end-users. Note you also need to encapsulate what Regent is in the same idea, which my text does as well, but your suggestions do not. Again, it must be one and only one core idea. If you can come up with a better analogy than what I did, then I am all ears.

> I'm assuming it's mostly people who are technical but not necessarily experts in parallel/distributed programming. I would assume even less in the way of hardware experience.

The people that are yelling at me are the exact opposite on all three axes of this.

Contributor

@lightsighter Could you tell us what feedback you got about the current website? It's a little difficult to respond to it without knowing what it is.

Contributor Author

The feedback I was given is this: the Legion project is too big and complicated as currently explained. You need to have one and only one idea stating clearly what the purpose of Legion is and how to think about it.


![High Productivity High Performance Computing](images/hphpc.png)

Fortunately, there already exist
[many](https://en.wikipedia.org/wiki/Tomasulo%27s_algorithm)
[well](https://en.wikipedia.org/wiki/Very_long_instruction_word)
[known](https://en.wikipedia.org/wiki/Register_renaming)
[techniques](https://en.wikipedia.org/wiki/Speculative_execution)
[for](https://en.wikipedia.org/wiki/Instruction_pipelining)
[implicitly](https://en.wikipedia.org/wiki/Superscalar_processor)
[parallelizing](https://en.wikipedia.org/wiki/Out-of-order_execution)
sequential programs to target parallel hardware.
However, in most systems these algorithms are currently only deployed to exploit
fine-grained instruction-level parallelism. The primary thesis of the Legion project is
that these same techniques can and should be deployed hierarchically at coarser granularities
in software to leverage modern parallel hardware (multi-core CPUs, GPUs, supercomputers, etc.)
without compromising the productivity of developing sequential programs.

The basis for this thesis rests upon the fundamental observation that implicitly mapping
sequential programs onto parallel hardware looks similar at many different scales.
At the finest granularity, hardware or compilers can extract parallelism from a stream of
instructions by analyzing register usage and mapping independent
instructions onto parallel hardware units. The same principles apply when extracting parallelism
from a stream of demarcated functions called *tasks* operating on *logical regions* of data to map
onto the parallel execution units inside of a workstation or a supercomputer
for different granularities of tasks and regions. (Legion derives its name from the concatenation
of the words in 'logical region'.)
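
To make the analogy concrete, here is a minimal sketch of the kind of dependence analysis the paragraph above describes, written in plain Python rather than against the Legion API. The names `Task`, `conflicts`, and `dependences`, and the two-privilege model (`read`/`write`), are assumptions for illustration only. Each task declares which regions it touches and how; two tasks must be ordered only if they touch the same region and at least one of them writes, much as hardware orders instructions by register usage.

```python
from dataclasses import dataclass

READ, WRITE = "read", "write"


@dataclass(frozen=True)
class Task:
    """A hypothetical task: a name plus its declared (region, privilege) pairs."""
    name: str
    accesses: tuple


def conflicts(earlier, later):
    """Two tasks conflict if they share a region and at least one writes it."""
    for region_a, priv_a in earlier.accesses:
        for region_b, priv_b in later.accesses:
            if region_a == region_b and WRITE in (priv_a, priv_b):
                return True
    return False


def dependences(stream):
    """Map each task name to the earlier tasks (in program order) it must wait for."""
    deps = {}
    for i, later in enumerate(stream):
        deps[later.name] = [e.name for e in stream[:i] if conflicts(e, later)]
    return deps


# A sequential stream of tasks, as a program would issue them.
stream = [
    Task("init_a", (("A", WRITE),)),
    Task("init_b", (("B", WRITE),)),
    Task("sum_ab", (("A", READ), ("B", READ), ("C", WRITE))),
    Task("scale_b", (("B", WRITE),)),
]
deps = dependences(stream)
for name, waits_on in deps.items():
    print(name, "waits on", waits_on)
```

In this toy stream, `init_a` and `init_b` have no dependences on each other and could run in parallel, while `scale_b` must wait for both the earlier write to `B` and the earlier read of `B`. The real system handles aliased and partitioned regions, which this sketch does not attempt.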

![Implicit Parallelism Analogy](images/analogy.png)

This analogy forms the basis of the Legion project, and its two primary software
artifacts can be understood as direct analogs to existing systems. The Legion
runtime endeavors to be a full reimplementation of a pipelined out-of-order superscalar processor
in software for dynamically exploiting task-parallelism from a stream of tasks
generated by the execution of a sequential program. Similarly, the Regent compiler
strives to be an optimizing compiler, performing static analyses and transformations
of programs at the coarser granularity of tasks before mapping them onto the Legion runtime.
Armed with these systems that automatically parallelize and distribute sequential programs,
we aim to facilitate the creation of high productivity high performance computing ecosystems
so that all users can leverage modern massively parallel machines.
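
As a rough illustration of what "dynamically exploiting task-parallelism from a stream of tasks" can mean, the sketch below issues tasks in program order to a thread pool and lets each one start as soon as the earlier tasks it conflicts with have finished. This is a toy model in standard-library Python, not the Legion runtime: `run_stream`, the tuple layout of each task, and the use of a dictionary as the set of regions are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_stream(stream, regions, workers=4):
    """Run (name, reads, writes, fn) tasks out of order, preserving sequential semantics.

    Each fn(regions) is assumed to touch only the regions it declared.
    """
    issued = []  # (future, reads, writes) for every task issued so far
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, reads, writes, fn in stream:
            # An earlier task is a predecessor if it wrote something we access,
            # or read something we are about to write.
            preds = [f for f, r, w in issued
                     if (w & (reads | writes)) or (r & writes)]

            def body(fn=fn, preds=preds):
                wait(preds)   # block until all conflicting earlier tasks finish
                fn(regions)

            issued.append((pool.submit(body), reads, writes))
        for future, _, _ in issued:
            future.result()   # surface any exception raised inside a task
    return regions


def init_a(r):
    r["A"] = 2


def init_b(r):
    r["B"] = 3


def sum_ab(r):
    r["C"] = r["A"] + r["B"]


def scale_b(r):
    r["B"] = r["B"] * 10


stream = [
    ("init_a", set(), {"A"}, init_a),
    ("init_b", set(), {"B"}, init_b),
    ("sum_ab", {"A", "B"}, {"C"}, sum_ab),
    ("scale_b", {"B"}, {"B"}, scale_b),
]
result = run_stream(stream, {"A": 0, "B": 0, "C": 0})
print(result)
```

The program text stays sequential; only the declared reads and writes tell the scheduler which tasks may overlap. Because every task waits only on tasks issued before it, and the pool dequeues work in submission order, the earliest unfinished task is never blocked, so the sketch cannot deadlock.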

#### The Key Ideas ####

I think that something like this should go ahead of the analogies to hardware / out-of-order processing. On a first scan of the website, users care more about what legion can do for them, rather than how legion works.

Contributor Author

I will try to refactor the text to move this part up higher.


In order to realize the above vision, the Legion project has developed several novel technologies:

* A [dynamic data model](/pdfs/oopsla2013.pdf) that is flexible enough for tasks to
dynamically specify arbitrary working set regions and the effects they will have on those regions.
* The ability to dynamically compute [mathematical relationships](/pdfs/dpl2016.pdf) between regions,
and an [auto-parallelizing framework](/pdfs/parallelizer2019.pdf) for synthesizing them.
* A [dynamic dependence and distributed coherence analysis](/pdfs/visibility2023.pdf) based
on algorithms and data structures from ray tracing to handle arbitrary aliasing of regions.
* A technique called *control replication* that decouples
task creation from execution to avoid sequential bottlenecks
(with both [static](/pdfs/cr2017.pdf) and [dynamic](/pdfs/dcr2021.pdf) incarnations).
* A [scale-free programming model](/pdfs/idx2021.pdf) that encourages the development
of programs that can be ported to run on machines of different sizes without modification.
* A *mapping interface* for [decoupling correctness from performance](/pdfs/sc2012.pdf)
and thereby guaranteeing performance portability of programs.

Many of these ideas are intertwined and resonate with each other in the design
and we encourage you to explore them further.

#### Get Started ####

To learn more about Legion you can:

@@ -25,7 +87,7 @@ To learn more about Legion you can:
* Download our [publications]({{ "/publications/" | relative_url }})
* Ask questions on our [mailing list]({{ "/community/" | relative_url }})

#### About Legion ####
#### Acknowledgments ####

Legion is developed as an open source project, with major
contributions from [LANL](https://www.lanl.gov/),