rfcs/profiler.md (10 additions, 10 deletions)
@@ -2,15 +2,15 @@

 # Goal

-Single Cell datasets and queries run across multiple platforms (Mac OSes, Linux, AWS VMs, etc.) and across multiple layers of software stack (python, C++ and R as of now!). Performance and memory monitoring for such applications is essential because:
+Single-cell datasets and queries run across multiple platforms (macOS, Linux, AWS VMs, etc.) and across multiple layers of the software stack (Python, C++ and R as of now!). Performance and memory monitoring for such applications is essential because all of the following are important:

-- Detection of regression as new features are added is very important
-- Performance comparison across different platforms and languages to spot memory and execution time hot spots for optimization is critical
-- Measuring the system scalability as the workload grows in size is very important
-- Profiling a _suite_ of various operations, which can be run as a unit, reproducibly.
+- Detection of regressions as new features are added
+- Performance comparison across different platforms and languages to spot memory and execution-time hot spots for optimization
+- Measuring system scalability as the workload grows in size
+- Profiling a _suite_ of various operations, which can be run as a unit, reproducibly
 - ~~Having a tool for customers to monitor and debug their workloads~~

-The goal of this process is to provide a multi-layer profiler (consisting of a generic top profiler and multiple custom , languagedependent, profilers) to help Single Cell detect potential regression caused by bugs or new releases ~~and also help the customers detect performance issues in their queries~~.
+The goal of this process is to provide a multi-layer profiler -- consisting of a generic top profiler and multiple custom, language-dependent profilers -- to help detect potential regressions caused by bugs or new releases of TileDB-SOMA ~~and also help the customers detect performance issues in their queries~~.

 # Terminology & Concepts

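The Goal section describes a generic top profiler that sits above the language-specific ones and is meant to catch regressions across platforms and languages. As a minimal sketch, assuming the generic profiler treats each workload as an opaque command on a POSIX system with Python 3.9+, it could record wall-clock time and peak resident memory per run and compare a new measurement against past runs. `RunMetrics`, `profile_command`, `is_regression`, and the 10% median-based tolerance are hypothetical illustrations, not part of the RFC.

```python
import os
import subprocess
import sys
import time
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class RunMetrics:
    wall_time_s: float
    peak_rss: int  # ru_maxrss: kilobytes on Linux, bytes on macOS


def profile_command(cmd: Sequence[str]) -> RunMetrics:
    """Run `cmd` once and record its wall-clock time and peak resident memory.

    Hypothetical helper. The command is treated as opaque, so the same
    measurement applies to a Python, R, or C++ workload. POSIX only.
    """
    argv = list(cmd)
    start = time.perf_counter()
    proc = subprocess.Popen(argv)
    # os.wait4 reaps this specific child and returns its own rusage, unlike
    # resource.RUSAGE_CHILDREN, which aggregates over every reaped child.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    proc.returncode = os.waitstatus_to_exitcode(status)  # child already reaped
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
    return RunMetrics(wall_time_s=elapsed, peak_rss=usage.ru_maxrss)


def is_regression(current: float, history: List[float],
                  tolerance: float = 0.10) -> bool:
    """Flag `current` if it exceeds the median of past runs by more than `tolerance`."""
    if not history:
        return False  # nothing to compare against yet
    return current > median(history) * (1.0 + tolerance)


if __name__ == "__main__":
    metrics = profile_command([sys.executable, "-c", "data = [0] * 1_000_000"])
    print(f"wall={metrics.wall_time_s:.3f}s peak_rss={metrics.peak_rss}")
    # The median of the history is 1.0, so anything above 1.1 is flagged.
    print(is_regression(1.25, [1.0, 1.05, 0.98]))  # True
```

Comparing against the median rather than the last run keeps a single noisy measurement from becoming the baseline; the tolerance would need tuning per platform.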
@@ -36,7 +36,7 @@ The goal of this process is to provide a multi-layer profiler (consisting of a g

 ## Future Work

-This work is openended as there is a chance to add more and more custom profilers to the system. Also right now, we plan to use flamegraph to connect custom profilers and the generic main profiler. This can also be extended by using different intermediate formats or objects.
+This work is open-ended, as more custom profilers can be added to the system over time. For now, we plan to use flamegraphs to connect the custom profilers to the generic main profiler; this can later be extended to other intermediate formats or objects.

 ## Open Sourcing Strategy

@@ -90,9 +90,9 @@ If a more detailed breakdown of the software stack is needed (for example if we

 ### Custom Profiler API

-We studied a good number of python profilers including [cProfile](https://docs.python.org/3/library/profile.html), [line_profiler](https://pypi.org/project/line-profiler/), [tracemalloc](https://docs.python.org/3/library/tracemalloc.html), etc. While each of these profilers provides great information, the format and output of them is different and supporting them for languages across different systems can be challenging. Instead, we decided to have custom profilers that use a consistent format for their outputs. This provides a common and useful interface into the generic profiler.
+We studied a number of Python profilers, including [cProfile](https://docs.python.org/3/library/profile.html), [line_profiler](https://pypi.org/project/line-profiler/), [tracemalloc](https://docs.python.org/3/library/tracemalloc.html), etc. While each of these profilers provides useful information, their output formats differ, and supporting all of them across languages and systems would be challenging. Instead, we decided to have custom profilers that emit a consistent output format, which gives the generic profiler a single, common interface.

-We decided to use [flamegraph](https://github.com/brendangregg/FlameGraph) as this common interface. The **_framegraphs_** are a very popular interactive way of tracking performance metrics across software components. Therefore, the generic profiler will be given a set of custom profilers (and their arguments) to run and simply expect each profiler to generate a new **_flamegraph_** of the program software stack in a particular location and it adds the generated files to the Database as well. Given the overhead of tracebased profilers, custom profilers are always going to be optional. For example upon detecting a regression in the generic profiler DB, we can rerun the application with the custom profiler to get the **_flamegraphs_** of the application. For R, we can use [xProf](https://github.com/atheriel/xrprof) and for python, we can use [pyFlame](https://uwekorn.com/2018/10/05/pyflame.html) for this purpose.
+We decided to use [flamegraph](https://github.com/brendangregg/FlameGraph) as this common interface. **_Flamegraphs_** are a popular, interactive way of visualizing performance metrics across software components. Accordingly, the generic profiler will be given a set of custom profilers (and their arguments) to run, and will expect each profiler to generate a new **_flamegraph_** of the program's software stack in a particular location; it then adds the generated files to the database. Given the overhead of trace-based profilers, custom profilers will always be optional. For example, upon detecting a regression in the generic profiler DB, we can rerun the application with the relevant custom profiler to get the **_flamegraphs_** of the application. For R, we can use [xrprof](https://github.com/atheriel/xrprof), and for Python, we can use [pyFlame](https://uwekorn.com/2018/10/05/pyflame.html) for this purpose.


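The Custom Profiler API paragraphs describe a simple contract: the generic profiler runs each custom profiler with its arguments, expects a flamegraph file at a location it chooses, and adds that file to its database. A minimal sketch of that contract follows. It assumes a `{output}` placeholder convention in each profiler's argument list, an `.svg` file name, and an SQLite table named `flamegraphs`; `CustomProfiler`, `init_db`, and `run_custom_profilers` are hypothetical, and the RFC does not specify the database or its schema.

```python
import sqlite3
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class CustomProfiler:
    """Hypothetical spec: a name plus an argv in which "{output}" marks
    where the profiler must write its flamegraph."""
    name: str
    argv: List[str]


def init_db(conn: sqlite3.Connection) -> None:
    # Illustrative schema only; the RFC does not define one.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flamegraphs "
        "(run_id TEXT, profiler TEXT, data BLOB)"
    )


def run_custom_profilers(run_id: str, profilers: List[CustomProfiler],
                         out_dir: Path, conn: sqlite3.Connection) -> int:
    """Run each custom profiler, then store the flamegraph it wrote."""
    stored = 0
    for prof in profilers:
        output = out_dir / f"{run_id}-{prof.name}.svg"
        argv = [arg.replace("{output}", str(output)) for arg in prof.argv]
        subprocess.run(argv, check=True)
        if not output.exists():
            # Contract: every custom profiler writes its flamegraph to `output`.
            raise FileNotFoundError(f"{prof.name} did not write {output}")
        conn.execute(
            "INSERT INTO flamegraphs (run_id, profiler, data) VALUES (?, ?, ?)",
            (run_id, prof.name, output.read_bytes()),
        )
        stored += 1
    conn.commit()
    return stored


if __name__ == "__main__":
    # Stand-in "profiler" that writes a placeholder file; a real entry would
    # wrap a sampling profiler and flamegraph.pl instead.
    fake = CustomProfiler(
        name="fake",
        argv=[sys.executable, "-c",
              "import sys, pathlib; pathlib.Path(sys.argv[1]).write_text('<svg/>')",
              "{output}"],
    )
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    with tempfile.TemporaryDirectory() as tmp:
        run_custom_profilers("run-1", [fake], Path(tmp), conn)
    print(conn.execute("SELECT run_id, profiler, data FROM flamegraphs").fetchall())
```

Because the generic profiler only checks that a file appeared and stores its bytes, adding a profiler for another language means adding one `CustomProfiler` entry rather than changing the generic profiler.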
@@ -104,7 +104,7 @@ We decided to use [flamegraph](https://github.com/brendangregg/FlameGraph) as th

 ### Drawbacks

-One drawback here is limiting custom profilers’ API to flamegraph. As mentioned earlier, there are many profilers with different output formats. One possible solution to this problem is to allow the byte array associated with each custom profiler in the DB schema to be open to different interpretations which while more scalable is a less secure solution.
+One drawback of this design is that it limits the custom profilers’ API to flamegraphs. As mentioned earlier, there are many profilers with different output formats. One possible solution is to let the byte array associated with each custom profiler in the DB schema be open to different interpretations, which, while more scalable, is less secure.
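The Drawbacks paragraph weighs a flexible, open-to-interpretation byte array against security. One way to picture a middle ground, as a minimal sketch only: store a format tag next to each custom profiler's bytes and let the generic profiler decode only formats on an explicit allow-list. `ProfileBlob`, `DECODERS`, `decode_folded`, and `interpret` are hypothetical names. The single allowed format here is the folded-stack text format ("frame;frame;frame count" per line) that Brendan Gregg's `flamegraph.pl` takes as input.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ProfileBlob:
    """Hypothetical DB row: a custom profiler's raw bytes plus a format tag."""
    profiler: str
    fmt: str
    data: bytes


def decode_folded(data: bytes) -> Dict[str, int]:
    """Parse folded stacks: one "frame;frame;frame count" entry per line."""
    stacks: Dict[str, int] = {}
    for line in data.decode("utf-8").splitlines():
        if not line.strip():
            continue
        stack, sep, count = line.rpartition(" ")
        if not sep:
            raise ValueError(f"malformed folded-stack line: {line!r}")
        # Identical stacks may appear more than once; sum their sample counts.
        stacks[stack] = stacks.get(stack, 0) + int(count)
    return stacks


# Allow-list: only formats with a reviewed decoder are ever interpreted.
DECODERS: Dict[str, Callable[[bytes], object]] = {
    "folded": decode_folded,
}


def interpret(blob: ProfileBlob) -> object:
    decoder = DECODERS.get(blob.fmt)
    if decoder is None:
        # Unknown formats are rejected instead of being handed to a generic
        # loader such as unpickling, which can run arbitrary code.
        raise ValueError(f"unsupported format {blob.fmt!r} from {blob.profiler!r}")
    return decoder(blob.data)


if __name__ == "__main__":
    blob = ProfileBlob("custom-py", "folded",
                       b"main;load;read 3\nmain;query 5\nmain;load;read 2\n")
    print(interpret(blob))  # {'main;load;read': 5, 'main;query': 5}
```

Supporting a new output format then means reviewing and registering one more decoder, which keeps some of the scalability the paragraph mentions without interpreting arbitrary bytes.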