@@ -7,8 +7,6 @@ It aims to be fast, easy to debug, and provide an easy-to-use Pythonic interface

 ## Your First Tripy Program

-But enough talk; let's see some code:
-
 ```py
 # doc: no-print-locals
 a = tp.arange(5)
@@ -18,54 +16,7 @@ assert np.array_equal(cp.from_dlpack(c).get(), np.arange(5, dtype=np.float32) +
 ```

 This should look familiar if you've used linear algebra or deep learning libraries like
-NumPy and PyTorch.
-
-
-### Lazy Evaluation: Putting Off Work
-
-One important point is that Tripy uses a lazy evaluation model; that is,
-no computation is performed until a value is actually needed.
-
-In the example above, that means that `c` will not be evaluated until it is used,
-such as when we print its values.
-
-In most cases, this is simply an implementation detail that you will not notice.
-One exception to this is when attempting to time code. Consider the following code:
-
-```py
-# doc: no-print-locals
-import time
-
-start = time.time()
-a = tp.arange(5)
-b = tp.arange(5)
-c = a + b + tp.tanh(a)
-end = time.time()
-
-print(f"Time to create 'c': {(end - start) * 1000:.3f} ms.")
-```
-
-It looks like Tripy is very fast! While Tripy *execution* is very fast, compiling the program
-takes some time. The reason the time is so low relative to what we'd expect for initializing
-and running the compiler is that *we're not doing that yet*.
-
-The actual compilation and computation only happens when we evaluate `c`:
-
-```py
-# doc: no-print-locals
-start = time.time()
-print(c)
-end = time.time()
-
-print(f"Time to print 'c': {(end - start) * 1000:.3f} ms.")
-```
-
-That is why the time to print `c` is so much higher than the time to create it.
-
-If we wanted to time individual parts of the model, we would insert calls to `.eval()`;
-for example, adding a `c.eval()` prior to checking the end time would tell us how
-long it took to compile and run the subgraph that computes `c`.
-
+NumPy and PyTorch. Hopefully, the code above is self-explanatory, so we won't go into details.

 ## Organizing Code Using Modules

@@ -77,10 +28,10 @@ For example, we can define a Transformer MLP block like so:

 ```py
 class MLP(tp.Module):
-    def __init__(self, embedding_size, dtype=tp.float32):
+    def __init__(self, embd_size, dtype=tp.float32):
         super().__init__()
-        self.c_fc = tp.Linear(embedding_size, 4 * embedding_size, bias=True, dtype=dtype)
-        self.c_proj = tp.Linear(4 * embedding_size, embedding_size, bias=True, dtype=dtype)
+        self.c_fc = tp.Linear(embd_size, 4 * embd_size, bias=True, dtype=dtype)
+        self.c_proj = tp.Linear(4 * embd_size, embd_size, bias=True, dtype=dtype)

     def __call__(self, x):
         x = self.c_fc(x)
@@ -92,14 +43,14 @@ class MLP(tp.Module):
 To use it, we just need to construct and call it:

 ```py
-mlp = MLP(embedding_size=2)
+# doc: no-print-locals mlp
+mlp = MLP(embd_size=2)

 inp = tp.iota(shape=(1, 2), dim=1, dtype=tp.float32)
 out = mlp(inp)
 ```

-
-## To `compile` Or Not To `compile`
+## Compiling Code

 All the code we've seen so far has been using Tripy's eager mode. It is also possible to compile
 functions or modules ahead of time, which can result in significantly better performance.
@@ -111,37 +62,88 @@ Let's compile the MLP module we defined above as an example:

 ```py
 # doc: no-print-locals
-# When we compile, we need to indicate which parameters to the function should be runtime inputs.
-# In this case, MLP takes a single input tensor for which we can specify our desired shape and datatype.
+# When we compile, we need to indicate which parameters to the function
+# should be runtime inputs. In this case, MLP takes a single input tensor
+# for which we can specify our desired shape and datatype.
 fast_mlp = tp.compile(mlp, args=[tp.InputInfo(shape=(1, 2), dtype=tp.float32)])
 ```

-It is also possible to compile for a range of possible input shapes.
-See {func}`tripy.compile` for details.
-
 Now let's benchmark the compiled version against eager mode:
 ```py
 # doc: no-print-locals
+import time
+
 start = time.time()
 out = mlp(inp)
-out.eval() # Recall that we need to evaluate in order to actually materialize `out`
+# We need to evaluate in order to actually materialize `out`.
+# See the section on lazy evaluation below for details.
+out.eval()
 end = time.time()

 eager_time = (end - start) * 1000
 print(f"Eager mode time: {eager_time:.4f} ms")

-ITERS = 10
 start = time.time()
-for _ in range(ITERS):
-    out = fast_mlp(inp)
-    out.eval()
+out = fast_mlp(inp)
+out.eval()
 end = time.time()

-compiled_time = ((end - start) / ITERS) * 1000
-print(f"Compiled mode average time: {compiled_time:.4f} ms")
+compiled_time = (end - start) * 1000
+print(f"Compiled mode time: {compiled_time:.4f} ms")
 # Make sure compiled mode is actually faster # doc: omit
 assert compiled_time < 0.01 * eager_time # doc: omit
 ```

-As you can see, the compiled module is significantly faster than running the module
-in eager mode.
+For more information on the compiler, compiled functions/modules, and dynamic shapes,
+see the [compiler guide](project:./02-compiler.md).
+
+## Things To Note
+
+### Eager Mode: How Does It Work?
+
+If you've used TensorRT before, you may know that it does not support an eager mode.
+In order to provide eager mode support in Tripy, we actually need to compile the graph
+under the hood.
+
+Although we employ several tricks to make compile times faster when using eager mode,
+we do still need to compile, and so eager mode will likely be slower than other
+comparable frameworks.
+
+Consequently, we suggest that you use eager mode primarily for debugging and
+compiled mode for deployments.
+
+### Lazy Evaluation: Putting Off Work
+
+One important point is that Tripy uses a lazy evaluation model; that is,
+no computation is performed until a value is actually needed.
+
+In most cases, this is simply an implementation detail that you will not notice.
+One exception to this is when attempting to time code. Consider the following code:
+
+```py
+# doc: no-print-locals
+import time
+
+start = time.time()
+a = tp.arange(5)
+b = tp.arange(5)
+c = a + b + tp.tanh(a)
+end = time.time()
+
+print(f"Time to create 'c': {(end - start) * 1000:.3f} ms.")
+```
+
+Given what we said above about eager mode, it seems like Tripy is very fast!
+Of course, this is because *we haven't actually done anything yet*.
+The actual compilation and execution only happens when we evaluate `c`:
+
+```py
+# doc: no-print-locals
+start = time.time()
+print(c)
+end = time.time()
+
+print(f"Time to print 'c': {(end - start) * 1000:.3f} ms.")
+```
+
+That is why the time to print `c` is so much higher than the time to create it.
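
Note on the lazy evaluation section in this change: the timing pitfall it describes can be reproduced without Tripy. The sketch below is a toy illustration only, not Tripy's implementation. `LazyValue` and `slow_sum` are hypothetical names invented for this example, and the `time.sleep` call merely stands in for compile-and-run work.

```python
import time


class LazyValue:
    """Toy deferred computation: stores a callable and runs it on first eval()."""

    def __init__(self, compute):
        self._compute = compute
        self._result = None
        self._evaluated = False

    def eval(self):
        # Run the deferred work once, then reuse the cached result.
        if not self._evaluated:
            self._result = self._compute()
            self._evaluated = True
        return self._result


def slow_sum(n):
    # Stand-in for expensive work (hypothetical; sleeps to simulate cost).
    time.sleep(0.05)
    return sum(range(n))


# Creating the lazy value only records the work to be done.
start = time.perf_counter()
lazy = LazyValue(lambda: slow_sum(5))
create_ms = (time.perf_counter() - start) * 1000

# The work actually happens when the value is evaluated.
start = time.perf_counter()
value = lazy.eval()
eval_ms = (time.perf_counter() - start) * 1000

print(f"create: {create_ms:.3f} ms, eval: {eval_ms:.3f} ms, value: {value}")
```

Timing only the construction step measures bookkeeping, which is why a benchmark has to force evaluation before reading the end time.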