Description
🐛 Bug
I'm trying to install xFormers and a few other packages in an auxiliary notebook so I can use it as a utility script in a submission notebook I am preparing for the UBC-OCEAN competition. However, after installation, running my submission notebook produces the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 37
35 coords = coords.squeeze(0)
36 X = tiles.float().to(device=device, non_blocking=True)
---> 37 y_prob, pred, features = model(X, coords)
38 query_preds.append((image_id.item(), labels[pred.to(device='cpu').item()]))
39 query_features.append(features.view(-1).to(device='cpu'))
File /kaggle/usr/lib/ubc_ocean_packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
Cell In[4], line 227, in WSINet.forward(self, x, coords)
225 def forward(self, x, coords):
226 features = self.encoder(x).unsqueeze(0)
--> 227 features, mask = self.roformer(features, coords)
228 y_prob, y_hat, attention = self.attention(features)
230 return y_prob, y_hat, attention
File /kaggle/usr/lib/ubc_ocean_packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
Cell In[4], line 97, in RoFormerLayer.forward(self, x, coords)
95 q, k = apply_rotary_position_embeddings(self.rope(h, grid_h, grid_w), q, k)
96 q, k, v = q.reshape(bs, n, self.heads, self.head_dim), k.reshape(bs, n, self.heads, self.head_dim), v.reshape(bs, n, self.heads, self.head_dim)
---> 97 att = fmha.memory_efficient_attention(q, k, v, attn_bias=mask, p = self.dropout, op=(fmha.cutlass.FwOp, fmha.cutlass.BwOp))
98 o = self.norm2(h + att.reshape(bs, n, h.size(-1)))
99 ff = self.mlp(o)
File /kaggle/usr/lib/ubc_ocean_packages/xformers/ops/fmha/__init__.py:223, in memory_efficient_attention(query, key, value, attn_bias, p, scale, op)
116 def memory_efficient_attention(
117 query: torch.Tensor,
118 key: torch.Tensor,
(...)
124 op: Optional[AttentionOp] = None,
125 ) -> torch.Tensor:
126 """Implements the memory-efficient attention mechanism following
127 `"Self-Attention Does Not Need O(n^2) Memory" <[http://arxiv.org/abs/2112.05682>`_.](http://arxiv.org/abs/2112.05682%3E%60_.%3C/span%3E)
128
(...)
221 :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]``
222 """
--> 223 return _memory_efficient_attention(
224 Inputs(
225 query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale
226 ),
227 op=op,
228 )
File /kaggle/usr/lib/ubc_ocean_packages/xformers/ops/fmha/__init__.py:321, in _memory_efficient_attention(inp, op)
316 def _memory_efficient_attention(
317 inp: Inputs, op: Optional[AttentionOp] = None
318 ) -> torch.Tensor:
319 # fast-path that doesn't require computing the logsumexp for backward computation
320 if all(x.requires_grad is False for x in [inp.query, inp.key, inp.value]):
--> 321 return _memory_efficient_attention_forward(
322 inp, op=op[0] if op is not None else None
323 )
325 output_shape = inp.normalize_bmhk()
326 return _fMHA.apply(
327 op, inp.query, inp.key, inp.value, inp.attn_bias, inp.p, inp.scale
328 ).reshape(output_shape)
File /kaggle/usr/lib/ubc_ocean_packages/xformers/ops/fmha/__init__.py:339, in _memory_efficient_attention_forward(inp, op)
337 op = _dispatch_fw(inp, False)
338 else:
--> 339 _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op, inp)
341 out, *_ = op.apply(inp, needs_gradient=False)
342 return out.reshape(output_shape)
File /kaggle/usr/lib/ubc_ocean_packages/xformers/ops/fmha/dispatch.py:39, in _ensure_op_supports_or_raise(exc_type, name, op, inp)
37 if not reasons:
38 return
---> 39 raise exc_type(
40 f"""Operator `{name}` does not support inputs:
41 {textwrap.indent(_format_inputs_description(inp), ' ')}
42 {_format_not_supported_reasons(op, reasons)}"""
43 )
ValueError: Operator `memory_efficient_attention` does not support inputs:
query : shape=(1, 7040, 8, 96) (torch.float32)
key : shape=(1, 7040, 8, 96) (torch.float32)
value : shape=(1, 7040, 8, 96) (torch.float32)
attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalMask'>
p : 0.25
`cutlassF` is not supported because:
xFormers wasn't build with CUDA support
operator wasn't built - see `python -m xformers.info` for more info
The 'python -m xformers.info' command returns the following (note that the memory_efficient_attention methods, which I need, are unavailable):
xFormers 0.0.22.post7+cu118
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable
memory_efficient_attention.flshattF@0.0.0: unavailable
memory_efficient_attention.flshattB@0.0.0: unavailable
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: True
pytorch.version: 2.0.1+cu118
pytorch.cuda: available
gpu.compute_capability: 6.0
gpu.name: Tesla P100-PCIE-16GB
build.info: available
build.cuda_version: 1108
build.python_version: 3.10.13
build.torch_version: 2.1.0+cu118
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.22.post7
source.privacy: open source
Also, I have noticed the following warnings in the xFormers part of the install log for the auxiliary notebook, despite having successfully installed the required cupy-cuda11x:
cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cuml 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
I believe the missing cupy-cuda11x is why CUDA is not available to this xFormers install.
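As a sanity check (a hypothetical snippet, not part of my original notebooks), the cupy install and the CUDA runtime it sees can be probed from within the submission notebook:
import cupy
print(cupy.__version__)                       # should report 12.x
print(cupy.cuda.runtime.runtimeGetVersion())  # e.g. 11080 for CUDA 11.8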
In short, I need one or both of the following:
1- the package cupy-cuda11x (the current version is 12.2.0 and works with CUDA 11.2-11.8) installed in the environment for GPUs
2- (even better) the package xFormers cu118 installed in the environment for GPUs (however, I am aware that the P100s only use CUDA 11.4, so an upgrade of the NVIDIA driver might be required; see the driver check below)
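For reference, the driver-side CUDA version reported on the P100 notebooks can be confirmed with the following command (a hypothetical check, not part of the request itself):
!nvidia-smi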
To Reproduce
On a GPU P100 notebook with Persistence set to "Files only" and configured as a utility script, run and save the following:
!pip install cupy-cuda11x --target=/kaggle/working/
!pip install torchstain --target=/kaggle/working
!pip install faiss-cpu --target=/kaggle/working/
!pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --target=/kaggle/working/ --index-url https://download.pytorch.org/whl/cu118
!pip3 install xformers --target=/kaggle/working/ --index-url https://download.pytorch.org/whl/cu118
Then, in another GPU P100 notebook, attach the utility script built above and run the command '!python -m xformers.info'.
Expected behavior
The notebook should be able to contain and execute any correctly written piece of code that calls the 'memory_efficient_attention' component of xFormers, at least for the base CUTLASS methods (the other methods rely on the Triton and Flash-Attention packages being installed; for the purpose of this request, I do not need them).
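For illustration, here is a minimal sketch of the kind of call that should succeed once the CUTLASS kernels are available; the shapes and the op tuple mirror the failing call in the traceback above (the snippet assumes only that a CUDA-enabled torch and xFormers are importable):
import torch
from xformers.ops import fmha, memory_efficient_attention

device = torch.device("cuda")
# BMHK layout (batch, sequence, heads, head_dim), float32, as in the traceback
q = torch.randn(1, 7040, 8, 96, device=device)
k = torch.randn(1, 7040, 8, 96, device=device)
v = torch.randn(1, 7040, 8, 96, device=device)

# Explicitly request the base CUTLASS forward/backward ops, as my model code does
out = memory_efficient_attention(q, k, v, op=(fmha.cutlass.FwOp, fmha.cutlass.BwOp))
print(out.shape)  # expected: torch.Size([1, 7040, 8, 96])
In the current environment this snippet would be expected to raise the same ValueError shown above instead of returning the attention output.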