How to load a model in half precision (e.g. float16) with limited GPU memory #123

## Description of the bug
I cannot load the model in half precision. I also haven't figured out how to move the model to the CPU or the GPU. How is this done?
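For context on why half precision matters here: the weights alone take roughly `num_params × bytes_per_param`, so switching from float32 (4 bytes) to float16 (2 bytes) halves the footprint. The sketch below assumes about 6 billion parameters for gpt-j-6B (the exact count is slightly higher) and ignores activations and the KV cache, so real usage will be somewhat larger. `weight_memory_gib` is a hypothetical helper name.

```python
# Rough weight-only memory estimate for a ~6B-parameter model.
# Assumption: ~6e9 parameters; activations and the KV cache are not counted.
NUM_PARAMS = 6_000_000_000

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name in ("float32", "float16"):
        print(f"{name}: {weight_memory_gib(NUM_PARAMS, name):.1f} GiB")
```

With these assumptions this prints about 22.4 GiB for float32 and 11.2 GiB for float16.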

## To Reproduce
- Run the gpt-j-6B model as in the demo.
- Use the local Hugging Face method.


## Expected behavior
The model returns a response.


## Error Logs/Screenshots
requests.exceptions.HTTPError: {'message': '"LayerNormKernelImpl" not implemented for \'Half\''}
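A likely cause of this error is that the weights were loaded as float16 while the forward pass still ran on the CPU, where some PyTorch versions have no float16 kernel for LayerNorm. The usual workaround is to use float16 only together with a GPU, and to stay in float32 on the CPU. The "local huggingface method" used above is not shown, so this minimal sketch calls Hugging Face `transformers` directly rather than going through that tool. It assumes `torch` and `transformers` are installed and a CUDA GPU with enough memory is available for the half-precision path. `pick_dtype_and_device` and `load_gptj` are hypothetical helper names, not part of any library.

```python
from typing import Tuple


def pick_dtype_and_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (torch dtype name, device) for loading the model.

    float16 is chosen only when a GPU is available, because running a
    float16 model on the CPU can raise
    "LayerNormKernelImpl" not implemented for 'Half' on some PyTorch versions.
    """
    if cuda_available:
        return "float16", "cuda"
    return "float32", "cpu"


def load_gptj(model_id: str = "EleutherAI/gpt-j-6B"):
    """Load model and tokenizer with a dtype/device pair that avoids the error.

    Hypothetical helper; torch and transformers are imported lazily so that
    pick_dtype_and_device can be used without them installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    dtype_name, device = pick_dtype_and_device(torch.cuda.is_available())
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=getattr(torch, dtype_name),  # torch.float16 or torch.float32
    )
    model.to(device)  # move the weights to the GPU (or keep them on the CPU)
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer, device
```

Usage would then be roughly `model, tokenizer, device = load_gptj()`. The tokenized inputs (for example `tokenizer("Hello", return_tensors="pt")`) must also be moved to the same `device` with `.to(device)` before calling `model.generate`, otherwise PyTorch reports a device mismatch.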

## Environment (please complete the following information)
 - OS: [e.g. Ubuntu 20.04]

Thanks in advance.
