This package contains the Box document loader for LangChain.js. For more information about Box, check out our developer documentation.
In order to integrate with Box, you need a few things:
- A Box instance — if you are not a current Box customer, sign up for a free dev account
- A Box app — more on how to create an app
- Your app approved in your Box instance — This is done by your Box admin. The good news is if you are using a free developer account, you are the admin. Authorize your app
- Node.js >= 20
npm install langchainjs-box
The langchainjs-box package supports several authentication methods. The most basic is a developer token, which you can find in the Box developer console on your app's configuration screen. This token is purposely short-lived (1 hour) and is intended for development. You can add it to your environment as BOX_DEVELOPER_TOKEN, pass it directly to the loader, or use the BoxAuth authentication helper class.
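The first two options can be read as a simple precedence order. The sketch below only illustrates that idea and is not the package's actual resolution logic: `resolveDeveloperToken` is a hypothetical helper, and the rule that a directly passed token wins over the environment is an assumption.

```typescript
// Hypothetical helper, not part of langchainjs-box.
type Env = Record<string, string | undefined>;

function resolveDeveloperToken(explicit: string | undefined, env: Env): string {
  // Assumption: a token passed directly takes precedence over the environment.
  const candidates = [explicit, env["BOX_DEVELOPER_TOKEN"]];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  throw new Error(
    "No developer token found: pass one directly or set BOX_DEVELOPER_TOKEN"
  );
}
```

In a Node.js application you would typically pass `process.env` as the `env` argument. Because developer tokens expire after an hour, failing early with a clear message tends to be easier to debug than a later authentication error.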
BoxAuth supports the following authentication methods:
- Token — either a developer token or any token generated through the Box SDK
- JWT with a service account
- JWT with a specified user
- CCG with a service account
- CCG with a specified user
Note: If using JWT authentication, you will need to download the configuration file from the Box developer console after generating your public/private key pair. Place this file somewhere in your application's directory structure; you will pass its path to the BoxAuth helper class.
For more information, learn how to set up a Box application, and check out the Box authentication guide, which covers the different authentication options.
Developer Token
import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
  authType: BoxAuthType.TOKEN,
  boxDeveloperToken: 'DEVELOPER_TOKEN'
});

const loader = new BoxLoader({
  boxAuth: auth,
  boxFileIds: ['FILE_ID_1', 'FILE_ID_2']
});

const docs = await loader.load();
JWT with a service account
// Ensure that the service account has been added as a collaborator on the content
import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
  authType: BoxAuthType.JWT,
  boxJwtPath: './path/to/jwt-config.json'
});

const loader = new BoxLoader({
  boxAuth: auth,
  boxFolderId: 'FOLDER_ID'
});

const docs = await loader.load();
JWT with a specified user
import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
  authType: BoxAuthType.JWT,
  boxJwtPath: './path/to/jwt-config.json',
  boxUserId: 'USER_ID'
});

const loader = new BoxLoader({
  boxAuth: auth,
  boxFolderId: 'FOLDER_ID'
});

const docs = await loader.load();
CCG with a service account
// Ensure that the service account has been added as a collaborator on the content
import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
  authType: BoxAuthType.CCG,
  boxClientId: 'CLIENT_ID',
  boxClientSecret: 'CLIENT_SECRET',
  boxEnterpriseId: 'ENTERPRISE_ID'
});

const loader = new BoxLoader({
  boxAuth: auth,
  boxFolderId: 'FOLDER_ID'
});

const docs = await loader.load();
CCG with a specified user
import { BoxLoader, BoxAuth, BoxAuthType } from 'langchainjs-box';

const auth = new BoxAuth({
  authType: BoxAuthType.CCG,
  boxClientId: 'CLIENT_ID',
  boxClientSecret: 'CLIENT_SECRET',
  boxUserId: 'USER_ID'
});

const loader = new BoxLoader({
  boxAuth: auth,
  boxFolderId: 'FOLDER_ID'
});

const docs = await loader.load();
To obtain CCG user access tokens (Box-managed users), ensure the following in the Box Developer Console for your app, then re-authorize the app in the Admin Console:
- App Access Level: set to "App + Enterprise Access".
- Client Credentials Grant: enable "Generate user access tokens".
- Choose "All managed users" or "Select users" and include the specific user.
- Scopes: enable read scopes required for file/folder access.
The boxUserId must be the numeric Box user ID, and the user must be a managed user in the same enterprise.
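A common mistake is passing a login email where the numeric ID is expected. As a minimal pre-flight sketch, the format requirement can be checked before any request is made. `isNumericBoxUserId` is a hypothetical helper, not part of the package, and it cannot verify that the user is a managed user in your enterprise.

```typescript
// Hypothetical pre-flight check, not part of langchainjs-box.
function isNumericBoxUserId(value: string): boolean {
  // Accept only a non-empty string of decimal digits.
  return /^[0-9]+$/.test(value);
}
```

A caller might run this on the configured user ID at startup and fail fast with a descriptive message if it returns false.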
The BoxLoader class helps you get your unstructured content from Box in LangChain's Document format. You can do this with either an array of Box file IDs or a Box folder ID.
When loading from a folder ID, you can also set the recursive option to true to have the loader include the contents of all sub-folders in that folder.
Info: A Box instance can contain petabytes of files, and folders can contain millions of files. Be intentional about which folders you index. We recommend never getting all files from folder 0 recursively; folder ID 0 is your root folder.
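One way to enforce this recommendation in your own application is a small guard that runs before the loader is constructed. This is a sketch under assumptions: `assertSafeFolderRequest` and the `FolderRequest` type are hypothetical and only mirror the `boxFolderId` and `recursive` option names used in this README; the loader itself is not known to perform this check.

```typescript
// Hypothetical guard, not part of langchainjs-box.
interface FolderRequest {
  boxFolderId: string;
  recursive?: boolean;
}

function assertSafeFolderRequest(request: FolderRequest): void {
  // Folder ID "0" is the root folder; walking it recursively could touch
  // every file the authenticated user can see.
  if (request.boxFolderId === "0" && request.recursive === true) {
    throw new Error("Refusing to load the root folder (ID 0) recursively");
  }
}
```

Calling the guard with the same options object you intend to pass to the loader keeps the check and the actual request in sync.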
import { BoxLoader } from 'langchainjs-box';

// Using environment variable BOX_DEVELOPER_TOKEN
process.env.BOX_DEVELOPER_TOKEN = 'your_developer_token_here';

const loader = new BoxLoader({
  boxFileIds: ['FILE_ID_1', 'FILE_ID_2'],
  characterLimit: 10000 // Optional. Defaults to no limit
});

const docs = await loader.load();
import { BoxLoader } from 'langchainjs-box';

// Using environment variable BOX_DEVELOPER_TOKEN
process.env.BOX_DEVELOPER_TOKEN = 'your_developer_token_here';

const loader = new BoxLoader({
  boxFolderId: 'FOLDER_ID',
  recursive: false, // Optional. Set to true to load the entire folder tree. Defaults to false
  characterLimit: 10000 // Optional. Defaults to no limit
});

const docs = await loader.load();
import { BoxLoader } from 'langchainjs-box';

const loader = new BoxLoader({
  boxFolderId: 'FOLDER_ID'
});

// Load documents one by one
for await (const doc of loader.lazyLoad()) {
  console.log(doc.metadata.file_name);
  // Process each document
}
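Because the result of lazyLoad() is consumed with for await, a caller can stop iterating early instead of loading a whole folder. The sketch below shows a generic way to take only the first few items from any async iterable. `takeFirst` is a hypothetical helper, and whether the loader stops requesting files from Box once iteration ends is an assumption about its implementation.

```typescript
// Hypothetical helper, not part of langchainjs-box.
async function takeFirst<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
  const taken: T[] = [];
  if (limit <= 0) {
    return taken;
  }
  for await (const item of source) {
    taken.push(item);
    if (taken.length >= limit) {
      break; // Leaving the loop ends iteration of the source early.
    }
  }
  return taken;
}
```

Under these assumptions, `await takeFirst(loader.lazyLoad(), 10)` would collect at most ten documents, which can be handy for a quick smoke test against a large folder.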
You can set the following environment variables:
- BOX_DEVELOPER_TOKEN - Developer token from Box console
- BOX_JWT_PATH - Path to JWT configuration file
- BOX_USER_ID - User ID for user-specific authentication
- BOX_CLIENT_ID - Client ID for CCG authentication
- BOX_CLIENT_SECRET - Client secret for CCG authentication
- BOX_ENTERPRISE_ID - Enterprise ID for enterprise CCG authentication
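To make the relationship between these variables and the authentication methods concrete, here is a sketch that maps an environment object onto plain option objects shaped like the BoxAuth examples in this README. It is an illustration only: `authOptionsFromEnv` is hypothetical, the string values "token", "jwt", and "ccg" are stand-ins for the BoxAuthType members, and the precedence (token, then JWT, then CCG) is an assumption rather than documented package behavior.

```typescript
// Hypothetical mapping, not part of langchainjs-box.
type Env = Record<string, string | undefined>;

type AuthOptions =
  | { authType: "token"; boxDeveloperToken: string }
  | { authType: "jwt"; boxJwtPath: string; boxUserId?: string }
  | {
      authType: "ccg";
      boxClientId: string;
      boxClientSecret: string;
      boxEnterpriseId?: string;
      boxUserId?: string;
    };

function authOptionsFromEnv(env: Env): AuthOptions {
  const token = env["BOX_DEVELOPER_TOKEN"];
  const jwtPath = env["BOX_JWT_PATH"];
  const userId = env["BOX_USER_ID"];
  const clientId = env["BOX_CLIENT_ID"];
  const clientSecret = env["BOX_CLIENT_SECRET"];
  const enterpriseId = env["BOX_ENTERPRISE_ID"];

  // Assumed precedence: developer token, then JWT, then CCG.
  if (token) {
    return { authType: "token", boxDeveloperToken: token };
  }
  if (jwtPath) {
    return userId
      ? { authType: "jwt", boxJwtPath: jwtPath, boxUserId: userId }
      : { authType: "jwt", boxJwtPath: jwtPath };
  }
  if (clientId && clientSecret) {
    // CCG needs either a specific user or the enterprise (service account).
    if (userId) {
      return { authType: "ccg", boxClientId: clientId, boxClientSecret: clientSecret, boxUserId: userId };
    }
    if (enterpriseId) {
      return { authType: "ccg", boxClientId: clientId, boxClientSecret: clientSecret, boxEnterpriseId: enterpriseId };
    }
  }
  throw new Error("No complete set of Box authentication variables found");
}
```

Taking the environment as a parameter rather than reading `process.env` inside the function keeps the mapping easy to unit test.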
The loader includes error handling for common scenarios:
- Files that cannot be read (binary files, permission issues)
- Network connectivity issues
- Authentication failures
- Invalid file or folder IDs
When errors occur, the loader will log warnings but continue processing other files.
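The warn-and-continue behavior described above can be pictured with the following pattern. This is a generic sketch, not the loader's actual code: `loadTolerantly`, the `fetchText` callback, the `LoadedDoc` shape, and the `file_id` metadata key are all hypothetical names chosen for illustration.

```typescript
// Hypothetical illustration of warn-and-continue, not the loader's source.
interface LoadedDoc {
  pageContent: string;
  metadata: { file_id: string };
}

async function loadTolerantly(
  fileIds: string[],
  fetchText: (fileId: string) => Promise<string>,
  warn: (message: string) => void = console.warn
): Promise<LoadedDoc[]> {
  const docs: LoadedDoc[] = [];
  for (const fileId of fileIds) {
    try {
      const pageContent = await fetchText(fileId);
      docs.push({ pageContent, metadata: { file_id: fileId } });
    } catch (err) {
      // Log a warning for this file and move on to the next one.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`Skipping file ${fileId}: ${reason}`);
    }
  }
  return docs;
}
```

One consequence of this pattern is that a run can succeed while returning fewer documents than requested, so comparing the number of returned documents against the number of requested IDs is a reasonable sanity check.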
For the following file types, the loader requests Box's markdown representation:
- Microsoft Office: .docx, .pptx, .xls, .xlsx, .xlsm
- Google Workspace: .gdoc, .gslide, .gslides, .gsheet
- PDF: .pdf
For other supported text-like files, the loader falls back to the extracted text representation.
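The selection rule above amounts to a lookup on the file extension. The sketch below restates it as code. `pickRepresentation` is a hypothetical helper, the returned strings are illustrative labels, and the case-insensitive comparison is an assumption; the loader's real selection logic may differ.

```typescript
// Hypothetical restatement of the rule above, not the loader's source.
const MARKDOWN_EXTENSIONS = new Set<string>([
  ".docx", ".pptx", ".xls", ".xlsx", ".xlsm", // Microsoft Office
  ".gdoc", ".gslide", ".gslides", ".gsheet",  // Google Workspace
  ".pdf",                                     // PDF
]);

type Representation = "markdown" | "extracted_text";

function pickRepresentation(fileName: string): Representation {
  const dot = fileName.lastIndexOf(".");
  // Assumption: extensions are compared case-insensitively.
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return MARKDOWN_EXTENSIONS.has(extension) ? "markdown" : "extracted_text";
}
```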
MIT