Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 100 additions & 0 deletions src/content/docs/integrations/testingbot.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
---
title: TestingBot
description: How to use TestingBot to scrape websites with Val Town
---

Some websites are partially (or entirely) rendered on the client (aka your web
browser). If you try to search the initial HTML for elements that haven't
finished rendering, you won't find them.

One solution is to use a headless browser that runs a web browser in the
background that fetches the page, renders it, and _then_ allows you to search
the final document.

[TestingBot](https://testingbot.com/)
provides an API to interact with a remote headless browser. You can use a Function to [scrape a website](https://testingbot.com/support/functions/scrape) and fetch its contents.

## Sign up to TestingBot and retrieve your credentials

Copy the API key and SECRET from the
[TestingBot dashboard](https://testingbot.com/members/user/security)
and save it as Val Town environment variables `testingbot_key` and `testingbot_secret`.

## Make an API call to the [/scrape API](https://testingbot.com/support/functions/scrape)

Check the documentation for the
[/scrape API](https://testingbot.com/support/functions/scrape) and prepare your request.

For example, here's how you scrape the introduction paragraph of OpenAI's
Wikipedia page.

```ts title="Scrape API example" val
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link a val here that readers can remix under a TestingBot account? Want to also make sure this runs!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import { fetchJSON } from "https://esm.town/v/stevekrouse/fetchJSON?v=41";

export default async function ScrapeWebsite(url: String, selector: String) {
const res = await fetchJSON(
`https://cloud.testingbot.com/scrape?key=${Deno.env.get("testingbot_key")}&secret=${
Deno.env.get("testingbot_secret")
}&browserName=chrome&version=latest&platform=WIN10`,
{
method: "POST",
body: JSON.stringify({
url: url,
elements: [
{
// The second <p> element on the page
selector: selector,
},
],
}),
},
);
// For this request, TestingBot will return one data item
const data = res[0];
// That contains a single element
const elements = data.results;
// Which we want to turn into its innerText value
const intro = elements[0].text;
return intro;
}
console.log("Scrape result:", await ScrapeWebsite("https://en.wikipedia.org/wiki/OpenAI", "p:nth-of-type(2)"));
```

There are other functions available, such as [taking screenshots](https://testingbot.com/support/functions/screenshot), [generating PDFs](https://testingbot.com/support/functions/pdf) and more.

## Use Puppeteer to instrument a remote browser

You can use the [Puppeteer](https://pptr.dev/) library to connect to a remote browser running on TestingBot.

Once you've navigated to a URL, you can run arbitrary JavaScript with
`page.evaluate` - like getting the text from a paragraph.

```ts title="Puppeteer example" val
import { PuppeteerDeno } from "https://deno.land/x/[email protected]/src/deno/Puppeteer.ts";

const puppeteer = new PuppeteerDeno({
productName: "chrome",
});
const capabilities = {
'tb:options': {
key: Deno.env.get("testingbot_key"),
secret: Deno.env.get("testingbot_secret")
},
browserName: 'chrome',
browserVersion: 'latest'
};
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://cloud.testingbot.com/puppeteer?capabilities=${encodeURIComponent(JSON.stringify(capabilities))}`,
});
const page = await browser.newPage();
await page.goto("https://en.wikipedia.org/wiki/OpenAI");
const intro = await page.evaluate(
`document.querySelector('p:nth-of-type(2)').innerText`
);
await browser.close();
console.log(intro);
```

```txt title="Logs"
"OpenAI is an American artificial intelligence (AI) research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership. OpenAI conducts AI research with the declared intention of promoting and developing friendly AI."
```