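Stagehand Tutorial Guide

Introduction

What is Stagehand?

Stagehand is a browser automation framework developed by Browserbase. It combines the precision of conventional code-driven control with the flexibility of AI natural-language processing, so developers can build production-grade web automation tools with less effort.

Why Stagehand?

Most existing browser automation tools either require you to write low-level code in frameworks such as Selenium, Playwright, or Puppeteer, or rely on high-level AI agents such as Browser Use, which may not be stable enough for production. Stagehand lets you choose when to write code and when to use natural language, which makes it a good fit for browser automation in production.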

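Key Features

- act(): perform a single action using natural language
- observe(): explore page elements and plan a workflow
- extract(): extract structured data from a page
- agent(): run complex tasks with an agent
- A stable foundation built on Playwright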

Installation

Using create-browser-app (recommended)

The fastest way to get started is the official project generator:

npx create-browser-app

Next:

cd my-stagehand-app # Enter the project directory
cp .env.example .env  # Add your API keys

Install ts-node (if it is not already installed), along with the Stagehand packages and the Playwright browsers:

npm install -D ts-node
npm install @browserbasehq/stagehand playwright zod
npx playwright install

Environment Variable Configuration

Create a .env file and set the required API keys:

# LLM Provider API Keys (choose at least one)
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# Browserbase (optional, for cloud browsers)
BROWSERBASE_API_KEY=your_browserbase_api_key
BROWSERBASE_PROJECT_ID=your_browserbase_project_id

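Notes:

- You need an API key for at least one LLM provider (OpenAI or Anthropic).
- Browserbase credentials are optional and are only used to run the browser in the cloud. This guide does not use them and runs the browser locally instead.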
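A missing key usually surfaces only when the first LLM call fails, so it can help to check the configuration at startup. The following is a minimal sketch, not part of Stagehand: the helper names `configuredLlmKeys` and `assertLlmKeyConfigured` are hypothetical, and the variable names are the two from the .env example above.

```typescript
// Hypothetical helpers, not part of Stagehand. They inspect a plain
// key/value map, so they can be exercised without the real environment.
type Env = Record<string, string | undefined>;

// Variable names taken from the .env example above.
const LLM_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"] as const;

// Returns the names of the LLM keys that are set to a non-blank value.
function configuredLlmKeys(env: Env): string[] {
  return LLM_KEYS.filter((name) => (env[name] ?? "").trim() !== "");
}

// Throws early with a readable message when no LLM key is configured.
function assertLlmKeyConfigured(env: Env): void {
  if (configuredLlmKeys(env).length === 0) {
    throw new Error(`Set at least one of: ${LLM_KEYS.join(", ")}`);
  }
}
```

In a real project you would probably call `assertLlmKeyConfigured(process.env)` before constructing Stagehand, and extend `LLM_KEYS` if you use another provider.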

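Quick Start and Testing

Basic Example

In index.ts, the main change is to set env: "LOCAL" and to add the model name and the API key variable that match the key you entered in .env. After that, run npm start to test: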

npm start

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

async function main() {
  const stagehand = new Stagehand({
    // env: "BROWSERBASE", // comment this line out and use the settings below
    env: "LOCAL",
    modelName: "google/gemini-2.5-flash",
    modelClientOptions: {
      // GOOGLE_API_KEY must be defined in .env for this model
      apiKey: process.env.GOOGLE_API_KEY,
    },
  });
  try {
    // Launch the browser
    await stagehand.init();
    const page = stagehand.page;
    // Navigate to the site
    console.log("Visiting GitHub...");
    await page.goto("https://github.com/browserbase");
    // Use act() to perform an action
    console.log("Clicking the stagehand repository...");
    await page.act("click on the stagehand repo");
    // Use extract() to pull structured data
    console.log("Extracting repository information...");
    const repoInfo = await page.extract({
      instruction: "extract the repository name and star count",
      schema: z.object({
        name: z.string().describe("The repository name"),
        stars: z.string().describe("The number of stars")
      })
    });
    console.log("Repository information:", repoInfo);
  } finally {
    // Close the browser
    await stagehand.close();
  }
}

main().catch(console.error);
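Because the schema above declares stars as a string, the extracted value arrives as displayed text rather than a number. The sketch below shows one way to normalise it; `parseStarCount` is a hypothetical helper, and the accepted formats (thousands separators and a k or m suffix) are an assumption about how the count might be rendered, not something Stagehand guarantees.

```typescript
// Hypothetical helper: converts a star count as displayed on a page
// (for example "1,234" or "12.3k") into a number. Returns null when the
// text does not look like a count.
function parseStarCount(text: string): number | null {
  const cleaned = text.trim().toLowerCase().replace(/,/g, "");
  const match = cleaned.match(/^(\d+(?:\.\d+)?)([km]?)$/);
  if (!match) return null;
  const multiplier = match[2] === "k" ? 1000 : match[2] === "m" ? 1000000 : 1;
  // Round to avoid floating-point noise such as 12.3 * 1000.
  return Math.round(parseFloat(match[1]) * multiplier);
}
```

With the example above, this would be called as `parseStarCount(repoInfo.stars)`; a null result signals that the model returned text in an unexpected format.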

Testing observe()

observe() lets you explore a page and plan your next actions:

import { Stagehand } from "@browserbasehq/stagehand";

async function testObserve() {
  const stagehand = new Stagehand({
    env: "LOCAL",
    verbose: 1
  });
  try {
    await stagehand.init();
    const page = stagehand.page;
    // Navigate to the login page
    await page.goto("https://example.com/login");
    // Discover the actions available on the page
    console.log("Exploring login page elements...");
    const actions = await page.observe("Find the login form fields and buttons");
    console.log("Actions found:");
    actions.forEach((action, index) => {
      console.log(`${index + 1}. ${action.description}`);
      console.log(`   Method: ${action.method}`);
      console.log(`   Selector: ${action.selector}`);
      console.log(`   Arguments: ${JSON.stringify(action.arguments)}`);
    });
    // Optionally perform one of the actions (no extra LLM call needed)
    if (actions.length > 0) {
      console.log("\nPerforming the first action...");
      await page.act(actions[0]); // Pass the ObserveResult directly to save tokens
    }
  } finally {
    await stagehand.close();
  }
}

testObserve().catch(console.error);

Best Practices for observe()

#### 1. Combine with act(): plan ahead

When filling in a form, you can first use observe() to find all the fields and then act on them in a batch:

// One LLM call discovers all the fields
const fields = await page.observe("Find all the fields in the form");
// Several act() calls, but no additional LLM calls
for (const field of fields) {
  await page.act(field);
}

#### 2. Combine with extract(): target precisely

Using observe() to focus on a specific region of the page can reduce token usage by as much as 10x:

// First locate the data table
const [table] = await page.observe("Find the data table");
// Then extract only within that selector
const { data } = await page.extract({
  instruction: "Extract data from the table",
  schema: z.object({
    data: z.array(z.string())
  }),
  selector: table.selector // Narrows the context sent to the model
});

#### 3. Validate actions

Validate before performing a critical action:

const prompt = "click the submit button";
const expectedMethod = "click";

try {
  await page.act(prompt);
} catch (error) {
  // error is typed as unknown, so narrow it before reading message
  if (error instanceof Error && error.message.includes("method not supported")) {
    // Use observe() to validate the action
    const [action] = await page.observe(prompt);

    if (action && action.method === expectedMethod) {
      await page.act(action);
    } else {
      throw new Error(`Unsupported method: expected "${expectedMethod}", got "${action?.method}"`);
    }
  } else {
    throw error;
  }
}
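The comparison at the heart of this pattern can be pulled into a small pure function, which makes it easy to unit-test without a browser. This is a sketch under assumptions: `ObservedAction` and `checkAction` are hypothetical names, and the interface only mirrors the fields this guide uses from an observe() result rather than importing Stagehand's own type.

```typescript
// Shape assumed from the observe() results used in this guide; the
// interface name is hypothetical and not imported from Stagehand.
interface ObservedAction {
  selector: string;
  description: string;
  method?: string;
  arguments?: string[];
}

type Check =
  | { ok: true; action: ObservedAction }
  | { ok: false; reason: string };

// Decides whether an observed action is safe to pass on to act().
function checkAction(action: ObservedAction | undefined, expectedMethod: string): Check {
  if (!action) {
    return { ok: false, reason: "no matching element was observed" };
  }
  if (action.method !== expectedMethod) {
    return { ok: false, reason: `expected "${expectedMethod}", got "${action.method}"` };
  }
  return { ok: true, action };
}
```

In the flow above, the first result of `page.observe(prompt)` would be passed to `checkAction`, and `page.act()` would be called only when the returned `ok` is true.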

Debugging Tips

Enable verbose logging:

const stagehand = new Stagehand({
  verbose: 2, // 0-2; higher numbers log more detail
  debugDom: true
});

Visual mode (show the browser window):

const stagehand = new Stagehand({
  env: "LOCAL",
  headless: false // Show the browser window
});

Screenshot debugging:

await page.screenshot({ path: "debug-screenshot.png" });

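observe() Use Cases

When should you use observe()?

| Scenario | Description |
|------|------|
| Exploration | You are not sure which elements are on the page and need to discover the available actions |
| Planning | When building a complex workflow, plan all the required actions in advance |
| Caching | Remember actions for later use and avoid LLM calls |
| Validation | Make sure an element exists before performing a critical action |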

observe() Return Structure

{
  "selector": "xpath=/html/body/header/div/button[1]",
  "description": "Log in button in the top right corner",
  "method": "click",
  "arguments": []
}
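When observe() results are written to disk or loaded from a cache, the compiler can no longer vouch for their shape. The sketch below mirrors the JSON above as a TypeScript interface with a runtime check. The names `ObserveResultLike` and `isObserveResultLike` are hypothetical, and treating method and arguments as optional is an assumption; consult the types exported by the Stagehand package for the authoritative definition.

```typescript
// Mirrors the JSON above. The interface name is hypothetical, and
// method/arguments are treated as optional here as an assumption.
interface ObserveResultLike {
  selector: string;
  description: string;
  method?: string;
  arguments?: string[];
}

// Runtime check for data loaded from disk or a cache.
function isObserveResultLike(value: unknown): value is ObserveResultLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.selector === "string" &&
    typeof v.description === "string" &&
    (v.method === undefined || typeof v.method === "string") &&
    (v.arguments === undefined ||
      (Array.isArray(v.arguments) && v.arguments.every((a) => typeof a === "string")))
  );
}
```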

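Troubleshooting

Element not found

- Make sure the page has fully loaded: await page.waitForLoadState('networkidle')
- Use a more specific instruction
- For iframes, set iframes: true

Inaccurate element descriptions

- Give a more precise instruction, for example: "Find the primary CTA in the hero section"
- Raise the verbose level to see more information

Wrong method identified

- Use observe() to check the suggested action first
- Inspect the returned method and selector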

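Performance Optimization Tips

Pass the ObserveResult directly: await page.act(results[0]) saves LLM tokens.

Batch operations: use observe() once to find the elements, then perform several actions.

Cache stable results: for pages you know well, cache and reuse observe() results.

Limit scope with selectors: pass the selector returned by observe() to extract().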
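The caching tip can be sketched as a small in-memory store keyed by page URL and instruction. Everything here is hypothetical (`ObserveCache`, `CachedAction`, the time-to-live policy) and independent of any caching Stagehand may provide itself; the clock is injectable only so that expiry can be tested.

```typescript
// Hypothetical shape of a cached observe() result, mirroring the
// fields used in this guide.
interface CachedAction {
  selector: string;
  description: string;
  method?: string;
  arguments?: string[];
}

// Hypothetical in-memory cache. Entries are keyed by page URL plus
// instruction and expire after ttlMs milliseconds.
class ObserveCache {
  private entries = new Map<string, { actions: CachedAction[]; storedAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  private key(url: string, instruction: string): string {
    return `${url}\n${instruction}`;
  }

  set(url: string, instruction: string, actions: CachedAction[]): void {
    this.entries.set(this.key(url, instruction), { actions, storedAt: this.now() });
  }

  // Returns the cached actions, or undefined when missing or expired.
  get(url: string, instruction: string): CachedAction[] | undefined {
    const k = this.key(url, instruction);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(k);
      return undefined;
    }
    return entry.actions;
  }
}
```

On a cache miss you would call `page.observe()` and store the result; on a hit you would pass the cached entries straight to `page.act()`. Cached selectors go stale when the page structure changes, so a short time-to-live, or falling back to a fresh observe() when act() fails, is a sensible safeguard.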
