Architecture

File Structure

src/
└── index.js    # Single-file core module (~850 lines)

All logic lives in src/index.js. No external runtime dependencies besides the optional playwright-core.

Data Flow

run({ url, proxy, cookieFile })
│
├── checkResources()              → verify CPU/RAM (skip if lowResource)
├── checkWhisper()                → verify whisper-cli available
├── detectPlatform(url)           → 'youtube' | 'bilibili' | 'generic'
│
├── [Phase 1] mode !== 'browser'
│   ├── getVideoWithYtDlp()      → { title, duration, videoUrl, httpHeaders }
│   ├── extractFrames(url)       → try FFmpeg with direct URL
│   ├── downloadWithYtDlp()      → fallback: download to local file
│   ├── extractFrames(file)      → FFmpeg from local file
│   ├── extractAudio()           → 16kHz mono WAV
│   └── transcribeAudio()        → { text, language, segments }
│
├── [Phase 2] mode !== 'ytdlp'
│   ├── launchBrowser()          → local Chromium or cloud provider
│   ├── scrapeYouTube/Bilibili/Generic(page)
│   ├── extractFrames()          → if direct video URL found
│   ├── captureScreenshots()     → fallback: seek + screenshot
│   ├── extractAudio()           → if video source available
│   └── transcribeAudio()        → if audio extracted
│
├── analyzeFramesWithVision()    → send frames + transcript to vision AI
└── formatResult()               → structured text output

Key Functions

Function	Purpose
`run()`	Main entry point, orchestrates the full pipeline
`detectPlatform()`	URL pattern matching for platform detection
`getVideoWithYtDlp()`	Runs `yt-dlp -j` to get metadata + direct video URL
`downloadWithYtDlp()`	Runs `yt-dlp` to download video to a local file
`launchBrowser()`	Lazy-loads playwright-core, connects to local/cloud browser
`scrapeYouTube()`	YouTube-specific DOM scraping
`scrapeBilibili()`	Bilibili-specific DOM scraping
`scrapeGeneric()`	Generic `<video>` element scraping
`extractFrames()`	FFmpeg frame extraction at configured intervals
`captureScreenshots()`	Browser-based screenshot fallback (seek + capture)
`extractAudio()`	FFmpeg audio extraction to 16kHz mono WAV
`transcribeAudio()`	Run whisper-cli and parse JSON output
`formatTranscription()`	Format transcript segments for vision prompt
`checkResources()`	Verify CPU/RAM meet requirements (skip if lowResource)
`checkWhisper()`	Check if whisper-cli binary is available
`resolveWhisperModel()`	Find ggml model file in standard paths
`analyzeFramesWithVision()`	Sends base64 frames + transcript to vision API
`loadCookies()`	Parses JSON or Netscape cookie files

Phase 1 Detail: yt-dlp + FFmpeg

getVideoWithYtDlp(url)
    │
    │ yt-dlp -j --no-download -f 'best[height<=480]...'
    │
    ▼
{ videoUrl, httpHeaders, title, duration }
    │
    ├── [Step 1a] extractFrames(videoUrl, headers)
    │       FFmpeg direct URL → may fail with 403 (IP-bound URLs)
    │
    ├── [Step 1b] downloadWithYtDlp(url, localPath)
    │       yt-dlp downloads full video → extractFrames(localPath)
    │
    ▼
frames[] → analyzeFramesWithVision() → formatted result

Phase 2 Detail: Browser

launchBrowser()
    │
    │ require('playwright-core')  ← lazy loaded, throws if missing
    │
    ├── local:       chromium.launch()
    ├── browserless: chromium.connectOverCDP(wss://...)
    ├── browserbase: REST API → chromium.connectOverCDP(wss://...)
    └── steel:       chromium.connectOverCDP(wss://...)
         │
         ▼
    scrapeYouTube/Bilibili/Generic(page)
         │
         ├── videoUrl found → extractFrames()
         └── no videoUrl    → captureScreenshots() (click play, seek, screenshot)
         │
         ▼
    frames[] → analyzeFramesWithVision() → formatted result

Exports

javascript

module.exports = {
  run,                // Main entry point
  detectPlatform,     // URL → platform string
  loadCookies,        // Load cookie file
  getVideoWithYtDlp,  // yt-dlp metadata extraction
  downloadWithYtDlp,  // yt-dlp video download
  extractAudio,       // FFmpeg audio extraction
  transcribeAudio,    // whisper-cli transcription
  checkResources,     // System resource verification
};

Architecture ​

File Structure ​

Data Flow ​

Key Functions ​

Phase 1 Detail: yt-dlp + FFmpeg ​

Phase 2 Detail: Browser ​

Exports ​

Architecture

File Structure

Data Flow

Key Functions

Phase 1 Detail: yt-dlp + FFmpeg

Phase 2 Detail: Browser

Exports