from svc_infra.loaders import GitHubLoader

Base class: BaseLoader

Load files from a GitHub repository. Fetches files matching a pattern from a GitHub repo using the GitHub API. Supports public repos and private repos (with a token).
repo: Repository in "owner/repo" format (e.g., "nfraxlab/svc-infra").
path: Path within the repo to load from (e.g., "docs", "examples/src"). An empty string means the repo root.
branch: Branch name (default: "main").
pattern: Glob pattern for files to include (default: "*.md"). Use "*" to match all files.
token: GitHub token for private repos or higher rate limits. Falls back to the GITHUB_TOKEN environment variable.
recursive: Whether to search subdirectories (default: True).
skip_patterns: List of patterns to skip. Default patterns are: __pycache__, *.pyc, *.pyo, .git, node_modules, *.lock, .env*
extra_metadata: Additional metadata to attach to all loaded content.
on_error: How to handle errors ("skip" or "raise"). Default: "skip".
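The way pattern and skip_patterns combine can be sketched with the standard library's fnmatch module. This is a minimal illustration and not GitHubLoader's actual implementation: the helper name should_include is hypothetical, and testing skip patterns against every path component (rather than only the file name) is an assumption about the matching semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Default skip patterns as listed in the parameter description above.
DEFAULT_SKIP_PATTERNS = (
    "__pycache__", "*.pyc", "*.pyo", ".git", "node_modules", "*.lock", ".env*",
)


def should_include(path, pattern="*.md", skip_patterns=DEFAULT_SKIP_PATTERNS):
    """Hypothetical helper: decide whether a repo-relative path would be loaded.

    Assumption: a path is skipped if ANY of its components matches a skip
    pattern; otherwise the file name alone is matched against `pattern`.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False  # an empty path names no file
    for part in parts:
        if any(fnmatchcase(part, skip) for skip in skip_patterns):
            return False
    return fnmatchcase(parts[-1], pattern)
```

Under these assumptions, should_include("examples/src/test_app.py", "*.py", ["__pycache__", "test_*"]) is rejected by the "test_*" skip pattern even though the file name matches "*.py".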
>>> # Load all markdown from docs/
>>> loader = GitHubLoader("nfraxlab/svc-infra", path="docs")
>>> contents = await loader.load()
>>> for c in contents:
...     print(f"Loaded: {c.source}")
>>>
>>> # Load Python files from examples
>>> loader = GitHubLoader(
...     "nfraxlab/svc-infra",
...     path="examples/src",
...     pattern="*.py",
...     skip_patterns=["__pycache__", "test_*"],
... )
>>> contents = await loader.load()
>>>
>>> # Private repo with token
>>> loader = GitHubLoader(
...     "myorg/private-repo",
...     token="ghp_xxxx",  # or set GITHUB_TOKEN env var
... )
>>> contents = await loader.load()
- GitHub API rate limits: 60 requests/hour unauthenticated, 5000 requests/hour with a token
- Large repos may require multiple API calls (the tree is fetched recursively)
- Binary files are automatically skipped
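For callers that load many repositories in a row, the rate limits above can be respected by inspecting the quota headers GitHub returns on REST API responses (x-ratelimit-remaining and x-ratelimit-reset, the latter a UTC epoch timestamp). The sketch below is not part of GitHubLoader: the function name seconds_until_reset is hypothetical, and it assumes you have access to the response headers as a plain mapping.

```python
import time


def seconds_until_reset(headers, now=None):
    """Hypothetical helper: how long to wait before the next GitHub API call.

    Returns 0.0 while quota remains (or when the headers are absent),
    otherwise the number of seconds until the rate-limit window resets.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", 0))
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))
```

A caller could sleep for the returned duration before retrying, or set a token so the larger authenticated quota applies.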