Knowledge Base
The Knowledge Base is where you upload the documents Sam reads from when answering visitor questions. Sam indexes each file into a private vector store dedicated to your tenant — your knowledge is never shared with other Sam customers.
Supported file types
| Type | Extension | Notes |
|---|---|---|
| PDF | .pdf | Searchable text only; scanned image-only PDFs won’t index well |
| Word | .docx | Modern Word format |
| Plain text | .txt | UTF-8 encoded |
| Markdown | .md | Headings preserved |
| CSV | .csv | Each row indexed as a knowledge chunk |
Maximum file size: 20 MB per file.
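A pre-upload check against the table and size limit above might look like the sketch below. The function name `check_upload` is hypothetical, and reading "20 MB" as 20 × 1024 × 1024 bytes is an assumption; Sam's own validation may differ.

```python
from pathlib import PurePath

# Extensions taken from the supported file types table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 20 MB")
    return problems
```

Running a check like this locally catches the two most common rejections before any upload time is spent.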
Per-plan file limits
| Plan | Max files |
|---|---|
| Trial | 5 |
| Starter | 5 |
| Growth | 30 |
| Pro | 100 |
If you hit the limit, delete or replace existing files before uploading new ones.
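If you track your file count in a script, the limits in the table can be expressed as a simple lookup. This is an illustrative sketch only; `remaining_uploads` is a hypothetical helper, not part of Sam.

```python
# Max files per plan, copied from the table above.
MAX_FILES = {"Trial": 5, "Starter": 5, "Growth": 30, "Pro": 100}


def remaining_uploads(plan: str, current_file_count: int) -> int:
    """Return how many more files fit under the plan limit (never negative)."""
    return max(0, MAX_FILES[plan] - current_file_count)
```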
Upload flow
- Click Upload file.
- Choose a file from your computer.
- (Optional) Add tags, comma-separated. For example: pricing, onboarding, returns.
- Click Upload.
The file goes through these states:
- Pending — uploaded to our storage, waiting to be processed
- Uploaded — indexed and ready for Sam to read from
- Failed — something went wrong (click to see the error and retry)
- Deleting — file is being removed
Indexing usually takes seconds for small files, up to a minute for large PDFs.
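The four states can be modelled as a small state machine. The transitions below are inferred from the state descriptions (for example, that a retry moves a failed file back to Pending) and are assumptions, not a documented contract.

```python
from enum import Enum


class FileState(Enum):
    PENDING = "Pending"
    UPLOADED = "Uploaded"
    FAILED = "Failed"
    DELETING = "Deleting"


# Assumed transitions, inferred from the state descriptions above.
ALLOWED = {
    FileState.PENDING: {FileState.UPLOADED, FileState.FAILED},
    FileState.FAILED: {FileState.PENDING, FileState.DELETING},  # retry or delete
    FileState.UPLOADED: {FileState.DELETING},
    FileState.DELETING: set(),
}


def can_transition(current: FileState, target: FileState) -> bool:
    """Return True if the assumed state machine allows current -> target."""
    return target in ALLOWED[current]
```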
Tags
Tags let you organise knowledge per topic and — on Growth and Pro — let you scope which widget reads which files.
For example, if you run two widgets (one for sales, one for support), you can tag your pricing PDFs with pricing and configure your sales widget to only read files tagged pricing. See Widget personality for the widget side.
Tag limits per plan:
| Plan | Tags |
|---|---|
| Trial | 5 |
| Starter | 0 (no tag scoping — widget reads all files) |
| Growth | 6 |
| Pro | 12 |
You can apply up to 8 tags per file.
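The tagging rules can be sketched as follows: parse the comma-separated tag field, enforce the 8-tags-per-file limit, and select the files a scoped widget would read. `parse_tags`, `files_for_widget`, and the dictionary shape are hypothetical. Trimming whitespace, lowercasing, and dropping duplicates are assumptions about normalisation, as is the rule that a file matches when it carries any of the widget's tags.

```python
MAX_TAGS_PER_FILE = 8  # from the per-file limit stated above


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag field into a clean, de-duplicated list."""
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()  # assumption: tags are case-insensitive
        if tag and tag not in tags:
            tags.append(tag)
    if len(tags) > MAX_TAGS_PER_FILE:
        raise ValueError(f"at most {MAX_TAGS_PER_FILE} tags per file")
    return tags


def files_for_widget(files: dict[str, list[str]], widget_tags: list[str]) -> list[str]:
    """Return the file names a widget reads; an unscoped widget reads every file."""
    if not widget_tags:
        return sorted(files)
    wanted = set(widget_tags)
    # Assumption: a file matches if it shares at least one tag with the widget.
    return sorted(name for name, tags in files.items() if wanted & set(tags))
```

With files tagged this way, a sales widget scoped to `pricing` would see only the pricing documents, while a widget with no tag scope (as on Starter) would read everything.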
Retag and delete
Each file has actions in the row menu:
- Retag — change which tags are applied (re-syncs to the vector store)
- Retry — re-attempt a failed upload
- Delete — remove from storage and from Sam’s knowledge
Deleting a file removes it from Sam immediately — visitor questions won’t surface deleted content.
Website crawling
Instead of uploading files one by one, you can point Sam at a website URL and let it import the content for you. Open Knowledge Base → Crawl website.
Availability: Website crawling is included on Trial, Growth, and Pro. It is not available on Starter — upgrade to Growth to unlock it.
Discovery modes
- Single page — fetches just the URL you provide.
- Sitemap.xml — discovers pages from the site’s sitemap.
- Whole domain — breadth-first crawl from the URL, staying on the same domain.
Each fetched page is run through a content extractor (Readability + html-to-markdown) and a lightweight LLM cleanup pass that strips navigation, footers, image placeholders, and form noise while preserving exact text — prices, names, and numbers are not rewritten.
Per-plan page caps
The number of pages Sam will fetch per crawl source is capped by your plan:
| Plan | Max pages per source |
|---|---|
| Trial | 5 |
| Starter | Not available |
| Growth | 100 |
| Pro | 250 |
A hard ceiling of 2,000 pages applies regardless of plan.
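To illustrate how Whole domain discovery interacts with the page cap, here is a minimal breadth-first sketch. It is not Sam's crawler: `crawl_whole_domain` and the `get_links` callback are hypothetical, and treating "same domain" as an identical host is an assumption. Link extraction is passed in as a function so the example runs without network access.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse

HARD_CEILING = 2000
# Caps from the table above; Starter is omitted because crawling is not available.
MAX_PAGES_PER_SOURCE = {"Trial": 5, "Growth": 100, "Pro": 250}


def crawl_whole_domain(
    start_url: str,
    plan: str,
    get_links: Callable[[str], Iterable[str]],
) -> list[str]:
    """Breadth-first walk from start_url, same host only, stopping at the page cap."""
    cap = min(MAX_PAGES_PER_SOURCE[plan], HARD_CEILING)
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = []
    while queue and len(fetched) < cap:
        url = queue.popleft()
        fetched.append(url)
        for link in get_links(url):
            # Assumption: "same domain" means an identical host (netloc).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched
```

Because the walk is breadth-first, pages closest to the starting URL are fetched first, so a low cap tends to keep top-level pages rather than deep ones.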
Review before publishing
Crawled pages land as drafts on the review page (/knowledge/crawl/{source}/review). For each page you can:
- Approve — accept the cleaned content as-is
- Edit — fine-tune the title or content before publishing
- Exclude — drop the page from this crawl
Once you click Publish approved, the approved pages are uploaded to your tenant’s vector store via the same pipeline as a manual upload. They appear in the Knowledge Base file list as normal knowledge files.
Scheduled re-crawl
Scheduled re-crawl is a Pro-only feature — Growth users can run a Website crawl but must trigger re-runs manually.
On Pro, each crawl source can be set to re-crawl on a schedule:
- Off — no automatic re-crawl
- Daily / Weekly / Monthly
A re-crawl flags a page as "Changes — re-review" only when its content hash differs from what’s currently published, so unchanged pages don’t churn the vector store. Changed pages reappear on the review page, ready to approve or edit before re-publishing.
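The hash comparison can be sketched like this. SHA-256 and the function names are assumptions chosen for illustration; the source does not say which hash Sam uses, nor how pages that are new to a re-crawl are handled, so the sketch only compares pages that are already published.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash the cleaned page text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_needing_review(published: dict[str, str], recrawled: dict[str, str]) -> list[str]:
    """Return URLs whose re-crawled content hash differs from the published hash.

    published maps URL -> hash of the currently published content.
    recrawled maps URL -> freshly cleaned page text.
    """
    return sorted(
        url
        for url, text in recrawled.items()
        if url in published and published[url] != content_hash(text)
    )
```

Comparing hashes rather than full text keeps the check cheap and means an unchanged page is never re-published.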
Tips for better answers
- Prefer Markdown or plain text for short FAQs — they index cleanly with no formatting noise.
- Split very long documents into focused files of a few pages each. Sam retrieves relevant chunks; smaller files mean cleaner retrieval.
- Use a consistent tag vocabulary so widgets pull the right knowledge.
- Re-upload (don’t edit) if you want to update a file. Editing the underlying file in S3 won’t reindex it.
- For long-tail content like product catalogues or help-centres, website crawling is usually faster than uploading PDFs one by one.