Knowledge Base

The Knowledge Base is where you upload the documents Sam reads from when answering visitor questions. Sam indexes each file into a private vector store dedicated to your tenant — your knowledge is never shared with other Sam customers.

Supported file types

Type	Extension	Notes
PDF	`.pdf`	Searchable text only — scanned image-only PDFs won’t index well
Word	`.docx`	Modern Word format
Plain text	`.txt`	UTF-8 encoded
Markdown	`.md`	Headings preserved
CSV	`.csv`	Each row indexed as a knowledge chunk
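Because Sam indexes every CSV row as its own chunk, each row should make sense when read in isolation. If you generate CSVs from a script, a preview like the one below can help you check this. It is a minimal sketch, not Sam's indexing code. The function name csv_rows_to_chunks is hypothetical, and rendering each row as "column: value" pairs taken from the header is our own assumption about what a readable chunk looks like.

```python
import csv
import io


def csv_rows_to_chunks(csv_text: str) -> list[str]:
    """Render each data row of a CSV as one self-describing text chunk.

    Assumption (illustrative only): the first row is a header, and each
    chunk is written as "column: value" pairs so it reads well on its own.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row in reader:
        # Skip empty or missing cells so chunks stay compact.
        parts = [f"{key}: {value}" for key, value in row.items() if value]
        chunks.append("; ".join(parts))
    return chunks
```

For example, a two-column price list produces one chunk per row, such as "plan: Starter; price: 19". A row that only makes sense next to its neighbours is a sign that you should add context columns before uploading.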

Maximum file size: 20 MB per file.

Per-plan file limits

Plan	Max files
Trial	5
Starter	5
Growth	30
Pro	100

If you hit the limit, delete or replace existing files before uploading new ones.
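If you prepare files in bulk, you can catch most rejections before you open the Upload dialog. The hypothetical helper below encodes the supported extensions, the 20 MB size limit, and the per-plan file limits from the tables above. It does not call any Sam API. We assume that "20 MB" means 20 × 1024 × 1024 bytes; the exact byte threshold Sam uses may differ.

```python
from pathlib import Path

# Values copied from the tables above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB read as 20 MiB
MAX_FILES = {"trial": 5, "starter": 5, "growth": 30, "pro": 100}


def upload_problems(filename: str, size_bytes: int, plan: str,
                    files_already_uploaded: int) -> list[str]:
    """Return the reasons an upload would likely be rejected (empty = looks OK)."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 20 MB")
    if files_already_uploaded >= MAX_FILES[plan.lower()]:
        problems.append("plan file limit reached; delete or replace a file first")
    return problems
```

Note that an allowed extension does not guarantee good indexing. A scanned, image-only PDF passes this check but still won’t index well.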

Upload flow

Click Upload file.
Choose a file from your computer.
(Optional) Add tags, comma-separated. For example: pricing, onboarding, returns.
Click Upload.
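Tags are entered as a single comma-separated string. To keep your tag vocabulary consistent across many uploads, you can normalize tag lists before you paste them in. The sketch below is illustrative only. The function name parse_tags is hypothetical, and trimming, lowercasing, and de-duplicating are our assumptions about what "consistent" means, not documented Sam behaviour.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a clean, de-duplicated list.

    Assumption: tags are trimmed and treated case-insensitively. Sam's own
    normalization rules may differ.
    """
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag and tag not in seen:  # drop blanks and repeats, keep first order
            seen.add(tag)
            tags.append(tag)
    return tags
```

For example, parse_tags(" Pricing,,pricing , Returns ") yields ["pricing", "returns"].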

The file goes through these states:

Pending — uploaded to our storage, waiting to be processed
Uploaded — indexed and ready for Sam to read from
Failed — something went wrong (click to see the error and retry)
Deleting — file is being removed

Indexing usually takes a few seconds for small files and up to a minute for large PDFs.
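One way to picture the file lifecycle is as a small state machine. The sketch below models the four states listed above. The transitions are inferred from the state descriptions and the row-menu actions (for example, Retry moves a Failed file back to Pending), so treat them as assumptions. Sam's internal rules may allow more or fewer moves.

```python
from enum import Enum


class FileState(Enum):
    PENDING = "Pending"
    UPLOADED = "Uploaded"
    FAILED = "Failed"
    DELETING = "Deleting"


# Assumed transitions, inferred from the state list above (not Sam's code).
ALLOWED = {
    FileState.PENDING: {FileState.UPLOADED, FileState.FAILED},
    FileState.FAILED: {FileState.PENDING, FileState.DELETING},  # Retry re-queues
    FileState.UPLOADED: {FileState.DELETING},
    FileState.DELETING: set(),  # the file disappears once removal finishes
}


def can_move(current: FileState, target: FileState) -> bool:
    """True if the (assumed) lifecycle allows going from current to target."""
    return target in ALLOWED[current]
```

If you build tooling around the file list, only Uploaded means Sam can read the file. Pending and Failed files are not yet part of its knowledge.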

Retag and delete

Each file has actions in the row menu:

Retag — change which tags are applied (re-syncs to the vector store)
Retry — re-attempt a failed upload
Delete — remove from storage and from Sam’s knowledge

Deleting a file removes it from Sam immediately — visitor questions won’t surface deleted content.

Website crawling

Instead of uploading files one by one, you can point Sam at a website URL and let it import the content for you. Open Knowledge Base → Crawl website.

Availability: Website crawling is included on Trial, Growth, and Pro. It is not available on Starter — upgrade to Growth to unlock it.

Discovery modes

Single page — fetches just the URL you provide.
Sitemap.xml — discovers pages from the site’s sitemap.
Whole domain — breadth-first crawl from the URL, staying on the same domain.

Each fetched page is run through a content extractor (Readability + html-to-markdown) and a lightweight LLM cleanup pass. The cleanup strips navigation, footers, image placeholders, and form noise while preserving the exact text, so prices, names, and numbers are not rewritten.
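The Whole domain mode is a breadth-first traversal restricted to one site. The sketch below illustrates that idea only and is not Sam's crawler. The function name bfs_same_domain is hypothetical. An in-memory links_on_page map stands in for fetching and extracting real pages, and max_pages plays the role of your plan's page cap (see Per-plan page caps below). We also assume that "same domain" means an identical host, so subdomains count as different sites.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bfs_same_domain(start_url: str, links_on_page: dict[str, list[str]],
                    max_pages: int) -> list[str]:
    """Return pages in breadth-first order, staying on start_url's host.

    links_on_page maps a page URL to the raw hrefs found on it; it stands in
    for fetching and parsing each page.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:  # stop at the page cap
        url = queue.popleft()
        visited.append(url)
        for href in links_on_page.get(url, []):
            # Resolve relative links and drop #fragments so duplicates collapse.
            link = urljoin(url, href).split("#", 1)[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Breadth-first order means the pages closest to your starting URL are fetched first. When the page cap cuts a crawl short, shallow pages such as pricing and top-level docs are therefore the ones most likely to be kept.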

Per-plan page caps

The number of pages Sam will fetch per crawl source is capped by your plan:

Plan	Max pages per source
Trial	5
Starter	Not available
Growth	100
Pro	250

A hard ceiling of 2,000 pages applies regardless of plan.

Review before publishing

Crawled pages land as drafts on the review page (/knowledge/crawl/{source}/review). For each page you can:

Approve — accept the cleaned content as-is
Edit — fine-tune the title or content before publishing
Exclude — drop the page from this crawl

Once you click Publish approved, the approved pages are uploaded to your tenant’s vector store via the same pipeline as a manual upload. They appear in the Knowledge Base file list as normal knowledge files.
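The review rules above amount to a simple filter: Publish approved uploads only pages you approved, including your edits, and leaves excluded or untouched drafts behind. The model below is a hypothetical illustration. DraftPage, its status values, and pages_to_publish are not part of any Sam API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftPage:
    url: str
    title: str
    content: str
    status: str = "draft"  # hypothetical values: "draft", "approved", "excluded"

    def approve(self, title: Optional[str] = None,
                content: Optional[str] = None) -> None:
        """Approve as-is, or apply title/content edits first."""
        if title is not None:
            self.title = title
        if content is not None:
            self.content = content
        self.status = "approved"

    def exclude(self) -> None:
        self.status = "excluded"


def pages_to_publish(drafts: list[DraftPage]) -> list[DraftPage]:
    """Keep only approved pages; drafts and exclusions are not published."""
    return [page for page in drafts if page.status == "approved"]
```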

Scheduled re-crawl

Scheduled re-crawl is a Pro-only feature. On Growth you can still run a website crawl, but you must trigger re-crawls manually.

On Pro, each crawl source can be set to re-crawl on a schedule:

Off — no automatic re-crawl
Daily / Weekly / Monthly

A re-crawl flags a page as changed and queues it for re-review only when its content hash differs from the currently published version, so unchanged pages don’t churn the vector store. Changed pages reappear on the review page, ready to approve or edit before you re-publish.
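The change check described above compares a hash of the newly crawled content with the hash of the published version. The sketch below shows the idea. The use of SHA-256 and the whitespace normalization before hashing are our assumptions, and the function names are hypothetical. Sam may hash different input or use a different algorithm.

```python
import hashlib


def content_hash(text: str) -> str:
    # Assumption: whitespace is collapsed before hashing, so a page that was
    # only reflowed does not count as changed.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_rereview(recrawled_text: str, published_hash: str) -> bool:
    """True when the re-crawled page differs from what is currently published."""
    return content_hash(recrawled_text) != published_hash
```

Comparing fixed-length hashes means the published text does not have to be stored next to each crawl result. A single changed character, such as a new price, is enough to flag the page.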

Tips for better answers

Prefer Markdown or plain text for short FAQs — they index cleanly with no formatting noise.
Split very long documents into focused files of a few pages each. Sam retrieves relevant chunks; smaller files mean cleaner retrieval.
Use a consistent tag vocabulary so widgets pull the right knowledge.
Re-upload (don’t edit) if you want to update a file. Editing the underlying file in S3 won’t reindex it.
For long-tail content such as product catalogues or help centres, website crawling is usually faster than uploading PDFs one by one.

Per-plan tag limits

The number of tags you can use to scope widgets depends on your plan:

Plan	Max tags
Trial	5
Starter	0 (no tag scoping; the widget reads all files)
Growth	6
Pro	12
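To reason about which files a widget can draw on, it can help to model tag scoping explicitly. The sketch below is a hypothetical illustration, not Sam's retrieval logic. We assume that a file is in scope when it shares at least one tag with the widget. On Starter there is no tag scoping, so every file is in scope, as the table above states. We also assume that a widget with no tags reads every file on any plan.

```python
def visible_files(files: dict[str, set[str]], widget_tags: set[str],
                  plan: str) -> list[str]:
    """Return the file names a widget may read from, sorted by name.

    files maps each file name to the tags applied to it.
    Assumptions: any shared tag puts a file in scope; Starter and untagged
    widgets read all files.
    """
    if plan.lower() == "starter" or not widget_tags:
        return sorted(files)
    return sorted(name for name, tags in files.items() if tags & widget_tags)
```

Under this model, a file tagged "Pricing" in one place and "pricing" in another would fall out of scope for one of your widgets. This is why a consistent tag vocabulary matters.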