Knowledge Base

The Knowledge Base is where you upload the documents Sam reads from when answering visitor questions. Sam indexes each file into a private vector store dedicated to your tenant — your knowledge is never shared with other Sam customers.

Supported file types

Type | Extension | Notes
PDF | .pdf | Searchable text only; scanned image-only PDFs won’t index well
Word | .docx | Modern Word format
Plain text | .txt | UTF-8 encoded
Markdown | .md | Headings preserved
CSV | .csv | Each row indexed as a knowledge chunk

Maximum file size: 20 MB per file.
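
If you prepare files with a script before uploading, a quick local check can catch unsupported types and oversized files early. The following is a minimal sketch, not part of Sam itself: the function name check_upload is hypothetical, and it assumes "20 MB" means 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions and size limit taken from the list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB is counted as 20 MiB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 20 MB")
    return problems
```

This check only looks at the extension and size. It cannot tell whether a PDF contains searchable text or only scanned images.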

Per-plan file limits

Plan | Max files
Trial | 5
Starter | 5
Growth | 30
Pro | 100

If you hit the limit, delete or replace existing files before uploading new ones.
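
As a rough illustration of the limits above, the sketch below works out how many more files a tenant can upload. The names PLAN_FILE_LIMITS and remaining_uploads are hypothetical and exist only for this example.

```python
# Per-plan file limits copied from the list above.
PLAN_FILE_LIMITS = {"Trial": 5, "Starter": 5, "Growth": 30, "Pro": 100}


def remaining_uploads(plan: str, current_files: int) -> int:
    """Number of files that can still be uploaded before hitting the plan limit."""
    return max(PLAN_FILE_LIMITS[plan] - current_files, 0)
```

A result of 0 means an existing file has to be deleted or replaced before a new one can be uploaded.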

Upload flow

  1. Click Upload file.
  2. Choose a file from your computer.
  3. (Optional) Add tags, comma-separated. For example: pricing, onboarding, returns.
  4. Click Upload.

The file goes through these states:

  • Pending — uploaded to our storage, waiting to be processed
  • Uploaded — indexed and ready for Sam to read from
  • Failed — something went wrong (click to see the error and retry)
  • Deleting — file is being removed

Indexing usually takes seconds for small files, up to a minute for large PDFs.
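
The four states can be pictured as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the allowed transitions are assumptions inferred from the descriptions (for example, that Retry moves a Failed file back to Pending), not a description of Sam's internals.

```python
from enum import Enum


class FileState(Enum):
    PENDING = "Pending"
    UPLOADED = "Uploaded"
    FAILED = "Failed"
    DELETING = "Deleting"


# Assumed transitions, inferred from the state descriptions above.
ALLOWED_TRANSITIONS = {
    FileState.PENDING: {FileState.UPLOADED, FileState.FAILED, FileState.DELETING},
    FileState.UPLOADED: {FileState.DELETING},
    FileState.FAILED: {FileState.PENDING, FileState.DELETING},  # retry re-queues the file
    FileState.DELETING: set(),  # terminal: the file is on its way out
}


def can_transition(current: FileState, target: FileState) -> bool:
    """True if moving from `current` to `target` is allowed in this sketch."""
    return target in ALLOWED_TRANSITIONS[current]


def is_searchable(state: FileState) -> bool:
    """Only files in the Uploaded state are indexed and readable by Sam."""
    return state is FileState.UPLOADED
```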

Tags

Tags let you organise knowledge per topic and — on Growth and Pro — let you scope which widget reads which files.

For example, if you run two widgets (one for sales, one for support), you can tag your pricing PDFs with pricing and configure your sales widget to only read files tagged pricing. See Widget personality for the widget side.
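
To make the scoping idea concrete, here is a minimal sketch of how tag-based filtering could work. It assumes a file is in scope when it carries at least one of the widget's tags, and that a widget with no tags configured reads every file. Both are assumptions made for illustration, and KnowledgeFile and files_for_widget are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeFile:
    name: str
    tags: set[str] = field(default_factory=set)


def files_for_widget(files: list[KnowledgeFile], widget_tags: set[str]) -> list[KnowledgeFile]:
    """Return the files a widget would read under the assumed matching rule."""
    if not widget_tags:
        return list(files)  # no scoping configured: the widget reads all files
    # Keep files that share at least one tag with the widget (assumed rule).
    return [f for f in files if f.tags & widget_tags]
```

With files tagged pricing, returns, and onboarding, a sales widget scoped to {"pricing"} would see only the pricing files under this rule.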

Tag limits per plan:

Plan | Tags
Trial | 5
Starter | 0 (no tag scoping; widget reads all files)
Growth | 6
Pro | 12

You can apply up to 8 tags per file.
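
If you build tag strings programmatically, the sketch below shows one way to split a comma-separated value and enforce the 8-tags-per-file limit. Lowercasing and de-duplicating tags are assumptions made here to keep the vocabulary consistent; Sam's own normalisation rules are not documented on this page, and parse_tags is a hypothetical name.

```python
MAX_TAGS_PER_FILE = 8  # per-file limit stated above


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a clean, ordered list of tags."""
    tags: list[str] = []
    for part in raw.split(","):
        tag = part.strip().lower()  # assumption: tags are compared case-insensitively
        if tag and tag not in tags:  # skip empty entries and duplicates
            tags.append(tag)
    if len(tags) > MAX_TAGS_PER_FILE:
        raise ValueError(f"at most {MAX_TAGS_PER_FILE} tags per file, got {len(tags)}")
    return tags
```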

Retag and delete

Each file has actions in the row menu:

  • Retag — change which tags are applied (re-syncs to the vector store)
  • Retry — re-attempt a failed upload
  • Delete — remove from storage and from Sam’s knowledge

Deleting a file removes it from Sam immediately — visitor questions won’t surface deleted content.

Website crawling

Instead of uploading files one by one, you can point Sam at a website URL and let it import the content for you. Open Knowledge Base → Crawl website.

Availability: Website crawling is included on Trial, Growth, and Pro. It is not available on Starter — upgrade to Growth to unlock it.

Discovery modes

  • Single page — fetches just the URL you provide.
  • Sitemap.xml — discovers pages from the site’s sitemap.
  • Whole domain — breadth-first crawl from the URL, staying on the same domain.
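
To illustrate what a breadth-first, same-domain crawl means in practice, here is a minimal sketch. It is not Sam's crawler: the get_links callback is a hypothetical stand-in for fetching a page and extracting its links, and the in-memory SITE map replaces real network access so the example runs offline.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_domain(start_url, get_links, max_pages):
    """Visit pages breadth-first from start_url, staying on its domain.

    get_links(url) is a hypothetical callback returning the hrefs found on a page.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()  # FIFO order gives breadth-first traversal
        fetched.append(url)
        for href in get_links(url):
            absolute = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched


# A tiny fake site used in place of real HTTP requests.
SITE = {
    "https://example.com/": ["/pricing", "/docs", "https://other.example.org/"],
    "https://example.com/pricing": ["/", "/docs"],
    "https://example.com/docs": ["/docs/setup"],
    "https://example.com/docs/setup": [],
}

pages = crawl_domain("https://example.com/", lambda u: SITE.get(u, []), max_pages=3)
```

In this run the off-domain link is skipped, and the crawl stops after three pages, so the deeper /docs/setup page is never fetched.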

Each fetched page is run through a content extractor (Readability + html-to-markdown) and a lightweight LLM cleanup pass that strips navigation, footers, image placeholders, and form noise while preserving exact text — prices, names, and numbers are not rewritten.

Per-plan page caps

The number of pages Sam will fetch per crawl source is capped by your plan:

Plan | Max pages per source
Trial | 5
Starter | Not available
Growth | 100
Pro | 250

A hard ceiling of 2,000 pages applies regardless of plan.
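
The way the plan cap and the hard ceiling combine can be sketched as a simple minimum. The names below are hypothetical and the logic is an illustration of the rules stated above, not Sam's actual code.

```python
HARD_CEILING = 2000  # applies regardless of plan

# Per-plan caps copied from the list above; None marks a plan without crawling.
PLAN_PAGE_CAPS = {"Trial": 5, "Starter": None, "Growth": 100, "Pro": 250}


def pages_to_fetch(plan: str, discovered: int) -> int:
    """How many of the discovered pages would be fetched for one crawl source."""
    cap = PLAN_PAGE_CAPS[plan]
    if cap is None:
        raise PermissionError(f"website crawling is not available on {plan}")
    return min(discovered, cap, HARD_CEILING)
```

For example, a Growth sitemap listing 340 pages would be cut off at 100 under these rules.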

Review before publishing

Crawled pages land as drafts on the review page (/knowledge/crawl/{source}/review). For each page you can:

  • Approve — accept the cleaned content as-is
  • Edit — fine-tune the title or content before publishing
  • Exclude — drop the page from this crawl

Once you click Publish approved, the approved pages are uploaded to your tenant’s vector store via the same pipeline as a manual upload. They appear in the Knowledge Base file list as normal knowledge files.

Scheduled re-crawl

Scheduled re-crawl is a Pro-only feature. Growth users can run a website crawl but must trigger re-runs manually.

On Pro, each crawl source can be set to re-crawl on a schedule:

  • Off — no automatic re-crawl
  • Daily / Weekly / Monthly

A re-crawl flags a page as "Changes — re-review" only when its content hash differs from what’s currently published, so unchanged pages don’t churn the vector store. Flagged pages return to the review page, ready to approve or edit before re-publishing.
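
Hash-based change detection can be sketched as follows. The hash algorithm Sam uses is not documented here, so SHA-256 is an assumption, as are the function names. This sketch also treats pages that were never published as needing review.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash page content; SHA-256 is an assumption, not Sam's documented choice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def pages_needing_review(published: dict[str, str], recrawled: dict[str, str]) -> list[str]:
    """Return URLs whose freshly crawled content differs from what is published.

    published maps URL -> hash of the published content.
    recrawled maps URL -> content fetched by the latest crawl.
    """
    changed = []
    for url, text in recrawled.items():
        # A missing entry (new page) also counts as a change in this sketch.
        if published.get(url) != content_hash(text):
            changed.append(url)
    return changed
```

Because only the hashes are compared, a page whose cleaned content is byte-for-byte identical is left alone and is not re-uploaded to the vector store.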

Tips for better answers

  • Prefer Markdown or plain text for short FAQs — they index cleanly with no formatting noise.
  • Split very long documents into focused files of a few pages each. Sam retrieves relevant chunks; smaller files mean cleaner retrieval.
  • Use a consistent tag vocabulary so widgets pull the right knowledge.
  • To update a file, re-upload it rather than editing it in place. Editing the underlying file in S3 won’t reindex it.
  • For long-tail content like product catalogues or help centres, website crawling is usually faster than uploading PDFs one by one.