Sources

Import existing content – help articles, PDFs, internal docs – so your agent can reference it without rewriting everything as individual topics. Connected Knowledge aggregates sources and re-syncs automatically.

The Connected tab is found under Knowledge > Sources in Agent Studio. Raven is the recommended model – it paraphrases unstructured content more naturally than other models.

Use Connected Knowledge when you want to expose large volumes of external content quickly without curating individual topics. Use FAQs instead when you need actions, flows, or precise control over what the agent says and does. Both use RAG (retrieval-augmented generation) to match user queries.

Supported sources

Websites
Documents (PDF, CSV, JSON, and other file types)
Help desk systems (Zendesk, Gladly)

Sources sync automatically and can be reused across projects.

How Sources differs from FAQs

Both tabs expose information to your agent. Key differences:

Capability	Connected tab	FAQs tab
Trigger actions, functions, flows, SMS	No	Yes
Precise control over agent responses	No	Yes
Auto-sync from external sources	Yes	No
Best for frequently updated FAQ content	Yes	—
Best for stable, structured info	—	Yes
Fine-grained behavior control	No	Yes
Setup complexity	Low – no prompting skill required	Higher – requires more expertise and maintenance

Connected = fast import of external content. FAQs = precise control with actions and flows. If the two tabs contain conflicting information, the FAQs tab always takes priority.

Add a new source

Go to Knowledge > Sources tab
Select New source
Choose one of:
- Upload files
- Add URL
- Zendesk
- Gladly
- Additional integrations are in development – contact your PolyAI representative for the latest availability
Complete the required details and click Add

Your agent will begin syncing the content. Once ready, the source appears in the list.

Supported source types

Source Type	Details
Upload files – Text & structured data	`.txt`, `.csv`, `.json`, `.xml`, `.md`, `.html`, `.rtf`
Upload files – PDF	`.pdf`
Upload files – Microsoft Office	`.docx`, `.doc`, `.docm`, `.xlsx`, `.xls`, `.xlsm`, `.pptx`, `.ppt`, `.pptm`, `.msg`
Upload files – OpenDocument	`.odt`, `.ods`, `.odp`
Upload files – Email files	`.eml`
Upload files – E-books	`.epub`
URL scraping	Public documentation pages and help center articles
Zendesk (beta)	Help Center content with API sync
Gladly (beta)	Knowledge source sync
Additional integrations	In development – contact your PolyAI representative for the latest availability
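
If you upload files in bulk, a quick local check against the table above can catch unsupported files before they reach Agent Studio. The following is a minimal sketch, not part of the product: the function name `upload_category` and the category labels are hypothetical, and case-insensitive matching of extensions is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the "Upload files" rows of the table above.
# The category keys are illustrative labels, not product identifiers.
SUPPORTED_EXTENSIONS = {
    "text_and_structured": {".txt", ".csv", ".json", ".xml", ".md", ".html", ".rtf"},
    "pdf": {".pdf"},
    "microsoft_office": {
        ".docx", ".doc", ".docm", ".xlsx", ".xls", ".xlsm",
        ".pptx", ".ppt", ".pptm", ".msg",
    },
    "opendocument": {".odt", ".ods", ".odp"},
    "email": {".eml"},
    "ebook": {".epub"},
}


def upload_category(filename: str) -> Optional[str]:
    """Return the table category for a file name, or None if its extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

For example, `upload_category("handbook.docx")` falls under the Microsoft Office row, while a file with an unlisted extension returns `None` and would need converting before upload.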

What exactly gets scraped when I upload a URL?

URL scraping traverses linked pages from the provided URL, with the following limits:

Depth → Only one level below the initial URL.
Breadth → A maximum of 10 embedded pages.

If your page contains more than 10 links, not all will be scraped. In that case, upload additional URLs individually or use integrations like Zendesk/Gladly for complete coverage. Where possible, connect applications such as Zendesk rather than relying on website scraping.
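
To estimate which pages fall outside these limits, the rule can be sketched in a few lines. This is an illustration under assumptions, not the scraper's actual logic: `pages_in_scope` and `links_by_page` are hypothetical names, the start page is assumed not to count toward the 10-page limit, and links are assumed to be taken in page order (the selection rule is not stated here).

```python
MAX_LINKED_PAGES = 10  # breadth limit: at most 10 linked pages


def pages_in_scope(start_url, links_by_page):
    """Return (scraped, skipped) URLs for a start URL under the limits above.

    links_by_page is a hypothetical mapping of URL -> list of links on that page.
    Only the start page's own links are considered, which models the
    one-level depth limit: links found on child pages are never followed.
    """
    seen = {start_url}
    unique_links = []
    for link in links_by_page.get(start_url, []):
        if link not in seen:  # ignore repeated links and self-links
            seen.add(link)
            unique_links.append(link)

    # Assumption: the first 10 links in page order are the ones kept.
    scraped = [start_url] + unique_links[:MAX_LINKED_PAGES]
    skipped = unique_links[MAX_LINKED_PAGES:]
    return scraped, skipped
```

Any URL that ends up in `skipped` is a candidate for adding as its own source, or for covering through an integration instead.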

Keeping content fresh

After external content changes, refresh the source in one of two ways:

Click Update to re-scrape files or URLs.
Use the Sync icon on an individual source.

If a URL requires a login, or its credentials change, syncing may fail. Update the access details and retry.

Group and manage sources

Group sources by product line, team, region, or document type. Sort by newest, oldest, type, or name. Each source offers:

Sync
Rename
Move to group
Remove

Why isn’t my agent using the sources I connected?

Several factors affect retrieval:

Data structure

Sources splits content into 2000-character chunks with a 500-character overlap. In very large documents, or where related sections sit far apart, related details can land in different chunks, so retrieval may be less relevant. What to do:

Restructure documents into smaller, tighter pieces.
Repeat key headings or terms.
Or curate the material as a managed topic for guaranteed usage.
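
To see why document structure matters, it can help to picture the chunking step. The sketch below is a simplified model that uses the two figures given above (2000-character chunks, 500-character overlap); `chunk_text` is a hypothetical name, and the real splitter may also respect sentence or section boundaries, which this version does not.

```python
CHUNK_SIZE = 2000  # characters per chunk, as stated above
OVERLAP = 500      # characters shared by consecutive chunks, as stated above


def chunk_text(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    step = size - overlap  # each new chunk starts 1500 characters after the previous one
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

Under this model, a 5,000-character document yields three chunks, and two details that sit more than 2,000 characters apart can never appear in the same chunk. Keeping related information close together, or repeating key terms, raises the chance that one retrieved chunk holds everything the agent needs.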

Update state

Two updates must be current:

Source Update → keeps the data in each source fresh
Agent Update → applies knowledge connection changes to the agent

Both can be triggered manually. Agent updates also run automatically every few minutes.

Environments, variants, saved changes

Each source must be enabled in the correct environment and variant. Any edits must be saved before leaving the page.

Conflicting information?

If the FAQs and Sources tabs contain conflicting data, the FAQs tab wins. Content from the FAQs tab is always prioritized.

Viewing Connected Knowledge in Conversation Review

When your agent retrieves content from Connected Knowledge during a conversation, you can see exactly which sources were used in Conversation Review.

Open a conversation in Analytics > Conversations > Voice.
In the Diagnosis dropdown, toggle Sources on.
Each turn where Connected Knowledge was retrieved shows a Sources tag beneath the agent’s response, alongside any matched FAQs.
Click a source name to open an inline preview panel showing the exact text chunks the agent used.
Use Open in Knowledge in the panel to navigate directly to the source in the Knowledge area.

This is useful for:

Verifying the agent retrieved the correct content for a given question
Debugging cases where the agent’s response seems inaccurate or incomplete
Confirming that newly added or updated sources are being picked up

Combine the Sources and Topic citations diagnosis layers to see both Connected Knowledge and FAQs side by side for each turn.

Behavior and configuration notes

Use PolyAI’s Raven LLM for best results – it paraphrases structured and unstructured content more naturally.
Sources results are given ranking priority to ensure they surface alongside FAQs.
Sources and FAQs data are merged at runtime.
- Any system-prompt style guidance applies to both.

Related pages

FAQs

Create curated topics alongside connected sources. FAQs always take priority.

RAG overview

Understand how retrieval-augmented generation works across your knowledge.

Conversation diagnosis

Verify which knowledge sources were retrieved on each turn.
