Document Loader properties
Important: Agents is currently in beta. For more information, see the Nintex Beta terms document.
Document loader properties define the settings used to import and process documents. Use these settings to specify supported file types, content extraction options, and document handling behavior before the document is prepared for use.
Edit Document Loader properties
- Go to Agents > Document Stores.
- Click the document store containing the document loader you want to edit.
- Click Options.
- Select Preview & Process.
  The document loader properties page is displayed.
- Update the required settings, then click Process.
Document loader types and properties
| Field or selection | Description |
|---|---|
| Method | HTTP method used to request data from the API endpoint. |
| URL | Endpoint URL that returns the content to load. |
| Headers | HTTP headers included in the API request. |
| SSL certificate | SSL certificate file used to authenticate the API connection. |
| Body | Request body sent with the API call. Available when POST is selected. |
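
As a rough illustration of how the Method, URL, Headers, and Body fields in the table above relate to one another, the following Python sketch builds (but does not send) an HTTP request from those four values. The function name `build_loader_request`, the example URL, and the assumption that the method is either GET or POST are hypothetical and are not taken from the product.

```python
import json
import urllib.request


def build_loader_request(method, url, headers=None, body=None):
    """Build an HTTP request object from loader-style settings.

    Hypothetical helper for illustration only; it does not reflect
    how the product issues requests internally.
    """
    method = method.upper()
    if method not in ("GET", "POST"):
        raise ValueError("this sketch assumes GET or POST")

    # The body is only attached when POST is selected, mirroring the
    # "Available when POST is selected" note in the table.
    data = None
    if method == "POST" and body is not None:
        data = json.dumps(body).encode("utf-8")

    return urllib.request.Request(
        url, data=data, headers=dict(headers or {}), method=method
    )


# Example values are placeholders, not real endpoints.
request = build_loader_request(
    "POST",
    "https://api.example.com/articles",
    headers={"Accept": "application/json"},
    body={"query": "onboarding"},
)
```

The request object is only constructed here; sending it (for example with `urllib.request.urlopen`) and supplying an SSL certificate are left out of the sketch.
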
| Field or selection | Description |
|---|---|
| Connect credential | Credentials used to authenticate the connection to Confluence. |
| Base URL | Base URL of the Confluence site to connect to. |
| Space key | Key of the Confluence space from which content is retrieved. |
| Limit | Maximum number of pages to retrieve from the specified space. |
| Field or selection | Description |
|---|---|
| Single column extraction | Column name used to extract content from a single column in the CSV file. |
| Field or selection | Description |
|---|---|
| Docx File Loader Name | Name assigned to the Docx File document loader. |
| Docx File | Docx file uploaded to load content into the document store. |
| Field or selection | Description |
|---|---|
| Usage | Defines how the document is divided during processing. |
| Field or selection | Description |
|---|---|
| Connect credential | Credentials used to authenticate the connection to GitHub. |
| Repo link | URL of the GitHub repository to load content from. |
| Branch | Repository branch from which content is retrieved. |
| Recursive | Retrieves files from subdirectories within the repository when enabled. |
| Max concurrency | Maximum number of concurrent requests used when retrieving repository content. |
| GitHub base URL | Base URL of the GitHub instance. Used for self-hosted GitHub environments. |
| GitHub instance API | API endpoint URL for the GitHub instance. |
| Ignore paths | File or folder paths excluded from retrieval. |
| Field or selection | Description |
|---|---|
| Connect credential | Credentials used to authenticate the connection to Google Drive. |
| Select files | Files selected from Google Drive to load into the document store. |
| Folder ID | ID of the Google Drive folder from which files are retrieved. |
| File types | File formats included when retrieving files from Google Drive. |
| Include subfolders | Retrieves files from subfolders within the specified folder when enabled. |
| Include shared drives | Retrieves files from shared drives when enabled. |
| Max files | Maximum number of files to retrieve. |
| Field or selection | Description |
|---|---|
| Connect credential | Credentials used to authenticate the connection to Google Sheets. |
| Select spreadsheet | Spreadsheet selected to retrieve data from. |
| Sheet names | Names of the worksheet tabs to retrieve data from. |
| Range | Cell range to retrieve, for example A1:E10. |
| Include headers | Includes the first row as column headers when enabled. |
| Value render option | Format used to return cell values from the spreadsheet. |
| Field or selection | Description |
|---|---|
| Jira Loader Name | Name assigned to the Jira document loader. |
| Connect Credential | Credentials used to authenticate the connection to Jira. |
| Host | Base URL of the Jira instance to connect to. |
| Project Key | Key of the Jira project from which issues are retrieved. |
| Limit per request | Maximum number of issues retrieved per API request. |
| Created after | Retrieves issues created after the specified date. |
| Field or selection | Description |
|---|---|
| Notion Database Loader Name | Name assigned to the Notion database loader. |
| Connect Credential | Credentials used to authenticate the connection to Notion. |
| Notion Database Id | Identifier of the Notion database from which content is retrieved. |
| Additional Metadata | Additional metadata fields included with the retrieved Notion database content. |
| Field or selection | Description |
|---|---|
| Microsoft Excel Loader Name | Name assigned to the Microsoft Excel loader. |
| Excel File | Excel file uploaded to load content into the document store. |
| Field or selection | Description |
|---|---|
| Microsoft PowerPoint Loader Name | Name assigned to the Microsoft PowerPoint loader. |
| PowerPoint File | PowerPoint file uploaded to load content into the document store. |
| Field or selection | Description |
|---|---|
| Notion Folder Loader Name | Name assigned to the Notion folder loader. |
| Notion Folder | Path of the Notion folder from which content is retrieved. |
| Additional Metadata | Additional metadata fields included with the retrieved Notion folder content. |
| Field or selection | Description |
|---|---|
| Notion Page Loader Name | Name assigned to the Notion page loader. |
| Connect Credential | Credentials used to authenticate the connection to Notion. |
| Notion Page Id | Identifier of the Notion page from which content is retrieved. |
| Field or selection | Description |
|---|---|
| PDF File Loader Name | Name assigned to the PDF File document loader. |
| PDF File | PDF file uploaded to load content into the document store. |
| Usage | Defines how the PDF content is split into documents. |
| Use Legacy Build | Uses the legacy processing method for loading the PDF file. |
| Field or selection | Description |
|---|---|
| S3 Loader Name | Name assigned to the S3 loader. |
| AWS Credential | Credentials used to authenticate the connection to Amazon S3. |
| Bucket | Name of the S3 bucket from which files are retrieved. |
| Object Key | Key of the file or object in the S3 bucket to load. |
| Region | AWS region where the S3 bucket is located. |
| File Processing Method | Method used to process the retrieved file before it is prepared for use. |
| Field or selection | Description |
|---|---|
| S3 Directory Loader Name | Name assigned to the S3 directory loader. |
| Credential | Credentials used to authenticate the connection to Amazon S3. |
| Bucket | Name of the S3 bucket from which files are retrieved. |
| Region | AWS region where the S3 bucket is located. |
| Server URL | Endpoint URL of the S3 service. Used for custom or self-hosted S3-compatible storage. |
| Prefix | Folder path or prefix used to retrieve files from a specific location in the bucket. |
| PDF Usage | Defines how PDF files are processed when retrieved from the bucket. |
| Field or selection | Description |
|---|---|
| Text File Loader Name | Name assigned to the text file loader. |
| Txt File | Text file uploaded to load content into the document store. |
The following fields or selections are displayed when you view the document loader properties page.
| Section | Field or selection | Description |
|---|---|---|
| (unlabeled) | File loader name | Name assigned to the file loader. |
| (unlabeled) | Additional Metadata | Additional information added to the document during processing. |
| (unlabeled) | Omit Metadata Keys | Metadata keys excluded when the document is processed. |
| Select Text Splitter | Splitter | Text splitting method used to divide the document into smaller sections before processing. The types of text splitters are listed below. Note: The Recursive Character Text Splitter is the recommended text splitter. It preserves meaningful structure and is suitable for large documents. |
| (unlabeled) | Preview chunks | Displays a preview of the generated text sections based on the selected splitter settings. |
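
To give a sense of why a recursive character splitter tends to preserve structure, here is a simplified Python sketch of the general technique: try the coarsest separator first (paragraph breaks), and fall back to finer separators only for pieces that are still too large. This is an illustrative approximation, not the product's implementation; the function name `recursive_split`, the separator order, and the absence of chunk overlap are assumptions.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Simplified sketch: no chunk overlap, and separators between
    merged pieces are kept while those at chunk boundaries are dropped.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []

    # No separators left (or the empty separator): cut at fixed widths.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph is a bit longer.\n\nThird."
paragraph_chunks = recursive_split(sample, chunk_size=36)
```

With a chunk size of 36, the sample splits on paragraph breaks only, so each paragraph stays intact. A string with no separators at all, such as `"abcdefghij"` with a chunk size of 4, falls through to fixed-width cuts.
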
Best practices for handling line breaks in document stores
When processing documents in a document store, line breaks can affect how text is stored, split, and prepared for embeddings. Configure line break handling carefully to keep content readable while preparing text suitable for AI models.
Clean text during upload when documents contain inconsistent formatting
Cleaning text during upload creates a consistent stored version of the document. This step removes unnecessary spacing and formatting issues so that the stored content remains clean and easier to use across platform features.
Use upload cleanup when documents originate from multiple sources, such as:
- PDFs with inconsistent line breaks.
- OCR-generated text with broken lines.
- Emails with irregular spacing.
- Content copied from websites.
A clean stored version improves readability and supports downstream processes such as previewing documents, building search indexes, and synchronizing or exporting content.
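
A minimal Python sketch of what upload-time cleanup might look like is shown below. It normalizes line endings, collapses runs of spaces and tabs, and reduces repeated blank lines, while leaving single line breaks in place so the stored document stays readable. The function name `clean_for_storage` and the specific rules are assumptions for illustration, not the product's behavior.

```python
import re


def clean_for_storage(text):
    """Produce a consistent stored version of raw document text.

    Hypothetical cleanup rules; line breaks are kept, only
    irregular spacing is removed.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    text = "\n".join(lines)

    # Reduce three or more consecutive line breaks to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "Hello   world  \r\n\r\n\r\n\r\nNext\tline \r\n"
stored = clean_for_storage(raw)
```
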
Prepare text for the AI model during embedding
Embedding preparation focuses on optimizing text for AI processing rather than storage. Removing unnecessary line breaks during this stage helps the model read the content consistently and improves chunking and retrieval.
Embedding cleanup may include:
- Removing single line breaks while preserving paragraph structure.
- Trimming unnecessary spacing.
- Normalizing text formatting before generating embeddings.
Apply this step to ensure the text format aligns with model requirements.
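
The embedding cleanup steps listed above can be sketched in Python as follows. The sketch treats a blank line as a paragraph boundary, joins single line breaks within each paragraph, and collapses extra whitespace. The function name `prepare_for_embedding` and the blank-line paragraph rule are assumptions; actual model requirements may call for different normalization.

```python
import re


def prepare_for_embedding(text):
    """Flatten line breaks inside paragraphs before embedding.

    Hypothetical helper: paragraphs are separated by one or more
    blank lines and are re-joined with a single blank line.
    """
    # Normalize line endings so the paragraph regex sees only "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    cleaned = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Replace single line breaks and repeated spacing with one space.
        flat = re.sub(r"\s+", " ", paragraph).strip()
        if flat:
            cleaned.append(flat)

    # Keep paragraph structure with one blank line between paragraphs.
    return "\n\n".join(cleaned)


wrapped = "Line one\nline two.  \n\n\n  Next   paragraph.\n"
embedding_text = prepare_for_embedding(wrapped)
```

Because this step discards single line breaks, it is best applied to a copy of the text used for embeddings rather than to the stored document.
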
Clean text during both upload and embedding when documents are inconsistent
In some scenarios, cleaning text during both upload and embedding produces better results. This approach creates a clean stored document while still allowing model-specific formatting adjustments.
Use cleanup at both stages when:
- Documents originate from multiple or inconsistent sources.
- The document store must maintain clean content for previews or indexing.
- Embedding rules may change over time.
- Upload and embedding processes occur at different times.
Separating these steps keeps the stored content consistent while allowing embedding behavior to evolve.
Preserve line breaks when formatting is meaningful
Some documents rely on line breaks for structure. Removing them may change the meaning or reduce clarity.
Preserve line breaks for documents such as:
- Source code or technical snippets.
- Legal or policy documents.
- Structured content where spacing conveys meaning.
Review document formatting before enabling line break cleanup to ensure important structure is preserved.