Document Loader properties

Important: Agents is currently in beta. For more information, see the Nintex Beta terms document.

Document loader properties define the settings used to import and process documents. Use these settings to specify supported file types, content extraction options, and document handling behavior before the document is prepared for use.

Jump to:

Edit Document Loader properties
Document loader types and properties
Common document loader properties fields, settings, and selections
Best practices for handling line breaks in document stores

Edit Document Loader properties

Go to Agents > Document Stores.
Click the document store containing the document loader you want to edit.
Click Options.
Select Preview & Process.

The document loader properties page is displayed.
Update the required settings, then click Process.

Document loader types and properties

API Loader properties

Field or selection	Description
Method	HTTP method used to request data from the API endpoint. GET: Retrieves data from the specified URL. POST: Sends data to the specified URL and retrieves the response.
URL	Endpoint URL that returns the content to load.
Headers	HTTP headers included in the API request.
SSL certificate	SSL certificate file used to authenticate the API connection.
Body	Request body sent with the API call. Available when POST is selected.
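The following Python sketch shows how the API Loader fields (Method, URL, Headers, Body) map onto a single HTTP request. It is an illustration only: build_api_request is a hypothetical helper, the endpoint URL and token are placeholders, and the sketch assumes a JSON request body. It uses the standard library to build the request but does not send it.

```python
import json
import urllib.request


def build_api_request(method, url, headers=None, body=None):
    """Build, but do not send, an HTTP request from API Loader-style settings (hypothetical helper)."""
    method = method.upper()
    if method not in ("GET", "POST"):
        raise ValueError("method must be GET or POST")
    data = None
    if method == "POST" and body is not None:
        # The Body field applies only to POST; this sketch assumes the body is JSON.
        data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(url, data=data, method=method)
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    # urllib stores header names as str.capitalize(), so "Content-Type" becomes "Content-type".
    if data is not None and not request.has_header("Content-type"):
        request.add_header("Content-Type", "application/json")
    return request


request = build_api_request(
    "POST",
    "https://api.example.com/articles/search",  # placeholder endpoint
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    body={"query": "onboarding"},
)
print(request.get_method(), request.full_url)  # POST https://api.example.com/articles/search
```

With GET, the sketch ignores any body, which mirrors the Body field being available only when POST is selected.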

Confluence Loader properties

Field or selection	Description
Connect credential	Credentials used to authenticate the connection to Confluence.
Base URL	Base URL of the Confluence site to connect to.
Space key	Key of the Confluence space from which content is retrieved.
Limit	Maximum number of pages to retrieve from the specified space.

CSV File Loader properties

Field or selection	Description
Single column extraction	Column name used to extract content from a single column in the CSV file.

Docx File Loader properties

Field or selection	Description
Docx File Loader Name	Name assigned to the Docx File document loader.
Docx File	Docx file uploaded to load content into the document store.

EPUB File Loader properties

Field or selection	Description
Usage	Defines how the document is divided during processing. One document per chapter: Processes each chapter as a separate document. (EPUB only) One document per page: Processes each page as a separate document. (PDF only) One document per file: Processes the entire file as a single document.

GitHub Loader properties

Field or selection	Description
Connect credential	Credentials used to authenticate the connection to GitHub.
Repo link	URL of the GitHub repository to load content from.
Branch	Repository branch from which content is retrieved.
Recursive	Retrieves files from subdirectories within the repository when enabled.
Max concurrency	Maximum number of concurrent requests used when retrieving repository content.
GitHub base URL	Base URL of the GitHub instance. Used for self-hosted GitHub environments.
GitHub instance API	API endpoint URL for the GitHub instance.
Ignore paths	File or folder paths excluded from retrieval.

Google Drive Loader properties

Field or selection	Description
Connect credential	Credentials used to authenticate the connection to Google Drive.
Select files	Files selected from Google Drive to load into the document store.
Folder ID	ID of the Google Drive folder from which files are retrieved.
File types	File formats included when retrieving files from Google Drive.
Include subfolders	Retrieves files from subfolders within the specified folder when enabled.
Include shared drives	Retrieves files from shared drives when enabled.
Max files	Maximum number of files to retrieve.

Google Sheets Loader properties

Field or selection	Description
Connect credential	Credentials used to authenticate the connection to Google Sheets.
Select spreadsheet	Spreadsheet selected to retrieve data from.
Sheet names	Names of the worksheet tabs to retrieve data from.
Range	Cell range to retrieve, for example A1:E10.
Include headers	Includes the first row as column headers when enabled.
Value render option	Format used to return cell values from the spreadsheet.

Jira Loader properties

Field or selection	Description
Jira Loader Name	Name assigned to the Jira document loader.
Connect Credential	Credentials used to authenticate the connection to Jira.
Host	Base URL of the Jira instance to connect to.
Project Key	Key of the Jira project from which issues are retrieved.
Limit per request	Maximum number of issues retrieved per API request.
Created after	Retrieves issues created after the specified date.

Notion Database Loader properties

Field or selection	Description
Notion Database Loader Name	Name assigned to the Notion database loader.
Connect Credential	Credentials used to authenticate the connection to Notion.
Notion Database Id	Identifier of the Notion database from which content is retrieved.
Additional Metadata	Additional metadata fields included with the retrieved Notion database content.

Microsoft Excel Loader properties

Field or selection	Description
Microsoft Excel Loader Name	Name assigned to the Microsoft Excel loader.
Excel File	Excel file uploaded to load content into the document store.

Microsoft PowerPoint Loader properties

Field or selection	Description
Microsoft PowerPoint Loader Name	Name assigned to the Microsoft PowerPoint loader.
PowerPoint File	PowerPoint file uploaded to load content into the document store.

Notion Folder Loader properties

Field or selection	Description
Notion Folder Loader Name	Name assigned to the Notion folder loader.
Notion Folder	Path of the Notion folder from which content is retrieved.
Additional Metadata	Additional metadata fields included with the retrieved Notion folder content.

Notion Page Loader properties

Field or selection	Description
Notion Page Loader Name	Name assigned to the Notion page loader.
Connect Credential	Credentials used to authenticate the connection to Notion.
Notion Page Id	Identifier of the Notion page from which content is retrieved.

PDF Loader properties

Field or selection	Description
PDF File Loader Name	Name assigned to the PDF File document loader.
PDF File	PDF file uploaded to load content into the document store.
Usage	Defines how the PDF content is split into documents. One document per page: Processes each page as a separate document. One document per file: Processes the entire file as a single document.
Use Legacy Build	Uses the legacy processing method for loading the PDF file.
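To illustrate the two Usage options, this Python sketch turns a list of already extracted page texts into documents. The Document class, the "per_page" and "per_file" values, the metadata keys, and joining pages with a blank line are all assumptions for illustration, not the product's internal format.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document shape: text plus a metadata dictionary.
    page_content: str
    metadata: dict = field(default_factory=dict)


def documents_from_pages(pages, source, usage):
    """Build documents from extracted page texts according to a Usage setting."""
    if usage == "per_page":
        # One document per page: keep the page number in the metadata.
        return [
            Document(text, {"source": source, "page": number})
            for number, text in enumerate(pages, start=1)
        ]
    if usage == "per_file":
        # One document per file: join all pages into a single document.
        return [Document("\n\n".join(pages), {"source": source})]
    raise ValueError(f"unknown usage: {usage}")


pages = ["Page one text.", "Page two text.", "Page three text."]
print(len(documents_from_pages(pages, "handbook.pdf", "per_page")))  # 3
print(len(documents_from_pages(pages, "handbook.pdf", "per_file")))  # 1
```

Per-page documents keep page-level metadata, which helps trace retrieved text back to its page. A single per-file document keeps text that spans a page break together.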

S3 Loader properties

Field or selection	Description
S3 Loader Name	Name assigned to the S3 loader.
AWS Credential	Credentials used to authenticate the connection to Amazon S3.
Bucket	Name of the S3 bucket from which files are retrieved.
Object Key	Key of the file or object in the S3 bucket to load.
Region	AWS region where the S3 bucket is located.
File Processing Method	Method used to process the retrieved file before it is prepared for use.

S3 Directory Loader properties

Field or selection	Description
S3 Directory Loader Name	Name assigned to the S3 directory loader.
Credential	Credentials used to authenticate the connection to Amazon S3.
Bucket	Name of the S3 bucket from which files are retrieved.
Region	AWS region where the S3 bucket is located.
Server URL	Endpoint URL of the S3 service. Used for custom or self-hosted S3-compatible storage.
Prefix	Folder path or prefix used to retrieve files from a specific location in the bucket.
PDF Usage	Defines how PDF files are processed when retrieved from the bucket.

Text Loader properties

Field or selection	Description
Text File Loader Name	Name assigned to the text file loader.
Txt File	Text file uploaded to load content into the document store.

Common document loader properties fields, settings, and selections

The following fields and selections are displayed on the document loader properties page.

Section	Field or selection	Description
(unlabeled)	File loader name	Name assigned to the file loader.
	Additional Metadata	Additional information added to the document during processing.
	Omit Metadata Keys	Metadata keys excluded when the document is processed.
Select Text Splitter	Splitter	Text splitting method used to divide the document into smaller sections (chunks) before processing. The available text splitters and their settings are listed below this table.
(unlabeled)	Preview chunks	Displays a preview of the generated text sections based on the selected splitter settings.

Note: The Recursive character text splitter is the recommended text splitter. It preserves meaningful structure and is suitable for large documents.

Text splitter types:

Recursive character text splitter: Splits text by progressively reducing boundary levels, such as paragraph breaks, line breaks, and spaces, until the defined chunk size is met. Settings: Chunk Size, Chunk Overlap, Custom Separator.
Character text splitter: Splits text by fixed character length and overlap. Settings: Chunk Size, Chunk Overlap, Custom Separator.
Code text splitter: Splits text using programming language syntax and code structure. Settings: Language, Chunk Size, Chunk Overlap.
Markdown text splitter: Splits text based on Markdown structure, such as headers and sections. Settings: Chunk Size, Chunk Overlap.
HTML-Markdown text splitter: Converts HTML to Markdown, then splits the result based on Markdown structure. Settings: Chunk Size, Chunk Overlap.
Token text splitter: Splits text by token count for models that use token-based limits. Settings: Encoding name, Chunk Size, Chunk Overlap.

Text splitter settings:

Chunk Size: Maximum number of characters included in each chunk when the content is split. For the Token text splitter, the maximum is measured in tokens.
Chunk Overlap: Number of characters that overlap between consecutive chunks to help preserve context. For the Token text splitter, the overlap is measured in tokens.
Custom Separator: Custom character or sequence used to define where the text is split. The available separators are:
  Blank line (\n\n): Splits content where a paragraph starts or ends.
  New line (\n): Splits content at line breaks.
  Full stop (.): Splits content at the end of a sentence.
  Space ( ): Splits content between words.
  None: Does not split the text. The document is processed as a single section. Available for the Recursive character text splitter only.
Language: Programming language used to guide how the code is split into chunks. Examples include cpp, go, java, js, php, proto, python, rst, ruby, rust, scala, swift, markdown, and latex.
Encoding name: Token encoding used to calculate how the content is split into tokens. Examples include gpt2, r50k_base, p50k_base, p50k_edit, and cl100k_base.
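The following Python sketch illustrates how Chunk Size, Chunk Overlap, and an ordered list of separators interact. It is a simplified model, not the product's implementation. split_fixed is a hypothetical helper that cuts fixed-length windows with overlap; the product's Character text splitter also honors a separator, which this sketch ignores. split_recursive is a hypothetical helper that tries separators from coarsest (blank line) to finest (space) and recurses only into pieces that are still too long; unlike a full implementation, it applies no overlap and drops the separator at chunk boundaries.

```python
DEFAULT_SEPARATORS = ("\n\n", "\n", ".", " ")  # blank line, new line, full stop, space


def split_fixed(text, chunk_size, chunk_overlap):
    """Fixed-length chunks; each chunk repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def split_recursive(text, chunk_size, separators=DEFAULT_SEPARATORS):
    """Split on the coarsest separator present, recursing into pieces longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    for index, sep in enumerate(separators):
        if sep in text:
            finer = separators[index + 1:]
            break
    else:
        # No separator left: fall back to hard character cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with the next, finer separator.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


print(split_fixed("abcdefghij", 4, 1))  # ['abcd', 'defg', 'ghij']
text = "Intro paragraph.\n\nSecond paragraph is longer than the limit. It has two sentences."
print(split_recursive(text, 50))
# ['Intro paragraph.', 'Second paragraph is longer than the limit', 'It has two sentences.']
```

In the recursive example, the paragraph break is tried first; only the second paragraph exceeds 50 characters, so it alone is split again at the full stop. This is why the recursive approach tends to keep paragraphs and sentences intact.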

Best practices for handling line breaks in document stores

When processing documents in a document store, line breaks can affect how text is stored, split, and prepared for embeddings. Configure line break handling carefully to keep content readable while preparing text that AI models can process consistently.

Clean text during upload when documents contain inconsistent formatting

Cleaning text during upload creates a consistent stored version of the document. This step removes unnecessary spacing and formatting issues so that the stored content remains clean and easier to use across platform features.

Use upload cleanup when documents originate from multiple sources, such as:

PDFs with inconsistent line breaks.
OCR-generated text with broken lines.
Emails with irregular spacing.
Content copied from websites.

A clean stored version improves readability and supports downstream processes such as previewing documents, building search indexes, and synchronizing or exporting content.
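A minimal Python sketch of upload-time cleanup, assuming plain-text input. clean_for_storage is a hypothetical function and its rules are chosen for illustration: it unifies line endings, replaces non-breaking spaces (common in text copied from websites), removes trailing and repeated spaces, and allows at most one blank line between paragraphs. Single line breaks are kept, so the stored version stays close to the source layout.

```python
import re


def clean_for_storage(text):
    """Hypothetical upload-time cleanup: normalize spacing without removing line structure."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify Windows/old Mac line endings
    text = text.replace("\u00a0", " ")                      # non-breaking spaces from web copy
    text = re.sub(r"[ \t]+\n", "\n", text)                  # trailing spaces/tabs at line ends
    text = re.sub(r"[ \t]{2,}", " ", text)                  # runs of spaces/tabs become one space
    text = re.sub(r"\n{3,}", "\n\n", text)                  # at most one blank line between paragraphs
    return text.strip()


raw = "Quarterly report  \r\n\r\n\r\nRevenue grew\t\tby 4%.\r\nCosts  fell."
print(repr(clean_for_storage(raw)))
# 'Quarterly report\n\nRevenue grew by 4%.\nCosts fell.'
```

Collapsing repeated spaces also flattens indentation, so rules like these suit prose better than code or other whitespace-sensitive content.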

Prepare text for the AI model during embedding

Embedding preparation focuses on optimizing text for AI processing rather than storage. Removing unnecessary line breaks during this stage helps the model read the content consistently and improves chunking and retrieval.

Embedding cleanup may include:

Removing single line breaks while preserving paragraph structure.
Trimming unnecessary spacing.
Normalizing text formatting before generating embeddings.

Apply this step to ensure the text format aligns with model requirements.
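The embedding cleanup steps listed above can be sketched in Python as follows. prepare_for_embedding is a hypothetical function, and treating a blank line as the paragraph boundary is an assumption: single line breaks inside a paragraph become spaces, extra spacing is trimmed, and paragraphs are rejoined with exactly one blank line.

```python
import re


def prepare_for_embedding(text):
    """Hypothetical embedding-time cleanup: join wrapped lines but keep paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())  # a blank line (possibly with spaces) ends a paragraph
    cleaned = []
    for para in paragraphs:
        # Replace each single line break, plus surrounding spaces, with one space.
        joined = re.sub(r"\s*\n\s*", " ", para)
        joined = re.sub(r"[ \t]{2,}", " ", joined).strip()
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


text = "The policy applies to\nall employees.\n\n\nExceptions require\n  written approval."
print(repr(prepare_for_embedding(text)))
# 'The policy applies to all employees.\n\nExceptions require written approval.'
```

Because this runs on a copy of the text at embedding time, the stored document can keep its original line breaks.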

Clean text during upload and embedding when documents are inconsistent

In some scenarios, cleaning text during both upload and embedding produces better results. This approach creates a clean stored document while still allowing model-specific formatting adjustments.

Use cleanup at both stages when:

Documents originate from multiple or inconsistent sources.
The document store must maintain clean content for previews or indexing.
Embedding rules may change over time.
Upload and embedding processes occur at different times.

Separating these steps keeps the stored content consistent while allowing embedding behavior to evolve.

Preserve line breaks when formatting is meaningful

Some documents rely on line breaks for structure. Removing them may change the meaning or reduce clarity.

Preserve line breaks for documents such as:

Source code or technical snippets.
Legal or policy documents.
Structured content where spacing conveys meaning.

Review document formatting before enabling line break cleanup to ensure important structure is preserved.
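This Python sketch shows why a blanket rule that removes single line breaks can damage structured content, and one possible pre-check. All three functions are hypothetical, and the structure heuristic (indented lines, numbered clauses, or bullet markers) is an illustration rather than a complete test. Flattening a short Python function turns it into invalid code.

```python
import re


def join_single_line_breaks(text):
    # Replace each single newline (one not part of a blank line) with a space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


# Hypothetical heuristic: indented lines, numbered clauses, or bullets suggest meaningful line breaks.
STRUCTURE_HINTS = re.compile(r"^(?:[ \t]+\S|\d+[.)]\s|[-*]\s)", re.MULTILINE)


def has_meaningful_line_breaks(text):
    return bool(STRUCTURE_HINTS.search(text))


def is_valid_python(source):
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


snippet = "def total(items):\n    subtotal = sum(items)\n    return subtotal * 1.1\n"
flattened = join_single_line_breaks(snippet)

print(is_valid_python(snippet))             # True
print(is_valid_python(flattened))           # False
print(has_meaningful_line_breaks(snippet))  # True
print(has_meaningful_line_breaks("The policy applies to\nall employees."))  # False
```

A check like this can flag documents for manual review before cleanup is applied; it cannot prove that line breaks in a given document are safe to remove.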