NAGIX AI Technical Architecture v1.5
1. Overview
NAGIX AI is described as an AWS-hosted SaaS system that uses AI to convert documents automatically into accessible formats, including PDF/UA. It processes documents individually and is intended to handle long and complex documents as well as simple ones.
The platform supports common office/document formats such as PDF, Word, PowerPoint, and Excel. It is designed to identify document structure elements such as headings, tables, images, reading order, and other accessibility-related elements required for accessible output.
2. Architecture at a Glance
The architecture is built around an AWS-hosted Kubernetes environment. The diagram in the source PDF shows Cloudflare, AWS WAF, NGINX ingress, Amazon EKS, an internal Kubernetes cluster, internal services, customer-specific storage, and integrations with Azure and OpenAI services.
2.1 High-Level Layers
| Layer | Purpose | Main Elements |
|---|---|---|
| Edge and ingress | Protect and route incoming access to internal services. | Cloudflare, AWS WAF, NGINX ingress |
| Compute and orchestration | Run the platform services in a managed Kubernetes environment. | Amazon EKS / Kubernetes Cluster |
| Core application services | Receive requests, coordinate processing, expose secure APIs, and manage operational workflows. | NAGIX-AI service, Backoffice API, OCR service, Shield Proxy |
| Identity and administration | Authenticate users, manage roles, groups, companies, and authorized access. | Keycloak, Backoffice Front Application |
| Data and file storage | Store technical metadata and document files with tenant separation. | MongoDB, Amazon S3 customer buckets |
| External AI/OCR services | Perform OCR, document analysis, and image analysis when required. | Azure Content Understanding, Azure Document Intelligence, OpenAI |
3. Deployment Model
The source document separates deployment responsibilities into four numbered areas shown in the architecture diagram.
| Diagram Mark | Deployment Meaning | KB Interpretation |
|---|---|---|
| 1 | Managed in the supplier environment. Can also be deployed in the customer's organizational EKS environment. | The main platform stack is supplier-managed by default, with a possible customer-hosted EKS model. |
| 2-3 | Third-party services operated and managed only in the supplier's infrastructure and not intended for customer-side installation. | Azure OCR/document services and OpenAI image analysis remain external service dependencies. |
| 4 | Can be installed either in the customer environment or in the supplier environment, depending on target architecture and project requirements. | The portal/application access layer has flexible deployment placement. |
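The deployment table above can be captured as a small lookup structure, which may help when checking a proposed target architecture against the allowed placements. This is a minimal sketch: the names `Host`, `DeploymentArea`, `AREAS`, and `can_host` are hypothetical and do not come from the source document, and the area descriptions follow the KB interpretation column rather than any published configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class Host(Enum):
    SUPPLIER = "supplier"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class DeploymentArea:
    marks: tuple[int, ...]          # diagram marks from the table above
    description: str                # KB interpretation, not an official name
    allowed_hosts: frozenset[Host]  # environments where the area may run


# Hypothetical encoding of the three table rows.
AREAS = (
    DeploymentArea((1,), "Main platform stack",
                   frozenset({Host.SUPPLIER, Host.CUSTOMER})),
    DeploymentArea((2, 3), "Third-party AI/OCR services",
                   frozenset({Host.SUPPLIER})),
    DeploymentArea((4,), "Portal/application access layer",
                   frozenset({Host.SUPPLIER, Host.CUSTOMER})),
)


def can_host(mark: int, host: Host) -> bool:
    """Return True if the area with this diagram mark may run in `host`."""
    for area in AREAS:
        if mark in area.marks:
            return host in area.allowed_hosts
    raise ValueError(f"unknown diagram mark: {mark}")
```

For example, `can_host(2, Host.CUSTOMER)` is false because marks 2-3 are not intended for customer-side installation.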
4. Component Map
4.1 NAGIX-AI Core Service
The NAGIX-AI service is the central service layer. It exposes secure APIs to system consumers and integrations, receives requests, manages processing coordination, and orchestrates the platform components involved in document conversion.
4.2 Internal OCR Service
The architecture includes an internal OCR service running inside the Kubernetes cluster. Its role is document processing and information extraction as part of the overall conversion workflow.
4.3 Shield Proxy
The diagram includes an internal Shield Proxy layer inside the Kubernetes cluster. In the KB context, this should be treated as an internal mediation/protection layer between platform services unless more detailed design documentation defines its exact routing and enforcement responsibilities.
4.4 Keycloak
Keycloak is the identity and access management component for the Backoffice. It handles authentication, authorization, roles, and user groups. The source document states support for OpenID Connect, OAuth 2.0, SSO, and integration with enterprise identity systems such as Microsoft Entra ID or Active Directory.
4.5 Backoffice Front Application
The Backoffice Front Application is an Angular-based web application used as the administration interface. It can be installed in the supplier environment or customer environment. It supports management of companies, authorized users, roles, permissions, and administrative/operational actions.
4.6 Backoffice API
The Backoffice API is the backend service for the administration layer. It exposes secure APIs for company, user, role, and permission management. It is also responsible for request handling, access enforcement, data validation, and saving/updating management information in the system databases.
4.7 NAGIX-AI Portal
The NAGIX-AI Portal is the user-facing access layer. It communicates with backend services through secure APIs. Users can create and manage business processes, view processed accessible documents, retrieve document lists, perform management/control actions, and track workflow status according to their authorization level.
5. Data Storage, Retention, and Tenant Isolation
5.1 MongoDB
The system uses MongoDB databases managed in AWS. According to the source document, each customer receives a dedicated and separate database instance to support full tenant isolation. MongoDB stores technical and operational metadata only, not the document content itself.
| Stored in MongoDB | Not Stored in MongoDB |
|---|---|
| Technical and operational metadata | Document content and files (held in Amazon S3; see 5.2) |
5.2 Amazon S3
Amazon S3 is used for secure document and file storage. Each customer receives a dedicated and separate bucket, which keeps each customer's files apart from those of other customers. Files are retained for up to 30 days by default, and the retention period can be adjusted according to customer requirements.
The source document also states that, where required, the system can be configured to work directly with the customer's S3 environment so that files do not need to be stored in the supplier's storage environment.
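The source document does not say how the retention period is enforced. One common option on S3 is a bucket lifecycle rule that expires objects after a fixed number of days. The sketch below only builds the rule as a plain dictionary in the shape S3 expects; the function name, the rule ID, and the choice of a lifecycle rule at all are assumptions, not the documented NAGIX AI mechanism.

```python
DEFAULT_RETENTION_DAYS = 30  # default stated in the source document


def lifecycle_configuration(retention_days: int = DEFAULT_RETENTION_DAYS) -> dict:
    """Build an S3 lifecycle configuration that expires every object
    in a bucket after `retention_days` days (hypothetical helper)."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{retention_days}-days",  # illustrative ID
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Expiration": {"Days": retention_days},
            }
        ]
    }
```

With boto3, a dictionary of this shape could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, once per customer bucket, with `retention_days` set to the value the customer approved.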
6. External AI and OCR Services
The source architecture consumes external services for document OCR, document analysis, and image analysis.
| External Service | Purpose | Data Handling Notes from Source |
|---|---|---|
| Azure Content Understanding and Azure Document Intelligence | OCR, document analysis, and structured data extraction. | Communication is performed over TLS/HTTPS. The source document states use of Private Endpoint / Private Channel and temporary storage in Microsoft Azure West Europe during processing. |
| OpenAI | Image analysis only. | Only the images required for processing are sent, not the full document, following a data minimization approach. |
6.1 Azure Services
Azure Content Understanding and Azure Document Intelligence are used for OCR, document analysis, and extraction of structured information. Data transfer between AWS and Azure is described as encrypted using TLS/HTTPS. The document states that access to these services is protected through Private Endpoint and Private Channel so the services are not exposed to the public internet and are accessible only from authorized addresses and networks.
During processing, documents are stored temporarily in the Microsoft Azure service environment in the West Europe region. The source document states that documents are deleted according to the service policy after processing and are not used by Microsoft to train, improve, or adapt AI models.
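When reviewing a deployment, two of the properties described above can be spot-checked from the calling side: that the configured Azure endpoint uses HTTPS, and that the address it resolves to is a private one, as would be expected behind a Private Endpoint. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the check is a review aid rather than a substitute for the actual network configuration evidence.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_tls(url: str) -> bool:
    """True if the endpoint URL is configured with the https scheme."""
    return urlsplit(url).scheme == "https"


def is_private_address(address: str) -> bool:
    """True if a resolved endpoint IP lies in a private address range,
    which is what a private endpoint is expected to resolve to."""
    return ipaddress.ip_address(address).is_private
```

The address passed to `is_private_address` is assumed to come from a DNS lookup performed inside the platform's own network, since private endpoints typically resolve differently from outside.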
6.2 OpenAI Image Analysis
OpenAI is used only for image analysis. The source document emphasizes data minimization: only required images are sent to the service, without sending the complete document or unrelated information.
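The data minimization rule can be illustrated as a filter applied before any outbound call: only image elements that actually need analysis are selected, and all other document content stays behind. This is a sketch under stated assumptions; the `Element` type, its `kind` values, and the `needs_analysis` flag are hypothetical and do not describe the real NAGIX AI document model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str                     # e.g. "heading", "paragraph", "table", "image"
    content: bytes                # raw bytes of the element
    needs_analysis: bool = False  # hypothetical flag: image requires analysis


def image_payloads(elements: list[Element]) -> list[bytes]:
    """Select only the images that need analysis; text, tables, and
    images that do not need analysis are never included."""
    return [
        e.content
        for e in elements
        if e.kind == "image" and e.needs_analysis
    ]
```

Keeping the selection in one function gives a single place to audit what can leave the platform, which also supports the payload evidence requested in a data flow review.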
7. Security Controls
7.1 Access Control
Access to the Backoffice and Portal is limited to authorized users. User visibility and available actions are controlled by authentication and authorization mechanisms, so each user can access only the information and operations allowed by their role.
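A role check of this kind can be sketched against the claims of a Keycloak access token. Keycloak places realm-level roles under `realm_access.roles` in the token; the `company_id` claim and the function names below are hypothetical, and the sketch assumes the token's signature, issuer, and expiry have already been verified elsewhere.

```python
def realm_roles(claims: dict) -> set[str]:
    """Return the realm-level roles from already-verified token claims."""
    return set(claims.get("realm_access", {}).get("roles", []))


def is_allowed(claims: dict, required_role: str, company_id: str) -> bool:
    """Allow the action only if the user holds the required role and
    belongs to the company that owns the requested data.

    `company_id` is a hypothetical custom claim used for illustration.
    """
    same_company = claims.get("company_id") == company_id
    return same_company and required_role in realm_roles(claims)
```

Missing claims fail closed: a token without `realm_access` or without a matching `company_id` is rejected rather than treated as unrestricted.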
7.2 Tenant Isolation
The architecture describes tenant isolation at both metadata and file-storage levels: customer-specific MongoDB instances and customer-specific Amazon S3 buckets. The design goal is to prevent sharing or access between customer data environments.
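One way to produce evidence for this design goal is to check a resource inventory for any MongoDB instance or S3 bucket that is mapped to more than one customer. The sketch below assumes a simple inventory record; `TenantResources`, its field names, and `find_shared_resources` are hypothetical and not part of the source document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantResources:
    tenant_id: str
    mongodb_instance: str  # identifier of the customer's MongoDB instance
    s3_bucket: str         # name of the customer's S3 bucket


def find_shared_resources(
    tenants: list[TenantResources],
) -> list[tuple[str, str, str, str]]:
    """Return (kind, resource, first_tenant, second_tenant) for every
    resource that appears under two different tenants."""
    first_owner: dict[tuple[str, str], str] = {}
    conflicts = []
    for t in tenants:
        for kind, value in (("mongodb", t.mongodb_instance), ("s3", t.s3_bucket)):
            key = (kind, value)
            owner = first_owner.setdefault(key, t.tenant_id)
            if owner != t.tenant_id:
                conflicts.append((kind, value, owner, t.tenant_id))
    return conflicts
```

An empty result means no resource in the inventory is shared between customers; it does not by itself prove that IAM policies or network rules prevent cross-tenant access.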
7.3 Least Privilege
The source document refers to the principle of Least Privilege. Access to databases, S3 storage, internal services, and service-specific data is limited to the components and services that need it to perform their role.
7.4 Encryption
| Area | Control |
|---|---|
| Files at rest in S3 | Automatic AWS encryption at rest is described in the source document. |
| Data in transit | TLS/HTTPS is used for communication, including AWS-to-Azure communication. |
| Azure processing | The source document states data is encrypted during storage and processing according to Azure security mechanisms. |
7.5 Secrets Management
The system uses secure secrets management for access keys, tokens, external service credentials, and cloud permissions. The current model described in the PDF manages required secrets in Kubernetes with access restricted to authorized services. The architecture also supports AWS Secrets Manager for secure storage, permission management, and centralized control.
The document defines a separation between customer-owned secrets and supplier-owned secrets. Customer-related secrets can remain under customer management and control, while supplier-service secrets are managed in the supplier's secured environment. Integration with customer enterprise secret management systems is also supported.
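The ownership split and the per-service access restriction can be modelled as a small registry, which also yields the secrets ownership matrix used in architecture reviews. This is an illustrative sketch only: `Owner`, `SecretRef`, `may_read`, and `ownership_matrix` are hypothetical names, and the registry holds references to secrets, never the secret values themselves.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    CUSTOMER = "customer"
    SUPPLIER = "supplier"


@dataclass(frozen=True)
class SecretRef:
    name: str                         # reference name, not the secret value
    owner: Owner                      # who manages and controls the secret
    allowed_services: frozenset[str]  # services permitted to read it


def may_read(secret: SecretRef, service: str) -> bool:
    """True only if the service is explicitly listed for this secret."""
    return service in secret.allowed_services


def ownership_matrix(secrets: list[SecretRef]) -> dict[Owner, list[str]]:
    """Group secret names by owner, sorted for stable review output."""
    matrix: dict[Owner, list[str]] = {owner: [] for owner in Owner}
    for secret in secrets:
        matrix[secret.owner].append(secret.name)
    return {owner: sorted(names) for owner, names in matrix.items()}
```

Access is deny-by-default: a service that is not listed in `allowed_services` cannot read the secret, which matches the restriction to authorized services described above.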
8. Monitoring and Operational Responsibility
The system includes monitoring, control, and alerting capabilities for service availability, performance, processing health, and abnormal system events. Operational responsibility depends on the deployment model.
| Deployment Model | Operational Responsibility |
|---|---|
| Supplier-managed SaaS | The supplier is responsible for monitoring, maintenance, incident handling, alert management, and availability control. |
| Customer-hosted environment | The customer is responsible for monitoring infrastructure, services, and resources using the customer's tools, procedures, and organizational controls. |
In a customer-hosted model, controlled information sharing, alerts, or support access can be defined for troubleshooting and support, subject to the customer's information security policy.
9. QA / Architecture Review Checklist
Use this checklist when reviewing a real deployment, implementation proposal, or customer-facing architecture response.
| Area | Question | Expected Evidence |
|---|---|---|
| Deployment model | Which components are supplier-hosted, customer-hosted, or third-party? | Approved target architecture diagram and responsibility matrix. |
| Tenant isolation | Does each customer have separate MongoDB and S3 resources? | Cloud resource mapping, naming convention, IAM policy evidence. |
| Retention | Is the default 30-day file retention accepted or changed? | Customer-approved retention setting and deletion verification process. |
| External services | Which data is sent to Azure and OpenAI? | Data flow diagram, payload examples, vendor policy references. |
| Private connectivity | Are Azure services reachable only through authorized private paths? | Private Endpoint / network configuration evidence. |
| Identity | Is Keycloak integrated with customer SSO, Entra ID, or Active Directory? | OIDC/OAuth configuration, MFA policy, role mapping. |
| Secrets | Which secrets are customer-owned and which are supplier-owned? | Secrets ownership matrix and access policy. |
| Monitoring | Who monitors service health and handles alerts? | Runbook, SLA/SLO agreement, escalation path. |
| Accessibility output | Which standards are used to validate output? | PDF/UA, WCAG, PAC/validator results, remediation reports. |
10. Source Mapping
| PDF Page | Used For |
|---|---|
| Page 1 | Title, product name, version 1.5, NAGIX AI branding. |
| Page 2 | Cloud architecture overview and architecture diagram. |
| Page 3 | Deployment model, NAGIX-AI core, OCR service, MongoDB, Amazon S3, Keycloak, Backoffice components. |
| Page 4 | S3 storage, retention, encryption, Keycloak, Backoffice Front Application, Backoffice API. |
| Page 5 | NAGIX-AI Portal, Azure OCR/document analysis services, OpenAI image analysis, data minimization. |
| Page 6 | Secrets management, monitoring, operational responsibility for supplier-managed SaaS and customer-hosted models. |
| Page 7 | Product summary, supported file types, accessibility targets, PDF/UA and WCAG context. |
Recommended KB Tags
NAGIX-AI, Technical Architecture, AWS, EKS, Kubernetes, MongoDB, S3, Keycloak, Azure Document Intelligence, OpenAI, PDF/UA, WCAG, Accessibility, SaaS, Security