Neural Core V2

ConAI's runtime foundation for real-time inference, orchestration, and decisioning in business operations environments.

Low-Latency Inference

The inference pipeline is designed for sub-second response times, with adaptive context-window orchestration.

Sovereign Routing

Requests are routed to the most suitable engine based on intent, policy, and the tenant's QoS.

Operational Hardening

Retry policies, layered fallbacks, and event-based observability keep the system stable.
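
To make the retry-then-fallback idea concrete, here is a minimal sketch. It is not Neural Core's actual implementation: the function name `call_with_fallback`, the retry count, and the two stand-in engines are all hypothetical, and `RuntimeError` is assumed to represent a transient engine failure.

```python
from typing import Callable, Sequence


def call_with_fallback(
    engines: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_engine: int = 2,
) -> str:
    """Try engines in order, retrying each one before falling back to the next."""
    errors = []
    for engine in engines:
        for attempt in range(1, retries_per_engine + 1):
            try:
                return engine(prompt)
            except RuntimeError as exc:  # assumed to mean a transient failure
                errors.append(f"{engine.__name__} attempt {attempt}: {exc}")
    # Every engine in the chain failed.
    raise RuntimeError("fallback exhausted: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical primary engine that always times out."""
    raise RuntimeError("timeout")


def local_fallback(prompt: str) -> str:
    """Hypothetical last-resort engine."""
    return f"local: {prompt}"


result = call_with_fallback([flaky_primary, local_fallback], "ping")
```

In this sketch the primary engine is retried twice before the local engine answers; if the whole chain fails, the caller sees a single error listing each failed attempt.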

Core Flow

  1. Incoming requests enter the intent parsing and policy evaluation layer.
  2. The router selects the best model and toolchain for the tenant's context.
  3. The engine runs real-time inference and guardrail validation.
  4. The response is returned with full telemetry for observability.
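
The four steps can be pictured as a small pipeline. The sketch below is illustrative only: every function, the `ROUTES` table, and the `fallback-local` engine name are hypothetical placeholders for the real layers, and the keyword-based intent parser is a toy.

```python
import time

# Hypothetical routing table: intent -> (engine, tools).
ROUTES = {
    "billing": ("hermes-3-70b", ["billing_lookup"]),
    "general": ("fallback-local", []),
}


def parse_intent(text: str) -> str:
    """Step 1a: toy keyword matcher standing in for intent parsing."""
    return "billing" if "tagihan" in text.lower() else "general"


def policy_allows(intent: str, allowed_intents: set) -> bool:
    """Step 1b: toy policy evaluation against a tenant allow-list."""
    return intent in allowed_intents


def run_engine(engine: str, text: str) -> str:
    """Step 3a: placeholder for the real inference call."""
    return f"[{engine}] reply to: {text}"


def guardrail_ok(output: str) -> bool:
    """Step 3b: placeholder guardrail that only rejects empty output."""
    return bool(output.strip())


def handle(text: str, allowed_intents: set) -> dict:
    started = time.perf_counter()
    intent = parse_intent(text)
    if not policy_allows(intent, allowed_intents):
        return {"status": "failed", "output": None, "telemetry": {"intent": intent}}
    engine, tools = ROUTES[intent]       # step 2: pick model and toolchain
    output = run_engine(engine, text)    # step 3: inference ...
    passed = guardrail_ok(output)        # ... followed by guardrail validation
    telemetry = {                        # step 4: attach telemetry to the response
        "intent": intent,
        "model_used": engine,
        "tools": tools,
        "guardrail_pass": passed,
        "latency_ms": round((time.perf_counter() - started) * 1000, 3),
    }
    return {
        "status": "completed" if passed else "failed",
        "output": output if passed else None,
        "telemetry": telemetry,
    }


response = handle("Kapan tagihan saya jatuh tempo?", {"billing", "general"})
```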

Authentication & Authorization

Auth Flow

Every request to Neural Core must include a JWT issued by the ConAI Auth Service. The token carries the tenant ID, role, and permission scopes.

# Request Header
Authorization: Bearer <jwt_token>
X-Tenant-ID: mitra_abc123
X-Request-ID: uuid-v4
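
A client might assemble these three headers as follows. This is a small sketch; `build_headers` is a hypothetical helper name, and generating the request ID on the client side is an assumption based on the `uuid-v4` placeholder.

```python
import uuid


def build_headers(jwt_token: str, tenant_id: str) -> dict:
    """Return the three request headers with a fresh UUIDv4 request ID."""
    return {
        "Authorization": f"Bearer {jwt_token}",
        "X-Tenant-ID": tenant_id,
        "X-Request-ID": str(uuid.uuid4()),
    }


headers = build_headers("<jwt_token>", "mitra_abc123")
```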

# JWT Payload Structure
{
  "sub": "user_id",
  "tid": "tenant_id",
  "role": "mitra|superadmin|investor",
  "scopes": ["inference:read", "inference:write"],
  "plan": "growth|pro|enterprise",
  "iat": 1716000000,
  "exp": 1716086400
}
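
For debugging, the claims can be inspected on the client with the standard library alone. The sketch below decodes the payload segment and checks `exp`; it deliberately does NOT verify the signature, so it must not be used for authorization decisions. The helper names and the unsigned demo token are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    segment = parts[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(payload: dict, now: Optional[float] = None) -> bool:
    """True once the current time has reached the `exp` claim."""
    now = time.time() if now is None else now
    return now >= payload["exp"]


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


claims = {
    "sub": "user_id",
    "tid": "tenant_id",
    "role": "mitra",
    "scopes": ["inference:read", "inference:write"],
    "plan": "growth",
    "iat": 1716000000,
    "exp": 1716086400,
}
# Unsigned demo token; a real token carries a signature from the Auth Service.
token = f"{_b64url({'alg': 'none'})}.{_b64url(claims)}.sig"
payload = decode_jwt_payload(token)
```

With the sample claims, `exp - iat` is 86,400 seconds, so this token would be valid for 24 hours.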

Permission Scopes

| Scope | Description | Minimum Plan |
| --- | --- | --- |
| inference:read | Query inference history & status | Starter |
| inference:write | Submit inference request | Starter |
| inference:stream | Real-time streaming response | Growth |
| policy:manage | CRUD guardrail policies | Pro |
| telemetry:read | Access observability metrics | Pro |
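
A scope check that combines the token's scopes with the plan minimum could look like the sketch below. The ordering Starter < Growth < Pro < Enterprise is an assumption inferred from the plan names, and `is_allowed` is a hypothetical helper, not part of the API.

```python
# Assumed plan ordering, lowest tier first.
PLAN_RANK = {"starter": 0, "growth": 1, "pro": 2, "enterprise": 3}

# Minimum plan per scope, copied from the table above.
SCOPE_MIN_PLAN = {
    "inference:read": "starter",
    "inference:write": "starter",
    "inference:stream": "growth",
    "policy:manage": "pro",
    "telemetry:read": "pro",
}


def is_allowed(scope: str, token_scopes: list, plan: str) -> bool:
    """Allow a scope only if the token grants it and the plan is high enough."""
    if scope not in SCOPE_MIN_PLAN or scope not in token_scopes:
        return False
    # Unknown plans rank below every known plan.
    return PLAN_RANK.get(plan, -1) >= PLAN_RANK[SCOPE_MIN_PLAN[scope]]


can_stream = is_allowed("inference:stream", ["inference:stream"], "growth")
```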

Inference API

POST /api/neural/infer

Submits an inference request to Neural Core. Supports synchronous and streaming modes.

Request Body

{
  "messages": [
    { "role": "system", "content": "Kamu adalah asisten CS..." },
    { "role": "user", "content": "Kapan tagihan saya jatuh tempo?" }
  ],
  "mode": "sync" | "stream",
  "context": {
    "customer_id": "cust_xyz",
    "channel": "whatsapp" | "telegram" | "web",
    "session_id": "sess_abc"
  },
  "options": {
    "max_tokens": 1024,
    "temperature": 0.7,
    "tools": ["billing_lookup", "ticket_create"],
    "guardrail_policy": "default" | "strict" | "custom_id"
  }
}
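
A client could validate the enumerated fields before sending the body. The sketch below is one way to do that; `build_infer_body` is a hypothetical helper, and its defaults simply mirror the sample values, which are not stated to be the server's defaults.

```python
import json

ALLOWED_MODES = {"sync", "stream"}
ALLOWED_CHANNELS = {"whatsapp", "telegram", "web"}


def build_infer_body(messages, *, customer_id, channel, session_id,
                     mode="sync", max_tokens=1024, temperature=0.7,
                     tools=(), guardrail_policy="default"):
    """Assemble the request body, rejecting values outside the listed enums."""
    if not messages:
        raise ValueError("messages must not be empty")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if channel not in ALLOWED_CHANNELS:
        raise ValueError(f"channel must be one of {sorted(ALLOWED_CHANNELS)}")
    return {
        "messages": list(messages),
        "mode": mode,
        "context": {
            "customer_id": customer_id,
            "channel": channel,
            "session_id": session_id,
        },
        "options": {
            "max_tokens": max_tokens,
            "temperature": temperature,
            "tools": list(tools),
            "guardrail_policy": guardrail_policy,
        },
    }


body = build_infer_body(
    [{"role": "user", "content": "Kapan tagihan saya jatuh tempo?"}],
    customer_id="cust_xyz",
    channel="whatsapp",
    session_id="sess_abc",
    tools=["billing_lookup", "ticket_create"],
)
payload_json = json.dumps(body)  # string to send as the POST body
```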

Response (sync mode)

{
  "id": "inf_abc123",
  "status": "completed",
  "output": {
    "content": "Tagihan Anda jatuh tempo tanggal 25 Mei...",
    "tool_calls": [
      { "tool": "billing_lookup", "result": { "due_date": "2026-05-25" } }
    ]
  },
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 67,
    "total_tokens": 209
  },
  "telemetry": {
    "latency_ms": 340,
    "model_used": "hermes-3-70b",
    "node_id": "node-sg-01",
    "routing_decision_ms": 12,
    "guardrail_pass": true
  }
}

Response (stream mode)

# SSE Stream Format

data: {"type":"token","content":"Tagihan"}

data: {"type":"token","content":" Anda"}

data: {"type":"token","content":" jatuh"}

data: {"type":"tool_call","tool":"billing_lookup","status":"executing"}

data: {"type":"tool_result","tool":"billing_lookup","result":{"due_date":"2026-05-25"}}

data: {"type":"done","usage":{"total_tokens":209},"telemetry":{"latency_ms":340}}
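
On the client side, the stream can be folded back into a single reply. The sketch below consumes an iterable of raw lines, so it works with any HTTP client that yields lines; `collect_stream` is a hypothetical helper, and it assumes that every event is a single `data:` line and that `done` ends the stream, as in the sample.

```python
import json


def collect_stream(lines):
    """Fold SSE `data:` lines into (text, tool_results, usage)."""
    text_parts = []
    tool_results = {}
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines between events
        event = json.loads(line[len("data:"):])
        kind = event.get("type")
        if kind == "token":
            text_parts.append(event["content"])
        elif kind == "tool_result":
            tool_results[event["tool"]] = event["result"]
        elif kind == "done":
            usage = event.get("usage")
            break
        # "tool_call" progress events are ignored in this sketch
    return "".join(text_parts), tool_results, usage


sample = [
    'data: {"type":"token","content":"Tagihan"}', "",
    'data: {"type":"token","content":" Anda"}', "",
    'data: {"type":"token","content":" jatuh"}', "",
    'data: {"type":"tool_call","tool":"billing_lookup","status":"executing"}', "",
    'data: {"type":"tool_result","tool":"billing_lookup","result":{"due_date":"2026-05-25"}}', "",
    'data: {"type":"done","usage":{"total_tokens":209},"telemetry":{"latency_ms":340}}', "",
]
text, tool_results, usage = collect_stream(sample)
```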

GET /api/neural/status/:inference_id

Checks the status of an inference request that is still running or has already finished.

{
  "id": "inf_abc123",
  "status": "completed" | "processing" | "queued" | "failed",
  "created_at": "2026-05-18T10:00:00Z",
  "completed_at": "2026-05-18T10:00:00.340Z",
  "error": null | { "code": "NC-001", "message": "..." }
}
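
A caller typically polls this endpoint until the status is terminal. The sketch below treats `completed` and `failed` as terminal, which is an assumption, and takes the HTTP call as an injected `fetch_status` function so the loop itself stays testable; the polling interval and timeout are arbitrary illustrative values.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal states


def wait_for_result(fetch_status, inference_id, interval_s=0.5,
                    timeout_s=30.0, sleep=time.sleep):
    """Poll fetch_status(inference_id) until a terminal status or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(inference_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{inference_id} still {body['status']} after {timeout_s}s"
            )
        sleep(interval_s)


# Demo with a fake fetcher that walks through queued -> processing -> completed.
_responses = iter([
    {"id": "inf_abc123", "status": "queued"},
    {"id": "inf_abc123", "status": "processing"},
    {"id": "inf_abc123", "status": "completed", "error": None},
])


def fake_fetch(inference_id):
    return next(_responses)


final = wait_for_result(fake_fetch, "inf_abc123", sleep=lambda s: None)
```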

Rate Limiting

Rate limits are applied per tenant according to plan. Response headers carry real-time quota information.

| Plan | Requests/minute | Tokens/day | Concurrent |
| --- | --- | --- | --- |
| Starter | 20 | 50,000 | 2 |
| Growth | 60 | 500,000 | 5 |
| Pro | 200 | 2,000,000 | 15 |
| Enterprise | Custom | Unlimited | Custom |

# Rate Limit Response Headers
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1716012060
X-TokenLimit-Daily-Remaining: 487231
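
A client can use these headers to decide how long to pause before the next request. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds; the sample value suggests this, but the source does not state it. `seconds_to_wait` is a hypothetical helper.

```python
def seconds_to_wait(headers: dict, now: float) -> float:
    """Return 0 while quota remains, else the seconds left until the reset time."""
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0.0
    # Assumption: the reset header is a Unix timestamp in seconds.
    return max(0.0, float(headers["X-RateLimit-Reset"]) - now)


sample_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "42",
    "X-RateLimit-Reset": "1716012060",
    "X-TokenLimit-Daily-Remaining": "487231",
}
exhausted_headers = dict(sample_headers, **{"X-RateLimit-Remaining": "0"})

wait_now = seconds_to_wait(sample_headers, now=1716012000)
wait_when_exhausted = seconds_to_wait(exhausted_headers, now=1716012000)
```

Passing `now` explicitly keeps the function deterministic; a real caller would pass `time.time()`.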

Error Codes

| Code | Meaning | Resolution |
| --- | --- | --- |
| NC-001 | Context window overflow: input exceeds the active model's token limit | Shorten the input or enable the auto-truncation policy |
| NC-002 | Model routing timeout: no engine responded within the SLA | Check node health or raise the timeout threshold |
| NC-003 | Guardrail rejection: output violates the tenant's policy | Review the policy rules in tenant settings |
| NC-004 | Rate limit exceeded: tenant exceeded its per-minute inference quota | Upgrade the plan or wait for the cooldown window |
| NC-005 | Fallback exhausted: every engine in the chain failed | Escalate to the NOC and check the status of all nodes |
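
Client code can map each code to a follow-up action. In the sketch below the action labels paraphrase the Resolution column, while the choice of which codes are worth an automatic retry (`NC-002` and `NC-004`) is an assumption of this example, not something the table specifies.

```python
# Action labels paraphrase the Resolution column of the table above.
ERROR_ACTIONS = {
    "NC-001": "shorten_input",
    "NC-002": "check_node_health",
    "NC-003": "review_policy",
    "NC-004": "wait_for_cooldown",
    "NC-005": "escalate_to_noc",
}

# Assumption: only timeouts and rate limits are retried automatically.
AUTO_RETRY_CODES = {"NC-002", "NC-004"}


def next_action(error):
    """Map the `error` object of a response to (action, retry_automatically)."""
    if error is None:
        return ("none", False)
    code = error["code"]
    # Unknown codes are escalated rather than silently retried.
    return (ERROR_ACTIONS.get(code, "escalate_to_noc"), code in AUTO_RETRY_CODES)


action, retry = next_action({"code": "NC-004", "message": "..."})
```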

Observability Metrics

All metrics are exported through a Prometheus-compatible endpoint and are available in the tenant's monitoring dashboard.

| Metric | Unit | Description |
| --- | --- | --- |
| nc.inference.latency_p50 | ms | Median end-to-end inference latency |
| nc.inference.latency_p99 | ms | 99th-percentile latency |
| nc.routing.decision_time | ms | Time spent selecting an engine during routing |
| nc.guardrail.rejection_rate | % | Percentage of outputs rejected by guardrails |
| nc.fallback.trigger_count | count/min | Number of fallbacks triggered per minute |
| nc.token.usage_total | tokens | Total tokens consumed per tenant |
| nc.queue.depth | count | Number of requests in the inference queue |
| nc.model.availability | % | Model engine uptime per node |
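
To show how the p50 and p99 figures relate to raw measurements, the sketch below computes them from a handful of made-up latency samples using the nearest-rank method. The source does not say how Neural Core aggregates latencies, so both the method and the data are illustrative assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up end-to-end latencies in milliseconds.
latencies_ms = [120, 340, 95, 410, 220, 180, 1500, 260, 300, 150]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

A single slow request (1500 ms) leaves the median untouched but sets the p99, which is why the two metrics are tracked separately.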

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT REQUEST                            │
│  (WhatsApp / Telegram / Web / API)                              │
└──────────────────────────────┬──────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│                      GATEWAY LAYER                                │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────┐             │
│  │ Auth/JWT │  │ Rate Limiter │  │ Request Router │             │
│  └──────────┘  └──────────────┘  └────────────────┘             │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│                    NEURAL CORE V2                                 │
│  ┌────────────────┐  ┌───────────────┐  ┌──────────────────┐    │
│  │ Intent Parser  │  │ Model Router  │  │ Context Manager  │    │
│  └────────────────┘  └───────────────┘  └──────────────────┘    │
│  ┌────────────────┐  ┌───────────────┐  ┌──────────────────┐    │
│  │ Tool Executor  │  │  Guardrails   │  │ Telemetry Emit   │    │
│  └────────────────┘  └───────────────┘  └──────────────────┘    │
└──────────────────────────────┬───────────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────────┐
│                    ENGINE POOL                                    │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
│  │ Hermes-3 │  │ GPT-4o   │  │ Claude   │  │ Fallback │        │
│  │  70B     │  │  mini    │  │  Sonnet  │  │  Local   │        │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘        │
└──────────────────────────────────────────────────────────────────┘