Now with Gemma 4 31B & MiniMax-M2.5

Stop renting
intelligence.

The open-source AI ecosystem, made enterprise-ready. Frontier open models, custom agents, and an open app suite, deployed anywhere from our cloud to your data center. You own everything. We handle the complexity.

POST /v1/chat/completions

curl https://api.arewa.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $AREWA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "GLM-5", "messages": [{ "role": "user", "content": "Summarize our Q4 report" }] }'

# → 200 OK, response in 48ms
# "Based on the Q4 financials…"

100%

Open-source model stack

OpenAI API

Drop-in compatible

0×

Data used for training

Agents

Custom agentic workflows, research & search

Included models

GLM-5.1, Qwen3.5-9B, DeepSeek V3.2, Kimi K2.5, Gemma 4 31B, MiniMax-M2.5, Qwen3 Max Thinking, Llama Nemotron Ultra, Mixtral 8×22B, Llama 3.3 70B, GLM-4.7 Thinking, GLM-4.6V, DeepSeek-Coder-V2, MiMo-V2-Flash

Own everything

Your models, your fine-tuned weights, your data, your exit. Everything on Arewa is yours to inspect, move, and keep. Leverage stays on your side of the table.

The open ecosystem, frictionless

Every frontier open model, custom agents, and enterprise apps behind one OpenAI-compatible API, deployed, updated, and scaled by us. Open-source power without the ops burden.

Deploy on your terms

The same models and API from serverless cloud to fully air-gapped on-prem. Data isolation at every tier, so privacy is a consequence of the architecture, not a checkbox.

Philosophy

Open source-first.
No vendor lock-in.

We chose open source not as a trend, but as an architectural and philosophical principle. We provide real escape hatches and ensure no vendor lock-in.

Every model is open source

Every model on our platform has published weights and an open-source license. No black boxes, no licensing surprises, no silent regressions.

You own your exit

If you leave Arewa, you take your fine-tuned weights, your embeddings, your configuration, and all your data.

No vendor lock-in, ever

Our API is OpenAI-compatible, so switching providers takes two lines of code. That is intentional: a vendor you can't leave isn't a partner, it's a dependency.

Our position

Closed AI is a subscription to someone else's decisions.

The past three years have seen pricing doubled, models quietly degraded, terms rewritten, and entire product lines axed, all by AI providers that companies had bet their infrastructure on.

When your AI stack is a black box owned by someone else, you have no leverage, no visibility, and no exit. Arewa was built on the opposite premise: that the best AI infrastructure is one you can inspect, verify, move, and own.

We handle the complexity. You stay in control.

Where we come from

Forged in the
trenches of HPC.

Before: GPU orchestration & low-level optimization

Now: Sovereign AI infrastructure for the enterprise

Our story didn't start with the chatbot boom. It started in the trenches of high-performance computing, optimizing computer vision algorithms on isolated physical servers, where the cloud wasn't an option and latency had to be minimal.

That phase forced us to master the deep engineering behind the hardware. We learned to orchestrate GPUs and squeeze every last bit of processing to achieve efficiency that seemed impossible. That obsession with technical efficiency is what we bring to the era of generative AI.

Companies competing in the global market need more than an API subscription. They need hands-on technical support and control. Arewa exists to remove the barrier between technical complexity and business innovation, delivering secure, private infrastructure that lets organizations adopt superintelligence on their own terms, with their own data and rules.

Get started

Ready to own
your AI stack?

Book a 30-minute demo. We'll map your current setup and show you where you're locked in and what owning your stack looks like.

[email protected]

Products

Everything you need.
Nothing you don't.

A focused platform for serious teams, not a dashboard full of features you'll never open. From serverless inference to fully air-gapped on-prem.

What you get

The Arewa platform.

Five core capabilities, built for production from day one.

01 / Inference

High-performance inference

State-of-the-art open-weights models engineered for the perfect balance of low latency, high intelligence, and cost-efficiency. Production-ready from day one.

GLM-5 · llama-3.3-70b · mistral-large-2 · deepseek-r1

02 / Customization

Custom agents & models

Agentic workflows, assistants, and models tuned on your domain data. Deep access to your systems demands agents you own: your agents, your weights, your competitive advantage.

Agentic workflows · LoRA / full fine-tune · Your weights

03 / Retrieval

Chat with your data

Connect your private knowledge base securely. Retrieve accurate context for your applications without exposing data to public training sets or shared embeddings.

04 / Enablement

AI Workflow Integration

Embed AI directly into your existing tools and processes. Our engineers work alongside yours, from architecture review to production rollout, ensuring every deployment delivers immediate business value, not just a proof of concept.

05 / Ecosystem

The Private AI Stack

A growing suite of enterprise applications (research assistants, search engines, agentic workflows), all open source, all yours, deeply integrated with the inference platform. More than an API: an ecosystem built for business.

Integration

OpenAI-compatible.
Migration in minutes.

Why it matters

Two-line migration. OpenAI-compatible API; swap your base_url and API key, keep every prompt, tool call, and integration exactly as-is.
Model freedom. Swap from Llama to Mistral to DeepSeek with one config change. No rewriting, no migration tax.
Predictable pricing. Pay per token, not per seat. Scale to a thousand users without renegotiating contracts.

Deployment tiers

Inference that fits
your threat model.

From instant serverless to fully air-gapped on-prem; every tier runs the same models and the same API. Only the SLA varies by deployment tier.

⚡

Cloud / Serverless

Instant access via public API. Auto-scaling infrastructure designed for developers building agile AI applications. Pay per token, no commitment.

From $0.37 / 1M tokens

🔒

Dedicated / Single tenant

Open source. Hosted right.
Best of both worlds.

Most teams choose between fragile self-hosting and opaque proprietary APIs. We sit in the corner that doesn't usually exist.

Feature	Arewa AI	Self-Hosted	Proprietary API
Open-source models	● Yes		✕ No
Data control	● Total		◐ Limited
Initial investment	↓ Low	↑ High	↓ Low
Time to production	● Minutes	↑ Months	● Minutes
Infrastructure managed	● Yes	✕ No	● Yes
Air-gapped deployment	● Yes		✕ No
Fine-tuning your weights	● Yes		◐ Limited
Vendor lock-in risk	● None		↑ High

Getting started

From zero to production-ready
in one afternoon.

Step 1

Create your workspace

Sign up, name your workspace, invite your team. Five minutes to a fully configured environment with your first model running.

Step 2

Pick your models

Choose from our hosted catalogue or bring your own. Set data policies, usage limits, and access controls per team.

Step 3

Connect and ship

Point your existing OpenAI-compatible code at Arewa. No rewriting required. Deploy to your users the same day.

Solutions by industry

Solve real problems.
Not just demos.

Vertical solutions designed for industries where ownership and control aren't optional. From protecting IP in software to sovereign infrastructure for government.

Software & Technology

IP protection

Challenge: Balancing speed of innovation with IP protection. Public AI tools often expose proprietary code and sensitive user data.

Solution: Private coding assistants and embedded LLMs that accelerate development cycles without your IP ever leaving the perimeter.

Lines of your code logged externally

Manufacturing 4.0

Edge deployment

Challenge: Factories often lack stable cloud connectivity, and operational blueprints are too sensitive to upload.

Solution: Edge AI deployment directly on plant servers. Process visual QA and predictive maintenance locally, with no round trip to the cloud.

On-site

No cloud connection required

Public Sector

Sovereign infra

Challenge: Legal mandates prohibit citizen data from crossing national borders, stalling modernization.

Solution: Sovereign infrastructure. We bring the model to your national data center, ensuring full compliance with data residency laws.

100%

In-country data residency

Financial Services

Compliance

Challenge: Strict regulatory frameworks (SOX, PCI-DSS, local banking laws) prevent adoption of cloud-based AI tools.

Solution: Dedicated instances with full audit trails, role-based access, and data residency guarantees that satisfy compliance officers.

Single-tenant

Audit trails on your infrastructure, under your control

Your industry

Don't see your sector?

Our infrastructure is industry-agnostic. If you need private, high-performance AI, we can help. Let's talk.

Model catalogue

Frontier models,
ready to deploy.

Every model is open-source, production-optimized, and available across all deployment tiers. Choose what fits your needs; we'll handle the rest.

Catalogue

Choose your model. We handle the rest.

All models ship with identical API schema, logging, and auth; swap between them with a single parameter change.

The catalogue below is a curated selection. The full, always-updated model directory, including live pricing, is on your Dashboard → Models page.

Gemma 4 31B

Google · 31B params

Google's compact Gemma 4 model. Excellent instruction following with a strong performance-per-parameter ratio for cost-efficient production deployments.

128K context · Text · Gemma License

MiniMax-M2.5

MiniMax · MoE architecture

MiniMax's latest flagship text model. Exceptional at long-context tasks, multilingual generation, and complex enterprise workloads.

4M context · Text · Apache 2.0

GLM-5.1

Zhipu AI · Reasoning

Updated release of Zhipu AI's flagship. Improved reasoning depth, faster generation, and enhanced multilingual instruction accuracy over GLM-5.

128K context · Text · Apache 2.0

Qwen3.5-9B

Alibaba · 9B params

Alibaba's efficient 9B parameter model. Punches well above its weight class in coding, math, and instruction following, ideal for resource-constrained deployments.

128K context · Text + Code · Apache 2.0

DeepSeek V3.2

DeepSeek · MoE architecture

Latest generation of DeepSeek's flagship model. Outstanding coding, mathematics, and reasoning with open weights.

128K context · Text + Code · MIT

Kimi K2.5

Moonshot AI · Reasoning

Advanced reasoning model from Moonshot AI. Excels at multi-step problem solving, agentic tasks, and long-context understanding.

128K context · Reasoning · Apache 2.0

Qwen3 Max Thinking

Alibaba · Thinking variant

Alibaba's latest flagship with extended thinking capabilities. Top-tier at math, code, and Chinese/English instruction following.

128K context · Text + Code · Apache 2.0

Llama Nemotron Ultra

NVIDIA · 253B params

NVIDIA's dense open model. Best-in-class reasoning, instruction following, and STEM performance on a Llama-based architecture.

128K context · Text · Llama 3.1

DeepSeek-Coder-V2

DeepSeek · MoE architecture

Specialized coding model surpassing GPT-4o on competitive benchmarks. Supports 338 programming languages and performs strongly at code review.

128K context · Text + Code · MIT

Custom models

Need a model
we don't list?

We can deploy any open-weights model on your infrastructure. Bring your own fine-tuned weights or request a new addition to the catalogue.

Documentation

Build with Arewa.

OpenAI-compatible API. Drop-in replacement. Migrate in minutes, or start from scratch with our quickstart below.

Getting Started

Arewa's inference API is fully compatible with the OpenAI Chat Completions specification. If you're already using OpenAI, switching takes two lines: update your base_url and api_key.

1. Get your API key

2. Make your first request

3. Choose a model

Replace "qwen3.5-9b-thinking" with any model ID from our model catalogue. All models use the same request schema; switching is a one-line change.
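The three steps above can be sketched with nothing but the Python standard library. This is a minimal sketch, not an official client: it assumes the base URL is the prefix of the chat completions endpoint shown in this documentation, and the helper names `build_chat_request` and `send` are made up for illustration.

```python
import json
import urllib.request

# Assumption: base URL taken from the endpoint shown in this documentation.
BASE_URL = "https://api.arewa.ai/inference/v1"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {
        "model": model,  # switching models means changing this one value
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (performs a real network call, so it is left commented out):
# import os
# reply = send(build_chat_request("qwen3.5-9b", "Hello", os.environ["AREWA_API_KEY"]))
# print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the model swap easy to test without touching the network.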

Key concepts

Workspaces isolate teams, billing, and access controls. Each workspace gets its own API key namespace and audit log.

Deployment tiers determine where your data is processed. Choose Cloud, Dedicated, BYO, or Sovereign at workspace creation; upgrade anytime without code changes.

Authentication

All API requests are authenticated with a Bearer token in the Authorization header. Keys are scoped to a workspace and start with the prefix sk-prod-.

Get your key

Sign in at dashboard.arewa.ai
Navigate to Settings → API Keys
Click New API Key and name it for your environment

Send requests

Include your key in every API call:

Authorization: Bearer sk-prod-your-key-here

Store your key as an environment variable (AREWA_API_KEY) and never embed it in client-side code. Create one key per environment (development, staging, production) to track usage and rotate independently.
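One way to follow that advice in Python is to read the key from the environment at startup and fail early if it is missing. The sketch below is illustrative only: `load_api_key` and `auth_headers` are hypothetical helper names, and the prefix check is a client-side sanity check based on the sk-prod- prefix described above, not documented server behaviour.

```python
import os

KEY_PREFIX = "sk-prod-"  # prefix described in this documentation


def load_api_key(env=None, var="AREWA_API_KEY"):
    """Read the key from the environment and sanity-check its prefix."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set")
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{var} does not look like an Arewa key")
    return key


def auth_headers(key):
    """Header to attach to every API call."""
    return {"Authorization": f"Bearer {key}"}
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests; each deployment environment then sets its own `AREWA_API_KEY` value.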

Key rotation

API keys can be revoked and reissued at any time from the dashboard without downtime. Revocation takes effect immediately across all endpoints.

Chat Completions

The primary endpoint for text and multimodal generation. Fully compatible with the OpenAI Chat Completions spec.

POST /v1/chat/completions

Required parameters

model: model ID from the catalogue (e.g. qwen3.5-9b-thinking)
messages: array of {"role", "content"} objects

Common optional parameters

temperature: float 0–2, controls randomness. Default: 1.0
max_tokens: integer, maximum tokens to generate
stream: boolean, stream tokens via SSE. Default: false
top_p: float 0–1, nucleus sampling threshold
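The parameters above can be assembled into a request body with a small helper. This is a sketch under stated assumptions: `build_payload` is a hypothetical name, the range checks mirror the ranges listed above, and the checks on `messages` and `max_tokens` are client-side conveniences rather than documented rules.

```python
def build_payload(model, messages, *, temperature=None, max_tokens=None,
                  stream=None, top_p=None):
    """Assemble a chat completions body, checking the documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "stream": stream,
        "top_p": top_p,
    }
    # Leave unset options out so the server applies its own defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset options, instead of sending explicit defaults, keeps the request valid even if a default changes on the server side.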

Response

The response object contains an id for debugging, the model used, a choices array with the generated message, and a usage object with prompt, completion, and total token counts.
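As an illustration of reading those fields, the sketch below extracts them from a decoded response body. The field names follow the OpenAI Chat Completions response shape, which this API is described as compatible with; the `Completion` class, `parse_completion` helper, and sample payload are hand-written for illustration and are not captured output.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    request_id: str
    model: str
    text: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_completion(response: dict) -> Completion:
    """Pull the commonly used fields out of a decoded response body."""
    usage = response.get("usage", {})
    return Completion(
        request_id=response["id"],
        model=response["model"],
        text=response["choices"][0]["message"]["content"],
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )


# Hand-written sample in the OpenAI response shape.
sample = {
    "id": "chatcmpl-example",
    "model": "qwen3.5-9b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 3, "total_tokens": 11},
}

result = parse_completion(sample)
print(result.text, result.total_tokens)  # prints: Hello! 11
```

Logging `request_id` alongside the token counts gives you something concrete to quote when debugging a specific call.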

Streaming

Set stream: true to receive tokens as they are generated. Responses follow the server-sent events (SSE) format; each event is a JSON delta prefixed with data:. The stream terminates with data: [DONE].
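A stream in this format can be consumed line by line. The following is a minimal sketch of a parser, assuming each chunk carries its text under `choices[].delta.content` as in the OpenAI streaming format; `iter_deltas` is a hypothetical helper and the sample events are hand-written, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an SSE stream of chat completion chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events in the format described above.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(sample_stream)))  # prints: Hello
```

When reading from a live HTTP response the lines arrive as bytes, so decode each one as UTF-8 before passing it to the parser.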

Account Management API

The Account Management API lets you manage API keys, query usage metrics, and check your credit balance programmatically. All endpoints live under a separate base URL from the inference API.

Base URL:  https://api.arewa.ai/manage/v1

Authentication

Use your Arewa API key (sk-prod-...) as a Bearer token, the same key you use for inference calls. All endpoints also accept a JWT session token for dashboard-initiated requests.

Authorization: Bearer sk-prod-your-key-here

API Keys

Create and rotate keys programmatically. The full key value is returned only once at creation; store it securely immediately.

List keys

GET /api-keys
GET /api-keys?include_revoked=true   # include soft-deleted keys

Create a key

POST /api-keys
Content-Type: application/json

{ "name": "Production" }

→ 201 Created
{
  "id":         "key_abc123",
  "name":       "Production",
  "value":      "sk-prod-...",   ← shown once, store immediately
  "created_at": 1757892600
}

Revoke a key

DELETE /api-keys/{key_id}

→ 200 OK
{ "deleted": true }
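The three key operations above map onto three small request builders. This is a sketch using only the standard library; the function names are hypothetical, the requests are built but not sent, and the base URL is the one given for the Account Management API.

```python
import json
import urllib.request

MANAGE_URL = "https://api.arewa.ai/manage/v1"


def manage_request(method, path, api_key, body=None):
    """Build (but do not send) a request to the Account Management API."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{MANAGE_URL}{path}", data=data, headers=headers, method=method
    )


def list_keys(api_key, include_revoked=False):
    path = "/api-keys?include_revoked=true" if include_revoked else "/api-keys"
    return manage_request("GET", path, api_key)


def create_key(api_key, name):
    return manage_request("POST", "/api-keys", api_key, {"name": name})


def revoke_key(api_key, key_id):
    return manage_request("DELETE", f"/api-keys/{key_id}", api_key)
```

Each builder returns a `urllib.request.Request` that can be passed to `urllib.request.urlopen`. For a rotation, a cautious order is to create the new key, deploy it, and only then revoke the old one.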

Usage Metrics

Query token consumption, request counts, and cost over any time window. All timestamps are Unix seconds.

Single metric

GET /usage?metric=tokens&from=1756684800&to=1757548800&granularity=daily

# metric      → tokens | requests | cost
# granularity → daily | hourly | minute
# model       → model alias or "all"  (optional, default: all)
# api_key_id  → key ID or "all"       (optional, default: all)

All metrics in one call

GET /usage/summary?from=1756684800&to=1757548800&granularity=daily

→ 200 OK
{
  "tokens":   { "total": 4820000, "data": [...] },
  "requests": { "total": 1204,    "data": [...] },
  "cost":     { "total": 0.96,    "data": [...] }
}
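Because every timestamp is in Unix seconds, it helps to build the query from timezone-aware datetimes. The sketch below builds the single-metric URL; `usage_url` and `unix_seconds` are hypothetical helpers, and the allowed values are the ones listed in the parameter comments above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

MANAGE_URL = "https://api.arewa.ai/manage/v1"
METRICS = {"tokens", "requests", "cost"}
GRANULARITIES = {"daily", "hourly", "minute"}


def unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix seconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time drift")
    return int(moment.timestamp())


def usage_url(metric, start, end, granularity="daily", model=None, api_key_id=None):
    """Build the GET /usage URL for one metric over a time window."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    params = {
        "metric": metric,
        "from": unix_seconds(start),
        "to": unix_seconds(end),
        "granularity": granularity,
    }
    if model is not None:
        params["model"] = model
    if api_key_id is not None:
        params["api_key_id"] = api_key_id
    return f"{MANAGE_URL}/usage?{urlencode(params)}"


start = datetime(2025, 9, 1, tzinfo=timezone.utc)
end = datetime(2025, 9, 11, tzinfo=timezone.utc)
print(usage_url("tokens", start, end))
# https://api.arewa.ai/manage/v1/usage?metric=tokens&from=1756684800&to=1757548800&granularity=daily
```

The window in this usage example reproduces the `from` and `to` values in the request shown above, which correspond to 1 September and 11 September 2025 at 00:00 UTC.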

Available Models

Returns all active models with per-million-token pricing. Use the id field as the model parameter in inference calls.

GET /models

→ 200 OK
[{
  "id":                 "qwen3.5-9b",
  "name":               "Qwen3.5-9B",
  "cost_mtoken_input":  0.15,
  "cost_mtoken_output": 0.60
}]
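The pricing fields make it possible to estimate what a workload will cost before running it. This sketch assumes input and output tokens are billed separately at the two per-million-token rates, which is what the field names suggest; `find_model` and `estimate_cost` are hypothetical helpers, and the catalogue entry is copied from the sample response above.

```python
def find_model(models: list, model_id: str) -> dict:
    """Return the catalogue entry whose id matches model_id."""
    for entry in models:
        if entry["id"] == model_id:
            return entry
    raise KeyError(model_id)


def estimate_cost(entry: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost from per-million-token prices."""
    return (
        input_tokens / 1_000_000 * entry["cost_mtoken_input"]
        + output_tokens / 1_000_000 * entry["cost_mtoken_output"]
    )


# Entry copied from the sample GET /models response.
catalogue = [{
    "id": "qwen3.5-9b",
    "name": "Qwen3.5-9B",
    "cost_mtoken_input": 0.15,
    "cost_mtoken_output": 0.60,
}]

entry = find_model(catalogue, "qwen3.5-9b")
# 2M input tokens at $0.15/M plus 0.5M output tokens at $0.60/M.
print(f"${estimate_cost(entry, 2_000_000, 500_000):.2f}")  # prints: $0.60
```

Treat the result as an estimate and check it against the usage endpoint, since actual billing is determined by the platform.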

Credit Balance

Check your available spend in USD. Credits are consumed as inference requests complete and are deducted in real time.

GET /credits/balance

→ 200 OK
{ "balance": 12.50 }

Dashboard

The dashboard at dashboard.arewa.ai is your visual control panel; sign in with the same credentials you use for the API. It covers six areas:

Page	Path	What you can do
Models	/dashboard	Browse the full live model catalogue with real-time pricing. Click any model to see specs and copy its alias for use in API calls.
API Keys	/dashboard/api-keys	Create, name, and revoke inference keys (sk-prod-...). The full key value is shown only at creation; copy it immediately.
Usage	/dashboard/usage	Charts for token consumption, request count, and estimated cost. Filter by model or date range. Your current credit balance is shown at the top.
Billing	/dashboard/billing	Add or update a payment method and top up your credit balance. Minimum top-up is $5.
Playground	/dashboard/playground	Test any model interactively in the browser before wiring up an API integration. Requires an active API key and credit balance.
Profile	/dashboard/profile	View and update your account information and preferences.

Getting your first API key

Sign in at dashboard.arewa.ai
Complete onboarding if prompted: enter your role, company, and country
Open API Keys in the navigation
Click Create Key and give it a name (e.g. production, dev)
Copy the full key; it starts with sk-prod- and is shown only once
Store it as an environment variable and use it for inference calls

export AREWA_API_KEY="sk-prod-..."

curl https://api.arewa.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer $AREWA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "qwen3.5-9b", "messages": [{ "role": "user", "content": "Hello" }] }'

Blog

Engineering notes.

Deep dives from the team behind Arewa: GPU optimization, model benchmarks, and the infrastructure decisions we make in production.

Contact

Let's build your
AI stack.

Drop us an email; we read every message and reply within one business day. Tell us a bit about your setup and we'll take it from there.

Write to us directly

For demo requests, technical questions, pricing, or partnership opportunities. We read and respond to every email.

[email protected]

Response time

Within 24 hours on business days (CST, Monterrey).

Headquarters

Arewa de México S.A. de C.V.
Monterrey, Nuevo León, Mexico
Serving global and regional markets

Suggested email structure

To:      [email protected]
Subject: [Company] Arewa inquiry

Hi Arewa,

I'm [Name] from [Company], a [team size] team
working in [industry / use case].

What we want to own:
Open models / Custom agents / App suite / The full stack
(pick any, or ask us to recommend)

Where we'd deploy:
Arewa Cloud / Dedicated / BYO Cloud / Sovereign
(pick one or ask us to recommend)

Current setup:
- Running on:        [provider / in-house / greenfield]
- Locked in on:      [hard to move today, if anything]
- Data sensitivity:  [internal / regulated / classified]
- Expected volume:   [requests/day or approx. tokens]
- Target timeline:   [e.g. Q4 2026, ASAP, exploring]

Optional, feasibility review:
[ ] Not sure where AI fits yet? We'd like Arewa to
    assess our use case before we commit to a stack.

Questions we have:
- [Any specific technical or pricing question]

Happy to jump on a 30-min call any time this week.

Best,
[Name] · [Title] · [Company]

Not every field is required; even a short intro helps us prepare a relevant reply.
