Stop renting
intelligence.
We are your technology partner for deploying frontier AI models without risking your data. Move from renting intelligence to secure, private, scalable infrastructure that runs on your terms, your data, and your rules.
Your prompts never touch a shared pipeline. Every tier — from serverless to on-prem — is built for data isolation. Privacy is the architecture, not a checkbox.
Llama, Mistral, DeepSeek, Qwen and more, deployed, updated, and scaled by us. You get the performance without the ops burden.
Every model we run is open source. The apps are open source. If you leave, you take everything. There is no proprietary trap.
Open source-first.
No vendor lock-in.
We chose open source not as a trend, but as an architectural and philosophical principle. We provide real escape hatches and ensure no vendor lock-in.
Every model on our platform has published weights and an open-source license. No black boxes, no licensing surprises, no silent regressions.
If you leave Arewa, you take your fine-tuned weights, your embeddings, your configuration, and all your data.
Our API is OpenAI-compatible. Switching providers takes two lines of code. That is intentional: a vendor you can't leave isn't a partner, it's a dependency.
Closed AI is a subscription to someone else's decisions.
The past three years have seen pricing doubled, models quietly degraded, terms rewritten, and entire product lines axed, all by AI providers that companies had bet their infrastructure on.
When your AI stack is a black box owned by someone else, you have no leverage, no visibility, and no exit. Arewa was built on the opposite premise: that the best AI infrastructure is one you can inspect, verify, move, and own.
We run the complexity. You stay in control.
Forged in the
trenches of HPC.
Our story didn't start with the chatbot boom. It started in the trenches of high-performance computing, optimizing computer vision algorithms on isolated physical servers, where the cloud wasn't an option and latency had to be minimal.
That phase forced us to master the deep engineering behind the hardware. We learned to orchestrate GPUs and squeeze every last bit of processing to achieve efficiency that seemed impossible. That obsession with technical efficiency is what we bring to the era of generative AI.
Companies competing in the global market need more than an API subscription. They need a hands-on technology partner and real control. Arewa exists to remove the barrier between technical complexity and business innovation, delivering secure, private infrastructure that lets organizations adopt superintelligence on their own terms, data, and rules.
Ready to own
your AI stack?
Book a 30-minute demo; we'll map your current setup, identify the privacy gaps, and show you exactly what a migration looks like.
Everything you need.
Nothing you don't.
A focused platform for serious teams, not a dashboard full of features you'll never open. From serverless inference to fully air-gapped on-prem.
The Arewa platform.
Five core capabilities, built for production from day one.
State-of-the-art open-weights models engineered for the perfect balance of low latency, high intelligence, and cost-efficiency. Production-ready from day one.
Adapt models to your specific domain data. Create proprietary models that belong to you: your IP, your weights, your competitive advantage.
Connect your private knowledge base securely. Retrieve accurate context for your applications without exposing data to public training sets or shared embeddings.
Embed AI directly into your existing tools and processes. Our engineers work alongside yours — from architecture review to production rollout — ensuring every deployment delivers immediate business value, not just a proof of concept.
A growing suite of enterprise applications — research assistants, search engines, agentic workflows — all open source, deeply integrated with the inference platform. More than an API: an ecosystem built for business.
OpenAI-compatible.
Migration in minutes.
- Two-line migration. OpenAI-compatible API; swap your base_url and API key, keep every prompt, tool call, and integration exactly as-is.
- Model freedom. Swap from Llama to Mistral to DeepSeek with one config change. No rewriting, no migration tax.
- Predictable pricing. Pay per token, not per seat. Scale to a thousand users without renegotiating contracts.
Inference that fits
your threat model.
From instant serverless to fully air-gapped on-prem; every tier runs the same models and the same API. Only the SLA varies by deployment tier.
Instant access via public API. Auto-scaling infrastructure designed for developers building agile AI applications. Pay per token, no commitment.
Your own private instance with guaranteed throughput. Isolated compute for enterprises with strict security boundaries. Full fine-tuning support.
Deploy Arewa's inference stack into your AWS, GCP, or Azure account. Your VPC, your keys, your compliance perimeter, managed by us.
Air-gapped deployment on your physical hardware. Absolute data residency for Government, Defense, and regulated industries. Zero external calls.
Open source. Hosted right.
Best of both worlds.
Most teams choose between fragile self-hosting and opaque proprietary APIs. We sit in the corner that doesn't usually exist.
From zero to production-ready
in one afternoon.
Sign up, name your workspace, invite your team. Five minutes to a fully configured environment with your first model running.
Choose from our hosted catalogue or bring your own. Set privacy policies, usage limits, and access controls per team.
Point your existing OpenAI-compatible code at Arewa. No rewriting required. Deploy to your users the same day.
Solve real problems.
Not just demos.
Vertical solutions designed for industries where privacy and control aren't optional. From protecting IP in software to sovereign infrastructure for government.
Don't see your sector?
Our infrastructure is industry-agnostic. If you need private, high-performance AI — we can help. Let's talk.
Frontier models,
ready to deploy.
Every model is open-source, production-optimized, and available across all deployment tiers. Choose what fits your needs; we'll handle the rest.
Choose your model. We handle the rest.
All models ship with identical API schema, logging, and auth; swap between them with a single parameter change.
The catalogue below is a curated selection. The full, always-updated model directory — including live pricing — is on your Dashboard → Models page.
Google's compact Gemma 4 model. Excellent instruction following with a strong performance-per-parameter ratio for cost-efficient production deployments.
MiniMax's latest flagship text model. Exceptional at long-context tasks, multilingual generation, and complex enterprise workloads.
Updated release of Zhipu AI's flagship. Improved reasoning depth, faster generation, and enhanced multilingual instruction accuracy over GLM-5.
Alibaba's efficient 9B parameter model. Punches well above its weight class in coding, math, and instruction following, ideal for resource-constrained deployments.
Latest generation of DeepSeek's flagship model. Outstanding coding, mathematics, and reasoning with open weights.
Advanced reasoning model from Moonshot AI. Excels at multi-step problem solving, agentic tasks, and long-context understanding.
Alibaba's latest flagship with extended thinking capabilities. Top-tier at math, code, and Chinese/English instruction following.
NVIDIA's dense open model. Best-in-class reasoning, instruction following, and STEM performance on a Llama-based architecture.
Specialized coding model surpassing GPT-4o on competitive benchmarks. Supports 338 programming languages and performs strongly at code review.
Need a model
we don't list?
We can deploy any open-weights model on your infrastructure. Bring your own fine-tuned weights or request a new addition to the catalogue.
Build with Arewa.
OpenAI-compatible API. Drop-in replacement. Migrate in minutes, or start from scratch with our quickstart below.
Getting Started
Arewa's inference API is fully compatible with the OpenAI Chat Completions specification. If you're already using OpenAI, switching takes two lines: update your base_url and api_key.
1. Get your API key
Sign up at dashboard.arewa.ai and generate an API key from Settings → API Keys. Your key starts with sk-prod-.
2. Make your first request
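A minimal sketch of a first request, using only Python's standard library. The endpoint path matches the curl example in the Dashboard section; the helper names `build_chat_request` and `send` are illustrative and not part of any Arewa SDK.

```python
import json
import urllib.request

# Inference base URL, taken from the curl example in these docs.
BASE_URL = "https://api.arewa.ai/inference/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To run it, call `send(build_chat_request(os.environ["AREWA_API_KEY"], "qwen3.5-9b-thinking", "Hello"))` and read the reply from `["choices"][0]["message"]["content"]`.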
3. Choose a model
Replace "qwen3.5-9b-thinking" with any model ID from our model catalogue. All models use the same request schema; switching is a one-line change.
Key concepts
Workspaces isolate teams, billing, and access controls. Each workspace gets its own API key namespace and audit log.
Privacy tiers determine where your data is processed. Choose Cloud, Dedicated, BYO, or Sovereign at workspace creation; upgrade anytime without code changes.
Authentication
All API requests are authenticated with a Bearer token in the Authorization header. Keys are scoped to a workspace and start with sk-prod-.
Get your key
- Sign in at dashboard.arewa.ai
- Navigate to Settings → API Keys
- Click New API Key and name it for your environment
Send requests
Include your key in every API call:
Authorization: Bearer sk-prod-your-key-here
Store your key as an environment variable (AREWA_API_KEY) and never embed it in client-side code. Create one key per environment — development, staging, production — to track usage and rotate independently.
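One way to follow this advice is to read the key from the environment in a single place and fail early when it is missing. The sketch below assumes the `sk-prod-` prefix described above; the function name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env_var: str = "AREWA_API_KEY") -> dict:
    """Read the API key from the environment and build request headers."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    # Assumption: every inference key carries the documented sk-prod- prefix.
    if not key.startswith("sk-prod-"):
        raise ValueError(f"{env_var} does not look like an Arewa API key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the lookup in one helper means that using a different key per environment only changes what is exported in the shell or CI configuration.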
Key rotation
API keys can be revoked and reissued at any time from the dashboard without downtime. Revocation takes effect immediately across all endpoints.
Chat Completions
The primary endpoint for text and multimodal generation. Fully compatible with the OpenAI Chat Completions spec.
POST /v1/chat/completions
Required parameters
- model — model ID from the catalogue (e.g. qwen3.5-9b-thinking)
- messages — array of {"role", "content"} objects
Common optional parameters
- temperature — float 0–2, controls randomness. Default: 1.0
- max_tokens — integer, maximum tokens to generate
- stream — boolean, stream tokens via SSE. Default: false
- top_p — float 0–1, nucleus sampling threshold
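The parameters above can be assembled and range-checked on the client before a request is sent. This is a sketch only: `build_payload` is a hypothetical helper, and the rules that `messages` must be non-empty and `max_tokens` must be positive are assumptions, not documented server behaviour.

```python
from typing import Any, Optional


def build_payload(
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    top_p: Optional[float] = None,
) -> dict:
    """Assemble a Chat Completions body, checking the documented ranges."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    payload: dict = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0 and 1")
        payload["top_p"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        payload["max_tokens"] = max_tokens
    if stream:
        # Omitted when false, since false is the documented default.
        payload["stream"] = True
    return payload
```

Optional fields are left out unless set, so the server defaults listed above still apply.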
Response
The response object contains an id for debugging, the model used, a choices array with the generated message, and a usage object with prompt, completion, and total token counts.
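As a rough illustration, the fields described above can be pulled into a small typed object. The field names (`prompt_tokens`, `completion_tokens`, `total_tokens`) follow the OpenAI Chat Completions schema that the API is stated to be compatible with; the sample values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    request_id: str
    model: str
    text: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_completion(body: dict) -> Completion:
    """Extract the first choice and the usage counts from a response body."""
    usage = body["usage"]
    return Completion(
        request_id=body["id"],
        model=body["model"],
        text=body["choices"][0]["message"]["content"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )


# Invented sample body, shaped like an OpenAI-style response.
SAMPLE = {
    "id": "chatcmpl-example",
    "model": "qwen3.5-9b-thinking",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}}
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 3, "total_tokens": 11},
}
```

Logging `request_id` alongside application errors makes it easier to reference a specific call when debugging.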
Streaming
Set stream: true to receive tokens as they are generated. Responses follow the server-sent events (SSE) format; each event is a JSON delta prefixed with data:. The stream terminates with data: [DONE].
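A sketch of how a client might consume such a stream. It assumes each event carries an OpenAI-style chunk with the text under `choices[].delta.content`; the function name `iter_stream_text` is hypothetical, and the input is any iterable of already-decoded lines.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until 'data: [DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel; ignore anything after it
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            fragment = choice.get("delta", {}).get("content")
            if fragment:
                yield fragment
```

Joining the yielded fragments reconstructs the full message; printing them as they arrive gives the incremental output.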
Account Management API
The Account Management API lets you manage API keys, query usage metrics, and check your credit balance programmatically. All endpoints live under a separate base URL from the inference API.
Base URL: https://api.arewa.ai/manage/v1
Authentication
Use your Arewa API key (sk-prod-...) as a Bearer token, the same key you use for inference calls. All endpoints also accept a JWT session token for dashboard-initiated requests.
Authorization: Bearer sk-prod-your-key-here
API Keys
Create and rotate keys programmatically. The full key value is returned only once at creation; store it securely immediately.
List keys
GET /api-keys
GET /api-keys?include_revoked=true   # include soft-deleted keys
Create a key
POST /api-keys
Content-Type: application/json
{ "name": "Production" }
→ 201 Created
{
"id": "key_abc123",
"name": "Production",
"value": "sk-prod-...", ← shown once, store immediately
"created_at": 1757892600
}
Revoke a key
DELETE /api-keys/{key_id}
→ 200 OK
{ "deleted": true }
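The three key endpoints above could be wrapped as request builders along these lines. The paths and base URL come from this section; the function names are illustrative, and nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

MANAGE_URL = "https://api.arewa.ai/manage/v1"


def _request(method: str, path: str, api_key: str,
             body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated Account Management request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{MANAGE_URL}{path}", data=data, headers=headers, method=method
    )


def list_keys_request(api_key: str, include_revoked: bool = False):
    path = "/api-keys?include_revoked=true" if include_revoked else "/api-keys"
    return _request("GET", path, api_key)


def create_key_request(api_key: str, name: str):
    return _request("POST", "/api-keys", api_key, {"name": name})


def revoke_key_request(api_key: str, key_id: str):
    # Percent-encode the ID so it is always a single path segment.
    return _request("DELETE", f"/api-keys/{urllib.parse.quote(key_id, safe='')}", api_key)
```

Because the full key value is returned only once, a caller of `create_key_request` should persist the `value` field from the response right away.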
Usage Metrics
Query token consumption, request counts, and cost over any time window. All timestamps are Unix seconds.
Single metric
GET /usage?metric=tokens&from=1756684800&to=1757548800&granularity=daily
# metric → tokens | requests | cost
# granularity → daily | hourly | minute
# model → model alias or "all" (optional, default: all)
# api_key_id → key ID or "all" (optional, default: all)
All metrics in one call
GET /usage/summary?from=1756684800&to=1757548800&granularity=daily
→ 200 OK
{
"tokens": { "total": 4820000, "data": [...] },
"requests": { "total": 1204, "data": [...] },
"cost": { "total": 0.96, "data": [...] }
}
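Since every timestamp is in Unix seconds, it can help to build the query string from calendar dates. The sketch below treats dates as midnight UTC, which matches the sample values above (1756684800 is 2025-09-01 00:00 UTC and 1757548800 is 2025-09-11 00:00 UTC); the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

MANAGE_URL = "https://api.arewa.ai/manage/v1"
METRICS = {"tokens", "requests", "cost"}
GRANULARITIES = {"daily", "hourly", "minute"}


def unix_seconds(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, as Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def usage_url(metric: str, start: int, end: int, granularity: str = "daily",
              model: Optional[str] = None, api_key_id: Optional[str] = None) -> str:
    """Build the URL for a single-metric usage query."""
    if metric not in METRICS:
        raise ValueError(f"metric must be one of {sorted(METRICS)}")
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {sorted(GRANULARITIES)}")
    if end <= start:
        raise ValueError("'to' must be later than 'from'")
    params = {"metric": metric, "from": start, "to": end, "granularity": granularity}
    if model:
        params["model"] = model
    if api_key_id:
        params["api_key_id"] = api_key_id
    return f"{MANAGE_URL}/usage?{urlencode(params)}"
```

For example, `usage_url("tokens", unix_seconds(2025, 9, 1), unix_seconds(2025, 9, 11))` reproduces the single-metric query shown above.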
Available Models
Returns all active models with per-million-token pricing. Use the id field as the model parameter in inference calls.
GET /models
→ 200 OK
[{
"id": "qwen3.5-9b",
"name": "Qwen3.5-9B",
"cost_mtoken_input": 0.15,
"cost_mtoken_output": 0.60
}]
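Per-million-token pricing translates into a request cost with simple arithmetic. The sketch below assumes the `cost_mtoken_*` fields are USD per one million tokens, consistent with the USD credit balance; the function name is hypothetical.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cost_mtoken_input: float, cost_mtoken_output: float) -> float:
    """Estimated cost = tokens x price per million tokens / 1,000,000."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * cost_mtoken_input + output_tokens * cost_mtoken_output
    return total / 1_000_000
```

With the sample prices above (0.15 input, 0.60 output), 2,000,000 input tokens and 500,000 output tokens come to roughly 0.30 + 0.30 = 0.60 USD.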
Credit Balance
Check your available spend in USD. Credits are consumed as inference requests complete and are deducted in real time.
GET /credits/balance
→ 200 OK
{ "balance": 12.50 }
Dashboard
The dashboard at dashboard.arewa.ai is your visual control panel; sign in with the same credentials you use for the API. It covers six areas:
| Page | Path | What you can do |
|---|---|---|
| Models | /dashboard | Browse the full live model catalogue with real-time pricing. Click any model to see specs and copy its alias for use in API calls. |
| API Keys | /dashboard/api-keys | Create, name, and revoke inference keys (sk-prod-...). The full key value is shown only at creation; copy it immediately. |
| Usage | /dashboard/usage | Charts for token consumption, request count, and estimated cost. Filter by model or date range. Your current credit balance is shown at the top. |
| Billing | /dashboard/billing | Add or update a payment method and top up your credit balance. Minimum top-up is $5. |
| Playground | /dashboard/playground | Test any model interactively in the browser before wiring up an API integration. Requires an active API key and credit balance. |
| Profile | /dashboard/profile | View and update your account information and preferences. |
Getting your first API key
- Sign in at dashboard.arewa.ai
- Complete onboarding if prompted: enter your role, company, and country
- Open API Keys in the navigation
- Click Create Key and give it a name (e.g. production, dev)
- Copy the full key; it starts with sk-prod- and is shown only once
- Store it as an environment variable and use it for inference calls
export AREWA_API_KEY="sk-prod-..."
curl https://api.arewa.ai/inference/v1/chat/completions \
-H "Authorization: Bearer $AREWA_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "qwen3.5-9b", "messages": [{ "role": "user", "content": "Hello" }] }'
Engineering notes.
Deep dives from the team behind Arewa: GPU optimization, model benchmarks, and the infrastructure decisions we make in production.
Let's build your
AI stack.
Drop us an email; we read every message and reply within one business day. Tell us a bit about your setup and we'll take it from there.
For demo requests, technical questions, pricing, or partnership opportunities. We read and respond to every email.
[email protected]
Within 24 hours on business days (CST, Monterrey).
Arewa de México S.A. de C.V.
Monterrey, Nuevo León, Mexico
Serving global and regional markets
To: [email protected]
Subject: [Company] - Arewa inquiry

Hi Arewa,

I'm [Name] from [Company], a [team size] team working in [industry / use case].

What we're looking for:
Arewa Cloud / Dedicated / BYO Cloud / Sovereignty (pick one or ask us to recommend)

Key requirements:
- Data sensitivity: [internal / regulated / classified]
- Expected volume: [requests/day or approx. tokens]
- Target timeline: [e.g. Q3 2026, ASAP, exploring]

Questions we have:
- [Any specific technical or pricing question]

Happy to jump on a 30-min call any time this week.

Best,
[Name] · [Title] · [Company]
Not every field is required; even a short intro helps us prepare a relevant reply.