
BF-Q Inference API

High-throughput REST API for deploying and scaling custom AI models with sub-10 ms p99 latency. Supports PyTorch, TensorFlow, ONNX, and JAX formats.

Overview

BF-Q Inference API is a fully managed, cloud-native model-serving platform designed for enterprises that need deterministic performance at any scale. Built on a distributed inference engine, it handles thousands of concurrent requests while maintaining p99 latency under 10 ms. The API natively supports PyTorch, TensorFlow, ONNX, and JAX models, with auto-scaling, canary deployments, and A/B testing baked in.
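As a rough illustration of what a call to a REST model-serving endpoint like this typically looks like, the sketch below assembles a JSON inference request. The endpoint URL, field names, and model identifiers are assumptions for illustration, not the documented BF-Q schema.

```python
import json

# Hypothetical base URL -- substitute your actual BF-Q deployment endpoint.
BASE_URL = "https://api.example.com/v1/models"

def build_inference_request(model_name: str, version: str, inputs: list) -> dict:
    """Assemble a JSON-serialisable inference request body.

    The field names ("model", "version", "inputs") are illustrative;
    consult the platform's API reference for the real schema.
    """
    return {
        "model": model_name,
        "version": version,
        "inputs": inputs,
    }

# Example: score one feature vector with a hypothetical fraud model.
body = build_inference_request("fraud-scorer", "v3", [[0.1, 0.4, 0.7]])
print(json.dumps(body))
```

In practice this body would be POSTed to the REST endpoint (or sent as a gRPC message) with an authentication header; the response would carry the model's outputs plus metadata such as the served model version.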

Use Cases

  • Real-time recommendation engines
  • NLP API services (classification, summarisation, generation)
  • Computer-vision pipelines in production
  • Fraud detection & anomaly scoring

Key Features

  • Sub-10 ms P99 inference latency
  • Auto-scaling from 0 to 10 000 RPS
  • Multi-framework: PyTorch, TensorFlow, ONNX, JAX
  • Canary & A/B deployment strategies
  • Built-in model versioning & rollback
  • OpenTelemetry-native observability
  • gRPC & REST endpoints
  • Edge deployment support (WASM / TensorRT)
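To make the canary deployment strategy concrete, here is a minimal sketch of weighted traffic splitting between a stable and a canary model version. This is a conceptual illustration of how canary routing works in general, not the platform's internal mechanism; the function name and signature are invented for this example.

```python
import random

def pick_version(canary_fraction: float, stable: str, canary: str,
                 rng=random.random) -> str:
    """Route one request: send roughly `canary_fraction` of traffic
    to the canary version, the rest to the stable version.

    `rng` is injectable so the routing decision can be tested
    deterministically.
    """
    return canary if rng() < canary_fraction else stable

# A 5% canary: a draw below 0.05 goes to the new version.
assert pick_version(0.05, "v2", "v3", rng=lambda: 0.01) == "v3"
assert pick_version(0.05, "v2", "v3", rng=lambda: 0.50) == "v2"
```

Ramping a canary then amounts to raising `canary_fraction` in steps (e.g. 5% → 25% → 100%) while watching the canary's error rate and latency, and rolling back by setting it to zero.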

Details

Category
AI Platform
Released
January 15, 2024

Get Started

Talk to our product team