OnSpecta Technology

DLS, OnSpecta's inference solution, makes it easy to deploy optimized inference applications. Its key features are:

  • Pre-built integration with major frameworks like TensorFlow and Caffe. Simple substitution of binary files is all that's needed to use DLS. Full compatibility with framework APIs is guaranteed (see the sketch after this list)
  • Accelerates inference for CPUs, GPUs and clusters
  • No model retraining needed. No parameter/weight changes needed
  • Does not trade accuracy for improved speed
  • Optimizes for power consumption and memory footprint in addition to speed
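
To illustrate the drop-in integration claimed above, the sketch below is an ordinary TensorFlow inference script with nothing DLS-specific in it; under the binary-substitution model described above, code like this is expected to run unchanged. The ResNet50 model and random input batch are only examples.

    import numpy as np
    import tensorflow as tf

    # Ordinary TensorFlow inference code -- nothing here references DLS.
    # With the DLS-substituted binaries in place, this same script is
    # expected to run through the DLS inference path unchanged.
    model = tf.keras.applications.ResNet50(weights="imagenet")   # example model
    images = np.random.rand(1, 224, 224, 3).astype("float32")    # dummy input batch
    predictions = model.predict(images)
    print(predictions.shape)                                      # (1, 1000)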

For devices on the edge, OnSpecta DLS delivers the best inference performance (latency or throughput), the best performance per watt, and a small memory footprint.

On processors in the cloud (x86 or ARM CPUs, GPUs), DLS delivers the best inference performance across all batch sizes.

Connectors

Our Connectors are lightweight components that integrate seamlessly with standard frameworks, intercept inference calls, and route them to DLS for efficient execution. We currently have Connectors for TensorFlow, Caffe, and Darknet. If needed, OnSpecta can custom-build Connectors for proprietary frameworks.
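
The Python sketch below illustrates only the intercept-and-delegate pattern; the real Connectors work at the binary level rather than by monkey-patching, and the dls_supports/dls_execute hooks are hypothetical placeholders, not a published API.

    import tensorflow as tf

    # Hypothetical DLS hooks, used here only to illustrate the pattern.
    def dls_supports(model):
        return True                      # pretend DLS can handle any model

    def dls_execute(model, inputs):
        # Stand-in for handing the graph to the DLS engine; calls the
        # original framework path so this sketch remains runnable.
        return _original_predict(model, inputs)

    _original_predict = tf.keras.Model.predict

    def _intercepted_predict(self, x, *args, **kwargs):
        # The Connector sits at the framework's inference entry point,
        # routes supported models to DLS, and falls back otherwise.
        if dls_supports(self):
            return dls_execute(self, x)
        return _original_predict(self, x, *args, **kwargs)

    tf.keras.Model.predict = _intercepted_predict   # install the interception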

Virtualization Layer

The Virtualization Layer is an abstraction layer that allows DLS to handle heterogeneous frameworks and hardware effectively. It contains the following sub-modules:

  • APIs allow OnSpecta and its customers to create Connectors for new frameworks and to integrate with custom environments. There are also APIs for integrating with external Inference Engines, such as TensorRT, and with external optimization kernels.
  • Framework-independent Intermediate Graph Representation allows DLS to ingest neural networks from different frameworks using a common intermediate format. This sub-module also contains network-level performance optimizations like direct convolutions and layer merging (see the sketch after this list).
  • Work Balancer optimizes inference across a cluster of processors.
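
To make the intermediate representation and layer-merging ideas concrete, here is a minimal sketch of a framework-independent graph format with a simple Conv2D + ReLU fusion pass. The node types and the fusion rule are illustrative only and are not DLS's actual intermediate format.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        name: str
        op: str                                    # e.g. "Conv2D", "ReLU"
        inputs: List[str] = field(default_factory=list)

    @dataclass
    class Graph:
        nodes: List[Node] = field(default_factory=list)

    def merge_conv_relu(graph: Graph) -> Graph:
        # Network-level optimization: fuse a Conv2D whose only consumer
        # is a ReLU into a single merged node.
        consumers = {}
        for n in graph.nodes:
            for i in n.inputs:
                consumers.setdefault(i, []).append(n)
        merged, skip = [], set()
        for n in graph.nodes:
            if n.name in skip:
                continue
            users = consumers.get(n.name, [])
            if n.op == "Conv2D" and len(users) == 1 and users[0].op == "ReLU":
                relu = users[0]
                merged.append(Node(relu.name, "Conv2D+ReLU", n.inputs))
                skip.add(relu.name)
            else:
                merged.append(n)
        return Graph(merged)

    g = Graph([Node("input", "Placeholder"),
               Node("conv1", "Conv2D", ["input"]),
               Node("relu1", "ReLU", ["conv1"])])
    print([n.op for n in merge_conv_relu(g).nodes])   # ['Placeholder', 'Conv2D+ReLU']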

Optimization Compiler

This is a just-in-time optimization compiler that creates an optimal execution path for the target processor the first time a neural network (for example, ResNet50) is loaded. We currently deliver the best possible inference performance on single- and multi-core CPUs (x86, ARM), on Nvidia and AMD GPUs, and on CPU-GPU clusters. OnSpecta plans to support DSPs, FPGAs, and special-purpose deep learning processors.
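
The compile-on-first-load behaviour can be pictured as a cache keyed by network and target, as in the sketch below; compile_plan and the target tag are illustrative stand-ins for the target-specific optimization work (kernel selection, data layout, fusion), not the actual DLS compiler interface.

    # Sketch of the compile-once, reuse-thereafter pattern.
    _plan_cache = {}

    def compile_plan(network_name, target):
        # Placeholder for target-specific optimization (kernel selection,
        # data layout, layer fusion); returns an opaque execution plan.
        return f"optimized plan for {network_name} on {target}"

    def get_execution_plan(network_name, target):
        # The expensive compilation happens only the first time a given
        # (network, target) pair is seen; later calls reuse the plan.
        key = (network_name, target)
        if key not in _plan_cache:
            _plan_cache[key] = compile_plan(network_name, target)
        return _plan_cache[key]

    plan = get_execution_plan("ResNet50", "x86-avx2")    # compiled on first load
    plan = get_execution_plan("ResNet50", "x86-avx2")    # cache hit thereafter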

Inference Engine

Our Inference Engine is a small, memory-efficient executable that is optimized for neural network inference without the overhead of an entire framework. DLS provides the option to use native or external inference engines for any layer types not supported by OnSpecta.
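
A minimal sketch of the per-layer fallback described above: layer types the native engine supports run there, and anything else is delegated to an external engine. The layer names and engine stubs are illustrative and do not reflect the actual DLS or TensorRT APIs.

    # Layer types the (hypothetical) native engine handles in this sketch.
    NATIVE_LAYERS = {"Conv2D", "ReLU", "MaxPool", "Dense"}

    def dls_native_kernel(op, tensor):
        return f"{op} executed by the native DLS engine"

    def external_engine_kernel(op, tensor):
        return f"{op} delegated to an external engine"

    def run_layer(op, tensor):
        # Dispatch each layer to the native engine when supported,
        # otherwise fall back to an external inference engine.
        if op in NATIVE_LAYERS:
            return dls_native_kernel(op, tensor)
        return external_engine_kernel(op, tensor)

    print(run_layer("Conv2D", None))       # handled natively
    print(run_layer("CustomNMS", None))    # falls back to the external engine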