DLS, OnSpecta's inference solution, makes it easy to deploy optimized inference applications. Its key features are:
For edge devices, OnSpecta DLS delivers the best inference performance (latency or throughput), the best performance per watt, and a small memory footprint.
On processors in the cloud (x86 or ARM CPUs, and GPUs), DLS delivers the best inference performance across all batch sizes.
Our Connectors are lightweight plug-ins that integrate seamlessly with standard frameworks, intercept inference calls, and execute them efficiently. We currently provide Connectors for TensorFlow, Caffe, and Darknet. If needed, OnSpecta can custom-build Connectors for proprietary frameworks.
The Virtualization Layer is an abstraction layer that allows OnSpecta to handle hardware heterogeneity effectively. It contains the following sub-modules:
This is a just-in-time optimization compiler that creates an optimal execution path for the target processor the first time a neural network (for example, ResNet50) is loaded. We currently deliver the best possible inference performance on single- and multi-core CPUs (x86, ARM), on Nvidia and AMD GPUs, and on CPU-GPU clusters. OnSpecta plans to support DSPs, FPGAs, and special-purpose deep learning processors.
Our Inference Engine is a small, memory-efficient executable optimized for neural network inference without the overhead of an entire framework. For any layer types not supported by OnSpecta, DLS provides the option to fall back to native or external inference engines.
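The fallback behavior described above amounts to a per-layer dispatch: supported layers run on the optimized engine, anything else is delegated to a native engine. The sketch below is an assumption about the pattern, not OnSpecta's implementation; the layer names and functions are invented for illustration.

```python
# Layers the optimized engine supports, mapped to fast implementations.
OPTIMIZED = {
    "conv": lambda x: x + 1,
    "relu": lambda x: max(x, 0),
}

def native_engine(layer, x):
    """Stand-in for the framework's own (unoptimized) implementation."""
    native_ops = {"custom_op": lambda v: v * 10}
    return native_ops[layer](x)

def run_layer(layer, x):
    """Dispatch: optimized engine when possible, native engine otherwise."""
    fn = OPTIMIZED.get(layer)
    return fn(x) if fn is not None else native_engine(layer, x)

print(run_layer("conv", 1))       # optimized path, prints 2
print(run_layer("custom_op", 3))  # native fallback, prints 30
```

This design keeps the optimized engine small while still supporting arbitrary networks: unsupported layer types cost no engineering effort, only a slower execution path.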