Falcon Acceleration for Machine Learning Applications
Improved training on the edge and customizable inference engine
2.5 quintillion bytes of data are created every day. Data arriving in multiple formats (e.g. image, video, speech) continues to expand as social media platforms grow and new connected devices fuel the era of the internet of things. No human can process and derive actionable insights from such a large volume of data, hence the value of automating this process with Machine Learning (ML).
Deep Neural Networks (DNNs), including convolutional networks, are among the ML algorithms that have seen the widest adoption, as they offer increased accuracy for complex applications such as image classification and speech recognition. Parameter optimization is a common approach to increasing algorithm performance, but Data Scientists and Application Developers can also benefit from FPGA acceleration of core algorithm functions such as general matrix multiply (GEMM) operations. At the same time, DNNs are evolving quickly, and this evolution introduces irregular parallelism on custom data types that is a great fit for the FPGA's extreme customizability.
Parallelizing algorithms such as DNNs on FPGA acceleration platforms can result in orders-of-magnitude faster run times.
The flexible logic of FPGAs and the automated algorithm implementation of Merlin eliminate the barrier that fixed-function ASICs present to frequently updating business-critical ML algorithms.
Up to 25x better performance per watt and 50-75x lower latency compared to CPU/GPU implementations.
The rapid development of heterogeneous platforms featuring both CPUs and FPGAs, across public and private data centers, expands the opportunity for ML algorithms to evolve without hardware limitations. Falcon Computing's Merlin Compiler simplifies the path to Machine Learning acceleration on heterogeneous platforms that combine the strengths of CPUs and FPGAs.
Today with Merlin Compiler in the Cloud.