
At SVDS, our R&D team has been investigating different deep learning technologies, from recognizing images of trains to speech recognition. We needed to build a pipeline for ingesting data, creating a model, and evaluating model performance. However, when we researched which technologies were available, we could not find a concise summary to consult when starting a new deep learning project.

One way to give back to the open source community that provides us with tools is to help others evaluate and choose those tools in a way that takes advantage of our experience. We therefore offer the chart below, along with explanations of the various criteria upon which we based our rankings.


These rankings are a combination of our subjective experience with these technologies in image and speech recognition applications, as well as publicly available benchmarking studies. We explain the criteria below:

Languages: When getting started with deep learning, it is best to use a framework that supports a language you are familiar with. For instance, Caffe (C++) and Torch (Lua) have Python bindings to their codebases (released in January 2017), but if you want to use those technologies well, we recommend being proficient in C++ or Lua, respectively. In comparison, TensorFlow and MXNet have great multi-language support that makes it possible to utilize the technology even if you are not proficient in C++.


Tutorials and training materials: The technologies differ dramatically in the quality and quantity of tutorials and getting-started materials. Theano, TensorFlow, Torch, and MXNet have well-documented tutorials that are easy to understand and implement. While Microsoft's CNTK and Intel's Nervana Neon are powerful tools, we struggled to find beginner-level materials. Additionally, we've found that the level of GitHub community engagement is a strong indicator not only of a tool's future development, but also of how quickly an issue or bug can be resolved by searching StackOverflow or the repo's Git issues. It should be noted that TensorFlow is the 800-pound gorilla in the room in terms of quantity of tutorials, training materials, and community of developers and users.


CNN Modeling Capability: Convolutional neural networks (CNNs) are used for image recognition, recommendation engines, and natural language processing. A CNN is composed of a set of distinct layers that transform the initial data volume into output scores for predefined classes (for more information, check out Eugenio Culurciello's overview of neural network architectures). CNNs can also be used for regression analysis, such as models that output steering angles in autonomous vehicles. We consider a technology's CNN modeling capability to include several features. These features include the opportunity space to define models, the availability of prebuilt layers, and the tools and functions available to connect these layers. We've seen that Theano, Caffe, and MXNet all have great CNN modeling capabilities. That said, TensorFlow's easy ability to build upon its InceptionV3 model and Torch's great CNN resources, including its easy-to-use temporal convolution, set these two technologies apart for CNN modeling capability.
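To make the "layers that transform the initial data volume" idea concrete, here is a toy NumPy sketch of a single convolutional layer's forward pass. All shapes and values are made up for illustration; the frameworks compared here provide optimized, GPU-backed versions of this operation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution (technically cross-correlation, as in
    most deep learning libraries): slide the kernel over the image and sum
    the elementwise products into each output cell."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 "image" and a 3x3 vertical-edge-style kernel produce a 3x3 feature map.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A real CNN stacks many such layers (with learned kernels, nonlinearities, and pooling) before the final scoring layer.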


RNN Modeling Capability: Recurrent neural networks (RNNs) are used for speech recognition, time series prediction, image captioning, and other tasks that require processing sequential information. Because prebuilt RNN models are not as numerous as prebuilt CNNs, if you have an RNN deep learning project it is important to consider what RNN models have previously been implemented and open sourced for a specific technology. For instance, Caffe has minimal RNN resources, while Microsoft's CNTK and Torch have ample RNN tutorials and prebuilt models. While vanilla TensorFlow has some RNN materials, TFLearn and Keras include many more RNN examples that utilize TensorFlow.
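The "processing sequential information" that distinguishes RNNs can be sketched with a toy vanilla RNN cell: the same weights are applied at every time step, and a hidden state carries context forward. The shapes and random values below are illustrative only; production frameworks supply trained, optimized cells (LSTM, GRU, etc.).

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a sequence of input vectors through a vanilla RNN cell:
    h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

rng = np.random.default_rng(0)
xs = [rng.normal(size=4) for _ in range(6)]   # a sequence of six 4-d inputs
W_xh = 0.1 * rng.normal(size=(3, 4))          # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(3, 3))          # hidden-to-hidden weights (shared across steps)
b_h = np.zeros(3)
states = rnn_forward(xs, W_xh, W_hh, b_h)
print(len(states), states[-1].shape)  # 6 (3,)
```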


Architecture: In order to create and train new models in a particular framework, it is critical to have an easy-to-use and modular front end. TensorFlow, Torch, and MXNet have a straightforward, modular architecture that makes development easy. In comparison, frameworks such as Caffe require a significant amount of work to create a new layer. We've found that TensorFlow in particular is easy to debug and monitor during and after training, as the TensorBoard web GUI application is included.
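What "modular front end" means in practice can be sketched in plain Python (no real framework): each layer is a small object with a `forward()` method, and a model is an ordered composition of layers. Adding a new layer type means writing one small class rather than patching framework internals, which is the gap the text describes between, say, TensorFlow and Caffe. The class names here are invented for illustration.

```python
class Scale:
    """A layer that multiplies every input value by a fixed factor."""
    def __init__(self, factor):
        self.factor = factor
    def forward(self, x):
        return [v * self.factor for v in x]

class ReLU:
    """A layer that clamps negative values to zero."""
    def forward(self, x):
        return [max(0.0, v) for v in x]

class Sequential:
    """A model is just an ordered chain of layers."""
    def __init__(self, *layers):
        self.layers = layers
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

model = Sequential(Scale(2.0), ReLU())
print(model.forward([-1.0, 0.5]))  # [0.0, 1.0]
```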

Speed: Torch and Nervana have the best documented performance on open source convolutional neural network benchmarking tests. TensorFlow performance was comparable for most tests, while Caffe and Theano lagged behind. Microsoft's CNTK claims to have some of the fastest RNN training times. Another study comparing Theano, Torch, and TensorFlow directly for RNNs showed that Theano performs the best of the three.


Multiple GPU Support: Most deep learning applications require an outstanding number of floating point operations (FLOPs). For example, Baidu's DeepSpeech recognition models take 10s of ExaFLOPs to train. That is >10e18 calculations! As leading Graphics Processing Units (GPUs) such as NVIDIA's Pascal TitanX can execute on the order of 11e12 FLOPs a second, it would take over a week to train a new model on a sufficiently large dataset. In order to decrease the time it takes to build a model, multiple GPUs over multiple machines are needed. Luckily, most of the technologies outlined above offer this support. In particular, MXNet is reported to have one of the most optimized multi-GPU engines.
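The back-of-the-envelope arithmetic behind the "over a week" claim is worth checking. Note that the GPU figure has to be on the order of 11e12 FLOPs per second (about 11 TFLOPS FP32, roughly a Pascal Titan X) rather than 11e9 for the estimate to work out:

```python
# Rough single-GPU training-time estimate using the article's figures.
total_flops = 10e18          # ~10 ExaFLOPs of total training compute
gpu_flops_per_sec = 11e12    # ~11 TFLOPS, roughly a Pascal Titan X at FP32
seconds = total_flops / gpu_flops_per_sec
days = seconds / 86400
print(round(days, 1))  # ~10.5 days on one GPU -> "over a week"
```

Spreading the same work over N well-utilized GPUs divides this figure by roughly N, which is why multi-GPU (and multi-machine) support matters.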


Keras Compatible: Keras is a high-level library for doing fast deep learning prototyping. We've found that it is a great tool for getting data scientists comfortable with deep learning. Keras currently supports two back ends, TensorFlow and Theano, and will be gaining official support in TensorFlow in the future. Keras is also a good choice for a high-level library when considering that its author recently expressed that Keras will continue to exist as a front end that can be used with multiple back ends.


If you are interested in getting started with deep learning, I would recommend evaluating your own team's skills and your project needs first. For instance, for an image recognition application with a Python-centric team we would recommend TensorFlow given its ample documentation, decent performance, and great prototyping tools. For scaling up an RNN to production with a Lua-competent client team, we would recommend Torch for its superior speed and RNN modeling capabilities.

In the future we will discuss some of our challenges in scaling up our models. These challenges include optimizing GPU usage over multiple machines and adapting open source libraries like CMU Sphinx and Kaldi for our deep learning pipeline.

