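The process of examining large data sets to uncover hidden patterns, customer preferences, unknown correlations, market trends, and other useful business information is known as big data analytics. Big data analytics can be extremely useful: it can help companies reduce costs, make faster and better decisions, and offer new products and services. Let's now look at three major big data trends that 2016 brought.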

Apache Spark

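Apache Spark, originally developed in 2009 at the University of California, Berkeley, is an open-source processing engine built for speed, ease of use, and sophisticated analytics. It gives programmers an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only collection of data that is partitioned across a cluster of machines and maintained in a fault-tolerant way.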
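
To make the idea of a partitioned, fault-tolerant dataset more concrete, here is a minimal single-process sketch in Python. It is not Spark's API: the `ToyRDD` class and its methods are hypothetical and only mimic one core idea, namely that a dataset records the transformations (its lineage) used to build each partition, so a lost partition can be recomputed from the source data instead of being restored from a replica.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical, single-process imitation of an RDD (not Spark's API).

    The data is split into partitions, and each dataset remembers the chain
    of transformations (lineage) that produces it from the source data.
    """

    def __init__(self, source: List[List[int]],
                 lineage: Optional[List[Callable[[int], int]]] = None) -> None:
        self.source = source                   # partitions of the original input
        self.lineage = lineage or []           # transformations, in order
        self.cache: Dict[int, List[int]] = {}  # partitions computed so far

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformations are only recorded here, not executed (lazy evaluation).
        return ToyRDD(self.source, self.lineage + [fn])

    def compute_partition(self, i: int) -> List[int]:
        # Rebuild partition i by replaying the lineage over its source data.
        data = list(self.source[i])
        for fn in self.lineage:
            data = [fn(x) for x in data]
        return data

    def collect(self) -> List[int]:
        # Materialize every partition that is not already in memory.
        result: List[int] = []
        for i in range(len(self.source)):
            if i not in self.cache:
                self.cache[i] = self.compute_partition(i)
            result.extend(self.cache[i])
        return result

    def lose_partition(self, i: int) -> None:
        # Simulate a machine failure that drops one computed partition.
        self.cache.pop(i, None)


rdd = ToyRDD([[1, 2], [3, 4], [5, 6]]).map(lambda x: x * 10).map(lambda x: x + 1)
print(rdd.collect())  # [11, 21, 31, 41, 51, 61]
rdd.lose_partition(1)
print(rdd.collect())  # partition 1 is recomputed from its lineage; same output
```

Because only the transformations are stored, recovering from a failure costs some recomputation rather than a full copy of the data.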

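Resilient distributed datasets (RDDs) make it practical to implement both iterative algorithms, which visit their data set many times in a loop, and interactive or exploratory data analysis, which repeatedly queries the same data. Compared with an equivalent MapReduce implementation, the latency of such applications can drop by several orders of magnitude. The training algorithms of machine learning systems belong to the class of iterative algorithms, and they were the initial impetus for developing Apache Spark.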
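
The benefit for iterative algorithms can be illustrated without Spark. In the pure-Python sketch below, `load_points` is a hypothetical stand-in for a costly read from disk, and gradient descent makes one pass over the data per iteration. Loading the data once and reusing the in-memory copy, which is roughly what calling `cache()` on an RDD achieves, reduces the number of reads from one per iteration to one in total.

```python
from typing import List, Tuple

reads = {"count": 0}  # how many times the "disk" was read


def load_points() -> List[Tuple[float, float]]:
    """Hypothetical stand-in for reading a dataset from disk."""
    reads["count"] += 1
    return [(float(x), 2.0 * x) for x in range(1, 6)]  # points on y = 2x


def fit_slope(iterations: int, cache: bool, lr: float = 0.01) -> float:
    """Gradient descent for y ~ w * x, one full pass over the data per iteration."""
    cached = load_points() if cache else []
    w = 0.0
    for _ in range(iterations):
        # Without caching, every iteration re-reads the data from "disk".
        points = cached if cache else load_points()
        grad = sum(2 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


for use_cache in (False, True):
    reads["count"] = 0
    w = fit_slope(100, cache=use_cache)
    print(f"cache={use_cache}: w={w:.4f}, reads={reads['count']}")
# reads=100 without the cache, reads=1 with it; w converges to 2.0000 either way
```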

Let's look at some of the features that have made Apache Spark a sensation in the big data world.

Lightning-Fast Processing

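Speed has always been an important aspect of big data processing. Apache Spark lets applications in Hadoop clusters run up to 100 times faster than Hadoop MapReduce when working in memory, and up to 10 times faster when working on disk. Spark achieves this by reducing the number of disk reads and writes: intermediate processing data is kept in memory.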
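
A rough pure-Python illustration, assuming a simple two-stage word-count job: one version writes the intermediate result to a temporary file between stages, as a chain of MapReduce jobs would, and the other passes it along in memory. Both produce the same answer; the in-memory version simply skips a disk write and a disk read between stages. This shows the idea only and does not measure Spark's actual speedup.

```python
import json
import os
import tempfile
from collections import Counter
from typing import List


def stage1(lines: List[str]) -> List[str]:
    # First stage: split lines into lowercase words.
    return [w.lower() for line in lines for w in line.split()]


def stage2(words: List[str]) -> Counter:
    # Second stage: count occurrences of each word.
    return Counter(words)


def run_via_disk(lines: List[str]) -> Counter:
    """MapReduce-style: materialize the intermediate result on disk."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(stage1(lines), f)  # disk write
        with open(path) as f:
            words = json.load(f)         # disk read
        return stage2(words)
    finally:
        os.remove(path)


def run_in_memory(lines: List[str]) -> Counter:
    """Spark-style: hand the intermediate result straight to the next stage."""
    return stage2(stage1(lines))


lines = ["Spark keeps data in memory", "memory is faster than disk"]
assert run_via_disk(lines) == run_in_memory(lines)
print(run_in_memory(lines)["memory"])  # 2
```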

Support for Multiple Languages and Ease of Use

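Spark lets developers write applications quickly in Java, Scala, or even Python, so they can create and run applications in a programming language they already know. Spark also comes with more than 80 built-in high-level operators.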
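
Spark programs are typically written by chaining such operators. The sketch below imitates that style in plain Python; the `Pipeline` class is hypothetical, and its methods only echo the names of Spark operators such as `flatMap`, `filter`, `map`, and `reduceByKey`. It runs on one machine with no distribution, and the order records are made up.

```python
from typing import Any, Callable, Iterable, List


class Pipeline:
    """Hypothetical chainable wrapper whose method names echo Spark's
    flatMap / filter / map / reduceByKey operators (not Spark's API)."""

    def __init__(self, items: Iterable[Any]) -> None:
        self.items = list(items)

    def flat_map(self, fn: Callable[[Any], Iterable[Any]]) -> "Pipeline":
        return Pipeline(y for x in self.items for y in fn(x))

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        return Pipeline(x for x in self.items if pred(x))

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        return Pipeline(fn(x) for x in self.items)

    def reduce_by_key(self, fn: Callable[[Any, Any], Any]) -> "Pipeline":
        # Combine values that share a key, keeping first-seen key order.
        acc: dict = {}
        for key, value in self.items:
            acc[key] = fn(acc[key], value) if key in acc else value
        return Pipeline(acc.items())

    def collect(self) -> List[Any]:
        return list(self.items)


orders = ["east:10,west:5", "east:7", "north:0,west:3"]
totals = (
    Pipeline(orders)
    .flat_map(lambda line: line.split(","))                 # one entry per order
    .map(lambda s: (s.split(":")[0], int(s.split(":")[1])))  # (region, amount)
    .filter(lambda kv: kv[1] > 0)                           # drop empty orders
    .reduce_by_key(lambda a, b: a + b)                      # total per region
    .collect()
)
print(totals)  # [('east', 17), ('west', 8)]
```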

Support for Sophisticated Analytics

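Beyond simple map and reduce operations, Apache Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms. Users can also combine all of these capabilities in a single workflow.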
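
As a rough analogy only, the sketch below uses Python's built-in `sqlite3` module in place of Spark SQL: one step aggregates with a SQL query, and the next step continues in ordinary code within the same program. The `clicks` table and its columns are made up for the example; in Spark, the same pattern would mix Spark SQL queries with DataFrame or MLlib code.

```python
import sqlite3

# Stand-in for Spark SQL: an in-memory SQLite database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, page TEXT, seconds REAL)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [("ann", "home", 30), ("ann", "cart", 90), ("bob", "home", 10),
     ("bob", "home", 20), ("cy", "cart", 60)],
)

# Step 1 (SQL): aggregate time spent per page.
rows = conn.execute(
    "SELECT page, SUM(seconds) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
conn.close()

# Step 2 (general-purpose code in the same workflow): turn totals into shares.
total = sum(seconds for _, seconds in rows)
shares = {page: round(seconds / total, 2) for page, seconds in rows}
print(shares)  # {'cart': 0.71, 'home': 0.29}
```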

Real-Time Stream Processing

Apache Spark handles real-time streams without difficulty: with Spark Streaming, data can be processed in real time as it arrives.
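
Spark Streaming works by cutting a live stream into small batches and processing each batch with the regular Spark engine. The pure-Python sketch below imitates that micro-batch idea by keeping running counts per event type; the function names are hypothetical, and batching by a fixed number of events (rather than by a time interval, as Spark Streaming does) is a simplification.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming event stream into small fixed-size batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def running_counts(stream: Iterable[str], batch_size: int = 3) -> List[dict]:
    """Process each batch as it arrives and keep a running total per key."""
    state: Counter = Counter()
    snapshots = []
    for batch in micro_batches(stream, batch_size):
        state.update(batch)            # merge this batch into the running state
        snapshots.append(dict(state))  # totals visible after this batch
    return snapshots


events = ["login", "click", "click", "login", "buy", "click", "click"]
for snapshot in running_counts(events):
    print(snapshot)
```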

Integration with Hadoop and Existing Hadoop Data

Spark can run on its own or on top of Hadoop 2's YARN cluster manager, and it can read any existing Hadoop data. This makes Spark well suited for migrating existing pure Hadoop applications.

Hadoop-Based Multicore Servers

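Enterprises are slowly moving away from expensive mainframes and enterprise data warehouse platforms toward Hadoop-based multicore servers. Hadoop is an open-source, Java-based programming framework that supports storing and processing extremely large data sets in a distributed computing environment. Companies use Hadoop as a big data platform for several main purposes.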
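
Hadoop's classic processing model is MapReduce: a map function runs on each split of the input, the framework groups the intermediate pairs by key (the shuffle), and a reduce function combines each group. The sketch below reproduces those three phases in a single Python process, assuming hypothetical `station,temperature` records; it shows the programming model only, not Hadoop's Java API or how work is spread across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, float]]:
    # Emit a (station, temperature) pair from a "station,temperature" line.
    station, temp = record.split(",")
    return [(station, float(temp))]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    # Produce one result per key: here, the maximum temperature.
    return key, max(values)


def run_job(splits: List[List[str]]) -> Dict[str, float]:
    # In a real cluster, each split would be mapped on a different node.
    mapped = [pair for split in splits for record in split for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))


splits = [["s1,21.5", "s2,18.0"], ["s1,25.0", "s3,12.5"], ["s2,19.5"]]
print(run_job(splits))  # {'s1': 25.0, 's2': 19.5, 's3': 12.5}
```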

Low-Cost Data Storage and Archiving

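Hadoop is used to store and combine data such as clickstream, transaction, scientific, machine, social media, and sensor data that you may want to analyze later. Because commodity hardware is modestly priced, this low-cost storage lets organizations keep information that is not critical today but may need to be analyzed later.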

Sandbox for Discovery and Analysis

Hadoop can run analytical algorithms because it was designed to work with large volumes of data in many shapes and forms. Big data analytics on Hadoop can help companies operate more efficiently, discover new opportunities, and gain a next-level competitive advantage. The sandbox approach provides an opportunity to innovate with minimal investment.

Data Lake

Data lakes store data in its original, exact format. The aim is to give data scientists and analysts a raw, unrefined view of the data for discovery and analytics, which lets them ask new or difficult questions without many constraints.
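
Keeping data in its original format is often described as schema-on-read: structure is applied when the data is queried rather than when it is written. The sketch below illustrates that idea in plain Python, with a list of JSON strings standing in for files in a lake; the record layout and the `read_with_schema` helper are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List

# Raw events land in the lake exactly as received (one JSON document per line),
# even though their fields differ. No schema is enforced at write time.
raw_lake: List[str] = [
    '{"type": "click", "user": "ann", "page": "/home"}',
    '{"type": "sensor", "device": "t-17", "celsius": 21.4}',
    '{"type": "click", "user": "bob", "page": "/cart", "ms": 350}',
]


def read_with_schema(lines: List[str], fields: Dict[str, type]) -> Iterator[Dict[str, Any]]:
    """Schema-on-read: parse raw records and project them onto the fields a
    particular analysis needs, skipping records that lack those fields."""
    for line in lines:
        record = json.loads(line)
        if all(name in record for name in fields):
            yield {name: cast(record[name]) for name, cast in fields.items()}


# Two analyses apply two different schemas to the same raw data.
clicks = list(read_with_schema(raw_lake, {"user": str, "page": str}))
temps = list(read_with_schema(raw_lake, {"device": str, "celsius": float}))
print(clicks)  # [{'user': 'ann', 'page': '/home'}, {'user': 'bob', 'page': '/cart'}]
print(temps)   # [{'device': 't-17', 'celsius': 21.4}]
```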

Complementing the Data Warehouse

Hadoop sits beside the data warehouse environment: some data sets are offloaded from the data warehouse into Hadoop, and new kinds of data go directly to Hadoop. The main goal of every organization is to have a good platform for storing and processing data of various schemas and formats, so that it can support different use cases that are integrated at different levels.

IoT and Hadoop

At the center of IoT is a streaming torrent of data. Hadoop is often used as the data store for large numbers of transactions. Its massive storage and processing capabilities also let you use Hadoop as a sandbox for discovering and defining patterns to be monitored for prescriptive instruction.
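
Once a pattern has been defined from stored data, it can be checked continuously against incoming device readings. The sketch below is a hypothetical example of such a rule, a rolling-average threshold over a temperature feed; the window size, the limit, and the readings are made-up values, and this is only one simple kind of pattern that might be monitored.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def monitor(readings: Iterable[float], window: int, limit: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling mean) whenever the mean of the last `window`
    readings exceeds `limit`. Both parameters stand in for a pattern that
    analysts might first have discovered on historical data."""
    recent: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > limit:
                yield i, mean


temperatures = [70.0, 71.0, 72.0, 80.0, 90.0, 95.0, 72.0, 70.0]
alerts = list(monitor(temperatures, window=3, limit=85.0))
print(alerts)
```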

Predictive Analytics and the Internet of Things (IoT)

Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The aim is to go beyond knowing what has happened to providing a better assessment of what will happen in the future. Predictive analytics is used to detect fraud, optimize marketing campaigns, improve operations, and reduce risk.
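
Predictive models range from simple regressions to complex machine learning. As a minimal illustration of the core idea, learning from historical data and projecting forward, the sketch below fits an ordinary least-squares line to hypothetical monthly sales figures and extrapolates one month ahead. A production model would also be validated and would report its uncertainty; this shows only the fitting and forecasting step.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


# Hypothetical history: units sold in months 0..5.
months = [0, 1, 2, 3, 4, 5]
units = [100, 104, 111, 113, 120, 124]
intercept, slope = fit_line(months, units)
forecast = intercept + slope * 6  # estimate for the next month
print(round(slope, 2), round(forecast, 1))  # 4.86 129.0
```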

The Internet of Things (IoT) is the concept of connecting any device with an on/off switch to the internet and to other devices. The IoT market is growing rapidly: it has been predicted that over the next 20 years, the IoT will add about $10 trillion to $15 trillion to global GDP.

Examining huge data sets is vital for uncovering hidden patterns, understanding market trends, and finding other useful information. In 2016, the big data trends described above helped organizations reduce risk, improve operations, and detect fraud. For a single software environment with real-time analytics, Hadoop is the way to go for organizations that want to lead their market. For combining real-time data sources with huge volumes of data to generate more insights, predictive analytics is the way to go. As discussed here, these three big data trends offer substantial benefits.
