Big Data Analytics
is proving to be the biggest game-changing opportunity for organizations across
the globe. With the latest tools and technologies available to process
and gain insight into overwhelming amounts of varied data, organizations of all sizes
(no longer only large or government organizations) can take
advantage of big data analytics to manage organizational complexity, rapidly changing
customer behaviors, and increased competitive pressures.
But the challenge
for organizations is to find the right resources, whereas the challenge for
individuals is to learn these tools and technologies easily and quickly.
With so many tools and technologies available, it becomes very
difficult to find the right one and learn it in a reasonable amount of time.
This is exactly what we have tried to address in our book.
What This Book Covers
- Introduction to Big Data and NoSQL systems, their business value proposition, and example use cases
- Introduction to Hadoop, Architecture, Ecosystem and Microsoft HDInsight
- Getting to know Hadoop 2.0 and the innovations it provides like HDFS2 and YARN
- Quickly installing, configuring, and monitoring Hadoop (HDInsight) clusters in the cloud and automating cluster provisioning
- Customizing the HDInsight cluster and installing additional Hadoop ecosystem projects using Script Actions
- Administering HDInsight from the Hadoop command prompt or Microsoft PowerShell
- Using the Microsoft Azure HDInsight Emulator for learning or development
- Understanding HDFS, HDFS vs. Azure Blob Storage, MapReduce Job Framework and Job Execution Pipeline
- Doing big data analytics with MapReduce, writing your MapReduce programs in the .NET programming language of your choice, such as C#
- Using Hive for big data analytics, demonstrating an end-to-end scenario, and showing how Apache Tez improves performance severalfold
- Consuming HDInsight data from Microsoft BI tools over the Hive ODBC Driver
- Using HDInsight with Microsoft BI and Power BI to simplify data integration, analysis, and reporting
- Using Pig for big data transformation workflows, step by step
- Apache HBase on HDInsight, its architecture, data model, HBase vs. Hive, programmatically managing HBase data with C# and Apache Phoenix
- Using Sqoop or SSIS (SQL Server Integration Services) to move data to/from HDInsight and build data integration workflows for transferring data
- Using Oozie for scheduling, coordinating, and managing data processing workflows in an HDInsight cluster
- Using the R programming language with HDInsight to perform statistical computing on Big Data sets
- Using Apache Spark's in-memory computation model to run big data analytics up to 100 times faster than Hadoop MapReduce
- Performing real-time stream analytics on high-velocity big data streams with Storm
- Integrating an Enterprise Data Warehouse with Hadoop and Microsoft Analytics Platform System (APS), formerly known as SQL Server Parallel Data Warehouse (PDW)
Who Should Read This Book
What do you hope to
get out of this book? As we wrote this book, we had the following audiences in
mind:
- Developers—Developers (especially business intelligence and data warehouse developers) worldwide are seeing a growing need for practical, step-by-step instruction in processing Big Data and performing advanced analytics to extract actionable insights. This book was designed to meet that need. It starts at the ground level and builds from there, to make you an expert. Here you’ll learn how to build the next generation of apps that include such capabilities.
- Data scientists—As a data scientist, you are already familiar with the processes of acquiring, transforming, and integrating data into your work and performing advanced analytics. This book introduces you to modern tools and technologies (ones that are prominent, inexpensive, flexible, and open source friendly) that you can apply while acquiring, transforming, and integrating Big Data and performing advanced analytics. By the time you complete this book, you’ll be quite comfortable with the latest tools and technologies.
- Business decision makers—Business decision makers around the world, from many different organizations, are looking to unlock the value of data to gain actionable insights that enable their businesses to stay ahead of competitors. This book delves into advanced analytics applications and case studies based on Big Data tools and technologies, to accelerate your business goals.
- Students aspiring to be Big Data analysts—As you are getting ready to transition from the academic to the corporate world, this book helps you build a foundational skill set to ace your interviews and successfully deliver Big Data projects in a timely manner. Chapters were designed to start at the ground level and gradually take you to an expert level.
Don’t worry if you
don’t fit into any of these classifications. Set your sights on learning as
much as you can and having fun in the process, and you’ll do fine!