Sunday, November 29, 2015

Big Data Analytics - Learn it easily and quickly

Big Data Analytics is proving to be the biggest game-changing opportunity for organizations across the globe. With the availability of the latest tools and technologies to process and gain insight into overwhelming amounts of varied data, organizations of all sizes (no longer just large enterprises or government organizations) can leverage big data analytics to manage organizational complexity, rapidly changing customer behaviors, and increased competitive pressure.

But the challenge for organizations is finding the right resources, whereas the challenge for individuals is learning these tools and technologies easily and quickly. With so many tools and technologies available, it becomes very difficult to pick the right ones and learn them in a reasonable amount of time so you can get started quickly. This is exactly what we have tried to cover in our book.

What This Book Covers
  • Introduction to Big Data and NoSQL systems, their business value proposition, and example use cases
  • Introduction to Hadoop, its architecture and ecosystem, and Microsoft HDInsight
  • Getting to know Hadoop 2.0 and the innovations it provides, such as HDFS2 and YARN
  • Quickly installing, configuring, and monitoring Hadoop (HDInsight) clusters in the cloud and automating cluster provisioning
  • Customizing the HDInsight cluster and installing additional Hadoop ecosystem projects using Script Actions
  • Administering HDInsight from the Hadoop command prompt or Microsoft PowerShell
  • Using the Microsoft Azure HDInsight Emulator for learning or development
  • Understanding HDFS, HDFS vs. Azure Blob Storage, MapReduce Job Framework and Job Execution Pipeline
  • Doing big data analytics with MapReduce, writing MapReduce programs in the .NET language of your choice, such as C#
  • Using Hive for big data analytics, demonstrating an end-to-end scenario and how Apache Tez improves performance severalfold
  • Consuming HDInsight data from Microsoft BI tools over the Hive ODBC Driver - using HDInsight with Microsoft BI and Power BI to simplify data integration, analysis, and reporting
  • Using Pig for big data transformation workflows, step by step
  • Apache HBase on HDInsight, its architecture, data model, HBase vs. Hive, programmatically managing HBase data with C# and Apache Phoenix
  • Using Sqoop or SSIS (SQL Server Integration Services) to move data to and from HDInsight and to build data integration workflows for transferring data
  • Using Oozie for scheduling, coordinating, and managing data processing workflows in an HDInsight cluster
  • Using R programming language with HDInsight for performing statistical computing on Big Data sets
  • Using Apache Spark's in-memory computation model to run big data analytics up to 100 times faster than Hadoop MapReduce
  • Performing real-time stream analytics on high-velocity big data streams with Storm
  • Integrating the Enterprise Data Warehouse with Hadoop and Microsoft Analytics Platform System (APS), formerly known as SQL Server Parallel Data Warehouse (PDW)

Who Should Read This Book
What do you hope to get out of this book? As we wrote this book, we had the following audiences in mind:
  • Developers—Developers (especially business intelligence and data warehouse developers) worldwide are seeing a growing need for practical, step-by-step instruction in processing Big Data and performing advanced analytics to extract actionable insights. This book was designed to meet that need. It starts at the ground level and builds from there, to make you an expert. Here you’ll learn how to build the next generation of apps that include such capabilities.
  • Data scientists—As a data scientist, you are already familiar with the processes of acquiring, transforming, and integrating data into your work and performing advanced analytics. This book introduces you to modern tools and technologies (ones that are prominent, inexpensive, flexible, and open source friendly) that you can apply while acquiring, transforming, and integrating Big Data and performing advanced analytics. By the time you complete this book, you’ll be quite comfortable with the latest tools and technologies.
  • Business decision makers—Business decision makers around the world, from many different organizations, are looking to unlock the value of data to gain actionable insights that enable their businesses to stay ahead of competitors. This book delves into advanced analytics applications and case studies based on Big Data tools and technologies, to accelerate your business goals.
  • Students aspiring to be Big Data analysts—As you get ready to transition from the academic world to the corporate world, this book helps you build a foundational skill set to ace your interviews and successfully deliver Big Data projects in a timely manner. Chapters were designed to start at the ground level and gradually take you to an expert level.
Don’t worry if you don’t fit into any of these classifications. Set your sights on learning as much as you can and having fun in the process, and you’ll do fine!

You can get this book from Safari Online, Amazon USA, Amazon India, or Flipkart.

You can refer to sample chapters here and here.

Friday, October 16, 2015

Converting Rows to Columns (PIVOT) and Columns to Rows (UNPIVOT) in SQL Server

In my last article, “Converting Comma Separated Value to Rows and Vice Versa in SQL Server”, I talked about how you can convert comma separated (or separated with some other character) values in a single column into rows and vice versa. In this article, I demonstrate how you can convert row values into columns (PIVOT) and column values into rows (UNPIVOT) in SQL Server. For more information, click here.
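
To give a quick flavor of the syntax, here is a minimal sketch using a throwaway temp table (all table and column names below are made up for illustration):

    -- Sample data: one row per (Product, Quarter) combination
    CREATE TABLE #Sales (Product VARCHAR(20), SalesQuarter CHAR(2), Amount INT);
    INSERT INTO #Sales VALUES ('Bike', 'Q1', 100), ('Bike', 'Q2', 150),
                              ('Car',  'Q1', 200), ('Car',  'Q2', 250);

    -- PIVOT: turn the quarter row values into columns
    SELECT Product, [Q1], [Q2]
    FROM #Sales
    PIVOT (SUM(Amount) FOR SalesQuarter IN ([Q1], [Q2])) AS p;

    -- UNPIVOT: turn the Q1/Q2 columns back into rows
    SELECT Product, SalesQuarter, Amount
    FROM (SELECT Product, [Q1], [Q2]
          FROM #Sales
          PIVOT (SUM(Amount) FOR SalesQuarter IN ([Q1], [Q2])) AS p) AS pvt
    UNPIVOT (Amount FOR SalesQuarter IN ([Q1], [Q2])) AS u;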

Converting Comma Separated Value to Rows and Vice Versa in SQL Server

Often while reporting you will encounter a situation where you have comma separated (or separated with some other character) values in a single column but want to report them as rows, whereas in other cases you might have values in multiple rows and want them as a single value separated by a comma or some other character. In this article, I demonstrate how you can write queries in SQL Server to handle these scenarios quickly. For more information, click here.
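
As a teaser, here is a minimal sketch of one common technique for each direction, using the FOR XML PATH trick and an XML cast (all names below are illustrative):

    -- Rows -> comma separated value
    CREATE TABLE #Items (Name VARCHAR(10));
    INSERT INTO #Items VALUES ('A'), ('B'), ('C');

    SELECT STUFF((SELECT ',' + Name FROM #Items FOR XML PATH('')), 1, 1, '') AS Csv;  -- A,B,C

    -- Comma separated value -> rows
    DECLARE @csv VARCHAR(100) = 'A,B,C';
    DECLARE @x XML = CAST('<i>' + REPLACE(@csv, ',', '</i><i>') + '</i>' AS XML);
    SELECT t.i.value('.', 'VARCHAR(10)') AS Item
    FROM @x.nodes('/i') AS t(i);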

CONCAT and STUFF Functions in SQL Server 2012

We often need to combine two or more string values into a single string for reporting. Although there were ways to do this in earlier versions of SQL Server, starting with SQL Server 2012 we have the CONCAT function for this specific purpose. This T-SQL function also takes care of data type conversion and handles NULLs appropriately. Apart from that, SQL Server has long had the STUFF T-SQL function, which you can use to insert or replace a string value inside another string value. The difference lies in the fact that CONCAT appends one string value to the end of another, whereas STUFF inserts or replaces a string value within another string value. I am going to demonstrate these functions and their real-life usage in this article. For more information, click here.
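
A minimal sketch of both functions (the literal values are just for illustration):

    -- CONCAT converts data types implicitly and treats NULL as an empty string
    SELECT CONCAT('Order #', 42, ' placed by ', NULL, 'Alice');   -- Order #42 placed by Alice

    -- STUFF(string, start, length, replacement): insert by deleting 0 characters...
    SELECT STUFF('Hello World', 6, 0, ' Dear');                   -- Hello Dear World

    -- ...or replace by deleting some characters first
    SELECT STUFF('Hello World', 7, 5, 'T-SQL');                   -- Hello T-SQL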

Getting Started with Database Engine Tuning Advisor in SQL Server

There are different techniques to optimize the performance of SQL Server queries, like keeping statistics up to date, creating required indexes, partitioning tables, etc., but wouldn’t it be great if we had some recommendations before we started planning or optimizing queries, so that we didn’t have to start from scratch every time and in every scenario? This is where you can use the Database Engine Tuning Advisor utility to get recommendations based on your workload. I will be talking about Database Engine Tuning Advisor, how it works, and its different interfaces in this article series. For more information, click here for Part 1 and here for Part 2.
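
Besides the graphical interface, the dta.exe command-line utility can run a tuning session against a workload file. A rough sketch, assuming a trusted connection (the server, database, session, and file names below are placeholders):

    REM -E uses Windows authentication, -if is the T-SQL workload to analyze,
    REM -A caps tuning time at 30 minutes, -of writes recommendations as a T-SQL script
    dta -S MyServer -E -D AdventureWorks -if MyWorkload.sql -s TuningSession1 -of Recommendations.sql -A 30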

Wednesday, October 14, 2015

Importance of Statistics and How It Works in SQL Server

Statistics refers to the statistical information about the distribution of values in one or more columns of a table or an index. The SQL Server Query Optimizer uses this statistical information to estimate the cardinality, or number of rows, in the query result, which enables it to create a high-quality query execution plan. For example, based on this statistical information the Query Optimizer might decide whether to use an index seek operator or a more resource-intensive index scan operator in order to provide optimal query performance. In this article series, I am going to talk about statistics in detail. For more information, click here for Part 1 and here for Part 2.
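
For hands-on exploration, these commands expose the statistics the optimizer relies on (the AdventureWorks table and index names here are only examples):

    -- Inspect the header, density vector, and histogram of a statistics object
    DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_ProductID');

    -- List statistics on a table along with when each was last updated
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('Sales.SalesOrderDetail');

    -- Rebuild statistics on the table with a full scan
    UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;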

LEAD and LAG Functions in SQL Server 2012

Have you ever been in a situation where you needed to write a query that compares or accesses data from preceding or subsequent rows along with data from the current row? This article discusses different ways to write these types of queries, and more specifically examines the LEAD and LAG analytic functions introduced with SQL Server 2012, to help you understand how leveraging these functions can aid you in such situations. For more information, click here.
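
A minimal sketch of both functions over made-up monthly sales data:

    CREATE TABLE #MonthlySales (SalesMonth DATE, Amount INT);
    INSERT INTO #MonthlySales VALUES ('2015-01-01', 100), ('2015-02-01', 120), ('2015-03-01', 90);

    -- LAG looks back one row, LEAD looks ahead one row, within the ORDER BY
    SELECT SalesMonth,
           Amount,
           LAG(Amount)  OVER (ORDER BY SalesMonth) AS PrevMonth,
           LEAD(Amount) OVER (ORDER BY SalesMonth) AS NextMonth,
           Amount - LAG(Amount, 1, 0) OVER (ORDER BY SalesMonth) AS ChangeFromPrev  -- default 0 when no prior row
    FROM #MonthlySales;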

Backup and Restore Strategies in SQL Server

There are several high availability solutions that can be used with SQL Server, like AlwaysOn, failover clustering, or database mirroring. While these high availability solutions ensure maximum uptime for your databases, you still need to set up backup and restore strategies to recover data, or to minimize the risk of data loss, in case a failure happens. In this article series, I am going to discuss backup and restore strategies in SQL Server in detail. For more information, click here for Part 1 and here for Part 2.
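
As a flavor of what the series covers, here is a bare-bones full-plus-log cycle (the database name and file paths are placeholders):

    -- Take a full backup, then periodic log backups (log backups require the FULL recovery model)
    BACKUP DATABASE SalesDB TO DISK = 'D:\Backups\SalesDB_Full.bak' WITH INIT;
    BACKUP LOG SalesDB TO DISK = 'D:\Backups\SalesDB_Log.trn';

    -- Restore: the full backup with NORECOVERY so further backups can be applied,
    -- then the log backup with RECOVERY to bring the database online
    RESTORE DATABASE SalesDB FROM DISK = 'D:\Backups\SalesDB_Full.bak' WITH NORECOVERY, REPLACE;
    RESTORE LOG SalesDB FROM DISK = 'D:\Backups\SalesDB_Log.trn' WITH RECOVERY;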

Importance of the Recovery Model in SQL Server

Have you ever wondered, especially in a data warehousing scenario, why the transaction log file grows bigger and bigger, sometimes even much bigger than your database's actual data files? What causes this to happen? How do you control it? How does the recovery model of a database control the growth of the transaction log? These are some of the questions I am going to answer in this article. For more information, click here.
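
A few quick commands to investigate (the StagingDB name is only an example):

    -- Which recovery model is each database using?
    SELECT name, recovery_model_desc FROM sys.databases;

    -- How large and how full is each transaction log?
    DBCC SQLPERF (LOGSPACE);

    -- Under SIMPLE recovery the log truncates on checkpoint instead of growing
    -- until the next log backup (at the cost of point-in-time restore)
    ALTER DATABASE StagingDB SET RECOVERY SIMPLE;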

Getting Started with Hashing in SQL Server

In my most recent articles, I’ve talked about encryption in detail and demonstrated its usage at the database level with Transparent Data Encryption and at the column level with granular/cell-level encryption. In this article, I am going to discuss hashing in SQL Server and how it differs from encryption. For more information, click here.
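
A small sketch of the key difference: hashing is one-way and keyless, so the same input always yields the same digest, but a digest cannot be decrypted back to the original value:

    -- Fixed-size one-way digest of an input value
    SELECT HASHBYTES('SHA2_256', 'P@ssw0rd') AS PasswordHash;

    -- Hashes of equal inputs match, so values can be compared without storing them in clear text
    SELECT CASE WHEN HASHBYTES('SHA2_256', 'P@ssw0rd') = HASHBYTES('SHA2_256', 'P@ssw0rd')
                THEN 'match' ELSE 'different' END AS Comparison;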