Big data analytics used to require powerful computers and highly specialized skills. With cloud computing, you can have huge computing power installed and configured in a few button clicks.
AWS, for example, provides a wide variety of data analysis services, each of which does a specific job. We can use these services as building blocks to build any application that deals with big data analysis.
AWS services shine on big data: they can handle petabyte-scale databases with high performance.
In this post and the following posts, I am going to describe these services and how to use them. At the end, I am going to show how to combine different kinds of services to build a monitoring and diagnostic tool for an IIS web site application.
List of posts
We are going to cover these points:
- Introduction to big data (this post).
- Streaming real time data.
- Hadoop & MapReduce.
- ElasticSearch (coming).
- Spark and data analysis with Python (coming).
- Hive and data processing (coming).
- Practical example.
Components of a big data application
We can divide the functionality of any big data application into the following four types:
AWS provides applications for all these stages:
Set up the AWS environment
We are going to see how AWS services are created and configured, and we are going to do some hands-on exercises. To complete these exercises, you need an Amazon account with access to AWS services.
There are two ways to use AWS: through the web interface (the AWS Management Console), or from command-line tools and scripts, written in Python or other languages. For these exercises we are going to use the AWS Command Line Interface (AWS CLI).
To create an Amazon account, and to install and configure the AWS CLI, follow this tutorial.
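Once the tutorial is done, it is worth checking that the CLI is actually reachable before starting the exercises. The snippet below is only a minimal sketch of such a check: the helper name `aws_cli_version` is made up for this post, and it assumes the CLI executable is called `aws` and is on your PATH. It runs nothing more than `aws --version`.

```python
import shutil
import subprocess


def aws_cli_version(executable="aws"):
    """Return the version line printed by the AWS CLI, or None if it is not found.

    `aws_cli_version` is a hypothetical helper for this post, not part of any AWS library.
    """
    # Look the executable up on PATH first, so a missing install does not raise.
    path = shutil.which(executable)
    if path is None:
        return None
    # `aws --version` prints a single line describing the installed CLI.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Depending on the CLI version, the line may land on stdout or on stderr.
    output = (result.stdout or result.stderr).strip()
    return output or None


if __name__ == "__main__":
    version = aws_cli_version()
    if version is None:
        print("AWS CLI not found on PATH; install it before the exercises.")
    else:
        print("Found:", version)
        print("If you have not done so yet, run `aws configure` to store your credentials.")
```

If the script reports that the CLI was not found, go back to the installation steps. If it prints a version line, the remaining setup step is `aws configure`, which asks for your access key, secret key, and default region.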