Ivy | May 01, 2019
1. Algorithm: A set of step-by-step rules, often derived from a statistical process, that a computer follows to perform an analysis of data.
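As an illustration of "a set of rules applied to data", here is a minimal sketch (the function name and sample values are ours, chosen for the example):

```python
def mean(values):
    # An algorithm is just a fixed sequence of rules applied to data.
    total = 0.0
    for v in values:          # Rule 1: accumulate the sum of every value.
        total += v
    return total / len(values)  # Rule 2: divide the sum by the count.

result = mean([2, 4, 6])  # returns 4.0
```

The same two rules produce the correct answer for any non-empty list of numbers, which is what makes them an algorithm rather than a one-off calculation.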
2. Big Data Scientist: A data wizard who possesses the knowledge and skills to handle both structured and unstructured data with élan, generating profitable insights for the business.
3. Cloud Computing: A paradigm in which computing services are delivered over the Internet ("the cloud"). It has brought a marked shift from the traditional way businesses think about IT resources. Computing services of all kinds — servers, storage, databases, networking, software, analytics, intelligence, and more — are offered over the Internet to provide faster innovation, flexible resources, and economic benefits.
4. Data Lake: A storage repository holding a vast pool of raw, unprocessed data — typically much larger than a data warehouse — whose eventual purpose may not yet be defined. Its end users are usually data scientists, whereas data warehouses are accessed mainly by business professionals.
5. Data Warehouse: A system that stores data from multiple sources for the purpose of analysis and reporting. The reports generated are used to drive business decisions.
6. Flume: Apache Flume is a robust, reliable, distributed, and available service for collecting, aggregating, and moving large amounts of data into HDFS.
7. Hadoop: An open-source software framework that manages the processing of big data applications running on clustered systems. Hadoop can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, and analyzing data than relational databases and data warehouses provide.
8. MapReduce: "MapReduce" = Map phase + Reduce phase. A software framework for easily writing applications that process vast amounts of data, enabling massive scalability across hundreds or thousands of servers in a Hadoop cluster.
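The two phases can be sketched in plain Python with the classic word-count task — this is only a conceptual sketch of the programming model, not actual Hadoop code (function names and sample documents are ours):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) key-value pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + Reduce: group the pairs by key and sum the counts per word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big data tools", "data lake"]
result = reduce_phase(map_phase(docs))
# result == {"big": 2, "data": 3, "tools": 1, "lake": 1}
```

In a real Hadoop cluster, the map tasks run in parallel on many nodes, the framework shuffles pairs with the same key to the same reducer, and the reduce tasks also run in parallel — which is where the massive scalability comes from.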
9. Online Analytical Processing (OLAP): Allows users to conduct multidimensional analysis of business data drawn from multiple database systems at the same time. Through OLAP, users can perform complex calculations, trend analysis, and sophisticated data modeling. Data warehouses are built to support OLAP, which uses complex queries to analyze data rather than to process transactions.
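A minimal OLAP-style roll-up can be shown with Python's built-in SQLite: a small fact table with two dimensions (region, quarter) and one measure (amount), aggregated along one dimension. The table and values here are hypothetical, invented for the example:

```python
import sqlite3

# Hypothetical sales fact table: dimensions (region, quarter), measure (amount).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Q1", 100.0), ("East", "Q2", 150.0),
     ("West", "Q1", 200.0), ("West", "Q2", 250.0)],
)

# OLAP-style roll-up: aggregate the measure along the region dimension.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
# rows == [("East", 250.0), ("West", 450.0)]
```

Real OLAP engines precompute such aggregations across many dimension combinations (a "cube"), so analysts can slice and drill down interactively instead of re-scanning the raw transactions each time.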
10. Apache Spark: A unified analytics engine for large-scale distributed data processing and machine learning, designed to rapidly query, analyze, and transform data at scale. Spark is most frequently used for ETL and SQL batch jobs across large data sets, for processing streaming data from sensors, IoT devices, or financial systems, and for machine learning tasks.
Looking for a Big Data course? Enroll in the Data Science and Big Data Analytics Certification Course at IvyPro School and learn the latest Big Data tools to take your career to the next level.
Shromona Kahali — Content Strategist, IvyPro School