Mock Quiz Hub

Question 1 of 60 Quiz ID: q1

What are the three main characteristics that differentiate Big Data from data handled by earlier generation databases?

Volume, Velocity, and Variety

Volume, Value, and Verification

Velocity, Validation, and Visualization

Variety, Verification, and Virtualization

Question 2 of 60 Quiz ID: q2

Which of the following was mentioned as an early source of Big Data?

Social media posts

Web logs

Internet-of-things sensors

Mobile app usage data

Question 3 of 60 Quiz ID: q3

What are transaction processing systems for Big Data often willing to sacrifice in exchange for very high scalability?

Data security and encryption

ACID properties and other database features

Query processing speed

Data storage capacity

Question 4 of 60 Quiz ID: q4

Which of the following is NOT mentioned as a Big Data storage system?

Distributed file systems

Sharding across multiple databases

Key-value storage systems

Traditional relational databases

Question 5 of 60 Quiz ID: q5

What is a key characteristic of distributed file systems according to the lecture?

They only work with structured data

They store data across a large collection of machines but provide a single file-system view

They require expensive and reliable computers

They cannot handle hardware failures

Question 6 of 60 Quiz ID: q6

What scale example is given for highly scalable distributed file systems?

1K nodes, 10 million files, 1 PB

10K nodes, 100 million files, 10 PB

100K nodes, 1 billion files, 100 PB

5K nodes, 50 million files, 5 PB

Question 7 of 60 Quiz ID: q7

How do distributed file systems handle hardware failure?

By using only reliable hardware

By immediately shutting down the system

By replicating files and detecting/recovering from failures

By backing up data to tape storage

Question 8 of 60 Quiz ID: q8

Which of the following are examples of distributed file systems mentioned in the lecture?

MySQL File System and Oracle File System

Windows File System and Linux File System

Google File System (GFS) and Hadoop File System (HDFS)

Amazon S3 and Microsoft Azure Storage

Question 9 of 60 Quiz ID: q9

In the Hadoop File System (HDFS) architecture described in the lecture, what is the typical block size?

32 MB

64 MB

128 MB

256 MB

Question 10 of 60 Quiz ID: q10

What role does the NameNode play in HDFS?

It stores the actual data blocks

It maps filenames to Block IDs and Block IDs to DataNodes

It provides backup storage for failed nodes

It handles user authentication

Question 11 of 60 Quiz ID: q11

What does a DataNode do in HDFS?

Maps filenames to Block IDs

Coordinates data replication

Maps a Block ID to a physical location on disk

Manages user permissions
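The two-level lookup in Questions 10 and 11 can be sketched with plain dictionaries. This is a toy model, not HDFS code: the file name, block IDs, DataNode names, and local paths are all hypothetical. It only illustrates the split of responsibilities: the NameNode maps a filename to block IDs and a block ID to DataNodes, and each DataNode maps a block ID to a location on its own disk.

```python
# Toy model of HDFS metadata lookup (hypothetical names; not the HDFS API).

# NameNode: filename -> ordered list of block IDs
name_node_files = {
    "/logs/web-2024.log": ["blk_1", "blk_2"],
}

# NameNode: block ID -> DataNodes holding a replica of that block
name_node_blocks = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}

# Each DataNode: block ID -> physical location on its local disk
data_nodes = {
    "dn1": {"blk_1": "/data/dn1/blk_1"},
    "dn2": {"blk_1": "/data/dn2/blk_1", "blk_2": "/data/dn2/blk_2"},
    "dn3": {"blk_1": "/data/dn3/blk_1", "blk_2": "/data/dn3/blk_2"},
    "dn4": {"blk_2": "/data/dn4/blk_2"},
}


def locate(filename):
    """Return (block ID, DataNode, local path) for the first replica of each block."""
    result = []
    for block_id in name_node_files[filename]:
        # NameNode step: block ID -> DataNodes; pick the first replica
        node = name_node_blocks[block_id][0]
        # DataNode step: block ID -> path on that node's disk
        result.append((block_id, node, data_nodes[node][block_id]))
    return result


print(locate("/logs/web-2024.log"))
```

Because every block is listed on several DataNodes, a reader could fall back to the next replica in `name_node_blocks` if the first node has failed.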

Question 12 of 60 Quiz ID: q12

What is the data coherency model used by HDFS?

Read-write-modify access model

Write-once-read-many access model

Multiple-write-single-read access model

Random access model

Question 13 of 60 Quiz ID: q13

What is a limitation of distributed file systems mentioned in the lecture?

They cannot handle large files

They have very high overheads and poor performance with billions of smaller tuples

They don't support data replication

They only work with structured data

Question 14 of 60 Quiz ID: q14

What is sharding in the context of Big Data?

Encrypting data across multiple databases

Partitioning data across multiple databases

Backing up data to multiple locations

Compressing data for storage efficiency

Question 15 of 60 Quiz ID: q15

In the sharding example given, how might records be distributed?

Records with key values 1-100,000 on database 1, 100,001-200,000 on database 2, etc.

All records randomly distributed across databases

Records grouped by creation date

Records distributed based on file size
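Question 15 describes range partitioning of keys across databases. The sketch below routes keys under that scheme. The database names and the third range are hypothetical, and it is plain Python rather than any real sharding library. The `shards_for_range` helper illustrates why Question 17 calls sharding non-transparent: the application itself must work out which databases a range query touches.

```python
import bisect

# Hypothetical shard map: inclusive upper bound of each key range, in order
SHARD_UPPER_BOUNDS = [100_000, 200_000, 300_000]
SHARD_NAMES = ["db1", "db2", "db3"]


def shard_for(key):
    """Return the database holding `key` (keys 1-100,000 on db1, and so on)."""
    if key < 1 or key > SHARD_UPPER_BOUNDS[-1]:
        raise KeyError(f"no shard holds key {key}")
    # First range whose upper bound is >= key
    return SHARD_NAMES[bisect.bisect_left(SHARD_UPPER_BOUNDS, key)]


def shards_for_range(lo, hi):
    """Databases an application must query for keys in [lo, hi]."""
    first = SHARD_NAMES.index(shard_for(lo))
    last = SHARD_NAMES.index(shard_for(hi))
    return SHARD_NAMES[first:last + 1]


print(shard_for(150_000))                  # db2
print(shards_for_range(150_000, 250_000))  # ['db2', 'db3']
```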

Question 16 of 60 Quiz ID: q16

What is a positive aspect of sharding mentioned in the lecture?

It's completely transparent to applications

It scales well and is easy to implement

It eliminates the chance of failure

It automatically handles load balancing

Question 17 of 60 Quiz ID: q17

What is a major drawback of sharding?

It reduces data security

It's not transparent - applications must deal with routing queries and handling queries that span multiple databases

It only works with small datasets

It requires expensive hardware

Question 18 of 60 Quiz ID: q18

When were parallel databases originally developed?

1970s

1980s

1990s

2000s

Question 19 of 60 Quiz ID: q19

What scale were parallel databases originally designed for?

5 to 50 machines

10s to 100s of machines

1000s of machines

10,000s of machines

Question 20 of 60 Quiz ID: q20

How do parallel databases typically handle query failure due to machine failure?

They continue execution on remaining machines

They automatically restart the failed machine

They typically restart the entire query

They ignore the failure and continue

Question 21 of 60 Quiz ID: q21

How do Map-reduce systems handle failures compared to parallel databases?

They restart queries more frequently

They ignore failures completely

They can continue query execution, working around failures

They shut down the entire system

Question 22 of 60 Quiz ID: q22

Why is availability essential for parallel/distributed databases?

To improve query performance

So the system can run even if parts have failed

To reduce storage costs

To simplify database administration

Question 23 of 60 Quiz ID: q23

What does consistency mean in the context of replicated data?

All data is stored in the same format

All live replicas have the same value, and each read sees the latest version

Data is backed up regularly

All databases use the same schema

Question 24 of 60 Quiz ID: q24

In the majority protocol example given, if there are 3 replicas, how many replicas must each read and each write access?

1 replica

2 replicas

3 replicas

Any number of replicas
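Under the majority protocol, every read and every write must reach more than half of the replicas, so any read set and any write set share at least one replica that holds the latest write. A minimal sketch, assuming each replica stores a (version, value) pair; the function names are hypothetical and this is a generic quorum illustration, not any particular system's code.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def quorums_intersect(n, read_q, write_q):
    """Every read set overlaps every write set iff read_q + write_q > n."""
    return read_q + write_q > n


def quorum_read(replies, n):
    """Return the newest value among replies from a majority of the n replicas.

    Each reply is a (version, value) pair from one live replica.
    """
    if len(replies) < majority(n):
        raise RuntimeError("not enough live replicas for a majority")
    # At least one replica in this read set took part in the latest write,
    # so the highest version number identifies the latest value.
    version, value = max(replies, key=lambda r: r[0])
    return value


n = 3
q = majority(n)
print(q, quorums_intersect(n, q, q))             # 2 True
print(quorum_read([(3, "old"), (4, "new")], n))  # new
```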

Question 25 of 60 Quiz ID: q25

What does Brewer's CAP 'Theorem' state about network partitions?

Availability and consistency can always be guaranteed

Network partitions never occur in practice

In the presence of partitions, you cannot guarantee both availability and consistency

Partitions only affect data storage, not retrieval

Question 26 of 60 Quiz ID: q26

What does the MapReduce paradigm provide?

A database management system

A platform for reliable, scalable parallel computing

A web development framework

A data visualization tool

Question 27 of 60 Quiz ID: q27

What does MapReduce abstract from the programmer?

Data storage requirements

Issues of distributed and parallel environment

Algorithm design

User interface development

Question 28 of 60 Quiz ID: q28

What functions must the programmer provide in MapReduce?

input() and output() functions

map() and reduce() functions

create() and delete() functions

read() and write() functions

Question 29 of 60 Quiz ID: q29

On how many machines do very large MapReduce implementations run?

10^1 to 10^2 machines

10^2 to 10^3 machines

10^3 to 10^4 machines

10^4 to 10^5 machines

Question 30 of 60 Quiz ID: q30

In the word count example, what does each worker do in the map phase?

Counts total words across all documents

Parses documents to find all words and outputs (word, count) pairs

Sorts words alphabetically

Removes duplicate words

Question 31 of 60 Quiz ID: q31

Given the input 'One a penny, two a penny, hot cross buns.', what would be one of the (word, count) pairs output by the map function?

('penny', 2)

('a', 1)

('total', 9)

('sentence', 1)

Question 32 of 60 Quiz ID: q32

In the word count example, what is the final output for the word 'penny'?

('penny', 1)

('penny', 2)

('penny', 3)

('penny', 4)

Question 33 of 60 Quiz ID: q33

In the MapReduce word count pseudo-code, what does the emit function do in the map phase?

Counts the total words

Outputs a (word, 1) pair for each word

Sorts the words

Removes punctuation

Question 34 of 60 Quiz ID: q34

In MapReduce, what is the first attribute passed to emit() in the map function called?

Map key

Reduce key

Primary key

Sort key

Question 35 of 60 Quiz ID: q35

What operation is effectively performed on the reduce key in MapReduce?

Sort by

Group by

Order by

Filter by
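The word count example in Questions 30 to 35 can be simulated in a single process. This is a sketch, not Hadoop or any real MapReduce API: `map_fn`, `reduce_fn`, and `word_count` are hypothetical names. In a real system the framework runs the map function on many workers, partitions the emitted pairs by reduce key, and runs the reduce function once per key; here a dictionary stands in for that "group by" on the reduce key. Words are kept case-sensitive, as in the example input.

```python
import re
from collections import defaultdict


def map_fn(document):
    """Map: emit a (word, 1) pair for every word occurrence."""
    for word in re.findall(r"[A-Za-z]+", document):
        yield (word, 1)


def reduce_fn(word, counts):
    """Reduce: sum the counts collected for one reduce key."""
    return (word, sum(counts))


def word_count(documents):
    # Shuffle: group map output by reduce key (in effect, a group by)
    groups = defaultdict(list)
    for doc in documents:
        for word, one in map_fn(doc):
            groups[word].append(one)
    return sorted(reduce_fn(w, c) for w, c in groups.items())


print(word_count(["One a penny, two a penny, hot cross buns."]))
# [('One', 1), ('a', 2), ('buns', 1), ('cross', 1), ('hot', 1), ('penny', 2), ('two', 1)]
```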

Question 36 of 60 Quiz ID: q36

Which companies are mentioned as widely using MapReduce for parallel processing?

Microsoft and Oracle

Google, Yahoo, and hundreds of other companies

IBM and Intel

Amazon and Facebook only

Question 37 of 60 Quiz ID: q37

Which of the following is mentioned as an example use of MapReduce?

Real-time transaction processing

Compute PageRank and build keyword indices

User interface design

Database schema design

Question 38 of 60 Quiz ID: q38

What is an advantage of MapReduce over traditional SQL databases?

It's always faster than SQL

It allows procedural code in map and reduce functions and data of any type

It uses less storage space

It requires less programming knowledge

Question 39 of 60 Quiz ID: q39

What is a disadvantage of MapReduce compared to SQL?

It cannot handle large datasets

It is cumbersome for writing simple queries

It doesn't support parallel processing

It only works with structured data

Question 40 of 60 Quiz ID: q40

What do current-generation execution engines natively support?

Only map and reduce operations

Algebraic operations such as joins and aggregation

Only SQL queries

Only key-value operations

Question 41 of 60 Quiz ID: q41

Which execution engines are mentioned as examples in the lecture?

MySQL and PostgreSQL

Apache Tez and Spark

Oracle and SQL Server

MongoDB and Cassandra

Question 42 of 60 Quiz ID: q42

What does Apache Tez provide according to the lecture?

High-level SQL interface

Low-level API

User interface components

Database storage engine

Question 43 of 60 Quiz ID: q43

What does RDD stand for in Spark?

Relational Data Distribution

Resilient Distributed Dataset

Rapid Data Deployment

Remote Database Driver

Question 44 of 60 Quiz ID: q44

How are RDDs computed in Spark?

They are computed immediately when created

They are lazily computed when needed

They are pre-computed and cached

They are computed in parallel always
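Lazy computation, as in Question 44, means transformations are only recorded until an action needs a result. The toy class below is hypothetical and is not the Spark API, although PySpark RDDs do expose `map`, `filter`, and `collect` with this transformation-versus-action split. It shows that no mapping work happens until `collect()` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: operations are recorded, not run, until collect()."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = list(ops)

    def map(self, f):
        # Record the transformation; nothing is computed yet
        return LazyDataset(self._source, self._ops + [("map", f)])

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + [("filter", pred)])

    def collect(self):
        # Action: only now is the recorded chain of operations executed
        out = []
        for item in self._source:
            keep = True
            for kind, f in self._ops:
                if kind == "map":
                    item = f(item)
                elif not f(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(calls)         # []
print(ds.collect())  # [0, 4, 16]
print(calls)         # [0, 1, 2, 3, 4]
```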

Question 45 of 60 Quiz ID: q45

In which programming languages can Spark programs be written?

Only Java

Java, Scala, and R

Only Python and Java

Any programming language

Question 46 of 60 Quiz ID: q46

What does streaming data refer to?

Data stored in multiple locations

Data that arrives in a continuous fashion

Data that is compressed

Data that is encrypted

Question 47 of 60 Quiz ID: q47

Which of the following is NOT mentioned as an example of streaming data applications?

Stock market trades

E-commerce purchases and searches

Sensor readings from IoT devices

Database backup operations

Question 48 of 60 Quiz ID: q48

What is windowing in the context of streaming data?

Displaying data in multiple windows

Breaking up streams into windows and running queries on windows

Opening multiple database connections

Partitioning data by geographic regions

Question 49 of 60 Quiz ID: q49

What are the two bases on which windows may be created in streaming systems?

Size and location

Time or tuples

Source and destination

Priority and frequency
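Windows can be defined by time or by tuple count, as Question 49 states. The sketch below builds tumbling (non-overlapping) windows both ways and runs a simple per-window query, a sum. Function names and sample data are hypothetical, and for simplicity the tuple-count version takes a finite list rather than an unbounded stream.

```python
from collections import defaultdict


def tumbling_time_windows(events, width):
    """Assign (timestamp, value) events to non-overlapping windows of `width` time units."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // width) * width  # window covers [start, start + width)
        windows[start].append(value)
    return dict(windows)


def tuple_count_windows(values, size):
    """Break a sequence into consecutive windows of `size` tuples each."""
    return [values[i:i + size] for i in range(0, len(values), size)]


events = [(1, 10), (4, 20), (6, 5), (11, 7)]
# Query run on each window: sum of values
print({s: sum(v) for s, v in tumbling_time_windows(events, 5).items()})  # {0: 30, 5: 5, 10: 7}
print(tuple_count_windows([10, 20, 5, 7], 2))                            # [[10, 20], [5, 7]]
```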

Question 50 of 60 Quiz ID: q50

What are punctuations used for in stream processing?

To format output data

To specify that all future tuples have timestamp greater than some value

To mark the end of a query

To separate different data types
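A punctuation tells the operator that no later tuple will have a timestamp at or below some value, so any window ending at or before that point can be closed and its result emitted. A minimal sketch, assuming integer timestamps, tumbling windows, and a hypothetical stream encoding of ("tuple", timestamp) and ("punct", t) items:

```python
def process(stream, width):
    """Emit (window start, count) once a punctuation guarantees the window is complete.

    A ("punct", t) item promises that every future tuple has timestamp > t.
    """
    open_windows = {}  # window start -> tuple count
    for kind, ts in stream:
        if kind == "tuple":
            start = (ts // width) * width
            open_windows[start] = open_windows.get(start, 0) + 1
        else:
            # Window [start, start + width) is complete once its last
            # timestamp, start + width - 1, is <= t
            for start in sorted(open_windows):
                if start + width - 1 <= ts:
                    yield (start, open_windows.pop(start))


stream = [("tuple", 1), ("tuple", 3), ("tuple", 6), ("punct", 4), ("tuple", 7), ("punct", 9)]
print(list(process(stream, 5)))  # [(0, 2), (5, 2)]
```

Without the punctuation at 4, the operator could not know whether a late tuple for window 0 might still arrive.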

Question 51 of 60 Quiz ID: q51

What is a characteristic of continuous queries?

They run once and produce final results

They output partial results and update continuously

They only work with static data

They require manual refresh

Question 52 of 60 Quiz ID: q52

What is a potential problem with continuous queries?

They consume too much storage

They can lead to a flood of updates

They don't work with real-time data

They require too much programming effort

Question 53 of 60 Quiz ID: q53

What does CEP stand for in the context of stream processing?

Central Event Processing

Complex Event Processing

Continuous Event Processing

Concurrent Event Processing

Question 54 of 60 Quiz ID: q54

What characterizes many stream processing systems mentioned in the lecture?

They always persist data to disk

They are purely in-memory and do not persist data

They only work with small datasets

They require expensive hardware

Question 55 of 60 Quiz ID: q55

What is the lambda architecture in stream processing?

A single stream processing approach

Split the stream into two: one copy goes to a stream processing system, the other to a database for storage

A method for data compression

A security protocol for streams

Question 56 of 60 Quiz ID: q56

What is a disadvantage of lambda architecture?

It's too complex to implement

It often leads to duplication of querying effort

It doesn't scale well

It only works with small data

Question 57 of 60 Quiz ID: q57

What type of window doesn't overlap in streaming systems?

Sliding window

Tumbling window

Hopping window

Session window

Question 58 of 60 Quiz ID: q58

What do publish-subscribe systems provide?

Database storage capabilities

Convenient abstraction for processing streams

User interface components

Data encryption services
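The publish-subscribe abstraction in Question 58 decouples the producers of a stream from its consumers: publishers send messages to a topic, and every subscriber to that topic receives them. The in-memory broker below is a hypothetical toy, not Kafka's API; systems such as Kafka additionally partition topics across machines and persist messages.

```python
from collections import defaultdict


class Broker:
    """In-memory publish-subscribe broker (toy illustration)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of this topic; the publisher
        # does not know who (if anyone) receives the message
        for callback in self._subscribers[topic]:
            callback(message)


received = []
broker = Broker()
broker.subscribe("trades", received.append)
broker.publish("trades", {"symbol": "ABC", "qty": 100})
broker.publish("searches", "dropped: no subscribers")
print(received)  # [{'symbol': 'ABC', 'qty': 100}]
```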

Question 59 of 60 Quiz ID: q59

Which parallel pub-sub system is mentioned as popular for managing streaming data?

Apache Storm

Apache Kafka

Apache Flume

Apache Flink

Question 60 of 60 Quiz ID: q60

How can graphs be modelled as relations according to the lecture?

graph(nodes, edges, properties)

node(ID, label, node_data) and edge(fromID, toID, label, edge_data)

vertex(ID, data) and connection(start, end)

entity(ID, type) and relationship(source, target, type)
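The node/edge schema from Question 60 can be tried directly in SQLite through Python's standard `sqlite3` module. The sample people and "follows" edges are hypothetical. The query finds two-hop neighbours; each additional hop needs another self-join on the edge relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (ID INTEGER PRIMARY KEY, label TEXT, node_data TEXT);
CREATE TABLE edge (fromID INTEGER, toID INTEGER, label TEXT, edge_data TEXT);
INSERT INTO node VALUES (1, 'person', 'Alice'), (2, 'person', 'Bob'), (3, 'person', 'Carol');
INSERT INTO edge VALUES (1, 2, 'follows', NULL), (2, 3, 'follows', NULL);
""")

# Two-hop query: whom do the people Alice follows themselves follow?
rows = conn.execute("""
SELECT n.node_data
FROM edge e1
JOIN edge e2 ON e1.toID = e2.fromID
JOIN node n ON n.ID = e2.toID
WHERE e1.fromID = 1 AND e1.label = 'follows' AND e2.label = 'follows'
""").fetchall()
print(rows)  # [('Carol',)]
```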
