Mock Quiz Hub
Question 1 of 60
Quiz ID: q1
What are the three main characteristics that differentiate Big Data from data handled by earlier generation databases?
Volume, Velocity, and Variety
Volume, Value, and Verification
Velocity, Validation, and Visualization
Variety, Verification, and Virtualization
Question 2 of 60
Quiz ID: q2
Which of the following was mentioned as an early source of Big Data?
Social media posts
Web logs
Internet-of-things sensors
Mobile app usage data
Question 3 of 60
Quiz ID: q3
What are transaction processing systems for Big Data often willing to sacrifice in exchange for very high scalability?
Data security and encryption
ACID properties and other database features
Query processing speed
Data storage capacity
Question 4 of 60
Quiz ID: q4
Which of the following is NOT mentioned as a Big Data storage system?
Distributed file systems
Sharding across multiple databases
Key-value storage systems
Traditional relational databases
Question 5 of 60
Quiz ID: q5
What is a key characteristic of distributed file systems according to the lecture?
They only work with structured data
They store data across a large collection of machines but provide a single file-system view
They require expensive and reliable computers
They cannot handle hardware failures
Question 6 of 60
Quiz ID: q6
What scale example is given for highly scalable distributed file systems?
1K nodes, 10 million files, 1 PB
10K nodes, 100 million files, 10 PB
100K nodes, 1 billion files, 100 PB
5K nodes, 50 million files, 5 PB
Question 7 of 60
Quiz ID: q7
How do distributed file systems handle hardware failure?
By using only reliable hardware
By immediately shutting down the system
By replicating files and detecting/recovering from failures
By backing up data to tape storage
Question 8 of 60
Quiz ID: q8
Which of the following are examples of distributed file systems mentioned in the lecture?
MySQL File System and Oracle File System
Windows File System and Linux File System
Google File System (GFS) and Hadoop File System (HDFS)
Amazon S3 and Microsoft Azure Storage
Question 9 of 60
Quiz ID: q9
In Hadoop File System Architecture, what is the typical block size?
32 MB
64 MB
128 MB
256 MB
Question 10 of 60
Quiz ID: q10
What role does the NameNode play in HDFS?
It stores the actual data blocks
It maps filenames to Block IDs and Block IDs to DataNodes
It provides backup storage for failed nodes
It handles user authentication
Question 11 of 60
Quiz ID: q11
What does a DataNode do in HDFS?
Maps filenames to Block IDs
Coordinates data replication
Maps a Block ID to a physical location on disk
Manages user permissions
Question 12 of 60
Quiz ID: q12
What is the data coherency model used by HDFS?
Read-write-modify access model
Write-once-read-many access model
Multiple-write-single-read access model
Random access model
Question 13 of 60
Quiz ID: q13
What is a limitation of distributed file systems mentioned in the lecture?
They cannot handle large files
They have very high overheads and poor performance with billions of smaller tuples
They don't support data replication
They only work with structured data
Question 14 of 60
Quiz ID: q14
What is sharding in the context of Big Data?
Encrypting data across multiple databases
Partitioning data across multiple databases
Backing up data to multiple locations
Compressing data for storage efficiency
Question 15 of 60
Quiz ID: q15
In the sharding example given, how might records be distributed?
Records with key values 1-100,000 on database 1, 100,001-200,000 on database 2, etc.
All records randomly distributed across databases
Records grouped by creation date
Records distributed based on file size
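As a study aid, here is a minimal sketch of range-based routing of the kind described in the first option above. It assumes integer keys starting at 1 and 100,000 keys per database. The names `SHARD_SIZE` and `shard_for_key` are hypothetical and do not come from the lecture.

```python
SHARD_SIZE = 100_000  # assumed number of key values held by each database


def shard_for_key(key: int, shard_size: int = SHARD_SIZE) -> int:
    """Return the 1-based number of the database that holds `key`.

    Keys 1..shard_size map to database 1, the next shard_size keys
    map to database 2, and so on.
    """
    if key < 1:
        raise ValueError("keys are assumed to start at 1")
    return (key - 1) // shard_size + 1
```

In a sharded deployment this routing lives in the application, which is why a query spanning several key ranges has to be sent to several databases and its results combined by the application itself.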
Question 16 of 60
Quiz ID: q16
What is a positive aspect of sharding mentioned in the lecture?
It's completely transparent to applications
It scales well and is easy to implement
It eliminates the chance of failure
It automatically handles load balancing
Question 17 of 60
Quiz ID: q17
What is a major drawback of sharding?
It reduces data security
It's not transparent: applications must deal with routing queries and handling queries that span multiple databases
It only works with small datasets
It requires expensive hardware
Question 18 of 60
Quiz ID: q18
When were parallel databases originally developed?
1970s
1980s
1990s
2000s
Question 19 of 60
Quiz ID: q19
What scale were parallel databases originally designed for?
5 to 50 machines
10s to 100s of machines
1000s of machines
10,000s of machines
Question 20 of 60
Quiz ID: q20
How do parallel databases typically handle query failure due to machine failure?
They continue execution on remaining machines
They automatically restart the failed machine
They typically restart the entire query
They ignore the failure and continue
Question 21 of 60
Quiz ID: q21
How do MapReduce systems handle failures compared to parallel databases?
They restart queries more frequently
They ignore failures completely
They can continue query execution, working around failures
They shut down the entire system
Question 22 of 60
Quiz ID: q22
Why is availability essential for parallel/distributed databases?
To improve query performance
So the system can run even if parts have failed
To reduce storage costs
To simplify database administration
Question 23 of 60
Quiz ID: q23
What does consistency mean in the context of replicated data?
All data is stored in the same format
All live replicas have the same value, and each read sees the latest version
Data is backed up regularly
All databases use the same schema
Question 24 of 60
Quiz ID: q24
In the majority protocol example given, if there are 3 replicas, how many replicas must each read or write access?
1 replica
2 replicas
3 replicas
Any number of replicas
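As a study aid, the arithmetic behind a majority protocol can be sketched as follows. The function name `majority_quorum` is hypothetical, and the sketch only computes the quorum size; it does not implement replication.

```python
def majority_quorum(num_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if num_replicas < 1:
        raise ValueError("need at least one replica")
    return num_replicas // 2 + 1


def quorums_always_overlap(num_replicas: int) -> bool:
    """Two majorities together exceed the replica count, so they share a replica."""
    return 2 * majority_quorum(num_replicas) > num_replicas
```

Because any two majorities share at least one replica, a read that contacts a majority is guaranteed to reach at least one replica touched by the most recent majority write.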
Question 25 of 60
Quiz ID: q25
What does Brewer's CAP 'Theorem' state about network partitions?
Availability and consistency can always be guaranteed
Network partitions never occur in practice
In the presence of partitions, you cannot guarantee both availability and consistency
Partitions only affect data storage, not retrieval
Question 26 of 60
Quiz ID: q26
What does the MapReduce paradigm provide?
A database management system
A platform for reliable, scalable parallel computing
A web development framework
A data visualization tool
Question 27 of 60
Quiz ID: q27
What does MapReduce abstract from the programmer?
Data storage requirements
Issues of distributed and parallel environment
Algorithm design
User interface development
Question 28 of 60
Quiz ID: q28
What functions must the programmer provide in MapReduce?
input() and output() functions
map() and reduce() functions
create() and delete() functions
read() and write() functions
Question 29 of 60
Quiz ID: q29
On how many machines do very large MapReduce implementations run?
10^1 to 10^2 machines
10^2 to 10^3 machines
10^3 to 10^4 machines
10^4 to 10^5 machines
Question 30 of 60
Quiz ID: q30
In the word count example, what does each worker do in the map phase?
Counts total words across all documents
Parses documents to find all words and outputs (word, count) pairs
Sorts words alphabetically
Removes duplicate words
Question 31 of 60
Quiz ID: q31
Given the input 'One a penny, two a penny, hot cross buns.', what would be one of the (word, count) pairs output by the map function?
('penny', 2)
('a', 1)
('total', 9)
('sentence', 1)
Question 32 of 60
Quiz ID: q32
In the word count example, what is the final output for the word 'penny'?
('penny', 1)
('penny', 2)
('penny', 3)
('penny', 4)
Question 33 of 60
Quiz ID: q33
In the MapReduce word count pseudo-code, what does the emit function do in the map phase?
Counts the total words
Outputs a (word, 1) pair for each word
Sorts the words
Removes punctuation
Question 34 of 60
Quiz ID: q34
In MapReduce, what is the first argument passed to the emit function called?
Map key
Reduce key
Primary key
Sort key
Question 35 of 60
Quiz ID: q35
What operation is effectively performed on the reduce key in MapReduce?
Sort by
Group by
Order by
Filter by
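As a study aid for the word count questions above, here is a single-process sketch of the map, group, and reduce steps. It is not a distributed implementation. The names `map_fn`, `reduce_fn`, and `word_count` are hypothetical, and the tokenisation (lower-casing and keeping only runs of letters) is an assumption.

```python
import re
from collections import defaultdict


def map_fn(document):
    """Map phase: emit a (word, 1) pair for every word in the document."""
    for word in re.findall(r"[a-z]+", document.lower()):
        yield (word, 1)  # the first element plays the role of the reduce key


def reduce_fn(word, counts):
    """Reduce phase: add up all the counts collected for one word."""
    return (word, sum(counts))


def word_count(documents):
    # Grouping step: collect values by reduce key, which behaves like a group by.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = word_count(["One a penny, two a penny, hot cross buns."])
```

For this single sentence, `map_fn` emits ('penny', 1) twice, and the reduce step combines the two pairs into one total for 'penny'.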
Question 36 of 60
Quiz ID: q36
Which companies are mentioned as widely using MapReduce for parallel processing?
Microsoft and Oracle
Google, Yahoo, and hundreds of other companies
IBM and Intel
Amazon and Facebook only
Question 37 of 60
Quiz ID: q37
Which of the following is mentioned as an example use of MapReduce?
Real-time transaction processing
Compute PageRank and build keyword indices
User interface design
Database schema design
Question 38 of 60
Quiz ID: q38
What is an advantage of MapReduce over traditional SQL databases?
It's always faster than SQL
It allows procedural code in map and reduce functions and data of any type
It uses less storage space
It requires less programming knowledge
Question 39 of 60
Quiz ID: q39
What is a disadvantage of MapReduce compared to SQL?
It cannot handle large datasets
It is cumbersome for writing simple queries
It doesn't support parallel processing
It only works with structured data
Question 40 of 60
Quiz ID: q40
What do current-generation execution engines natively support?
Only map and reduce operations
Algebraic operations such as joins and aggregation
Only SQL queries
Only key-value operations
Question 41 of 60
Quiz ID: q41
Which execution engines are mentioned as examples in the lecture?
MySQL and PostgreSQL
Apache Tez and Spark
Oracle and SQL Server
MongoDB and Cassandra
Question 42 of 60
Quiz ID: q42
What does Apache Tez provide according to the lecture?
High-level SQL interface
Low level API
User interface components
Database storage engine
Question 43 of 60
Quiz ID: q43
What does RDD stand for in Spark?
Relational Data Distribution
Resilient Distributed Dataset
Rapid Data Deployment
Remote Database Driver
Question 44 of 60
Quiz ID: q44
How are RDDs computed in Spark?
They are computed immediately when created
They are lazily computed when needed
They are pre-computed and cached
They are computed in parallel always
Question 45 of 60
Quiz ID: q45
In which programming languages can Spark programs be written?
Only Java
Java, Scala, and R
Only Python and Java
Any programming language
Question 46 of 60
Quiz ID: q46
What does streaming data refer to?
Data stored in multiple locations
Data that arrives in a continuous fashion
Data that is compressed
Data that is encrypted
Question 47 of 60
Quiz ID: q47
Which of the following is NOT mentioned as an example of streaming data applications?
Stock market trades
E-commerce purchases and searches
Sensor readings from IoT devices
Database backup operations
Question 48 of 60
Quiz ID: q48
What is windowing in the context of streaming data?
Displaying data in multiple windows
Breaking up streams into windows and running queries on windows
Opening multiple database connections
Partitioning data by geographic regions
Question 49 of 60
Quiz ID: q49
What are the two bases on which windows may be created in streaming systems?
Size and location
Time or tuples
Source and destination
Priority and frequency
Question 50 of 60
Quiz ID: q50
What are punctuations used for in stream processing?
To format output data
To specify that all future tuples have a timestamp greater than some value
To mark the end of a query
To separate different data types
Question 51 of 60
Quiz ID: q51
What is a characteristic of continuous queries?
They run once and produce final results
They output partial results and update continuously
They only work with static data
They require manual refresh
Question 52 of 60
Quiz ID: q52
What is a potential problem with continuous queries?
They consume too much storage
They can lead to a flood of updates
They don't work with real-time data
They require too much programming effort
Question 53 of 60
Quiz ID: q53
What does CEP stand for in the context of stream processing?
Central Event Processing
Complex Event Processing
Continuous Event Processing
Concurrent Event Processing
Question 54 of 60
Quiz ID: q54
What characterizes many stream processing systems mentioned in the lecture?
They always persist data to disk
They are purely in-memory and do not persist data
They only work with small datasets
They require expensive hardware
Question 55 of 60
Quiz ID: q55
What is the lambda architecture in stream processing?
A single stream processing approach
Split the stream into two: one copy goes to a stream processing system, the other to a database for storage
A method for data compression
A security protocol for streams
Question 56 of 60
Quiz ID: q56
What is a disadvantage of lambda architecture?
It's too complex to implement
It often leads to duplication of querying effort
It doesn't scale well
It only works with small data
Question 57 of 60
Quiz ID: q57
What type of window doesn't overlap in streaming systems?
Sliding window
Tumbling window
Hopping window
Session window
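As a study aid, here is a minimal sketch of time-based windows that do not overlap. It assumes integer timestamps and a fixed window size in the same unit. The function names are hypothetical.

```python
from collections import defaultdict


def window_start(timestamp: int, size: int) -> int:
    """Start of the fixed-size, non-overlapping window containing `timestamp`."""
    return timestamp - (timestamp % size)


def count_per_window(timestamps, size):
    """Count how many tuples fall into each non-overlapping window."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[window_start(ts, size)] += 1
    return dict(counts)
```

Each timestamp belongs to exactly one window here. A window scheme that allows overlap would instead assign some timestamps to more than one window.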
Question 58 of 60
Quiz ID: q58
What do publish-subscribe systems provide?
Database storage capabilities
Convenient abstraction for processing streams
User interface components
Data encryption services
Question 59 of 60
Quiz ID: q59
Which parallel pub-sub system is mentioned as popular for managing streaming data?
Apache Storm
Apache Kafka
Apache Flume
Apache Flink
Question 60 of 60
Quiz ID: q60
How can graphs be modelled as relations according to the lecture?
graph(nodes, edges, properties)
node(ID, label, node_data) and edge(fromID, toID, label, edge_data)
vertex(ID, data) and connection(start, end)
entity(ID, type) and relationship(source, target, type)
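As a study aid, here is a small sketch of storing a graph as two relations, one for nodes and one for edges, using the attribute names from the node/edge option above. The sample rows and the helper `out_neighbours` are made up for illustration.

```python
from typing import NamedTuple


class Node(NamedTuple):
    ID: int
    label: str
    node_data: dict


class Edge(NamedTuple):
    fromID: int
    toID: int
    label: str
    edge_data: dict


# Hypothetical sample data: two nodes joined by one directed edge.
nodes = [Node(1, "person", {"name": "A"}), Node(2, "person", {"name": "B"})]
edges = [Edge(1, 2, "follows", {})]


def out_neighbours(node_id, edge_rows):
    """IDs of nodes reachable over one outgoing edge, like a selection on fromID."""
    return [e.toID for e in edge_rows if e.fromID == node_id]
```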