Watch the video clip 3 times and fill in the blanks.

So we 1___________ produce a lot of data, for example via social media, public transport and GPS, but it goes way beyond that. Daily, we upload 55 million pictures, 340 million tweets and 1 billion documents in total. We produce 2.5 quintillion bytes a day. That’s a lot of zeros. It’s 2 ____________. We call this big data, but what’s actually more important is what you can do with it. To process big data, you don’t need huge computers. People work with the cloud, an endless network of normal servers and powerful 3 ____________. This way, they can analyse over a million pieces of data in minutes, and the result? Well, for example, video streaming website Netflix analysed the big data of their viewers, like popular shows and watching patterns. This way, they produced a successful series with the perfect combination of actors, directors and storyline. Right now, the big data of traffic is being analysed to develop a car that can drive completely accident-free all by itself, and in the future, we can even use the big data of DNA to determine the perfect treatment. This way, curing 4 __________ diseases like cancer would become much easier, and that’s just the start.

The quantity of computer data generated on planet Earth is growing 5___________ for many reasons. For a start, retailers are building vast databases of recorded customer activity. Organizations working in logistics, financial services, health care and many other sectors are also capturing more data, and public social media is creating vast quantities of digital material. As vision recognition improves, it is additionally starting to become possible for computers to 6 ___________ meaningful information from still images and video. As more smart objects go online, big data is also being generated by an 7 __________ Internet of Things.
And finally, several areas of scientific advancement are starting to generate and rely on vast quantities of data that were until recently almost unimaginable. Big data is often characterized using the three Vs of volume, velocity and variety. Here, volume 8 __________ both the greatest challenge and the greatest opportunity, as big data could help many organizations to understand people better and to 9 __________ resources more effectively. However, traditional computing solutions like relational databases are not scalable to handle data of this 10 ___________. Big data velocity also raises a number of issues, with the rate at which data is flowing into many organizations now exceeding the capacity of their IT systems. In addition, users increasingly demand data which is streamed to them in real time, and delivering this can prove quite a challenge. Finally, the variety of data types to be processed is becoming increasingly diverse. Gone are the days when data centers only had to deal with documents, financial 11 ____________, stock records and personnel files. Today, photographs, audio, video, 3D models, complex simulations and location data are being piled into many a corporate data silo. Many such big data sources are also unstructured and hence not easy to 12 _____________, let alone process with traditional computing techniques. Due to the challenges of volume, velocity and variety, many organizations at present have little choice but to ignore or rapidly excrete large quantities of 13 ____________ valuable information. Indeed, if we think of organizations as creatures that process data, then most are rather primitive forms of life. Their sensors and IT systems are simply not up to the job of scanning and interpreting the vast oceans of data in which they swim. As a consequence, most of the data that surrounds organizations today is ignored.
A large 14 ____________ of the data that they gather is then not processed, with a significant quantity of useful information passing straight through them as data exhaust. For example, until recently, the majority of the data captured by retailer loyalty cards was not processed in any way, and almost all video data captured by hospitals during surgery is still deleted within weeks. Today, the leading big data technology is Hadoop. This is an open source software library for 15___________, scalable distributed computing and provides the first viable platform for big data analytics. Hadoop distributes the storage and processing of large data sets across groups or clusters of server computers. Whereas traditional large-scale computing solutions rely on expensive server hardware with a high fault tolerance, Hadoop detects and 16 ___________ for hardware failures or other system problems at the application level. This allows a high level of service continuity to be delivered by clusters of individual computers, each of which may be prone to failure. Technically, Hadoop consists of two key 17 _________. The first is the Hadoop distributed file system, which permits high-bandwidth, cluster-based storage. The second is a data processing framework called MapReduce, based on Google search technology. MapReduce distributes or maps large data sets across multiple servers. Each server then creates a summary of the data it has been allocated. All of this summary information is then aggregated in a so-termed reduce stage. MapReduce 18 __________ allows extremely large raw data sets to be rapidly distilled before more traditional data analysis tools are applied. For organizations who cannot afford an 19 __________ big data infrastructure, cloud-based big data solutions are already available. Where public big data sets need to be utilized, running everything in the cloud also makes a lot of sense, as the data does not have to be downloaded.
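The map and reduce stages described above can be sketched in plain Python. This is a toy illustration of the idea, not Hadoop's actual API: the word-count task and the function names are assumptions chosen for the example, and a real cluster would run the map step on many servers in parallel.

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: each "server" turns its allocated document
    # into a list of (word, 1) pairs - a per-item summary.
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))
    return pairs

def reduce_phase(pairs):
    # Reduce step: aggregate all the per-word counts
    # into one final summary.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["big data big clusters", "data velocity"]
print(reduce_phase(map_phase(documents)))
# {'big': 2, 'data': 2, 'clusters': 1, 'velocity': 1}
```

Because each document's map output is independent, the map step can be spread across many cheap machines and the results merged afterwards, which is the point of the transcript's "distil first, analyse later" pattern.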
Looking further ahead, quantum computing may greatly improve big data processing. Quantum computers store and process data using quantum mechanical states, and will in theory excel at the massively 20 ___________ processing of unstructured data.