Big Data and Data Science Concept
In the 1800s, significant technological developments took place because of electricity. In the 1900s, petroleum was the driving force of the century. Similarly, data is considered the fuel of the 21st century. Almost every major corporation now depends on data, and the volume of data is growing at a pace often compared to Moore's Law.
In this TED Talk, Tim Smith traces the changes that have happened over time in the world of Big Data. The concept of 'Big Data' emerged when physicists at CERN needed to analyze experimental data. Initially, the data was stored on a large mainframe computer, an approach that changed drastically in the years that followed.
Big Data is an Elusive Concept - Techbytes by Tim Smith
With time, the amount of data increased, and it was stored in physically connected distributed systems around the CERN lab. As you might have guessed, this was not an efficient way to store the data, as it required heavy physical infrastructure. That gave birth to CERNET, and gradually the internet, which completely disrupted the conventional method of storing and sharing data.
What Makes Big Data Elusive?
Our current hardware is not sufficient to store the humongous and ever-increasing volume of data. Moreover, big data is a concept that undergoes major structural changes within short periods of time; as a result, it is often called an elusive concept. This gradual evolution of big data systems was driven by the need to cope with growing demands and to improve efficiency.
Initially, data centers were physically connected to the facilities. This was a major hurdle in places with hot climates, since many electronic devices malfunction under increased load and heat. Moreover, beyond a certain data volume, physical infrastructure itself became a barrier. Accessing data remotely therefore became popular in the community, since it removed the physical constraints.
Furthermore, it allowed global access to data without users needing to know where it was physically stored. At CERN, this enabled researchers to pursue their work from home. It also connected the global scientific community, which in turn increased research output.
Challenges with Big Data
However, physical infrastructure was not the main challenge facing big data systems. Ever-increasing data is the primary concern for the community, since present-day storage systems cannot scale indefinitely; every data storage facility eventually reaches a saturation point. To address this, computer scientists and physicists continually scrutinize existing algorithms and silicon technology to build more efficient systems.
Just to get a rough idea of the data captured at CERN, imagine a camera taking 14 million pictures per second, connected to 150 million sensors, analyzing every fraction of a second for the collision of a single particle.
Furthermore, such complex data needs to be stored, and the volume grows with every experiment performed. Companies such as Google, Facebook, and Netflix likewise handle billions of records collected from millions of users. Governments also depend heavily on data, be it your SSN, license records, criminal records, taxes, and so on.
Apart from ongoing research to improve existing systems, proper data management also reduces the overall size of the data. Software is being developed to remove redundant columns, and data-processing frameworks like Hive, Spark, and Hadoop offer efficient storage structures that reduce the load compared to traditional DBMS systems.
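To make the redundancy-removal idea concrete, here is a minimal sketch in Python using pandas. The function name and sample data are hypothetical illustrations, not part of any specific tool mentioned above; the sketch simply shows how duplicate rows, constant columns, and exact copies of columns can be stripped to shrink a dataset before storage.

```python
import pandas as pd

def drop_redundant(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate rows and columns that carry no information."""
    df = df.drop_duplicates()  # identical rows add no information
    # columns holding a single repeated value can be dropped
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)
    # transpose so duplicate columns become duplicate rows, then dedupe
    df = df.T.drop_duplicates().T
    return df

# Hypothetical sample data with built-in redundancy
raw = pd.DataFrame({
    "id":   [1, 2, 2, 3],
    "id2":  [1, 2, 2, 3],        # exact copy of "id"
    "site": ["CERN"] * 4,        # constant column
    "val":  [0.1, 0.2, 0.2, 0.3],
})
clean = drop_redundant(raw)
print(clean.columns.tolist())    # redundant columns are gone
```

In practice, columnar storage formats (such as those used by Hive and Spark) push this further by compressing repeated values within each column rather than discarding them.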
Storing big data optimally is the need of the day. Despite advancements in AI, blockchain, and other technologies, this domain needs far more extensive innovation. For now, existing systems manage storage and sharing adequately, but there will always be a race between ever-increasing big data and the systems built to store it.