Overview
In 2020, it was estimated that, on average, 1.7 million bytes of data were created every second for every person on Earth. You can envisage 1.7 million bytes as roughly a 100,000-word essay! Data is created by individuals (through social networks and mobile phones; in 2021 there were approximately 4.3 billion unique mobile phone users), machines (through real-time, network-connected sensors, the "internet of things"), business and commerce (e.g. transaction records), science (e.g. bioinformatics, large-scale simulation) and medicine (e.g. MRI scans and EEGs). Much of this data is real-time and georeferenced through GPS.

Making sense of this vast ocean of data for the use and benefit of society is considered an imperative of the coming years. Indeed, most companies and organizations are already vigorously pursuing their "big data" agenda. Data scientists develop solutions for gathering, cleaning, archiving, analyzing and visualizing data for the purpose of making informed decisions, usually building upon domain knowledge in their chosen field (e.g. biology, medicine, engineering, psychology). They often develop models based on machine learning, including deep learning (a term now frequently used interchangeably with artificial intelligence).

Some examples of data science projects include the following.
Business: Use historical discounting data from a department store chain with one thousand locations to predict how sales vary with department, season and location.
Entertainment: Perform a sentiment analysis on tweets about new Netflix shows and use the results to predict which future projects are likely to succeed, based on genre, actors and similar attributes.