Microsoft Breach, Dataset Search, Data Storytelling, Messy Data & ML, Academic Data and Coronavirus
This week we begin with an article on the recent Microsoft data breach, which led to 250 million customer records being exposed. This is followed by a story about Google’s Dataset Search engine, which just came out of beta and now offers access to almost 25 million datasets. After this, we cover the significance of showcasing data-centric insights via data storytelling. The next piece focuses on the impact that messy data has on Machine Learning projects. Then we have an article about the under-utilization of academic data in real-world applications and decision-making tools. Finally, we cover the potential use of data visualization for containment and eradication of communicable diseases, like Coronavirus.
Database Access Misconfiguration Exposes 250M Customer Records at Microsoft
Security firm Comparitech reported a major data breach at Microsoft that exposed 250 million customer records over a period of a couple of days. Microsoft said the leaked data, which did not include personally identifiable information, was not used maliciously.
Google’s search engine for scientists upgraded for better data scouring
Google’s search engine for datasets, the cunningly named Dataset Search, is now out of beta, with new tools to better filter searches and access to almost 25 million datasets.
Why Data Storytellers Will Define The Next Decade Of Data
For today’s digital businesses, data can serve both as an input and an output. Data can be an invaluable asset if it is well-managed or a costly liability if it’s not. In no previous decade has data been as integral to business success as it is now.
Messy data is slowing down machine learning projects and driving up costs
The “garbage in, garbage out” warning about bad data is more relevant than ever as datasets grow ever more enormous and drive ever more business decisions.
Academic data is widely available, yet widely underused. Why?
Academic data is one of the most underutilized forms of data that exist today. Outside of educational institutions, the only significant use of this data is in enrollment verification, though even that use case is not particularly widespread.
How design can stop the spread of the Wuhan coronavirus
When airports are screening passengers for a communicable disease, you know it’s serious. No one wants another H1N1. As of this writing, the Wuhan, China outbreak of coronavirus has infected more than 800 people, killed 41, and found its way to the United States in two confirmed cases.