MOSTLY AI relies on AI-generated synthetic data
Whether data theft, data protection violations or the basis for AI discrimination: the use of customer data harbors a number of pitfalls. The Viennese start-up MOSTLY AI wants to counter this with AI-generated synthetic data.
(Image: Screenshot / MOSTLY AI)
AI-generated synthetic data are artificially created data sets that are based on original data sets. They retain the economic value of the original information, but no longer allow any conclusions to be drawn about private or sensitive information. They are therefore GDPR-compliant and can be used flexibly.
The Austrian start-up MOSTLY AI sees itself as a pioneer and key player in this area. With headquarters in Vienna and New York, the company is also represented internationally. "The use of real data for AI is risky," explains CEO Dr. Tobias Hann. For example, real information could fall into the wrong hands or data protection violations could occur. "Which means that you either have to do without the development of AI solutions or the real data is used anyway," says Hann. Although this is a violation of data protection, it still happens again and again.
Because synthetic data is a rapidly evolving technology, the term is not yet clearly defined. While it was first mentioned decades ago, older types of synthetic data bear no resemblance to the powerful AI-generated synthetic data we have today. Although there are various application areas where AI-generated synthetic data is of value, only one category is relevant for privacy protection: privacy-preserving, AI-generated synthetic data.
Privacy-preserving, AI-generated synthetic data is defined as an anonymization technology that preserves data utility. It is artificial data created by a machine learning model trained on real-world data that accurately and granularly retains the statistical properties of the real data it was trained upon. Yet, it is generated with a holistic set of privacy mechanisms that ensure absolute, irreversible anonymization.
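To make the idea of "a model trained on real-world data that retains its statistical properties" concrete, here is a deliberately tiny sketch. It fits a two-column Gaussian model to some made-up "real" rows and samples new rows from it. This is only an illustration under stated assumptions: the column meanings (age, income), the function names, and the toy data are all hypothetical, a bivariate Gaussian stands in for the far more capable generative models the article refers to, and the sketch contains none of the privacy mechanisms described above.

```python
import math
import random
import statistics


def fit_bivariate_gaussian(rows):
    """Estimate means, standard deviations, and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample(model, n, rng):
    """Draw n synthetic rows that follow the fitted means, spreads, and correlation."""
    rho = model["rho"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = model["mx"] + model["sx"] * z1
        y = model["my"] + model["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" data: (age, income) pairs, invented for this example.
rng = random.Random(0)
ages = [rng.uniform(20, 70) for _ in range(500)]
real = [(age, 20000 + 800 * age + rng.gauss(0, 8000)) for age in ages]

model = fit_bivariate_gaussian(real)
synthetic = sample(model, 5000, rng)
refit = fit_bivariate_gaussian(synthetic)
print(f"real correlation:      {model['rho']:.3f}")
print(f"synthetic correlation: {refit['rho']:.3f}")
```

None of the synthetic rows is copied from the input, yet the age/income relationship carries over. Matching statistics alone does not make data anonymous, which is why the definition above insists on additional privacy mechanisms.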
Admittedly, we are now getting into the weeds of selecting a truly privacy-preserving synthetic data solution. But for the sake of completeness: if you want bulletproof privacy protection, make sure that you are getting fully and not partly AI-generated synthetic data. Make no mistake, there are some highly interesting use cases for partly synthetic data, and it can be an invaluable tool to augment real-world data. But it's not the right approach if privacy protection is what you are after.
An AI data generator employs artificial intelligence (AI) to generate synthetic data. It can be used for many different use cases, such as data privacy management, business process optimization, application testing, and machine learning model training. A number of different techniques are used to generate synthetic data via AI, including crafting synthetic data from existing datasets, creating new data points using generative models, and simulating real-life scenarios.
AI data generators are designed to be flexible and customizable, enabling users to specify the format and quantity of the data they'd like to generate, as well as the characteristics the data should have.
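As a rough illustration of what "specifying the format, quantity, and characteristics" can look like, the following sketch uses the simplest of the techniques listed above, a rule-based simulation of a scenario, rather than a trained model. Everything here is an assumption made for the example: the function name simulate_transactions, its parameters, the column names, and the amount distributions are hypothetical and do not describe any particular product.

```python
import random


def simulate_transactions(n_rows, fraud_rate=0.02,
                          columns=("customer_id", "amount", "is_fraud"),
                          seed=None):
    """Simulate card transactions.

    n_rows     -- quantity: how many rows to generate
    fraud_rate -- characteristic: share of rows flagged as fraudulent
    columns    -- format: which fields each row should contain
    seed       -- makes the output reproducible
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        is_fraud = rng.random() < fraud_rate
        # Assumption for this toy scenario: fraudulent amounts tend to be larger.
        amount = round(rng.lognormvariate(5.0 if is_fraud else 3.5, 0.6), 2)
        full_row = {
            "customer_id": rng.randrange(1, 1001),
            "amount": amount,
            "is_fraud": is_fraud,
        }
        rows.append({name: full_row[name] for name in columns})
    return rows


# Usage: 1,000 rows with a deliberately high fraud share, two columns only.
sample_rows = simulate_transactions(1000, fraud_rate=0.2,
                                    columns=("amount", "is_fraud"), seed=42)
print(sample_rows[0])
```

A simulation like this encodes hand-written assumptions, whereas a generative model learns its distributions from data. The user-facing controls (how many rows, which fields, which properties) are similar in spirit either way.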
Since AI system training and evaluation often depend on a large amount of data for learning and decision-making, synthetic data can be used to replace, or augment, production data under certain circumstances. For instance, synthetic data generation could benefit autonomous driving, image classification, and language processing.
How to distinguish between partly and fully synthetic data?
The former pairs either...
a) real-world data, or
b) traditionally anonymized data (which, as you know, is simply no longer anonymous in the era of big data; thus it is basically real-world data 😉)
...with synthetic data.
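The distinction can be sketched in a few lines. The example below takes one possible reading of "partly synthetic", namely real rows in which only a sensitive column is replaced, and contrasts it with a naive fully synthetic generator that creates every column anew. The records, column names, and functions are all hypothetical, and the fully synthetic generator is intentionally simplistic: it samples each column independently and offers no privacy guarantee on its own.

```python
import random
import statistics

# Hypothetical "real" records, invented for this example.
real_rows = [
    {"zip": "1010", "birth_year": 1984, "income": 52000},
    {"zip": "1020", "birth_year": 1991, "income": 38000},
    {"zip": "1030", "birth_year": 1975, "income": 61000},
    {"zip": "1010", "birth_year": 1968, "income": 47000},
    {"zip": "1040", "birth_year": 1999, "income": 29000},
]


def partly_synthetic(rows, sensitive_col, rng):
    """Keep every real value except the sensitive column, which is resampled."""
    values = [r[sensitive_col] for r in rows]
    mu, sigma = statistics.fmean(values), statistics.stdev(values)
    return [{**r, sensitive_col: round(rng.gauss(mu, sigma))} for r in rows]


def fully_synthetic(rows, n, rng):
    """Generate n new rows; no output row is derived from one input row."""
    zips = [r["zip"] for r in rows]
    years = [r["birth_year"] for r in rows]
    incomes = [r["income"] for r in rows]
    return [
        {
            "zip": rng.choice(zips),
            "birth_year": round(rng.gauss(statistics.fmean(years), statistics.stdev(years))),
            "income": round(rng.gauss(statistics.fmean(incomes), statistics.stdev(incomes))),
        }
        for _ in range(n)
    ]


def rows_with_real_identifiers(rows, reference, keys=("zip", "birth_year")):
    """Count rows whose quasi-identifier combination also occurs in the real data."""
    real_keys = {tuple(r[k] for k in keys) for r in reference}
    return sum(tuple(r[k] for k in keys) in real_keys for r in rows)


rng = random.Random(0)
partly = partly_synthetic(real_rows, "income", rng)
fully = fully_synthetic(real_rows, len(real_rows), rng)
print("partly synthetic rows matching a real person's zip + birth year:",
      rows_with_real_identifiers(partly, real_rows))
print("fully synthetic rows matching a real person's zip + birth year:",
      rows_with_real_identifiers(fully, real_rows))
```

In the partly synthetic output, every row still carries a real person's zip code and birth year, so the record can in principle be linked back to that person even though the income is artificial. That retained real-world portion is the reason partly synthetic data is the wrong tool when privacy protection is the goal.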