Data Knowledge Extraction (DKE)

It is the means to extract knowledge from raw data. One of the biggest challenges in big data and machine learning is the creation of value out of raw data. When dealing with personal data, this must be coupled with privacy-preserving approaches, so that only the necessary data is disclosed and the data owner keeps control over it.

The DKE consists of machine learning approaches to aggregate data, abstract models that predict future data (e.g., predicting a user's interests in recommendation systems), and fuse data coming from different sources to derive generic suggestions (e.g., to support users' decisions by providing suggestions based on the choices of users with similar interests). A toy illustration of this last idea follows.
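The sketch below is an illustration only, not the DKE implementation: it suggests items to a user based on the choices of the most similar users (nearest-neighbour collaborative filtering). The ratings matrix, the cosine similarity measure, and the scoring rule are assumptions made for this example.

```python
# Toy illustration of the kind of data fusion described above: suggest items to a user
# based on choices made by users with similar interests. Data and similarity choices
# are assumptions for this sketch only.
import numpy as np

# rows = users, columns = items; 1 means the user consumed/liked the item
ratings = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])

def suggest(user, k=2):
    # Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(ratings, axis=1) * np.linalg.norm(ratings[user])
    sims = ratings @ ratings[user] / np.where(norms == 0, 1, norms)
    sims[user] = -1                              # never pick the user as their own neighbour
    neighbours = np.argsort(sims)[-k:]           # the k most similar users
    # Score unseen items by how often the neighbours chose them.
    scores = ratings[neighbours].sum(axis=0) * (ratings[user] == 0)
    return np.argsort(scores)[::-1]

print(suggest(0))  # item indices ordered by popularity among user 0's nearest neighbours
```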

Data Provenance Tools

The objective of the Data Provenance Tools from an End-User Perspective is to provide stronger data-ownership guarantees to data providers sharing their datasets with the PIMCity platform, and in doing so to discourage the illegal copying or reselling of datasets. Our module provides a watermarking algorithm that allows a data buyer or data owner to verify data ownership offline, or a third-party verifier to do so online on behalf of the data owner by reading the owner's secret information, upon prior agreement with that owner.

The Data Provenance tool is an OpenAPI-formatted framework for interoperable transactions with other components in the PIMCity platform. Responsibility for trading the datasets falls outside the scope of the Data Provenance tool, but in future releases we plan to provide metadata that is valuable to data traders in reassuring the operational exchange of such datasets with data buyers and the like.

In particular, we provide the following capabilities to the Trading Engine of PIMCity:

1. Insert a watermark in a dataset to assure data providers that their data ownership conforms to their secret information, even if that information is not stored in the PIMCity platform (offline). Data buyers can additionally be given a hint that a particular dataset is legally sourced from the PIMCity platform and belongs to a specific data provider, without the buyer having to know who that provider is.
2. Verify the watermark of a dataset by receiving a data provider's secret information, reassuring a data provider that a piece of data found in the wild outside PIMCity belongs to them, given secret input information that only the data provider (offline) and/or PIMCity (online) knows.
In the first case, data providers or the PIMCity platform provide or generate the secret information from which the securely watermarked dataset is derived.
In the second, the secret information used to verify a dataset can be held strictly by the data provider that owns the data, but it has to be at least read by the Data Provenance component in order to return True or False as the result of the verification process for a watermarked dataset. A minimal sketch of both operations is given below.
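For illustration only, here is a minimal Python sketch of the insert/verify pair for a list of browsing-history URLs, loosely inspired by the keyed-selection idea of the VLDB '02 scheme [1]. The secret key, the selection parameter GAMMA, and the way the mark is embedded (toggling the case of one host letter, which does not change where the URL points) are assumptions of this sketch, not the actual PIMCity algorithm.

```python
# Simplified watermarking sketch for URL strings (not the production algorithm).
import hmac, hashlib

GAMMA = 4  # roughly 1 out of GAMMA URLs carries a mark (assumption)

def _h(secret, url):
    # Keyed hash over a canonical (lower-cased) URL, so that the mark itself
    # does not change which URLs get selected.
    digest = hmac.new(secret, url.lower().encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")

def _mark_index(url, h):
    # Pick one letter of the host part (hosts are case-insensitive) to carry the mark.
    host_start = url.find("://") + 3
    slash = url.find("/", host_start)
    host_end = slash if slash != -1 else len(url)
    letters = [i for i in range(host_start, host_end) if url[i].isalpha()]
    return letters[h % len(letters)] if letters else None

def insert_watermark(urls, secret):
    out = []
    for u in urls:
        h = _h(secret, u)
        i = _mark_index(u, h) if h % GAMMA == 0 else None
        out.append(u[:i] + u[i].upper() + u[i + 1:] if i is not None else u)
    return out

def verify_watermark(urls, secret, threshold=0.8):
    # Offline verification: recompute the secret-driven selection and check that the
    # expected fraction of selected URLs still carries the mark.
    checks = []
    for u in urls:
        h = _h(secret, u)
        if h % GAMMA == 0:
            i = _mark_index(u, h)
            if i is not None:
                checks.append(u[i].isupper())
    return bool(checks) and sum(checks) / len(checks) >= threshold

urls = [f"https://site{i}.example.com/page" for i in range(100)]
marked = insert_watermark(urls, b"owner-secret")
print(verify_watermark(marked, b"owner-secret"))    # True: the owner's key matches
print(verify_watermark(marked, b"another-secret"))  # almost surely False
```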

Benefits
Data Provenance tools in WP4 provide greater reassurance to data providers about sharing their private data, while discouraging abusive reuse, reselling, or simply copying of their datasets in the wild without permission. We have implemented a first watermarking algorithm for strings (browsing-history URLs) that follows an approach similar to the state of the art presented at VLDB '02 [1].
Regarding permissions, together with the Data Trading Engine we plan to keep a record of the transactions generated in the data trading platform by recording metadata such as source, destination, and the number of times each dataset has been shared. This will allow us to:

1. Identify if a dataset located in the ‘wild’ belongs to a given data owner.
2. Provide traceability or fingerprinting of the data buyer of a dataset found in the 'wild', without disclosing the identity of that buyer to the public Internet, serving only metadata to the PIMCity platform to be processed in a secure manner.
Moreover, the Data Trading Engine provides, through a data marketplace in PIMCity, information to buyers and sellers about every dataset and possibly about the user's personal transactions, in order to offer transparency and trust in data transactions in PIMCity.

Data Portability Control (DPC)

The Data Portability Control (DPC) tool implements the right to data portability, a novelty of the EU's General Data Protection Regulation (GDPR), which allows individuals to obtain and reuse personal data from one environment to another in a privacy-preserving fashion. More specifically, it incorporates the necessary tools to import data from multiple platforms (through the available Data Sources), process the data to remove sensitive information (through the Data Transformation Engine), and export it into other platforms (through the Data Export module). Since the tool does not have a dedicated UI for interacting with users, it provides an interface in the form of a generic Control API for controlling all operations from other systems.

Benefits

By using the DPC tool, an organisation can satisfy the right to data portability for its users. At this stage the tool supports the following features (a minimal end-to-end sketch follows the list):
-  Data aggregation from banking data through an Open Banking API (TrueLayer).
-  A generic data anonymization that hides specific data categories (or columns) considered sensitive from selected imported data.
-  Data export in a common data interchange format (e.g., JavaScript Object Notation (JSON)).
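As a rough, end-to-end illustration of the three steps above (and not of the actual DPC Control API), the sketch below imports banking transactions, drops the columns assumed to be sensitive, and exports the result as JSON. The TrueLayer endpoint, field names, and the set of sensitive columns are assumptions used only for illustration; the real Data Sources and Data Transformation Engine expose richer configuration.

```python
# Minimal import -> transform -> export sketch (illustration only, not the DPC API).
import json
import requests

SENSITIVE_COLUMNS = {"merchant_name", "description"}  # assumed "sensitive" categories

def import_transactions(access_token, account_id):
    # Data Source step: pull banking data through an Open Banking API (TrueLayer).
    # Endpoint and response shape are assumptions; see the TrueLayer docs for the real API.
    url = f"https://api.truelayer.com/data/v1/accounts/{account_id}/transactions"
    resp = requests.get(url, headers={"Authorization": f"Bearer {access_token}"}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("results", [])

def anonymize(records, sensitive=SENSITIVE_COLUMNS):
    # Data Transformation Engine step: hide the columns considered sensitive.
    return [{k: v for k, v in r.items() if k not in sensitive} for r in records]

def export_json(records, path):
    # Data Export step: write the result in a common interchange format (JSON).
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

# Example (token and account id are placeholders):
# export_json(anonymize(import_transactions("<token>", "<account-id>")), "portable.json")
```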

Data Aggregation

The Data Aggregation module allows users to anonymize any kind of dataset, using k-anonymity as the main algorithm: it deletes sensitive data and aggregates the rest, reducing the dataset's size without significant information loss. The anonymized data are stored in the component's database and are accessible at any moment. It is also possible to query for all the anonymized datasets available from a specific user.
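As a small, self-contained illustration of the k-anonymity idea the module builds on (not the module's own code; the column names and generalization rules are assumptions), the sketch below generalizes quasi-identifiers and suppresses groups smaller than k.

```python
# Toy k-anonymity sketch: generalize quasi-identifiers (age -> 10-year bands,
# ZIP -> 3-digit prefix), drop direct identifiers, suppress groups smaller than k.
import pandas as pd

def k_anonymize(df, k=3):
    out = df.copy()
    # Generalize quasi-identifiers.
    out["age"] = (out["age"] // 10 * 10).astype(str) + "-" + (out["age"] // 10 * 10 + 9).astype(str)
    out["zip"] = out["zip"].astype(str).str[:3] + "**"
    # Drop directly identifying columns entirely.
    out = out.drop(columns=["name"], errors="ignore")
    # Suppress any equivalence class with fewer than k records.
    sizes = out.groupby(["age", "zip"])["age"].transform("size")
    return out[sizes >= k].reset_index(drop=True)

df = pd.DataFrame({
    "name": ["ana", "bo", "cy", "di", "ed", "fi"],
    "age":  [23, 27, 25, 41, 44, 29],
    "zip":  [28001, 28004, 28009, 50001, 50002, 28002],
})
print(k_anonymize(df, k=3))  # only the "20-29" / "280**" group (4 rows) survives
```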

Benefits
- Data anonymization and aggregation.
- Anonymized datasets can be used for evaluating applications, or sold and shared with other partners in a privacy-preserving manner.
- Users can compare and correlate their own data with others' anonymized data without breaching privacy.

Example
A telco can use this component to anonymize location data or call records in order to share them with partners, collaborators, or third parties under specific conditions/agreements. Typical uses include marketing, exploring possible new partnerships, and providing aggregated data to end users so that they can compare it with their own profile.

Data Valuation Tools (D-VT)

It is the means to give personal data the right value. Online systems make money with users' personal data. It is thus fundamental to know what the economic value of each piece of information is, to let the user take informed decisions on what to share and at what price. The D-VT consists of a set of methodologies that will make the value of the data transparent. It will offer standard mechanisms to publish prices, complemented with machine learning approaches to extend this knowledge to other data and systems. This information will be stored in open repositories, so that PIMS can easily give the right value to data.

Market Perspective
The Data Valuation Tools from the market perspective (DVTMP) module developed in PIMCity will leverage some of the most popular existing online advertising platforms to estimate the value of hundreds to thousands of audiences. The DVTMP module aims to provide the monetary value of audiences traded on the main online advertising platforms, allowing any PIMS that implements the DVTMP module to obtain a realistic estimation of the value of the audiences to be traded. The design of the DVTMP pursues the following objectives:

1. Crawl the data value of audiences from Facebook, Instagram, and LinkedIn
2. Process, clean, and curate the collected data
3. Store processed data
4. Provide access to the data through an API

Benefits
Nowadays, auctions are the most prevalent trading mechanism between sellers and buyers in the data economy. Therefore, a good approximation of the actual value of the data can assist both parties (users and companies) through the transaction. Users usually have less experience than companies in marketing activities; therefore, they may underestimate or overestimate the value of their data. This module tells users the actual value that marketing platforms create using their data. This module benefits users by:

- Providing a real-time value estimation of their data,
- Helping them sell their data at its actual price and avoid losing money by selling it underpriced,
- Providing the estimations upon a simple HTTP POST request to the server (see the sketch after this list).
It also benefits companies by:
- Helping them target relevant audiences and estimate the cost of their campaigns,
- Helping them buy users' data at a fair price and avoid losing money by purchasing overpriced data.
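As a purely hypothetical example of the kind of request mentioned above (the endpoint URL, payload fields, and response format are assumptions, not the documented DVTMP API):

```python
# Hypothetical client request for an audience-value estimate from the DVTMP service.
import requests

payload = {
    "platform": "facebook",  # one of the crawled advertising platforms
    "audience": {"country": "ES", "age_range": "25-34", "interest": "travel"},
}
resp = requests.post("https://dvtmp.example.pimcity.eu/api/value", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. an estimated price per thousand impressions and a timestamp
```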

User Perspective
The objective of the Data Valuation Tools from an End-User Perspective (DVTUP) module is to provide estimated valuations of the data end-users sell through the marketplace, according to the value this data provides in performing the specific AI/ML task that the buyer wants it for. This value is not necessarily related to volume, nor is it equal across users; it requires more complex calculations that must be adapted to each specific use case.
DVTUP implements a framework that allows data marketplaces to provide value-based valuations of the data products they trade. In particular, DVTUP will provide tools for the TE to:

1. Provide buyers with a hint of how valuable a piece of data is for a certain type of model or even for a specific task.
2. Calculate a fair breakdown of data transaction charges by seller, aiming to reward each user proportionally to the value that each piece of data from different sellers brings to the buyer for a specific task.
In the first case, the output will be the expected accuracy the buyer will get from a dataset if purchased from the marketplace. In the second case, the output will estimate the percentage of a transaction's value that corresponds to each seller, together with a log of the data and results obtained, to justify the rewards paid to the different sellers. An illustrative sketch of such a value-based split is given below.
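The sketch below illustrates one way such a value-based split could be computed: train the buyer's model with and without each seller's rows and reward sellers in proportion to their marginal contribution to test accuracy. It is an illustration under toy assumptions (synthetic data, logistic regression, leave-one-out contributions), not the DVTUP algorithm itself, which may use more sophisticated attribution schemes.

```python
# Toy value-based payout split: reward sellers by their leave-one-out contribution
# to the accuracy of the buyer's model (illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
sellers = np.array_split(np.arange(len(X_tr)), 3)  # pretend three sellers supplied the rows

def accuracy(rows):
    model = LogisticRegression(max_iter=1000).fit(X_tr[rows], y_tr[rows])
    return model.score(X_te, y_te)

all_rows = np.arange(len(X_tr))
full = accuracy(all_rows)                                     # "try before you buy" figure
contrib = {i: max(full - accuracy(np.setdiff1d(all_rows, s)), 0.0)
           for i, s in enumerate(sellers)}
total = sum(contrib.values()) or 1.0
payouts = {i: 100.0 * c / total for i, c in contrib.items()}  # split a 100-unit transaction
print(f"expected accuracy: {full:.3f}", payouts)
```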

Benefits
DVTUP overcomes some key challenges that are undermining data markets nowadays. In particular:

1. It allows data buyers to try data before they buy (TBYB) and know its value for their specific task beforehand. This feature dramatically enhances their experience and improves the value provided by the data marketplace [2].
2. It allows data marketplaces to reward users in accordance with the value they bring to specific transactions. Since the value of data is inherently combinatorial, data marketplaces and PIMS usually sell combinations of data from different users or sources to feed a certain AI/ML model. DVTUP ensures that the payback to each user is fair, which incentivizes the provision of high-quality data.

Data Trading Engine (TE)

The main function this module implements is to execute all transactions (between Data Buyers and Data Sellers) within the platform, exchanging data for value in a secure, transparent, and fair-for-all way. The TE serves as a communication interface between the PIM backend and the data buyers. Among the myriad of data types that can be sold, the TE focuses on bulk data and audience data.

Benefits
The data economy has been growing exponentially, but people (the real owners of the data) have been left aside.
With this system, companies and people will be able to interact directly and be part of the multi-billion data economy. Community developers can also create tools that integrate with the Data Trading Engine and participate in this economy.
Benefits for users

○ Allow users to get value from their data with their explicit consent using an API.
○ Users will participate in the data economy for the first time.
○ Users get ownership of and decision power over their data.

Benefits for companies
○ Get a simple and transparent system to generate data offers and obtain users' data with their explicit consent.
○ Get data of higher quality.
○ Be a company that protects data and complies with data regulations.
○ Easily integrate it into your systems.

Example
If a Data Buyer (any company) wants to place a Data Offer (a ticket to acquire a bag of users' data), the Trading Engine is needed to execute this transaction. The Trading Engine performs the following steps to generate the transaction (sketched schematically after the list):

1. Gets the price for the audience or data being bought.
2. Calculates how many people fit in the budget.
3. Gets the certified list of users with active consents.
4. Fetches their data. If the data size is too big to handle at once, streams are used.
5. Cleans the data.
6. Hands back the data and updates the credits in the accounts.
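The following schematic sketch mirrors these six steps. Every object it calls (pricing, consents, data_safe, billing) is a placeholder for another PIMCity component, and all names and signatures are assumptions made for illustration rather than the real Trading Engine interfaces.

```python
# Schematic sketch of the transaction flow described above (placeholders, not the real API).

def execute_data_offer(offer, pricing, consents, data_safe, billing):
    # 1. Get the price for the audience or data being bought.
    unit_price = pricing.price_for(offer.audience)
    # 2. Calculate how many people fit in the budget.
    max_users = int(offer.budget // unit_price)
    # 3. Get the certified list of users with active consents.
    users = consents.users_with_active_consent(offer.audience)[:max_users]
    # 4. Fetch their data, streaming if the volume is too large to handle at once.
    records = data_safe.stream_records(users) if len(users) > 10_000 else data_safe.fetch_records(users)
    # 5. Clean the data before delivery.
    cleaned = (clean(r) for r in records)
    # 6. Hand back the data and update the credits in the accounts.
    billing.charge_buyer(offer.buyer_id, unit_price * len(users))
    billing.credit_sellers(users, unit_price)
    return cleaned

def clean(record):
    # Placeholder cleaning step: drop empty fields.
    return {k: v for k, v in record.items() if v not in (None, "", [])}
```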

Personal Privacy Preserving Analytics (P-PPA)

The Personal Privacy Preserving Analytics (P-PPA) module has the goal of allowing data analysts and stakeholders to extract useful information from the raw data while preserving the privacy of the users whose data is in the datasets. It leverages concepts like Differential Privacy and K-Anonymity so that data can be processed and shared while guaranteeing privacy for the users.

P-PPA includes a set of functionalities that allow performing data operations while preserving the major privacy properties: k-anonymity, z-anonymity, and differential privacy. P-PPA is capable of handling different sources of input data, which determine which kind of privacy property comes into play: we have designed solutions for tabular and batch-stream data, handled with the PostgreSQL, MongoDB, and CSV modules, as well as for live stream data.
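As a textbook-style illustration of one building block mentioned above (differential privacy via the Laplace mechanism), and not of the P-PPA implementation itself:

```python
# Epsilon-differentially private count query using the Laplace mechanism (illustration only).
import numpy as np

def dp_count(values, predicate, epsilon=1.0, rng=None):
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one user changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 27, 25, 41, 44, 29, 35, 61]
print(dp_count(ages, lambda a: a >= 30, epsilon=0.5))  # noisy answer around the true count of 4
```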

Privacy Metrics (PM)

Privacy Metrics represent the means to increase the user's awareness. This component collects, computes, and shares easy-to-understand data to let users know how a service (e.g., a data buyer) stores and manages their data, whether it shares it with third parties, how secure and transparent it looks, etc. These are all fundamental pieces of information a user needs in order to take informed decisions.

How it Works?
The PM component computes this information and offers it via a standard REST interface, providing an open knowledge information system that can be queried through an open and standard platform. PMs combine information from supervised machine-learning analytics, the services themselves, and domain experts, volunteers, and contributors.
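A hypothetical query might look as follows; the host, path, and response fields are assumptions made only to illustrate how such a REST interface could be consumed.

```python
# Hypothetical client query against the Privacy Metrics REST interface (illustration only).
import requests

resp = requests.get(
    "https://pm.example.pimcity.eu/api/metrics",     # placeholder host and path
    params={"service": "example-databuyer.com"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. indicators such as third-party sharing, security, transparency
```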

Personal Consent Manager (P-CM)

The primary objective of the Personal Consent Manager (P-CM) is to give users transparency and control over their data in a GDPR-compliant way. That is, to give them the possibility to decide which data can be uploaded and stored in the platform, as well as how (raw, extracted, or aggregated) data can be shared with Data Buyers in exchange for value when the opportunity arises.

The P-CM is presented as a web application and a REST API, not only giving users the possibility to use the component in a user-friendly way, but also enabling developers to integrate PIMCity Consent Management capabilities into their products. The architecture of the PDK is depicted in the figure.
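As a purely hypothetical integration sketch (endpoint paths, payload fields, and authentication are assumptions, not the published P-CM API), a developer could read and record consents roughly like this:

```python
# Hypothetical consent-management calls against a P-CM-style REST API (illustration only).
import requests

BASE = "https://pcm.example.pimcity.eu/api"  # placeholder host and path

def user_consents(user_id, token):
    # Read which data categories the user has agreed to share.
    r = requests.get(f"{BASE}/users/{user_id}/consents",
                     headers={"Authorization": f"Bearer {token}"}, timeout=30)
    r.raise_for_status()
    return r.json()

def grant_consent(user_id, token, category, purpose):
    # Record an explicit consent for one data category and purpose.
    r = requests.post(f"{BASE}/users/{user_id}/consents",
                      json={"category": category, "purpose": purpose, "granted": True},
                      headers={"Authorization": f"Bearer {token}"}, timeout=30)
    r.raise_for_status()
    return r.json()
```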

Personal Data Safe (P-DS)

It is the means to store personal data in a controlled form. It implements a secure repository for the user's personal information and is responsible for storing and aggregating the user's information, such as navigation history, contacts, preferences, personal details, etc.

This can be done in push or pull mode: the user can actively decide which information to store and retrieve, or the system can do it automatically by importing information as it is collected while the user performs their usual activities, such as browsing the web or moving about a city. The P-DS can store either the original copy of the user's data or pointers to other repositories, e.g., to external services that have already collected the data, limiting data replication if desired. A hypothetical sketch of the two modes is given below.
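The sketch below illustrates the two modes; the endpoints, payloads, and the pointer convention are assumptions for illustration only, not the actual P-DS API.

```python
# Hypothetical P-DS client calls (illustration only). Push: the user explicitly stores a
# record; alternatively, only a pointer to an external repository is kept.
import requests

BASE = "https://pds.example.pimcity.eu/api"  # placeholder host and path

def push_record(user_id, token, record):
    # Push mode: the user actively decides to store this piece of information.
    r = requests.post(f"{BASE}/users/{user_id}/data",
                      json=record, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    r.raise_for_status()

def store_pointer(user_id, token, external_url):
    # Instead of a copy, keep only a reference to a repository that already holds the data.
    push_record(user_id, token, {"type": "pointer", "location": external_url})

# Example (identifiers are placeholders):
# push_record("u123", "<token>", {"type": "browsing", "url": "https://example.com", "ts": "2021-05-01T10:00:00Z"})
```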
