Data protection scores explained

Overview

Protective.ai provides continuous, automated, and non-intrusive data protection assessments that help companies keep their data safe and help developers better protect their clients' data.

We use non-intrusive, permissionless methods to estimate many aspects of data safety, security, and privacy. The analysis is performed externally, using the developer's visible assets, listings, endpoints, and public documents.

All scores are calculated in the same transparent way for every app and integration. A specific score can't be modified or tampered with, either by the developer or by us. Developers who choose to improve their data protection & security will earn genuinely higher scores.

Methodology

We use well-accepted, explainable, and reproducible techniques to process and evaluate data protection levels, e.g., non-invasive cloud & endpoint assessments, data collection, statistics, and AI-powered text analysis.

Data sources

  • Publicly-facing endpoints - accessing openly available servers, endpoints, websites, and cloud assets related to the app.
  • Public documents - privacy policy, security audits, and compliance documents available on the company website or the public domain.
  • Engagement signals - collecting comments, reviews, number of followers, installs, and other openly available popularity and community sentiment signals.
  • External data sources - additional data sources such as open intelligence APIs, fraud assessment providers, and known breach databases.
  • Docs uploaded by the developer - assessment may include other documents & endpoints that the developer chooses to share for more accurate data protection analysis.

Statistics & AI

  • Basic statistics - we use the most straightforward calculations applicable to generate and aggregate scores for better replicability and explainability.
  • Document analysis - we process privacy policies with a semi-supervised AI model capable of evaluating a privacy document over multiple dimensions by detecting specific terms, their use in context, and overall meaning. The AI was trained on thousands of reviewed privacy policies and performs at around 75% accuracy. A simplified illustration follows this list.
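
For illustration only, here is a heavily simplified, keyword-based sketch of this kind of analysis in Python. The phrases and parameter names are hypothetical examples; the production model is a semi-supervised NLP classifier, not a literal phrase matcher.

  # Heavily simplified, keyword-based illustration of the document analysis.
  # The real model is a semi-supervised NLP classifier; the phrases and
  # parameter names below are hypothetical examples, not the actual rubric.
  POLICY_PHRASES = {
      "no_data_selling": ["we do not sell", "never sell your data"],
      "encryption": ["encrypted at rest", "encrypted in transit"],
      "right_to_be_forgotten": ["delete your data", "right to erasure"],
  }

  def detect_parameters(policy_text):
      """Return 1.0 if any indicative phrase for a parameter appears, else 0.0."""
      text = policy_text.lower()
      return {
          parameter: float(any(phrase in text for phrase in phrases))
          for parameter, phrases in POLICY_PHRASES.items()
      }

  sample = "We never sell your data. All records are encrypted at rest."
  print(detect_parameters(sample))
  # {'no_data_selling': 1.0, 'encryption': 1.0, 'right_to_be_forgotten': 0.0}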

Results summary & visualization

We provide users with a summarized and simplified version of our report, while the complete analysis remains available to developers so they can review their security and privacy posture in full detail and improve their data protection.

Score calculations

πŸ”’  Cloud security

The cloud and endpoint security score is calculated by sending a small request to open endpoints and analyzing the server's response. The response reveals the server frameworks and infrastructure, which we match against known vulnerabilities and security exploits associated with them. We don't perform penetration testing or brute-forcing at any point.
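
As a rough sketch of what such a non-intrusive check can look like in Python, the snippet below sends a single lightweight request and inspects the software the server advertises about itself. The lookup table and the helper name probe_endpoint are hypothetical placeholders standing in for a real vulnerability database and our actual tooling.

  # Minimal sketch of a non-intrusive endpoint check: one lightweight request,
  # then inspection of the software the server advertises about itself.
  # The lookup table is a hypothetical placeholder for a vulnerability database.
  import requests

  KNOWN_ISSUES = {
      "Apache/2.4.49": {"vulnerabilities": 1, "exploits": 1},  # e.g. CVE-2021-41773
      "nginx/1.16.0": {"vulnerabilities": 1, "exploits": 0},
  }

  def probe_endpoint(url):
      """Send a single HEAD request and report what the server discloses."""
      response = requests.head(url, timeout=5, allow_redirects=True)
      server = response.headers.get("Server", "unknown")
      findings = KNOWN_ISSUES.get(server, {"vulnerabilities": 0, "exploits": 0})
      return {"server": server, **findings}

  print(probe_endpoint("https://example.com"))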

The detected infrastructure and frameworks are grouped into the following categories for simplicity (full report available to the developer):

  1. Data in transit - networking frameworks like SSH, HTTP/S, proxies, and DNS servers.
  2. Encryption - frameworks used for data and network encryption.
  3. Storage - data storage like database servers and FTP servers.
  4. Email - email servers that use protocols like POP3 and SMTP.

We use a 3-level ranking system for each category:
  1. Low risk - no vulnerability found for any of the frameworks we've recognized.
  2. Mid. risk - one or more vulnerabilities found, no exploits detected.
  3. High risk - one or more vulnerabilities with known exploits detected.

A single vulnerability or exploit can compromise a developer's backend security; for that reason, we treat even one vulnerability or exploit as a risk.

Score calculation - Each category contributes 25% of the total security score and can be 0.0 for high risk, 5.0 for medium risk, or 10.0 for low risk.
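
As a minimal sketch, the weighting above can be expressed as follows; the risk level of each category is assumed to have already been determined by the endpoint analysis.

  RISK_POINTS = {"low": 10.0, "medium": 5.0, "high": 0.0}
  CATEGORIES = ["data_in_transit", "encryption", "storage", "email"]

  def cloud_security_score(category_risk):
      """Each category weighs 25%; risk levels map to 0.0 / 5.0 / 10.0 points."""
      return sum(RISK_POINTS[category_risk[c]] for c in CATEGORIES) / len(CATEGORIES)

  print(cloud_security_score({
      "data_in_transit": "low",
      "encryption": "low",
      "storage": "medium",
      "email": "high",
  }))  # -> 6.25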

πŸ™Š  Privacy policy

Our privacy policy score is calculated by analyzing the full privacy document provided by the developer. To evaluate the commitment level to specific aspects of data protection and privacy, we use semi-supervised NLP (natural language processing) that recognizes different phrases, terms, and formulations.

Our AI evaluates each privacy policy across 15 parameters, based on regulations like GDPR, CCPA, and HIPAA, as well as other privacy best practices. The parameters are grouped and averaged as listed below:

  1. Limited data sharing - the commitment level not to share clients' data, including:
    • Not selling data manually to partners, clients, third parties, and data brokers.
    • Not sharing data automatically with advertisers and other services.
    • PII (personally identifiable information) data retention vs. removal.
    • Storing data in an aggregated form.
  2. Security measures - commitment level to put security measures in place:
    • PII and email encryption.
    • Server protection measures (firewall, roles, protected assets).
    • Dedicated security team & periodic security assessments.
    • Security-by-design principles.
  3. Transparency & control - commitment level to provide data controls and transparency:
    • Ability to opt-out.
    • The right to be forgotten.
    • Require explicit user consent.
    • Notify users if a data breach has occurred.

Score calculation - each category contributes 33% to the total privacy policy score and consists of an average of all relevant parameters estimated by the AI model.
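
A minimal sketch of this aggregation, assuming each parameter is estimated by the model on a 0-10 scale (the parameter names are illustrative, not the exact 15-parameter rubric):

  from statistics import mean

  # Parameter scores are assumed to come from the NLP model on a 0-10 scale;
  # the parameter names are illustrative, not the exact 15-parameter rubric.
  def privacy_policy_score(parameter_scores):
      """Average the parameters inside each category, then average the categories."""
      category_scores = [mean(params.values()) for params in parameter_scores.values()]
      return mean(category_scores)

  print(privacy_policy_score({
      "limited_data_sharing": {"no_selling": 10, "no_ad_sharing": 5, "pii_removal": 10, "aggregated_storage": 5},
      "security_measures": {"encryption": 10, "server_protection": 10, "security_team": 0, "by_design": 5},
      "transparency_control": {"opt_out": 10, "right_to_be_forgotten": 10, "consent": 10, "breach_notice": 5},
  }))  # -> 7.5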

πŸ’¬  Community trust

The community and user-base sentiment & trust analysis collects data from openly available sources, such as integration marketplaces and social networks, and converts the engagement numbers into simple scores that resemble how users would evaluate them.

Our calculation uses several signal types and assigns each with a different weight to represent the contribution to the overall trust score:

  • Number of installs - High importance. Active users are a strong engagement signal.
  • Number of reviews - Medium importance. Indicative, can be encouraged / overlooked.
  • Stars distribution - Medium importance. Indicative, can be encouraged / overlooked.
  • Number of followers - Low importance. Indicative but can be manipulated.

Score calculation - the total trust score is a weighted average of the platform engagement numbers, with each signal multiplied by the importance weight assigned to it (as listed above).
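
A minimal sketch of this weighted average; the numeric weights are illustrative assumptions standing in for the high / medium / low importance levels above, and each signal is assumed to be normalized to a 0-10 scale before weighting.

  # The numeric weights below are illustrative assumptions standing in for the
  # high / medium / low importance levels; signals are assumed to be normalized
  # to a 0-10 scale before weighting.
  SIGNAL_WEIGHTS = {
      "installs": 3.0,   # high importance
      "reviews": 2.0,    # medium importance
      "stars": 2.0,      # medium importance
      "followers": 1.0,  # low importance
  }

  def community_trust_score(normalized_signals):
      """Weighted average of engagement signals, weights as listed above."""
      weighted = sum(normalized_signals[s] * w for s, w in SIGNAL_WEIGHTS.items())
      return weighted / sum(SIGNAL_WEIGHTS.values())

  print(community_trust_score({"installs": 9, "reviews": 6, "stars": 7, "followers": 4}))
  # -> 7.125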

Additional scores

Our platform can provide additional evaluations and scores which are not available in the basic version, including:

  • Permission analysis - permissions required by the app / integration, and the data sensitivity for each permission. Apps that require minimal permissions to perform their task are usually ranked higher.
  • Regulatory compliance - our platform scans the app's website and publicly available documents, evaluating stated security standards like SOC 2, ISO 27001, and NIST, as well as data privacy compliance such as GDPR, CCPA, and HIPAA.
  • Known data breaches - the severity and frequency of detected hacks and data breaches, as disclosed in public databases and articles.
  • Active data analysis - ongoing monitoring of the data sent by an app / integration to the developer or 3rd-party backend, validating that no sensitive or unrelated data is being collected without the users' knowledge.

Summary

We do our best to provide accurate and clear data protection estimations that help companies keep their data safe, while helping developers improve & communicate their security & privacy. However, we are a startup, and our product is not perfect. We strongly believe in transparency, so if you have any questions about how we collect, process, evaluate, and calculate the scores, please reach out at info@protective.ai.

Let's talk

Need help? Interested in early access? Want to get in touch? Send us a message below, or email us at info@protective.ai