Interview Questions on Data Classification and DLP Data Loss Prevention

  • Published: 16 Sep 2024
    Data classification is the process of analyzing structured or unstructured data and organizing it into relevant categories, based on file type, contents, and other metadata, so that it can be used and protected more efficiently.
    DATA SENSITIVITY LEVELS
    REASONS FOR DATA CLASSIFICATION
    TYPES OF DATA CLASSIFICATION
    USING A DATA CLASSIFICATION MATRIX
    AN EXAMPLE OF DATA CLASSIFICATION
    THE DATA CLASSIFICATION PROCESS
    What is DLP?
    Data loss prevention (DLP), per Gartner, may be defined as technologies that perform both content inspection and contextual analysis of data: sent via messaging applications such as email and instant messaging, in motion over the network, in use on a managed endpoint device, and at rest on on-premises file servers or in cloud applications and cloud storage.
    How does DLP work?
    Once the envelope is opened and the content processed, there are multiple content analysis techniques which can be used to trigger policy violations, including:
    Rule-Based/Regular Expressions: The most common analysis technique used in DLP involves an engine analyzing content for specific rules such as 16-digit credit card numbers, 9-digit U.S. social security numbers, etc. This technique is an excellent first-pass filter since the rules can be configured and processed quickly, although they can be prone to high false positive rates without checksum validation to identify valid patterns.
    Database Fingerprinting: Also known as Exact Data Matching, this mechanism looks at exact matches from a database dump or live database. Although database dumps or live database connections affect performance, this is an option for structured data from databases.
    Exact File Matching: File contents are not analyzed; instead, the hashes of files are matched against exact fingerprints. This approach produces few false positives, although it does not work for files with multiple similar but not identical versions.
    Partial Document Matching: Looks for a complete or partial match on specific files, such as multiple versions of a form that have been filled out by different users.
    Conceptual/Lexicon: Using a combination of dictionaries, rules, etc., these policies can alert on completely unstructured ideas that defy simple categorization. They need to be customized for the specific DLP solution in use.
    Statistical Analysis: Uses machine learning or other statistical methods, such as Bayesian analysis, to trigger policy violations in secure content. This requires a large volume of content to learn from; the bigger the corpus, the better, otherwise the method is prone to false positives and false negatives.
    Pre-built categories: Pre-built categories with rules and dictionaries for common types of sensitive data, such as credit card numbers/PCI protection, HIPAA, etc.
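    The rule-based technique above can be sketched in Python. This is an illustrative example, not code from any particular DLP product: a regular expression does the fast first pass for 16-digit card numbers, and the Luhn checksum weeds out random digit runs, which is the checksum validation the list mentions as a false-positive filter.

```python
import re

# Illustrative first-pass rules: 16-digit card numbers (with optional
# space/dash separators) and 9-digit U.S. SSNs in dashed form.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Regex first pass, then Luhn validation to cut false positives."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

    A string like "1234 5678 9012 3456" matches the regex but fails the checksum, so it is dropped; only Luhn-valid numbers trigger a policy hit.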
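    Database fingerprinting (Exact Data Matching) can likewise be sketched under assumed details: the sensitive values here are hypothetical, and a real product would index an actual database dump. Values are normalized and hashed so the engine can test outbound tokens for exact matches without holding the raw data.

```python
import hashlib

def fingerprint(value: str) -> str:
    """Normalize a value and return its SHA-256 hex digest."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

# Hypothetical index built from a database dump of sensitive values.
SENSITIVE = {fingerprint("123-45-6789"), fingerprint("jane.doe@example.com")}

def exact_data_match(tokens: list[str]) -> bool:
    """Return True if any outbound token exactly matches an indexed value."""
    return any(fingerprint(t) in SENSITIVE for t in tokens)
```

    Because the comparison is an exact hash lookup, false positives are rare, but building and refreshing the index from the source database is where the performance cost noted above comes in.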
    The data sensitivity levels, from highest to lowest, are:
    Restricted Data/Formerly Restricted Data
    Code Word classification
    Top Secret
    Secret
    Confidential
    Public Trust
    Controlled Unclassified Information (CUI)
