A few weeks ago a blog post by George Hulme on the Health Information Trust Alliance (HITRUST) community site caught my attention. In it, George talks about data breaches in the healthcare realm and how they are hard to prevent even when various data protection technologies are implemented. He wonders whether data masking can reduce the frequency of data breaches where the primary attack vector is theft of data from non-production environments. I wanted to examine this premise in the context of implementing an identity administration solution with a product such as Oracle Identity Manager (OIM).
Data masking is an umbrella term for techniques that transform data without changing relationships within the dataset. Masking is particularly useful when applied to sensitive data in regulated environments such as healthcare, since exposing protected health information (PHI) is an obvious risk. The chance of exposure grows with every copy of the data that lands in a non-production environment: many organizations copy production data at least once to a staging environment, and some articles cite 6-8 copies as the common replication factor of a single production dataset across environments. Given that a greater number of insiders have access to non-production environments and controls are typically less stringent outside of production, we've got a double whammy when it comes to risk.
A simple formula for assessing Risk is the Probability of an event multiplied by the Severity of that event. The probability of an insider having access to personal data housed in a non-production environment is high. Not only that, but non-employees usually make up a larger share of the insider population with access to non-production data. If we assume that non-employees are a higher risk group, our Probability goes up from high to very high. The Severity of data loss (breach, theft, leak, sprout, what have you) is also high in regulated environments, be it HIPAA, SOX or one of the other 50+ regulatory frameworks if we go around the world (this is a real number, folks!).
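To make the formula concrete, here is a minimal sketch in Python. The numeric scale is an assumption made purely for illustration; the text above only speaks of "high" and "very high", and `LEVELS` and `risk_score` are hypothetical names.

```python
# Hypothetical ordinal scale (an assumption for illustration only).
LEVELS = {"low": 1, "medium": 2, "high": 3, "very high": 4}


def risk_score(probability: str, severity: str) -> int:
    """Risk = Probability x Severity, using the assumed ordinal scale above."""
    return LEVELS[probability] * LEVELS[severity]


# Insider access in non-production: Probability is high, Severity is high.
baseline = risk_score("high", "high")

# Add non-employees to the insider population: Probability becomes very high.
with_non_employees = risk_score("very high", "high")

print(baseline, with_non_employees)
```

The absolute numbers mean nothing on their own. What matters is that with Severity fixed by regulation, Probability is the only factor left to push down.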
Even with this simple formula, it follows that reducing the Probability of an event via data masking when copying data from production to non-production is a Good Thing™. The challenge with data masking when deploying an identity administration solution is that some fields in our dataset are used to establish relationships with other systems, which makes masking those fields self-defeating. For example, in a solution implemented on top of Oracle Identity Manager, reconciling records from an external system to OIM relies on matching rules that establish equality between external records and OIM records for a given identity. Frequently the matching rules are very simple: a single rule that compares a key field such as a login name on the external system with a user ID in OIM. If the value of this key field is not sensitive (e.g. the field does not contain your passport serial number), it doesn't need to be masked and our reconciliation will continue to work. If it is sensitive and we do mask it, we break our reconciliation process since matching rules will not establish equality even though the records do refer to the same physical person.
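The effect of masking on a single-key matching rule can be sketched in a few lines of Python. This is not OIM's reconciliation engine; the field names `login` and `user_id`, the `mask` function and the sample value are all hypothetical, and the sketch assumes the external system keeps its original values while the OIM copy is masked.

```python
import hashlib


def matches(external: dict, oim: dict) -> bool:
    """Single-key matching rule: external login name equals the OIM user ID."""
    return external["login"] == oim["user_id"]


def mask(value: str) -> str:
    """Hypothetical mask: replace the value with a short hash-derived token."""
    return "M" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


external_record = {"login": "jdoe42"}
oim_record = {"user_id": "jdoe42"}

# Unmasked key field: the rule establishes equality.
before = matches(external_record, oim_record)

# Assumption: only the OIM copy is masked on its way to non-production.
masked_oim_record = {"user_id": mask(oim_record["user_id"])}
after = matches(external_record, masked_oim_record)

print(before, after)
```

Both records still describe the same person, but once the key field is masked the rule no longer sees them as equal.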
A possible solution to the above dilemma is a piece of software (let's call it a Data Broker) that sits between the applications and the identity store, the latter being a database in an Oracle Identity Manager deployment. For applications connecting to the identity store and authenticating via an application account, the Data Broker will return data unmasked; for everyone else it will mask the data. The masking would have to be done on the fly, in real time. If we want the Data Broker to reside at the lowest possible level in our application stack without going down to the network layer, we're talking about a Data Broker that's attached to, well, data and resides inside the identity store itself. One possible implementation of our Data Broker for Oracle databases is Oracle's Virtual Private Database (VPD). Even though VPD is primarily a row-level security solution, it includes simple masking and column-level access control features and it resides at the database level. (Oracle has other products that specifically deal with data masking, but our emphasis here is on access control.)
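The Data Broker's decision logic is simple enough to sketch in Python. This illustrates the idea only and is not VPD or any Oracle API; the account name `oim_app`, the column `passport_number` and the function `broker_read` are hypothetical.

```python
# Hypothetical configuration, not taken from any real deployment.
APPLICATION_ACCOUNTS = {"oim_app"}
SENSITIVE_FIELDS = {"passport_number"}


def mask(value: str) -> str:
    """Keep the length of the value but hide its content."""
    return "*" * len(value)


def broker_read(caller: str, row: dict) -> dict:
    """Return the row unmasked for application accounts, masked for everyone else."""
    if caller in APPLICATION_ACCOUNTS:
        return dict(row)
    return {
        field: mask(value) if field in SENSITIVE_FIELDS else value
        for field, value in row.items()
    }


row = {"user_id": "jdoe42", "passport_number": "X1234567"}

app_view = broker_read("oim_app", row)          # application account
developer_view = broker_read("developer", row)  # anyone else

print(app_view)
print(developer_view)
```

Masking happens at read time, so the stored data is never changed and the application's matching rules keep working.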
However, VPD or any other implementation of a Data Broker is far from perfect for our scenario. For one, there's a potential performance overhead, even if it is present only in non-production environments. Worse, once the application gets hold of the data, it can log it, print it, send it to the console and so on. In other words, just because we've stopped developers from hitting the database directly and grabbing the sensitive data, we haven't addressed the root cause, i.e. the source data being unmasked. Yes, we reduced the Probability of an event, but not by a whole lot: any developer working with the application could easily circumvent the restriction by having the application grab and expose the data.
A close-to-good solution would combine access control and DRM for data regardless of where the data lives or how it's represented. Even if data ends up being transformed to, say, objects in application memory, it would still "know" where it came from and how to behave. As a first step, we restrict everyone but applications from getting unmasked data in non-production. As a second step, we allow the applications to retrieve unmasked data but we attach a usage policy to the data. This policy would prohibit the application from sending the data to the log or using it in any context other than manipulations in memory. This would reduce the Probability of a leak even further but not completely eliminate it. Memory can be examined too, an attack known as memory space snooping. There are solutions for that but I digress! (Note: the access control+DRM solution is imaginary. I don't know of any products that do this.)
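Since no such product exists, the following Python sketch only illustrates what "data that knows how to behave" might look like. The `Guarded` class is hypothetical, and Python cannot actually enforce the policy (anyone can read `_value` directly), so treat it as an illustration of intent rather than a control.

```python
class Guarded:
    """Wraps a sensitive value; every string conversion yields a placeholder."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return "<masked>"

    def __repr__(self) -> str:
        return "<masked>"

    def __format__(self, format_spec: str) -> str:
        # Covers f-strings and str.format(), whatever the format spec.
        return "<masked>"

    def matches(self, other: str) -> bool:
        """The only permitted use: an in-memory equality check."""
        return self._value == other


passport = Guarded("X1234567")

log_line = f"reconciling user with passport {passport}"
print(log_line)

# In-memory manipulation still works, so matching rules are unaffected.
print(passport.matches("X1234567"))
```

The application can still compare values in memory, but anything it writes to a log or console comes out masked.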
Good solution? Don't store sensitive data in your identity management database. If you do store sensitive data, don't use it for establishing relationships with external systems, so that it can be masked when exported to non-production environments. The latter is a hard issue - either you need to use a field for integrating with an external system or you can't integrate...but there's a light at the end of that tunnel. The light is called a risk acceptance form signed by the CIO to save you from going to jail in the event of a data breach.
Perfect solution? Meet Stanley Ipkiss:
Meet Stanley Ipkiss
by Deborah Volk on May 10th, 2009
Posted in Oracle Identity Manager, Access Management, Identity Management Tagged with vpd, data masking, risk, phi