Drawing Distinctions Among Data Modeling Challenges: Tool vs. Technique

Imagine, for a moment, you’re given the task of digging a trench 3 feet deep and 100 feet long... Now imagine the only tool you’re provided is a teaspoon.


Data modeling is challenging. Data modeling without appropriate tooling turns a challenging endeavor into a daunting one. It’s no wonder so many practitioners become frustrated, so many engineers simply don’t see the point, and so many project managers trade long-term value for short-term results.


For many decades, software developers designed data models around the required message sets, mistaking the messages for the interfaces. The result of this well-meaning but misguided approach is a single-use data model. When requirements change, necessitating message updates, system expansions, or any number of other modifications, it is simpler to toss out the old model and start again. Newer methodologies support the flexibility, scalability, reuse, and portability of data models, which deliver much greater value far into the future.


Not only do today’s data models require revised thinking about the model’s purpose and value, but the inherent rigor and specificity of the models also require a shift in ideation. Human language is vague and imprecise; it is mired in cultural, functional, and linguistic multiplicities. Documenting a model to a degree that makes it machine-understandable requires an examination of semantics that most practitioners have never had to consider. Yet it is this rigor and specificity that enables the automation of processes, updates, and even integration.


As a Team Lead once said, only half joking, “Data modeling is a Zen practice… I go by my engineer’s office, and he’s sitting there staring at the computer just like he was over an hour ago. ‘Making any progress?’ I ask. He answers without even turning his head, ‘Hrm, I’m not sure. Right now I’m considering the meaning of “green.” What does it mean to be green?’” As the engineer was likely contemplating, in an aviation system green may have many meanings, including “all is right,” “connection intact,” “active,” “cleared to proceed,” and more.


Data modeling is challenging. To discuss solutions to these challenges, a distinction must first be made between challenges related to tooling and those related to technique. Beyond bad habits, complex linguistics, and Zen teachings on semantics, there is the dilemma of the teaspoon.




Tooling Challenges (The Teaspoon)

When it comes to data architecture, the market is dominated by modeling tools that were built to help plan and execute business strategy.


Imagine that the spoon in the trench scenario is one of several implements on an 8-in-1 camping multi-tool. This multi-tool also includes a fork, corkscrew, scissors, bottle opener, a small saw edge, a file, and a blade. Indispensable for the larger purpose of surviving in the wilderness, the multi-tool no doubt holds an important place among the outdoorsman’s belongings. In fact, one has trouble imagining doing without one.


Would one be capable of digging a trench with a spoon? Prisoners have been known to escape the likes of Alcatraz with little more than such a tool. It might be doable, but it certainly is not ideal. A shovel, or better yet a trencher, is designed and built with the specific tasks of digging and trenching in mind.


The dominant modeling tools on the market are designed for a larger purpose, with data modeling as an ancillary concern. Data modeling with them is doable, but it takes more work – more clicks, repetition of inputs across various levels, procedural workarounds to meet standards requirements, and more manual inputs begetting errors which require yet more labor to correct.


All these extra mechanics to build a model are difficult enough, but what happens when a model needs to be updated? Over time, entity relationships may change, as may our understanding or use of those entities and relationships.

Consider a very simple model and a typical data refactoring pattern in which one moves an attribute (location) out of an entity (Person) and into its own entity with a different measurement representation. The move at the conceptual level is simple, but what is the impact of that change on the overall model, including the logical level, the platform level, and the Unit of Portability (UoP)?


Moving a single attribute at the conceptual level causes a ripple that affects many other points in even the simplest model. Identifying every affected component is a challenge in itself, and a modeling tool that is not fit for purpose exacerbates that challenge because each change must be made through a manually intensive process. With each manual change, the likelihood of human oversight and error increases.
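To make the ripple concrete, the sketch below treats the model as a dependency graph and walks it breadth-first from the changed conceptual attribute. It is a minimal sketch under stated assumptions: the element names (including the hypothetical `TrackerComponent` UoP) and the edges in `DEPENDENTS` are invented for illustration and are not FACE identifiers; a real tool would derive the graph from the model's own realization and reference links rather than from a hand-written table.

```python
from collections import deque

# Hypothetical, highly simplified model. Each key lists the elements that are
# affected when the key changes (they realize it, contain it, or reference it).
# Names are illustrative only and are not FACE-defined identifiers.
DEPENDENTS = {
    "conceptual:Person.location": ["logical:Person.location", "conceptual:Person"],
    "conceptual:Person": ["logical:Person"],
    "logical:Person.location": ["platform:Person.location"],
    "logical:Person": ["platform:Person"],
    "platform:Person.location": ["platform:PersonView.location"],
    "platform:Person": ["platform:PersonView"],
    "platform:PersonView.location": ["uop:TrackerComponent.personMessage"],
    "platform:PersonView": ["uop:TrackerComponent.personMessage"],
    "uop:TrackerComponent.personMessage": [],
}


def impacted_by(change: str) -> list[str]:
    """Breadth-first walk returning every element reachable from `change`."""
    seen = {change}
    queue = deque([change])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:  # visit each element only once
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted


for element in impacted_by("conceptual:Person.location"):
    print(element)
```

Even in this toy graph, one conceptual change touches eight other elements spread across the conceptual, logical, platform, and UoP levels, which is exactly the bookkeeping a fit-for-purpose tool can take over.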


One of the most valuable benefits of advanced modeling standards is that they require a model rigor and specificity that enable an automated update process. So not only can the mechanical procedures be minimized with a fit-for-purpose tool, but many of the maintenance and update processes themselves have been automated in this new generation of data modeling tools. Thankfully, the teaspoon is no longer the only choice.


Techniques & Best Practices (Digging the Trench)

Whether digging a trench with a spoon or a shovel, the basic premise is the same: scoop and transport soil from one place to another until the trench meets specifications. Beyond the mechanics of spoon or shovel, then, what practices should be followed to produce a high-quality model?


Taken as a whole, model quality is a subjective measure. Evolving standards efforts, however, are enabling new objective measures of “goodness” that are just beginning to have a positive impact on the industry. Even so, there are fundamental practices modelers can embrace today to build quality data models. These principles apply regardless of available tooling and yield data models that are more scalable, reusable, and maintainable, and that ultimately deliver greater value over time.


1. Begin with Education

It is particularly important for those sponsoring, leading, supporting, or executing the modeling to understand the Why. For many, this approach is new and unfamiliar, requires a change in perspective and thinking, and usually takes more time and effort upfront. Why is data modeling to best-practice standards the right move? Is the long-term value worth the short-term investment? Once the value proposition is understood, there is still a significant learning curve, and practice is required in terms of What should be modeled and How.


2. Capitalize on Existing Models

Building data models from scratch is daunting and expensive. Thankfully, the foundational building blocks need to be built only once; subsequent projects benefit by reusing the groundwork of a shared data model. The Future Airborne Capability Environment (FACE) Shared Data Model (SDM), for example, provides an ideal starting point for a data model that is intended to align with the FACE Technical Standard. It spares teams from starting from scratch and provides them with an approved (and managed) set of observables and measurements.
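Reuse of a shared model can also be checked mechanically. The sketch below assumes a hypothetical handful of shared observables and a hypothetical project entity; the real FACE SDM is far larger and is not distributed as a Python set. It simply reports which characteristics rely on an observable outside the approved set, so those can be reviewed before anyone extends the shared model.

```python
# Hypothetical stand-in for an approved, managed set of shared observables.
SHARED_OBSERVABLES = frozenset({"Position", "Orientation", "Identifier", "Temperature"})

# Hypothetical project entity: characteristic name -> observable it represents.
AIRCRAFT = {
    "position": "Position",
    "heading": "Orientation",
    "tailNumber": "Identifier",
    "fuelGrade": "FuelGrade",  # not in the shared set, so it gets flagged
}


def unshared_characteristics(entity: dict[str, str], shared: frozenset[str]) -> dict[str, str]:
    """Return the characteristics whose observable is not in the shared set."""
    return {name: obs for name, obs in entity.items() if obs not in shared}


print(unshared_characteristics(AIRCRAFT, SHARED_OBSERVABLES))  # {'fuelGrade': 'FuelGrade'}
```

Everything not flagged is reused as-is; only the flagged items need new modeling work or a request to extend the shared model.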


Further efficiencies can be realized by also starting with a Domain Specific Data Model (DSDM), which details the entities and relationships applicable to a particular domain or System of Systems (SoS).


3. Question Relationships from the Start

When developing the entity model, one usually begins by identifying an entity (a real-world object) and quickly moves on to defining that entity’s attributes. There is, however, a human tendency to cast too wide a net when defining attributes. Just as “correlation does not imply causation” in data science, “correlation does not imply composition” in data modeling. It is important to thoughtfully consider whether that correlation is indeed best defined as an attribute or as some other kind of relationship.


To test whether an attribute is suitable, ask: does the attribute make sense without additional information in the label? For example, suppose the entity Person defines an attribute address. If address is intended to refer to the address of the person, then this construction passes the test. This style of modeling reduces ambiguity because it does not rely on the attribute label to provide essential meaning.


Semantics should not need to be included in the attribute label. If, for instance, the entity Person includes an attribute officeAddress, the address clearly does not refer to the person; it refers to the address of the person’s office. While this may be a convenient implementation, it is not very flexible. Such a construction forces a brittle and problematic relationship between the person and the address of the person’s office. Will a person always have an office? What if the person is unemployed? Can an office exist without a person? Does an office have an address even if it is not assigned to a person?
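The label test can be approximated with a naive lint. This is a heuristic assumed for illustration, not a rule taken from any standard: it splits a camelCase label into words and flags it when a known type (here drawn from a hypothetical `KNOWN_TYPES` vocabulary) is prefixed by qualifying words. It would flag officeAddress and pass address, but a flagged label is only a prompt to ask the questions above, not proof of a modeling error.

```python
import re

# Hypothetical vocabulary of characteristic types known to the model.
KNOWN_TYPES = {"address", "position", "name", "temperature"}


def split_camel(label: str) -> list[str]:
    """Split a camelCase label into lowercase words, e.g. officeAddress -> office, address."""
    return [word.lower() for word in re.findall(r"[a-z]+|[A-Z][a-z]*", label)]


def label_carries_semantics(label: str) -> bool:
    """Heuristic: flag labels in which extra words qualify a known type.

    'address' passes; 'officeAddress' is flagged because 'office' names a
    concept that arguably deserves its own entity and relationship.
    """
    words = split_camel(label)
    return len(words) > 1 and words[-1] in KNOWN_TYPES


for label in ["address", "officeAddress", "name"]:
    print(label, label_carries_semantics(label))
```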


4. Document “Sometimes” Relationships as Associations

When defining an entity and its attributes, determine if any of those attributes are better represented as an association. An association represents a relationship between two entity types.


At the conceptual level, is a person characterized by the concept of an office address? In fact, the address is a characteristic of the office, not of the person. An implementation employing an association allows for more flexibility, eliminates the need for duplication, and reduces the possibility of related errors. In such a model, two entities are created – Person and Office. Their relationship is modeled through an association – Office Assignment – which contains two properties, or associated entities: Person and Office. The association joins the two entities through a specific relationship.
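A minimal sketch of this association, using Python dataclasses as a stand-in for model elements; the class and field names are illustrative, and `room` is a hypothetical extra characteristic. The address lives on Office, Office Assignment joins a Person to an Office, and both an unassigned person and an unoccupied office remain valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Office:
    room: str
    address: str  # the address characterizes the office, not the person


@dataclass(frozen=True)
class OfficeAssignment:
    """Association joining one Person to one Office."""
    person: Person
    office: Office


def office_addresses(person: Person, assignments: list[OfficeAssignment]) -> list[str]:
    """Addresses of every office assigned to `person`; empty if there are none."""
    return [a.office.address for a in assignments if a.person is person]


alice = Person("Alice")
bob = Person("Bob")                    # unemployed: simply has no assignment
hq = Office("2B", "100 Main St")
vacant = Office("3C", "100 Main St")   # an office can exist with no occupant
assignments = [OfficeAssignment(alice, hq)]

print(office_addresses(alice, assignments))  # ['100 Main St']
print(office_addresses(bob, assignments))    # []
```

Because the address is stored once on Office, reassigning Alice or adding a second occupant changes only the assignments, never a copied address.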


5. Document “Always” Relationships as Compositions

If two entities are correlated through an “always” relationship, that implies composition – a relationship between a whole and its parts. If it is a “sometimes” relationship, that implies association.


Suppose the Person entity includes a composed entity, kidney. Much like the address example, this model indicates that the person will always have a kidney and the kidney will always be a part of the person. For many, if not most, domains, this model is entirely sufficient. However, what happens when we try to apply this model to the domain of a transplant surgeon?


In the domain of a transplant surgeon, a kidney may not remain part of the same human forever, so an association such as Belongs To must be created. This relationship increases the complexity of the data model, if only slightly. There is always a cost to creating a more sophisticated model, and that cost must be weighed against the value the added complexity provides in the specific domain being modeled.
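The trade-off can be sketched the same way. In the general-domain version the kidney is simply a composed part of the person; in the surgeon's domain the kidney becomes an entity in its own right, and a Belongs To association records which person it belongs to over a time interval. The `serial` identifier, the integer day numbers, and the half-open interval convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Kidney:
    serial: str  # hypothetical tracking identifier


@dataclass(frozen=True)
class ComposedPerson:
    """General domain: the kidney is always part of the person (composition)."""
    name: str
    kidney: Kidney


@dataclass(frozen=True)
class Person:
    """Surgeon's domain: the person no longer holds the kidney directly."""
    name: str


@dataclass(frozen=True)
class BelongsTo:
    """Association valid over days [start, end); end=None means 'still current'."""
    kidney: Kidney
    person: Person
    start: int
    end: Optional[int] = None


def owner_at(kidney: Kidney, day: int, links: list[BelongsTo]) -> Optional[Person]:
    """Return the person the kidney belongs to on `day`, or None if no link applies."""
    for link in links:
        if link.kidney is kidney and link.start <= day and (link.end is None or day < link.end):
            return link.person
    return None


organ = Kidney("K-001")
donor, recipient = Person("Donor"), Person("Recipient")
links = [BelongsTo(organ, donor, start=0, end=100), BelongsTo(organ, recipient, start=100)]

print(owner_at(organ, 50, links).name)   # Donor
print(owner_at(organ, 150, links).name)  # Recipient
```

The extra entity and the interval logic are the "slight" added complexity; in a domain where organs never move, the composed version is the simpler and better choice.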


6. Model for Uniqueness

Characteristic Uniqueness

A person may be assigned more than one type of address (office, home, vacation, etc.), but can a person have more than one position? A position may be represented in multiple ways at the logical level, but conceptually, can a person be in two places at once? Conceptually, a person has only a single position.


Does a person have more than one kidney? Some do, some don’t, but this is a matter of multiplicity. A person has only one concept of a kidney. If there is something fundamentally different between the left and right kidney, the kidney entity should be further decomposed and modeled with the difference clearly defined.
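One way to apply this advice is a simple review check that flags an entity with two characteristics of the same type, which often signals a hidden relationship (a second position that is really the location of something else) or a part that should carry a multiplicity instead. The tuple layout and type names below are hypothetical, and the check is a sketch inspired by the guidance above, not a rule quoted from a standard.

```python
from collections import Counter

# Hypothetical entity description: (characteristic name, type, multiplicity).
PERSON = [
    ("position", "Position", 1),
    ("kidney", "Kidney", 2),          # one concept of kidney; multiplicity carries the count
    ("homePosition", "Position", 1),  # second Position: really the location of a home
]


def duplicate_types(characteristics: list[tuple[str, str, int]]) -> list[str]:
    """Return, sorted, every type that appears on more than one characteristic."""
    counts = Counter(ctype for _, ctype, _ in characteristics)
    return sorted(ctype for ctype, n in counts.items() if n > 1)


print(duplicate_types(PERSON))      # ['Position']
print(duplicate_types(PERSON[:2]))  # []
```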


Entity Uniqueness

An entity is characterized by its properties. How then are two entities with the same set of properties differentiated from one anothe