Data Modeling Is Hard vs. Data Modeling is Hard



This paper outlines and discusses several real and perceived aspects of data modeling when following a highly prescribed and connected data architecture. We explore where difficulty and effort are currently being expended and how that affects the desire for, and the projected benefit of, building these syntactic and semantic models.


The path to building, maintaining, and adopting these data models and the supporting data architecture is not to make it easier to build multiple models from scratch. How many times do we need to build a conceptual representation of an air vehicle entity? Each time we build an avionics-related component? No, we simply need one. The only real path to easing the burden on software providers is to get to the point where they can begin with the 90% solution, so that the rest can be integrated into the model with the smallest increment of effort.


This topic is complex and mired in technical details and nuances. We would rather this paper not be used as a sleep aid, so please, join us for a conversation on data modeling approaches, processes, tooling, and the resulting ramifications of having a ‘good’ model.


1. Introduction

The BITS (BALSA (Basic Avionics Lightweight Source Archetype) Integration & Test Session) out-brief at the December 2016 meeting of the Future Airborne Capability Environment (FACE™) Consortium was a fascinating affair. The presenters lined up, showcasing their excellent work with the BALSA framework and how they were able to quickly and easily integrate their solutions using the FACE architecture. They demonstrated how they developed a modular solution, insulating their system against obsolescence concerns and providing forward compatibility. It was even shown how they could replace pieces of the BALSA framework with their own implementations. They extolled the clear boundaries and how easy it was to connect their modules to these interfaces. They lavished praise on the FACE architecture and utility of BALSA. And they produced exciting demonstrations of capability that any team using the FACE architecture would be proud to include in their portfolio of accomplishments.


Then the other shoe dropped. These comments were common throughout the presentations.


“But we spent so much time on the data model.”
“Data modeling is hard.”
“Why are we even bothering with this data model?”
“It took me too much time to add my content.”
“The tools were not useful.”


These comments came as a bit of a surprise, though not because someone found data modeling to be difficult. Almost anyone who has worked with large data models is painfully aware of how challenging it can be. Decomposing our view of the world is hard, especially when several engineers are working to develop a consistent product. This is further compounded because each member of the team is predisposed to thinking about the information in a particular way.


Furthermore, our thinking has been heavily influenced by decades of software development in which the messages we use for communication are often considered to be the data model. We mistake the messages for the interfaces, and when we make this assumption, we end up with single-use data models that are useful only as conformance artifacts. Data models developed this way have little additional use.


While each one of these concerns is significant and deserves to be addressed at length, they fail to capture a far more fundamental challenge: one expressed using the exact same words, but with a completely different meaning. Data modeling is hard.


Consider, for a moment, the simplicity of the BALSA data model. It is not filled with hundreds of entities. It is not even filled with dozens of entities, nor are the entities in the data model cross-connected with complex relationships that need to be navigated.


How then, if the BALSA data model is so simple, can data modeling be so hard?

Therein lies the disconnect. Creating large data models is difficult for a multitude of reasons. It is difficult for a team to build a large data model with any degree of consistency. It is hard to decompose entities to a comparable level of granularity. It is hard to develop a rigorous naming convention.


But this is not a large model being worked on by a large team. What is the challenge?

The actual mechanics of data modeling.


Forget, for a moment, about the challenges that stem from developing large models. These users are not struggling with modeling at scale. They are having a hard time with modeling basics. How is the data entered? Why is the data entered three different times at three different levels of abstraction? How many ways are these different levels of abstraction connected? Why is it nearly impossible to define a new measurement? Why do I have to click twenty times to add a single property to my entity?

Without drilling down to a deeper understanding, we will continue to talk past each other about why data modeling is truly difficult and fail to move this aspect of the FACE Technical Standard forward.


2. Mechanical Difficulties

After hearing of the difficulties the BITS participants were having with data modeling, we set out to test this idea within our team. As longtime members of the Data Architecture Working Group (for this and other standards) with years of experience modeling with Enterprise Architect by Sparx Systems, we fully expected the data modeling process to be relatively trivial. Our goal was to build a small, valid data model with a limited set of measurements and entities for use by our test team.


To accomplish this, we needed to create five new entities each with an average of four properties. Some of these properties were observables and some were other entities. Some of the observables were represented by numeric values, some with text, and some with a list of values (enumerations).


Modeling at the conceptual level was relatively pain-free. It was trivial to create the entities and point them to the corresponding observables and entities. The amount of effort required was completely in line with expectations.


Surprisingly, it took around twenty mouse clicks to add the first property to an entity. This may not be the most useful metric, since it does not take long to run up a tab of twenty mouse clicks, but it is indicative of the amount of attention required to add an attribute. This number of mouse clicks captures the effort required to add a new attribute (2 clicks or Control-N), set its visibility property (3 clicks), connect it to its corresponding observable or entity (4 clicks plus navigating through the model to find the observable, or 6 clicks if you use the search options), and set its stereotype (6 clicks). It may not have been necessary to expend all of those mouse clicks getting the settings exactly right, but these properties were set so that the new attributes had documentation comparable to those already in the model.
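To make the arithmetic concrete, the short Python sketch below tallies the itemized clicks from the paragraph above and extrapolates them to the test model described earlier (five entities averaging four properties each). The function names are ours, and the assumption that every property costs the same as the first is an illustration only, not a measurement. Navigation time and corrections are excluded, which is consistent with the observed figure of around twenty.

```python
# Click counts per step, taken from the itemized list in the text.
CLICKS_PER_STEP = {
    "add attribute": 2,
    "set visibility": 3,
    "connect to observable or entity": 4,  # by navigating the model
    "set stereotype": 6,
}

# The text reports 6 clicks for the connect step when the search options
# are used instead of navigation, i.e. 2 more than the navigation path.
SEARCH_EXTRA_CLICKS = 2


def clicks_per_property(use_search: bool = False) -> int:
    """Itemized clicks needed to add one property to an entity."""
    total = sum(CLICKS_PER_STEP.values())
    return total + SEARCH_EXTRA_CLICKS if use_search else total


def clicks_for_model(entities: int, properties_per_entity: int,
                     use_search: bool = False) -> int:
    """Extrapolated clicks, ASSUMING every property costs the same."""
    return entities * properties_per_entity * clicks_per_property(use_search)


if __name__ == "__main__":
    print(clicks_per_property())                  # 15
    print(clicks_per_property(use_search=True))   # 17
    print(clicks_for_model(5, 4))                 # 300
    print(clicks_for_model(5, 4, use_search=True))  # 340
```

Even under this simplified assumption, a model with only twenty properties would call for roughly three hundred clicks at the conceptual level alone.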


The next level of modeling – the logical level – was far more difficult than was expected. While reusing an existing measurement was relatively easy, creating a new measurement was difficult. Creating the individual pieces of the measurement and measurement system was not terribly challenging since the meta-model (the rules for how the FACE Technical Standard says things should be glued together) describes how these pieces are intended to fit together. The primary difficulty arose in fitting the pieces together such that they produced a valid model when exported from the tool.

This same pain was experienced while modeling at the platform level, as well. There were many values that needed to be set properly to wire the model together. A clear understanding of the meta-model does not directly translate into how things are done in the tooling.


The tools from Vanderbilt University were used to export the data model, and the error messages generated by those tools were used to ensure a valid model was exported. This step turned out to be a challenge as well. While the error messages were very accurate, our team had a difficult time correlating them with the offending entities in the model. We developed a couple of custom SQL queries to mine the model for additional information, but these failed to locate the sources of the errors.
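As an illustration of the kind of query one might try, the sketch below looks for attributes whose type reference is missing or points at an element that does not exist. It is not one of the queries our team wrote. The two-table schema (element, attribute) is a hypothetical simplification built in an in-memory SQLite database, not the actual repository schema of any modeling tool, and the entity and attribute names are invented for the example.

```python
import sqlite3


def build_demo_repository() -> sqlite3.Connection:
    """Create a tiny in-memory model repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE element (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE attribute (
            id         INTEGER PRIMARY KEY,
            element_id INTEGER NOT NULL,   -- owning entity
            name       TEXT NOT NULL,
            type_id    INTEGER             -- observable or entity it points to
        );
        INSERT INTO element (id, name) VALUES (1, 'AirVehicle'), (2, 'Position');
        INSERT INTO attribute (element_id, name, type_id) VALUES
            (1, 'position',   2),     -- wired correctly
            (1, 'tailNumber', NULL),  -- never connected
            (1, 'heading',    99);    -- points at a missing element
    """)
    return conn


def find_unwired_attributes(conn: sqlite3.Connection) -> list:
    """Return (entity, attribute) pairs whose type reference does not resolve."""
    query = """
        SELECT e.name, a.name
        FROM attribute AS a
        JOIN element AS e ON e.id = a.element_id
        LEFT JOIN element AS t ON t.id = a.type_id
        WHERE t.id IS NULL
        ORDER BY e.name, a.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(find_unwired_attributes(build_demo_repository()))
    # [('AirVehicle', 'heading'), ('AirVehicle', 'tailNumber')]
```

A query like this only finds problems that are visible in the tables it inspects, which may be one reason such queries can miss errors that surface only on export.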


Ultimately, we found that the best source of understanding came from inspecting the (partially) exported XMI file for missing connectors. This reinforces the difficulty in the mechanics of data modeling. The graphical tools are intended to make it easier to build models and to abstract the esoterica that is the XMI representation. In this case, our team found it more useful to reference the user-unfriendly text-based format for debugging.

It is worth mentioning that the XMI format is the only officially recognized data model representation. The comment about its lack of user-friendliness is not a criticism of the standard, but rather a statement of the difficulty required in parsing a verbose text file instead of viewing diagrams in a software tool. Our team frequently works directly from the XMI, but this came at the cost of many hours of studying data models in this format.
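Part of the manual XMI inspection described above can be automated. The sketch below, using only the Python standard library, collects every identifier defined in a document and reports references that point at nothing, which is one way a missing connection can show up. The sample document, its element names, and the use of a "type" attribute as the reference are hypothetical stand-ins chosen for brevity; they are not the FACE data model schema, and a real export would need the reference attributes adjusted to match.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the xmi:id attribute in this sample document.
XMI_NS = "http://www.omg.org/XMI"
XMI_ID = f"{{{XMI_NS}}}id"

# Hypothetical, heavily simplified XMI-like sample (not the FACE schema).
SAMPLE = """<model xmlns:xmi="http://www.omg.org/XMI">
  <element xmi:id="e1" name="AirVehicle">
    <property xmi:id="p1" name="position" type="e2"/>
    <property xmi:id="p2" name="heading" type="m9"/>
  </element>
  <element xmi:id="e2" name="Position"/>
</model>
"""


def find_dangling_references(xmi_text: str, ref_attrs=("type",)) -> list:
    """Return (element name, attribute, target id) for unresolved references."""
    root = ET.fromstring(xmi_text)

    # First pass: every identifier that is actually defined in the document.
    defined = {el.get(XMI_ID) for el in root.iter() if el.get(XMI_ID)}

    # Second pass: any reference attribute whose target is not defined.
    dangling = []
    for el in root.iter():
        for attr in ref_attrs:
            target = el.get(attr)
            if target is not None and target not in defined:
                dangling.append((el.get("name"), attr, target))
    return dangling


if __name__ == "__main__":
    print(find_dangling_references(SAMPLE))
    # [('heading', 'type', 'm9')]
```

The report names the offending element directly, which is the correlation our team found hard to obtain from the export error messages alone.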


3. Other Difficulties

Data modeling mechanics are not the only challenges encountered when building a data model. While data modeling has been widely practiced in the world of databases for decades, the data models prescribed by the FACE Technical Standard are a slight twist on traditional practices. Adjusting to different modeling concepts may not necessarily be a challenge for seasoned modelers, but it does steepen the learning curve for first-time modelers. Not only are there fewer experts to consult, there are also few sources (beyond the standard itself) that explain the novelties of these modeling practices.


There is also a perception among some that data modeling is merely busy work that we must do to achieve conformance. While a data model is necessary to achieve conformance, conformance does not represent a complete picture of the model’s utility. The Conformance Test Suite (CTS) analyzes the data model along with the corresponding software’s object code in order to ensure that the software is constructed as advertised. It does not verify the logical behavior of the software, but it does verify that the software interfaces interact with only precisely what is documented in the data model. Although the data model could be shelved after conformance is achieved, it has much more utility.


First and foremost, the data model can replace an interface control document (ICD). If there are behavioral aspects captured in your ICD, you may still need some traditional documentation, but the data model is an unambiguous and consistent documentation of your interfaces. Not only is the data model machine readable (most XML is), the data model is also machine understandable. It is possible to write software that can interpret the documentation contained within the data model.


So what is that useful for? I’m glad you asked. Since the data model captures both the syntax (this is what most ICDs represent) and the semantics (this is what the data actually means and is typically represented in the ICD prose) of the data used in interfaces, this data can be leveraged to facilitate (or even automate) integration between systems.
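A toy example may help show what leveraging semantics for integration could look like. In the Python sketch below, each interface field carries its syntax (a field name) and its semantics (the observable and the unit). Given two such descriptions, software can decide whether the fields are compatible and derive the conversion between them. The Field class, the unit table, and the field names are all hypothetical; they are far simpler than the FACE data model and are meant only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one interface field."""
    name: str        # syntax: the name used in the message
    observable: str  # semantics: what is being measured
    unit: str        # semantics: how it is measured


# Conversion factors to meters (1 foot is defined as exactly 0.3048 m).
TO_METERS = {"meter": 1.0, "foot": 0.3048}


def build_adapter(source: Field, sink: Field):
    """Derive a message converter from the two field descriptions."""
    if source.observable != sink.observable:
        raise ValueError(
            f"{source.name} and {sink.name} describe different observables"
        )
    factor = TO_METERS[source.unit] / TO_METERS[sink.unit]

    def adapt(message: dict) -> dict:
        # Rename the field and rescale the value in one step.
        return {sink.name: message[source.name] * factor}

    return adapt


if __name__ == "__main__":
    producer = Field("alt_ft", "altitude", "foot")
    consumer = Field("altitude_m", "altitude", "meter")
    adapt = build_adapter(producer, consumer)
    print(adapt({"alt_ft": 1000.0}))  # altitude_m is approximately 304.8
```

Neither side had to know about the other in advance; the adapter was derived entirely from the documented meaning of the data. A syntax-only description would not have been enough, because it could not tell whether two fields measure the same thing.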


4. The Coup de Grâce

Up to this point, the discussion of mechanical difficulties has focused on creating the data model, but there are many reasons we might consider changing our models. In some cases, there are actual defects (errors) in the model that need to be corrected. In others, we may come to a better understanding of what our data means, allowing us to increase our semantic specificity. We may also need to change the relationships between certain entities to account for a new use case.


Consider the following, relatively simple, three element data model that shows two entities connected by an association. All levels of abstraction plus a unit of portability are depicted.