Cadalyst CAD Management

Navigating AI in the Engineering World — Managing Data for AI Applications

Written by Cadalyst Staff | May 7, 2025 4:27:00 PM

An important, but often overlooked, aspect of artificial intelligence (AI) is the management of the data that feeds AI systems. Without proper data management, efforts to implement AI can go awry, producing unreliable results or other undesirable outcomes. While data management in an AI setting can take various forms, the process can be divided into five main steps: data collection, data preparation, data storage, data analysis, and data monitoring. Each step requires careful planning to ensure that AI delivers meaningful insights that improve efficiency. Data used to build Large Language Model (LLM) applications such as AI assistants must be clean and stored in a common data environment, and meeting those requirements takes a structured, systematic approach to data management.

In our first article, we discussed how to evaluate your readiness for AI. Our second article then explored how to start implementing AI at your organization, while our third article examined how to reap benefits and avoid common pitfalls of AI. In this article, we will focus on data management for AI.


Image source: solom/stock.adobe.com.

Data Collection

To harness the power of AI, you need reliable, robust data as a basis for your AI model. Sources of data might be internal (e.g., based on your organization’s previous projects, BIM data, or maintenance records), external (e.g., public datasets, regulations, or geospatial data), or some combination of such data.

Regardless of source, you will want to gather the available data, review its suitability, and decide which datasets to use for training AI tools. The underlying data should be clean, well organized, standardized, and drawn from reliable sources. Each data source should be clearly labeled and documented for future use.
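
One lightweight way to keep data sources labeled and documented is to record a small catalog entry for each candidate dataset and refuse to use it for training until the entry is complete. The Python sketch below is a minimal illustration under that assumption; the `DatasetRecord` fields and the `missing_documentation` helper are hypothetical, not part of any particular product or standard.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical catalog entry; the fields are illustrative, not a standard schema.
@dataclass
class DatasetRecord:
    name: str
    source: str          # e.g. "internal" (past projects) or "external" (public datasets)
    owner: str           # person or team responsible for the data
    collected_on: date
    description: str = ""
    tags: list = field(default_factory=list)

# Text fields that must be filled in before a dataset is approved for training.
REQUIRED_TEXT_FIELDS = ("name", "source", "owner", "description")

def missing_documentation(record):
    """Return the required text fields that are empty or whitespace-only."""
    return [f for f in REQUIRED_TEXT_FIELDS if not getattr(record, f).strip()]

bim = DatasetRecord(
    name="Past BIM projects 2015-2024",
    source="internal",
    owner="",
    collected_on=date(2025, 1, 15),
)
print(missing_documentation(bim))  # ['owner', 'description']
```

In practice this kind of record would more likely live in a shared catalog or database than in code, but the principle is the same: undocumented data is flagged before it reaches the model.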

Clean data is the backbone of any effective LLM. Your dataset must be free from errors, duplicates, inconsistencies, and irrelevant noise. Think of it like this: if you feed junk into a high-performance engine, it’s going to fail.

The same goes for AI — dirty data leads to unreliable, biased, or outright incorrect outputs. For AI assistants to perform at their best, the data they’re built on has to be pristine.

When considering data sources, follow established governance policies for data security. As noted in our previous article, your organization should have a security model with guidelines on data use, data ownership, and password complexity. If you are sharing data, consider whether the audience might include competing organizations that could misuse it. If you use a public cloud for data storage, the cloud provider should follow industry best practices for data security, including data encryption and other security measures. If privacy is a concern, you may want to purchase a subscription and work within that platform to keep your data private.

For example, consider a structural engineering team using AI for predictive maintenance. The team must gather historical sensor data from past bridge inspections and integrate real-time environmental data such as temperature fluctuations and humidity levels.
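
Integrating the two sources usually means lining up records by time. The sketch below pairs each inspection reading with the closest environmental reading and leaves the environmental fields empty when nothing falls within a tolerance, rather than silently attaching stale data. It assumes both sources carry timestamps; the field names (`bridge_id`, `strain_ue`, `temp_c`, `humidity_pct`) and the one-hour tolerance are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest_index(times, t):
    """Index of the timestamp in the sorted list `times` closest to t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1

def enrich_inspections(inspections, env_readings, max_gap=timedelta(hours=1)):
    """Attach the closest environmental reading to each inspection record.

    Assumes env_readings is non-empty. If no reading lies within max_gap
    of the inspection time, the environmental fields are set to None.
    """
    env_sorted = sorted(env_readings, key=lambda r: r["time"])
    times = [r["time"] for r in env_sorted]
    enriched = []
    for rec in inspections:
        env = env_sorted[nearest_index(times, rec["time"])]
        close_enough = abs(env["time"] - rec["time"]) <= max_gap
        enriched.append({
            **rec,
            "temp_c": env["temp_c"] if close_enough else None,
            "humidity_pct": env["humidity_pct"] if close_enough else None,
        })
    return enriched

# Hypothetical sample data.
env = [
    {"time": datetime(2024, 6, 1, 8, 0), "temp_c": 18.5, "humidity_pct": 72},
    {"time": datetime(2024, 6, 1, 12, 0), "temp_c": 24.0, "humidity_pct": 55},
]
inspections = [
    {"bridge_id": "B-101", "time": datetime(2024, 6, 1, 11, 40), "strain_ue": 310},
    {"bridge_id": "B-102", "time": datetime(2024, 6, 1, 15, 0), "strain_ue": 295},
]
result = enrich_inspections(inspections, env)
# B-101 gets the 12:00 reading (20 minutes away); B-102 gets None (3 hours away).
```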

It’s important to remember that clean data is of the utmost importance when working with AI: the adage “garbage in, garbage out” holds true. Image source: Friends Stock/stock.adobe.com.

Data Preparation

After collecting data, you will likely need to spend some time preparing the data for use with AI. This might include cleaning the data to remove duplicate, inaccurate, or irrelevant information, formatting the data for its intended use, or organizing it for efficient use in the AI model.
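
The three cleaning tasks named above (removing duplicate, inaccurate, and irrelevant information) can be sketched in a few lines. The example below is a minimal illustration, not a production pipeline: "irrelevant" is approximated by keeping only chosen fields, "inaccurate" by a plausible-range check, and duplicates are detected after irrelevant fields are dropped. All field names and ranges are hypothetical.

```python
def clean_records(records, required, valid_ranges, keep_fields):
    """Drop irrelevant fields, incomplete rows, out-of-range values, and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Irrelevant information: keep only the fields the model needs.
        row = {k: rec.get(k) for k in keep_fields}
        # Incomplete records: skip rows missing a required value.
        if any(row.get(k) in (None, "") for k in required):
            continue
        # Inaccurate records: skip rows whose values fall outside plausible ranges.
        if any(not (lo <= row[k] <= hi)
               for k, (lo, hi) in valid_ranges.items() if row.get(k) is not None):
            continue
        # Duplicates: skip rows identical to one already kept.
        key = tuple(row[k] for k in keep_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

# Hypothetical raw records exported from a project database.
raw = [
    {"element": "Beam B1", "span_m": 12.0, "material": "steel", "notes": "ok"},
    {"element": "Beam B1", "span_m": 12.0, "material": "steel", "notes": "dup entry"},
    {"element": "Beam B2", "span_m": -4.0, "material": "steel", "notes": ""},
    {"element": "Beam B3", "span_m": 9.5, "material": "", "notes": ""},
    {"element": "Beam B4", "span_m": 8.0, "material": "timber", "notes": ""},
]
cleaned = clean_records(
    raw,
    required=["element", "material"],
    valid_ranges={"span_m": (0.5, 200.0)},
    keep_fields=["element", "span_m", "material"],
)
# Keeps Beam B1 (once) and Beam B4.
```

Note that the second Beam B1 row only becomes a duplicate once the irrelevant `notes` field is dropped, which is one reason to decide what is relevant before deduplicating.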

When formatting data, you’ll need to determine whether the AI model can interpret the various formats in your collected data, such as text, spreadsheets, drawings, 3D models, photographic imagery, and LiDAR scan data. If not, you may need to make adjustments, such as converting data to other formats. In some cases, AI can even assist with data preparation by reviewing data from multiple sources and identifying data that needs conversion or adjustment for use on the project. For example, an architectural firm using generative design AI should ensure its CAD and Revit models are formatted correctly, with consistent layer naming and material specifications, so that the AI can generate optimized design options.
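
Both checks described above, spotting files in formats the AI tool cannot read and spotting layers that break the naming convention, are easy to automate. The sketch below assumes a hypothetical set of supported file extensions and a hypothetical layer pattern (a one-letter discipline code, a four-letter major group, and an optional minor group, such as `A-WALL` or `S-BEAM-STL`); substitute your tool's actual supported formats and your office's own CAD standard.

```python
import re
from pathlib import Path

# Hypothetical: formats the chosen AI tool is assumed to accept directly.
SUPPORTED = {".txt", ".csv", ".pdf", ".ifc"}
# Hypothetical office convention: discipline letter, 4-letter major group,
# optional 2-4 character minor group, all uppercase.
LAYER_PATTERN = re.compile(r"^[A-Z]-[A-Z]{4}(-[A-Z0-9]{2,4})?$")

def triage_files(paths):
    """Split file paths into those the AI tool can read and those needing conversion."""
    ready, convert = [], []
    for p in paths:
        (ready if Path(p).suffix.lower() in SUPPORTED else convert).append(p)
    return ready, convert

def nonstandard_layers(layer_names):
    """Return layer names that do not follow the layer-naming pattern."""
    return [name for name in layer_names if not LAYER_PATTERN.match(name)]

ready, convert = triage_files(["specs.pdf", "site.las", "tower.rvt", "loads.csv"])
bad = nonstandard_layers(["A-WALL", "S-BEAM-STL", "walls new", "A-Door"])
# ready   -> ['specs.pdf', 'loads.csv']
# convert -> ['site.las', 'tower.rvt']
# bad     -> ['walls new', 'A-Door']
```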

Bridge modeling software is revolutionizing construction with intelligent 3D models that simulate stress tests, detect conflicts, and help users collaborate. Image source: Navisworks rendering of M4 London motorway, Atkins.

Data Storage

Effective data storage is essential for managing large datasets and providing proper access. AI data is often stored in structured databases with querying capabilities, though it may also be stored in centralized repositories that keep raw data in its native format. As with the data collection process, data storage requires close attention to security and compliance with internal and external guidelines.
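
The two storage styles can also be combined: project metadata sits in a queryable database, while each row points to the raw file kept in its native format in a central repository. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely for illustration; the table layout, the design-code labels, and the `repo://` paths are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """CREATE TABLE projects (
           id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           completed_year INTEGER,
           design_code TEXT,
           raw_file_uri TEXT  -- pointer to the native-format file in a central repository
       )"""
)
conn.executemany(
    "INSERT INTO projects (name, completed_year, design_code, raw_file_uri) VALUES (?, ?, ?, ?)",
    [
        ("Riverside Bridge", 2019, "CODE-2017", "repo://projects/riverside/model.rvt"),
        ("Depot Retrofit", 1998, "CODE-1994", "repo://projects/depot/plans.dwg"),
    ],
)
# Query the structured metadata; the raw files themselves stay untouched.
rows = conn.execute(
    "SELECT name, raw_file_uri FROM projects WHERE completed_year >= ? ORDER BY name",
    (2010,),
).fetchall()
# rows -> [('Riverside Bridge', 'repo://projects/riverside/model.rvt')]
```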

Cloud storage is the preferred approach for maximizing data access for AI, though only applicable data should be used to harvest information. While the AEC industry may have 30 to 40 years’ worth of CAD modeling data available, organizations need to be careful with older data: design codes may have changed since it was created, or other factors may make it inapplicable.

Along with data security policies, organizations should establish data retention policies that define which past projects are applicable. “The importance of data strategy is often underestimated,” said Scott Wolslager, Senior Engagement Manager at IMAGINiT Technologies. “You need to have a window around datasets as far as what you’re actually going to use.”
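
A retention window like the one Wolslager describes can be expressed as a simple filter applied before any data reaches the model. The sketch below assumes each project record carries a completion year and the design-code edition it was designed to; the 15-year window and the list of superseded code labels are hypothetical policy values that each organization would set for itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Project:
    name: str
    completed_year: int
    design_code: str  # hypothetical label for the code edition used

# Hypothetical policy values; each organization would set its own.
RETENTION_WINDOW_YEARS = 15
SUPERSEDED_CODES = {"CODE-1994", "CODE-2005"}

def applicable_projects(projects, current_year):
    """Keep projects inside the retention window that were not designed to superseded codes."""
    cutoff = current_year - RETENTION_WINDOW_YEARS
    return [
        p for p in projects
        if p.completed_year >= cutoff and p.design_code not in SUPERSEDED_CODES
    ]

projects = [
    Project("Riverside Bridge", 2019, "CODE-2017"),
    Project("Depot Retrofit", 1998, "CODE-1994"),   # outside the window
    Project("Harbor Garage", 2012, "CODE-2005"),    # inside the window, superseded code
    Project("Transit Hub", 2021, "CODE-2017"),
]
usable = applicable_projects(projects, current_year=2025)
# usable -> Riverside Bridge, Transit Hub
```

Checking both age and code edition matters because a recent project can still rest on an outdated code, and an older one may occasionally remain valid.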

Data Analysis and Validation

After data has been collected, prepared for use, and stored in accessible locations, additional data analysis will be needed before, during, and after generating results with AI. This might include analyzing the data’s characteristics beforehand to identify which are most pertinent to the project of interest and, potentially, to future projects. For example, if AI is used to generate design alternatives, characteristics such as cost, material selection, sustainability, building codes, and other design criteria may need to be identified up front to train the model accordingly.
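
As one simple starting point for deciding which characteristics matter, a team could rank numeric characteristics of past projects by how strongly each correlates with an outcome such as cost. The sketch below computes a Pearson correlation by hand; the column names and figures are hypothetical. Correlation only captures linear relationships and says nothing about cause, so the ranking is a prompt for engineering judgment, not a substitute for it.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_characteristics(rows, target):
    """Rank numeric characteristics by the strength of their linear relationship to target."""
    ys = [r[target] for r in rows]
    names = [k for k in rows[0] if k != target]
    scores = {k: pearson([r[k] for r in rows], ys) for k in names}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical past-project data.
rows = [
    {"floor_area_m2": 1000, "recycled_pct": 10, "storeys": 3, "cost_musd": 2.0},
    {"floor_area_m2": 2000, "recycled_pct": 30, "storeys": 1, "cost_musd": 4.0},
    {"floor_area_m2": 3000, "recycled_pct": 20, "storeys": 2, "cost_musd": 6.0},
    {"floor_area_m2": 4000, "recycled_pct": 40, "storeys": 2, "cost_musd": 8.0},
]
ranking = rank_characteristics(rows, target="cost_musd")
# Order: floor_area_m2 (r = 1.0), recycled_pct (r = 0.8), storeys (r ~ -0.32)
```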

Data validation consists of assessing the quality and consistency of the prepared data to identify potential biases or issues. This should be done before deploying the AI model and also when reviewing AI-generated results. Throughout the process, AI can help analyze the data for accuracy, completeness, and consistency. For example, AI-powered cost estimation tools can validate price changes against up-to-date supplier costs.
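
The cost-estimation example can be illustrated with a basic validation pass: each estimate line item is checked for completeness and compared against a supplier price list, and anything deviating beyond a tolerance is flagged for review. This is a minimal sketch; the material keys, prices, and the 10% tolerance are hypothetical, and a real tool would pull current prices from supplier data rather than a hard-coded dictionary.

```python
def validate_estimate(line_items, supplier_prices, tolerance=0.10):
    """Flag line items that are incomplete, lack a supplier price, or deviate beyond tolerance."""
    issues = []
    for item in line_items:
        material = item.get("material")
        price = item.get("unit_price")
        if material is None or price is None:
            issues.append((item.get("id"), "incomplete"))
        elif material not in supplier_prices:
            issues.append((item["id"], "no supplier price"))
        else:
            ref = supplier_prices[material]
            deviation = (price - ref) / ref
            if abs(deviation) > tolerance:
                issues.append((item["id"], f"deviates {100 * deviation:+.0f}% from supplier"))
    return issues

# Hypothetical supplier price list and estimate.
supplier = {"rebar_t": 900.0, "concrete_m3": 150.0}
items = [
    {"id": 1, "material": "rebar_t", "unit_price": 920.0},      # within tolerance
    {"id": 2, "material": "concrete_m3", "unit_price": 120.0},  # 20% below supplier
    {"id": 3, "material": "glulam_m3", "unit_price": 700.0},    # no reference price
    {"id": 4, "material": "rebar_t"},                           # missing price
]
issues = validate_estimate(items, supplier)
# [(2, 'deviates -20% from supplier'), (3, 'no supplier price'), (4, 'incomplete')]
```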

CAD standards become even more important when implementing AI: make sure your team uses consistent layer naming, material specifications, and the like. Image source: Autodesk.

Data Monitoring

Data management does not end once an AI model is in use. The data feeding the model should be monitored on an ongoing basis: new inputs should be checked for the same errors, duplicates, and inconsistencies removed during preparation, and the data should be reviewed periodically to confirm it still reflects current projects, design codes, and supplier costs. AI-generated results should also be reviewed regularly so that problems in the underlying data are caught before they affect design decisions.
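
One common monitoring technique is a drift check: compare each new batch of incoming data against a baseline and raise an alert when it shifts further than normal variation would explain, which can reveal a faulty sensor or a change in conditions. The sketch below uses a simple z-score on the batch mean; the humidity figures and the threshold of 3 are hypothetical, and production systems often use more robust statistical tests.

```python
from statistics import mean, stdev

def drift_alert(baseline, new_batch, z_threshold=3.0):
    """Return True if the new batch's mean has shifted suspiciously far from the baseline.

    Assumes at least two baseline values and at least one new value.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    standard_error = sigma / len(new_batch) ** 0.5
    if standard_error == 0:
        # A perfectly constant baseline: any change in the mean is drift.
        return mean(new_batch) != mu
    z = abs(mean(new_batch) - mu) / standard_error
    return z > z_threshold

# Hypothetical humidity readings (%) from a bridge sensor.
baseline = [60, 62, 58, 61, 59, 60, 63, 57]     # mean 60, sample std dev 2
print(drift_alert(baseline, [61, 59, 60, 62]))  # False: z = 0.5
print(drift_alert(baseline, [75, 78, 74, 77]))  # True: z = 16
```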

In the End

As with most technical processes, the principle of “garbage in, garbage out” applies when using AI. If you don’t feed meaningful, reliable data into an AI model, you are unlikely to obtain meaningful, reliable output. AI can be a powerful tool, but it is only as good as the underlying data.

Engineering and architecture organizations that invest in structured, high-quality data can gain a competitive edge using AI to make better design decisions, optimize costs, and reduce project risks.

To successfully integrate AI into your workflows, partner with IMAGINiT Technologies to leverage its expertise in technology, AI implementation, and software solutions. With experience across entire project lifecycles and a wide range of industries, IMAGINiT Technologies can help you with data management and other AI workflows tailored specifically to your organization.

***

ARTICLE SPONSORED BY IMAGINiT Technologies.