Technical debt refers to the concept in software development where taking shortcuts, opting for easy or quick fixes, or employing suboptimal development practices in the short term can lead to additional work, complications, and costs in the long term. This metaphorical "debt" accumulates when a project prioritizes rapid delivery over perfect code.
According to a recent survey by CompTIA involving hundreds of IT professionals, technical debt poses a challenge for 74% of organizations. Notably, 42% of these organizations consider it a substantial hindrance.
Hyperscience CTO Tony Lee asserts that technical debt is not just a byproduct; it's a pivotal force that can make or break the future of AI innovations. Ignoring it is like building on a foundation of sand – precarious and unsustainable.
What other thoughts does Tony have about technical debt and its impact on artificial intelligence? In a recent interview, he shared advice on managing technical debt and ensuring AI solutions remain scalable and maintainable in the long run.
1. In the context of the rapid adoption of AI, what is "technical debt," and why is it a significant concern for organizations venturing into AI?
In software, technical debt generally refers to a debt that accrues as software ages, essentially from the moment it is written. It results from failing to anticipate future needs or from deliberate short-term decisions made to ship faster.
Although not inherently negative, since deployed code is often more valuable than perfect code that never sees the light of day, technical debt requires careful management. If neglected, it can decelerate future development efforts and result in unwieldy code, potentially causing outages or defects that adversely affect customers.
In the AI context, a key manifestation of technical debt is model drift. This phenomenon occurs when the current environment significantly diverges from the one in which the AI model was originally trained.
While there are various types of model drift, a recent, straightforward example is the impact of the COVID-19 pandemic on global traffic patterns. Models developed to forecast traffic flow pre-pandemic would yield highly inaccurate predictions during the lockdowns. Other forms of model drift are much more nuanced. They can lead to a slow degradation of results over time, gradually eroding the model's accuracy in ways that may not be immediately apparent to users.
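To make the idea concrete, here is a minimal sketch of how a team might flag this kind of input drift by comparing a recent feature distribution against its training-era baseline with a two-sample Kolmogorov-Smirnov test. This is an illustration, not part of the interview; the function name, threshold, and traffic numbers are all assumptions.

```python
# Illustrative sketch: flag potential model drift by comparing the live
# distribution of a feature (e.g., hourly traffic volume) against the
# distribution seen at training time. Names and thresholds are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(training_values, recent_values, p_threshold=0.01):
    """Return (drifted, statistic): drifted is True when recent data looks
    statistically different from the training data for this feature."""
    result = ks_2samp(training_values, recent_values)
    return result.pvalue < p_threshold, result.statistic

if __name__ == "__main__":
    rng = np.random.default_rng(seed=42)
    pre_pandemic = rng.normal(loc=1000, scale=120, size=5_000)  # training-era traffic
    lockdown = rng.normal(loc=350, scale=90, size=1_000)        # traffic after a sudden shift
    drifted, score = detect_feature_drift(pre_pandemic, lockdown)
    print(f"drift detected: {drifted} (KS statistic = {score:.3f})")
```

In practice a team would run a check like this per feature, and pair it with output-quality monitoring of the kind discussed next.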
An additional aspect of technical debt in machine learning (ML) systems arises from overlooking the integration of continuous quality monitoring and retraining capabilities. It is tempting to conduct initial accuracy tests upon deploying an ML system and then celebrate these results. However, this approach can be problematic if the system is left to operate without ongoing scrutiny, leading to a gradual, unnoticed decline in performance due to the lack of built-in observability and maintenance mechanisms.
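The observability gap described here can be sketched very simply: keep a rolling accuracy score over human-spot-checked predictions so a gradual decline becomes visible instead of going unnoticed. The class name and alert threshold below are hypothetical, not a description of any particular platform.

```python
# Hypothetical observability helper: track accuracy on a rolling sample of
# human-reviewed predictions so slow degradation triggers attention.
from collections import deque

class RollingAccuracyMonitor:
    def __init__(self, window_size=500, alert_floor=0.95):
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = incorrect
        self.alert_floor = alert_floor

    def record(self, predicted_label, reviewed_label):
        """Log whether a spot-checked prediction matched the human review."""
        self.outcomes.append(1 if predicted_label == reviewed_label else 0)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_attention(self):
        """True when rolling accuracy has slipped below the agreed floor."""
        acc = self.accuracy()
        return acc is not None and acc < self.alert_floor

# Usage: call record() for each spot-checked prediction; when
# needs_attention() returns True, schedule ML QA and retraining.
```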
Given the explosion of interest and investment in generative AI, there's a risk that some organizations may hastily invest in costly solutions without thorough evaluation, potentially leading to unforeseen model drift.
2. Conversely, how can AI help identify or manage existing technical debt?
Generative AI models can combat software technical debt by monitoring and flagging where code may need updates – even performing the updates autonomously. Although human supervision remains indispensable, the utilization of generative AI will reduce the time required to tackle technical debt by taking over simple yet crucial tasks, such as data management.
3. How do you assess and manage technical debt within your AI projects?
We have integrated Machine Learning Quality Assurance (ML QA) and retraining capabilities directly into our platform to manage AI technical debt effectively. It is important to provide transparency regarding the system's accuracy and automation efficiency over time.
This approach empowers my team to continuously conduct ML QA and retrain the model within the platform as necessary.
We also run continuous monitoring and quality assurance processes to detect and mitigate emerging technical debt proactively.
For ML QA, we sample the model outputs and create QA tasks, enabling users to validate the results through a consensus mechanism. If errors are identified, these insights are fed back into the training set, enhancing the model's accuracy and reliability.
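For illustration only, here is a hedged sketch of what such a consensus step could look like: several reviewers validate a sampled output, a majority vote settles the label, and corrected examples flow back into the training set. This is not Hyperscience's implementation; all names and fields are invented for the example.

```python
# Illustrative consensus check: reviewers vote on a sampled model output,
# and disagreements with the model are captured as new training examples.
from collections import Counter

def consensus_label(reviewer_labels):
    """Return the majority label among human reviewers, or None on a tie."""
    counts = Counter(reviewer_labels).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no consensus; escalate for another review round
    return counts[0][0]

def process_qa_task(document_id, model_label, reviewer_labels, training_set):
    agreed = consensus_label(reviewer_labels)
    if agreed is None:
        return "escalated"
    if agreed != model_label:
        # Model was wrong: keep the corrected example for retraining.
        training_set.append({"document_id": document_id, "label": agreed})
        return "corrected"
    return "confirmed"

# Example: three reviewers agree the model misread an invoice total field.
training_examples = []
status = process_qa_task(
    document_id="doc-123",
    model_label="$1,000.00",
    reviewer_labels=["$1,800.00", "$1,800.00", "$1,800.00"],
    training_set=training_examples,
)
print(status, training_examples)
```

A tie simply escalates the task for another review round rather than guessing at a label.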
4. How can organizations balance the urgency of go-to-market strategies with the potential risks of accruing technical debt in AI solutions?
Organizations should confidently embrace new AI advancements in their solutions, striking a balance between a timely go-to-market approach and effective risk management. Incorporating data maintenance and regular monitoring throughout the product lifecycle significantly diminishes the risk of accumulating technical debt and encountering model drift. Investing in AI carries some risk, but it belongs in the tech stack and can yield real innovation and productivity gains. Such investments should still be made deliberately and responsibly to ensure optimal outcomes.
5. How do you ensure that your AI solutions remain scalable and maintainable in the long run, considering the potential for technical debt?
Ensuring a solution remains scalable and maintainable requires continual maintenance by the team. To avoid a costly overhaul later, organizations should establish a process with a steady cadence for training data maintenance and oversight of the model's accuracy. Keeping human oversight as a crucial part of any AI and business strategy is vital to avoiding model drift and, in turn, more technical debt.
6. How do you factor in the costs of addressing technical debt when budgeting for AI projects?
The key is to begin by thoroughly understanding the specific needs and constraints of your use case, along with the potential impact of ML technical debt.
- In applications like ML-based transcription of printed text, you might determine that the likelihood of model drift is relatively low. In such cases, relying on occasional manual quality checks without frequent ML quality assurance (QA) or retraining could be sufficient.
- Using ML models to make mission-critical decisions like reading X-rays for medical treatment requires an aggressive ML QA process with human-based quality checks and retraining as needed.
I’d recommend starting with your use case's needs, defining the requirements, and then picking the software solution that aligns with these requirements to ensure the right balance of efficiency, accuracy, and safety.
7. How does the technical debt in AI differ from that in traditional software development, and how are you adapting to these differences?
Building on the earlier discussion, technical debt in AI systems primarily manifests as model drift. This occurs when the training datasets that models learn from contain biased or inaccurate information, creating unwanted outputs.
Unlike traditional technical debt, which primarily affects system speed and reliability, model drift can have more severe consequences, especially if extreme biases in the training data go unaddressed. Organizations therefore need to take a different approach than they would with traditional technical debt, one that demands stricter attention to the latest regulations and closer scrutiny of data quality.
Adopting strategies to counteract model drift involves a mindset similar to the one used for traditional technical debt.
A practical measure is to allocate a portion of the team's weekly schedule, perhaps 20%, to data hygiene.
By prioritizing data maintenance and dedicating time to these efforts, the probability of AI models generating erroneous outputs is significantly reduced.
8. How do you see the relationship between AI deployment and technical debt evolving in the future?
The recent developments in generative AI solutions are only the beginning of our relationship with the technology. With rapid advancement predicted to continue, the tech industry can expect to work with better datasets to train its models.
As systems become more complex and more data becomes available for training, the risk of technical debt will decrease as models produce more accurate, scalable outputs.
Additionally, as generative AI tools and platforms continue to hone their coding capabilities, a shift toward greater self-regulation by machines, under some degree of human oversight, is likely. Human intervention will remain necessary for verifying training data and code. However, reliance on machine autonomy for routine checks is expected to become more standard.
This evolution will lead to cost reductions as organizations increasingly depend on cutting-edge AI to mitigate technical debt. Achieving this hinges on the availability of a diverse array of data, much of which is yet to be broadly accessible.
Balancing Short-Term Gains With Long-Term Costs
Just like financial debt, technical debt incurs "interest" over time — if not addressed, it can lead to increased maintenance challenges, reduced code quality, and more complex upgrades or fixes in the future. The concept emphasizes the trade-off between short-term benefits and long-term costs in software development and maintenance.