Part 3 of 5

AI System Architecture

⏱ 45-55 min read ☆ Technical Concepts

Introduction

Understanding how AI systems are structured - from initial development through deployment - enables professionals to ask the right questions, identify risks, and make informed decisions about AI initiatives. This part covers the fundamental concepts of training, inference, model types, and how organizations integrate AI through APIs.

Training vs. Inference: The Two Phases

Every AI system goes through two distinct phases, each with different resource requirements, timelines, and risks. Understanding this distinction is fundamental to AI project planning and governance.

Training Phase

  • Creates the AI model from data
  • Happens once (or periodically)
  • Extremely compute-intensive
  • Can take days to months
  • Requires large volumes of training data (labeled, for supervised tasks)
  • High cost, high risk
  • Usually done by specialists

Inference Phase

  • Uses trained model for predictions
  • Happens continuously in production
  • Less compute per request
  • Milliseconds to seconds per query
  • Processes new, unseen data
  • Cost scales with usage
  • Must be reliable and fast

The typical lifecycle runs through these stages in order:

  • Data Collection: Gather training examples
  • Data Preparation: Clean, label, format
  • Training: Model learns patterns
  • Validation: Test performance
  • Deployment: Move to production
  • Inference: Serve predictions
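As a rough illustration of how these stages separate in code, the sketch below fits a one-variable linear model (training), checks it on held-out points (validation), and then reuses the stored parameters on a new input (inference). The data, the function names `train`, `validate`, and `predict`, and the choice of least squares are all illustrative assumptions, not a prescribed method.

```python
from statistics import mean

def train(xs, ys):
    """Training: fit slope and intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def predict(model, x):
    """Inference: apply the stored parameters to a new input."""
    return model["slope"] * x + model["intercept"]

def validate(model, xs, ys):
    """Validation: mean absolute error on data not used for training."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))

# Made-up data that follows y = 2x + 1
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
held_out_x, held_out_y = [4, 5], [9, 11]

model = train(train_x, train_y)                   # done once, up front
error = validate(model, held_out_x, held_out_y)   # checked before deployment
prediction = predict(model, 10)                   # repeated for every new request
```

Note that `train` runs once while `predict` runs on every request, which is why the two phases have such different cost profiles.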

Cost Implications

Training a large language model from scratch can cost tens of millions of dollars in compute alone. However, inference costs - while lower per query - can accumulate to significant ongoing expenses as usage grows. Many organizations underestimate inference costs when budgeting AI projects.
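To see how per-query costs add up, here is a minimal back-of-the-envelope sketch. The price, request volumes, and token counts are hypothetical placeholders, not any vendor's actual rates; substitute figures from your own contract.

```python
def monthly_inference_cost(requests_per_day, tokens_per_request,
                           price_per_1k_tokens, days=30):
    """Estimate monthly spend for a usage-priced model (all inputs are assumptions)."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1000 * price_per_1k_tokens

# Hypothetical pilot: 1,000 requests/day at 1,500 tokens each, $0.01 per 1K tokens
pilot = monthly_inference_cost(1_000, 1_500, 0.01)

# Same application after rollout: 200,000 requests/day
scaled = monthly_inference_cost(200_000, 1_500, 0.01)
```

Under these assumed numbers the pilot works out to about $450 per month and the scaled deployment to about $90,000 per month. The unit price is the same; only the usage changed.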

Model Types by Purpose

AI models are designed for specific types of tasks. Understanding these categories helps in selecting the right approach for a given problem and evaluating vendor solutions.

  • Classification: Assigns inputs to categories. Examples: spam detection, sentiment analysis, fraud detection.
  • Regression: Predicts numerical values. Examples: price forecasting, demand prediction, risk scoring.
  • Generation: Creates new content. Examples: text generation, image creation, code synthesis.
  • Detection: Identifies objects or patterns. Examples: object detection in images, anomaly detection.
  • Translation: Converts between formats. Examples: language translation, speech-to-text.
  • Recommendation: Suggests relevant items. Examples: product recommendations, content personalization.

Model Types by Architecture

Different neural network architectures excel at different tasks. While you don't need to understand the technical details, knowing the major types helps in evaluating solutions.

Transformers

The dominant architecture for language and increasingly for other modalities. Powers ChatGPT, Claude, and most modern language models. The "attention mechanism" allows them to consider context across long sequences.
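For readers who want to see the idea concretely, the following is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification: production transformers add learned projection matrices, multiple attention heads, and many stacked layers, all omitted here. The toy vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of all value vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this position to every position in the sequence
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those weights
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three toy "token" vectors attending to each other (self-attention)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = attention(tokens, tokens, tokens)
```

Each output vector mixes information from every position in the sequence, which is what lets the model take distant context into account.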

Convolutional Neural Networks (CNNs)

Specialized for image and spatial data processing. Used in computer vision applications like image classification, object detection, and medical imaging analysis.
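The core operation in a CNN can be sketched in a few lines: a small grid of weights (a kernel) slides across the image and produces a response at each position. In a real network the kernel values are learned during training; the hand-written edge kernel and the tiny image below are assumptions for illustration only.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 3x4 grayscale "image": dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Responds where brightness increases from left to right
edge_kernel = [[-1, 1]]

feature_map = convolve2d(image, edge_kernel)
```

The resulting feature map is non-zero only in the column where the dark region meets the bright one, which is how convolutional layers pick out local spatial patterns such as edges.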

Recurrent Neural Networks (RNNs)

Designed for sequential data like time series and text. Largely superseded by transformers for language tasks but still used in some specialized applications.

Diffusion Models

The approach behind modern image generation systems such as DALL-E, Midjourney, and Stable Diffusion. These models learn to generate images by starting from random noise and removing it step by step.

Practical Insight

You don't need to choose architectures yourself - that's for technical teams. But understanding that different architectures suit different tasks helps you recognize when a vendor's proposed solution matches (or doesn't match) your needs.

Foundation Models and Transfer Learning

A major shift in AI is the rise of "foundation models" - large pre-trained models that can be adapted for many downstream tasks. This changes the economics and strategy of AI adoption.

Traditional Approach

  • Train specific model for each task
  • Requires task-specific data
  • High cost per application
  • Long development time
  • Limited by available data

Foundation Model Approach

  • Start with pre-trained model
  • Adapt with less data
  • Lower marginal cost
  • Faster deployment
  • Leverage general knowledge
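The foundation model approach can be sketched as a frozen pre-trained component plus a small task-specific part trained on a handful of examples. In the sketch below, `pretrained_features` is a hypothetical toy stand-in for a real foundation model (which would return a learned embedding), and the nearest-centroid "head" is just one simple way to adapt it; neither reflects any particular vendor's method.

```python
def pretrained_features(text):
    """Toy stand-in for a frozen foundation model (hypothetical, never retrained)."""
    words = text.lower().split()
    positive = {"great", "good", "love"}
    negative = {"bad", "poor", "broken"}
    return [sum(w in positive for w in words),
            sum(w in negative for w in words)]

def fit_head(examples):
    """Adaptation: learn one average feature vector (centroid) per label."""
    sums, counts = {}, {}
    for text, label in examples:
        feats = pretrained_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(head, text):
    """Inference: pick the label whose centroid is closest to the input."""
    feats = pretrained_features(text)
    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(feats, centroid))
    return min(head, key=lambda label: distance(head[label]))

# Only four labeled examples are needed because the features already exist
labeled_examples = [
    ("great product, love it", "positive"),
    ("good value", "positive"),
    ("bad quality", "negative"),
    ("broken on arrival", "negative"),
]
head = fit_head(labeled_examples)
result = classify(head, "love the great design")
```

The expensive part (the feature extractor) is reused unchanged, and only the small head is trained, which is why adaptation needs far less data and compute than training from scratch.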

Key Foundation Models to Know

  • GPT-4 (OpenAI): Leading commercial large language model
  • Claude (Anthropic): Focus on safety and helpfulness
  • Gemini (Google): Multimodal capabilities
  • Llama (Meta): Open-weight model for customization
  • DALL-E, Midjourney, Stable Diffusion: Image generation

APIs: How Organizations Access AI

Most organizations consume AI capabilities through APIs (Application Programming Interfaces) rather than building or hosting models themselves. Understanding this is crucial for vendor evaluation and risk management.

What is an AI API?

An API is a standardized way for software systems to communicate. AI APIs allow applications to send data to an AI model and receive predictions or generated content in return, without managing the underlying infrastructure.

# Simplified example of an API call
Request: "Analyze the sentiment of this review: 'Great product!'"

Response: { "sentiment": "positive", "confidence": 0.95 }
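In application code, the same exchange might look roughly like the sketch below. The JSON field names (`task`, `input`, `sentiment`, `confidence`) mirror the simplified example above and are assumptions, not a real vendor's schema. The network call is passed in as `send` and replaced with a stand-in (`fake_send`) so the sketch runs offline.

```python
import json

def build_request(text):
    """Serialize the input as a JSON request body (field names are illustrative)."""
    return json.dumps({"task": "sentiment", "input": text})

def parse_response(body):
    """Decode the JSON response and extract the fields the application needs."""
    data = json.loads(body)
    return data["sentiment"], float(data["confidence"])

def analyze_sentiment(text, send):
    """`send` is any callable that takes a request body and returns a response
    body, for example an HTTPS POST to the vendor's endpoint."""
    return parse_response(send(build_request(text)))

def fake_send(request_body):
    """Stand-in for the vendor's service; returns a canned response."""
    return '{"sentiment": "positive", "confidence": 0.95}'

label, confidence = analyze_sentiment("Great product!", fake_send)
```

Keeping the transport separate from the request and response handling also makes it easier to swap vendors later, since only `send` and the field names would need to change.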

Benefits of API-Based AI

  • No infrastructure management required
  • Access to state-of-the-art models
  • Pay-per-use pricing
  • Automatic updates and improvements
  • Fast implementation

Risks of API-Based AI

  • Data leaves your environment
  • Vendor dependency and lock-in
  • Pricing changes outside your control
  • Service availability risks
  • Limited customization

Governance Questions

When evaluating AI APIs, key questions include: Where does the data go? Who can access it? Is it used to train other models? What happens if the service is discontinued? What are the latency and reliability guarantees?

Deployment Patterns

AI systems can be deployed in various configurations, each with different implications for performance, cost, and security.

  • Cloud API: Model runs on vendor's cloud. Data sent over internet. Lowest complexity.
  • Private Cloud: Model deployed in your cloud environment. Better data control.
  • On-Premises: Model runs in your data center. Maximum control, highest complexity.
  • Edge/Device: Model runs on end-user devices. Offline capability, limited model size.

Hybrid Approaches

Many organizations use hybrid approaches - for example, using cloud APIs for development and testing, then deploying to private infrastructure for production. Or using edge deployment for latency-sensitive inference with cloud backup for complex queries.
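The edge-plus-cloud pattern can be sketched as a small routing function. Using word count as a proxy for query complexity, the threshold of 20 words, and falling back to the local model when the cloud is unreachable are all assumptions made for illustration; real systems would choose their own criteria. The two model functions are placeholders.

```python
def route(query, local_model, cloud_model, max_local_words=20):
    """Send short queries to the on-device model and longer ones to the cloud.
    If the cloud call fails, fall back to the local model."""
    if len(query.split()) <= max_local_words:
        return "edge", local_model(query)
    try:
        return "cloud", cloud_model(query)
    except ConnectionError:
        return "edge-fallback", local_model(query)

# Placeholder models standing in for real deployments
def small_local_model(query):
    return "local answer to: " + query

def large_cloud_model(query):
    return "cloud answer to: " + query

def unreachable_cloud_model(query):
    raise ConnectionError("cloud service unavailable")

short_query = "What is our refund policy?"
long_query = " ".join(["detail"] * 30)

short_route, _ = route(short_query, small_local_model, large_cloud_model)
long_route, _ = route(long_query, small_local_model, large_cloud_model)
outage_route, _ = route(long_query, small_local_model, unreachable_cloud_model)
```

With these placeholders, the short query stays on the device, the long one goes to the cloud, and the long query during an outage is still answered locally, at the cost of lower answer quality.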

Key Takeaways

  • Training creates models (expensive, done once or periodically); inference uses them (ongoing, per-query costs)
  • Different model types suit different tasks - classification, generation, detection, etc.
  • Foundation models enable faster, cheaper AI development through transfer learning
  • Most organizations access AI through APIs, which is convenient but creates vendor dependencies
  • Deployment options range from cloud APIs to on-premises, each with trade-offs
  • Architecture understanding helps evaluate whether vendor solutions fit your needs