Part 3: AI System Architecture | Module 1

Introduction

Understanding how AI systems are structured - from initial development through deployment - enables professionals to ask the right questions, identify risks, and make informed decisions about AI initiatives. This part covers the fundamental concepts of training, inference, model types, and how organizations integrate AI through APIs.

Training vs. Inference: The Two Phases

Every AI system goes through two distinct phases, each with different resource requirements, timelines, and risks. Understanding this distinction is fundamental to AI project planning and governance.

Training Phase

Creates the AI model from data
Happens once (or periodically)
Extremely compute-intensive
Can take days to months
Requires large volumes of training data (labeled, for supervised learning)
High cost, high risk
Usually done by specialists

Inference Phase

Uses trained model for predictions
Happens continuously in production
Less compute per request
Milliseconds to seconds per query
Processes new, unseen data
Cost scales with usage
Must be reliable and fast
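
The split between the two phases can be sketched in a few lines of Python. This is a toy illustration only: the "model" is a straight line fitted by least squares, the data is made up, and the function names train and predict are chosen for this example rather than taken from any particular library.

```python
# Toy illustration: "training" fits parameters once; "inference" reuses them.

def train(xs, ys):
    """Training phase: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}

def predict(model, x):
    """Inference phase: apply the already-trained model to a new input."""
    return model["slope"] * x + model["intercept"]

# Training: done once, on historical examples (made-up numbers).
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# Inference: cheap, and repeated for every new request.
predictions = [predict(model, x) for x in [5.0, 10.0]]
print(predictions)
```

All of the expensive work sits inside train, while predict is a single multiplication and addition. Real systems have the same shape at a much larger scale.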

Data Collection (gather training examples) → Data Preparation (clean, label, format) → Training (model learns patterns) → Validation (test performance) → Deployment (move to production) → Inference (serve predictions)

Cost Implications

Training a large language model from scratch can cost tens of millions of dollars in compute alone. However, inference costs - while lower per query - can accumulate to significant ongoing expenses as usage grows. Many organizations underestimate inference costs when budgeting AI projects.
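
A rough budgeting sketch shows how per-query costs accumulate. Every figure here is a hypothetical placeholder (not a vendor price), and the function names are invented for this example; the point is the arithmetic, not the numbers.

```python
# All figures below are hypothetical placeholders, not real vendor prices.

def monthly_inference_cost(queries_per_day, cost_per_query, days=30):
    """Ongoing spend: grows linearly with usage."""
    return queries_per_day * cost_per_query * days

def months_until_inference_exceeds(one_off_cost, queries_per_day, cost_per_query):
    """Months of steady usage before cumulative inference spend passes a one-off cost."""
    return one_off_cost / monthly_inference_cost(queries_per_day, cost_per_query)

# A pilot with light usage versus the same system after a broad rollout.
pilot = monthly_inference_cost(queries_per_day=1_000, cost_per_query=0.01)
scaled = monthly_inference_cost(queries_per_day=500_000, cost_per_query=0.01)

# How long until ongoing inference overtakes a hypothetical 3,000,000 one-off build cost.
breakeven_months = months_until_inference_exceeds(3_000_000, 500_000, 0.01)
print(pilot, scaled, breakeven_months)
```

Under these assumed numbers the monthly bill grows 500-fold between pilot and rollout, which is why a budget based on pilot usage can badly understate production costs.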

Model Types by Purpose

AI models are designed for specific types of tasks. Understanding these categories helps in selecting the right approach for a given problem and evaluating vendor solutions.

📄

Classification

Assigns inputs to categories. Examples: spam detection, sentiment analysis, fraud detection.

📈

Regression

Predicts numerical values. Examples: price forecasting, demand prediction, risk scoring.

🎨

Generation

Creates new content. Examples: text generation, image creation, code synthesis.

🔎

Detection

Identifies objects or patterns. Examples: object detection in images, anomaly detection.

🌏

Translation

Converts between formats. Examples: language translation, speech-to-text.

🔗

Recommendation

Suggests relevant items. Examples: product recommendations, content personalization.
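
One practical way to tell these categories apart is by the shape of what comes back. The sketch below uses hand-written rules as stand-ins for trained models; the function names, word lists, and numbers are all hypothetical. It only illustrates that classification returns a label, regression returns a number, and recommendation returns a ranked list.

```python
# Hand-written rules standing in for trained models (hypothetical, for illustration).

def classify_spam(message):
    """Classification: returns one label from a fixed set."""
    spam_words = {"winner", "free", "prize"}
    hits = sum(word.strip("!.,").lower() in spam_words for word in message.split())
    return "spam" if hits >= 2 else "not_spam"

def predict_demand(price):
    """Regression: returns a number on a continuous scale."""
    return max(0.0, 1000.0 - 40.0 * price)

def recommend(interests, catalog, k=2):
    """Recommendation: returns the k items whose tags best overlap the user's interests."""
    ranked = sorted(catalog, key=lambda item: len(catalog[item] & interests), reverse=True)
    return ranked[:k]

label = classify_spam("You are a WINNER! Claim your free prize")
units = predict_demand(10.0)
catalog = {
    "tent": {"outdoor", "camping"},
    "novel": {"books"},
    "stove": {"outdoor", "camping", "cooking"},
}
picks = recommend({"outdoor", "camping", "cooking"}, catalog)
print(label, units, picks)
```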

Model Types by Architecture

Different neural network architectures excel at different tasks. While you don't need to understand the technical details, knowing the major types helps in evaluating solutions.

Transformers

The dominant architecture for language and increasingly for other modalities. Powers ChatGPT, Claude, and most modern language models. The "attention mechanism" allows them to consider context across long sequences.
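
For readers who want to see the idea behind the attention mechanism, here is a minimal sketch of scaled dot-product attention in plain Python. It is deliberately simplified: production transformers apply learned projections to produce the queries, keys, and values and run many attention "heads" in parallel, all of which is omitted here. The token vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score this position against every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # The output is a weighted mix of all value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Three made-up token vectors attending to one another (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
print(contextual)
```

Each output vector blends information from every position in the sequence, weighted by relevance. That is what "considering context" means in practice.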

Convolutional Neural Networks (CNNs)

Specialized for image and spatial data processing. Used in computer vision applications like image classification, object detection, and medical imaging analysis.

Recurrent Neural Networks (RNNs)

Designed for sequential data like time series and text. Largely superseded by transformers for language tasks but still used in some specialized applications.

Diffusion Models

The approach behind modern image generation systems like DALL-E (from version 2 onward), Midjourney, and Stable Diffusion. A diffusion model learns to generate images by starting from random noise and gradually removing it.

Practical Insight

You don't need to choose architectures yourself - that's for technical teams. But understanding that different architectures suit different tasks helps you recognize when a vendor's proposed solution matches (or doesn't match) your needs.

Foundation Models and Transfer Learning

A major shift in AI is the rise of "foundation models" - large pre-trained models that can be adapted for many downstream tasks. This changes the economics and strategy of AI adoption.

Traditional Approach

Train specific model for each task
Requires task-specific data
High cost per application
Long development time
Limited by available data

Foundation Model Approach

Start with pre-trained model
Adapt with less data
Lower marginal cost
Faster deployment
Leverage general knowledge
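
The foundation model approach can be sketched as a frozen, pre-trained component plus a small task-specific layer trained on a handful of examples. In the sketch below, pretrained_features is a hypothetical stand-in for a real foundation model (which would return a learned embedding rather than word counts), and the four training examples are made up.

```python
import math

def pretrained_features(text):
    """Stand-in for a frozen foundation model: maps text to a fixed feature vector.
    Hypothetical; a real model would return a learned embedding."""
    words = text.lower().split()
    positive = {"great", "love", "excellent"}
    negative = {"bad", "broken", "terrible"}
    return [sum(w in positive for w in words), sum(w in negative for w in words)]

def train_head(examples, epochs=200, lr=0.5):
    """Adaptation step: train only a small logistic-regression head on the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = pretrained_features(text)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - label
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(head, text):
    """Return 1 (positive) or 0 (negative) for new text."""
    weights, bias = head
    z = sum(w * f for w, f in zip(weights, pretrained_features(text))) + bias
    return 1 if z > 0 else 0

# Only four labeled examples are needed because the features already carry general knowledge.
few_examples = [("great product", 1), ("love it", 1),
                ("bad quality", 0), ("arrived broken", 0)]
head = train_head(few_examples)
print(predict(head, "excellent service"), predict(head, "terrible and bad"))
```

The head generalizes to words it never saw during adaptation ("excellent", "terrible") only because the frozen component already knows about them. This is the economic argument for foundation models in miniature.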

Key Foundation Models to Know

GPT-4 (OpenAI): Leading commercial large language model
Claude (Anthropic): Focus on safety and helpfulness
Gemini (Google): Multimodal capabilities
Llama (Meta): Open-weight model for customization
DALL-E, Midjourney, Stable Diffusion: Image generation

APIs: How Organizations Access AI

Most organizations consume AI capabilities through APIs (Application Programming Interfaces) rather than building or hosting models themselves. Understanding this access model is crucial for vendor evaluation and risk management.

What is an AI API?

An API is a standardized way for software systems to communicate. AI APIs allow applications to send data to an AI model and receive predictions or generated content in return, without managing the underlying infrastructure.

# Simplified example of an API call
Request: "Analyze the sentiment of this review: 'Great product!'"

Response: { "sentiment": "positive", "confidence": 0.95 }
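
In application code, the same exchange might look roughly like the Python sketch below. The endpoint URL and the request field names are hypothetical, since each vendor defines its own. No network call is made: a canned reply stands in for the vendor's response so the example runs on its own.

```python
import json

# Hypothetical endpoint and field names, for illustration only.
API_URL = "https://api.example.com/v1/sentiment"

def build_request(text):
    """Serialize the input the way a JSON-based API would typically expect it."""
    return json.dumps({"task": "sentiment", "input": text})

def parse_response(body):
    """Extract the fields the application cares about from the JSON reply."""
    data = json.loads(body)
    return data["sentiment"], data["confidence"]

request_body = build_request("Great product!")

# In a real integration, request_body would be sent to API_URL over HTTPS with an
# API key. A canned reply stands in for the network call here.
canned_reply = '{"sentiment": "positive", "confidence": 0.95}'
sentiment, confidence = parse_response(canned_reply)
print(sentiment, confidence)
```

Note that the review text itself is part of request_body. Whatever is sent in the request leaves the organization's environment, which is the root of the data-handling risks associated with API-based AI.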

Benefits of API-Based AI

No infrastructure management required
Access to state-of-the-art models
Pay-per-use pricing
Automatic updates and improvements
Fast implementation

Risks of API-Based AI

Data leaves your environment
Vendor dependency and lock-in
Pricing changes outside your control
Service availability risks
Limited customization

Governance Questions

When evaluating AI APIs, key questions include: Where does the data go? Who can access it? Is it used to train other models? What happens if the service is discontinued? What are the latency and reliability guarantees?

Deployment Patterns

AI systems can be deployed in various configurations, each with different implications for performance, cost, and security.

☁

Cloud API

Model runs on vendor's cloud. Data sent over internet. Lowest complexity.

◈

Private Cloud

Model deployed in your cloud environment. Better data control.

💻

On-Premises

Model runs in your data center. Maximum control, highest complexity.

📱

Edge/Device

Model runs on end-user devices. Offline capability, limited model size.

Hybrid Approaches

Many organizations use hybrid approaches: for example, using cloud APIs for development and testing and then deploying to private infrastructure for production, or using edge deployment for latency-sensitive inference with a cloud fallback for complex queries.
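
The edge-plus-cloud pattern comes down to a routing decision. The sketch below is one possible shape for that logic, under loud assumptions: word count is used as a crude proxy for query complexity, and run_on_device and run_in_cloud are hypothetical placeholders for real model calls.

```python
def run_on_device(query):
    """Placeholder for a small on-device model (hypothetical)."""
    return f"edge:{query}"

def run_in_cloud(query):
    """Placeholder for a larger cloud-hosted model (hypothetical)."""
    return f"cloud:{query}"

def route(query, max_edge_words=8, cloud_available=True):
    """Keep short queries on the device; send longer ones to the cloud when it is reachable."""
    is_simple = len(query.split()) <= max_edge_words  # crude complexity proxy (assumption)
    if is_simple or not cloud_available:
        return run_on_device(query)
    return run_in_cloud(query)

short_query = "turn on the lights"
long_query = "summarize the last three quarterly reports and highlight any unusual spending"
print(route(short_query))
print(route(long_query))
print(route(long_query, cloud_available=False))
```

The last call shows the offline case. When the cloud is unreachable, the device still answers, at the cost of using the smaller model for a query it may handle less well.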

Key Takeaways

Training creates models (expensive, done once or periodically); inference uses them (ongoing, per-query costs)
Different model types suit different tasks - classification, generation, detection, etc.
Foundation models enable faster, cheaper AI development through transfer learning
Most organizations access AI through APIs - convenient but creates dependencies
Deployment options range from cloud APIs to on-premises, each with trade-offs
Architecture understanding helps evaluate whether vendor solutions fit your needs