Apple's Foundation Models Framework, announced at WWDC 2025, provides Swift-native access to the ~3 billion parameter on-device language model powering Apple Intelligence. This framework enables cost-free, privacy-preserving, offline AI functionality across iOS 26+, iPadOS 26+, macOS 26+, and visionOS 26+ platforms—fundamentally changing mobile AI development.
Overview of the Apple Foundation Models Framework
The framework centers on a 3B parameter model with aggressive 2-bit quantization achieved through Quantization-Aware Training. Key architectural innovations include KV-cache sharing (37.5% memory reduction), grouped-query attention, and a 49K vocabulary. The framework comprises three core components:
Guided Generation provides type-safe structured outputs through the `@Generable` and `@Guide` macros, enabling compile-time schema generation and constrained decoding. Tool Calling extends capabilities via the `Tool` protocol, handling complex parallel/serial execution graphs automatically. The Adapter System uses LoRA (rank 32) for parameter-efficient fine-tuning, with each adapter requiring ~160MB of storage and deployment via the Background Assets framework.
Benefits of using local, offline models
Privacy-first architecture ensures all processing occurs on-device, eliminating data transmission to external servers. This enables features like local Siri processing, Photos analysis, and Health data insights without privacy compromises.
Performance advantages include 0.6ms time-to-first-token latency on iPhone 15 Pro with 30 tokens/second generation. The M4's Neural Engine delivers 38 TOPS—26x faster than the original implementation—while using half the power of M2.
Zero-cost inference eliminates API fees, data transfer charges, and cloud dependencies. The framework scales automatically with device adoption, making it dramatically more cost-effective than cloud alternatives for production applications.
Implementation Best Practices
Managing sessions and transcript entries
Session management forms the backbone of conversational AI implementation. Thankfully, the Foundation Models framework automatically keeps track of conversations and exposes them as an array of entries (messages). Instead of managing this state separately, use the built-in `Transcript` of entries to render these messages:
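As a minimal sketch (assuming the iOS 26 beta API, where `LanguageModelSession` is `@Observable` and `Transcript` is a collection of identifiable entries), a SwiftUI view can render the transcript directly:

```swift
import SwiftUI
import FoundationModels

// Sketch: render a session's transcript in SwiftUI. Entry case names
// follow the iOS 26 beta and may shift between beta releases.
struct ChatView: View {
    @State private var session = LanguageModelSession()

    var body: some View {
        List(session.transcript) { entry in
            switch entry {
            case .prompt(let prompt):
                // User message, aligned right
                Text(prompt.description)
                    .frame(maxWidth: .infinity, alignment: .trailing)
            case .response(let response):
                // Model message, aligned left
                Text(response.description)
                    .frame(maxWidth: .infinity, alignment: .leading)
            default:
                // Instructions, tool calls, and tool output are skipped here
                EmptyView()
            }
        }
    }
}
```

Because the session owns the transcript, the view updates automatically as new entries are appended during generation.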
Apple made significant changes in iOS 26 Beta 2, making `Transcript.init(entries:)` private and turning `Transcript` itself into a collection of entries.
Read the full Transcript Data Structure Docs.
Persisting conversations with SwiftData
Conversation persistence enables users to maintain chat history across app launches, continue conversations from where they left off, and search through historical interactions. SwiftData provides the ideal solution for storing Foundation Models transcripts with automatic CloudKit syncing, efficient external storage for large conversations, and type-safe Swift integration.
Implement a SwiftData model that efficiently stores and retrieves transcripts:
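One possible shape for such a model, assuming `Transcript` is `Codable` (the model name and fields here are our own, not part of the framework):

```swift
import Foundation
import SwiftData
import FoundationModels

// Sketch: a SwiftData model persisting a transcript as encoded Data.
// .externalStorage keeps large conversations out of the SQLite file itself.
@Model
final class Conversation {
    var title: String
    var updatedAt: Date
    @Attribute(.externalStorage) var transcriptData: Data

    init(title: String, transcript: Transcript) throws {
        self.title = title
        self.updatedAt = .now
        self.transcriptData = try JSONEncoder().encode(transcript)
    }

    // Decode back into a Transcript to resume a session.
    func transcript() throws -> Transcript {
        try JSONDecoder().decode(Transcript.self, from: transcriptData)
    }
}
```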
Integrate persistence into your chat implementation with automatic saving after each interaction:
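A hedged sketch of that flow, assuming a `Conversation` SwiftData model (our own name) that stores the encoded transcript as `Data`:

```swift
import Foundation
import SwiftData
import FoundationModels

// Sketch: respond to the user, then persist the updated transcript
// immediately so history survives an app termination mid-conversation.
@MainActor
func send(_ text: String,
          session: LanguageModelSession,
          conversation: Conversation,
          context: ModelContext) async throws {
    _ = try await session.respond(to: text)

    // Save after every exchange rather than at session end.
    conversation.transcriptData = try JSONEncoder().encode(session.transcript)
    conversation.updatedAt = .now
    try context.save()
}
```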
Key implementation considerations include using `@Attribute(.externalStorage)` for transcript data to prevent bloating the SQLite database, as conversations can grow large. Implement periodic auto-save during long conversations rather than waiting for session completion. Consider adding search functionality by indexing conversation content separately for full-text search. For sensitive conversations, apply additional encryption before storing transcript data. Handle migration carefully when the Transcript format changes between OS versions by versioning your data model.
Using `isResponding` for state management
The `isResponding` property is a Boolean state indicator that becomes `true` while the model is actively generating a response. Instead of maintaining separate state to track model responses, use it to show a progress indicator and prevent the user from sending additional messages while the model is responding.
Proper implementation requires observing this property before submitting new prompts to prevent concurrent requests:
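A minimal sketch, relying on `LanguageModelSession` being `@Observable` so SwiftUI reacts to `isResponding` changes:

```swift
import SwiftUI
import FoundationModels

// Sketch: gate the send button on isResponding to prevent
// concurrent requests against the same session.
struct ComposerView: View {
    @State private var session = LanguageModelSession()
    @State private var input = ""

    var body: some View {
        HStack {
            TextField("Message", text: $input)
            if session.isResponding {
                // Model is generating: show progress, hide the send button.
                ProgressView()
            } else {
                Button("Send") {
                    let prompt = input
                    input = ""
                    Task { _ = try? await session.respond(to: prompt) }
                }
                .disabled(input.isEmpty)
            }
        }
        .padding()
    }
}
```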
Implementing Guided Generation for structured outputs with `@Generable`
Guided Generation enables type-safe structured outputs through constrained decoding, one of the framework's most powerful capabilities: generating predictably formatted structured data. In our experience, the built-in constrained decoding is well optimized for structured generation, so it improves quality and accuracy considerably.
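A short sketch of the pattern (the `Recipe` type and prompt are illustrative, not from the framework):

```swift
import FoundationModels

// Sketch: @Generable derives a schema at compile time, and the
// framework constrains decoding so the output always matches it.
@Generable
struct Recipe {
    let name: String
    let ingredients: [String]
    let minutes: Int
}

func makeRecipe() async throws -> Recipe {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Suggest a quick vegetarian pasta recipe.",
        generating: Recipe.self
    )
    // A typed Recipe value, not raw text to parse.
    return response.content
}
```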
Read more about the Generable Protocol.
Improve Guided Generation with the `@Guide` macro
The `@Guide` macro lets you provide hints and constraints to the model beyond the type itself, improving generation. For instance, for receipt data the `amount` property might be an `Int`, but `@Guide` can tell the model that we want the total amount and constrain it to a positive number. You can think of these constraints like the difference between TypeScript types and Zod schemas on the web.
The `@Guide` macro supports various constraints including `.count(n)` for exact array lengths, `.range(min...max)` for numeric values, and `.anyOf([options])` for predefined choices. Property order matters significantly: declare properties in generation order, with summaries last, for optimal quality. Keep structures relatively flat, since deep nesting can impact performance.
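Putting those constraints together on the receipt example (a sketch; exact constraint spellings follow the iOS 26 beta and should be checked against the Generable docs for your SDK):

```swift
import FoundationModels

// Sketch: hints and constraints layered on top of the Swift types.
@Generable
struct Receipt {
    @Guide(description: "The merchant's name")
    let merchant: String

    // Not just any Int: the total, constrained to a positive range.
    @Guide(description: "Total amount in whole dollars", .range(0...10_000))
    let totalAmount: Int

    // Restrict to a fixed set of choices.
    @Guide(.anyOf(["food", "travel", "office", "other"]))
    let category: String

    // Exactly three items.
    @Guide(description: "The three most expensive line items", .count(3))
    let topItems: [String]
}
```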
Choosing and using built-in adapters
The framework provides specialized adapters for different use cases. The default general-purpose model handles most tasks effectively:
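Using the default model is as simple as creating a session and prompting it (the prompt here is illustrative):

```swift
import FoundationModels

// Sketch: the default general-purpose model behind a plain session.
func summarizeBenefits() async throws -> String {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarize the benefits of on-device inference in two sentences."
    )
    return response.content
}
```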
For specialized tasks, use the Content Tagging Adapter:
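A sketch of the content tagging use case, combined with a `@Generable` result type to constrain the output to tags (the `Tags` struct is our own):

```swift
import FoundationModels

// Sketch: back a session with the content tagging use case.
@Generable
struct Tags {
    @Guide(description: "Most important topics in the text", .count(3))
    let topics: [String]
}

func tag(_ text: String) async throws -> [String] {
    let model = SystemLanguageModel(useCase: .contentTagging)
    let session = LanguageModelSession(model: model)
    let response = try await session.respond(to: text, generating: Tags.self)
    return response.content.topics
}
```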
The Content Tagging Adapter excels at topic detection, entity extraction, tag generation, and supports custom instructions for specialized detection like actions or emotions.
Customizing system prompts effectively
The framework makes a critical distinction between instructions (developer-defined, static, takes precedence) and prompts (user-provided, dynamic, secondary). Implement effective instructions that clearly define role, output formatting, and behavioral boundaries:
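For example (the instruction text is illustrative), instructions stay static while user input flows only through the prompt:

```swift
import FoundationModels

// Sketch: developer-defined instructions set role, format, and boundaries.
let session = LanguageModelSession(instructions: """
    You are a concise cooking assistant.
    Answer in at most three sentences.
    Only discuss food and cooking; politely decline other topics.
    """)

func ask(_ userText: String) async throws -> String {
    // User input goes into the prompt, never into the instructions,
    // which helps guard against prompt injection.
    try await session.respond(to: userText).content
}
```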
Never interpolate untrusted user input into instructions, to prevent prompt injection attacks. Keep instructions mostly static across sessions and use them to enforce behavioral boundaries and safety constraints.
Creating custom adapters for specialized behavior
Custom adapters become valuable when specific style or format requirements exceed prompting capabilities, domain-specific tasks demand specialized knowledge, or consistency proves critical across many interactions. The training process requires the Foundation Models Framework Adapter Entitlement and involves preparing JSONL formatted training data:
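As a rough illustration of the format (the exact schema depends on the version of Apple's adapter training toolkit, so treat this as an assumption to verify against the toolkit docs), each JSONL line is one training example as a list of role-tagged messages:

```json
[{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Draft a two-line product blurb for a hiking boot."}, {"role": "assistant", "content": "Built for the trail. The Ridgeline boot pairs all-day comfort with grip you can trust."}]
```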
LoRA adapters use rank 32 for parameter-efficient training, requiring approximately 160MB storage per adapter. Deploy through the Background Assets framework for efficient distribution:
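Once the adapter file has been fetched, loading it looks roughly like this (a sketch assuming the iOS 26 beta `SystemLanguageModel.Adapter` API; the download step via Background Assets is omitted):

```swift
import FoundationModels

// Sketch: back a session with a custom adapter that was previously
// downloaded through Background Assets.
func makeAdaptedSession(adapterURL: URL) throws -> LanguageModelSession {
    let adapter = try SystemLanguageModel.Adapter(fileURL: adapterURL)
    let model = SystemLanguageModel(adapter: adapter)
    return LanguageModelSession(model: model)
}
```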
Remember that each adapter remains version-specific, requiring retraining when Apple updates the base model.
Datawizz allows you to easily train and evaluate custom adapters for the AFM. Check out our docs for training and evaluating Apple Foundation Model adapters.
Adding tool calls for data retrieval
Tools extend the model's capabilities by providing access to external data sources and APIs. This is a great way to let your model access real-time information and fetch additional data without overfilling the context (given the model's limited context window).
Implement the Tool protocol with type-safe arguments:
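A hedged sketch of a weather tool (the lookup itself is a stub, and the `ToolOutput` spelling follows the iOS 26 beta API):

```swift
import FoundationModels

// Sketch: a Tool with type-safe, Generable arguments. The model decides
// when to call it and generates the arguments itself.
struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Fetch the current temperature for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        let city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // Replace with a real API call; hardcoded for illustration.
        ToolOutput("It is 21°C in \(arguments.city).")
    }
}

// Register tools when creating the session.
let session = LanguageModelSession(
    tools: [WeatherTool()],
    instructions: "Use tools for live data when needed."
)
```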
The model automatically analyzes queries to determine when tools are needed, selects appropriate tools, generates arguments, and executes them. The framework handles complex parallel and serial tool call graphs automatically, integrating results seamlessly into the final response.
Checking model availability and implementing fallbacks
Model availability varies across devices and regions, making robust availability checking essential for production applications. The Foundation Models Framework requires iOS 26+, iPadOS 26+, macOS 26+, or visionOS 26+ on devices with Apple Silicon (A17 Pro or later for iPhone, M1 or later for Mac). Additionally, users must have Apple Intelligence enabled in Settings, and the feature must be available in their region.
Implement comprehensive availability checking with graceful fallbacks:
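One way to structure this (the cloud and algorithmic fallbacks are placeholders for your own services; the availability cases follow the iOS 26 beta API):

```swift
import FoundationModels

// Sketch: check availability and degrade gracefully.
func summarize(_ text: String) async throws -> String {
    let model = SystemLanguageModel.default
    switch model.availability {
    case .available:
        let session = LanguageModelSession()
        return try await session.respond(to: "Summarize: \(text)").content
    case .unavailable(.appleIntelligenceNotEnabled):
        // Consider prompting the user to enable Apple Intelligence.
        return fallbackSummary(for: text)
    case .unavailable(.deviceNotEligible), .unavailable(.modelNotReady):
        return fallbackSummary(for: text)
    case .unavailable(_):
        return fallbackSummary(for: text)
    }
}

// Placeholder: a basic algorithmic fallback (first sentence only).
func fallbackSummary(for text: String) -> String {
    text.components(separatedBy: ". ").first ?? text
}
```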
Best practices for fallback implementation include progressive enhancement where core functionality works without AI, with intelligence features enhancing the experience when available. Cache availability status to avoid repeated checks, but implement periodic rechecks as users may enable Apple Intelligence later. Always inform users when using cloud fallbacks that may impact privacy, and consider offering settings to control fallback behavior. For critical features, maintain multiple fallback layers: Apple Intelligence → Cloud API → Basic algorithmic approach. Monitor and log which providers are being used to understand your user base's capabilities and optimize accordingly.
Conclusion
Apple's Foundation Models Framework represents a paradigm shift in mobile AI development, providing unprecedented access to sophisticated language models while maintaining Apple's core values of privacy and performance. The combination of type-safe Swift integration, powerful features like Guided Generation and Tool Calling, and optimization for Apple Silicon creates a comprehensive platform for building intelligent, privacy-preserving applications.
The framework's zero-cost inference model and offline capabilities democratize access to advanced AI, enabling developers of all sizes to integrate sophisticated language understanding into their applications. With its Swift-native API, comprehensive adapter system, and deep ecosystem integration, the Foundation Models Framework establishes a new standard for on-device AI that respects user privacy while delivering cutting-edge functionality. As the framework continues to evolve with enhanced multimodal capabilities and expanded language support, it positions developers to create the next generation of intelligent applications that seamlessly blend powerful AI with unwavering privacy protection.