Last Updated on January 8, 2026
What is Constrained Writing?
In professional drafting, Constrained Writing is the art of generating text that must adhere to strict rules, patterns, or limitations. Unlike free-form creative writing, constrained tasks require the AI to follow “Hard Constraints” such as the following (a small validation sketch appears after the list):
- Lexical Limits: Writing without using specific “forbidden” words or using only a specialized industry vocabulary.
- Structural Rules: Ensuring every paragraph is exactly four sentences or adhering to a specific syllable count for headlines.
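To make these rules concrete, here is a minimal Python sketch of how such hard constraints can be checked programmatically. The forbidden-word list and the four-sentence rule are illustrative assumptions for this sketch, not the exact rules used in our demo:

```python
import re

# Illustrative hard constraints; these specific values are assumptions
# for the sketch, not the rules from our demo.
FORBIDDEN_WORDS = {"very", "really", "utilize"}   # lexical limit
SENTENCES_PER_PARAGRAPH = 4                       # structural rule

def check_constraints(text: str) -> list[str]:
    """Return human-readable descriptions of any constraint violations."""
    violations = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for i, paragraph in enumerate(paragraphs, start=1):
        # Lexical limit: case-insensitive whole-word match.
        for word in FORBIDDEN_WORDS:
            if re.search(rf"\b{re.escape(word)}\b", paragraph, re.IGNORECASE):
                violations.append(f"Paragraph {i} uses forbidden word: {word!r}")
        # Structural rule: exactly four sentences per paragraph.
        sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
        if len(sentences) != SENTENCES_PER_PARAGRAPH:
            violations.append(
                f"Paragraph {i} has {len(sentences)} sentences, "
                f"expected {SENTENCES_PER_PARAGRAPH}"
            )
    return violations

draft = "This rule is strict. Every paragraph is checked. Each has four sentences. The draft passes."
print(check_constraints(draft) or "All constraints satisfied.")
```

In practice, a checker like this can run over the model's output, with any violations fed back to the model as a revision prompt.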
In the past, AI often struggled with these “rules,” frequently losing the thread when asked to follow strict formatting or linguistic constraints. With the arrival of modern reasoning-focused LLMs, however, constrained writing has become a practical reality.
The purpose of this blog is to test just how capable recent local and open-source models have become at following instructions. We picked Qwen3-30B-A3B and Phi-4-mini-reasoning because they represent two popular choices among local LLMs. This direction is at the core of our Ultimate Guide to Local LLMs for Microsoft Word, where we focus on achieving 100% data ownership.
Watch: Constrained Writing Demo
This demonstration showcases how Qwen3 and Phi-4 handle the task with deep reasoning inside Microsoft Word.
For more creative use cases of private GPT models within Microsoft Word, check out additional demos available on our channel at @GPTLocalhost.
Technical Profile: Why Qwen3-30B-A3B? (Download Size: 17.72 GB)
The Qwen3-30B-A3B is an efficient, high-performance open-source large language model that uses a Mixture-of-Experts (MoE) architecture. A key highlight is its ability to deliver strong performance in complex tasks like reasoning while only activating a small subset of its total parameters during inference.
- Mixture-of-Experts (MoE) Architecture: The model has 30.5 billion total parameters, but only 3.3 billion are activated for any given request, making it highly efficient for training and inference while maintaining high performance.
- Flexible “Thinking” Mode: The original release supports a seamless switch between a “thinking mode” for complex problem-solving and a “non-thinking mode” for efficient, general dialogue. Later versions, such as Qwen3-30B-A3B-Thinking-2507, are optimized solely for the reasoning mode (see the sketch after this list).
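As a quick illustration, here is a minimal sketch of how that switch is exposed when loading the model through Hugging Face transformers, following the enable_thinking flag documented in the Qwen3 model card; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the chat-template API from the Qwen3 model card; the full
# 30.5B-parameter checkpoint must fit in memory (quantized builds are smaller).
model_name = "Qwen/Qwen3-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a headline of exactly seven syllables."}]

# enable_thinking toggles the reasoning mode in the original Qwen3 release;
# the -Thinking-2507 variant reasons unconditionally and drops this switch.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for fast, non-reasoning dialogue
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```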
Technical Profile: Why Phi-4-Mini-Reasoning? (Download Size: 2.49 GB)
Phi-4-Mini-Reasoning is a lightweight open model that balances efficiency with advanced reasoning ability. Specifically engineered for memory-constrained environments and latency-bound scenarios, this model excels at multi-step mathematical problem-solving and symbolic computation. Despite its compact size, Microsoft's published benchmark results show it outperforming many models more than twice its size on math-reasoning tasks (a minimal loading sketch follows the list below).
- Logical Reasoning: Designed for formal proofs and symbolic computation, the model delivers strong reasoning while remaining cost-efficient to deploy. Its streamlined architecture makes it well suited for educational use cases, embedded tutoring systems, and lightweight edge environments that require robust multi-step problem solving.
- Large Context: Supporting a 128K token context window, the model can analyze and reason over extensive mathematical proofs and long-form documents. Through advanced fine-tuning on high-quality synthetic mathematics datasets, it delivers robust performance for complex, demanding use cases.
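For reference, here is a minimal loading sketch using the Hugging Face transformers pipeline. The hub id microsoft/Phi-4-mini-reasoning comes from Microsoft's model card; the prompt and token budget are illustrative:

```python
from transformers import pipeline

# At under 4B parameters, this model fits comfortably on modest hardware,
# especially when quantized.
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-reasoning",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Prove that the sum of two even integers is even."},
]
result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])
```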
Deployment Reminders: Running Qwen3 and Phi-4 Locally
Our tests were run on an M1 Max with 64 GB of memory, which is more than enough for the job. While Qwen3 may look large at first glance due to its download size, it activates only about one-tenth of its total parameters during inference. Even more impressively, a Reddit user reported running Qwen3-30B-A3B at around 12.4 tokens per second on a budget Dell laptop equipped with an RTX 3050 (6 GB VRAM) and 16 GB of RAM, a result that surprised even the user. As for Phi-4-mini-reasoning, the same hardware should be more than sufficient. The rough sizing math below shows where the download figures come from.
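This is a back-of-the-envelope sketch, assuming roughly 4.65 bits per parameter for a typical 4-bit (Q4) quantized build and about 3.8B parameters for Phi-4-mini-reasoning; both figures are assumptions that vary by quantization scheme and release:

```python
# Rough on-disk size of a quantized checkpoint. The 4.65 bits/parameter
# figure approximates a typical Q4 quantization and is an assumption here.
BITS_PER_PARAM_Q4 = 4.65

def approx_size_gb(total_params_billion: float) -> float:
    """Approximate checkpoint size in GB at ~4-bit quantization."""
    return total_params_billion * 1e9 * BITS_PER_PARAM_Q4 / 8 / 1e9

# Qwen3-30B-A3B: all 30.5B parameters must fit in memory, even though
# only ~3.3B are activated per token (which is what makes it fast).
print(f"Qwen3-30B-A3B        ~{approx_size_gb(30.5):.1f} GB")  # ~17.7 GB
print(f"Phi-4-mini-reasoning ~{approx_size_gb(3.8):.1f} GB")   # ~2.2 GB
```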
The Local Advantage
Running Qwen3 and Phi-4 locally via GPTLocalhost ensures:
- Data Ownership: No cloud data leaks.
- Zero Network Latency: Responses are generated directly on your GPU or Apple Silicon, with no round-trips to a remote server.
- Offline Access: Work anywhere, including on a plane ✈️, without an internet connection.