Private AI for Word: Using Reka Flash 3 for Creative Writing and Reasoning

Last Updated on January 8, 2026

What is Reasoning-Based Creative Writing?

In professional drafting, Creative Writing often requires more than just “generating text”—it requires the model to follow a logical thread, maintain a consistent voice, and understand subtext. This is where “Reasoning” models excel. Unlike standard models that predict the next word immediately, reasoning models like Reka Flash 3 use a “thinking” phase to:

  • Plan the Narrative: Outlining a scene’s structure before writing a single word.
  • Maintain Tone: Ensuring a technical report stays formal or a story stays atmospheric throughout.
  • Check Instructions: Verifying that all user constraints are met during the generation process.

In the past, conventional NLP techniques lacked the “IQ” to handle these complex creative tasks. However, with the arrival of recent LLMs such as Reka Flash 3, a 21B-parameter model built from scratch, Private AI for Word has reached a new milestone. This capability falls within the scope of our Ultimate Guide to Local LLMs for Microsoft Word, where we focus on achieving 100% data ownership.


Watch: Creative Writing with Reka Flash 3

See how Reka Flash 3 utilizes its internal reasoning process to draft and refine creative content directly inside Microsoft Word.

For more demonstrations of private AI models in Microsoft Word, please visit our channel at @GPTLocalhost.


Technical Profile: Why Reka Flash 3 for Word? (Download Size: 13.61 GB)

Reka Flash 3 is a compact powerhouse that bridges the gap between small on-device models and massive cloud-based assistants. These capabilities are now accessible in Microsoft Word through GPTLocalhost, a Word Add-in that connects your document directly to locally hosted LLMs.
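
As a rough illustration of what happens behind the Add-in, the sketch below sends a drafting request to a model served on your own machine through an OpenAI-compatible endpoint (a typical local backend such as LM Studio or a llama.cpp server). This is a minimal sketch, not GPTLocalhost’s actual internals; the port, base URL, and model identifier are placeholder assumptions to adjust for your setup.

  # Minimal sketch: query a locally hosted Reka Flash 3 through an
  # OpenAI-compatible API. base_url, api_key, and model name are
  # placeholder assumptions, not GPTLocalhost internals.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:1234/v1",  # local server; nothing leaves your machine
      api_key="not-needed",                 # local servers typically ignore the key
  )

  response = client.chat.completions.create(
      model="reka-flash-3",                 # hypothetical local model identifier
      messages=[{
          "role": "user",
          "content": "Draft a 200-word opening scene for a noir short story, "
                     "keeping an atmospheric, understated tone.",
      }],
      temperature=0.7,
  )

  print(response.choices[0].message.content)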

  • “Budget Forcing” Reasoning: Reka Flash 3 uses specialized <reasoning> tags to expose its “thinking” process. In the Word Add-in, you can watch the model plan its creative approach before it outputs the final text (a small parsing sketch follows this list).
  • 21B Parameter Intelligence: Despite its compact size, it performs competitively with proprietary models like OpenAI’s o1-mini. It is currently considered one of the best models in its size category for instruction following.
  • Efficient Local Deployment: At 4-bit quantization, Reka Flash 3 fits into just 11 GB of VRAM, making it well suited to consumer-grade GPUs and Apple Silicon Macs.
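
If you work with the raw model output outside the Add-in, a small post-processing step can separate the thinking phase from the final draft. The sketch below assumes the output wraps its plan in <reasoning>…</reasoning> tags as described above; exact tag handling may vary by inference backend.

  import re

  def split_reasoning(raw_output: str) -> tuple[str, str]:
      """Split a response into (thinking, final_text).

      Assumes the thinking phase is wrapped in <reasoning>...</reasoning>;
      if no such block is found, the whole output is treated as final text.
      """
      match = re.search(r"<reasoning>(.*?)</reasoning>", raw_output, re.DOTALL)
      if not match:
          return "", raw_output.strip()
      thinking = match.group(1).strip()
      final_text = raw_output[match.end():].strip()
      return thinking, final_text

  sample = "<reasoning>Plan: hook, setting, reveal.</reasoning>The rain had stopped by midnight..."
  plan, draft = split_reasoning(sample)
  print(plan)   # Plan: hook, setting, reveal.
  print(draft)  # The rain had stopped by midnight...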

Deployment Reminders: Running Reka Flash 3 Locally

Our primary testing was conducted on an M1 Max (64 GB), which offers ample headroom for this model. Reka Flash 3 suits low-latency, on-device deployments: it fits within the VRAM of many consumer-grade GPUs with 12 GB or more, such as the NVIDIA RTX 3060 (12 GB version) or RTX 4070, and runs on Apple Silicon Macs with unified memory.

  • Context Window Consideration: The model has a 32,000-token context length. Running with a large context increases the memory required for the KV cache (key-value cache), potentially adding a few more gigabytes of usage; a rough estimate is sketched below.
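
For a rough sense of where the memory goes, the back-of-the-envelope calculation below estimates the 4-bit weight footprint from the 21B parameter count and adds a KV-cache term that grows with context length. The layer and head dimensions are placeholder assumptions, not Reka Flash 3’s published architecture; read the real values from the model’s configuration if you need an exact figure.

  # Back-of-the-envelope VRAM estimate. The numbers marked "hypothetical"
  # are placeholder assumptions, not Reka Flash 3's real configuration.
  params = 21e9                              # 21B parameters (from the model card)
  bits_per_weight = 4                        # 4-bit quantization
  weight_gb = params * bits_per_weight / 8 / 1e9
  print(f"weights: ~{weight_gb:.1f} GB")     # ~10.5 GB, in line with the ~11 GB figure above

  # KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes per element
  layers, kv_heads, head_dim = 44, 8, 128    # hypothetical dimensions
  bytes_per_elem = 2                         # fp16 cache
  context_tokens = 32_000
  kv_gb = 2 * layers * kv_heads * head_dim * bytes_per_elem * context_tokens / 1e9
  print(f"KV cache at {context_tokens:,} tokens: ~{kv_gb:.1f} GB")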

The Local Advantage

Running Reka Flash 3 locally via GPTLocalhost ensures:

  • Data Ownership: No cloud data leaks.
  • Zero Network Latency: No round trips to a remote server; response speed depends only on your local GPU or Apple Silicon hardware.
  • Offline Access: Work anywhere, including on a plane ✈️, without an internet connection.

For intranet deployment and team collaboration, please check LocPilot for Word.