🔬 AI RESEARCH
AI Search
2026-04-04T07:03:35-03:00
By AI Search
## AI tutor agents, omnimodal video models, LTX-2 updates, long-term memory, video faceswap: AI NEWS

Welcome to the AI Search podcast. Here are the top highlights in AI this week.

AI Search · Jan 13, 2026

HUGE AI NEWS: LTX-2, UniVideo, SimpleMem, HY-MT, NeoVerse & more #ai #ainews #aitools #aivideo

Note that most of the examples are visual. See my YouTube video for the best experience.

AI Search: AI news, science, & research. Stay up to date with AI every week.
🔬 AI RESEARCH
AI Search
2026-04-04T07:03:33-03:00
By AI Search
## Realtime AI video games, Moltbook, agent swarms, AI Earth, open video models: AI NEWS

Welcome to the AI Search podcast. Here are the top highlights in AI this week.

AI Search · Feb 07, 2026

HUGE AI NEWS: Google Genie 3, Lingbot World, Moltbook, Clawdbot, Kimi K2.5, Qwen Max #ai #ainews #aitools #aivideo

Note that most of the examples are visual. See my YouTube video for the best experience.
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:47-03:00
By Prof. Tom Yeh
### Essential AI Math Excel Blueprints

Prof. Tom Yeh · Feb 15, 2026 · Paid

Kullback–Leibler (KL) divergence measures how different one probability distribution is from another. It quantifies how much information is lost when we use a model (predicted) distribution Q to approximate a true (target) distribution P.

## Calculation

The calculation begins with the predicted distribution Q(x) and the target distribution P(x). First, we take the logarithm of both Q(x) and P(x). Next, for each outcome, we compute the difference log(P(x)) − log(Q(x)), which represents the log ratio between the target and predicted probabilities. This difference is then weighted by P(x), producing the term P(x) · log(P(x)/Q(x)). Finally, we sum these weighted terms across all outcomes to obtain the KL divergence.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
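The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the Excel Blueprint itself; the guard for outcomes with P(x) = 0 is my own assumption (using the convention 0 · log 0 = 0):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum over x of P(x) * log(P(x) / Q(x)), in nats.
    Outcomes with P(x) = 0 contribute nothing (0 * log 0 := 0)."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

# A model Q that approximates a fair coin P poorly:
p = [0.5, 0.5]   # true (target) distribution P
q = [0.9, 0.1]   # predicted distribution Q
print(kl_divergence(p, q))   # positive: information is lost using Q for P
```

Note that KL(P || Q) is zero only when the two distributions match, and it is not symmetric: swapping P and Q generally gives a different value.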
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:46-03:00
By Prof. Tom Yeh
# Essential AI Math #11 to #15

Prof. Tom Yeh · Feb 18, 2026 · Paid

Dear AI by Hand Academy Members,

Here is another mini-batch of new _Essential AI Math Blueprints_. I’ve taught these ideas many times over the years, scattered across different lectures, but I’m now com…

## This post is for paid subscribers
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:46-03:00
By Prof. Tom Yeh
# ELU (Exponential Linear Unit)

### Essential AI Math Excel Blueprints

Prof. Tom Yeh · Feb 18, 2026 · Paid

ELU (Exponential Linear Unit) introduces a smooth exponential curve in the negative region to create a gradual transition at zero. Instead of an abrupt change in slope, the function bends smoothly into negative values, producing continuous derivatives and more stable gradient flow. This smoother behavior can improve convergence and lead to more stable learning dynamics in deep neural networks.

ELU is designed to address a limitation of LeakyReLU: there is still a sharp kink at x = 0, creating a discontinuity in the derivative. ELU solves this by replacing the linear negative slope with a smooth exponential curve.

ELU behaves like ReLU in the positive region, passing positive inputs through unchanged. But in ReLU, a neuron can become “dead” in the negative region because the gradient is zero, meaning it receives no signal to update. Like LeakyReLU, ELU keeps the negative region alive by providing small, nonzero gradients. This allows a “dead” neuron to slowly recover, rather than remaining permanently silent.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
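As a minimal sketch of the function described above (assuming the common default scale α = 1, which the post does not specify):

```python
import math

def elu(x, alpha=1.0):
    """ELU: passes positive inputs through unchanged; bends smoothly
    into negative values via alpha * (exp(x) - 1), bounded below by -alpha."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

# Positive inputs pass through; negative inputs decay smoothly toward -alpha.
print(elu(2.0), elu(-2.0))
```

Because exp(x) − 1 has slope 1 at x = 0, the left and right derivatives match there (for α = 1), which is exactly the smoothness the post contrasts with LeakyReLU's kink.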
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:45-03:00
By Prof. Tom Yeh
# Swish (SiLU)

### Essential AI Math Excel Blueprints

Swish, also known as Sigmoid Linear Unit (SiLU), is designed to introduce a smooth, self-gated activation mechanism. Instead of abruptly cutting off negative inputs like ReLU, Swish multiplies the input x by a sigmoid gate σ(x) that softly scales the signal between 0 and 1. For large positive values, the gate approaches 1 and the function behaves like a linear pass-through. For large negative values, the gate approaches 0, gradually suppressing the signal.

Below is the ReLU activation for comparison. You can think of ReLU as using a hard gate: the gate value is 0 when x < 0 and 1 when x ≥ 0. This creates a sharp transition at x = 0. Swish replaces this sharp transition with a smooth “swish” transition (pun intended).

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
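The gate-times-input form described above can be sketched directly (an illustrative sketch, not the Excel Blueprint):

```python
import math

def sigmoid(x):
    """Sigmoid gate: maps x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    """Swish / SiLU: the input x, softly scaled by its own sigmoid gate."""
    return x * sigmoid(x)

# Near pass-through for positives, gradual suppression for negatives:
print(swish(3.0), swish(-3.0))
```

For comparison, ReLU would be `max(0.0, x)`: the same idea with a hard 0/1 gate instead of the smooth σ(x).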
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:45-03:00
By Prof. Tom Yeh
# GELU (Gaussian Error Linear Unit)

### Essential AI Math Excel Blueprints

Prof. Tom Yeh · Feb 20, 2026 · Paid

The GELU (Gaussian Error Linear Unit) activation function is fundamentally similar to Swish (SiLU): both apply a smooth, input-dependent gate to the linear signal x, suppressing negative values toward zero while allowing positive values to pass through in a soft, probabilistic manner. This gentle attenuation of negatives preserves useful gradient information and improves learning, unlike the hard cutoff of ReLU.

The core difference between GELU and Swish lies in how their “gates” transition from closed to open. GELU uses the standard Gaussian cumulative distribution function Φ(x) as its gate, which operates in a narrower band, roughly between −3 and 3. In contrast, the Swish gate uses the sigmoid function σ(x), which has a wider band, roughly from −6 to 6.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
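The exact form of the gate described above can be sketched with the standard-library error function (this is the exact erf-based GELU, not the tanh approximation some frameworks use):

```python
import math

def gelu(x):
    """GELU: x * Phi(x), where Phi is the standard normal CDF,
    computed exactly via the error function erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The Phi gate is already ~0 by x = -3 and ~1 by x = +3,
# a narrower transition band than sigmoid's roughly (-6, 6).
print(gelu(3.0), gelu(-3.0))
```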
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:45-03:00
By Prof. Tom Yeh
# Tanh

### Essential AI Math Excel Blueprints

The Tanh activation function takes any number and smoothly squeezes it into a range between −1 and 1. It keeps the signal centered around zero, which helps the network learn more efficiently, especially in deeper layers. Like a gentle S-shaped curve, it allows strong signals to pass through while taming extreme values, making it a popular choice when you want both positive and negative activity in the model.

In comparison, the sigmoid activation function σ(x) squeezes a value into a range between 0 and 1. It also has an S-shaped curve, but it is centered at y = 0.5, and its most active transition occurs roughly in the x-range of −6 to 6, wider than that of tanh.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
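The relationship between the two curves can be checked numerically; the standard identity tanh(x) = 2σ(2x) − 1 (which also explains why tanh's active band is about half as wide as sigmoid's) is shown here as an illustrative sketch:

```python
import math

def sigmoid(x):
    """Sigmoid: squashes x into (0, 1), centered at y = 0.5."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh squashes into (-1, 1) and is a scaled, shifted sigmoid:
# tanh(x) = 2 * sigmoid(2x) - 1, equal up to floating-point rounding.
print(math.tanh(1.0), 2.0 * sigmoid(2.0) - 1.0)
```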
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:44-03:00
By Prof. Tom Yeh
# GLU (Gated Linear Unit)

### Essential AI Math Excel Blueprints

Prof. Tom Yeh · Feb 21, 2026 · Paid

Gated Linear Units (GLU) marked a breakthrough in activation design by introducing a truly dynamic gating mechanism — meaning the gate is predicted from the input itself rather than defined by a fixed, predefined function. GLU projects the input through two parallel linear transformations: one produces a feature value, and the other produces a gate logit. The gate logit passes through a sigmoid to produce a value between 0 and 1, which determines how “open” the gate is and what fraction of the feature value is allowed to pass through.

Below is the visualization of the computation of SiLU for comparison. Notice the key difference: in SiLU, the sigmoid gate depends directly on the same projected feature value (z = Wx), meaning the feature and the gate come from the same linear transformation. In contrast, GLU-style gating predicts the gate using a separate linear transformation, so the gate is not tied to the feature itself.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
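A toy sketch of the two-projection structure described above; the weight values are invented for illustration, and bias terms are omitted for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, x):
    """Matrix-vector product over plain lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def glu(x, W, V):
    """GLU(x) = (W x) * sigmoid(V x), elementwise: the gate comes from its
    own linear projection V, independent of the feature projection W."""
    feature = matvec(W, x)                       # feature values
    gate = [sigmoid(z) for z in matvec(V, x)]    # gate values in (0, 1)
    return [f * g for f, g in zip(feature, gate)]

x = [1.0, -2.0]
W = [[0.5, 0.1], [0.3, -0.2]]   # feature projection (illustrative values)
V = [[1.0, 0.0], [0.0, 1.0]]    # separate gate projection
print(glu(x, W, V))
```

Swapping V for W (so the gate sees the same projection as the feature) would recover exactly the SiLU-style self-gating the post contrasts with.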
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:43-03:00
By Prof. Tom Yeh
# Essential AI Math #16 to #20

Prof. Tom Yeh · Feb 23, 2026 · Paid

Dear Academy Members,

I’m glad that I finally reached #20 for the new _Essential AI Math Blueprints_ series 🎉 After reaching this milestone, I’m confident the series is on its way to becoming one of the…

## This post is for paid subscribers
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:42-03:00
By Prof. Tom Yeh
### Essential AI Math Excel Blueprints

Prof. Tom Yeh · Mar 01, 2026 · Paid

Entropy measures the inherent uncertainty (surprise) of a probability distribution. If one outcome is guaranteed—for example, B occurs with probability 1—there is no uncertainty at all, so entropy is zero. If B is almost certain but not guaranteed, there is still a small amount of uncertainty, so entropy is low. If there are two likely outcomes with comparable probabilities, uncertainty increases and entropy is high. Finally, when all possible outcomes are equally likely, uncertainty is maximized, and entropy reaches its highest value for that set of outcomes.

Entropy increases as uncertainty spreads across more possible outcomes. For example, if there are 5 possible outcomes (A–E) and all are equally likely, the entropy is about 1.61 nats (ln 5). If we expand the space to 9 equally likely outcomes (A–I), there are now more possibilities to distinguish among, so entropy increases to around 2.20 nats (ln 9). In general, when all outcomes are equally likely—a uniform distribution—entropy reaches its maximum for that fixed number of outcomes.

## Excel Blueprint

This Excel Blueprint is available to AI by Hand Academy members. You can become a member [via a paid Substack subscription](https://www.byhand.ai/).
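The worked numbers above can be reproduced with a short sketch, using natural logarithms (entropy in nats, which matches the post's uniform-distribution values):

```python
import math

def entropy(p):
    """Shannon entropy in nats: H(P) = -sum over x of P(x) * log(P(x)).
    Outcomes with probability 0 contribute nothing (0 * log 0 := 0)."""
    return sum(-px * math.log(px) for px in p if px > 0)

print(entropy([1.0, 0.0]))    # certainty: 0.0
print(entropy([0.2] * 5))     # 5 equally likely outcomes: ln(5), about 1.61
print(entropy([1 / 9] * 9))   # 9 equally likely outcomes: ln(9), about 2.20
```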
📰 AI NEWS
AI by Hand ✍️
2026-04-04T07:01:40-03:00
By Prof. Tom Yeh
# AI by Hand Library ~ Attention, MHA, MQA, GQA

Prof. Tom Yeh · Mar 31, 2026 · Paid

I’m building a library of interactive flow diagrams for the paid members of the AI by Hand Academy. The first collection covers **attention** with seven diagrams in one learning path: 1. **QKV Projection**: wher…

## This post is for paid subscribers
🔬 AI RESEARCH
AI by Hand ✍️
2026-04-04T07:01:40-03:00
By Prof. Tom Yeh
# Deep Learning Math Workbook — Now Interactive

Happy to share something new: an interactive edition of my popular Deep Learning Math Workbook. This is a proof-of-concept prototype with the first 15 exercises of Chapter 1: Dot Product. You fill in the blanks, pick from multiple choices, and check your answers — all in your browser. 👉 Give it a try

Let me know what you think. I still believe pen and paper is the best way to learn math. There’s something about the focus and concentration that a blank page demands. But many of you have asked for an interactive option, and I think there’s real value in being able to test yourself and get immediate feedback.

~ Prof. Tom Yeh
🔬 AI RESEARCH
Agentic AI
2026-04-04T07:01:03-03:00
By Ken Huang
# Beyond the “Gradient Highway”: How Attention Residuals Fix the Hidden Crisis of Deep LLMs

### Some key takeaways from a recent paper by the research team behind the open-source model Kimi

Ken Huang · Mar 18, 2026 · Paid

Sometimes the most advanced research does not require a PhD. The X post about Kimi’s new AI architecture, allegedly created by a 17-year-old and praised by Elon Musk, is a vivid example of how exceptional talent, open research, and plentiful compute can let independent young researchers push state-of-the-art AI infrastructure forward without formal credentials. They deliver “drop-in” architectures with meaningful compute gains at minimal extra latency, proving that breakthrough ideas increasingly come from those who move fastest, not those with the longest academic résumés. Our AI researchers at DistributedApps.ai have analyzed the underlying work and summarized the key takeaways—subscribe to our paid tier to know more.

## 1. Introduction: The Amnesia of Depth

For years, the “gradient highway” has been the structural backbone of deep learning. Standard residual connections serve as a fast lane, allowing information to bypass complex transformations via identity mappings. But this high-speed travel carries a hidden cost. Imagine a highway where every town passed adds new cargo to a truck. By the time the vehicle has traveled thousands of miles—or passed through hundreds of neural layers—the original items from the start of the trip are buried under a mountain of new weight. In modern Large Language Models (LLMs) using PreNorm architectures, this manifests as a form of architectural “amnesia” or dilution. Because standard residuals aggregate information using fixed unit weights, each individual layer’s contribution is progressively washed out as the model grows deeper.
Attention Residuals (AttnRes), an innovation recently detailed by the Kimi team at Moonshot AI, introduces a “selective memory” upgrade. By moving away from blind accumulation, AttnRes allows each layer to perform content-aware retrieval across the model’s entire history.

## 2. The PreNorm Paradox: Why More Layers Don’t Always Mean Better Features

While depth is intended to build increasingly sophisticated features, modern PreNorm architectures suffer from a technical limitation: hidden-state magnitudes grow as O(L) with depth. As these magnitudes expand, the relative influence of any single new layer—and its ability to impact the final output—shrinks. Early-layer information is effectively “buried,” and the model loses the capacity to retrieve specific representations from its own past.

Standard residuals trap the model in a “fixed unit weight” strategy where every layer is treated with equal importance, regardless of its utility. As the research team observes: “Residuals also play a second role that has received less attention... residuals define how information aggregates across depth. Unlike sequence mixing and expert routing, which now employ learnable input-dependent weighting, this depth-wise aggregation remains governed by fixed unit weights.”

## 3. Takeaway #1: Replacing Addition with Selection (Softmax over Depth)

The central intellectual pivot of AttnRes is the duality of time and depth. Just as the Transformer revolution replaced the sequential recurrence of RNNs with attention across time (the sequence), AttnRes replaces the additive recurrence of residuals with attention across depth.
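To make "attention across depth" concrete, here is a toy sketch of the idea under the description above. This is my own minimal formulation, not the Kimi team's exact AttnRes parameterization (a real version would use learned per-layer query/key projections); it only illustrates replacing a fixed unit-weight sum with a softmax-weighted, content-aware one:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_depth(query, layer_outputs):
    """Depth-wise attention: score each stored layer output against the
    current query, then return the softmax-weighted combination, instead
    of the fixed unit-weight sum a standard residual stream would use."""
    scores = [sum(q * h for q, h in zip(query, h_l)) for h_l in layer_outputs]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * h_l[i] for w, h_l in zip(weights, layer_outputs))
            for i in range(dim)]

# Toy history of three layer outputs; the query matches the first layer,
# so that early representation dominates instead of being diluted away.
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(attend_over_depth([4.0, 0.0], history))
```

The point of the sketch: with unit weights the early layer's `[1.0, 0.0]` would contribute only 1/3 of the aggregate, but content-aware weighting lets a later layer retrieve it almost intact.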
📰 AI NEWS
Agentic AI
2026-04-04T07:00:57-03:00
By Ken Huang
# Claude Skill vs. Plugin: When to use What?

Ken Huang · Apr 01, 2026 · Paid

In an earlier article, I explained in detail the extension system Claude Code uses. I still got questions from readers asking me to focus on just Skills vs. Plugins in Claude Code. So here is a focused article on that topic.

In Claude Code, a **skill** is the lightweight unit of reusable instructions or workflow, while a **plugin** is the packaging layer that can bundle multiple skills plus hooks, subagents, and MCP servers into one installable toolkit. If you are asking “should I use a plugin or a skill?”, use a **skill** for one reusable task or domain workflow, and a **plugin** when you want to distribute or reuse a broader setup across projects or teams.

## What each one is

**Skills** are markdown-based instruction files (SKILL.md) that Claude can load automatically when relevant or invoke directly with /skill-name. They’re best for reference material, repeatable workflows, and narrowly scoped actions.

**Plugins** are the container that packages and distributes features, including multiple skills, hooks, subagents, and MCP servers. They’re namespaced so several plugins can coexist without naming collisions.

## Practical rule of thumb

## How they differ in practice
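As a hypothetical illustration of the SKILL.md format described above: the YAML frontmatter fields `name` and `description` follow Anthropic's documented skill convention as I recall it, but the skill name, description, and steps below are invented for illustration only.

```markdown
---
name: changelog-writer
description: Drafts a changelog entry from recent commits when the user asks for release notes.
---

# Changelog writer

1. Run `git log --oneline` since the last tag.
2. Group the commits into Added / Changed / Fixed sections.
3. Output a markdown changelog section for review.
```

The `description` matters most in practice: it is what lets Claude decide to load the skill automatically when a request is relevant, while `/changelog-writer` would invoke it directly.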