AI Struggles with Indian Culture: New Method Boosts Understanding by 20%

📄 VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Researchers discovered that even advanced AI models perform poorly when asked complex questions about Indian culture, history, and traditions. They created VIRAASAT, a dataset of over 3,200 challenging cultural questions spanning all Indian states, and developed a new training method called Symbolic Chain-of-Manipulation (SCoM) that teaches AI to think step-by-step through cultural knowledge like navigating a map. This approach improved AI performance by up to 20% compared to standard methods, helping models better understand and reason about diverse cultural contexts.

📄 View on arXiv 📥 PDF

New AI Creates Real-Time Virtual Humans That Actually Look at You

📄 SARAH: Spatially Aware Real-time Agentic Humans

Current virtual characters in VR and video calls act like they're talking to a wall - they don't turn toward you, make eye contact, or respond to your movements. Researchers solved this by creating SARAH, an AI system that generates full-body motion for virtual humans in real-time, making them turn toward users, maintain natural gaze, and align their gestures with speech based on where you are in the room. The system runs at over 300 frames per second on VR headsets and even lets users adjust how much eye contact they want. Tests show it creates much more natural-feeling conversations than previous methods.
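The "turn toward you" behavior boils down to simple spatial geometry. Here is a toy sketch (my illustration, not SARAH's actual motion model): compute the yaw an avatar needs to face the user's position, then blend it with a user-set eye-contact level between 0 (ignore the user) and 1 (always face them). All names and the linear blend are illustrative assumptions; real systems would interpolate rotations properly to avoid angle wraparound.

```python
import math

def face_user_yaw(avatar_xy, user_xy, current_yaw, eye_contact=0.8):
    """Yaw (radians) blending the avatar's current heading toward the user."""
    dx, dy = user_xy[0] - avatar_xy[0], user_xy[1] - avatar_xy[1]
    target_yaw = math.atan2(dy, dx)   # direction from avatar to user
    # Toy linear blend; eye_contact mimics the adjustable gaze setting.
    return (1 - eye_contact) * current_yaw + eye_contact * target_yaw

yaw = face_user_yaw((0.0, 0.0), (1.0, 1.0), current_yaw=0.0)
print(round(math.degrees(yaw), 1))   # 0.8 of the 45-degree turn
```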

📄 View on arXiv 📥 PDF

New 3-in-1 Technique Shrinks AI Models by 75% While Boosting Performance

📄 SPQ: An Ensemble Technique for Large Language Model Compression

Large AI language models are incredibly powerful but require massive amounts of memory, making them expensive and difficult to deploy. Researchers developed SPQ, a clever compression technique that combines three different approaches - removing redundant parts, simplifying complex calculations, and reducing numerical precision - all working together like a well-coordinated team. When applied to Meta's LLaMA-2-7B model, SPQ achieved a remarkable 75% reduction in memory usage while actually improving the model's performance on language tasks. The compressed models also run nearly twice as fast as competing compression methods, making AI more accessible for real-world applications.
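The three-way recipe can be sketched numerically. Below is a toy pipeline (my illustration of the general pruning + low-rank factorization + quantization pattern, not the paper's actual SPQ algorithm or hyperparameters) applied to one random weight matrix:

```python
import numpy as np

def compress(W, prune_ratio=0.5, rank=4, n_bits=8):
    """Toy 3-in-1 compression: prune, factorize, quantize."""
    # 1) Pruning: zero out the smallest-magnitude weights.
    thresh = np.quantile(np.abs(W), prune_ratio)
    W_pruned = np.where(np.abs(W) >= thresh, W, 0.0)

    # 2) Low-rank factorization: keep only the top singular components.
    U, s, Vt = np.linalg.svd(W_pruned, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # (m, rank), columns scaled by s
    B = Vt[:rank, :]                # (rank, n)

    # 3) Quantization: store factors as signed ints plus a float scale.
    def quantize(M):
        scale = max(np.abs(M).max(), 1e-12) / (2 ** (n_bits - 1) - 1)
        return np.round(M / scale).astype(np.int8), scale

    (qA, sA), (qB, sB) = quantize(A), quantize(B)
    # Dequantized reconstruction of the original matrix.
    return (qA.astype(np.float32) * sA) @ (qB.astype(np.float32) * sB)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
W_hat = compress(W)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(round(float(err), 2))   # relative reconstruction error
```

A random matrix compresses poorly (real model weights have far more structure), but the sketch shows how the three stages chain: each later stage operates on the previous stage's output rather than on the original weights.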

📄 View on arXiv 📥 PDF

AI System Predicts Indian Court Appeals with 81% Accuracy

📄 Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

India's courts are drowning in a massive backlog of legal cases, creating years-long delays for justice. Researchers developed Vichara, an AI system that can predict how appellate courts will rule on cases and explain its reasoning in a format lawyers can easily understand. The system breaks down complex legal documents into 'decision points' - key legal determinations with their context and reasoning. Testing on real Indian legal cases, Vichara achieved over 80% accuracy using GPT-4o mini, significantly outperforming existing legal AI tools.

📄 View on arXiv 📥 PDF

AI Learns to Spot Missing MRI Data Without Human Help

📄 Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Medical imaging often suffers from missing data - like incomplete MRI scans or missing scan types - forcing doctors to work with partial information. Researchers developed CoPeDiT, an AI system that can automatically detect what's missing from MRI scans and generate the missing pieces without needing humans to manually point out the gaps. The system uses a clever 'completeness perception' approach that learns to recognize incomplete data patterns and fills them in with realistic 3D brain and heart imagery. Testing on large medical datasets showed this self-aware AI significantly outperformed existing methods that require manual guidance.

📄 View on arXiv 📥 PDF

AI Agent Learns to Engineer Better Features Than Humans for ML Models

📄 FAMOSE: A ReAct Approach to Automated Feature Discovery

Machine learning models often fail because humans struggle to identify the best features from massive datasets - a process requiring deep expertise and intuition. Researchers created FAMOSE, an AI agent that uses a 'think-then-act' approach to automatically discover, create, and test new features for data analysis. The system achieved state-of-the-art results on regression tasks and strong performance on classification, reducing prediction errors by 2% on average while being more reliable than existing automated methods. What makes it special is that the AI learns from its own trial-and-error process, getting better at inventing useful features as it goes.
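The propose-evaluate loop at the core of automated feature discovery can be sketched in a few lines. This is my toy illustration of the pattern, not FAMOSE itself: FAMOSE uses an LLM to propose candidates and a ReAct-style reasoning trace, whereas here the candidates are hard-coded and a feature is kept only if it lowers a plain linear model's validation error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(1, 2, size=(200, 2))
y = X[:, 0] * X[:, 1] + 0.01 * rng.normal(size=200)  # hidden interaction

def val_error(feats, target):
    """Validation MSE of a least-squares linear fit on a holdout split."""
    Xtr, Xva, ytr, yva = feats[:150], feats[150:], target[:150], target[150:]
    w, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return np.mean((Xva @ w - yva) ** 2)

candidates = {                      # stand-ins for LLM-proposed features
    "x0*x1": X[:, 0] * X[:, 1],
    "x0+x1": X[:, 0] + X[:, 1],
    "x0**2": X[:, 0] ** 2,
}

feats, best, kept = X.copy(), val_error(X, y), []
for name, col in candidates.items():        # propose -> test -> keep/discard
    trial = np.column_stack([feats, col])
    err = val_error(trial, y)
    if err < best:                           # keep only features that help
        feats, best, kept = trial, err, kept + [name]

print(kept)   # the true interaction feature should be discovered
```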

📄 View on arXiv 📥 PDF

New Attack Method Exposes Serious Vulnerabilities in Top AI Vision Systems

📄 Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Researchers developed M-Attack-V2, a more effective method for fooling large AI vision-language models like GPT-5 and Claude into giving wrong answers. The technique works by creating specially crafted images that look normal to humans but confuse AI systems by targeting fine-grained visual details. The method dramatically improved attack success rates - jumping from 8% to 30% on Claude-4.0 and reaching 100% success on GPT-5. This research highlights critical security gaps in AI systems that process both images and text.
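Such attacks belong to the family of adversarial perturbations: tiny, budget-limited pixel changes chosen by following the gradient of a model's loss. The sketch below shows the classic single-step version (FGSM-style) on a toy linear surrogate; it is only an illustration of the family, since the paper's actual method is a black-box transfer attack targeting fine-grained details of vision-language models.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 784))      # toy surrogate linear classifier
x = rng.uniform(0, 1, size=784)     # "image" flattened to a vector
y = 3                               # true class index

def input_grad(W, x, y):
    """Gradient of cross-entropy loss w.r.t. the input, linear model."""
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    onehot = np.eye(W.shape[0])[y]
    return W.T @ (p - onehot)

eps = 0.03                          # perturbation budget (per pixel)
# Step in the sign of the gradient, then clip back to valid pixel range.
x_adv = np.clip(x + eps * np.sign(input_grad(W, x, y)), 0, 1)

print(float(np.abs(x_adv - x).max()) <= eps + 1e-9)   # stays within budget
```

The adversarial image differs from the original by at most `eps` per pixel, which is why such inputs look normal to humans while shifting the model's prediction.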

📄 View on arXiv 📥 PDF

AI Can Now Generate Realistic Hand Movements from Simple Text Descriptions

📄 CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild

Researchers have created CLUTCH, an AI system that can generate natural 3D hand movements from text descriptions like 'opening a jar' or 'typing on a keyboard.' Previous systems were limited to studio recordings with basic actions, but this breakthrough uses 32,000 real-world hand motion sequences captured from everyday activities. The system combines a large language model with a novel technique called SHIFT that breaks down hand movements into parts, achieving much more realistic and diverse hand animations than ever before.

📄 View on arXiv 📥 PDF

Meta Trains Massive AI Vision Model on 25B Social Media Posts

📄 Xray-Visual Models: Scaling Vision models on Industry Scale Data

Researchers at Meta created Xray-Visual, a powerful AI system that can understand both images and videos by training on an enormous dataset of 15 billion image-text pairs and 10 billion video-hashtag pairs from Facebook and Instagram. The key innovation is a three-stage training approach that combines different learning techniques to help the AI understand visual content without needing perfectly labeled data. The system achieves record-breaking performance on standard vision tasks while being more efficient and robust than previous models, proving that massive social media datasets can create superior AI vision systems.

📄 View on arXiv 📥 PDF

AI Researchers Expose Critical Security Flaw in ChatGPT-Style Agents

📄 Automating Agent Hijacking via Structural Template Injection

AI agents that retrieve and process information can be tricked into following malicious commands through a new attack method called 'Phantom.' Researchers discovered they can inject specially crafted code templates that confuse agents about who is giving instructions - making them think harmful commands are legitimate user requests. The team found over 70 vulnerabilities in real commercial AI products and showed their automated attack works much better than previous manual hacking attempts. This research reveals a fundamental security weakness in how AI agents process different types of instructions.

📄 View on arXiv 📥 PDF

Greek AI Gets Its Own Voice: Testing Language Models for Cultural Accuracy

📄 Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark

Most AI language models are heavily biased toward English and popular languages, potentially misrepresenting the culture and social context of smaller languages like Greek. Researchers created DemosQA, a new Greek question-answering dataset built from real social media conversations, and tested 11 different AI models to see whether specialized Greek-only models or general multilingual ones perform better. They developed a memory-efficient testing framework and evaluated the models across 6 different Greek datasets using various prompting strategies. The study reveals important insights about how well current AI systems understand and represent Greek language, culture, and social nuances.

📄 View on arXiv 📥 PDF

New AI Safety Method Uses 17,000x Fewer Parameters Than Full Retraining

📄 NeST: Neuron Selective Tuning for LLM Safety

Training AI models to refuse harmful requests typically requires expensive, full-scale retraining of billions of parameters. Researchers developed NeST, a clever approach that identifies and updates only the specific neurons responsible for safety behavior - like finding the exact brain cells that control moral reasoning. By targeting just these 'safety neurons' and grouping similar ones together, NeST achieved a 90% reduction in unsafe AI responses while using 17,000 times fewer parameters than traditional methods. This makes it far cheaper and faster to keep AI models safe as new threats emerge.
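Neuron-selective updating is easy to picture in code. The sketch below is my illustration of the general idea, not NeST's actual selection criterion or grouping step: score each neuron (a row of a weight matrix) by a gradient-magnitude proxy, then apply the update only to the top-k rows, leaving everything else frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 32))         # 100 neurons, 32 inputs each
grad = rng.normal(size=W.shape)        # stand-in safety-loss gradient

# Importance proxy: total gradient magnitude per neuron (per row).
importance = np.abs(grad).sum(axis=1)
k = 5
selected = np.argsort(importance)[-k:]  # indices of "safety neurons"

mask = np.zeros(W.shape[0], dtype=bool)
mask[selected] = True

lr = 0.1
W_new = W - lr * grad * mask[:, None]   # update only the selected rows

frac_updated = mask.mean()
print(frac_updated)   # 0.05 -> only 5% of neurons are touched
```

Only the masked rows change; the other 95 neurons are bit-for-bit identical before and after the update, which is the source of the parameter savings.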

📄 View on arXiv 📥 PDF

New AI System Sees Underwater Depths 17% Better Than Previous Methods

📄 StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

Underwater robots struggle to judge distances because water distorts light and vision in complex ways. Researchers developed StereoAdapter-2, a new AI system that uses a smarter scanning pattern inspired by how eyes naturally track objects, plus a massive dataset of 80,000 synthetic underwater scenes for training. The system achieved 17% better depth perception on underwater tests compared to existing methods, and successfully worked on a real underwater robot called BlueROV2.

📄 View on arXiv 📥 PDF

New Study Reveals AI Agents Are Vulnerable to Sophisticated Multi-Turn Attacks

📄 AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

As AI agents become more powerful and are deployed for complex, long-term tasks, they face a new class of security threats that exploit extended conversations to manipulate their behavior. Researchers created AgentLAB, the first comprehensive testing platform that simulates five types of sophisticated attacks - including hijacking the agent's goals, chaining tools maliciously, and poisoning memory - across 28 realistic scenarios with 644 test cases. Testing popular AI agents revealed they're highly susceptible to these multi-turn manipulation tactics, and traditional single-conversation defenses don't work against these extended attacks. This research highlights critical security gaps as AI agents take on more autonomous roles in real-world applications.

📄 View on arXiv 📥 PDF

AI System Writes Better Software Tests by Planning Before Coding

📄 SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Writing automated tests for C programming code has been notoriously difficult because AI models often jump straight to generating code without understanding the program's structure, leading to broken or useless tests. Researchers created SPARC, a new system that makes AI 'think before it codes' by first analyzing the program's flow, then planning test scenarios, and finally using compiler feedback to fix any issues. The system dramatically outperforms existing methods, achieving over 30% better code coverage and successfully repairing 94% of its generated tests, while producing more readable and maintainable code.
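The analyze-plan-generate-repair loop can be shown schematically. In the sketch below, the analyzer, generator, and compiler are all stand-in stubs of my own (SPARC's real prompts, control-flow analysis, and compiler integration are far richer); the point is the control flow, where compiler feedback drives bounded repair passes.

```python
def analyze(source):
    """Stand-in control-flow analysis: one scenario per branch keyword."""
    n_branches = source.count("if")
    return [f"scenario {i}: exercise branch {i}" for i in range(n_branches + 1)]

def generate_test(scenario, feedback=None):
    """Stand-in LLM call: the first draft 'forgets' a semicolon."""
    code = f"// {scenario}\nassert(f(1) == 1)"
    return code + ";" if feedback else code

def compile_check(test_code):
    """Stand-in compiler: returns an error string or None on success."""
    return None if test_code.endswith(";") else "error: expected ';'"

def plan_generate_repair(source, max_repairs=3):
    tests = []
    for scenario in analyze(source):          # 1) plan scenarios first
        code = generate_test(scenario)        # 2) then generate code
        for _ in range(max_repairs):          # 3) repair from feedback
            feedback = compile_check(code)
            if feedback is None:
                break
            code = generate_test(scenario, feedback)
        tests.append(code)
    return tests

tests = plan_generate_repair("int f(int x){ if (x > 0) return x; return -x; }")
print(len(tests), all(t.endswith(";") for t in tests))   # 2 True
```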

📄 View on arXiv 📥 PDF

GLM-5 Model Transforms Coding from 'Vibes' to Professional Engineering

📄 GLM-5: from Vibe Coding to Agentic Engineering

Researchers have developed GLM-5, a new AI model that aims to move beyond basic 'vibe coding' (where AI helps with simple code snippets) toward full 'agentic engineering' - where AI can handle complete, complex software projects autonomously. The key innovation is a new asynchronous reinforcement learning system that allows the AI to learn from long, complex coding tasks more efficiently by separating the generation process from the training process. GLM-5 achieved state-of-the-art results on major coding benchmarks and showed unprecedented ability to handle end-to-end software engineering challenges in real-world scenarios.

📄 View on arXiv 📥 PDF

AI Models Get Worse at Privacy & Personalization with Longer Context

📄 Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization

Researchers discovered a surprising problem: as AI models process longer conversations and documents, they actually become worse at both protecting privacy and personalizing responses. They created PAPerBench, a massive benchmark with 377,000 test questions to study this issue across context lengths from 1,000 to 256,000 words. The study found that all major AI models suffer from 'attention dilution' - like trying to focus on everything at once, they end up focusing on nothing well. This reveals a fundamental limitation in how current transformer-based AI systems handle long contexts.
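The dilution intuition has a simple numeric face. In the toy sketch below (my illustration of the intuition, not the paper's analysis), softmax attention is computed over random keys: the mass landing on any one fixed "relevant" token shrinks roughly like 1/N as the context length N grows, because the normalizer sums over every token.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
q = rng.normal(size=d) / np.sqrt(d)       # one fixed query vector

def weight_on_first_key(n_ctx):
    """Softmax attention mass on token 0 in a context of n_ctx tokens."""
    K = rng.normal(size=(n_ctx, d))       # random keys for the context
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                          # normalize over ALL tokens
    return w[0]

w_short = weight_on_first_key(100)
w_long = weight_on_first_key(10_000)
print(w_short > w_long)   # same token, far less attention mass at 10k
```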

📄 View on arXiv 📥 PDF

AI Agent Discovers Hidden Drug Investments Worth Billions Globally

📄 Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence

Pharmaceutical companies are missing out on potentially billion-dollar drug discoveries because most new medications are now developed outside the U.S. and published in non-English sources that current AI tools can't effectively search. Researchers created a specialized AI agent called 'Bioptic Agent' that can hunt for promising drug candidates across multiple languages and regions without making up false information. In tests, their system found nearly 80% of hidden drug assets compared to just 50-56% for leading AI tools like Claude and ChatGPT, potentially saving investors from missing the next breakthrough medication.

📄 View on arXiv 📥 PDF

AI Agents Struggle with Real Research Tasks, Succeeding Just 6.7% of the Time

📄 ResearchGym: Evaluating Language Model Agents on Real-World AI Research

Researchers created ResearchGym, a benchmark that tests AI agents on actual research tasks by giving them the datasets behind real academic papers and asking them to rediscover the solutions independently. They tested advanced AI agents (including GPT-5) on five research problems from top AI conferences, requiring the agents to form hypotheses, run experiments, and beat human baselines. Despite occasional breakthroughs - one agent even surpassed a published solution - the agents succeeded in only 1 out of 15 attempts and completed just 26% of subtasks, revealing major reliability issues in AI's research capabilities.

📄 View on arXiv 📥 PDF

New Method Spots Malicious AI Adapters Without Running Them

📄 Weight space Detection of Backdoors in LoRA Adapters

AI researchers have developed a way to detect backdoor attacks in LoRA adapters - popular tools for customizing large language models - by analyzing the adapters' weights directly instead of testing them with data. The method examines statistical patterns in the adapter's weight matrices, like how concentrated certain values are, to spot suspicious modifications. Testing on 500 adapters, the technique achieved 97% accuracy in identifying poisoned adapters with a false-alarm rate under 2%. This breakthrough makes it practical to screen thousands of shared AI adapters on platforms like Hugging Face without knowing what triggers might activate malicious behavior.
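One concrete weight-space signal is spectral concentration. The sketch below is my illustration of the general idea (the paper's actual features and classifier differ): a backdoor planted as a strong rank-1 update makes the top singular value dominate the adapter's spectrum, so a simple concentration score separates poisoned from clean updates without ever running the model.

```python
import numpy as np

def concentration(delta_w):
    """Share of the singular-value spectrum in the top direction."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    return s[0] / s.sum()

rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 64)) * 0.01              # diffuse update
u = rng.normal(size=(64, 1))
v = rng.normal(size=(1, 64))
poisoned = clean + 0.05 * (u @ v)                     # hidden rank-1 spike

scores = {"clean": float(concentration(clean)),
          "poisoned": float(concentration(poisoned))}
flagged = scores["poisoned"] > 2 * scores["clean"]    # toy threshold
print(flagged)
```

A clean random update spreads its energy across many singular directions, so its top singular value holds only a small share of the spectrum; the rank-1 spike concentrates it, which is what the threshold detects.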

📄 View on arXiv 📥 PDF