
    Research shows that visual models cannot process queries with negative words

By Daniel68 | May 19, 2025

Imagine a radiologist examining a new patient's chest X-ray. She notices the patient has swelling in the tissue but does not have an enlarged heart. Hoping to speed up diagnosis, she might use a vision-language machine-learning model to search for reports from similar patients.

But if the model mistakenly identifies reports with both conditions, the most likely diagnosis could be quite different: if a patient has tissue swelling and an enlarged heart, the condition is very likely cardiac-related, but with no enlarged heart there could be several underlying causes.

In a new study, MIT researchers found that vision-language models are highly likely to make such a mistake in real-world settings because they do not understand negation: words like "no" and "not" that specify what is false or absent.

"These negation words can have a very significant impact, and if we just use these models blindly, we can run into catastrophic consequences," said Kumail Alhamoud, a graduate student at MIT and lead author of the study.

The researchers tested the ability of vision-language models to identify negation in image captions. The models often performed no better than a random guess. Building on those findings, the team created a dataset of images with corresponding captions that include negation words describing missing objects.

They showed that fine-tuning a vision-language model with this dataset leads to performance improvements when the model is asked to retrieve images that do not contain certain objects. It also boosts accuracy on multiple-choice question answering with negated captions.

But the researchers caution that more work is needed to address the root causes of the problem. They hope their research alerts potential users to a previously unnoticed shortcoming that could have serious implications in the high-stakes settings where these models are currently used, from determining which patients receive certain treatments to identifying product defects in manufacturing plants.

"This is a technical paper, but there are bigger issues to consider," said senior author Marzyeh Ghassemi.

Ghassemi and Alhamoud are joined on the study by MIT graduate student Shaden Alshammari; Yonglong Tian of OpenAI; Guohao Li, a former postdoc at Oxford University; Philip H.S. Torr, a professor at Oxford; and Yoon Kim, an assistant professor of EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

Ignoring negation

Vision-language models (VLMs) are trained on huge collections of images and corresponding captions, learning to encode both as sets of numbers called vector representations. The models use these vectors to distinguish between different images.

A VLM uses two separate encoders, one for text and one for images, and the encoders learn to output similar vectors for an image and its corresponding text caption.
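A minimal sketch of this dual-encoder matching idea, with random vectors standing in for the learned encoders (a real VLM such as CLIP learns these mappings from millions of image-caption pairs; the vectors here are purely illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score used to compare an image vector with a caption vector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)

# Stand-ins for encoder outputs: training pushes a matching image-caption
# pair close together in vector space, and unrelated pairs far apart.
image_vec = rng.normal(size=128)
matching_caption_vec = image_vec + rng.normal(scale=0.1, size=128)
unrelated_caption_vec = rng.normal(size=128)

print(cosine_similarity(image_vec, matching_caption_vec) >
      cosine_similarity(image_vec, unrelated_caption_vec))  # True
```

The matching caption's vector is a small perturbation of the image vector, so its similarity score is near 1, while an unrelated caption scores near 0 in high dimensions.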

"The captions express what is in the images; they are a positive label. And that is actually the whole problem. No one looks at an image of a dog jumping over a fence and captions it by saying 'a dog jumping over a fence, with no helicopters,'" Ghassemi says.

Because image-caption datasets do not contain examples of negation, VLMs never learn to identify it.

To dig deeper into this problem, the researchers designed two benchmark tasks that test VLMs' ability to understand negation.

For the first, they used a large language model (LLM) to re-caption images in an existing dataset, asking the LLM to consider related objects not present in an image and write them into its caption. Then they tested the models by prompting them with negation words to retrieve images that contain certain objects but not others.

For the second task, they designed multiple-choice questions that ask a VLM to select the most appropriate caption from a list of closely related options. These captions differ only by adding a reference to an object that does not appear in the image or by negating an object that does appear in the image.
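Under the same embedding-similarity assumptions as above, this multiple-choice evaluation reduces to picking the candidate caption whose vector is closest to the image's vector. A hypothetical sketch (again with random stand-in vectors, not the paper's actual models):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_caption(image_vec: np.ndarray, candidate_vecs: list) -> int:
    """Return the index of the candidate caption most similar to the image."""
    scores = [cosine_similarity(image_vec, c) for c in candidate_vecs]
    return int(np.argmax(scores))

rng = np.random.default_rng(7)
image_vec = rng.normal(size=64)
candidates = [
    rng.normal(size=64),                          # wrong caption
    image_vec + rng.normal(scale=0.1, size=64),   # correct caption
    rng.normal(size=64),                          # wrong caption
]
print(pick_caption(image_vec, candidates))  # 1
```

The benchmark's difficulty comes from the candidates differing only in a negated or added object, so a model that ignores negation assigns them nearly identical scores.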

The models often failed at both tasks, with image-retrieval performance dropping by nearly 25 percent with negated captions. In answering multiple-choice questions, the best models achieved only about 39 percent accuracy, with several models performing at or even below random chance.

One reason for this failure is a shortcut the researchers call affirmation bias: VLMs ignore negation words and focus on the objects in the images instead.

"This does not just happen for words like 'no' and 'not.' Regardless of how you express negation or exclusion, the models will simply ignore it," Alhamoud says.

This was consistent across every VLM they tested.
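The affirmation bias can be illustrated with a deliberately naive text representation (my own toy example, not the paper's method). The bag-of-words "encoder" below discards function words, including "no", so a caption asserting an object and one negating it become indistinguishable, which is exactly the shortcut the researchers describe:

```python
from collections import Counter

# Words the toy encoder throws away; crucially, negation cues are among them.
STOP_WORDS = {"a", "an", "the", "with", "no", "not", "without"}

def naive_encode(caption: str) -> Counter:
    """Bag of content words; negation is discarded along with stop words."""
    return Counter(w for w in caption.lower().split() if w not in STOP_WORDS)

affirmed = naive_encode("a dog jumping over a fence with a helicopter")
negated = naive_encode("a dog jumping over a fence with no helicopter")

print(affirmed == negated)  # True: opposite meanings, identical representation
```

A learned encoder fails more subtly than this, but the measured behavior is analogous: captions that differ only by a negation word land on nearly the same vector.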

A "solvable problem"

Since VLMs are typically not trained on image captions containing negation, the researchers developed datasets with negation words as a first step toward solving the problem.

Using a dataset of 10 million image-text caption pairs, they prompted an LLM to propose related captions that specify what is excluded from each image, yielding new captions containing negation words.

They had to be especially careful that these synthetic captions still read naturally, or the VLMs could fail in the real world when faced with more complex captions written by humans.
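In the paper this rewriting is done by an LLM; as a rough, self-contained stand-in, a template-based generator can show the shape of the augmented data. The object lists, template phrasings, and function names below are my own illustration, not the authors' pipeline:

```python
import random

# Related-but-absent objects per caption; in the actual pipeline an LLM
# proposes these and writes the negated caption in natural language.
RELATED_OBJECTS = {
    "a dog jumping over a fence": ["helicopter", "cat", "frisbee"],
}

NEGATION_TEMPLATES = [
    "{caption}, with no {absent} in sight",
    "{caption}; there is no {absent}",
    "{caption}, but no {absent}",
]

def negated_caption(caption: str, rng: random.Random) -> str:
    """Produce one synthetic caption that negates a plausible absent object."""
    absent = rng.choice(RELATED_OBJECTS[caption])
    template = rng.choice(NEGATION_TEMPLATES)
    return template.format(caption=caption, absent=absent)

print(negated_caption("a dog jumping over a fence", random.Random(0)))
```

Fixed templates like these would quickly become repetitive, which is presumably why the researchers relied on an LLM to keep the synthetic captions reading naturally.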

They found that fine-tuning VLMs with their dataset led to performance gains across the board. It improved the models' image-retrieval abilities by about 10 percent, while also boosting performance on the multiple-choice question-answering task by about 30 percent.

"But our solution is not perfect. We are just recaptioning datasets, a form of data augmentation. We have not even touched how these models work, but we hope this is a signal that this is a solvable problem, and that others can take our solution and improve it," Alhamoud says.

At the same time, he hopes their work encourages more users to think carefully about the problem they want to use a VLM to solve, and to design examples to test it before deployment.

In the future, researchers could expand upon this work by teaching VLMs to process text and images separately, which may improve their ability to understand negation. In addition, they could develop additional datasets that include image-caption pairs for specific applications, such as health care.
