
DAI#57 – Tricky AI, exam challenge, and conspiracy cures

Welcome to this week’s roundup of AI news made by humans, for humans.

This week, OpenAI told us that it’s pretty sure o1 is kinda safe.

Microsoft gave Copilot a big boost.

And a chatbot can cure your belief in conspiracy theories.

Let’s dig in.

It’s pretty safe

We were caught up in the excitement of OpenAI’s release of its o1 models last week until we read the fine print. The model’s system card offers interesting insight into the safety testing OpenAI did and the results may raise some eyebrows.

It turns out that o1 is smarter but also more deceptive with a “medium” danger level according to OpenAI’s rating system.

Despite o1 being very sneaky during testing, OpenAI and its red teamers say they’re fairly sure it’s safe enough to release. Not so safe if you’re a programmer looking for a job.

If OpenAI’s o1 can pass OpenAI’s research engineer hiring interview for coding — 90% to 100% rate…

……then why would they continue to hire actual human engineers for this position?

Every company is about to ask this question. pic.twitter.com/NIIn80AW6f

— Benjamin De Kraker (@BenjaminDEKR) September 12, 2024

Copilot upgrades

Microsoft unleashed Copilot “Wave 2”, which will give your productivity and content production an additional AI boost. If you were on the fence over Copilot’s usefulness, these new features may be the clincher.

The Pages feature and the new Excel integrations are really cool. The way Copilot accesses your data does raise some privacy questions though.

More strawberries

If all the recent talk about OpenAI’s Strawberry project gave you a craving for the berry, then you’re in luck.

Researchers have developed an AI system that promises to transform how we grow strawberries and other agricultural products.

This open-source application could have a huge impact on food waste, harvest yields, and even the price you pay for fresh fruit and veg at the store.

Too easy

AI models are getting so smart now that our benchmarks to measure them are just about obsolete. Scale AI and CAIS launched a project called Humanity’s Last Exam to fix this.

They want you to submit tough questions that you think could stump leading AI models. If an AI can answer PhD-level questions, then we’ll get a sense of how close we are to achieving expert-level AI systems.

If you think you have a good one, you could win a share of $500,000. It’ll have to be really tough, though.


Curing conspiracies

I love a good conspiracy theory, but some of the things people believe are just crazy. Have you tried convincing a flat-earther with simple facts and reasoning? It doesn’t work. But what if we let an AI chatbot have a go?

Researchers built a chatbot using GPT-4 Turbo and they had impressive results in changing people’s minds about the conspiracy theories they believed in.

It does raise some awkward questions about how persuasive AI models are and who decides what ‘truth’ is.

Just because you’re paranoid, doesn’t mean they’re not after you.

Stay cool

Is having your body cryogenically frozen part of your backup plan? If so, you’ll be happy to hear AI is making this crazy idea slightly more plausible.

Researchers from the University of Warwick and the University of Manchester used AI to accelerate the discovery of cryoprotectant compounds. These compounds prevent damaging ice crystals from forming in organic matter during the freezing process.

For now, the application is for better transport and storage of blood or temperature-sensitive medicines. But if AI helps them find a really good cryoprotectant, cryogenic preservation of humans could go from a moneymaking racket to a plausible option.

AI is contributing to the medical field in other ways that might make you a little nervous. New research shows that a surprising number of doctors are turning to ChatGPT for help diagnosing patients. Is that a good thing?

If you’re excited about what’s happening in medicine and considering a career as a doctor, you may want to rethink that, according to this professor.

This is the final warning for those considering careers as physicians: AI is becoming so advanced that the demand for human doctors will significantly decrease, especially in roles involving standard diagnostics and routine treatments, which will be increasingly replaced by AI.… pic.twitter.com/VJqE6rvkG0

— Derya Unutmaz, MD (@DeryaTR_) September 13, 2024

In other news…

Here are some other clickworthy AI stories we enjoyed this week:

Google’s NotebookLM turns your written content into a podcast. This is crazy good.
When Japan switches on the world’s first zetta-class supercomputer in 2030, it will be 1,000 times faster than the world’s current fastest supercomputer.
SambaNova challenges OpenAI’s o1 model with an open-source Llama 3.1-powered demo.
More than 200 tech industry players sign an open letter asking Gavin Newsom to veto the SB 1047 AI safety bill.
Gavin Newsom signed two bills into law to protect living and deceased performers from AI cloning.
Sam Altman departs OpenAI’s safety committee to make it more “independent”.
OpenAI says the signs of life shown by ChatGPT in initiating conversations are just a glitch.
RunwayML launches Gen-3 Alpha Video to Video feature to paid users of its app.

Gen-3 Alpha Video to Video is now available on web for all paid plans. Video to Video represents a new control mechanism for precise movement, expressiveness and intent within generations. To use Video to Video, simply upload your input video, prompt in any aesthetic direction… pic.twitter.com/ZjRwVPyqem

— Runway (@runwayml) September 13, 2024

And that’s a wrap.

It’s not surprising that AI models like o1 present more risk as they get smarter, but the sneakiness during testing was weird. Do you think OpenAI will stick to its self-imposed safety level restrictions?

The Humanity’s Last Exam project was an eye-opener. Humans are struggling to come up with questions tough enough to stump AI. What happens after that?

If you believe in conspiracy theories, do you think an AI chatbot could change your mind? Amazon Echo is always listening, the government uses big tech to spy on us, and Mark Zuckerberg is a robot. Prove me wrong.

Let us know what you think, follow us on X, and send us links to cool AI stuff we may have missed.


AI in the doctor’s office: GPs turn to ChatGPT and other tools for diagnoses

A new survey has found that one in five general practitioners (GPs) in the UK are using AI tools like ChatGPT to assist with daily tasks such as suggesting diagnoses and writing patient letters. 

The research, published in the journal BMJ Health and Care Informatics, surveyed 1,006 GPs across the UK about their use of AI chatbots in clinical practice.

Some 20% reported using generative AI tools, with ChatGPT being the most popular. Of those using AI, 29% said they employed it to generate documentation after patient appointments, while 28% used it to suggest potential diagnoses.

“These findings signal that GPs may derive value from these tools, particularly with administrative tasks and to support clinical reasoning,” the study authors noted. 

We have no idea how many papers OpenAI used to train their models, but it’s certainly more than any doctor could have read. It gives quick, convincing answers and is very easy to use, unlike searching research papers manually. 

Does that mean ChatGPT is generally accurate for clinical advice? Absolutely not. Large language models (LLMs) like ChatGPT are pre-trained on massive amounts of general data, making them more flexible but dubiously accurate for specific medical tasks.

It’s easy to lead them on; the model tends to side with your assumptions, a problematically sycophantic behavior.

Moreover, some researchers state that ChatGPT can be conservative or prudish when handling delicate topics like sexual health.

As Stephen Hughes from Anglia Ruskin University wrote in The Conversation, “I asked ChatGPT to diagnose pain when passing urine and a discharge from the male genitalia after unprotected sexual intercourse. I was intrigued to see that I received no response. It was as if ChatGPT blushed in some coy computerised way. Removing mentions of sexual intercourse resulted in ChatGPT giving a differential diagnosis that included gonorrhoea, which was the condition I had in mind.” 

As Dr. Charlotte Blease, lead author of the study, commented: “Despite a lack of guidance about these tools and unclear work policies, GPs report using them to assist with their job. The medical community will need to find ways to both educate physicians and trainees about the potential benefits of these tools in summarizing information but also the risks in terms of hallucinations, algorithmic biases and the potential to compromise patient privacy.”

That last point is key. Passing patient information into AI systems likely constitutes a breach of privacy and patient trust.

Dr. Ellie Mein, medico-legal adviser at the Medical Defence Union, agreed on the key issues: “Along with the uses identified in the BMJ paper, we’ve found that some doctors are turning to AI programs to help draft complaint responses for them. We have cautioned MDU members about the issues this raises, including inaccuracy and patient confidentiality. There are also data protection considerations.”

She added: “When dealing with patient complaints, AI drafted responses may sound plausible but can contain inaccuracies and reference incorrect guidelines which can be hard to spot when woven into very eloquent passages of text. It’s vital that doctors use AI in an ethical way and comply with relevant guidance and regulations.”

Probably the most critical questions amid all this are: How accurate is ChatGPT in a medical context? And how great might the risks of misdiagnosis or other issues be if this continues?

Generative AI in medical practice

As GPs increasingly experiment with AI tools, researchers are working to evaluate how they compare to traditional diagnostic methods. 

A study published in Expert Systems with Applications conducted a comparative analysis between ChatGPT, conventional machine learning models, and other AI systems for medical diagnoses.

The researchers found that while ChatGPT showed promise, it was often outperformed by traditional machine learning models specifically trained on medical datasets. For example, multi-layer perceptron neural networks achieved the highest accuracy in diagnosing diseases based on symptoms, with rates of 81% and 94% on two different datasets.
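
To illustrate what a model “specifically trained on medical datasets” looks like in practice, here is a minimal sketch of a symptom-based multi-layer perceptron classifier, assuming scikit-learn and a hypothetical labelled symptom file; it is not the study’s code or data.

```python
# Hypothetical sketch of a symptom-based MLP diagnostic classifier, the kind of
# purpose-built model the study compared ChatGPT against. File and column names
# are invented; this is not the study's code or data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Each row: binary symptom indicators plus a diagnosis label.
df = pd.read_csv("symptoms.csv")  # hypothetical dataset
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A small multi-layer perceptron; feature scaling helps it converge.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42),
)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```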

Researchers concluded that while ChatGPT and similar AI tools show potential, “their answers can be often ambiguous and out of context, so providing incorrect diagnoses, even if it is asked to provide an answer only considering a specific set of classes.”

This aligns with other recent studies examining AI’s potential in medical practice.

For example, research published in JAMA Network Open tested GPT-4’s ability to analyze complex patient cases. While it showed promising results in some areas, GPT-4 still made errors, some of which could be dangerous in real clinical scenarios.

There are some exceptions, though. One study conducted by the New York Eye and Ear Infirmary of Mount Sinai (NYEE) demonstrated how GPT-4 can meet or exceed human ophthalmologists in diagnosing and treating eye diseases.

For glaucoma, GPT-4 provided highly accurate and detailed responses that exceeded those of real eye specialists. 

AI developers such as OpenAI and NVIDIA are training purpose-built medical AI assistants to support clinicians, hopefully making up for shortfalls in base frontier models like GPT-4.

OpenAI has already partnered with health tech company Color Health to create an AI “copilot” for cancer care, demonstrating how these tools are set to become more specific to clinical practice.  

Weighing up benefits and risks

There are countless studies comparing specially trained AI models to humans in identifying diseases from diagnostic images such as MRIs and X-rays. 

AI techniques have outperformed doctors in everything from cancer and eye disease diagnosis to Alzheimer’s and Parkinson’s early detection. One, named “Mia,” proved effective in analyzing over 10,000 mammogram scans, flagging known cancer cases, and uncovering cancer in 11 women that doctors had missed. 

However, these purpose-built AI tools are certainly not the same as pasting notes and findings into a language model like ChatGPT and asking it to infer a diagnosis from that alone. 

Nevertheless, that’s a difficult temptation to resist. It’s no secret that healthcare services are overwhelmed. NHS waiting times are at all-time highs, and even getting a GP appointment in some areas is a grim task. 

AI tools target time-consuming admin work, which is precisely their allure for overwhelmed doctors. We’ve seen this mirrored across numerous public sector fields, such as education, where teachers are widely using AI to create materials, mark work, and more. 

So, at your next visit, will your doctor paste your notes into ChatGPT and write you a prescription based on the results? Quite possibly. It’s just another frontier where the technology’s promise to save time is hard to deny. 

The best path forward may be to develop a code of use. The British Medical Association has called for clear policies on integrating AI into clinical practice.

“The medical community will need to find ways to both educate physicians and trainees and guide patients about the safe adoption of these tools,” the BMJ study authors concluded.

Aside from advice and education, ongoing research, clear guidelines, and a commitment to patient safety will be essential to realizing AI’s benefits while offsetting risks.


Researchers use AI chatbot to change conspiracy theory beliefs

Around 50% of Americans believe in conspiracy theories of one type or another, but MIT and Cornell University researchers think AI can fix that.

In their paper, the psychology researchers explained how they used a chatbot powered by GPT-4 Turbo to interact with participants to see if they could be persuaded to abandon their belief in a conspiracy theory.

The experiment involved 1,000 participants who were asked to describe a conspiracy theory they believed in and the evidence they felt underpinned their belief.

The paper noted that “Prominent psychological theories propose that many people want to adopt conspiracy theories (to satisfy underlying psychic “needs” or motivations), and thus, believers cannot be convinced to abandon these unfounded and implausible beliefs using facts and counterevidence.”

Could an AI chatbot be more persuasive where others failed? The researchers offered two reasons why they suspected LLMs could do a better job than you of convincing your colleague that the moon landing really happened.

LLMs have been trained on vast amounts of data and they’re really good at tailoring counterarguments to the specifics of a person’s beliefs.

After describing the conspiracy theory and evidence, the participants engaged in back-and-forth interactions with the chatbot. The chatbot was prompted to “very effectively persuade” the participants to change their belief in their chosen conspiracy.
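
As a rough illustration of that setup (and not the researchers’ actual code), a persuasion-focused chat loop with GPT-4 Turbo might look something like the sketch below, assuming the OpenAI Python SDK; the system prompt wording and the example conspiracy are invented.

```python
# Illustrative sketch only, not the researchers' code. Assumes the OpenAI Python
# SDK (openai>=1.0) and an OPENAI_API_KEY in the environment; the system prompt
# wording and example conspiracy are invented.
from openai import OpenAI

client = OpenAI()

conspiracy = "The moon landings were staged."
evidence = "The flag appears to wave even though there is no air on the moon."

messages = [
    {
        "role": "system",
        "content": (
            "You are talking to someone who believes this conspiracy theory: "
            f"{conspiracy} Their stated evidence: {evidence} "
            "Very effectively persuade them, using facts and counterarguments "
            "tailored to their specific reasoning, to reconsider this belief."
        ),
    }
]

# A few back-and-forth rounds, standing in for the study's dialogue with participants.
for _ in range(3):
    reply = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    answer = reply.choices[0].message.content
    print("AI:", answer)
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": input("You: ")})
```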

The result was that, on average, participants experienced a 21.43% decrease in their belief in the conspiracy they had previously considered true. The persistence of the effect was also notable: two months later, participants still held their revised views about the conspiracy they previously believed.

The researchers concluded that “many conspiracists—including those strongly committed to their beliefs—updated their views when confronted with an AI that argued compellingly against their positions.”

Our new paper, out on (the cover of!) Science is now live! https://t.co/VBfC5eoMQ2

— Tom Costello (@tomstello_) September 12, 2024

They suggest that AI could be used to combat conspiracy theories and fake news spread on social media by countering them with facts and well-reasoned arguments.

While the study focused on conspiracy theories, it noted that “Absent appropriate guardrails, however, it is entirely possible that such models could also convince people to adopt epistemically suspect beliefs—or be used as tools of large-scale persuasion more generally.”

In other words, AI is really good at convincing you to believe the things it is prompted to make you believe. An AI model also doesn’t inherently know what is ‘true’ and what isn’t. It depends on the content in its training data.

The researchers achieved their results using GPT-4 Turbo, but GPT-4o and the new o1 models are even more persuasive and deceptive.

The study was funded by the John Templeton Foundation. The irony is that the Templeton Freedom Awards are administered by the Atlas Economic Research Foundation, a group that opposes action on climate change and defends the tobacco industry, which in turn helps fund it.

AI models are becoming very persuasive and the people who decide what constitutes truth hold the power.

The same AI models that could convince you to stop believing the Earth is flat could be used by lobbyists to convince you that anti-smoking laws are bad and climate change isn’t happening.


AI accelerates the discovery of cryoprotectant compounds for medicine transport and storage

Scientists have developed a new machine learning system that could help preserve vaccines, blood, and other medical treatments. 

The research, published in Nature Communications, was led by the University of Warwick and the University of Manchester.

The AI system helps identify molecules called cryoprotectants – compounds that prevent damage when freezing biological materials. 

Cryoprotectants are special substances that help protect living cells and tissues from damage when they’re frozen. They work by preventing the formation of harmful ice crystals, which essentially break tissue apart when you freeze it. They also help cells maintain their structure in extreme cold.

These compounds are fundamentally important for preserving things like vaccines, blood samples, and reproductive cells for long-term storage or transport.

Cryoprotectants could one day be used to preserve organs, complex tissues, or even entire humans.

Currently, finding new cryoprotectants is a slow, trial-and-error process. This new ML-driven approach allows researchers to rapidly screen hundreds of potential molecules virtually.
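
The paper’s exact pipeline isn’t reproduced here, but the general shape of such virtual screening, where a model is fitted to known cryoprotectant activity and then used to rank an untested library, could be sketched like this (the data files, descriptors, and model choice are assumptions, not the authors’ method):

```python
# Hedged sketch of ML-driven virtual screening for cryoprotectants. The files,
# molecular descriptors, and model choice are assumptions, not the paper's method.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Training set: molecular descriptors plus a measured ice recrystallization
# inhibition (IRI) activity score for known cryoprotectants.
train = pd.read_csv("known_cryoprotectants.csv")  # hypothetical file
X_train = train.drop(columns=["iri_activity"])
y_train = train["iri_activity"]

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# Candidate library (roughly 500 molecules in the study's case), featurised with
# the same descriptors as the training data.
library = pd.read_csv("candidate_library.csv")  # hypothetical file
library["predicted_iri"] = model.predict(library)

# Rank the library and send the top predictions to the lab for validation.
print(library.sort_values("predicted_iri", ascending=False).head(10))
```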

Here are some key points of the study:

The team created a machine learning model trained on data from existing cryoprotectants.
This model can predict how well new molecules might work as cryoprotectants.
Researchers used the model to screen a library of about 500 amino acids.
The system identified several promising compounds, including an aminooxazole ester that outperformed many known cryoprotectants.
Lab tests confirmed the AI’s predictions, with the new compound showing strong ice crystal prevention.
The discovered molecule improved red blood cell preservation when combined with standard techniques.

The aminooxazole ester identified by the study demonstrated particularly remarkable ice recrystallization inhibition (IRI) qualities. It almost completely stopped ice crystals from growing larger during the freezing process.

The compound was effective even when researchers lowered its concentration. Plus, it also maintained its ice-inhibiting properties in phosphate-buffered saline (PBS), a solution that mimics the salt concentration in human bodies.

Dr. Matt Warren, the PhD student who spearheaded the project, described how the model accelerates efficiency: “After years of labour-intensive data collection in the lab, it’s incredibly exciting to now have a machine learning model that enables a data-driven approach to predicting cryoprotective activity.”

Professor Matthew Gibson from Manchester added, “The results of the computer model were astonishing, identifying active molecules I never would have chosen, even with my years of expertise.”

Professor Gabriele Sosso, who led the Warwick team, explained in a blog post that, while impressive, machine learning isn’t a cure-all for these types of research problems: “It’s important to understand that machine learning isn’t a magic solution for every scientific problem. In this work, we used it as one tool among many.”

The researchers combined the AI predictions with molecular simulations and lab experiments – a multi-pronged approach that helped validate results and refine the model.

This contributes to a range of AI-driven studies into drug discovery and material design. Researchers have built AI models to generate interesting medicinal compounds, one of which has been brought to clinical trial.

DeepMind also created a model named GNoME capable of automatically generating and synthesizing materials.

The new cryoprotectant compounds discovered could have broad real-world impacts.

For instance, the researchers describe how improving cryopreservation might extend the shelf life of vaccines and make it easier to transport sensitive medical treatments to remote areas. 

The technique could also speed up blood transfusions by reducing the time needed to process frozen blood.

While the results are promising, the team cautions that more work is needed to fully understand how these new compounds function and to ensure medical safety and stability. 


Humanity’s Last Exam wants your tough questions to stump AI

Benchmarks are struggling to keep up with advancing AI model capabilities, and the Humanity’s Last Exam project wants your help to fix this.

The project is a collaboration between the Center for AI Safety (CAIS) and AI data company Scale AI. It aims to measure how close we are to achieving expert-level AI systems, something existing benchmarks can’t do.

OpenAI and CAIS developed the popular MMLU (Massive Multitask Language Understanding) benchmark in 2021. Back then, CAIS says, “AI systems performed no better than random.”

The impressive performance of OpenAI’s o1 model has “destroyed the most popular reasoning benchmarks,” according to Dan Hendrycks, executive director of CAIS.

OpenAI’s o1 MMLU performance compared with earlier models. Source: OpenAI

Once AI models hit 100% on the MMLU, how will we measure them? CAIS says “Existing tests now have become too easy and we can no longer track AI developments well, or how far they are from becoming expert-level.”

Given the jump in benchmark scores that o1 added to the already impressive GPT-4o figures, it won’t be long before an AI model aces the MMLU.

This is objectively true. pic.twitter.com/gorahh86ee

— Ethan Mollick (@emollick) September 17, 2024

Humanity’s Last Exam is asking people to submit questions whose correct answer from an AI model would genuinely surprise them. They want PhD-level exam questions, not the ‘how many Rs in Strawberry’ type that trips up some models.

Scale explained that “As existing tests become too easy, we lose the ability to distinguish between AI systems which can ace undergrad exams, and those which can genuinely contribute to frontier research and problem solving.”

If you have an original question that could stump an advanced AI model then you could have your name added as a co-author of the project’s paper and share in a pool of $500,000 that will be awarded to the best questions.

To give you an idea of the level the project is aiming at, Scale explained that “if a randomly selected undergraduate can understand what is being asked, it is likely too easy for the frontier LLMs of today and tomorrow.”

There are a few interesting restrictions on the kinds of questions that can be submitted. They don’t want anything related to chemical, biological, radiological, or nuclear weapons, or to cyberweapons that could be used to attack critical infrastructure.

If you think you’ve got a question that meets the requirements then you can submit it here.


Microsoft unveils Copilot “Wave 2” to accelerate productivity and content production

Microsoft introduced “Wave 2” of its Copilot AI assistant, bringing a host of new features designed to transform how individuals and businesses interact with AI in their daily workflows.

At the core of Wave 2 is Copilot Pages, a novel digital workspace that Microsoft describes as “the first new digital artifact for the AI age.” 

In essence, the new Copilot announcements offer a fully AI-embedded workspace for media and content production, blending OpenAI’s GPT-4o model into a coherent digital workspace.

It allows teams to collaborate in real-time with AI assistance, turning AI-generated content into editable, shareable documents. 

The goal? To save business time and enhance productivity while driving paid adoption of AI tools. Wave 2 indicates Microsoft’s attempts to turn its multi-billion dollar OpenAI investment into revenue.

By promoting Copilot’s business appeal, the company aims to convert more companies into paying customers for its premium AI tools.

Jared Spataro, Microsoft’s Corporate VP for AI at Work, explains the new Copilot concept: “Pages takes ephemeral AI-generated content and makes it durable, so you can edit it, add to it, and share it with others. You and your team can work collaboratively in a page with Copilot, seeing everyone’s work in real time and iterating with Copilot like a partner.”

The rollout of these features varies, with some available immediately and others expected to arrive in the coming weeks or months. 

Here’s a round-up of everything Microsoft announced at their live event:

Copilot Pages

Copilot Pages is the centerpiece of Microsoft’s Wave 2 update, offering a thoroughly AI-embedded approach to collaborative work. It works by transforming Copilot’s AI-generated responses from fleeting chat messages into lasting, editable documents.

To use it, you select an “Edit in Pages” button next to a Copilot response, opening a new window alongside the original chat thread. Users can work within that window to refine and build upon the AI’s output together.

Pages integrates with BizChat, Microsoft’s hub for combining web, corporate, and business data.

Like most of these features, Pages is currently exclusive to paid Copilot for Microsoft 365 subscribers.

PowerPoint powered by AI

The new PowerPoint Narrative Builder aims to streamline the presentation creation process. Users can input a topic, and the AI will generate an outline in minutes. 

Microsoft is also introducing a Brand Manager feature to ensure presentations align with company guidelines and maintain consistency across different channels.

That’s another business-centric feature designed to drive greater uptake among enterprise users.

In Microsoft’s words, “Accelerating every business process with Copilot—to grow revenue and reduce costs—is the best way to gain competitive advantage in the age of AI.”

Supporting businesses is a central theme here. Microsoft is pushing businesses to opt into its growing AI ecosystem, converting them into paying customers – customers who can pay much more than the average ChatGPT Plus subscriber at $20 a month. 

Smarter meetings with Teams

Copilot in Teams now offers more comprehensive meeting summaries by analyzing both spoken conversations and chat messages. 

As Microsoft puts it, “Now with Copilot in Teams, no question, idea, or contribution is left behind.”

Ending email overload in Outlook

Addressing the perennial challenge of email management, Microsoft introduces “Prioritize my inbox” for Outlook. 

This feature uses AI to identify important emails, provide concise summaries, and explain why certain messages were flagged as priorities. In the future, users will be able to teach Copilot their personal priorities to refine this feature. 

Copilot Agents

Perhaps the most intriguing addition is Copilot agents – customizable AI assistants that can perform specific tasks with varying degrees of autonomy. 

Microsoft is also launching an agent builder, which will make it easier for users to create and deploy AI helpers for different tasks, somewhat similar to Custom GPTs within ChatGPT.

Again, Agents are aimed at business users. In Microsoft’s words, “We’re introducing Copilot agents, making it easier and faster than ever to automate and execute business processes on your behalf—enabling you to scale your team like never before.”

Excel evolves with Python integration

Copilot in Excel now includes Python integration, designed to accelerate data analysis. 

Users can perform advanced tasks like forecasting and risk analysis using natural language prompts without needing to write code. 
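
To give a sense of what sits behind such a prompt, here is a hypothetical example of the kind of Python a request like “forecast the next four quarters of sales” could map to; it is not Copilot’s actual output and assumes pandas and statsmodels.

```python
# Hypothetical example of the kind of Python a natural-language forecasting prompt
# might translate to -- not actual Copilot output. Figures are invented.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Eight quarters of (made-up) sales history.
sales = pd.Series([120, 135, 150, 170, 180, 195, 210, 240])

# Fit a simple Holt-Winters model with an additive trend, then forecast four quarters.
fit = ExponentialSmoothing(sales, trend="add", seasonal=None).fit()
print(fit.forecast(4))
```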

Microsoft’s AI productivity revolution kicks up a notch

Copilot Wave 2 represents a big jump forward for AI integration into the Microsoft ecosystem. 

However, amidst the deluge of announcements, some critical areas seem to have flown under the radar.

For one, security and privacy concerns are rife since many of these tools will interact with sensitive personal and business data. While Microsoft asserts that Pages has Enterprise Data Protection, details on how this works in practice are scarce.

Other features, such as the ability to analyze files without opening them, may be convenient, but they also mean Copilot has relatively permanent access to sensitive information. This level of AI involvement in the enterprise data ecosystem will breed some level of discomfort.

The same goes for AI integration into Outlook. Not everyone’s ready for an AI system to sift through personal and professional exchanges.

Microsoft must demonstrate that Copilot’s benefits outweigh the privacy concerns while being transparent about how user data is handled.

As Wave 2 rolls out, it’s clear that Microsoft is betting big on AI integration. Google is expected to respond with its own new wave of AI-driven features for Workspace.

Only time will tell how useful these tools prove to be and how widely businesses adopt them, or whether we need a little longer to accept AI’s massive front-, center-, and behind-the-scenes role in our personal and business lives.
