An investigation by cybersecurity startup Lasso Security reveals that more than 1,500 HuggingFace API tokens are exposed, including those from Meta.
A recent investigation into HuggingFace, a major platform for developers, has revealed that more than 1,500 API tokens are exposed. According to Lasso Security, a start-up specializing in cybersecurity for language models and other generative AI models, this leaves millions of Meta Llama, Bloom, and Pythia users vulnerable to potential attacks.
HuggingFace is an important resource for developers working on AI projects such as language models. The platform offers an extensive library of AI models and datasets, including Meta’s widely used Llama models.
The HuggingFace API lets developers and organizations integrate models and, using API tokens, read, create, modify, and delete repositories or the files within them.
Lasso Security gains full access to Meta repositories
The Lasso Security team searched GitHub and HuggingFace repositories for exposed API tokens using the platforms' search functions. Best-practice guidance, such as OpenAI's, advises against storing API tokens directly in code for this very reason.
The Lasso Security team found 1,681 tokens in their search and were able to uncover accounts from major organizations including Meta, Microsoft, Google, and VMware. The data also gave the team full access to the widely used Meta Llama, Bloom, Pythia, and HuggingFace repositories. Exposing such a large number of API tokens poses significant risks to organizations and their users, the team said.
Lasso lists some key dangers associated with exposed API tokens:
1. Supply chain vulnerabilities: If attackers gained full access to accounts such as Meta Llama 2, BigScience Workshop, and EleutherAI, they could manipulate existing models and potentially turn them into malicious entities, the team says. This could affect millions of users who rely on these base models for their applications.
2. Training data poisoning: With write access to 14 datasets that each see tens to hundreds of thousands of downloads per month, attackers could manipulate trusted datasets and compromise the integrity of AI models built on them, with far-reaching consequences.
3. Model theft: The team claims to have used the method to gain access to more than ten thousand private AI models and more than 2,500 datasets, which could lead to economic losses, an impaired competitive advantage, and access to sensitive information.
Team provides security tips to users and HuggingFace
To address these vulnerabilities, developers are advised not to use hard-coded tokens and to follow best practices. HuggingFace should also continuously scan for publicly exposed API tokens and either revoke them or notify users and organizations of the exposed tokens.
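One common way to avoid hard-coded tokens is to read them from the environment at runtime. The following is a minimal sketch of that idea, not a method taken from the Lasso Security report. The helper names `load_hf_token` and `mask` and the exception `MissingTokenError` are hypothetical, and the default variable name `HF_TOKEN` is an assumption that should be adjusted to match your own tooling.

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when no API token is available in the environment (hypothetical helper)."""


def load_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the API token stored in an environment mapping.

    The variable name is an assumption; change it to whatever your setup uses.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise MissingTokenError(
            f"Set the {var_name} environment variable instead of hard-coding a token."
        )
    return token


def mask(token, visible=4):
    """Shorten a token for log output so the full value never reaches the logs."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)
```

Because the token never appears in the source file, committing the code to a public repository does not expose it. The value can then be supplied by a secrets manager or a CI/CD secret store.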
Organizations should also consider token classification and implement security solutions, specifically designed to protect their investment in LLMs, that inspect IDEs and code reviews. By addressing these issues now, organizations could strengthen their defenses and avert the threats posed by these vulnerabilities, Lasso Security said.
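A check of this kind could, for example, be run before code is committed or during review. The sketch below is illustrative only and is not the tooling Lasso Security describes. It assumes that tokens look like `hf_` followed by at least 30 letters and digits, which is an assumed pattern rather than a documented format, and the function name `find_token_candidates` is hypothetical.

```python
import re

# Assumed pattern: "hf_" followed by a long run of letters and digits.
# The minimum length of 30 is an illustrative guess, not a documented format.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_token_candidates(text):
    """Return (line_number, redacted_match) pairs for token-like strings."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            candidate = match.group(0)
            # Report only a short prefix so the scan itself does not leak the token.
            findings.append((line_number, candidate[:6] + "..."))
    return findings
```

A simple pattern match like this will produce both false positives and misses, so it complements token revocation and review processes rather than replacing them.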