DarkBERT
Researchers from South Korea have taken the unusual step of creating and training an artificial intelligence (AI) model on data from the dark web, with the aim of shedding light on how to prevent cybercrime.

DarkBERT – The New AI Model

A portion of the internet known as the “dark web” is hidden from regular web browsers: its pages are not indexed by search engines and can only be reached through specialized software such as Tor.

Because this area of the internet is largely untracked, it is known for its anonymous websites and is widely used to host markets for illegal operations such as the trade in drugs and weapons and the sale of stolen data, as well as serving as a haven for hackers to facilitate cybercrime.

Researchers from the Korea Advanced Institute of Science and Technology (KAIST), in conjunction with the data intelligence company S2W, have released DarkBERT, a language model trained exclusively on datasets derived from the dark web.

The trained model can then be applied to analyze the content it encounters on the dark web, informing efforts to better deal with cybercrime in this part of the internet.

While it has yet to be peer-reviewed, the researchers published a paper titled “DarkBERT: A Language Model for the Dark Side of the Internet,” which describes in detail the development and experiments behind this large language model (LLM).

To create a dataset for the model, the research team compiled a sizable database by crawling the Tor network, the anonymizing network through which most dark web sites are reached, so that DarkBERT could adapt to the language used on the dark web.
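The paper does not publish the crawler itself, but the basic mechanics of fetching a hidden-service page are straightforward. The Python sketch below routes an HTTP request through a locally running Tor daemon’s SOCKS proxy; the onion address shown is a hypothetical placeholder, not a real site.

```python
import requests

# Tor's default SOCKS5 proxy. The "socks5h" scheme resolves .onion
# hostnames through Tor itself rather than locally. Requires a
# running Tor daemon and PySocks (pip install requests[socks]).
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion_page(url: str, timeout: int = 60) -> str:
    """Fetch a single hidden-service page through the Tor proxy."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text

# Hypothetical onion address, purely for illustration:
# html = fetch_onion_page("http://exampleonionservicexyz.onion/")
```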

The database then underwent deduplication, filtering, and pre-processing to address ethical concerns about the sensitive content that fills the dark web; this step removed items such as organizations’ names, details of data leaks, threatening comments, and illicit photos.
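The article does not spell out the exact filtering rules, so the following is only a simplified sketch of that kind of pipeline: hash-based deduplication plus regex masking, with emails and URLs standing in for the sensitive identifiers the team removed.

```python
import hashlib
import re

# Illustrative masking rules only: emails and URLs stand in for the
# sensitive identifiers (victim names, leak details, etc.) that the
# DarkBERT team filtered out with more elaborate processing.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
URL_RE = re.compile(r"https?://\S+|\S+\.onion\S*")

def mask_sensitive(text: str) -> str:
    """Replace identifier-like spans with placeholder tokens."""
    return URL_RE.sub("[URL]", EMAIL_RE.sub("[EMAIL]", text))

def deduplicate(pages: list[str]) -> list[str]:
    """Drop exact duplicates by hashing normalized page text."""
    seen, unique = set(), []
    for page in pages:
        digest = hashlib.sha256(page.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique

# corpus = [mask_sensitive(page) for page in deduplicate(raw_pages)]
```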

While DarkBERT is a new artificial intelligence model, it is built on the RoBERTa architecture, an approach that Facebook researchers introduced in 2019.

RoBERTa is an improvement over Google’s BERT (Bidirectional Encoder Representations from Transformers): after BERT was released as open source, Facebook’s researchers refined its pretraining to boost performance. The research paper that introduces RoBERTa describes it as a “robustly optimized method for pretraining natural language processing (NLP) systems.”
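DarkBERT’s weights and training code are not public, but the general recipe (start from a RoBERTa checkpoint and continue masked-language-model pretraining on a domain corpus) can be sketched with the Hugging Face transformers library. The corpus file name below is a hypothetical placeholder.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Start from the public RoBERTa checkpoint, as DarkBERT did.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# "darkweb_corpus.txt" is a hypothetical placeholder for the
# preprocessed crawl; the real training data is not public.
dataset = load_dataset("text", data_files={"train": "darkweb_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

# RoBERTa's objective: randomly mask tokens and learn to predict them.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="darkbert-sketch",
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```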

AI To Fight Against Cybercrime

In the research paper, the team reports that DarkBERT understands dark web text considerably better than comparable models such as RoBERTa, which was designed to “predict intentionally hidden parts of text within otherwise unmarked language samples.”
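That quoted objective is RoBERTa’s masked-token prediction task. A minimal illustration using the public roberta-base checkpoint (DarkBERT itself is not downloadable) looks like this:

```python
from transformers import pipeline

# Uses the public roberta-base checkpoint purely to show the
# objective; DarkBERT's own weights are not publicly released.
fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa is pretrained to recover the token hidden behind <mask>.
for candidate in fill_mask("The stolen credentials were posted on a <mask> forum."):
    print(f"{candidate['token_str']!r}  score={candidate['score']:.3f}")
```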

The researchers said:

“Our evaluation results show that DarkBERT-based classification model outperforms that of known pre-trained language models.”

They also said that DarkBERT could potentially aid cybersecurity tasks such as identifying websites that sell or publish confidential organizational data leaked by ransomware groups.

It could additionally be used to monitor the many dark web forums that are updated daily, watching for any exchange of illegal information.
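The paper evaluates such tasks by attaching a classification head to the encoder. As a hedged sketch of what a leak-site or forum-post classifier might look like, the snippet below wires a two-label head onto a public RoBERTa checkpoint; the checkpoint and labels are placeholders, not DarkBERT’s actual setup.

```python
import torch
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

# Placeholder setup: "roberta-base" stands in for a DarkBERT-style
# encoder, and the two labels (0 = benign, 1 = leak site) are
# illustrative. A real deployment would fine-tune this head on
# labeled pages first; fresh from this call, its outputs are random.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)
model.eval()

def score_page(text: str) -> float:
    """Return the model's probability that a page is a leak site."""
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# The same scorer could run daily over newly crawled forum posts,
# flagging likely exchanges of leaked data for a human analyst.
print(score_page("Full database dump of ACME Corp customers for sale."))
```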

DarkBERT won’t be accessible to the general public for a while, given the potentially dangerous nature of dark web content. However, requests to use the model for academic research are being accepted.

That doesn’t mean DarkBERT is finished: as with other LLMs, additional training and fine-tuning may still improve its performance. What can be learned from it, and how it will be applied, remains to be seen.
