AI Security & Threats
Research and reports on the misuse of AI and emerging security vulnerabilities.
YouTube
A collection of essential videos covering AI safety, ethical considerations, and the societal impact of artificial intelligence.
Geoffrey Hinton on AI Dangers
A seminal discussion on the existential risks posed by advanced artificial intelligence. (Source: The Diary of a CEO)
Mo Gawdat on AI Dystopia and Utopia
An outline of potential dystopian and utopian futures shaped by AI. (Source: The Diary of a CEO)
Dr. Roman Yampolskiy on AI Safety
An in-depth analysis of superintelligence risks and projected AI safety timelines. (Source: The Diary of a CEO)
What OpenAI Doesn’t Want You to Know
An investigative report on the ethical controversies associated with OpenAI. (Source: More Perfect Union)
The Chinese Room Is a Dishonest Argument
A philosophical examination of the Chinese Room Argument and artificial consciousness. (Source: Curt Jaimungal)
The Dark Side of AI Data Centers
An exposé on the environmental and societal impact of AI data infrastructure. (Source: Business Insider)
Courses
Free educational courses to build expertise in AI alignment and safety, based on 2025 recommendations.
AI Alignment (BlueDot Impact)
A foundational course on core AI safety concepts, evaluation, and structured debate.
AI For Everyone (Coursera)
A non-technical overview of AI capabilities, limitations, and ethical considerations.
Elements of AI (Univ. of Helsinki)
A broad introduction to AI, including key principles of ethics and alignment for a general audience.
Intro to ML Safety
A technical introduction to modern machine learning safety and alignment techniques.
AI Safety Syllabus
A comprehensive curriculum covering the AI alignment problem in depth.
Datasets & Reports
Open resources for AI safety research, including benchmarks and risk assessments from 2025.
2025 AI Index Report (Stanford HAI)
A comprehensive annual report on trends, risks, and progress in artificial intelligence.
2025 AI Safety Index (Future of Life)
An assessment of the safety and transparency efforts of leading AI development companies.
AI Security Institute (UK) Research
Official government research on frontier AI risks and model safety evaluations.
Anthropic's HH-RLHF Datasets
Datasets for training models to be helpful and harmless using reinforcement learning from human feedback.
OpenAI Safety Reports
Official documentation on safety evaluations and red teaming for models like o1.
SafeBench Competition
AI safety benchmarking competition for evaluating risks in advanced AI systems.
Organization
Our affiliations and collaborations within the AI safety community.
AI Safety Berlin
Collaborative community working on AI safety research and advocacy in Berlin.
Labonsky AI Research
Independent AI safety research initiative focusing on defensive security and AI alignment.
Mentors from AI Safety Berlin
Research
Our ongoing research projects exploring AI safety, security implications, and defensive strategies.
AI Weaponization Research: Project Green
1. Introduction
AI model capabilities are advancing rapidly, and defensive strategies need to keep pace with AI-augmented offensive capabilities.
2. Hypothesis
Current open-source, state-of-the-art LLMs such as gpt-oss:20b and gpt-oss:120b can autonomously conduct effective vulnerability assessments using security tools.
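To make the hypothesis concrete, the minimal sketch below shows the kind of tool call involved, assuming the Ollama Python client with tool-calling support and a locally pulled gpt-oss:20b model; the scan function and its arguments are illustrative placeholders, not part of our actual tooling.

```python
# Minimal sketch: offering a scan-launching function to a local gpt-oss model as a tool.
# Assumes the Ollama Python client (pip install ollama) with tool-calling support and a
# pulled gpt-oss:20b model; start_vulnerability_scan is an illustrative placeholder.
import ollama


def start_vulnerability_scan(target: str, scan_config: str) -> str:
    """Placeholder tool: queue a scan of `target` with the named scan configuration."""
    return f"Scan queued for {target} using config '{scan_config}'."


response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Assess host 192.0.2.10 for known vulnerabilities."}],
    tools=[start_vulnerability_scan],  # recent clients accept plain Python functions as tools
)

# If the model chooses to call the tool, execute it and print the result.
for call in response.message.tool_calls or []:
    if call.function.name == "start_vulnerability_scan":
        print(start_vulnerability_scan(**call.function.arguments))
```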
3. Background/Literature Review
LLMs have rapidly gained tool-calling capabilities and can now function as autonomous agents that coordinate multi-step cyberattacks without detailed human instruction; according to Carnegie Mellon University research, Claude Sonnet 4.5's success rate on the Cybench benchmark roughly doubled to nearly 70% within six months.
4. Research Questions
- How autonomously can models operate security tools?
- What financial resources are needed to use models this way?
5. Methodology
A local gpt-oss-120b model is deployed on multi-GPU hardware within an isolated LAB server environment. The model interacts with the OpenVAS vulnerability scanner through the GVM API, orchestrated via opencode. The setup enables autonomous scan selection, vulnerability assessment execution, result analysis, and operation chaining. Testing occurs in a controlled network with no external connectivity. Performance is evaluated by comparing LLM-guided scans against baseline traditional OpenVAS scans, measuring vulnerability detection accuracy, tool usage patterns, and decision quality.
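To show what the GVM interaction looks like in practice, here is a minimal sketch of the API calls our tooling would wrap, assuming the python-gvm client against a local gvmd socket; the socket path, credentials, and UUIDs are illustrative placeholders rather than the actual LAB configuration.

```python
# Minimal sketch of the GVM API calls the agent's tooling would wrap.
# Assumes python-gvm (pip install python-gvm) and a local gvmd Unix socket;
# socket path, credentials, and UUIDs are illustrative placeholders.
from gvm.connections import UnixSocketConnection
from gvm.protocols.gmp import Gmp
from gvm.transforms import EtreeTransform

connection = UnixSocketConnection(path="/run/gvmd/gvmd.sock")

with Gmp(connection, transform=EtreeTransform()) as gmp:
    gmp.authenticate("admin", "example-password")  # placeholder credentials

    # List existing scan tasks so the model can choose one to run.
    tasks = gmp.get_tasks()
    for task in tasks.findall("task"):
        print(task.get("id"), task.findtext("name"))

    # Start a chosen task, then later retrieve its report for the model to analyse.
    gmp.start_task("TASK-UUID")             # placeholder task ID
    report = gmp.get_report("REPORT-UUID")  # placeholder report ID
```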
6. Experiment Design
Specific test scenarios in isolated networks, baseline comparisons, and documentation of results.
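As an illustration of how the baseline comparison could be scored (a sketch of one possible metric, not a finalized evaluation protocol), findings from an LLM-guided scan can be compared against the traditional OpenVAS baseline by CVE identifier:

```python
# Illustrative scoring of an LLM-guided scan against a baseline OpenVAS scan,
# treating the baseline findings as ground truth; the CVE sets are placeholders.
def detection_metrics(llm_findings: set[str], baseline_findings: set[str]) -> dict[str, float]:
    true_positives = len(llm_findings & baseline_findings)
    precision = true_positives / len(llm_findings) if llm_findings else 0.0
    recall = true_positives / len(baseline_findings) if baseline_findings else 0.0
    return {"precision": precision, "recall": recall}


baseline = {"CVE-2021-44228", "CVE-2017-0144", "CVE-2014-0160"}  # example baseline findings
llm_guided = {"CVE-2021-44228", "CVE-2014-0160"}                 # example agent findings
print(detection_metrics(llm_guided, baseline))
```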
Infrastructure
Custom-built AI research infrastructure powering our safety research.
LAB Custom AI Server
High-performance workstation for AI model training and research.
Server Specifications
Motherboard: X870E AORUS PRO
CPU: AMD Ryzen 9 9950X3D (16 cores / 32 threads) @ 5.76 GHz
GPU 1: NVIDIA GeForce RTX 5090 [Discrete]
GPU 2: NVIDIA GeForce RTX 4090 [Discrete]
GPU 3: NVIDIA GeForce RTX 5060 Ti [Discrete]
Memory: 251.30 GiB
Disk: 3.64 TiB - btrfs
Unraid NAS Server
High-capacity storage and data management server for research datasets.
Server Specifications
Motherboard: B650 GAMING X AX V2
CPU: AMD Ryzen 7 9700X (8 cores)
GPU: AMD Radeon Graphics (Integrated)
Network: 2x Intel 82599ES 10-Gigabit Ethernet (SFP+)
Memory: 64 GiB
Storage: 2x Samsung 980 PRO 1TB NVMe, 3x Seagate 16TB HDD (48TB), Samsung 860 512GB SSD
Total Capacity: ~50TB
Software & Services
Essential software stack and cloud services powering our infrastructure.
CachyOS
High-performance Linux distribution optimized for speed and efficiency.
Ollama
Local LLM platform for running and managing large language models.
Gitea
Self-hosted Git service for version control and code collaboration.
Unraid
Flexible NAS operating system for storage, virtualization, and Docker.
OPNsense
Open-source firewall and routing platform for network security.
Tailscale
Zero-config VPN for secure remote access and mesh networking.
Claude
AI assistant by Anthropic for research, analysis, and development support.
Meet the Team
The dedicated team driving AI safety innovation at Labonsky AI Research.
Marcin Labonski
Founder
Directing the organization's research strategy in AI alignment and pioneering novel approaches to risk mitigation.
Meet the Research Assistants
Our highly valued team members, providing moral support and expert-level napping.
Lia
Chief Morale Officer
Levi
Lead Sleep Analyst
Lutka
Head of Box Fort Architecture
Rengar
Junior Pounce Engineer
Lilith
Feline Language Model