Warpnet Moderation System
This document describes the current implementation of content moderation in Warpnet.
Overview
Warpnet implements a decentralized content moderation system using dedicated moderator nodes that employ AI models to evaluate content based on a defined moderation policy. The system is designed to help maintain content quality and safety across the network without relying on centralized control.
Architecture
Components
The moderation system consists of several key components:
1. Moderator Nodes
Moderator nodes are specialized nodes in the Warpnet network that run moderation engines. They:
Continuously monitor network peers for content that requires moderation
Retrieve unmoderated content from member nodes
Process content through AI models
Publish moderation results back to the network
2. Moderation Engine
The moderation engine is built using llama.cpp bindings and provides:
LLM-based content analysis
Binary moderation decisions (OK or FAIL)
Reason generation for rejected content
Support for Llama 2 model (referenced as LLAMA2 in code)
Engine configuration:
Context size: 512 tokens
Output tokens: 64 tokens
Temperature: 0.0 (deterministic)
Top P: 0.9
Memory mapping enabled
Low VRAM mode supported
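Expressed as a Go struct, the settings above might look like the following sketch. The type and field names are illustrative only, not Warpnet's actual llama.cpp binding API:

```go
package main

import "fmt"

// EngineConfig mirrors the moderation engine settings listed above.
// Field names are illustrative; the real binding options may differ.
type EngineConfig struct {
	ContextSize int     // prompt context window, in tokens
	MaxTokens   int     // maximum tokens to generate per response
	Temperature float64 // 0.0 makes sampling deterministic
	TopP        float64 // nucleus sampling threshold
	UseMMap     bool    // memory-map the model file instead of loading it fully
	LowVRAM     bool    // reduce GPU memory usage at some speed cost
}

// DefaultEngineConfig returns the values described in this document.
func DefaultEngineConfig() EngineConfig {
	return EngineConfig{
		ContextSize: 512,
		MaxTokens:   64,
		Temperature: 0.0,
		TopP:        0.9,
		UseMMap:     true,
		LowVRAM:     true,
	}
}

func main() {
	fmt.Printf("%+v\n", DefaultEngineConfig())
}
```

The zero temperature is the key choice here: with greedy decoding, the same tweet always produces the same verdict, which matters for a system whose decisions are published network-wide.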
3. Moderation Protocol
The moderation protocol (referred to as the isolation protocol in the code) handles communication between moderator nodes and member nodes:
Sends moderation results to content owners
Publishes results to followers via pubsub
Updates tweet metadata with moderation information
Tweet Moderation
Moderator nodes scan connected peers every 10 seconds
For each non-moderator peer:
Retrieve node information
Fetch up to 20 tweets per request
Skip already moderated content
Process unmoderated tweets sequentially
For each unmoderated tweet:
Generate a prompt with the tweet text
Run inference through the LLM engine
Parse the model response (Yes/No with optional reason)
Create moderation result
Send moderation result back to:
The original content owner via stream
All followers via pubsub
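The workflow above can be sketched as a single scan pass. `Tweet`, `Peer`, and `scanOnce` are simplified stand-ins for Warpnet's real types, and the moderation call is stubbed out; the real loop repeats every 10 seconds and sends each result over a stream and pubsub:

```go
package main

import (
	"fmt"
	"time"
)

// Tweet and Peer are simplified stand-ins for Warpnet's real types.
type Tweet struct {
	ID        string
	Text      string
	Moderated bool
}

type Peer struct {
	ID          string
	IsModerator bool
	Tweets      []Tweet
}

const (
	scanInterval   = 10 * time.Second // pause between peer scans
	tweetsPerFetch = 20               // tweets fetched per request
)

// moderateFunc stands in for the LLM call; it returns ok and a reason.
type moderateFunc func(text string) (bool, string)

// scanOnce performs one moderation pass over the connected peers,
// mirroring the workflow above. It returns the number of tweets processed.
func scanOnce(peers []Peer, moderate moderateFunc) int {
	processed := 0
	for _, p := range peers {
		if p.IsModerator {
			continue // only member peers are scanned
		}
		batch := p.Tweets
		if len(batch) > tweetsPerFetch {
			batch = batch[:tweetsPerFetch] // fetch at most 20 tweets per request
		}
		for _, t := range batch {
			if t.Moderated {
				continue // skip already moderated content
			}
			ok, reason := moderate(t.Text)
			_, _ = ok, reason // real system: send to owner via stream, publish to followers
			processed++
		}
	}
	return processed
}

func main() {
	peers := []Peer{
		{ID: "p1", Tweets: []Tweet{{ID: "t1", Text: "hello"}, {ID: "t2", Moderated: true}}},
	}
	n := scanOnce(peers, func(string) (bool, string) { return true, "" })
	fmt.Println("processed:", n) // the full loop would repeat every scanInterval
}
```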
User Description Moderation
User profile moderation is partially implemented:
Designed to moderate username and bio text
Currently in development (lurking function placeholder)
Will follow similar workflow to tweet moderation
Moderation Policy
The current moderation policy prohibits the following content:
CSAM or sexual content involving minors including deepfakes or AI-generated
Non-consensual sex, porn with coercion or abuse
Gore, violence, snuff, dead bodies
Weapon or drug creation or sales
Self-harm, suicide, eating disorders
Sexism against women only, racism, casteism, xenophobia, hate speech
Religious extremism, terrorism incitement
Spam, mass unsolicited promos
Note: The policy text is defined in the prompt template (core/moderation/prompt.go) and reflects the current implementation. The phrasing "Sexism against women only" is taken verbatim from the source code.
The policy is embedded in the prompt template and enforced by the LLM model.
Prompt Engineering
The moderation system uses a structured prompt template:
Instruction:

    You act as a social network moderator. Does the following content violate moderation policy?
    [Policy list...]
    Respond in English only.
    If yes, answer: 'Yes' and add short reason, max 14 words.
    If no, answer: 'No'
    No other answer types accepted.
    Content: """[content]"""
    Possible Violations: [context]
    Answer:
The system expects responses in one of two formats:
"No" - Content is acceptable
"Yes [reason]" - Content violates policy with a brief explanation
The 14-word limit for reasons helps ensure:
Consistent and concise explanations
Reliable model output parsing
Efficient token usage
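A sketch of how the template could be filled and the answer interpreted, using hypothetical `buildPrompt` and `parseVerdict` helpers (the actual code in core/moderation/prompt.go may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrompt fills the template described above; the policy list and
// context placeholders are abbreviated here.
func buildPrompt(content, context string) string {
	return fmt.Sprintf(`You act as a social network moderator. Does the following content violate moderation policy?
[Policy list...]
Respond in English only. If yes, answer: 'Yes' and add short reason, max 14 words. If no, answer: 'No'
No other answer types accepted.
Content: """%s"""
Possible Violations: %s
Answer:`, content, context)
}

// parseVerdict interprets the model's answer: "No" means the content is
// acceptable; "Yes <reason>" means it violates policy.
func parseVerdict(answer string) (ok bool, reason string) {
	s := strings.TrimSpace(answer)
	low := strings.ToLower(s)
	switch {
	case strings.HasPrefix(low, "yes"):
		return false, strings.TrimSpace(strings.TrimLeft(s[3:], " ,.:;"))
	case strings.HasPrefix(low, "no"):
		return true, ""
	}
	// Anything else is unparseable; a cautious caller fails closed or retries.
	return false, "unrecognized model response"
}

func main() {
	fmt.Println(buildPrompt("just saying hi", "none"))
	ok, reason := parseVerdict("Yes spam, mass unsolicited promo")
	fmt.Println(ok, reason)
}
```

Constraining the model to exactly two answer shapes is what makes a parser this simple viable; a free-form response would need much more defensive handling.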
Limitations and Future Work
Current limitations:
User description moderation not yet active
Image content moderation not implemented
Single model support (Llama 2)
No appeals or review process
Moderation decisions are final
Potential improvements:
Multi-model support for better accuracy
Configurable moderation policies
Reputation system for moderators
User-controlled moderation preferences
Image and video content analysis
Appeals and review mechanism
Security Considerations
The moderation system:
Runs on dedicated nodes separate from user content
Uses deterministic model settings for consistency
Publishes all decisions transparently
Allows users to see moderation metadata
Cannot directly delete content from member nodes
Relies on member nodes to honor moderation results
Performance
Typical moderation performance:
Processes one tweet at a time per peer
10-second intervals between peer scans
Model inference time varies by hardware and is logged for monitoring and optimization
Configuration
Moderator nodes require:
Model path configuration
Thread count for inference
Network configuration (testnet or mainnet)
Sufficient hardware for LLM inference
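A sketch of what such a configuration could look like in Go; the struct, field names, and default logic are hypothetical, not Warpnet's actual configuration keys:

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// ModeratorConfig collects the settings listed above.
type ModeratorConfig struct {
	ModelPath string // path to the model file used for inference
	Threads   int    // CPU threads for inference
	Network   string // "testnet" or "mainnet"
}

// Validate rejects configurations that cannot start a moderator node.
func (c ModeratorConfig) Validate() error {
	if c.ModelPath == "" {
		return errors.New("model path is required")
	}
	if c.Network != "testnet" && c.Network != "mainnet" {
		return errors.New("network must be testnet or mainnet")
	}
	return nil
}

// withDefaults fills in a sensible thread count when unset.
func (c ModeratorConfig) withDefaults() ModeratorConfig {
	if c.Threads <= 0 {
		c.Threads = runtime.NumCPU()
	}
	return c
}

func main() {
	// The model path here is a placeholder, not a real Warpnet default.
	cfg := ModeratorConfig{ModelPath: "/models/llama-2.gguf", Network: "testnet"}.withDefaults()
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	fmt.Println("threads:", cfg.Threads)
}
```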
Network Protocol
Moderation uses standard Warpnet protocols. All communication happens over libp2p streams with protocol multiplexing.
Media Metadata Embedding Implementation
WarpNet implements a media metadata embedding system for images that helps establish accountability and traceability for uploaded content. This system is designed to work in conjunction with content moderation to prevent the spread of harmful content on the decentralized social network.
Important Privacy Note: All uploaded images contain embedded encrypted metadata including user information, node details, and MAC addresses. Although this metadata is encrypted, the encryption is intentionally weak, so entities with sufficient computational resources can decrypt it. MAC addresses in particular are persistent hardware identifiers that can track users across different accounts and platforms.
Current Implementation
Metadata Embedding Process
When a user uploads an image to WarpNet, the system automatically embeds encrypted metadata into the image's EXIF (Exchangeable Image File Format) segment. This process occurs transparently during the upload operation.
What Metadata is Embedded
The following information is embedded into each uploaded image:
Node Information
Node ID and network details
Information about the node handling the upload
User Information
User ID of the content creator
User profile data associated with the upload
MAC Address
Hardware network interface identifier
Additional device fingerprinting data
Encryption Mechanism
The metadata embedding uses a deliberate "security through computational difficulty" approach with intentionally weak encryption:
Encryption Algorithm
Algorithm: AES-256-GCM (Galois/Counter Mode)
Key Derivation: Argon2id (when used with password) or time-based weak key generation
Password: Randomly generated weak password that is immediately discarded after encryption
Salt: Public, hardcoded salt ("cec27db4") embedded with the media file
Nonce: Zero-filled nonce (intentionally weak)
Security Model Philosophy
The system is designed with the following principles:
Not Designed for User Decryption: Ordinary users cannot recover the embedded metadata
Designed for Powerful Entity Decryption: Only entities with massive computational resources (e.g., government data centers, law enforcement with supercomputing access) can brute-force decrypt the metadata
Proof of Ownership: The encrypted EXIF metadata acts as proof of ownership and responsibility without revealing sensitive data during normal operation
Computational Difficulty: Security relies entirely on computational difficulty, not on secrecy of the password
In short: the per-file password is used once for AES-256-GCM encryption and is never stored or logged; the salt and nonce are public and ship with the media file, so after upload the only way to recover the metadata is a brute-force attack.
How Metadata Embedding Prevents Harmful Content
The metadata embedding system contributes to content safety through several mechanisms:
1. Attribution and Accountability
Every image uploaded to WarpNet carries encrypted evidence of its origin
Node and user information creates a chain of responsibility
MAC address provides additional device-level tracking
2. Deterrence Effect
Users aware of metadata embedding may be deterred from uploading harmful content
Knowledge that content can be traced back to its source acts as a preventive measure
3. Investigation Support
When harmful content is reported, metadata provides investigation leads
Law enforcement or authorized entities can request brute-force decryption
Decrypted metadata reveals the original uploader and their node
4. Distributed Accountability
In a decentralized network, metadata helps identify responsible parties
Prevents anonymity abuse while maintaining privacy for legitimate users
5. Forensic Evidence
Embedded metadata can serve as forensic evidence in legal proceedings
Provides proof of upload time, source node, and user identity
MAC address adds physical device linkage
6. Content Origin Verification
Helps distinguish original uploads from redistributed content
Enables tracking of content spread across the network
Supports identification of primary sources for harmful content
Limitations and Considerations
Privacy Implications
All uploaded images contain embedded encrypted user information
While encrypted, metadata can be decrypted with sufficient resources
Users should be aware that images contain traceable information
Security Limitations
Weak Encryption by Design: The encryption is intentionally weak
Predictable Key Generation: The timestamp-based key generation pattern makes it easier to decrypt multiple images once the pattern is understood
Metadata Removal: Technically sophisticated users could strip EXIF data
Not Foolproof: Determined malicious actors may find ways to circumvent the system
Future Enhancements
Potential improvements to the system could include:
Image Content Analysis: Extend moderation to analyze actual image content (not just text)
Watermarking: Visible or invisible watermarking in addition to EXIF metadata
Enhanced Forensics: Additional metadata like geolocation, camera info, etc.
WarpNet's media metadata embedding system balances privacy with accountability. By embedding encrypted user and node information in uploaded images, the system creates a deterrent against harmful content while maintaining reasonable privacy for legitimate users. Combined with LLM-based content moderation, this approach helps prevent the spread of prohibited content including CSAM, violence, hate speech, and other harmful materials on the decentralized social network.
The intentionally weak encryption ensures that while casual users cannot access the metadata, authorized entities with sufficient resources can decrypt it when investigating serious crimes or policy violations.