Warpnet Moderation System

This document describes the current implementation of content moderation in Warpnet.

Overview

Warpnet implements a decentralized content moderation system using dedicated moderator nodes that employ AI models to evaluate content based on a defined moderation policy. The system is designed to help maintain content quality and safety across the network without relying on centralized control.

Architecture

Components

The moderation system consists of several key components:

1. Moderator Nodes

Moderator nodes are specialized nodes in the Warpnet network that run moderation engines. They:

  • Continuously monitor network peers for content that requires moderation

  • Retrieve unmoderated content from member nodes

  • Process content through AI models

  • Publish moderation results back to the network

2. Moderation Engine

The moderation engine is built using llama.cpp bindings and provides:

  • LLM-based content analysis

  • Binary moderation decisions (OK or FAIL)

  • Reason generation for rejected content

  • Support for the Llama 2 model (referenced as LLAMA2 in code)

Engine configuration:

  • Context size: 512 tokens

  • Output tokens: 64 tokens

  • Temperature: 0.0 (deterministic)

  • Top P: 0.9

  • Memory mapping enabled

  • Low VRAM mode supported
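The settings above can be captured in a small configuration struct. The sketch below is illustrative only: the field and function names are assumptions, and the actual llama.cpp binding options differ.

```go
package main

import "fmt"

// ModerationEngineConfig mirrors the engine settings listed above.
// Field names are illustrative; the real llama.cpp binding options differ.
type ModerationEngineConfig struct {
	ContextSize int     // prompt window, in tokens
	MaxTokens   int     // maximum tokens to generate per answer
	Temperature float64 // 0.0 => deterministic sampling
	TopP        float64 // nucleus sampling cutoff
	UseMMap     bool    // memory-map the model file
	LowVRAM     bool    // reduce GPU memory usage
}

// defaultEngineConfig returns the values documented above.
func defaultEngineConfig() ModerationEngineConfig {
	return ModerationEngineConfig{
		ContextSize: 512,
		MaxTokens:   64,
		Temperature: 0.0,
		TopP:        0.9,
		UseMMap:     true,
		LowVRAM:     true,
	}
}

func main() {
	fmt.Printf("%+v\n", defaultEngineConfig())
}
```

Temperature 0.0 makes the decision reproducible for a given model and prompt, which matters when multiple moderator nodes evaluate the same content.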

3. Moderation Protocol

The isolation protocol handles communication between moderator nodes and member nodes:

  • Sends moderation results to content owners

  • Publishes results to followers via pubsub

  • Updates tweet metadata with moderation information
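The two delivery paths can be sketched as a single fan-out step. The JSON wire format and the Publisher interface below are hypothetical stand-ins, not the isolation protocol's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModerationResult is an illustrative wire format; the real schema differs.
type ModerationResult struct {
	TweetID string `json:"tweet_id"`
	OK      bool   `json:"ok"`
	Reason  string `json:"reason,omitempty"`
}

// Publisher abstracts the two delivery paths described above: a direct
// stream to the content owner and a pubsub topic for followers.
type Publisher interface {
	SendToOwner(ownerID string, payload []byte) error
	PublishToFollowers(topic string, payload []byte) error
}

// deliverResult encodes a result once and sends it over both paths.
func deliverResult(p Publisher, ownerID string, r ModerationResult) error {
	payload, err := json.Marshal(r)
	if err != nil {
		return err
	}
	if err := p.SendToOwner(ownerID, payload); err != nil {
		return err
	}
	return p.PublishToFollowers("moderation/"+ownerID, payload)
}

// logPublisher is an in-memory stand-in used for the demo below.
type logPublisher struct{ sent []string }

func (l *logPublisher) SendToOwner(id string, _ []byte) error {
	l.sent = append(l.sent, "owner:"+id)
	return nil
}

func (l *logPublisher) PublishToFollowers(topic string, _ []byte) error {
	l.sent = append(l.sent, "topic:"+topic)
	return nil
}

func main() {
	p := &logPublisher{}
	_ = deliverResult(p, "peer-1", ModerationResult{TweetID: "t1", OK: false, Reason: "spam"})
	fmt.Println(p.sent) // [owner:peer-1 topic:moderation/peer-1]
}
```

Encoding once and fanning out keeps the owner stream and the follower topic consistent with each other.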

Tweet Moderation

  1. Moderator nodes scan connected peers every 10 seconds

  2. For each non-moderator peer:

    • Retrieve node information

    • Fetch up to 20 tweets per request

    • Skip already moderated content

    • Process unmoderated tweets sequentially

  3. For each unmoderated tweet:

    • Generate a prompt with the tweet text

    • Run inference through the LLM engine

    • Parse the model response (Yes/No with optional reason)

    • Create moderation result

  4. Send moderation result back to:

    • The original content owner via stream

    • All followers via pubsub
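The steps above can be sketched as one moderation pass. The Tweet, Peer, and Engine types are hypothetical stand-ins for Warpnet's actual types; only the control flow (skip moderators, fetch up to 20 tweets, process unmoderated tweets sequentially) reflects the document.

```go
package main

import (
	"fmt"
	"strings"
)

// Tweet, Peer, and Engine are illustrative stand-ins for Warpnet's types.
type Tweet struct {
	ID        string
	Text      string
	Moderated bool
}

type Peer interface {
	IsModerator() bool
	Tweets(limit int) []Tweet // fetches up to `limit` tweets
}

type Engine interface {
	Infer(prompt string) string // returns "No" or "Yes <reason>"
}

// scanPeers performs one moderation pass; a moderator node would call this
// every 10 seconds from a time.Ticker loop.
func scanPeers(peers []Peer, eng Engine) map[string]string {
	results := make(map[string]string)
	for _, p := range peers {
		if p.IsModerator() { // only member nodes are moderated
			continue
		}
		for _, t := range p.Tweets(20) { // up to 20 tweets per request
			if t.Moderated { // skip already moderated content
				continue
			}
			answer := eng.Infer(t.Text) // sequential, one tweet at a time
			if strings.HasPrefix(answer, "Yes") {
				results[t.ID] = "FAIL: " + strings.TrimSpace(strings.TrimPrefix(answer, "Yes"))
			} else {
				results[t.ID] = "OK"
			}
		}
	}
	return results
}

// memberPeer and okEngine are toy implementations for the demo below.
type memberPeer struct{ tweets []Tweet }

func (m memberPeer) IsModerator() bool    { return false }
func (m memberPeer) Tweets(int) []Tweet   { return m.tweets }

type okEngine struct{}

func (okEngine) Infer(string) string { return "No" }

func main() {
	peers := []Peer{memberPeer{tweets: []Tweet{{ID: "t1", Text: "hello"}}}}
	fmt.Println(scanPeers(peers, okEngine{})) // map[t1:OK]
}
```

In the real system the results would then be handed to the isolation protocol for delivery to the owner and followers.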

User Description Moderation

User profile moderation is partially implemented:

  • Designed to moderate username and bio text

  • Currently in development; the lurking function is a placeholder

  • Will follow similar workflow to tweet moderation

Moderation Policy

The current moderation policy prohibits the following content:

  • CSAM or sexual content involving minors including deepfakes or AI-generated

  • Non-consensual sex, porn with coercion or abuse

  • Gore, violence, snuff, dead bodies

  • Weapon or drug creation or sales

  • Self-harm, suicide, eating disorders

  • Sexism against women only, racism, casteism, xenophobia, hate speech

  • Religious extremism, terrorism incitement

  • Spam, mass unsolicited promos

Note: The policy text is defined in the prompt template (core/moderation/prompt.go) and reflects the current implementation. The phrasing "Sexism against women only" is taken verbatim from the source code.

The policy is embedded in the prompt template and enforced by the LLM model.

Prompt Engineering

The moderation system uses a structured prompt template:

Instruction: You act as a social network moderator. Does the following content violate moderation policy?
[Policy list...]
Respond in English only.
If yes, answer: 'Yes' and add short reason, max 14 words.
If no, answer: 'No'
No other answer types accepted.
Content: """[content]"""
Possible Violations: [context]
Answer:

The system expects responses in one of two formats:

  • "No" - Content is acceptable

  • "Yes [reason]" - Content violates policy with a brief explanation

The 14-word limit for reasons helps ensure:

  • Consistent and concise explanations

  • Reliable model output parsing

  • Efficient token usage

Limitations and Future Work

Current limitations:

  • User description moderation not yet active

  • Image content moderation not implemented

  • Single model support (Llama 2)

  • No appeals or review process

  • Moderation decisions are final

Potential improvements:

  • Multi-model support for better accuracy

  • Configurable moderation policies

  • Reputation system for moderators

  • User-controlled moderation preferences

  • Image and video content analysis

  • Appeals and review mechanism

Security Considerations

The moderation system:

  • Runs on dedicated nodes separate from user content

  • Uses deterministic model settings for consistency

  • Publishes all decisions transparently

  • Allows users to see moderation metadata

  • Cannot directly delete content from member nodes

  • Relies on member nodes to honor moderation results

Performance

Typical moderation performance:

  • Processes one tweet at a time per peer

  • 10-second intervals between peer scans

  • Model inference time varies by hardware

  • Logged for monitoring and optimization

Configuration

Moderator nodes require:

  • Model path configuration

  • Thread count for inference

  • Network configuration (testnet or mainnet)

  • Sufficient hardware for LLM inference
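These requirements can be gathered into a startup configuration. The struct and validation below are a hedged sketch, not Warpnet's actual config schema; the field names and the example model path are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// ModeratorNodeConfig gathers the startup settings listed above.
// Field names are illustrative.
type ModeratorNodeConfig struct {
	ModelPath string // path to the local LLM model file
	Threads   int    // CPU threads used for inference
	Network   string // "testnet" or "mainnet"
}

// Validate checks the settings before the node starts.
func (c ModeratorNodeConfig) Validate() error {
	if c.ModelPath == "" {
		return errors.New("model path is required")
	}
	if c.Threads < 1 {
		return errors.New("at least one inference thread is required")
	}
	if c.Network != "testnet" && c.Network != "mainnet" {
		return errors.New("network must be testnet or mainnet")
	}
	return nil
}

func main() {
	cfg := ModeratorNodeConfig{
		ModelPath: "/models/llama-2.gguf", // hypothetical path
		Threads:   runtime.NumCPU(),
		Network:   "testnet",
	}
	fmt.Println(cfg.Validate()) // <nil>
}
```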

Network Protocol

Moderation uses standard Warpnet protocols. All communication happens over libp2p streams with protocol multiplexing.

Media Metadata Embedding Implementation

WarpNet implements a media metadata embedding system for images that helps establish accountability and traceability for uploaded content. This system is designed to work in conjunction with content moderation to prevent the spread of harmful content on the decentralized social network.

Important Privacy Note: All uploaded images contain embedded encrypted metadata including user information, node details, and MAC addresses. While encrypted with intentionally weak encryption, this metadata can be decrypted by entities with sufficient computational resources. MAC addresses in particular are persistent hardware identifiers that can track users across different accounts and platforms.

Current Implementation

Metadata Embedding Process

When a user uploads an image to WarpNet, the system automatically embeds encrypted metadata into the image's EXIF (Exchangeable Image File Format) segment. This process occurs transparently during the upload operation.

What Metadata is Embedded

The following information is embedded into each uploaded image:

  1. Node Information

    • Node ID and network details

    • Information about the node handling the upload

  2. User Information

    • User ID of the content creator

    • User profile data associated with the upload

  3. MAC Address

    • Hardware network interface identifier

    • Additional device fingerprinting data

    • Privacy Note: MAC addresses are persistent hardware identifiers that can track users across different accounts and platforms

Encryption Mechanism

The metadata embedding uses a deliberate "security through computational difficulty" approach with intentionally weak encryption:

Encryption Algorithm

  • Algorithm: AES-256-GCM (Galois/Counter Mode)

  • Key Derivation: Argon2id (when a password is used) or time-based weak key generation

  • Password: Randomly generated weak password that is immediately discarded after encryption

  • Salt: Public, hardcoded salt ("cec27db4") embedded with the media file

  • Nonce: Zero-filled nonce (intentionally weak)

Security Model Philosophy

The system is designed with the following principles:

  1. Not Designed for User Decryption: Ordinary users cannot recover the embedded metadata

  2. Designed for Powerful Entity Decryption: Only entities with massive computational resources (e.g., government data centers, law enforcement with supercomputing access) can brute-force decrypt the metadata

  3. Proof of Ownership: The encrypted EXIF metadata acts as proof of ownership and responsibility without revealing sensitive data during normal operation

  4. Computational Difficulty: Security relies entirely on computational difficulty, not on secrecy of the password

In short: the per-file password is used once for encryption and never stored or logged, and the salt and nonce ship publicly with the media file, so the embedded metadata can be recovered only by brute force.

How Metadata Embedding Prevents Harmful Content

The metadata embedding system contributes to content safety through several mechanisms:

1. Attribution and Accountability

  • Every image uploaded to WarpNet carries encrypted evidence of its origin

  • Node and user information creates a chain of responsibility

  • MAC address provides additional device-level tracking

2. Deterrence Effect

  • Users aware of metadata embedding may be deterred from uploading harmful content

  • Knowledge that content can be traced back to its source acts as a preventive measure

3. Investigation Support

  • When harmful content is reported, metadata provides investigation leads

  • Law enforcement or authorized entities can request brute-force decryption

  • Decrypted metadata reveals the original uploader and their node

4. Distributed Accountability

  • In a decentralized network, metadata helps identify responsible parties

  • Prevents anonymity abuse while maintaining privacy for legitimate users

5. Forensic Evidence

  • Embedded metadata can serve as forensic evidence in legal proceedings

  • Provides proof of upload time, source node, and user identity

  • MAC address adds physical device linkage

6. Content Origin Verification

  • Helps distinguish original uploads from redistributed content

  • Enables tracking of content spread across the network

  • Supports identification of primary sources for harmful content

Limitations and Considerations

Privacy Implications

  • All uploaded images contain embedded encrypted user information

  • While encrypted, metadata can be decrypted with sufficient resources

  • Users should be aware that images contain traceable information

Security Limitations

  • Weak Encryption by Design: The encryption is intentionally weak

  • Predictable Key Generation: The timestamp-based key generation pattern makes it easier to decrypt multiple images once the pattern is understood

  • Metadata Removal: Technically sophisticated users could strip EXIF data

  • Not Foolproof: Determined malicious actors may find ways to circumvent the system

Future Enhancements

Potential improvements to the system could include:

  1. Image Content Analysis: Extend moderation to analyze actual image content (not just text)

  2. Watermarking: Visible or invisible watermarking in addition to EXIF metadata

  3. Enhanced Forensics: Additional metadata like geolocation, camera info, etc.

WarpNet's media metadata embedding system balances privacy with accountability. By embedding encrypted user and node information in uploaded images, the system creates a deterrent against harmful content while maintaining reasonable privacy for legitimate users. Combined with LLM-based content moderation, this approach helps prevent the spread of prohibited content including CSAM, violence, hate speech, and other harmful materials on the decentralized social network.

The intentionally weak encryption ensures that while casual users cannot access the metadata, authorized entities with sufficient resources can decrypt it when investigating serious crimes or policy violations.