Warpnet Moderation System

This document describes the current implementation of content moderation in Warpnet.

Overview

Warpnet implements a decentralized content moderation system using dedicated moderator nodes that employ AI models to evaluate content based on a defined moderation policy. The system is designed to help maintain content quality and safety across the network without relying on centralized control.

Architecture

Components

The moderation system consists of several key components:

1. Moderator Nodes

Moderator nodes are specialized nodes in the Warpnet network that run moderation engines. They:

  • Continuously monitor network peers for content that requires moderation

  • Retrieve unmoderated content from member nodes

  • Process content through AI models

  • Publish moderation results back to the network

2. Moderation Engine

The moderation engine is built using llama.cpp bindings and provides:

  • LLM-based content analysis

  • Binary moderation decisions (OK or FAIL)

  • Reason generation for rejected content

  • Support for the Llama 2 model (referenced as LLAMA2 in code)

Engine configuration:

  • Context size: 512 tokens

  • Output tokens: 64 tokens

  • Temperature: 0.0 (deterministic)

  • Top P: 0.9

  • Memory mapping enabled

  • Low VRAM mode supported
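The settings above can be captured in a small configuration struct. The sketch below is illustrative only: the field and function names are assumptions, and the actual llama.cpp binding options differ.

```go
package main

import "fmt"

// ModerationEngineConfig mirrors the engine settings listed above.
// Field names are illustrative; the real llama.cpp binding options differ.
type ModerationEngineConfig struct {
	ContextSize int     // prompt window, in tokens
	MaxTokens   int     // maximum tokens to generate per answer
	Temperature float64 // 0.0 => deterministic sampling
	TopP        float64 // nucleus sampling cutoff
	UseMMap     bool    // memory-map the model file
	LowVRAM     bool    // reduce GPU memory usage
}

// defaultEngineConfig returns the values documented above.
func defaultEngineConfig() ModerationEngineConfig {
	return ModerationEngineConfig{
		ContextSize: 512,
		MaxTokens:   64,
		Temperature: 0.0,
		TopP:        0.9,
		UseMMap:     true,
		LowVRAM:     true,
	}
}

func main() {
	fmt.Printf("%+v\n", defaultEngineConfig())
}
```

Temperature 0.0 makes the decision reproducible for a given model and prompt, which matters when multiple moderator nodes evaluate the same content.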

3. Moderation Protocol

The isolation protocol handles communication between moderator nodes and member nodes:

  • Sends moderation results to content owners

  • Publishes results to followers via pubsub

  • Updates tweet metadata with moderation information
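The two delivery paths can be sketched as a single fan-out step. The JSON wire format and the Publisher interface below are hypothetical stand-ins, not the isolation protocol's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModerationResult is an illustrative wire format; the real schema differs.
type ModerationResult struct {
	TweetID string `json:"tweet_id"`
	OK      bool   `json:"ok"`
	Reason  string `json:"reason,omitempty"`
}

// Publisher abstracts the two delivery paths described above: a direct
// stream to the content owner and a pubsub topic for followers.
type Publisher interface {
	SendToOwner(ownerID string, payload []byte) error
	PublishToFollowers(topic string, payload []byte) error
}

// deliverResult encodes a result once and sends it over both paths.
func deliverResult(p Publisher, ownerID string, r ModerationResult) error {
	payload, err := json.Marshal(r)
	if err != nil {
		return err
	}
	if err := p.SendToOwner(ownerID, payload); err != nil {
		return err
	}
	return p.PublishToFollowers("moderation/"+ownerID, payload)
}

// logPublisher is an in-memory stand-in used for the demo below.
type logPublisher struct{ sent []string }

func (l *logPublisher) SendToOwner(id string, _ []byte) error {
	l.sent = append(l.sent, "owner:"+id)
	return nil
}

func (l *logPublisher) PublishToFollowers(topic string, _ []byte) error {
	l.sent = append(l.sent, "topic:"+topic)
	return nil
}

func main() {
	p := &logPublisher{}
	_ = deliverResult(p, "peer-1", ModerationResult{TweetID: "t1", OK: false, Reason: "spam"})
	fmt.Println(p.sent) // [owner:peer-1 topic:moderation/peer-1]
}
```

Encoding once and fanning out keeps the owner stream and the follower topic consistent with each other.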

Tweet Moderation

  1. Moderator nodes scan connected peers every 10 seconds

  2. For each non-moderator peer:

    • Retrieve node information

    • Fetch up to 20 tweets per request

    • Skip already moderated content

    • Process unmoderated tweets sequentially

  3. For each unmoderated tweet:

    • Generate a prompt with the tweet text

    • Run inference through the LLM engine

    • Parse the model response (Yes/No with optional reason)

    • Create moderation result

  4. Send moderation result back to:

    • The original content owner via stream

    • All followers via pubsub
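The steps above can be sketched as one moderation pass. The Tweet, Peer, and Engine types are hypothetical stand-ins for Warpnet's actual types; only the control flow (skip moderators, fetch up to 20 tweets, process unmoderated tweets sequentially) reflects the document.

```go
package main

import (
	"fmt"
	"strings"
)

// Tweet, Peer, and Engine are illustrative stand-ins for Warpnet's types.
type Tweet struct {
	ID        string
	Text      string
	Moderated bool
}

type Peer interface {
	IsModerator() bool
	Tweets(limit int) []Tweet // fetches up to `limit` tweets
}

type Engine interface {
	Infer(prompt string) string // returns "No" or "Yes <reason>"
}

// scanPeers performs one moderation pass; a moderator node would call this
// every 10 seconds from a time.Ticker loop.
func scanPeers(peers []Peer, eng Engine) map[string]string {
	results := make(map[string]string)
	for _, p := range peers {
		if p.IsModerator() { // only member nodes are moderated
			continue
		}
		for _, t := range p.Tweets(20) { // up to 20 tweets per request
			if t.Moderated { // skip already moderated content
				continue
			}
			answer := eng.Infer(t.Text) // sequential, one tweet at a time
			if strings.HasPrefix(answer, "Yes") {
				results[t.ID] = "FAIL: " + strings.TrimSpace(strings.TrimPrefix(answer, "Yes"))
			} else {
				results[t.ID] = "OK"
			}
		}
	}
	return results
}

// memberPeer and okEngine are toy implementations for the demo below.
type memberPeer struct{ tweets []Tweet }

func (m memberPeer) IsModerator() bool    { return false }
func (m memberPeer) Tweets(int) []Tweet   { return m.tweets }

type okEngine struct{}

func (okEngine) Infer(string) string { return "No" }

func main() {
	peers := []Peer{memberPeer{tweets: []Tweet{{ID: "t1", Text: "hello"}}}}
	fmt.Println(scanPeers(peers, okEngine{})) // map[t1:OK]
}
```

In the real system the results would then be handed to the isolation protocol for delivery to the owner and followers.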

User Description Moderation

User profile moderation is partially implemented:

  • Designed to moderate username and bio text

  • Currently in development; the lurking function is a placeholder

  • Will follow similar workflow to tweet moderation

Moderation Policy

The current moderation policy prohibits the following content:

  • CSAM or sexual content involving minors including deepfakes or AI-generated

  • Non-consensual sex, porn with coercion or abuse

  • Gore, violence, snuff, dead bodies

  • Weapon or drug creation or sales

  • Self-harm, suicide, eating disorders

  • Sexism against women only, racism, casteism, xenophobia, hate speech

  • Religious extremism, terrorism incitement

  • Spam, mass unsolicited promos

Note: The policy text is defined in the prompt template (core/moderation/prompt.go) and reflects the current implementation. The phrasing "Sexism against women only" is taken verbatim from the source code.

The policy is embedded in the prompt template and enforced by the LLM model.

Prompt Engineering

The moderation system uses a structured prompt template:

Instruction: You act as a social network moderator. Does the following content violate moderation policy?
[Policy list...]
Respond in English only.
If yes, answer: 'Yes' and add short reason, max 14 words.
If no, answer: 'No'
No other answer types accepted.
Content: """[content]"""
Possible Violations: [context]
Answer:

The system expects responses in one of two formats:

  • "No" - Content is acceptable

  • "Yes [reason]" - Content violates policy with a brief explanation

The 14-word limit for reasons helps ensure:

  • Consistent and concise explanations

  • Reliable model output parsing

  • Efficient token usage

Limitations and Future Work

Current limitations:

  • User description moderation not yet active

  • Image content moderation not implemented

  • Single model support (Llama 2)

  • No appeals or review process

  • Moderation decisions are final

Potential improvements:

  • Multi-model support for better accuracy

  • Configurable moderation policies

  • Reputation system for moderators

  • User-controlled moderation preferences

  • Image and video content analysis

  • Appeals and review mechanism

Security Considerations

The moderation system:

  • Runs on dedicated nodes separate from user content

  • Uses deterministic model settings for consistency

  • Publishes all decisions transparently

  • Allows users to see moderation metadata

  • Cannot directly delete content from member nodes

  • Relies on member nodes to honor moderation results

Performance

Typical moderation performance:

  • Processes one tweet at a time per peer

  • 10-second intervals between peer scans

  • Model inference time varies by hardware

  • Logged for monitoring and optimization

Configuration

Moderator nodes require:

  • Model path configuration

  • Thread count for inference

  • Network configuration (testnet or mainnet)

  • Sufficient hardware for LLM inference
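These requirements can be gathered into a startup configuration. The struct and validation below are a hedged sketch, not Warpnet's actual config schema; the field names and the example model path are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// ModeratorNodeConfig gathers the startup settings listed above.
// Field names are illustrative.
type ModeratorNodeConfig struct {
	ModelPath string // path to the local LLM model file
	Threads   int    // CPU threads used for inference
	Network   string // "testnet" or "mainnet"
}

// Validate checks the settings before the node starts.
func (c ModeratorNodeConfig) Validate() error {
	if c.ModelPath == "" {
		return errors.New("model path is required")
	}
	if c.Threads < 1 {
		return errors.New("at least one inference thread is required")
	}
	if c.Network != "testnet" && c.Network != "mainnet" {
		return errors.New("network must be testnet or mainnet")
	}
	return nil
}

func main() {
	cfg := ModeratorNodeConfig{
		ModelPath: "/models/llama-2.gguf", // hypothetical path
		Threads:   runtime.NumCPU(),
		Network:   "testnet",
	}
	fmt.Println(cfg.Validate()) // <nil>
}
```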

Network Protocol

Moderation uses standard Warpnet protocols. All communication happens over libp2p streams with protocol multiplexing.

Media Metadata Embedding Implementation

WarpNet implements a media metadata embedding system for images that helps establish accountability and traceability for uploaded content. This system is designed to work in conjunction with content moderation to prevent the spread of harmful content on the decentralized social network.

Important Privacy Note: All uploaded images contain embedded encrypted metadata including user information, node details, and MAC addresses. While encrypted with intentionally weak encryption, this metadata can be decrypted by entities with sufficient computational resources. MAC addresses in particular are persistent hardware identifiers that can track users across different accounts and platforms.

Current Implementation

Metadata Embedding Process

When a user uploads an image to WarpNet, the system automatically embeds encrypted metadata into the image's EXIF (Exchangeable Image File Format) segment. This process occurs transparently during the upload operation.

What Metadata is Embedded

The following information is embedded into each uploaded image:

  1. Node Information

    • Node ID and network details

    • Information about the node handling the upload

  2. User Information

    • User ID of the content creator

    • User profile data associated with the upload

  3. MAC Address

    • Hardware network interface identifier

    • Additional device fingerprinting data

    • Privacy Note: MAC addresses are persistent hardware identifiers that can track users across different accounts and platforms

Encryption Mechanism

The metadata embedding uses a deliberate "security through computational difficulty" approach with intentionally weak encryption:

Encryption Algorithm

  • Algorithm: AES-256-GCM (Galois/Counter Mode)

  • Key Derivation: Argon2id (when a password is used) or time-based weak key generation

  • Password: Randomly generated weak password that is immediately discarded after encryption

  • Salt: Public, hardcoded salt ("cec27db4") embedded with the media file

  • Nonce: Zero-filled nonce (intentionally weak)

Security Model Philosophy

The system is designed with the following principles:

  1. Not Designed for User Decryption: Ordinary users cannot recover the embedded metadata

  2. Designed for Powerful Entity Decryption: Only entities with massive computational resources (e.g., government data centers, law enforcement with supercomputing access) can brute-force decrypt the metadata

  3. Proof of Ownership: The encrypted EXIF metadata acts as proof of ownership and responsibility without revealing sensitive data during normal operation

  4. Computational Difficulty: Security relies entirely on computational difficulty, not on secrecy of the password

In short: the per-file password is used once for encryption and never stored or logged, and the salt and nonce ship publicly with the media file, so the embedded metadata can be recovered only by brute force.

How Metadata Embedding Prevents Harmful Content

The metadata embedding system contributes to content safety through several mechanisms:

1. Attribution and Accountability

  • Every image uploaded to WarpNet carries encrypted evidence of its origin

  • Node and user information creates a chain of responsibility

  • MAC address provides additional device-level tracking

2. Deterrence Effect

  • Users aware of metadata embedding may be deterred from uploading harmful content

  • Knowledge that content can be traced back to its source acts as a preventive measure

3. Investigation Support

  • When harmful content is reported, metadata provides investigation leads

  • Law enforcement or authorized entities can request brute-force decryption

  • Decrypted metadata reveals the original uploader and their node

4. Distributed Accountability

  • In a decentralized network, metadata helps identify responsible parties

  • Prevents anonymity abuse while maintaining privacy for legitimate users

5. Forensic Evidence

  • Embedded metadata can serve as forensic evidence in legal proceedings

  • Provides proof of upload time, source node, and user identity

  • MAC address adds physical device linkage

6. Content Origin Verification

  • Helps distinguish original uploads from redistributed content

  • Enables tracking of content spread across the network

  • Supports identification of primary sources for harmful content

Limitations and Considerations

Privacy Implications

  • All uploaded images contain embedded encrypted user information

  • While encrypted, metadata can be decrypted with sufficient resources

  • Users should be aware that images contain traceable information

Security Limitations

  • Weak Encryption by Design: The encryption is intentionally weak

  • Predictable Key Generation: The timestamp-based key generation pattern makes it easier to decrypt multiple images once the pattern is understood

  • Metadata Removal: Technically sophisticated users could strip EXIF data

  • Not Foolproof: Determined malicious actors may find ways to circumvent the system

Future Enhancements

Potential improvements to the system could include:

  1. Image Content Analysis: Extend moderation to analyze actual image content (not just text)

  2. Watermarking: Visible or invisible watermarking in addition to EXIF metadata

  3. Enhanced Forensics: Additional metadata like geolocation, camera info, etc.

WarpNet's media metadata embedding system balances privacy with accountability. By embedding encrypted user and node information in uploaded images, the system creates a deterrent against harmful content while maintaining reasonable privacy for legitimate users. Combined with LLM-based content moderation, this approach helps prevent the spread of prohibited content including CSAM, violence, hate speech, and other harmful materials on the decentralized social network.

The intentionally weak encryption ensures that while casual users cannot access the metadata, authorized entities with sufficient resources can decrypt it when investigating serious crimes or policy violations.