A hybrid language model that combines the efficiency of ROSA+ with syntactic understanding through Conditional Random Fields (CRF) and spaCy dependency parsing. RASP generates text with enhanced grammatical coherence and structural awareness.
RASP integrates three key components:
- ROSA+ Base Model: A lightweight, efficient n-gram based language model
- Syntactic Feature Extractor: spaCy-based dependency parsing for grammatical structure
- CRF Layer: Conditional Random Fields for sequence labeling with syntactic constraints
Input Text → spaCy Syntactic Parser → Feature Extraction → CRF Layer → ROSA+ Base Model → Output Text
The system first processes text using spaCy's dependency parser to extract:
- Token-level features: Part-of-speech tags, dependency relations, morphological features
- Structural features: Dependency tree structure, ancestor relationships, children dependencies
- Contextual features: Subtree spans, noun chunks, sentence root information
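The per-token feature dicts described above can be sketched in plain Python. This is an illustrative reconstruction, not the actual extractor: it assumes spaCy has already produced POS tags, dependency labels, and head indices, and the field names (`pos`, `dep`, `head_pos`, `is_root`, `n_lefts`, `n_rights`) follow the extractor's docstring.

```python
from typing import Dict, List, Tuple

def build_token_features(parsed: List[Tuple[str, str, str, int]]) -> List[Dict[str, object]]:
    """parsed: one (text, pos, dep, head_index) tuple per token,
    as would come out of a spaCy dependency parse."""
    features = []
    for i, (text, pos, dep, head) in enumerate(parsed):
        features.append({
            "text": text,
            "pos": pos,                    # part-of-speech tag
            "dep": dep,                    # dependency relation to the head
            "head_pos": parsed[head][1],   # POS of the governing token
            "is_root": dep == "ROOT",      # sentence-root flag
            # counts of left/right syntactic children, as in spaCy's Token API
            "n_lefts": sum(1 for j, t in enumerate(parsed) if t[3] == i and j < i),
            "n_rights": sum(1 for j, t in enumerate(parsed) if t[3] == i and j > i),
        })
    return features

# "She reads books": "reads" is the ROOT with one left child and one right child
toks = [("She", "PRON", "nsubj", 1), ("reads", "VERB", "ROOT", 1), ("books", "NOUN", "dobj", 1)]
feats = build_token_features(toks)
```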
```python
class SpacySyntacticFeatureExtractor:
    def extract_token_features(self, text: str) -> List[Dict[str, Any]]:
        # Extracts per-token syntactic features, including dependency tree structure.
        # Returns features such as pos, dep, head_pos, is_root, n_lefts, n_rights, etc.
```

The extracted features are fed into a CRF layer that:
- Models sequential dependencies between tokens
- Incorporates syntactic constraints into predictions
- Learns transition patterns between grammatical structures
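A context-window feature function in the style used by sklearn-crfsuite might look like the following. This is a hedged sketch of the idea, not RASP's actual `_token_to_crf_features`: the feature keys (`-1:pos`, `+1:dep`, `BOS`, `EOS`) are conventional sklearn-crfsuite naming, assumed here for illustration.

```python
from typing import Dict, List

def token_to_crf_features(tokens: List[Dict], idx: int) -> Dict[str, object]:
    """Builds a CRF feature dict for tokens[idx], pulling in a one-token
    context window on each side."""
    tok = tokens[idx]
    feats = {"bias": 1.0, "pos": tok["pos"], "dep": tok["dep"]}
    if idx > 0:
        feats["-1:pos"] = tokens[idx - 1]["pos"]
        feats["-1:dep"] = tokens[idx - 1]["dep"]
    else:
        feats["BOS"] = True   # beginning of sequence
    if idx < len(tokens) - 1:
        feats["+1:pos"] = tokens[idx + 1]["pos"]
        feats["+1:dep"] = tokens[idx + 1]["dep"]
    else:
        feats["EOS"] = True   # end of sequence
    return feats

toks = [{"pos": "PRON", "dep": "nsubj"}, {"pos": "VERB", "dep": "ROOT"}, {"pos": "NOUN", "dep": "dobj"}]
window = token_to_crf_features(toks, 1)
```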
```python
class ROSAPlusCRF:
    def _token_to_crf_features(self, tokens: List[Dict], idx: int) -> Dict[str, Any]:
        # Converts token features to the CRF feature format with a context window.
        # Includes features from previous/next tokens and dependency relationships.
```

For deeper syntactic understanding, RASP includes a neural CRF implementation:
- Bidirectional GRU: Captures contextual information from both directions
- Viterbi Decoding: Finds the most likely sequence of tags
- Transition Parameters: Learns syntactic transition patterns
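The Viterbi decoding step can be illustrated in plain Python, independent of the GRU. This is a minimal sketch of the standard algorithm over emission and transition scores, not the tensorized implementation inside `BiGRUCRF`:

```python
from typing import List

def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """emissions[t][s]: score of state s at step t; transitions[a][b]: score of a -> b.
    Returns the highest-scoring state sequence."""
    n_states = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each state
    back: List[List[int]] = []          # back-pointers to the best predecessor
    for t in range(1, len(emissions)):
        new_score = []
        back.append([])
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + transitions[p][s])
            back[-1].append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][s] + emissions[t][s])
        score = new_score
    # Follow back-pointers from the best final state
    path = [max(range(n_states), key=lambda s: score[s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]
```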
```python
class BiGRUCRF(nn.Module):
    def viterbi_decode(self, emissions: torch.Tensor) -> List[int]:
        # Viterbi decoding for the best path through syntactic states.
```

During text generation, RASP:
- Gets base distribution from ROSA+ model
- Extracts syntactic features from current context
- Applies CRF constraints to refine predictions
- Samples next character with syntactic awareness
- Updates context and repeats
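One step of this loop can be sketched as a reweighting of the base distribution. The function name, the exponential reweighting, and the argmax selection are illustrative assumptions, not the actual RASP generation code, which samples rather than taking the maximum:

```python
import math
from typing import Dict

def constrained_step(base_probs: Dict[str, float],
                     syntax_scores: Dict[str, float],
                     temperature: float = 1.0) -> str:
    """One generation step: reweight the base model's distribution by
    syntactic constraint scores, renormalize, and pick the top candidate."""
    combined = {}
    for tok, p in base_probs.items():
        # Tokens the CRF layer deems ungrammatical get low scores,
        # shrinking their probability mass after renormalization.
        combined[tok] = p * math.exp(syntax_scores.get(tok, 0.0) / temperature)
    total = sum(combined.values())
    probs = {tok: v / total for tok, v in combined.items()}
    return max(probs, key=probs.get)

# The base model slightly prefers "dog", but the CRF penalizes it in this context
base = {"dog": 0.5, "runs": 0.4, "the": 0.1}
scores = {"dog": -2.0, "runs": 1.0}
```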
```python
def generate_with_syntax_constraints(self, prompt: str, max_tokens: int = 200):
    # Generates text with syntactic coherence constraints.
    # Balances statistical predictions with grammatical validity.
```

To use RASP, simply run all cells of the ast_crf_rosa.ipynb notebook in order:
- The first cell installs all required dependencies, including spaCy, sklearn-crfsuite, torch, and numpy
- The second cell imports necessary libraries and modules
- Subsequent cells define the core components of the RASP system
- The final cells demonstrate training and usage of the complete hybrid system
The notebook contains a complete implementation with examples that will:
- Initialize the RASP model
- Train it on sample text data
- Generate text with syntactic constraints
- Analyze the syntactic features of generated text
- Syntactic Coherence: Generates text with proper grammatical structure
- Efficient Inference: Combines lightweight ROSA+ with targeted syntactic processing
- Deep Understanding: Neural CRF captures complex syntactic patterns
- Flexible Integration: Can be combined with other language models
- Interpretable Output: Provides detailed syntactic analysis of generated text
Special thanks to: