A sophisticated C implementation of a syntax highlighting engine that converts C source code files into beautifully formatted HTML with VS Code Dark+ theme styling.
Source-to-HTML is a lexical analyzer and HTML generator that parses C source code and produces syntax-highlighted HTML output. Using a state-machine based parser, it accurately identifies keywords, comments, preprocessor directives, strings, and other language elements, applying appropriate styling to each component.
- Intelligent Parsing: State-machine based lexical analyzer for accurate tokenization
- VS Code Dark+ Theme: Professional color scheme matching VS Code's popular theme
- Line Numbering: Optional VS Code-style gutter with line numbers (NEW!)
- Comprehensive Token Recognition:
- Keywords (data types and control flow)
- Preprocessor directives
- Single and multi-line comments
- String literals with format specifiers
- Numeric constants
- Character literals
- Operators and symbols
- Header file includes (system and user-defined)
- Format Specifier Detection: Highlights printf/scanf format codes within strings
- External CSS Support: Clean separation of styling and structure
- Preserves Formatting: Maintains original code indentation and line breaks
- Algorithm: Finite State Machine (FSM) with state transition logic
- Language: ANSI C
- Output Format: HTML5 with CSS styling
- Platform: Cross-platform (Linux, macOS, Windows)
The project uses a three-layer architecture:
- Parser Engine (
s2html_event.c) - Tokenizes source code using state machines - HTML Converter (
s2html_conv.c) - Transforms tokens into HTML spans - Main Driver (
s2html_main.c) - Orchestrates the conversion process
The parser implements multiple states:
IDLE- Default state, identifies token beginningsPREPROCESSOR_DIRECTIVE- Handles #include, #define, etc.RESERVE_KEYWORD- Processes language keywordsNUMERIC_CONSTANT- Captures numbersSTRING- Handles string literalsSINGLE_LINE_COMMENT- Processes // commentsMULTI_LINE_COMMENT- Handles /* */ commentsASCII_CHAR- Processes character literalsOPERATORS- Identifies operators (+, -, *, etc.)SYMBOLS- Handles delimiters ({, }, (, ), etc.)
- GCC compiler (or any C99-compatible compiler)
- Basic command line knowledge
- C source files for conversion
- Clone the repository:
git clone https://github.com/SLADE0261/Source-to-HTML.git
cd Source-to-HTML- Compile the project:
gcc s2html_main.c s2html_event.c s2html_conv.c -o syntaxglowOr use a Makefile:
make./syntaxglow <input_file.c> [output_file] [-n]-n: Enable line numbering (VS Code-style gutter)
# Convert test.c to test.c.html (default output)
./syntaxglow test.c
# Convert with custom output name
./syntaxglow test.c highlighted_code
# Convert with line numbers
./syntaxglow test.c -n
# Convert with custom output name and line numbers
./syntaxglow test.c highlighted_code -n
# Line numbers can be specified anywhere after filename
./syntaxglow test.c -n outputThe program generates:
- An HTML file with embedded syntax highlighting
- Automatic linking to
styles.css(must be in same directory) - Properly formatted code within
<pre>tags - Optional line numbers in VS Code-style gutter (with
-nflag)
Source-to-HTML/
├── s2html_main.c # Main program entry point
├── s2html_event.c # Lexical analyzer/parser engine
├── s2html_event.h # Parser function declarations
├── s2html_conv.c # HTML conversion functions
├── s2html_conv.h # Converter function declarations
├── styles.css # VS Code Dark+ theme styling
├── test # Sample C source file
├── test.html # Sample output
└── README.md # This file
The default VS Code Dark+ theme includes:
| Element | Color | CSS Class |
|---|---|---|
| Background | #1e1e1e |
body |
| Default Text | #d4d4d4 |
body |
| Line Numbers | #858585 (Gray) |
.line-number |
| Keywords (int, char) | #569cd6 (Blue) |
.reserved_key1 |
| Control Flow (if, else) | #4ec9b0 (Teal) |
.reserved_key2 |
| Comments | #6a9955 (Green) |
.comment |
| Preprocessor | #c586c0 (Purple) |
.preprocess_dir |
| Strings | #ce9178 (Orange) |
.string |
| Numbers | #b5cea8 (Light Green) |
.numeric_constant |
| Characters | #d7ba7d (Tan) |
.ascii_char |
| Format Specifiers | #dcdcaa (Yellow) |
.format_specifier |
| Operators/Symbols | #d4d4d4 (Gray) |
.operators, .symbols |
# 1. Create or have a C source file
cat > hello.c << EOF
#include <stdio.h>
int main() {
printf("Hello, World!\n");
return 0;
}
EOF
# 2. Convert to HTML with line numbers
./syntaxglow hello.c -n
# 3. Ensure styles.css is in the same directory as output
# 4. Open in browser
firefox hello.c.html
# or
open hello.c.htmlWhen using the -n option, the output includes:
- VS Code-style line numbers in the left gutter
- Right-aligned numbers with subtle border separator
- Non-selectable line numbers (won't be copied when selecting code)
- Proper alignment with multi-line tokens (comments, strings, etc.)
Example output with line numbers:
1 #include <stdio.h>
2
3 int main() {
4 printf("Hello, World!\n");
5 return 0;
6 }
Edit styles.css to customize the color scheme:
/* Change keyword color to red */
.reserved_key1 {
color: #ff0000;
}
/* Change background to lighter theme */
body {
background-color: #ffffff;
color: #000000;
}
/* Customize line number appearance */
.line-number {
color: #858585;
border-right: 1px solid #3e3e3e;
}- Define token type in
s2html_event.h:
typedef enum {
// ... existing tokens
TOKEN_NEW_TYPE
} token_type_t;- Add state handler in
s2html_event.c - Add conversion case in
s2html_conv.c - Define CSS class in
styles.css
// Input C code:
#include <stdio.h>
#define MAX 100
int main() {
char name = 'A';
printf("Value: %d\n", 42);
/* Multi-line
comment */
return 0;
}
// Generated HTML structure (without line numbers):
<span class="preprocess_dir">#include </span>
<span class="header_file"><stdio.h></span>
<span class="reserved_key1">int</span> main
<span class="symbols">()</span>
<span class="ascii_char">'A'</span>
<span class="format_specifier">%d</span>
<span class="numeric_constant">42</span>
<span class="comment">/* Multi-line comment */</span>
// With line numbers (-n flag):
<span class="line-number"> 1</span> <span class="preprocess_dir">#include </span>
<span class="line-number"> 2</span> <span class="header_file"><stdio.h></span>- Language Support: Currently only C language
- C Source Only: Designed for C source files
- Comment Handling: Nested comments not supported (per C standard)
- Preprocessor: Basic support, complex macros may not highlight perfectly
- CSS Dependency: Requires
styles.cssin output directory
- Escape Sequences: May not perfectly handle all escape sequences in strings
- Trigraphs: Not supported (rarely used in modern C)
- Wide Characters: Limited support for wide string/char literals
| Error | Cause | Solution |
|---|---|---|
| File could not be opened | File not found or no permissions | Check file path and permissions |
| Could not create output file | Write permission denied | Check output directory permissions |
| Unknown parser state | Internal parser error | Report bug with input file |
| Unknown token type | Unrecognized token | Report bug with input file |
- Speed: Processes ~10,000 lines per second on modern hardware
- Memory: Linear memory usage, ~1KB per 100 tokens
- File Size: No practical limit, tested up to 50MB source files
- Line Numbering: Minimal performance impact (<5% overhead)
Contributions are welcome! Areas for improvement:
- Line numbering option (COMPLETED!)
- Support for C++ syntax
- Command-line theme selection
- Dark/light theme toggle in HTML
- Support for other languages (Python, JavaScript, etc.)
- JSON/XML output formats
- Syntax error detection
- Code folding markers
- Copy code button
- Export to PDF
# Clone with tags
git clone https://github.com/SLADE0261/Source-to-HTML.git
cd Source-to-HTML
# Enable debug mode
gcc -DDEBUG s2html_main.c s2html_event.c s2html_conv.c -o syntaxglow_debug
# Run tests
./run_tests.shThis project is open source and available for educational purposes.
Krishnakant C. Pore
- Inspired by the need for offline code presentation
- VS Code Dark+ theme color scheme
- Finite State Machine design pattern
- C programming community
GitHub: https://github.com/SLADE0261/Source-to-HTML
Q: Can I use this for languages other than C? A: Currently no, but the architecture is extensible. You can modify the keyword tables and state handlers.
Q: Does it support syntax checking? A: No, it's purely a lexical analyzer for highlighting, not a compiler.
Q: Can I embed the output in my website?
A: Yes! Just copy the <pre> section and ensure styles.css is linked.
Q: How do I change the theme?
A: Edit styles.css with your preferred colors.
Q: Does it work with C++ code? A: Partially. Basic C++ will work, but C++-specific features may not highlight correctly.
Q: Can I disable line numbers in the HTML after generation?
A: Yes, simply remove or hide the .line-number elements via CSS: .line-number { display: none; }
Q: Do line numbers affect copy-paste of code?
A: No, line numbers use user-select: none CSS property, so they won't be copied when you select and copy code.
Q: Can I customize line number formatting?
A: Yes! Edit the .line-number class in styles.css to change width, color, padding, or border style.
⭐ If you find Source-to-HTML useful, please consider giving it a star! ⭐