Based on research of industry-leading code search tools (GitHub Linguist, Sourcegraph), AI coding assistants (GitHub Copilot, Cursor), and best practices documentation, I've identified 47 missing file extensions that should be added to the Context Engine MCP Server's INDEXABLE_EXTENSIONS set.
Current Count: 72 extensions Recommended Additions: 44 extensions New Total: 116 extensions
.env.local, .env.development, and .env.production but these have been REMOVED as they contain sensitive credentials. See SECURITY_ANALYSIS_ENV_FILES.md for details.
- GitHub Linguist - The authoritative source used by GitHub.com for language detection (6,848+ commits, 1,142 contributors)
- Sourcegraph - Enterprise code search platform
- AI Coding Assistants - GitHub Copilot, Cursor, Continue.dev documentation
- Aider.chat - AI pair programming tool with comprehensive language support
- Industry file extension databases - Programming language registries
- TypeScript/JavaScript: .ts, .tsx, .js, .jsx, .mjs, .cjs
- Python: .py, .pyw, .pyi
- JVM Languages: .java, .kt, .kts, .scala, .groovy
- Go: .go
- Rust: .rs
- C/C++: .c, .cpp, .cc, .cxx, .h, .hpp, .hxx
- .NET: .cs, .fs, .fsx
- Ruby: .rb, .rake, .gemspec
- PHP: .php
- Mobile: .swift, .m, .mm, .dart, .arb
- Frontend: .vue, .svelte, .astro
- Web: .html, .htm, .css, .scss, .sass, .less, .styl
- Config: .json, .yaml, .yml, .toml, .xml, .plist, .gradle, .properties, .ini, .cfg, .conf, .editorconfig
- Docs: .md, .mdx, .txt, .rst
- Database: .sql, .prisma
- API/Schema: .graphql, .gql, .proto, .openapi, .swagger
- Shell: .sh, .bash, .zsh, .fish, .ps1, .psm1, .bat, .cmd
- Infrastructure: .dockerfile, .tf, .hcl, .nix
- .ex, .exs - Elixir (growing in popularity, Phoenix framework)
- .erl, .hrl - Erlang (telecom, distributed systems)
- .hs, .lhs - Haskell (functional programming)
- .clj, .cljs, .cljc - Clojure/ClojureScript (JVM functional language)
- .ml, .mli - OCaml (functional programming)
- .r, .R - R language (data science, statistics)
- .jl - Julia (scientific computing, ML)
- .lua - Lua (game development, embedded scripting)
- .pl, .pm - Perl (system administration, text processing)
- .zig - Zig (modern systems programming)
- .nim - Nim (efficient, expressive systems language)
- .cr - Crystal (Ruby-like syntax, compiled)
- .v - V language (simple, fast compilation)
Justification: These languages are widely used in production systems, have active communities, and are frequently encountered in modern codebases. GitHub Linguist supports all of them.
- .cmake - CMake build configuration
- .mk, .mak - Make files (alternative extensions)
- .bazel, .bzl - Bazel build system (Google's build tool)
- .ninja - Ninja build system
- .sbt - Scala Build Tool
- .podspec - CocoaPods specification (iOS)
- .lock - Lock files (package-lock.json, yarn.lock, etc.)
- .npmrc, .yarnrc - NPM/Yarn configuration
Justification: Build systems and configuration files are critical for understanding project structure and dependencies. AI assistants need this context for accurate code generation.
.env.local, .env.development, and .env.production were REMOVED from this list as they contain sensitive credentials and are already properly excluded by the codebase.
- .adoc, .asciidoc - AsciiDoc (technical documentation)
- .tex, .latex - LaTeX (academic papers, technical docs)
- .org - Org-mode (Emacs documentation format)
- .wiki - Wiki markup
- .pod - Perl documentation
- .ejs, .pug, .jade - Template engines (embedded JavaScript, Pug)
Justification: Documentation is essential context for AI coding assistants. These formats are commonly used in open-source projects and technical documentation.
- .hbs, .handlebars - Handlebars templates
- .twig - Twig templates (PHP)
- .erb - Embedded Ruby templates
- .jsp - JavaServer Pages
- .mjs - ES modules (already have .mjs, but confirming)
- .cjs - CommonJS modules (already have .cjs, but confirming)
- .wasm - WebAssembly text format (.wat)
Justification: Template engines are code that generates HTML and contain business logic. Essential for full-stack context.
- .sol - Solidity (Ethereum smart contracts)
- .move - Move language (blockchain)
- .cairo - Cairo (StarkNet smart contracts)
- .vhdl, .vhd - VHDL (hardware description)
Justification: Specialized but growing in importance, especially blockchain languages. Include if targeting these domains.
- .pyc, .pyo - Python bytecode (binary)
- .class - Java bytecode (binary)
- .o, .obj - Object files (binary)
- .exe, .dll, .so - Executables and libraries (binary)
- .png, .jpg, .gif, .svg - Images (not code)
- .mp3, .mp4, .wav - Media files (not code)
- .zip, .tar, .gz - Compressed archives (not code)
Justification: These are binary or non-textual files that cannot be semantically indexed for code search.
Add the 15 popular programming language extensions. These have the highest ROI and are most likely to be encountered.
Add build systems, configuration, and documentation formats (24 extensions). These provide critical context for AI assistants.
Add specialized languages (5 extensions) based on user demand and domain focus.
- Supports 600+ languages with thousands of file extensions
- Our recommended additions align with Linguist's "programming" and "markup" categories
- Excludes "data" and "prose" categories (JSON data files, plain text)
- Indexes all text-based source files
- Uses similar extension lists for syntax highlighting and code intelligence
- Focuses on "code" vs "configuration" vs "documentation"
- Prioritize languages with strong LLM training data
- All recommended languages have good LLM support
- Template engines and config files are increasingly important for context
const INDEXABLE_EXTENSIONS = new Set([
// === TypeScript/JavaScript ===
'.ts', '.tsx', '.js', '.jsx', '.mjs', '.cjs',
// === Python ===
'.py', '.pyw', '.pyi',
// === JVM Languages ===
'.java', '.kt', '.kts', '.scala', '.groovy',
// === Go ===
'.go',
// === Rust ===
'.rs',
// === C/C++ ===
'.c', '.cpp', '.cc', '.cxx', '.h', '.hpp', '.hxx',
// === .NET ===
'.cs', '.fs', '.fsx',
// === Ruby ===
'.rb', '.rake', '.gemspec', '.erb', // Added .erb for templates
// === PHP ===
'.php', '.twig', // Added .twig for templates
// === Functional Programming Languages === [NEW]
'.ex', '.exs', // Elixir
'.erl', '.hrl', // Erlang
'.hs', '.lhs', // Haskell
'.clj', '.cljs', '.cljc', // Clojure
'.ml', '.mli', // OCaml
// === Scientific & Data Languages === [NEW]
'.r', '.R', // R language
'.jl', // Julia
// === Scripting Languages === [NEW]
'.lua', // Lua
'.pl', '.pm', '.pod', // Perl
// === Modern Systems Languages === [NEW]
'.zig', // Zig
'.nim', // Nim
'.cr', // Crystal
'.v', // V language
// === Mobile Development ===
'.swift',
'.m', '.mm',
'.dart',
'.arb',
// === Frontend Frameworks ===
'.vue', '.svelte', '.astro',
// === Web Templates & Styles ===
'.html', '.htm',
'.css', '.scss', '.sass', '.less', '.styl',
'.hbs', '.handlebars', // [NEW] Handlebars templates
'.ejs', '.pug', '.jade', // [NEW] Template engines
'.jsp', // [NEW] JavaServer Pages
// === Configuration Files ===
'.json', '.yaml', '.yml', '.toml',
'.xml',
'.plist',
'.gradle',
'.properties',
'.ini', '.cfg', '.conf',
'.editorconfig',
'.env.example', '.env.template', '.env.sample', // Safe templates only (NOT actual .env files)
// === Build Systems === [NEW]
'.cmake', // CMake
'.mk', '.mak', // Make
'.bazel', '.bzl', // Bazel
'.ninja', // Ninja
'.sbt', // Scala Build Tool
'.podspec', // CocoaPods
// === Documentation ===
'.md', '.mdx', '.txt', '.rst',
'.adoc', '.asciidoc', // [NEW] AsciiDoc
'.tex', '.latex', // [NEW] LaTeX
'.org', // [NEW] Org-mode
'.wiki', // [NEW] Wiki markup
// === SECURITY NOTE ===
// .env.local, .env.development, .env.production are NOT included here
// They contain sensitive credentials and are properly excluded via DEFAULT_EXCLUDED_PATTERNS
// === Database ===
'.sql', '.prisma',
// === API/Schema Definitions ===
'.graphql', '.gql',
'.proto',
'.openapi', '.swagger',
// === Shell Scripts ===
'.sh', '.bash', '.zsh', '.fish',
'.ps1', '.psm1', '.bat', '.cmd',
// === Infrastructure & DevOps ===
'.dockerfile',
'.tf', '.hcl',
'.nix',
// === Blockchain/Smart Contracts === [NEW - Optional]
// '.sol', // Solidity (uncomment if needed)
// '.move', // Move language (uncomment if needed)
// '.cairo', // Cairo (uncomment if needed)
]);Added 44 new extensions across 6 categories:
- Functional Programming (9): .ex, .exs, .erl, .hrl, .hs, .lhs, .clj, .cljs, .cljc, .ml, .mli
- Scientific/Data (3): .r, .R, .jl
- Scripting (4): .lua, .pl, .pm, .pod
- Modern Systems (4): .zig, .nim, .cr, .v
- Build Systems (9): .cmake, .mk, .mak, .bazel, .bzl, .ninja, .sbt, .podspec
- Documentation (5): .adoc, .asciidoc, .tex, .latex, .org, .wiki
- Web Templates (6): .hbs, .handlebars, .ejs, .pug, .jade, .jsp, .erb, .twig
Total: 72 (current) + 44 (new) = 116 extensions
.env.local, .env.development, .env.production. These 3 have been REMOVED as they contain sensitive credentials. See SECURITY_ANALYSIS_ENV_FILES.md for full analysis.
After implementing these changes:
-
Test with popular repositories:
- Elixir: Phoenix framework
- Haskell: Pandoc
- Julia: Flux.jl
- R: tidyverse packages
- Lua: Neovim configuration
-
Verify indexing performance:
- Monitor index size growth
- Check indexing time for large repos
- Ensure no memory issues
-
Validate search quality:
- Test semantic search across new file types
- Verify syntax highlighting works
- Check that context retrieval is accurate
- GitHub Linguist: https://github.com/github-linguist/linguist
- Aider Language Support: https://aider.chat/docs/languages.html
- Sourcegraph Documentation: https://sourcegraph.com/docs
- Programming Language Extensions Database: https://gist.github.com/ppisarczyk/43962d06686722d26d176fad46879d41
- Mammouth AI Supported Extensions: https://info.mammouth.ai/docs/supported-file-extensions/
These additions will make the Context Engine MCP Server more comprehensive and competitive with industry-leading code search tools. The recommended extensions cover:
- ✅ All major programming paradigms (imperative, functional, OOP)
- ✅ Popular data science and scientific computing languages
- ✅ Modern systems programming languages
- ✅ Essential build systems and configuration formats
- ✅ Common documentation and template formats
Impact: Better context for AI coding assistants, broader language support, and improved developer experience across diverse tech stacks.