Commit f0df3ef

Split AStar into multiple files (#377)
* Move astar into helper functions and improved string pruning
* helper
* Clippy and fmt
* fixed some helpers
* english detection improvements
* added more tests
* 0.11.0
* Clippy and fmt

---------

Co-authored-by: bee <autumn@skerritt.blog>
1 parent cc108b0 commit f0df3ef

15 files changed

Lines changed: 845 additions & 482 deletions

Cargo.lock

Lines changed: 105 additions & 106 deletions

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 [package]
 name = "project_ares"
 repository = "https://github.com/bee-san/Ares"
-version = "0.10.0"
+version = "0.11.0"
 edition = "2021"
 description = "Automated decoding tool, Ciphey but in Rust"
 license = "MIT"
@@ -48,7 +48,7 @@ bs58 = "0.5.0"
 data-encoding = "2.4.0"
 urlencoding = "2.1.3"
 z85 = "3.0.5"
-gibberish-or-not = "4.0.3"
+gibberish-or-not = "4.1.0"
 cipher_identifier = "0.2.0"
 rand = "0.9.0" # For generating random values
 colored = "3.0.0"

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# Change: AStar Refactoring and String Quality Enhancement
+
+## Purpose
+Refactor the AStar search implementation to improve code organization and enhance string quality assessment by filtering out strings with high percentages of invisible characters.
+
+## Trade-offs
+### Advantages
+- Improved code organization with helper functions in a separate module
+- Better memory efficiency by quickly rejecting strings with >50% invisible characters
+- Enhanced maintainability through clearer separation of concerns
+- Easier testing of individual helper functions
+
+### Disadvantages
+- Slight increase in module complexity with an additional file
+- Potential for minor performance overhead from cross-module function calls
+
+## Technical Implementation
+- Split AStar implementation into two files:
+  - `astar.rs`: Core A* search algorithm implementation
+  - `helper_functions.rs`: Supporting functions for heuristics, quality assessment, and statistics
+- Enhanced `calculate_string_quality` function to immediately reject strings with >50% invisible characters
+- Added a new test case to verify the invisible character filtering functionality
+- Updated module imports and exports in `mod.rs`
+
+## Future Improvements
+- Persist decoder success statistics to disk for learning across sessions
+- Further optimize string quality assessment with more sophisticated language detection
+- Consider moving more common utility functions to the helper module for reuse by other search algorithms
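
The invisible-character rejection described in this change note can be sketched roughly as follows. This is a minimal illustration, not the code from `helper_functions.rs`: the names `is_invisible`, `invisible_ratio`, and `calculate_string_quality_sketch` are hypothetical, the definition of "invisible" (control characters, non-space whitespace, and a few zero-width code points) is an assumption, and the score returned for accepted strings is only a placeholder.

```rust
/// Hypothetical definition of "invisible": control characters, whitespace
/// other than the ordinary space, and a few zero-width code points.
/// The real project may use a different definition.
fn is_invisible(c: char) -> bool {
    c.is_control()
        || (c.is_whitespace() && c != ' ')
        || matches!(c, '\u{200B}'..='\u{200F}' | '\u{FEFF}')
}

/// Fraction of characters in `text` that are invisible (0.0 for empty input).
fn invisible_ratio(text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let invisible = text.chars().filter(|c| is_invisible(*c)).count();
    invisible as f32 / total as f32
}

/// Sketch of a quality score in [0.0, 1.0].
/// Strings with more than 50% invisible characters are rejected outright.
fn calculate_string_quality_sketch(text: &str) -> f32 {
    if text.is_empty() {
        return 0.0;
    }
    let ratio = invisible_ratio(text);
    if ratio > 0.5 {
        return 0.0; // early rejection, no further scoring needed
    }
    // Placeholder score (assumption): the share of visible characters.
    1.0 - ratio
}
```

The early return is the point of the sketch: a string dominated by invisible characters gets the lowest score before any more expensive analysis runs.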
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Change: Improve String Pruning for Low-Quality Inputs
+
+## Purpose
+Enhance the pruning mechanism to skip decoding of low-quality strings, which improves efficiency by avoiding wasted computation on strings that are unlikely to produce meaningful results.
+
+## Trade-offs
+### Advantages
+- Reduces computational resources spent on strings unlikely to yield useful results
+- Speeds up the overall decoding process by focusing on higher-quality candidates
+- Prevents the search algorithm from exploring unproductive paths
+- Improves memory usage by pruning low-quality strings early
+
+### Disadvantages
+- May occasionally reject valid encodings that have unusual characteristics
+- Requires careful tuning of thresholds to balance efficiency and thoroughness
+- Adds additional computation for quality checks (though this is minimal compared to the savings)
+
+## Technical Implementation
+- Enhanced the `check_if_string_cant_be_decoded` function to consider multiple quality factors:
+  - String length (rejects strings with 2 or fewer characters)
+  - Non-printable character ratio (rejects strings with >30% non-printable characters)
+  - Overall string quality (rejects strings with quality score <0.2)
+- Added comprehensive tests to verify the pruning behavior
+- Updated documentation to explain the rationale behind each pruning criterion
+
+## Future Improvements
+- Fine-tune the thresholds based on real-world usage data
+- Consider adding more sophisticated quality metrics (e.g., entropy, character distribution)
+- Implement adaptive thresholds that adjust based on the search context
+- Add logging to track how many strings are being pruned and why
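
The three pruning criteria listed in this note (length, non-printable ratio, quality score) could be combined along these lines. This is a sketch under assumptions: `should_prune`, `non_printable_ratio`, and `quality_stub` are hypothetical names, "non-printable" is approximated with `char::is_control`, and the quality score is a crude stand-in (share of ASCII alphanumerics and spaces) rather than the project's real metric. Only the thresholds come from the note.

```rust
/// Thresholds taken from the change note above.
const MAX_PRUNED_LENGTH: usize = 2; // strings with 2 or fewer characters are rejected
const MAX_NON_PRINTABLE_RATIO: f32 = 0.3;
const MIN_QUALITY: f32 = 0.2;

/// Fraction of control characters in `text` (assumed meaning of "non-printable").
fn non_printable_ratio(text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let bad = text.chars().filter(|c| c.is_control()).count();
    bad as f32 / total as f32
}

/// Stand-in quality metric (assumption): share of ASCII alphanumerics and spaces.
fn quality_stub(text: &str) -> f32 {
    let total = text.chars().count();
    if total == 0 {
        return 0.0;
    }
    let good = text
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == ' ')
        .count();
    good as f32 / total as f32
}

/// Returns true when the string should be skipped instead of decoded.
/// The checks run from cheapest to most expensive.
fn should_prune(text: &str) -> bool {
    if text.chars().count() <= MAX_PRUNED_LENGTH {
        return true;
    }
    if non_printable_ratio(text) > MAX_NON_PRINTABLE_RATIO {
        return true;
    }
    quality_stub(text) < MIN_QUALITY
}
```

Running the cheapest check first keeps the cost of pruning itself low, which matches the note's claim that the added computation is minimal compared to the savings.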
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# Change: Remove CIPHER_MAPPING from helper_functions
+
+## Purpose
+Remove the incorrect mapping between Cipher Identifier's cipher names and Ares decoder names. The mapping was inaccurate, particularly with "fractionatedMorse" being incorrectly mapped to "morseCode" when they are different encoding schemes.
+
+## Trade-offs
+### Advantages
+- Removes incorrect mappings that could lead to misidentification of ciphers
+- Simplifies the code by directly using the first result from Cipher Identifier
+- Eliminates potential confusion between different cipher types
+
+### Disadvantages
+- No longer filters cipher types based on available decoders
+- May return cipher types that don't have corresponding decoders in Ares
+
+## Technical Implementation
+- Removed the `CIPHER_MAPPING` static variable and its documentation
+- Modified the `get_cipher_identifier_score` function to return the first result from Cipher Identifier instead of checking against the mapping
+- Verified that all tests still pass after the changes
+
+## Future Improvements
+- Consider implementing a more accurate mapping if needed in the future
+- Potentially add a check to verify if Ares has a decoder for the identified cipher type
+- Could add a more sophisticated scoring mechanism for cipher identification
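
The "return the first result" behaviour described in this note might look like the sketch below. It does not call the real `cipher_identifier` crate: the input is assumed to be a list of `(name, score)` pairs already ordered best-first, and `first_cipher_result` is a hypothetical name chosen for illustration.

```rust
/// Takes candidate ciphers as (name, score) pairs, assumed to be sorted
/// best-first, and returns the top entry without consulting any
/// name-to-decoder mapping.
fn first_cipher_result(results: &[(String, f32)]) -> Option<(&str, f32)> {
    results
        .first()
        .map(|(name, score)| (name.as_str(), *score))
}
```

Because no mapping is applied, a name such as "fractionatedMorse" is passed through unchanged, which is also why the note lists "may return cipher types that don't have corresponding decoders" as a disadvantage.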
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Change: Remove get_decoder_popularity Function
+
+## Purpose
+Remove the redundant `get_decoder_popularity` function from `helper_functions.rs` since decoders already have a `popularity` attribute in their implementation. This eliminates duplication and ensures that popularity values are maintained in a single location.
+
+## Trade-offs
+### Advantages
+- Eliminates redundant code that duplicated popularity values
+- Simplifies maintenance by having popularity values defined only in the decoder implementations
+- Reduces the risk of inconsistencies between the function and the actual decoder attributes
+
+### Disadvantages
+- The `generate_heuristic` function no longer has direct access to the popularity values
+- Using success rate as a proxy for popularity may not perfectly match the original behavior
+
+## Technical Implementation
+- Removed the `get_decoder_popularity` function from `helper_functions.rs`
+- Modified the `generate_heuristic` function to use the decoder's success rate as a proxy for popularity
+- Updated tests to verify that success rate affects the heuristic calculation
+- Removed the now-obsolete `test_popularity_affects_heuristic` test
+
+## Future Improvements
+- Consider modifying the `CrackResult` struct to include the decoder's popularity attribute
+- Explore ways to directly access the decoder's popularity attribute in the `generate_heuristic` function
+- Evaluate whether success rate is an appropriate proxy for popularity or if another approach would be better
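
One way a success rate can stand in for popularity in an A* heuristic is sketched below. The formula is an assumption made for illustration and is not taken from `generate_heuristic`: `generate_heuristic_sketch` and its `base_cost` parameter are hypothetical, and the only property the sketch aims to show is that a higher success rate yields a lower (more promising) heuristic value.

```rust
/// Scales a base cost by the decoder's historical success rate.
/// A decoder that always succeeds keeps the base cost; one that never
/// succeeds doubles it. Lower values are explored first by A*.
fn generate_heuristic_sketch(base_cost: f32, success_rate: f32) -> f32 {
    // Guard against out-of-range statistics.
    let rate = success_rate.clamp(0.0, 1.0);
    base_cost * (1.0 + (1.0 - rate))
}
```

A test in the spirit of "success rate affects the heuristic calculation" would then compare two calls with the same `base_cost` and different rates, and assert that the higher rate gives the smaller result.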

docs/rules/documentation.mdc

Lines changed: 147 additions & 1 deletion
@@ -1 +1,147 @@
-
+# Documentation Standards
+
+Rule to ensure consistent documentation across the codebase.
+
+<rule>
+name: documentation_standards
+description: Standards for code documentation and comments
+filters:
+  # Match any Rust files
+  - type: file_extension
+    pattern: "\\.rs$"
+  # Match documentation comments
+  - type: content
+    pattern: "///|//!|#\\[doc"
+
+actions:
+  - type: suggest
+    message: |
+      When writing documentation:
+
+      1. Module-Level Documentation:
+         ```rust
+         //! Module description that explains its purpose
+         //!
+         //! Detailed explanation of what the module handles, its key
+         //! components, and any important concepts.
+         ```
+
+      2. Struct Documentation:
+         ```rust
+         /// Represents a complex type with a clear purpose.
+         /// Each field is documented to explain its role and format.
+         #[derive(Debug)]  // Add debug when struct should be printable
+         pub struct MyStruct {
+             /// Field description explaining its purpose
+             /// Format specification if applicable (e.g. "r,g,b" format)
+             pub field: Type,
+         }
+         ```
+
+      3. Function Documentation:
+         ```rust
+         /// Clear description of what the function does.
+         ///
+         /// # Arguments
+         /// * `arg_name` - Detailed description of the argument
+         ///
+         /// # Returns
+         /// * `ReturnType` - Description of the return value
+         ///
+         /// # Panics
+         /// Document any conditions that cause panics
+         pub fn my_function(arg: Type) -> ReturnType {
+         ```
+
+      4. Enum Documentation:
+         ```rust
+         /// Description of what the enum represents
+         /// Include any important usage information
+         #[derive(Debug)]
+         pub enum MyEnum {
+             /// Description of this variant
+             /// Include any specific behavior
+             Variant1,
+         }
+         ```
+
+      5. Error Handling Documentation:
+         ```rust
+         /// Function that can fail
+         ///
+         /// # Errors
+         /// * Describes conditions that cause errors
+         /// * Lists the types of errors that can occur
+         pub fn fallible_function() -> Result<T, Error> {
+         ```
+
+      6. Constants and Configuration:
+         ```rust
+         /// Description of what the constant represents
+         /// Include units or format if applicable
+         pub const MY_CONSTANT: f64 = 0.3;
+         ```
+
+      7. Public API Documentation:
+         ```rust
+         /// Public function with complete documentation
+         ///
+         /// Detailed description of the function's purpose,
+         /// behavior, and any important notes about usage.
+         ///
+         /// # Examples
+         /// ```rust
+         /// let result = my_function("input");
+         /// assert_eq!(result, "expected");
+         /// ```
+         pub fn my_function() {
+         ```
+
+      8. Internal Function Documentation:
+         ```rust
+         /// Brief description of internal function
+         /// Focus on technical details relevant to maintainers
+         fn internal_function() {
+         ```
+
+examples:
+  - input: |
+      //! Color handling module for CLI output
+      //!
+      //! This module manages color schemes and formatting for
+      //! CLI output, ensuring consistent appearance across the app.
+
+      /// Represents a color scheme with RGB values for different roles.
+      /// Each color is stored as a comma-separated RGB string.
+      #[derive(Debug)]
+      pub struct ColorScheme {
+          /// RGB color for informational messages (format: "r,g,b")
+          pub informational: String,
+      }
+
+      /// Formats text with specified color scheme
+      ///
+      /// # Arguments
+      /// * `text` - The text to color
+      /// * `role` - The role determining color choice
+      ///
+      /// # Returns
+      /// * `String` - Colored text string
+      pub fn format_text(text: &str, role: &str) -> String {
+    output: "Valid documentation format"
+
+  - input: |
+      // Bad documentation
+      struct Colors {
+          // RGB color
+          info: String,
+      }
+
+      // Colors the text
+      fn color_text(t: &str) -> String {
+    output: "Invalid documentation format"
+
+metadata:
+  priority: high
+  version: 1.0
+</rule>

src/checkers/athena.rs

Lines changed: 2 additions & 0 deletions
@@ -55,7 +55,9 @@ impl Check for Checker<Athena> {
         // TODO: wrap all checkers in oncecell so we only create them once!
         let lemmeknow = Checker::<LemmeKnow>::new().with_sensitivity(self.sensitivity);
         let lemmeknow_result = lemmeknow.check(text);
+        //println!("Text is {}", text);
         if lemmeknow_result.is_identified {
+            println!("lemmeknow_result: {:?}", lemmeknow_result.is_identified);
             let mut check_res = CheckResult::new(&lemmeknow);
             let human_result = human_checker::human_checker(&lemmeknow_result);
             check_res.is_identified = human_result;

src/checkers/english.rs

Lines changed: 0 additions & 10 deletions
@@ -139,16 +139,6 @@ mod tests {
         assert!(checker.check("Prei?nterview He!llo Dog?").is_identified);
     }
 
-    #[test]
-    fn test_checker_fails_doesnt_hit_40_percent() {
-        let checker = Checker::<EnglishChecker>::new();
-        assert!(
-            checker
-                .check("Hello Dog nnnnnnnnnnn llllllll ppppppppp gggggggg")
-                .is_identified
-        );
-    }
-
     #[test]
     fn test_check_fail_single_puncuation_char() {
         let checker = Checker::<EnglishChecker>::new();

src/checkers/lemmeknow_checker.rs

Lines changed: 53 additions & 0 deletions
@@ -58,3 +58,56 @@ impl Check for Checker<LemmeKnow> {
 fn format_data_result(input: &Data) -> String {
     input.name.to_string()
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::checkers::checker_type::{Check, Checker};
+    use gibberish_or_not::Sensitivity;
+
+    #[test]
+    fn test_url_exact_match() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(checker.check("https://google.com").is_identified);
+    }
+
+    #[test]
+    fn test_url_with_extra_text_fails() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(
+            !checker
+                .check("https://google.com and some text")
+                .is_identified
+        );
+    }
+
+    #[test]
+    fn test_ip_exact_match() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(checker.check("192.168.1.1").is_identified);
+    }
+
+    #[test]
+    fn test_ip_with_extra_text_fails() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(!checker.check("IP is 192.168.1.1").is_identified);
+    }
+
+    #[test]
+    fn test_s3_path() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(checker.check("s3://bucket/path/key").is_identified);
+    }
+
+    // Lemmeknow can only match if its an EXACT match
+    // So this should fail
+    #[test]
+    fn test_bitcoin_with_extra_text_fails() {
+        let checker = Checker::<LemmeKnow>::new().with_sensitivity(Sensitivity::Low);
+        assert!(
+            !checker
+                .check("BTC address: 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2")
+                .is_identified
+        );
+    }
+}
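
The tests above rely on the distinction between matching the whole input and finding a match somewhere inside it. The sketch below illustrates that distinction using only the standard library; it is not how LemmeKnow works internally. `is_exact_ipv4` and `contains_ipv4` are hypothetical helpers, and `Ipv4Addr` parsing stands in for the checker's real pattern matching.

```rust
use std::net::Ipv4Addr;

/// Exact match: the entire input must be an IPv4 address.
/// `str::parse::<Ipv4Addr>()` fails if any extra text surrounds the address.
fn is_exact_ipv4(text: &str) -> bool {
    text.parse::<Ipv4Addr>().is_ok()
}

/// Substring-style match, shown for contrast: succeeds if any
/// whitespace-separated token is an IPv4 address.
fn contains_ipv4(text: &str) -> bool {
    text.split_whitespace()
        .any(|token| token.parse::<Ipv4Addr>().is_ok())
}
```

With these helpers, "192.168.1.1" passes both checks, while "IP is 192.168.1.1" passes only `contains_ipv4`, mirroring why `test_ip_with_extra_text_fails` expects the exact-match checker to report no identification.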
