Add PyTorch-based name generation model with dataset and training script #60
Reviewer's Guide by Sourcery
This PR introduces a character-level name generation system using PyTorch. The implementation consists of a custom dataset class for processing name data and an LSTM-based neural network model for generating new names. The system includes comprehensive training and generation utilities with configurable parameters.
Sequence diagram for name generation process
sequenceDiagram
participant User
participant NameDataset
participant NameGenerator
participant Model
User->>NameDataset: Initialize with names
User->>NameGenerator: Initialize with vocab size
User->>Model: Train model
loop Generate names
User->>Model: Generate name
Model->>NameDataset: Get context
Model->>NameGenerator: Predict next character
NameGenerator-->>Model: Return prediction
alt End token or max length
Model-->>User: Return generated name
end
end
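The generation loop in the sequence diagram can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's code: `generate_name`, `predict_next`, `toy_predict`, the `<S>`/`<E>` token names, and the `max_length` default are all assumptions introduced here. `predict_next` stands in for the trained model and returns a probability for each vocabulary index.

```python
import random

def generate_name(predict_next, stoi, itos, context_size=3, max_length=20,
                  start_token="<S>", end_token="<E>", rng=None):
    """Sample characters one at a time until the end token or max_length."""
    rng = rng or random.Random()
    # Start from a context made entirely of start tokens (assumed convention).
    context = [stoi[start_token]] * context_size
    out = []
    while len(out) < max_length:
        probs = predict_next(context)  # one probability per vocabulary index
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if itos[idx] == end_token:
            break
        out.append(itos[idx])
        # Slide the context window forward by one character.
        context = context[1:] + [idx]
    return "".join(out)

# Hypothetical toy vocabulary and predictor, used only to exercise the loop.
chars = ["<S>", "<E>", "a", "d"]
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

def toy_predict(context):
    """Deterministic stand-in for the model: spells 'ada', then ends."""
    last, prev = context[-1], context[-2]
    if last == stoi["<S>"]:
        nxt = "a"
    elif last == stoi["a"] and prev == stoi["<S>"]:
        nxt = "d"
    elif last == stoi["d"]:
        nxt = "a"
    else:
        nxt = "<E>"
    return [1.0 if c == nxt else 0.0 for c in chars]

print(generate_name(toy_predict, stoi, itos))  # ada
```

With a real model, `predict_next` would typically wrap a forward pass followed by a softmax over the logits.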
Class diagram for NameDataset and NameGenerator
classDiagram
class NameDataset {
-List~str~ names
-int context_size
-List~str~ chars
-List~str~ special_tokens
-Dict~str, int~ stoi
-Dict~int, str~ itos
-torch.Tensor X
-torch.Tensor y
+__init__(List~str~ names, int context_size=3)
+_build_dataset() Tuple~torch.Tensor, torch.Tensor~
+__len__() int
+__getitem__(int idx) Tuple~torch.Tensor, torch.Tensor~
}
class NameGenerator {
-nn.Embedding embedding
-nn.LSTM lstm
-nn.Linear fc
+__init__(int vocab_size, int embedding_dim=24, int hidden_dim=128)
+forward(torch.Tensor x) torch.Tensor
}
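For orientation, a model matching the `NameGenerator` box in the class diagram might look like the sketch below. The layer names and constructor defaults come from the diagram. The use of `batch_first=True` and the choice to predict from the last time step only are assumptions, since the PR's actual `forward` body is not shown here.

```python
import torch
import torch.nn as nn

class NameGenerator(nn.Module):
    """Embedding -> LSTM -> Linear, as listed in the class diagram."""

    def __init__(self, vocab_size: int, embedding_dim: int = 24, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # batch_first=True is an assumption; inputs are (batch, context_size).
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(x)        # (batch, context_size, embedding_dim)
        out, _ = self.lstm(emb)        # (batch, context_size, hidden_dim)
        # Assumed design: use the final time step to score the next character.
        return self.fc(out[:, -1, :])  # (batch, vocab_size)

# Usage: a batch of 4 contexts of length 3 over a hypothetical 28-symbol vocabulary.
model = NameGenerator(vocab_size=28)
logits = model(torch.randint(0, 28, (4, 3)))
```

The returned logits can be passed directly to `nn.CrossEntropyLoss` during training, or through a softmax when sampling.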
File-Level Changes
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:
Overall Comments:
- The NameDataset class is missing a call to _build_dataset() in __init__ to initialize self.X and self.y. This will cause runtime errors when __len__ or __getitem__ is called.
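A minimal sketch of the suggested fix, assuming the attribute names from the class diagram. The `<S>`/`<E>` token names, the vocabulary ordering, and the body of `_build_dataset` are illustrative assumptions rather than the PR's actual code; the relevant line is the `self.X, self.y = self._build_dataset()` call in `__init__`.

```python
from typing import List, Tuple

import torch
from torch.utils.data import Dataset

class NameDataset(Dataset):
    def __init__(self, names: List[str], context_size: int = 3):
        self.names = names
        self.context_size = context_size
        self.chars = sorted(set("".join(names)))
        self.special_tokens = ["<S>", "<E>"]  # assumed token names
        vocab = self.special_tokens + self.chars
        self.stoi = {ch: i for i, ch in enumerate(vocab)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        # The call the review reports as missing: without it, self.X and
        # self.y never exist and __len__/__getitem__ raise AttributeError.
        self.X, self.y = self._build_dataset()

    def _build_dataset(self) -> Tuple[torch.Tensor, torch.Tensor]:
        X, y = [], []
        for name in self.names:
            # Each name starts from a context of start tokens.
            context = [self.stoi["<S>"]] * self.context_size
            for ch in list(name) + ["<E>"]:
                target = self.stoi[ch]
                X.append(context)
                y.append(target)
                context = context[1:] + [target]  # slide the window
        return (torch.tensor(X, dtype=torch.long),
                torch.tensor(y, dtype=torch.long))

    def __len__(self) -> int:
        return len(self.y)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.X[idx], self.y[idx]

# Usage: "ab" yields three (context, target) pairs: a, b, and the end token.
ds = NameDataset(["ab"])
```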
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Summary by Sourcery
New Features: