This is a Cybersecurity Platform Featuring an AI-Driven Assessment Bot which will go through a company's infrastructure, probing for vulnerabitlites and gaps. Once identified, it would ask dynamic questions to gain further clarifications. The topics include Network Security, Compliance, Data Protection and more. Based on the answers by the company, it would rate every answer and give recommendations to improve if required. At the end of the assessment, an overall assessment report would be generated highlighting the weak points as well as suggest actionable insights.
We needed to heavily rely on understanding the context of what was present in the documentats as these documents would rarely have the exact terminologies for the protocols followed.
- 🧑🏻💻 Dynamic and Context-Aware Questioning using a LLM and Chain of Thought Prompt Engineering.
- 🤔 Contextual Learning with Knowledge Graphs to improve the accuracy of dynamic questioning.
- 📝 Assessment of Infrastructure Documentation to find vulnerabilties to give recommendations.
- 💯 Real-Time Risk Scoring categorizing risks into different categories based on the knowledge base.
- ✅ Detailed Report and Recommendations highlighting major risks and conclusions derived from the question-answering.
- 📈 Topic Modelling for Incident Reports to aid in analysis and visualisation.
- We had to use open-source LLMs or fine-tune small language models (SLMs), due to security and privacy reasons.
- Less computational power was crucial.
- We parsed the input document (containing images, text, and tables) using PyPDF to extract text, then preprocessed it.
- A cybersecurity corpus was created covering 12 domains (network, data protection, cloud security, etc.).
- Words from the corpus found in the document were replaced with
<mask>
, generating a masked input. - Using the SecureBERT model, we probabilistically predicted values for
<mask>
, ranking the top k terms in order of relevance. - Least relevant predictions indicated topics insufficiently covered, prompting clarification.
- Clustering embeddings of these terms highlighted coverage gaps, aiding context-aware questioning using Chain of Thought Prompt Engineering.
- Iterating through dynamic questions based on chat history and context, we collected enough data to produce a detailed report on vulnerabilities, risk scores, and recommendations.
- PyPDF
- SecureBERT
- Phi-3.5-mini-instruct LLM
- FastAPI
- Next.js