alvaropmontenegro/html-extractor-app

HTMLExtractor

A web application that extracts the images and text from any URL.

Overview

The goal of this challenge is to build a web application that accepts a URL, displays every image used on that page, and lists the 10 most frequently used words in the page's text. Third-party libraries such as HTML Agility Pack, TidyNet.Tidy, mshtml, and HTMLDocument can already solve this problem. I decided to write my own solution instead, so that the application does not inherit bugs or performance problems from third-party libraries.
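As a rough illustration of the image-extraction half of the task, here is a minimal sketch that uses only a standard library and no third-party parser. It is written in Python for brevity and is not the application's C# code; the names `ImageCollector` and `extract_images` are hypothetical, and the sketch assumes that only `<img src="...">` attributes count as images (CSS backgrounds and `srcset` are ignored).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img src="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page, used to resolve relative paths
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn "/logo.png" or "a.png" into a full URL.
            self.images.append(urljoin(self.base_url, src))


def extract_images(html, base_url):
    """Hypothetical helper: return image URLs in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images


if __name__ == "__main__":
    page = '<p>Hi</p><img src="/logo.png"><img src="https://cdn.example.com/a.jpg">'
    print(extract_images(page, "https://example.com/blog/"))
    # prints ['https://example.com/logo.png', 'https://cdn.example.com/a.jpg']
```

Resolving each `src` against the page URL matters because a browser cannot display a relative path taken from another site.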

Minimal Requirements

  • Visual Studio Community installed with ASP.NET Core MVC features.

Code, Debug, and Run

To inspect and run this application, follow these steps:

  • Download the repository from GitHub using the "Download ZIP" button and unzip it into any folder.
  • OR clone this git repository into a folder of your choice.
  • Once it is downloaded or cloned:
    • Start Visual Studio.
    • In the menu, select File > Open > Project/Solution.
    • Go to your downloaded/cloned folder.
    • Go to altudo-app/AltudoApplication.
    • Select the AltudoApplication.sln file.
    • Feel free to inspect the code.
    • To debug or run the app, click the IIS Express button (green arrow) or press F5.
    • The application will open in your web browser.
    • If the browser shows a security warning, click Advanced > Accept the risk and continue.
    • Enjoy!

Features

  • It can extract all images from any website.
  • It can extract all text from any website and list the 10 most frequently used words.
  • The application was built with ASP.NET Core 3.1 MVC.
  • The front-end was built using Razor Pages.
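The word-count feature can be sketched along the following lines. This is an illustration in Python, not the application's C# implementation; `TextCollector` and `top_words` are hypothetical names. It assumes that a "word" is a run of letters compared case-insensitively (so "don't" splits into two words) and that text inside `<script>` and `<style>` should not be counted.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Gathers text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def top_words(html, limit=10):
    """Hypothetical helper: return up to `limit` (word, count) pairs, most frequent first."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    text = " ".join(collector.chunks).lower()
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return Counter(words).most_common(limit)


if __name__ == "__main__":
    sample = "<p>the cat and the dog</p><script>var the = 1;</script><p>The end</p>"
    for word, count in top_words(sample):
        print(word, count)
```

A real page would usually also need a stop-word filter, otherwise words such as "the" and "and" tend to dominate the top 10.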

Contact me

For any questions, issues, or bugs, feel free to contact me.
