graph LR
Configuration_Driver_Setup["Configuration & Driver Setup"]
Web_Navigation_HTML_Fetching["Web Navigation & HTML Fetching"]
Job_Link_Extraction["Job Link Extraction"]
Job_Details_Application_Extraction["Job Details & Application Extraction"]
Job_Filtering_Analysis["Job Filtering & Analysis"]
Results_Presentation["Results Presentation"]
Unclassified["Unclassified"]
Configuration_Driver_Setup -- "Provides WebDriver instance and initial URL." --> Web_Navigation_HTML_Fetching
Configuration_Driver_Setup -- "Supplies required and preferred keywords." --> Job_Filtering_Analysis
Web_Navigation_HTML_Fetching -- "Outputs raw HTML content." --> Job_Link_Extraction
Job_Link_Extraction -- "Provides job portal application links." --> Job_Details_Application_Extraction
Job_Details_Application_Extraction -- "Delivers job descriptions and application links." --> Job_Filtering_Analysis
Job_Filtering_Analysis -- "Forwards filtered and sorted job data." --> Results_Presentation
click Job_Filtering_Analysis href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/webscraping/Job_Filtering_Analysis.md" "Details"
The web scraping system is a sequential data processing pipeline. The Configuration & Driver Setup component initializes the web browser driver and loads the user-defined keywords used for job filtering. Its output feeds the Web Navigation & HTML Fetching component, which opens job portals and retrieves their raw HTML. The Job Link Extraction component parses that HTML to collect links to individual job postings. The Job Details & Application Extraction component then visits each job page and extracts the full description and the direct application URL. The Job Filtering & Analysis component evaluates this data against the predefined keywords, drops unsuitable jobs, and sorts the rest by preference score. Finally, the Results Presentation component shows the remaining jobs to the user, either by printing them to the console or by automatically opening the application links in a web browser.
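As a rough illustration of this data flow, the stages can be chained as plain functions. This is a minimal sketch and not the project's code: `run_pipeline`, the `Job` record, and the stage parameter names are hypothetical. Each stage is passed in as a callable so the flow can be exercised without a browser.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: {"description": ..., "apply_link": ...}
Job = Dict[str, str]


def run_pipeline(
    start_url: str,
    fetch_html: Callable[[str], str],
    extract_links: Callable[[str], Iterable[str]],
    extract_details: Callable[[str], Job],
    filter_jobs: Callable[[List[Job]], List[Job]],
    present: Callable[[List[Job]], object],
) -> List[Job]:
    """Run the stages in the order shown in the diagram."""
    html = fetch_html(start_url)                       # Web Navigation & HTML Fetching
    links = extract_links(html)                        # Job Link Extraction
    jobs = [extract_details(link) for link in links]   # Job Details & Application Extraction
    selected = filter_jobs(jobs)                       # Job Filtering & Analysis
    present(selected)                                  # Results Presentation
    return selected
```

Passing the stages as parameters keeps each one replaceable, which mirrors the one-way edges in the diagram: every stage only sees the output of the stage before it.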
Configuration & Driver Setup
Initializes the Selenium WebDriver and loads scraping parameters, including required and preferred keywords.
Related Classes/Methods:
- main.my_url
- main.currentpage
- main.required_key_words
- main.preferred_key_words
- popupJobs.startJobsSites:8-13
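A setup step of this kind might look like the sketch below. The field names follow the attributes listed above, but `ScraperConfig`, `create_driver`, the example URL, and the keyword values are assumptions made for illustration. The driver part assumes Selenium 4 and a local Chrome installation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScraperConfig:
    """Scraping parameters; field names follow the attributes listed above."""
    my_url: str
    required_key_words: List[str] = field(default_factory=list)
    preferred_key_words: List[str] = field(default_factory=list)


def create_driver(headless: bool = True):
    """Create a Chrome WebDriver (assumes Selenium 4 and a local Chrome install)."""
    # Imported lazily so the configuration part works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


# Hypothetical values, for illustration only.
config = ScraperConfig(
    my_url="https://example.com/jobs",
    required_key_words=["python"],
    preferred_key_words=["selenium", "remote"],
)
```

Keeping the keywords and start URL in one object makes it easy to hand the same values to both the navigation and the filtering stages.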
Web Navigation & HTML Fetching
Manages browser interaction, navigates to job portals, and retrieves the raw HTML content of web pages.
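With Selenium, the fetching step can be as small as the sketch below. `fetch_html` is a hypothetical helper name; `get` and `page_source` are the standard WebDriver members it relies on.

```python
def fetch_html(driver, url: str) -> str:
    """Navigate to `url` and return the HTML source of the loaded page."""
    driver.get(url)            # by default, waits for the page load to complete
    return driver.page_source  # HTML of the current page as a string
```

Portals that render their listings with JavaScript after the initial load may need an explicit wait (for example Selenium's `WebDriverWait`) before `page_source` is read.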
Related Classes/Methods:
Job Link Extraction
Parses the fetched HTML to identify and extract links to individual job postings on the portal.
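One way to sketch this step with only the standard library is shown below. The parser actually used by the project is not stated here, so `html.parser` is a stand-in. `extract_job_links` and the `"/job/"` URL marker are hypothetical, and a real portal would need its own rule for recognizing job links.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_job_links(html: str, base_url: str, marker: str = "/job/") -> List[str]:
    """Return absolute, de-duplicated job links in page order."""
    collector = _LinkCollector()
    collector.feed(html)
    links: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if marker in absolute and absolute not in links:
            links.append(absolute)
    return links
```

Resolving every `href` against the page URL means relative and absolute links end up in the same form, so duplicates are easy to drop.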
Related Classes/Methods:
Job Details & Application Extraction
For each individual job posting link, this component fetches the specific job page, extracts the detailed job description, and identifies the direct application link.
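The extraction half of this step could look like the following sketch, which assumes the job page's HTML has already been fetched. It uses the standard library parser as a stand-in, and the rule "the application link is the first link whose text contains 'apply'" is an assumption for illustration, as are the names `extract_job_details` and `_JobPageParser`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urljoin


class _JobPageParser(HTMLParser):
    """Gathers visible text and the first link whose text mentions 'apply'."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.apply_href: Optional[str] = None
        self._current_href: Optional[str] = None
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a":
            self._current_href = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        self.text_parts.append(text)
        if self._current_href and self.apply_href is None and "apply" in text.lower():
            self.apply_href = self._current_href


def extract_job_details(html: str, page_url: str) -> Dict[str, Optional[str]]:
    """Return the page's text as the description, plus the application link if found."""
    parser = _JobPageParser()
    parser.feed(html)
    apply_link = urljoin(page_url, parser.apply_href) if parser.apply_href else None
    return {"description": " ".join(parser.text_parts), "apply_link": apply_link}
```

Returning `None` for a missing application link lets later stages decide whether to skip the job or fall back to the job page itself.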
Related Classes/Methods:
Job Filtering & Analysis
Analyzes job descriptions against a predefined set of required and preferred keywords, filters jobs based on suitability, and sorts them by preference score.
Related Classes/Methods:
- main.filtrateJobs_KeyWords:49-58
- main.sortDictbyPreferedKeys:61-62
- main.containsRequiredKeys:65-73
- main.preferedKeysCounter:76-81
- main.keywordAlgo:85-88
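The filter-then-rank logic can be sketched as follows. The function names loosely echo the methods listed above, but the bodies are assumptions, not the project's code. In particular, this sketch assumes that a job must contain every required keyword, that the preference score is the number of distinct preferred keywords found, and that matching is a case-insensitive substring test.

```python
from typing import Dict, List


def contains_required_keys(description: str, required: List[str]) -> bool:
    """True if every required keyword occurs in the description (case-insensitive)."""
    text = description.lower()
    return all(key.lower() in text for key in required)


def preferred_keys_counter(description: str, preferred: List[str]) -> int:
    """Number of distinct preferred keywords found in the description."""
    text = description.lower()
    return sum(1 for key in preferred if key.lower() in text)


def filter_and_sort_jobs(
    jobs: Dict[str, str], required: List[str], preferred: List[str]
) -> List[str]:
    """Return application links of suitable jobs, highest preference score first.

    `jobs` maps an application link to its job description.
    """
    scores = {
        link: preferred_keys_counter(description, preferred)
        for link, description in jobs.items()
        if contains_required_keys(description, required)
    }
    # sorted() is stable, so jobs with equal scores keep their input order.
    return sorted(scores, key=scores.get, reverse=True)
```

Substring matching is simple but coarse: a keyword such as "java" would also match "javascript", so a stricter variant might compare whole words instead.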
Results Presentation
Presents the filtered and sorted job opportunities to the user, either by printing to the console or by opening application links in a web browser.
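Both output modes fit in one small function, sketched here with the standard `webbrowser` module. `present_jobs` and its `opener` parameter are hypothetical; the parameter exists only so the browser call can be swapped out, for example in tests.

```python
import webbrowser
from typing import Callable, Iterable, List


def present_jobs(
    links: Iterable[str],
    open_in_browser: bool = False,
    opener: Callable[[str], object] = webbrowser.open,
) -> List[str]:
    """Print each application link, or open it in the browser instead."""
    shown: List[str] = []
    for rank, link in enumerate(links, start=1):
        if open_in_browser:
            opener(link)  # webbrowser.open uses the system's default browser
        else:
            print(f"{rank}. {link}")
        shown.append(link)
    return shown
```

Opening many links at once can flood the browser with tabs, so a cap on the number of opened results may be worth adding in practice.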
Related Classes/Methods:
Unclassified
Catch-all component for unclassified files and utility functions (utility functions, external libraries, and dependencies).
Related Classes/Methods: None