AI-based Automatic Resume Analysis
How we use AI to detect and extract information from CVs (Resumes) to help the HR industry in its recruitment process.
AI technology in the HR industry
A current trend in the HR industry is the application of Artificial Intelligence (AI) to the challenges employers face, in particular to speed up the search for the most suitable candidates. AI can be used to approach, evaluate and recommend candidates automatically, which saves time and helps reduce errors and personal bias. In some ways, AI technology in HR works like the analytical mind of a human, but at a scale and speed that humans can hardly achieve.
Technology has completely changed the recruiting landscape. Nowadays, you can easily access candidate information through job websites. There, the volume of job applicants often reaches thousands of people with completely different levels of ability, experience, and desired jobs.
Why do HR managers need AI?
In practice, HR managers cannot manually filter and evaluate such a large amount of information. In addition, when candidates are evaluated by humans, absolute objectivity cannot be guaranteed, and the process costs businesses a lot of time and resources. This is why the HR industry needs the support of AI technology.
As part of our projects, we have developed and deployed AI-powered recruiters for the employment service. One of them is a Machine Learning-based automatic resume analysis solution for job seekers.
This AI solution is an application that takes a resume as input, either as a document (PDF, Word) or as an image (PNG or JPEG), in any template, and converts it into a structured data format such as XML or JSON.
The information that is extracted by a resume parser usually includes the following:
- Personal Information: including name, address, phone number and email.
- Education: including a list of programs attended, each with its start date, end date, location, degree, and educational institution or university.
- Skills: including professional skills and languages.
- Experience: including an experience list containing start date, end date, location, job title, company, and job description.
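As an illustration, the parsed output for one resume could look like the Python dictionary below, serialized to JSON. The key names and sample values are assumptions made for this sketch; they are not the schema of the deployed solution.

```python
import json

# Hypothetical parsed resume: every key and value here is illustrative only.
parsed_resume = {
    "personal_information": {
        "name": "Jane Doe",
        "address": "1 Example Street, Paris, France",
        "phone": "+33 6 12 34 56 78",
        "email": "jane.doe@example.com",
    },
    "education": [
        {
            "start_date": "2015-09",
            "end_date": "2017-06",
            "location": "Paris",
            "degree": "MSc in Computer Science",
            "institution": "Example University",
        }
    ],
    "skills": {
        "professional": ["Python", "SQL"],
        "languages": ["French", "English"],
    },
    "experience": [
        {
            "start_date": "2017-09",
            "end_date": "2021-12",
            "location": "Paris",
            "job_title": "Data Analyst",
            "company": "Example Corp",
            "description": "Built reporting dashboards.",
        }
    ],
}

# ensure_ascii=False keeps accented characters readable in the JSON output.
resume_json = json.dumps(parsed_resume, indent=2, ensure_ascii=False)
print(resume_json)
```

Keeping education and experience as lists of records lets one resume hold any number of entries without changing the structure.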
AI Strategies for Resume Parsing
A resume, whether it is a Word file or an image, is unstructured data. Moreover, across a large number of resumes, job seekers use many different templates. Therefore, to extract information such as Skills, Education, Experience or Personal Information, we chain several AI models into a sequential, end-to-end pipeline. Our strategy combines Computer Vision and Natural Language Processing algorithms.
To visualize our strategy, Figure 3 shows an example of a resume format. The pipeline is as follows:
- First, we convert the document to an image format (PNG or JPEG).
- Next, we use a computer vision object detection algorithm to detect and segment the zones corresponding to personal information, education, skills and experience. The output of this step is the coordinates of a rectangular box surrounding each zone. For example, in Figure 4, the personal information, education, skills and experience zones are detected and segmented in the red, blue, purple and green boxes, respectively.
- After segmenting the boxes containing the information of each category, we crop the image to each box and then use OCR to extract the text from each cropped image.
- Finally, we use regular expressions (regex) and Named Entity Recognition (NER) to identify the required fields and store them in a Python dictionary, which can be saved as a JSON file. We validate most of the extracted information: for example, age should be an integer between 18 and 65, telephone number and email should be in a valid format, and the current address should be in France or the EU.
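The regex and validation part of the last step can be sketched as follows. This is a minimal illustration, not the production code: the patterns, the function name `extract_personal_info` and the sample text are all assumptions, the phone pattern only covers a simplified French format, and the NER part (for names, companies and so on) is not shown.

```python
import json
import re

# Simplified, hypothetical patterns; real resumes need more robust rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# French-style number: "+33" or a leading "0", then 9 digits in optional pairs.
PHONE_RE = re.compile(r"(?:\+33[\s.-]?|0)[1-9](?:[\s.-]?\d{2}){4}")
# Age written as "29 years old" (English) or "29 ans" (French).
AGE_RE = re.compile(r"\b(\d{1,3})\s*(?:years old|ans)\b", re.IGNORECASE)


def extract_personal_info(text: str) -> dict:
    """Extract email, phone and age from OCR text, keeping only valid values."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    age = AGE_RE.search(text)
    info = {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "age": int(age.group(1)) if age else None,
    }
    # Validation rule from the article: age must be an integer in [18, 65].
    if info["age"] is not None and not 18 <= info["age"] <= 65:
        info["age"] = None
    return info


sample_text = (
    "Jane Doe, 29 years old\n"
    "Email: jane.doe@example.com\n"
    "Tel: +33 6 12 34 56 78"
)
info = extract_personal_info(sample_text)
print(json.dumps(info, indent=2))
```

Fields that fail validation are set to `None` rather than dropped, so downstream code can tell a missing value from a missing key.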
Results
In one of our projects, we built an object detection model that achieved an AP50 score of 92% on the test set. With this AI solution, we hope to help companies save up to 70% of the time spent on time-consuming tasks while still extracting information with high accuracy, which is useful in the recruitment process. Our solution helps recruiters by extracting information and making resumes easier to read. Recruiters can then focus on validating the relevance of suggested resumes and making decisions.
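For readers unfamiliar with the metric: AP50 is the average precision computed when a predicted box counts as correct only if its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below shows that IoU check only, not the full AP computation. It assumes boxes given as (x_min, y_min, x_max, y_max), and the function name and sample coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and annotated boxes for one "education" zone.
predicted = (10, 10, 110, 60)
ground_truth = (20, 10, 120, 60)
score = iou(predicted, ground_truth)
print(f"IoU = {score:.3f}, counts as a match at the 0.5 threshold: {score >= 0.5}")
```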
Conclusion
In this article, we introduced a resume parsing strategy that was implemented for our customers. Looking at the pipeline shown in Figure 4, we can see that the detection and segmentation of zones plays an important role and greatly influences the accuracy of the information extraction.
As an application of this machine learning solution, we can build a recommendation system for recruiting companies by matching the experience and skills required by recruiters with those possessed by job seekers.
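One very simple way to picture such matching is to score each candidate by the share of required skills found in the parsed resume. This is only an illustrative baseline under that assumption, not the recommendation method used in our projects; the function name and the sample data are hypothetical.

```python
def skill_match_score(required, candidate):
    """Fraction of required skills present in the candidate's skills."""
    # Normalize case and whitespace so "python" and "Python " compare equal.
    required_set = {skill.strip().lower() for skill in required}
    candidate_set = {skill.strip().lower() for skill in candidate}
    if not required_set:
        return 0.0
    return len(required_set & candidate_set) / len(required_set)


# Hypothetical job requirements and parsed candidate skills.
job_skills = ["Python", "SQL", "Docker"]
candidates = {
    "candidate_a": ["python", "sql", "Excel"],
    "candidate_b": ["Java", "Docker"],
    "candidate_c": ["Python", "SQL", "Docker", "Git"],
}

# Rank candidates from best to worst coverage of the required skills.
ranking = sorted(
    candidates,
    key=lambda name: skill_match_score(job_skills, candidates[name]),
    reverse=True,
)
print(ranking)
```

Exact string matching misses synonyms and related skills, so a real system would likely need normalization or semantic similarity on top of this.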
References
Tesseract — https://github.com/tesseract-ocr/tesseract
Faster R-CNN — https://arxiv.org/abs/1506.01497
Acknowledgement
Thanks to our colleagues Achille MURANGIRA, Caroline DUPRE and Nhut DOAN NGUYEN for the article review.
About the Author
Van-Tuan DANG is a lead data scientist at La Javaness. He joined the company in 2020 and is a key player in various R&D projects on NLP and Computer Vision.