AI-based Automatic Resume Analysis

AI technology in the HR industry

A current trend in the HR industry is the application of Artificial Intelligence (AI) to the challenges employers face, in particular to speed up the search for the most suitable candidates. AI can be used to approach, evaluate and recommend candidates automatically, which saves time and helps reduce errors and personal bias. In some ways, AI in HR works like the analytical mind of a human, but at a scale and speed that humans can hardly achieve.

Technology has completely changed the recruiting landscape. Nowadays, you can easily access candidate information through job websites. There, the volume of job applicants often reaches thousands of people with completely different levels of ability, experience, and desired jobs.

Why do HR managers need AI?

In practice, HR managers cannot filter and evaluate such a large amount of information by hand. Moreover, when candidates are evaluated by humans, absolute objectivity cannot be guaranteed, and the process costs businesses a lot of time and resources. This is why the HR industry needs the support of AI technology.

Figure 1: An HR manager having a hard time sifting through a pile of resumes.

As part of our projects, we have developed and deployed AI-powered recruiters for the employment service. One of them is a Machine Learning-based automatic resume analysis solution for job seekers.

This AI solution is an application that takes a resume as input in any of several formats, document (PDF, Word), image (PNG or JPEG) or template, and converts it into a structured data format such as XML or JSON.

The information that is extracted by a resume parser usually includes the following:

  • Personal Information: name, address, phone number and email.
  • Education: a list of education entries, each containing the start date, end date, location, degree, and educational institution or university.
  • Skills: professional skills and languages.
  • Experience: a list of experience entries, each containing the start date, end date, location, job title, company, and job description.
Figure 2: Process of Information extraction from the resumes
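To make the target structure concrete, here is a minimal sketch of how the four categories above could be represented before being written to JSON. The class and field names (ParsedResume, PersonalInfo, start_date and so on) are assumptions chosen for illustration, not the schema used in the actual solution.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Period:
    """Fields shared by education and experience entries (hypothetical names)."""
    start_date: Optional[str] = None  # kept as text, e.g. "2018-09"; the format is an assumption
    end_date: Optional[str] = None
    location: Optional[str] = None


@dataclass
class Education(Period):
    degree: Optional[str] = None
    institution: Optional[str] = None


@dataclass
class Experience(Period):
    job_title: Optional[str] = None
    company: Optional[str] = None
    description: Optional[str] = None


@dataclass
class PersonalInfo:
    name: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None


@dataclass
class ParsedResume:
    personal: PersonalInfo = field(default_factory=PersonalInfo)
    education: List[Education] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    experience: List[Experience] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses and lists into plain dicts and lists.
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    resume = ParsedResume(
        personal=PersonalInfo(name="Jane Doe", email="jane.doe@example.com"),
        skills=["Python", "SQL"],
    )
    print(resume.to_json())
```

Fields that the parser cannot find stay as None (null in JSON), so downstream code can tell a missing value from an empty one.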

AI Strategies for Resume Parsing

A resume, whether a Word file or an image, is unstructured data. Moreover, job seekers use many different resume templates, so layouts vary widely across a large set of resumes. Therefore, to extract information such as Skills, Education, Experience or Personal Information, we combine several AI models that run sequentially, end to end. Our strategy combines Computer Vision and Natural Language Processing algorithms.

Figure 3: Examples of resume format

To visualize our strategy, Figure 3 shows an example of a resume format. The pipeline is as follows:

  • First, we convert the document to an image format (PNG or JPEG).
  • Next, we use a computer vision object detection algorithm to detect and segment the zones corresponding to personal information, education, skills and experience. The output of this step is the coordinates of a rectangular bounding box around each zone. For example, in Figure 4, the zones for personal information, education, skills and experience are detected and segmented in the red, blue, purple and green boxes, respectively.
Figure 4: End-to-end pipeline of the resume parsing process
  • After segmenting the boxes containing the information of each category, we crop the image to each box and then use OCR to extract the text from each cropped image.
  • Finally, we use regular expressions (regex) and Named Entity Recognition (NER) to recognize the required structured information and represent it as a Python dictionary, which can be stored in a JSON file. We validate most of the extracted information: for example, age should be an integer between 18 and 65, telephone number and email should be in a valid format, and the current address should be in France or the EU.
Example of the analysis output
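The regex and validation part of the last step can be sketched as follows. This is an illustration under stated assumptions, not the production code: the patterns are deliberately simple, the phone pattern only covers French-style numbers, the age pattern assumes the age is written as "N years old" or "N ans", and the function name extract_personal_info is hypothetical. The NER part is left out because it needs a trained model.

```python
import re
from typing import Any, Dict, Optional, Pattern

# Simplified patterns, chosen for illustration only (not full email or phone standards).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+33|0)\s?[1-9](?:[\s.-]?\d{2}){4}")  # French-style numbers only
AGE_RE = re.compile(r"\b(\d{1,3})\s*(?:years old|ans)\b", re.IGNORECASE)


def first_match(pattern: Pattern[str], text: str, group: int = 0) -> Optional[str]:
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(group) if match else None


def extract_personal_info(ocr_text: str) -> Dict[str, Any]:
    """Extract email, phone and age from the OCR text of the personal information zone."""
    age_text = first_match(AGE_RE, ocr_text, group=1)
    age = int(age_text) if age_text is not None else None
    # Validation rule from the article: age must be an integer between 18 and 65.
    if age is not None and not 18 <= age <= 65:
        age = None
    return {
        "email": first_match(EMAIL_RE, ocr_text),
        "phone": first_match(PHONE_RE, ocr_text),
        "age": age,
    }


if __name__ == "__main__":
    sample = "Jane Doe\n28 years old\nEmail: jane.doe@example.com\nTel: +33 6 12 34 56 78\n"
    print(extract_personal_info(sample))
```

A value that fails validation is returned as None rather than raising an error, so one bad field does not block the rest of the resume from being stored.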

Results

In one of our projects, we built an object detection model that achieved an AP50 score of 92% on the test set. With this AI solution, we hope to help companies save up to 70% of the time spent on time-consuming tasks while still extracting information with high accuracy, which is useful in the recruitment process. Our solution helps recruiters by extracting information and making resumes easier to read. Recruiters can then focus on validating the relevance of suggested resumes and making decisions.
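For readers unfamiliar with the metric, AP50 is usually understood as Average Precision computed with an Intersection over Union (IoU) threshold of 0.5: a predicted zone counts as correct when it overlaps the annotated zone with an IoU of at least 0.5. The sketch below shows only that matching criterion, assuming boxes are given as (x_min, y_min, x_max, y_max); the function names are hypothetical and the full precision-recall computation is not shown.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(predicted: Box, annotated: Box, threshold: float = 0.5) -> bool:
    """A detection is counted as correct at AP50 when its IoU reaches the 0.5 threshold."""
    return iou(predicted, annotated) >= threshold


if __name__ == "__main__":
    annotated_zone = (0.0, 0.0, 10.0, 10.0)
    predicted_zone = (0.0, 0.0, 10.0, 8.0)
    print(iou(predicted_zone, annotated_zone), is_true_positive(predicted_zone, annotated_zone))
```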

Conclusion

In this article, we introduced a resume parsing strategy that was implemented for our customers. Looking at the pipeline shown in Figure 4, we can see that the detection and segmentation of zones play an important role and greatly influence how accurately information is extracted.

As an application of this machine learning solution, we can build a recommendation system for the recruiting company by matching the experience and skills required by recruiters with those of job seekers.
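As a rough illustration of such matching, the sketch below ranks candidates by the share of required skills found in their parsed resumes. It is a simple baseline under stated assumptions, not the recommendation system itself: skills are compared as exact, case-insensitive strings, and the names match_score and rank_candidates are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(skills: Iterable[str]) -> Set[str]:
    """Lowercase and trim skill names so that 'Python ' and 'python' compare equal."""
    return {s.strip().lower() for s in skills if s.strip()}


def match_score(required: Iterable[str], candidate: Iterable[str]) -> float:
    """Fraction of the required skills that the candidate has (0.0 to 1.0)."""
    required_set = normalize(required)
    if not required_set:
        return 0.0
    return len(required_set & normalize(candidate)) / len(required_set)


def rank_candidates(
    required: Iterable[str], candidates: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from best to worst match."""
    required_list = list(required)
    scored = [(name, match_score(required_list, skills)) for name, skills in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    job_skills = ["Python", "SQL", "Docker"]
    parsed = {
        "candidate_a": ["python", "sql"],
        "candidate_b": ["Java"],
        "candidate_c": ["Python ", "SQL", "docker", "Git"],
    }
    for name, score in rank_candidates(job_skills, parsed):
        print(name, round(score, 2))
```

Exact string matching misses synonyms and related skills, so a real system would likely need a skill taxonomy or text embeddings on top of this.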

References

Tesseract — https://github.com/tesseract-ocr/tesseract

Faster R-CNN — https://arxiv.org/abs/1506.01497

Acknowledgement

Thanks to our colleagues Achille MURANGIRA, Caroline DUPRE and Nhut DOAN NGUYEN for the article review.

About the Author

Van-Tuan DANG is a lead data scientist at La Javaness. Since joining the company in 2020, he has been a key player in various R&D projects on NLP and Computer Vision.

La Javaness R&D

We help organizations to succeed in the new paradigm of “AI@scale”, by using machine intelligence responsibly and efficiently : www.lajavaness.com