RESEARCH | Publications

For the complete list, please visit my research profiles on the following websites:



RESEARCH | Grants

I have been PI/co-I on the following projects:

  • Delivering Trustworthy Electoral Oversight: Developing an Automated Analysis of Electoral Spending Disclosures in the UK (Apr 2023 - Jul 2024, EPSRC), see the project web page
  • EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2021 - Apr 2025, EPSRC), "Predicting the Spread and Damage of Hate Speech for Effective Prevention and Intervention of Cyberhate". Collaborator: Rotherham United Football Club
  • Data Science powered healthcare supply chain network monitoring system in the post-COVID and post-Brexit industrial landscape (May 2021 - Apr 2022, Innovate UK). Partner: Vamstar Ltd
  • AI-powered real-time healthcare supplier profile and COVID-19 supply risk matrix (Feb 2021 - Jan 2022, Innovate UK). Partner: Vamstar Ltd. See our featured article in the media
  • DoubleTapp: Crowdsourcing the Long Tail of Nano-influencers (Nov 2020 - Oct 2021, Innovate UK), partner: DoubleTapp Ltd.
  • Towards a Big-Data Driven Approach to Tackling Urban Waterlogging - A Scoping Study (Jan 2020 - Apr 2021, GCRF Networking Grants)
  • EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2018 - Apr 2022, EPSRC), "Mining health information on the Social Web – towards an understanding of the influence of social media on public healthcare". Collaborator: Diabetes.co.uk
  • Early Detection of Cyber Hate on Social Media for Crime Prevention (PI, Jun - Aug 2017, Nuffield Foundation, UK)
  • KTP Web Mining for JustGiving Ltd (Oct 2014 - Jan 2015, JustGiving Ltd.)

RESEARCH | Awards


RESEARCH | Professional Services


RESEARCH | PhD

My PhD research focused on exploiting background knowledge from various resources to support supervised Named Entity Recognition - a fundamental task of Information Extraction that extracts named entities from unstructured texts. For details, see Named entity recognition: challenges in document annotation, gazetteer construction and disambiguation.

I am open to supervising students interested in any of the research topics listed at the top of this web page. I have supervised the following students to successful completion:

  • 2018-22 Zhixue (Cass) Zhao, 'Using Pre-trained Language Models for Toxic Comment Classification'
  • 2018-22 Jenny Hayes, 'The use of social media for sousveillance'


TOOLS AND DATA

I maintain a GitHub page for sharing the datasets I have used in my research. These cover terminology extraction, ontology mapping, entity linking, scholarly data linking, Tweet classification, and procedural knowledge extraction.

I am also the creator of, and a contributor to, a number of open-source research software projects, listed below.


JATE - Java Automatic Term Extraction library

JATE is the most extensively used library for state-of-the-art automatic term extraction (ATE). It can be used for benchmarking ATE algorithms, developing glossaries and supporting a wide range of Natural Language Processing tasks such as ontology engineering and machine translation. It also provides a generic development and evaluation framework for developing new term extraction algorithms.

The most recent, stable version is JATE 2.0, released under the LGPL license on GitHub.


Credits:


Cite:

  • Zhang, Z., Gao, J., Ciravegna, F. (2016). JATE 2.0: Java Automatic Term Extraction with Apache Solr. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Semantic Table Interpretation (STI)

The project implements state-of-the-art semantic table interpretation algorithms, which take relational tables as input and create three types of semantic annotations on each table: a class for each table column, a named entity for each table cell, and relations between columns. It is currently hosted on GitHub and has been adapted to support a number of research projects such as Odalic.


Cite:

  • Zhang, Z. (2017). Effective and Efficient Semantic Table Interpretation using TableMiner+. Semantic Web Journal. 8 (6) (in print)

ScholarlyData Link Discovery

The project implements state-of-the-art machine learning based link discovery/instance matching algorithms for linked data. It contains five well-known algorithms, which are tested on a record deduplication task for the ScholarlyData.org project. The code is currently hosted on GitHub.

Cite:

  • Zhang, Z., Nuzzolese, A., Gentile, A. (2017). Entity deduplication on ScholarlyData. In Proceedings of the Extended Semantic Web Conference, pp. 85-100.