  • EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2021 - Apr 2025, EPSRC), "Predicting the Spread and Damage of Hate Speech for Effective Prevention and Intervention of Cyberhate". Collaborator: Rotherham United Football Club
  • Data Science powered healthcare supply chain network monitoring system in the post-COVID and post-Brexit industrial landscape (May 2021 - Apr 2022, Innovate UK). Partner: Vamstar Ltd
  • AI-powered real-time healthcare supplier profile and COVID-19 supply risk matrix (Feb 2021 - Jan 2022, Innovate UK). Partner: Vamstar Ltd. See our featured article in the media
  • DoubleTapp: Crowdsourcing the Long Tail of Nano-influencers (Nov 2020 - Oct 2021, Innovate UK). Partner: DoubleTapp Ltd
  • Towards a Big-Data Driven Approach to Tackling Urban Waterlogging - A Scoping Study (Jan 2020 - Apr 2021, GCRF Networking Grants)
  • EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2018 - Apr 2022, EPSRC), "Mining health information on the Social Web – towards an understanding of the influence of social media on public healthcare". Collaborator:
  • Early Detection of Cyber Hate on Social Media for Crime Prevention (PI, Jun 2017 - Aug 2017, Nuffield Foundation, UK)
  • KTP Web Mining for JustGiving Ltd (Oct 2014 - Jan 2015, JustGiving Ltd)


My PhD research focused on exploiting background knowledge from various resources to support supervised Named Entity Recognition (NER), a fundamental Information Extraction task that identifies named entities in unstructured text. For details, see "Named entity recognition: challenges in document annotation, gazetteer construction and disambiguation".
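Gazetteers (lists of known names) are one common form of background knowledge for supervised NER: matches against them become extra features for the statistical tagger. The sketch below is a hypothetical illustration, not the method from the work cited above; the gazetteer contents and the BIO-style feature names (`gaz_B_*`, `gaz_I_*`) are assumptions.

```python
from typing import Dict, List, Set

# Hypothetical gazetteers: entity type -> set of lower-cased names.
GAZETTEERS: Dict[str, Set[str]] = {
    "LOC": {"sheffield", "sydney", "rotherham"},
    "ORG": {"university of sydney", "santa fe institute"},
}


def gazetteer_features(
    tokens: List[str], gazetteers: Dict[str, Set[str]], max_len: int = 4
) -> List[Dict[str, bool]]:
    """Return one feature dict per token, flagging longest gazetteer matches.

    The first token of a match gets gaz_B_<TYPE>; following tokens get gaz_I_<TYPE>.
    """
    feats: List[Dict[str, bool]] = [{} for _ in tokens]
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        step = 1  # advance one token if nothing matches here
        # Try the longest window first so "University of Sydney" beats "Sydney".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(lowered[i : i + n])
            types = [etype for etype, names in gazetteers.items() if phrase in names]
            if types:
                for etype in types:
                    feats[i][f"gaz_B_{etype}"] = True
                    for j in range(i + 1, i + n):
                        feats[j][f"gaz_I_{etype}"] = True
                step = n  # skip past the matched span
                break
        i += step
    return feats
```

In a full system these dictionaries would typically be merged with lexical features (word shape, part of speech, context words) before training a sequence tagger such as a CRF.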

  • INF6001 Information Systems Project Management
  • INF6110 Information Systems Modelling
  • INF6320 Information Systems in Organisations

  • Semantic Web, linked data
  • Information extraction, text mining, natural language processing
  • Social media data analytics, predictive analytics
  • Data mining in other disciplines, such as health and bibliometrics

  • Invited talk at the 'UiTM Global Webinar on Data Science' (2021): Data Science Through the Lens of Text Mining
  • Invited talk at the 'CounterBalance Seminar Series' (2020) organised by the Santa Fe Institute.
  • Guest lecture for the Computing and Technology Research Showcase: Big data and how it is relevant to me. (2016)
  • Talk at the NTU School of Science and Technology research seminar: Automatic Knowledge Base Construction Using Text Mining. (2016)
  • Invited talk at Schwa lab, the University of Sydney. Aligning relations on Linked Data (2013)
  • Tutorial at ISWC2013: Web Scale Information Extraction. Gentile, A., Zhang, Z.
  • Tutorial at ECML/PKDD2013: Web Scale Information Extraction. Gentile, A., Zhang, Z.
  • Tutorial at ECML/PKDD2011: Ciravegna, F., Varga, A., Zhang, Z. 2011. Mining Complex Entities from Heterogeneous Information Networks, in the 22nd European Conference on Machine Learning (ECML) and the 15th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD).
  • Tutorial at EKAW2010: Zhang, Z., Cano, E., Elbedweihy, K., Dadzie, A. 2010. Introduction to Knowledge Acquisition from Social Networking Sites, in the conference on Knowledge Engineering and Knowledge Management by the Masses, EKAW2010.


JATE - Java Automatic Term Extraction library

JATE is the most extensively used library for state-of-the-art automatic term extraction (ATE). It can be used for benchmarking ATE algorithms, building glossaries, and supporting a wide range of Natural Language Processing tasks such as ontology engineering and machine translation. It also provides a generic framework for developing and evaluating new term extraction algorithms.
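To illustrate how an ATE algorithm ranks candidate terms, here is a minimal Python sketch of the classic C-value measure. C-value rewards longer, frequent candidates and discounts candidates that mostly occur nested inside longer ones. This is not JATE's Java API: it assumes candidate extraction (e.g. noun-phrase chunking) has already produced a frequency table, and the toy counts in the usage note are invented for illustration.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of words


def _contains(longer: Term, shorter: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i : i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Score multi-word candidates with C-value.

    freq maps each candidate to its corpus frequency. A candidate nested in
    longer candidates has its frequency reduced by the mean frequency of those
    longer candidates. Single-word candidates score 0 because log2(1) == 0.
    """
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if len(b) > len(a) and _contains(b, a)]
        if containers:
            mean_nested = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - mean_nested)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores
```

For example, if "neural network" occurs 10 times and is nested in "deep neural network" (4) and "recurrent neural network" (2), its score is log2(2) * (10 - 3) = 7.0.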

The most recent, stable version is JATE 2.0, released under the LGPL license on GitHub.



  • Zhang, Z., Gao, J., Ciravegna, F. (2016). JATE 2.0: Java Automatic Term Extraction with Apache Solr. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).

Semantic Table Interpretation (STI)

The project implements state-of-the-art semantic table interpretation algorithms, which take relational tables as input and create three types of semantic annotations on each table: a class for each column, named entities for table cells, and relations between columns. It is currently hosted on GitHub and has been adapted to support a number of research projects, such as Odalic.
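The first two annotation types can be illustrated with a simple majority-vote baseline: every cell votes for the types of its candidate entities, the most-voted type becomes the column class, and each cell is then disambiguated to a candidate of that class. This is a hypothetical simplification for illustration, not TableMiner+; the in-memory knowledge base and the `dbr:` identifiers are assumptions, and relation annotation is omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical knowledge base: cell text -> candidate entities with their types.
KB: Dict[str, List[Tuple[str, Set[str]]]] = {
    "Sheffield": [("dbr:Sheffield", {"City", "Place"})],
    "Leeds": [("dbr:Leeds", {"City", "Place"}), ("dbr:Leeds_United_F.C.", {"SoccerClub"})],
    "Rotherham": [("dbr:Rotherham", {"Town", "Place"})],
}


def annotate_column(
    cells: List[str], kb: Dict[str, List[Tuple[str, Set[str]]]]
) -> Tuple[Optional[str], List[Optional[str]]]:
    """Pick a column class by majority vote, then link each cell to an entity of that class."""
    votes: Counter = Counter()
    for cell in cells:
        cell_types: Set[str] = set()
        for _, types in kb.get(cell, []):
            cell_types |= types
        votes.update(cell_types)  # each cell votes at most once per type
    if not votes:
        return None, [None] * len(cells)
    column_class = votes.most_common(1)[0][0]
    # Disambiguate: keep the first candidate whose types include the column class.
    entities = [
        next((e for e, types in kb.get(cell, []) if column_class in types), None)
        for cell in cells
    ]
    return column_class, entities
```

Here the ambiguous cell "Leeds" resolves to the place rather than the football club, because the column as a whole votes for `Place`.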


  • Zhang, Z. (2017). Effective and Efficient Semantic Table Interpretation using TableMiner+. Semantic Web Journal, 8(6) (in press).

ScholarlyData Link Discovery

The project implements state-of-the-art, machine-learning-based link discovery (instance matching) algorithms for linked data. It includes five well-known algorithms, which were tested on a record deduplication task for ScholarlyData. The code is currently hosted on GitHub.
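Most link discovery pipelines share a candidate-generation step: blocking limits which record pairs are compared, and a similarity score is computed for each remaining pair. The sketch below shows only that step, using token blocking and Jaccard similarity with a fixed threshold (an assumed value of 0.6) in place of a learned classifier. It is a hypothetical illustration, not one of the five algorithms in the project.

```python
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens of a record label."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def find_duplicates(
    records: Dict[str, str], threshold: float = 0.6
) -> List[Tuple[str, str, float]]:
    """Token blocking, then Jaccard similarity on the candidate pairs."""
    toks = {rid: tokenize(label) for rid, label in records.items()}
    # Blocking: only records sharing at least one token are ever compared.
    blocks: Dict[str, Set[str]] = defaultdict(set)
    for rid, ts in toks.items():
        for t in ts:
            blocks[t].add(rid)
    candidates: Set[Tuple[str, str]] = set()
    for ids in blocks.values():
        candidates.update(combinations(sorted(ids), 2))
    matches = []
    for a, b in sorted(candidates):
        score = jaccard(toks[a], toks[b])
        if score >= threshold:
            matches.append((a, b, score))
    return matches
```

In a machine-learning approach, scores like these (computed over several fields) would usually become features for a trained classifier rather than being compared against a single hand-set threshold.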



  • Zhang, Z., Nuzzolese, A., Gentile, A. (2017). Entity deduplication on ScholarlyData. In Proceedings of the Extended Semantic Web Conference, pp. 85-100.