RESEARCH | Publications
For the complete list, please visit my research profiles on the following websites:
RESEARCH | Grants
I have been, or currently am, PI on the following projects:
- EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2021 - Apr 2025, EPSRC), "Predicting the Spread and Damage of Hate Speech for Effective Prevention and Intervention of Cyberhate". Collaborator: Rotherham United Football Club
- Data Science powered healthcare supply chain network monitoring system in the post-COVID and post-Brexit industrial landscape (May 2021 - Apr 2022, Innovate UK). Partner: Vamstar Ltd
- AI-powered real-time healthcare supplier profile and COVID-19 supply risk matrix (Feb 2021 - Jan 2022, Innovate UK). Partner: Vamstar Ltd. See our featured article in the media
- DoubleTapp: Crowdsourcing the Long Tail of Nano-influencers (Nov 2020 - Oct 2021, Innovate UK). Partner: DoubleTapp Ltd
- Towards a Big-Data Driven Approach to Tackling Urban Waterlogging - A Scoping Study (Jan 2020 - Apr 2021, GCRF Networking Grants)
- EPSRC Doctoral Training Program – Supervisor Led Project (Oct 2018 - Apr 2022, EPSRC), "Mining health information on the Social Web – towards an understanding of the influence of social media on public healthcare". Collaborator: Diabetes.co.uk
- Early Detection of Cyber Hate on Social Media for Crime Prevention (PI, June - August 2017, Nuffield Foundation, UK)
- KTP Web Mining for Just Giving Ltd (Oct 2014 - Jan 2015, JustGiving Ltd.)
RESEARCH | Awards
- 2018 Best paper nomination at the 15th Extended Semantic Web Conference
- 2017 Best reviewer for the International Semantic Web Conference
- 2017 FootballWhispers.com: winner of the sports technology awards (as core developer of the prediction algorithm)
- 2013 Know@LOD workshop at the 10th Extended Semantic Web Conference: Best paper
- 2013 International Conference on Intelligent Text Processing and Computational Linguistics: Second best paper
- 2010 International Conference on Knowledge Engineering and Knowledge Management: Second best paper
- 2009 International Conference on Software, Services and Semantic Technologies (S3T): Best student paper
RESEARCH | Professional Services
- Guest editor for 'The Use of Machine Learning Approaches in Clinical Research of Diabetes' (Frontiers)
- Guest editor for the Natural Language Processing Research journal
- Track co-chair of ESWC2021
- Co-organiser of the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data, 2020
- Senior Program Committee member and Session Chair at ECAI2020
- Co-organiser of the Linked Data for Information Extraction workshop series, 2013-17
- Guest editor for the Semantic Web Journal, Special Issue on Linked Data for Information Extraction, 2017
- As journal reviewer: IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery from Data, IOS The Semantic Web Journal, Elsevier Information Processing and Management journal
- As conference program committee member: International Conference on Knowledge Engineering and Knowledge Management (EKAW), International Semantic Web Conference (ISWC), Extended Semantic Web Conference (ESWC), The Web Conference (WWW), Workshops on Mining Scientific Publications, Workshops on Making Sense of Microposts, Open Knowledge Extraction challenge
RESEARCH | PhD
My PhD research focused on exploiting background knowledge from various resources to support supervised Named Entity Recognition - a fundamental task of Information Extraction that extracts named entities from unstructured texts. For details, see Named entity recognition: challenges in document annotation, gazetteer construction and disambiguation.
I am open to supervising students interested in any of the research topics listed at the top of this web page. I have supervised the following students to successful completion:
- 2018-22 Zhixue (Cass) Zhao, 'Using Pre-trained Language Models for Toxic Comment Classification'
- 2018-22 Jenny Hayes, 'The use of social media for sousveillance'
TEACHING | Modules
I teach the following modules on taught MSc programs in the school
Other modules I have taught in the past include
- Data Visualisation, Introduction to Programming, Information Systems Project Management, Information Systems Modelling, Information Systems in Organisations
TEACHING | PhD students
PhD candidates: I am interested in supervising PhD students in the following topics (please also read my profile at the top of the page). If you have an idea, please feel free to email me to discuss it. Note that you need strong programming knowledge and skills, and it is desirable that you have knowledge of at least one of the following areas: machine learning, natural language processing, data mining, text mining, statistics.
- Semantic Web, linked data
- Information extraction, text mining, natural language processing
- Social media data analytics, predictive analytics
- Data mining in other disciplines, such as health, and bibliometrics
I have been an examiner for the following PhD students:
- December 2022: Moritz Walter (chemical toxicity prediction), Information School, University of Sheffield
- June 2022: Anastasios Lytos (argumentation mining), Department of Computer Science, University of Sheffield
- Aug 2021: Ruizhe Li (topic modelling and dialogue), Department of Computer Science, University of Sheffield
- Aug 2020: Jun Zhang (smart city technologies), Information School, University of Sheffield
TALKS AND TUTORIALS
Please contact me for detailed slides and/or content.
- Invited talk at the 'UiTM Global Webinar on Data Science' (2021): Data Science Through the Lens of Text Mining
- Invited talk at the 'CounterBalance Seminar Series' (2020) organised by the Santa Fe Institute.
- Guest lecture for the Computing and Technology Research Showcase: Big data and how it is relevant to me. (2016)
- Talk at the NTU School of Science and Technology research seminar: Automatic Knowledge Base Construction Using Text Mining. (2016)
- Invited talk at Schwa lab, the University of Sydney. Aligning relations on Linked Data (2013)
- Tutorial at ISWC2013 Web Scale Information Extraction: Gentile, A., Zhang, Z.
- Tutorial at ECML/PKDD2013 Web Scale Information Extraction: Gentile, A., Zhang, Z.
- Tutorial at ECML/PKDD2011: Ciravegna, F., Varga, A., Zhang, Z. 2011. Mining Complex Entities from Heterogeneous Information Networks, in the 22nd European Conference on Machine Learning (ECML) and the 15th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD).
- Tutorial at EKAW2010: Zhang, Z., Cano, E., Elbedweihy, K., Dadzie, A. 2010. Introduction to Knowledge Acquisition from Social Networking Sites, in the conference on Knowledge Engineering and Knowledge Management by the Masses, EKAW2010.
TOOLS AND DATA
I have a GitHub webpage for sharing datasets I used for research. These cover research in the areas of terminology extraction, ontology mapping, entity linking, scholarly data linking, Tweet classification, and procedural knowledge extraction.
I am also the creator of, and a contributor to, several open source research software projects, listed below.
JATE - Java Automatic Term Extraction library
JATE is the most extensively used library for state-of-the-art automatic term extraction (ATE). It can be used for benchmarking ATE algorithms, developing glossaries and supporting a wide range of Natural Language Processing tasks such as ontology engineering and machine translation. It also provides a generic development and evaluation framework for developing new term extraction algorithms.
The most recent, stable version is JATE 2.0, released under the LGPL license on GitHub.
Credits:
- Abraxas - funded by EPSRC
- SmartProducts - funded under the EC 7th Framework Program (231204)
Cite:
- Zhang, Z., Gao, J., Ciravegna, F. (2016). JATE 2.0: Java Automatic Term Extraction with Apache Solr. Proceedings of the Tenth International Conference on Language Resources and Evaluation
Semantic Table Interpretation (STI)
The project implements state-of-the-art semantic table interpretation algorithms, which take relational tables as input and create three types of semantic annotations on each table: a class for each table column, named entities for table cells, and relations between columns. It is currently hosted on GitHub and has been adapted to support a number of research projects such as Odalic.
Cite:
- Zhang, Z. (2017). Effective and Efficient Semantic Table Interpretation using TableMiner+. Semantic Web Journal. 8 (6) (in print)
ScholarlyData Link Discovery
The project implements state-of-the-art machine learning based link discovery/instance matching algorithms for linked data. It contains five well-known algorithms, which are tested on a record deduplication task for the ScholarlyData.org project. The code is currently hosted on GitHub.
Cite:
- Zhang, Z., Nuzzolese, A., Gentile, A. (2017). Entity deduplication on ScholarlyData. In Proceedings of the Extended Semantic Web Conference, pp. 85-100