About the project

Respublica is an original research-and-technology project developed in cooperation with the Ignacy Kapica Foundation as part of the KapicAI initiative. It combines history, archival science, natural-language processing and modern methods of information retrieval over large collections of source data.

The aim of Respublica is to make the history of the old Polish-Lithuanian Commonwealth — as recorded in court books, registers and other archival materials — accessible to a contemporary audience. The project is not limited to a passive presentation of sources: its ambition is to build a tool that lets you ask questions, follow research leads and discover specific people, cases, conflicts, localities and social phenomena preserved in old records.

The system processes and organises source material from surviving court books and registers, including documents publicly available on szukajwarchiwach.gov.pl. This material is then transcribed, ordered, indexed and prepared for searching using methods typical of modern systems based on large language models and retrieval-augmented generation (RAG).

Questions asked by users of the Respublica portal are analysed by algorithms that search a very large corpus of editorial units created from real court records: verdicts, writs of summons, bailiffs' reports, descriptions of disputes, property entries, inventories of movable goods, documents concerning debts, pledges, inheritance and other events noted in the old books.
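The retrieval step described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the editorial units are invented examples, the names `UNITS`, `embed`, `cosine` and `retrieve` are hypothetical, and a simple bag-of-words count stands in for the vector representations a real embedding model would produce.

```python
import math
import re
from collections import Counter

# Hypothetical editorial units; the texts are invented for illustration only.
UNITS = {
    "unit-1": "writ of summons concerning an unpaid debt",
    "unit-2": "inventory of movable goods left after a death",
    "unit-3": "verdict in a dispute over a village boundary",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, units, k=2):
    """Return the ids of up to k units most similar to the question."""
    query = embed(question)
    scored = [(cosine(query, embed(text)), uid) for uid, text in units.items()]
    # Highest score first; ties are broken by unit id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uid for score, uid in scored[:k] if score > 0]


if __name__ == "__main__":
    print(retrieve("unpaid debt summons", UNITS))
```

In a retrieval-augmented setup, the units returned by such a step would be passed to a language model together with the question, so that the generated answer can point back to specific records.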

This approach lets us look at history not only through the prism of great political events, but also through the everyday experience of people who lived several centuries ago: their disputes, estates, family ties, obligations, neighbourly conflicts and the workings of the old system of justice.

Computing infrastructure

Carrying out a project of this scale requires access to advanced computing infrastructure. The support of the Cyfronet AGH Academic Computer Centre is of key importance here; it provides the resources of the Helios supercomputer — one of the most important computing infrastructures available to Polish science.

Helios, named not by coincidence after the Greek god of the sun, is a hybrid supercomputing system designed for demanding scientific, analytical and artificial-intelligence workloads. Its architecture includes CPU and GPU resources as well as infrastructure dedicated to working with large datasets. This makes it possible to run computations that would be impractical, too costly or simply infeasible within a reasonable time in a conventional server environment.

In the case of Respublica, Helios's computing power enables the processing of large batches of source material, the preparation of data for indexing, the building of vector representations and the testing of semantic-search mechanisms. Without such infrastructure, developing a system of comparable scale would require incomparably greater organisational and financial effort.

Automation and growth of the knowledge base

Respublica has been designed to be as automated as possible. New transcriptions will gradually be incorporated into the knowledge base and then processed within a repeatable pipeline covering data validation, segmentation of the material, indexing and preparation for searching.
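The stages of such a pipeline can be sketched as follows. This is only an illustration under stated assumptions: the function names (`validate`, `segment`, `build_index`, `run_pipeline`) are hypothetical, units are assumed to be separated by blank lines, and a plain inverted word index stands in for whatever indexing the real system uses.

```python
import re
from collections import defaultdict


def validate(transcription):
    """Validation: reject empty input and trim surrounding whitespace."""
    text = transcription.strip()
    if not text:
        raise ValueError("empty transcription")
    return text


def segment(text):
    """Segmentation: split on blank lines (an assumed convention)."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and runs of spaces within each unit.
    return [" ".join(part.split()) for part in parts if part.strip()]


def build_index(units):
    """Indexing: map each lower-cased word to the units that contain it."""
    index = defaultdict(set)
    for position, unit in enumerate(units):
        for word in re.findall(r"\w+", unit.lower()):
            index[word].add(position)
    return {word: sorted(positions) for word, positions in index.items()}


def run_pipeline(transcription):
    """Run validation, segmentation and indexing in a fixed order."""
    units = segment(validate(transcription))
    return units, build_index(units)


if __name__ == "__main__":
    sample = "Writ of summons\nfor a debt.\n\nInventory of goods.\n"
    units, index = run_pipeline(sample)
    print(units)
    print(index["of"])
```

Because every stage is a pure function of its input, the same sequence can be rerun unchanged whenever a new transcription is added, which is what makes the process repeatable.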

As a result, the project can grow alongside the volume of source material processed so far. Each further book increases the system's capabilities, broadens the range of possible questions and makes it possible to better reconstruct the network of people, places, cases and phenomena present in the old records.

The character of the answers and working with sources

The answers generated by Respublica are of a popular-science and exploratory character. Their task is to help find interesting threads, point to possible directions for further enquiry and make work with large, difficult and dispersed source material easier.

Every answer from the system should be read together with the sources it indicates. Respublica does not replace the critical analysis of a document, but is meant to speed it up, organise it and make it more accessible. For research, genealogy, publishing or any other use that demands a high degree of certainty, answers must be verified against the cited source materials.

This is especially important because both the source data and its transcriptions may be prepared with the involvement of artificial-intelligence models. Such models may make mistakes and produce readings that seem plausible but are insufficiently grounded in the source itself, particularly when working with manuscripts, with damaged, barely legible or archaic material, or with text written in chancery language.

This applies in particular to first names, surnames, place names, dates, sums and detailed relationships between people. Limited trust in the result is therefore not a weakness of the project but part of a sound methodology for working with sources. Respublica shows leads, speeds up enquiry and organises material, while the final interpretation should always rest with an informed user working with the source.

The idea behind the project

Respublica grew out of the conviction that new technologies can genuinely broaden access to history. Old court books contain an enormous number of stories that for centuries remained locked in archives, available mainly to a narrow circle of specialists. The project aims to help rediscover them — in a modern, scalable way that is comprehensible to a contemporary audience.

It is at once a technological experiment, a tool for popularising history and an attempt to build a bridge between classic source work and the possibilities offered by contemporary artificial intelligence.

I wish everyone an enjoyable experience, interesting discoveries and fruitful searches.

Michał Werpachowski