Petru Rebeja
It's been more than a year since I last wrote anything on this blog. Since today is the last day of the year, I'll pretend I haven't skipped two years since my last year-in-review post and start writing the review for 2021.
Although I changed jobs this year, I consider 2021 a year of great accomplishments in my academic career.
The Deep Learning for Old Romanian project started in October 2020, but its first months were spent on the bureaucratic and legal procedures required to start it, so I consider it to have begun in January 2021.
Since the project started, I have developed, and become responsible for, several of its areas: (i) the pipelines for importing corpus entries and exporting training data, (ii) the project website, (iii) the application for annotating corpus entries (initially created by my colleague Cristian Pădurariu and later handed over to me), and more.
Through this project I officially became a collaborator (albeit a project-based one) of the Iași Branch of the Romanian Academy.
Romanian Language Processing Laboratory
Developed entirely by my colleague Cristian Pădurariu, the Romanian Language Processing Laboratory is a platform that aims to help (young) researchers in Romanian Natural Language Processing and to host Romanian language resources.
In this project I'm responsible for the Ops part of DevOps: hosting the application on Docker and continuous deployment via
LiRo Benchmark and its associated paper published in NeurIPS 2021
This is by far the achievement I am most proud of this year, and one through which I grew both as a software developer and as a researcher.
Together with Ștefan Dumitrescu, Beata Lorincz, Mihaela Găman, Andrei Avram, Mihai Ilie, Andrei Pruteanu, Adriana Stan, Lorena Roșia, Cristina Iacobescu, Luciana Morogan, George Dima, Gabriel Marchidan, Traian Rebedea, Mădălina Chitez, Radu Tudor Ionescu, Răzvan Pașcanu, and Viorica Pătrăucean, we created the LiRo benchmark: a benchmark for Romanian language tasks.
While developing the benchmark platform, we performed several experiments and created the RO-STS (contributed by me), XQuAD-ro (created using the services of professional translators), and Wiki-ro (by Ștefan Dumitrescu) datasets.
The results, the benchmark platform, and the datasets are presented in the paper published at NeurIPS 2021, which we co-authored with Sebastian Ruder and Dani Yogatama.
Missed the deadline on ParlaMint-RO
I'm not proud of missing the deadline for the ParlaMint call for corpora, but I'm still including this project in the review because failing is part of learning.
ParlaMint-RO is not yet part of the ParlaMint corpus, but while working on it I improved my Python development skills, gained a better understanding of TEI encoding, and picked up a few tricks for working with RELAX NG.
Romanian AI Days 2021
Towards the end of the year, I had the opportunity to help organize Romanian AI Days 2021 and to take part in a round-table session on AI/ML PhD programs in Romanian universities.
ConsILR 2021 and volume editor
Last but not least, in mid-December I had yet another opportunity: to be part of the organizing committee of ConsILR 2021, where we also submitted a paper that was accepted.
Looking forward to 2022, its exciting new opportunities, and continued work on the projects already under way, I wish everyone a Happy New Year!