Statement on the Researcher Tool

From the Gaming Alexandria Moderators.

Greetings all. We felt the need to properly address the situation from our Patreon. To catch everyone up, Hubz (Gaming Alexandria founder) created a research program for magazines using AI tools. Patrons and others expressed discomfort with Patreon funds being used as part of creating the program. Hubz issued a statement on the matter on the 14th of March and substituted the Patreon funds with his own money.

Some members of Gaming Alexandria and the wider preservation community have also voiced concern about the use of AI for the purpose of accessibility and searchability of preserved materials. A few have asked for a statement on the use of AI by Gaming Alexandria. As a collective, we have no single opinion on this, nor should any one person be singled out as part of the decision-making going forward.

Our priority at Gaming Alexandria is – and remains – the collection and sharing of primary sources of game history. Though we support other initiatives, the core mission is ensuring that we provide high quality (when available) physical objects in a digital form for preservation and research purposes. Providing magazines, box art, manuals, and sometimes games as close to their original presentation as possible is vitally important to us – that’s why we try to include RAW scan files as well as other formats.

In this, we don’t intend to change. We are not seeking to add any interpretive layer to our scans that would interrupt the desire to view material in its pristine, digitized form.

Taking what we’ve collected and turning that into useful information is something we also consider. We continually attempt to improve the visibility of the objects our members assemble so that they can be useful to retrogaming hobbyists, historians, and anyone else who can find a use for them. Processes like metadata, OCR, and even the public sharing of materials are not solved – they continually evolve.

Gaming Alexandria is not a corporation – we have no incentive to try to sell you on AI. Many of our members – both staff and our community – have extreme hesitancy about the progress of AI as used by large companies. At the same time, many of us use tools that are now considered part of AI workflows in our work – both for Gaming Alexandria and elsewhere. These include OCR, machine translation, and programmatic tools that were part of our workflows well before the ChatGPT/LLM craze conglomerated everything under an umbrella of “AI”.

Particularly when it comes to OCR, we’ve used and tested various software packages (Adobe, Tesseract, ABBYY) as well as AI models over the years to deal with the variety of languages and formats of our magazines. Optical Character Recognition functions through a data-feeding model, using comparisons from other documents to pick out the characters from the background. Getting acceptable and useful output from this has always been a difficult challenge.

What Hubz found in his most recent testing was that for both English and Japanese, Google Gemini and Anthropic’s Claude produced exceptional results. The purpose behind the tool was to harness these capabilities for researchers, largely for accurate searching through Japanese publications (which many tools that rely on traditional OCR struggle with).

In the same vein, we’ve also tested a number of machine translation models. Many members of the Gaming Alexandria community utilize machine translation as a cornerstone in their research into publications written in languages they don’t natively understand. The majority of historical information plumbed by these researchers in areas like Japan is – and has been – reliant on machine translation.

This reality does not mean the work of professional translators is minimized. In fact, when using machine translation we sometimes discover new information we then hand off to professional translators for high accuracy translation work. There are also many bilingual people in the Gaming Alexandria community who have assisted with translation. Even for most of these members, translation software has been a cornerstone of their work – even before web-based machine translation was widely available.

Highlighting this use is important in communicating the larger purpose of this post: that the methods by which we utilize the data we’ve collected are both the same as and fundamentally different from how we’ve always done it.

The concerns over AI models are legitimate and we don’t seek to minimize them. Many people have raised concerns over how these models collect data and over uses of them that we as Gaming Alexandria moderators, as well as our members, largely do not condone. The approval of these individual technology solutions ultimately comes down to a personal judgment and should never be taken as blanket acceptance. Hubz has decided to separate the Researcher Tool project from Gaming Alexandria to make that point clear.

The tools for interpreting game history have evolved since we started and will continue to do so. We do not promote uncritical use of any computer tool – whether it be an alignment tool in a photo program, a voice transcription service, or a machine translation – for total accuracy. Gaming Alexandria has always been a collaborative effort and every individual member of our community has their own opinion on the utility of tools, plus whether or not they even fall under the banner of “AI”. The most productive conversations you can therefore have are with individuals.

Despite the difficult discussions technology sometimes brings, you can be certain our motivation is to continue doing what we’ve always done – preserving and providing video game history to the best of our abilities.

Signed,

Gaming Alexandria Moderators: Ethan, Hubz, Jonas Rosland, Densy

11 thoughts on “Statement on the Researcher Tool”

  1. What does OCR have to do with “generative AI” large language models and your apparent insistence in using it? If individual people want to use hallucination machines to pretend to know what a text says, that’s their prerogative.

    I don’t think whoever wrote this statement gets what the issue was.

    1. Hubz here. As said above, the project is now separated from GA. This is my personal thing that I enjoy working on and am using to actually have some idea of what the stuff I’ve scanned in says. If anybody else wants to use it they are free to do so. Some people do want to use it, and some don’t want anything to do with it. Either way is cool with me. To me it blows away traditional OCR transcription, hence why I’m using it. If you have viable alternatives I am all ears.

      1. I think I did not make my point clear—machine learning for an optical character recognition solution is not the issue people had with the tool, as far as I know. That is not “generative AI”, which is generally what people are against.

        People were concerned/mad about the automatic translation y’all showed accompanying the OCR, and that money from the Patreon was used for that, ensuring that money from that pool went to Sam Altman’s pocket, but that was resolved without much issue.

        It’s a problem of sourcing: people would see the machine translated/LLM translated text (at this point these have been entwined, though not in their entirety, as machine translation libraries are possible to run without phoning in to a data center or external processing) and assume it’s real and accurate. If the translation is integrated into the tool and presented as part of the “scanned source”, there’s a degree of “we’re vouching for this” that is assumed. It muddies the actual archival nature of your work.

        1. Honestly almost every serious researcher I’ve shown this to has loved it, because the translations let them actually search and navigate the magazines to find stuff to get professionally translated.

          I note all the transcriptions and translations are AI generated and are not 100% perfect. I can and probably will make that disclaimer even bigger. I never vouch for these being perfect at all. I think they’re pretty decent though at giving a pretty good idea of what’s on the page and that will likely only get better with time. My goal has always been to provide information, what people do with it is up to them and I’m not going to gatekeep it. I think most serious people are smart enough to figure out that these are not perfect and if they cite it as fact they could end up looking like big dumbasses down the road.

          1. Except people won’t tend to find stuff to get it professionally translated—they’ll take what is given as “good enough”. Time and again this has been shown in fan translations of games, of manga, of magazines. The chances of a thing getting translated properly after a free, machine translation is available drop drastically.

            Also, please think about the fact that you believe that people would “end up looking like big dumbasses” by trusting what your tool gives them. And then think about what that says about your tool.

      2. People complaining are embarrassing.

        This Anti-AI wave is beyond silly. I’m actually embarrassed for people who are complaining about this amazing accessibility tool.

        Heaven forbid some of us want to play the thousands upon thousands of foreign language exclusive games.

  2. I can’t control other people. I guess your solution is just to never have any of this available because of imperfections here and there, which I fundamentally disagree with. Sure you can say “pay human translators” but you can’t argue in good faith that is reasonable due to the scope. Unless you want to throw millions of bucks my way, then yeah sure we can do that.

  3. In the past, it’s not like anyone was paying people to translate these magazines. People who could were doing the translations for free to support the community. I know because I’m someone who has translated quite a few things for people in the game preservation community. Most fan translation anywhere is unpaid work.

    I was also concerned when I thought the Researcher Tool would be used for official translations, but I now know that it was never intended for that. It was to give the everyday person a way to read magazines in their native language with an acceptable-level translation.

    I think Hubz out of anyone understands the scope of what it would take to get a human to translate all of these magazines. If there is a tool out there to give people access to them, without people needing to pay hundreds of dollars or begging someone they know for a free translation, I think it’s a good thing. I understand the pushback against AI, but I see a lot of very online behavior in some of the outrage about this.

    I appreciate Gaming Alexandria for working to find a balance regarding this issue.
