Preserving Digital History in the AI Era: The Challenges and Innovations at Goliad Technologies

The emergence of advanced generative AI technologies has significantly impacted Goliad Technologies’ mission of preserving and archiving human-generated content from the internet, specifically from message boards of the 1990s and early 2000s. Here’s a breakdown of the situation:

  1. Goliad Technologies’ Original Mission: Goliad Technologies is on a mission to preserve human-generated content, focusing on internet message boards and social media.
  2. Challenge from Generative AI: With the advent of Large Language Models (LLMs) like GPT-3 and GPT-4, the perception of the value of old message board data has changed among the original owners of these now-defunct forums. There are two main concerns:
    • Increased Perceived Value: Some owners believe that their data has gained more value in the context of AI development. They see potential future opportunities to sell this data as training material for AI models.
    • Ethical and Privacy Concerns: Others are worried that, despite Goliad Technologies’ Ethical Data Usage Policy, the data they provide might eventually be used for AI training, raising concerns about privacy and the original intent of the data.
  3. Goliad Technologies’ Acknowledgment: The company recognizes the legitimacy of these concerns. They understand that the data could indeed have monetary value, especially considering potential future legal frameworks around data usage for AI training.
  4. Impact on Goliad Technologies’ Goals: Goliad Technologies aims to make this historical internet content freely available to researchers and the general public. However, this openness means the data could be used to train AI systems, which is precisely what concerns the data holders.
  5. Current Status and Future Plans: Goliad Technologies has not yet found a solution to these concerns. However, they are committed to continuing their efforts to innovate and find alternative ways to preserve internet speech and content.

Goliad Technologies: Championing Free Speech and Digital Preservation as 2023 Draws to a Close

As the year 2023 winds down, we at Goliad Technologies reflect on our journey, our accomplishments, and the road ahead. Our core mission has always been to protect and preserve free speech for future generations. In an era where digital information is both ubiquitous and ephemeral, our commitment to safeguarding internet speech has never been more crucial.

Our Mission: Preserving Speech for the Future

We believe that every voice matters and that the digital conversations of today are the historical records of tomorrow. This belief drives us to develop technologies and platforms that ensure the speeches, articles, social media posts, and even the fleeting digital thoughts of today are preserved for future generations. This is not just about saving data; it’s about safeguarding the rich tapestry of human expression.

Gratitude to Our Volunteers

Our journey this year wouldn’t have been possible without the dedication and hard work of our volunteers. These incredible individuals come from various backgrounds, bringing unique skills and perspectives that enrich our work. They are the unsung heroes who work tirelessly behind the scenes, ensuring that our platforms run smoothly and efficiently. To all our volunteers: your contributions have been invaluable, and we cannot thank you enough.

Acknowledging Our Data Providers

A special thanks goes to our data providers. Your trust in us is the foundation of our project. By allowing us access to your data, you’ve played a pivotal role in helping us preserve the digital conversation of our times. We’re committed to handling this data with the utmost respect and integrity, ensuring it serves its purpose as a historical record for future generations.

Celebrating Our Development Teams

Our universal data platform and indexing team deserves a standing ovation. Made up of dedicated developers who contribute in their free time, this team is the driving force behind our soon-to-be-released open-source project. Their innovation and commitment to excellence have brought us to the cusp of releasing a platform that we believe will revolutionize how digital speech is preserved and accessed.

Looking Ahead: The Open-Source Project

As we look to 2024, we’re excited about the upcoming launch of our open-source project. This initiative is more than just a technological achievement; it’s a commitment to transparency, collaboration, and community involvement. We believe in the power of open-source technology to foster innovation and inclusivity, and we can’t wait to see how this project evolves with the contributions of the global tech community.

In Conclusion

As we bid farewell to 2023, we do so with a sense of accomplishment and anticipation. At Goliad Technologies, protecting and preserving free speech is not just our mission; it’s our passion. We’re excited to continue this journey, and we remain committed to ensuring that the digital voices of today are heard for generations to come.

Goliad Technologies: Championing Free Speech and Digital Preservation as 2023 Draws to a Close © 2023 by Goliad Technologies is licensed under Attribution-NonCommercial-ShareAlike 4.0 International 

Goliad Technologies Embraces Ethical Data Usage with New Policy Update

Introduction:

In the digital age, data is not just an asset; it’s a snapshot of human culture, thought, and history. Goliad Technologies, a non-profit organization at the forefront of preserving internet history, has been dedicated to collecting and indexing this precious data. By working closely with internet forum operators, both past and present, we have curated a vast repository of internet conversations, debates, and exchanges that map the evolution of online communities.

The Commitment to Data Preservation:

Goliad Technologies believes in the power of information. Our mission is to safeguard the rich tapestry of internet discourse and make it universally accessible for researchers, scholars, and the curious public. We meticulously convert this treasure trove of data into a universal format that ensures longevity and ease of research, with the ultimate goal of releasing it to the public domain.

Addressing the Community’s Concerns:

However, with the rise of advanced technologies such as large language models (LLMs) and AI learning platforms, we have encountered a growing concern among our data providers: could the information they entrust to us be used to train the next generation of AI? We understand that the implications of this possibility extend far beyond the current landscape of technology and into the future of digital ethics.

Updating Our Terms of Service:

In response to these concerns, Goliad Technologies is taking a proactive step. While our terms of service have always been permissive, designed to foster an environment of free and open information powered by open-source software, we recognize the necessity for evolution. Thus, we are amending our terms to address the specific issue of AI training.

Empowering Our Data Providers:

Our revised policy will give forum owners the choice to allow or disallow the use of their data as training data for AI models. This choice is a testament to our commitment to respecting the intentions and rights of our contributors. We believe that this option will not only address the immediate concerns but also set a precedent for responsible data stewardship in the future.

The Future of Goliad Technologies:

As we implement this policy change, we wish to reiterate our stance as a free and open platform. The prohibition of using our data for commercial purposes remains firmly in place, underscoring our dedication to preserving human-generated content for the enrichment of society, not for profit.

Conclusion:

Goliad Technologies is navigating the fine line between innovation and ethics. We are committed to the preservation of internet history in a way that honors the trust of our contributors and the integrity of our mission. We invite our community to join us in this new chapter, as we continue to safeguard the digital legacy of human interaction for generations to come.

Goliad Technologies Embraces Ethical Data Usage with New Policy Update © 2023 by Goliad Technologies is licensed under Attribution-NonCommercial-ShareAlike 4.0 International 

Behind the Code: The Journey to Build a Universal Search Platform for Internet Forums

Developing a universal platform that aggregates and indexes data from many internet forums, such as those running on phpBB, XenForo, and other forum software, is a complex and ambitious endeavor. A team of volunteers has been tackling its technical and logistical challenges for more than two years. The project has several layers and components:

  1. Compatibility and Interoperability: Each forum platform has its own unique database schema, API, and way of storing data. Creating a universal platform requires a deep understanding of these differences and developing a flexible system that can normalize disparate data structures into a consistent format.
  2. Data Ingestion: The platform must be equipped with a robust data ingestion framework capable of connecting to various forum databases, including SQL-based databases like MySQL or PostgreSQL, and extracting the necessary information. This requires the creation of multiple connectors and adapters to handle the specific technical requirements and authentication protocols of each source.
  3. Scalability: Internet forums can contain millions of posts and threads. The platform must be scalable to handle vast quantities of data without degradation in performance. This involves implementing distributed computing technologies and designing an infrastructure that can grow with the size of the data.
  4. Search and Indexing: After ingesting the data, the platform needs to index it to make it searchable. This is a non-trivial problem, given the unstructured nature of forum content. Advanced search algorithms and dedicated indexing services may be necessary to ensure that users can quickly and accurately find the information they seek.
  5. Data Transformation and Cleaning: Forum data will likely need to be cleaned and transformed before it is usable. This might involve removing duplicates, correcting errors, and filtering irrelevant or spam content. Natural language processing (NLP) and machine learning algorithms can help with this work.
  6. User Interface and Experience: A vital component of the universal platform is the front end through which users interact with the indexed information. This interface must be intuitive and user-friendly, incorporating advanced search features and filters to help users navigate the vast information pool effectively.
  7. Privacy and Legal Considerations: Forums often contain personal data. The platform must address privacy concerns and comply with data protection laws like GDPR in Europe or CCPA in California. This means implementing robust security measures and potentially anonymizing data where necessary.
  8. Continuous Updates: Forums are dynamic, with new content being posted constantly. The universal platform must be capable of continuously updating its indexes with new data without interrupting user access or requiring significant downtime.
  9. Community Involvement and Governance: Since the project is volunteer-driven, it relies on the expertise and commitment of a diverse group of individuals. Effective governance, clear communication channels, and community engagement are critical to coordinating efforts and maintaining the project’s momentum.
  10. Sustainability: Long-term sustainability is a concern for any volunteer project. The team must think beyond the technical challenges and consider how to fund the platform, whether through donations, grants, or other means, and how to keep the volunteer base engaged and growing.
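
To make the normalization, cleaning, and indexing steps from the list above more concrete, here is a minimal sketch in Python. It is an illustration only and not the project's actual code: the Post record, the two source layouts ("board_a" and "board_b"), and every field name are hypothetical and are not taken from the schema of any real forum software. The sketch maps rows from two differently shaped sources into one common record, drops exact duplicates, and builds a small in-memory inverted index that supports an all-words search.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical common record that every adapter maps into."""
    source: str
    thread_id: str
    author: str
    body: str


def from_board_a(row: dict) -> Post:
    # Illustrative layout "A" (made-up keys): numeric topic id, text in "post_text".
    return Post("board_a", str(row["topic_id"]), row["username"], row["post_text"].strip())


def from_board_b(row: dict) -> Post:
    # Illustrative layout "B" (made-up keys): same ideas under different names.
    return Post("board_b", str(row["thread"]), row["user"], row["message"].strip())


def deduplicate(posts):
    """Drop exact repeats (same source, thread, author, body), keeping first-seen order."""
    seen = set()
    unique = []
    for post in posts:
        if post not in seen:
            seen.add(post)
            unique.append(post)
    return unique


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each word to the sorted list positions of the posts that contain it."""
    index = defaultdict(set)
    for position, post in enumerate(posts):
        for word in tokenize(post.body):
            index[word].add(position)
    return {word: sorted(positions) for word, positions in index.items()}


def search(index, posts, query):
    """Return the posts that contain every word of the query (simple AND search)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return [posts[i] for i in sorted(hits)]
```

A real platform would presumably replace the in-memory dictionary with a dedicated search service and would need fuzzier duplicate detection than exact matching, but the overall shape (one adapter per source, a shared record type, then cleaning and indexing) follows the steps described above.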

In summary, creating a universal platform for indexing multiple Internet forum platforms is a multifaceted project that goes beyond mere technical development. It encompasses data science, software engineering, systems architecture, user experience design, legal compliance, community management, and sustainability planning. Despite the challenges, such a platform could provide significant value by making a wealth of forum data more accessible and searchable, thereby facilitating information sharing and community building across diverse subjects and interests.

Behind the Code: The Journey to Build a Universal Search Platform for Internet Forums © 2023 by Goliad Technologies is licensed under Attribution-NonCommercial-ShareAlike 4.0 International 

The Paywall for History: A Barrier to Preservation and Research

In recent times, we’ve observed a growing trend among leading internet organizations, exemplified by platforms like Reddit and the now-rebranded Twitter (referred to here as “X”). These platforms have begun to implement paywalls for their APIs, the sets of tools and protocols that allow third-party applications to interact with and extract data from their systems. The commercialization of API access has profound implications for research, archiving, and the preservation of user-generated content.

The Value of User-Generated Content

Before delving into the repercussions of this trend, it’s crucial to understand the value of the content found on platforms like Reddit and X. These platforms are more than just social networks; they are vast repositories of human thoughts, experiences, reactions, and creations. Scholars, researchers, and archivists see them as treasure troves of modern-day history, providing raw and uncensored glimpses into global culture, events, and personal narratives.

Limiting Access and Its Implications

By putting a price on accessing this data, platforms essentially gatekeep a significant portion of contemporary digital history. For organizations, especially non-profits and research institutions like ours, whose primary objective is to preserve and index such content for the benefit of future researchers, this becomes a barrier too high to scale.

Take, for example, Goliad, our organization. In 2022, we raised a modest sum of $1500—a figure that pales in comparison to the monthly $5000 fee required to access X’s API. This disparity in funding and costs leads to a situation where only affluent organizations or those backed by substantial resources can afford to archive and study this data.

A Monopolization of History

The result is a monopolization of history. When only a few entities can afford access, the scope of research and the range of perspectives both narrow. The risk is that future generations might only get a one-sided or limited view of our times, based on what these select few organizations choose (or can afford) to preserve.

The Future Trend

The gravest concern is the potential snowball effect. If influential platforms like Reddit and X can successfully commercialize their APIs, it could set a precedent for other social networks to follow suit. This could lead to a fragmented digital historical record, with vast chunks of content locked behind paywalls.

Conclusion

History, especially digital history, belongs to all. It’s a collective record of our times and should be accessible to everyone. The move by platforms to commercialize access to their data might be profitable in the short term, but it could come at the cost of an impoverished understanding of our era in the long run. It is our hope that platforms reconsider such policies and recognize the importance of keeping history open and accessible to all.

The Paywall for History: A Barrier to Preservation and Research © 2023 by Goliad Technologies is licensed under Attribution-NonCommercial-ShareAlike 4.0 International 

Goliad Technologies Ends Affiliation with Goliad Pittsburgh and Terminates College Affiliate Program

In light of recent events, Goliad Technologies has announced the end of its association with Goliad Pittsburgh, an affiliate run by students from Carnegie Mellon University. The decision was a direct response to unauthorized actions carried out by a small number of individuals within the Goliad Pittsburgh organization.

These unauthorized actions raised significant concerns for Goliad Technologies, prompting a reevaluation of its partnership programs. After thorough consideration, the company decided not only to sever ties with Goliad Pittsburgh but also to discontinue its college affiliate program entirely. This termination took effect on April 1, 2023.

Goliad Technologies emphasized its commitment to maintaining the highest standards of professionalism and integrity in all its associations. While the company recognizes the potential and innovation that college affiliates bring, the recent incidents underscored the need for stringent oversight and accountability. The cessation of the college affiliate program signifies Goliad Technologies’ proactive approach to safeguarding its reputation and ensuring that all affiliated entities align with its core values.

The Importance of Archiving Controversial Digital Platforms for Cultural Research

In today’s digital age, websites like Conservapedia, Encyclopedia Dramatica, GenMay, Something Awful, Hard Forums, 4chan, and even more overtly controversial platforms like Stormfront play undeniable roles in shaping and reflecting cultural dialogues. While some argue that preserving these platforms amounts to conserving hate rather than genuine culture, there is a compelling counterargument rooted in the value of historical and cultural analysis.

  1. A Full Picture of Society: Every era has its controversial, divisive, and sometimes even distasteful elements. By preserving these sites, we ensure that future researchers have access to a holistic representation of our times – warts and all. Selectively archiving only the parts of culture we deem “acceptable” might paint a skewed picture of reality, depriving future generations of understanding the entirety of our societal discourse.
  2. Understanding Extremes to Grasp the Mainstream: Sometimes, it’s through the lens of extreme views or satirical content that we can better understand the mainstream. Controversial websites often highlight the fringes of society, offering insights into counter-cultures, reactions against the status quo, and the boundaries of social discourse.
  3. Chronicles of Internet Culture: Websites like Something Awful and GenMay are instrumental in charting the evolution of internet culture. Memes, trends, and internet-born movements often have their roots in such forums and platforms. By archiving these sites, we trace the lineage of online cultural phenomena, providing context for how and why certain trends emerged.
  4. Analysis of Hate and Counter Movements: While platforms like Stormfront might propagate viewpoints many find offensive or harmful, understanding these platforms’ existence and content is crucial for sociologists, psychologists, and political scientists. By studying these extreme perspectives, researchers can gain insights into the origins of such views, their appeal, and the sociocultural conditions that foster them.
  5. Preserving Digital Artifacts: Just as we conserve old manuscripts, artworks, or everyday objects from past eras, digital websites and forums are the artifacts of the 21st century. They provide invaluable insights into our ways of communication, our concerns, our humor, and our conflicts.

In conclusion, while it is essential to approach these archives with critical discernment, dismissing them outright would mean overlooking significant cultural touchpoints. The role of archiving is not to endorse or promote certain views but to ensure that the richness, complexity, and diversity of our era are available for future scrutiny and understanding.

The Importance of Archiving Controversial Digital Platforms for Cultural Research © 2023 by Goliad Technologies is licensed under Attribution-NonCommercial-ShareAlike 4.0 International