The following article was published on Medium.com and was written after the Science Fiction and Fantasy Writers Association (SFWA) called for members to submit links to blogs discussing the ramifications of Artificial Intelligence.
Developing your own “Best Practices” for the Ethical Use of AI
Generative AI technologies have blitzed their way across popular consciousness seemingly overnight, and appear to be poised to transform multiple industries. In this article, I’m going to talk about my experiences with these technologies as I build a new small press website, SciFiwise.com. I’ll describe concrete practices that we’re implementing at SciFiwise for the ethical use of these technologies.
We’ve found that the appropriate use of new AI tools has increased our productivity and allowed just two people (my brother and me) to build a website with advanced, custom features in a matter of weeks. But we are also keenly aware of the controversies swirling around these technologies. We knew we needed a balanced policy to guide our use of AI systems.
In this article, I share our policy and discuss why you need one for your site.
Why Am I Qualified to Give Advice on this Topic?
A bit about my background: I have one foot in the technology world and the other in the writing world.
On the tech side, I have a master’s degree in computer science, was a distinguished software engineer at Bell Labs, and after that served as Chief Technology Officer at several startups, where I managed as many as 50 software engineers. Although I am not formally trained in AI, it’s been a lifelong interest of mine, and I have implemented neural networks and other AI technologies while experimenting with and studying the topic. In the mid-2000s, I implemented something akin to what today would be called a generative Large Language Model (LLM), although the technology at the time produced only barely coherent sentences.
On the writing side, I am a professional author. I published an 840-page technical book in the prestigious Addison-Wesley Professional Computing series, as well as many technical magazine articles, in the 1990s. I have also published about a dozen science fiction and fantasy short stories in pro markets over the last few years, and I’m a full member of SFWA. On the business side of writing: my brother and I co-founded Fictionwise.com, an early ebook website that eventually grew to be the largest indie ebook seller in the USA. As part of that venture, I worked with many well-known speculative fiction authors to publish their works as ebooks for the first time. (We sold Fictionwise to Barnes & Noble in 2009. At Fictionwise, we always had a philosophy of promoting our authors to the fullest extent possible while protecting their copyrights.)
In addition, I have a good working knowledge of ChatGPT and DALL-E, two of the recently released AI tools from OpenAI that are receiving a lot of media attention. We’ve been using these tools since December 2022 to build SciFiwise.com. I’ve used ChatGPT for programming (JavaScript, PHP, HTML/CSS), writing marketing text, reviewing draft legal contracts, and more, as part of developing the website. I have experimented extensively with ChatGPT to see where its limitations lie (with some extremely interesting and sometimes surprising results), and I have hundreds of hours of experience working with these technologies.
In future posts, I’ll discuss more of what I’ve learned and will share those interesting results. For this article, I just want to discuss policies that small press websites should consider when dealing with AI.
Example: Why Small Press Sites Should Care about Generative AI
This article is targeted specifically at small press website operators: sites that publish fiction and/or nonfiction articles, artwork, and associated materials for readers. It’s our contention that even if your site does not currently use AI technology, you still need an AI policy. Right now.
For example, do you have a policy, clearly stated in your terms of service document, about the use of AI web crawlers on your site? If not, then in the future, or perhaps even now, AI web crawlers might use content on your site to build new LLMs (Large Language Models, like ChatGPT) or new generative art models (such as those behind DALL-E, Midjourney, Stable Diffusion, etc.) based on your content and the content of creators who have contributed to your site. Of course, those crawlers may ignore your policy and do it anyway, but you’ll have much better legal standing if you state explicitly, in writing and in public, that this is not an allowed use of your website.
Why should you care whether your site gets crawled by an AI model builder?
Because eventually, an LLM will be trained just on “good” stories and articles. The current ChatGPT was trained on general Internet pages, most of which, at best, present an average level of writing. So its fiction writing is also very average, relying on frequent clichés and well-worn plot devices (more on this in future posts).
But it’s only a matter of time before someone decides to train an LLM specifically to write pro-level fiction or pro-level nonfiction articles. (It may already be happening.) Your authors would likely object to their work being used to train an AI, much as current artists are up in arms (rightly so, with at least one lawsuit already in progress) about the ability of some AI art systems to copy a living artist’s style by just using a keyword or two.
Some Policy Subtleties
Policies that seem clear-cut can, on analysis, reveal unexpected subtleties.
For example: What is your AI art policy?
You might say, “We don’t use it on our site and never will.” But how do you know you don’t use it?
Are you using art from free-to-use image websites? How do you know the art you’re using from them was not generated using AI? (For example, the image at the top of this article is from a free image website. Can you tell if it is AI generated? I can’t, and most sites don’t disclose that information.) Do you accept open submissions of artwork? How do you know the artist selling you an image didn’t use AI? Did you even ask them? Are you simply outsourcing the use of AI art to another person or to an image website without even knowing it?
Having a policy in place clarifies how you would acquire your site artwork, and what questions you would ask an artist or image website before using their work.
Current AI Policies of SciFiwise.com
As a full example, here are the current SciFiwise.com policies. We will likely revise and extend these as AI generators evolve and as we learn more about them.
Your policies may be different, based on your needs, and you may or may not agree with the specifics of what we’re doing at our site. There is room for intelligent people to disagree; this is all new and rapidly evolving. Regardless, I urge you to consider what your policies should be and implement them.
In our policy, we’re trying to strike a balance between using these technologies ourselves to provide a great reading experience for our users and respecting the needs of the broader creator community (writers and artists). These technologies can give small press operations a significant “force multiplier” if used responsibly.
Here is a list of SciFiwise’s current AI policies:
1. The site’s Terms of Service page specifically disallows AI web crawlers from indexing the site. (Example language below.)
2. The robots.txt file is configured to discourage AI web crawlers. (Example code below.)
3. There is an AI Disclosure page on the website that explains our AI policy and how AI is being used on the site, including:
   - The use of AI code generators to write HTML, CSS, JavaScript, and PHP code to implement website functions.
   - The use of AI LLMs to proofread incidental text that appears on the website, but not the fictional story content.
   - The use of AI LLMs to draft or review specific parts of legal contracts used on the site.
   - The use of AI LLMs to draft marketing materials and newsletter content.
   - The use of AI art programs to create some graphics on the website, with safeguards in place to protect living artists.
4. The AI Disclosure page also details restrictions on the use of AI on the site, areas where we will not use AI or will constrain how it is used:
   - AI is never used on our site to draft or edit fictional stories that appear there. All fictional stories posted on our site were written by human beings who were paid for their stories.
   - AI art generators use only ethical prompts; i.e., prompts are never directed to copy the style of any living artist or of any copyrighted work of art. Negative keywords are used when possible (such as “--no artstation”) to discourage copying living artists. Where practical, seed images that are in the public domain are used to guide the image generation process, which ensures that the bulk of the image is based on public domain sources.
   - Any graphic made by an AI art generator is searched using reverse image lookup (e.g., Google Lens; see the sketch after this list) to check whether the art is too close to an existing copyrighted image on the Internet (public domain images are fine). If the match is too close to existing copyrighted work, that piece of generated art is not used. (What constitutes “too close” is a complex topic in copyright law, but we always err on the side of caution, probably more strictly than the law requires.)
   - Where possible, watermarks and signatures (e.g., the DALL-E color stripe “signature”) are left on images used on the website. In some cases, necessary cropping, compositing, or background removal forces us to remove these marks; in that case, the AI source is disclosed on the AI Disclosure page.
   - We expect that in the near future, one or more of the AI art systems will have options to explicitly disallow using living artists’ work in their models, or will have built models that at least allow artists to opt out (not a perfect solution, but a step in the right direction). When such “ethically trained” AI art systems become available, we will transition to using them.
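To make the reverse-image check concrete, here is a minimal sketch of how its first step could be scripted: generating a Google Lens lookup URL for a hosted image so a human reviewer can inspect the matches by eye. The function name and example image URL are hypothetical, and the lens.google.com endpoint is an assumption based on its current public behavior rather than a documented API.

```php
<?php
// Minimal sketch: build a Google Lens reverse-image lookup URL for an
// AI-generated image. A human then opens the URL in a browser and
// judges whether any match is "too close" to copyrighted work.
// Assumption: the lens.google.com "uploadbyurl" endpoint keeps its
// current public behavior; it is not a documented, stable API.

function reverseImageLookupUrl(string $imageUrl): string
{
    return 'https://lens.google.com/uploadbyurl?url=' . urlencode($imageUrl);
}

// Hypothetical example image hosted on your own site.
echo reverseImageLookupUrl('https://example.com/images/generated-cover.png') . PHP_EOL;
```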
Policy Explanations
A few of the policies above could use a little more technical detail, in the interest of helping small press sites implement the ideas.
The Terms of Use (sometimes called Terms of Service) on your website explain how you expect users of the website to behave. Our policy suggests that you include terms that explicitly disallow AI “crawlers” from using the materials posted on your site. These crawlers are software programs that index websites and can grab information from them. All of the current generative AI technologies, both art and text, are “trained” on data, much or all of which comes from the Internet.
Here is example wording for such a Terms of Use page:
You agree not to use any automated system, including without limitation, ‘robots,’ ‘spiders,’ ‘web crawlers,’ or any other type of software system, to access or scrape content from the Website for purposes of creating AI models, such as LLM and Stable Diffusion (i.e., LDM) models, or other AI models. Any use of such automated systems to train or create AI models is strictly prohibited and may result in the termination of your access to the Website and copyright violation.
(Draft was written by ChatGPT with prompt: “Write a Terms of Use clause that disallows AI web crawlers.”)
Our policy also attempts to discourage AI crawlers by using the robots.txt file. The robots.txt file tells web crawlers which parts of a website may be indexed. It is normally used to help search engines index the correct information on a website and avoid private or redundant pages.
Unfortunately, there is no standard naming convention for AI crawlers. I propose using AI* to match AI crawlers, and I intend to put this forward as a standard. The more websites that use it, the more likely it is to actually become one, so feel free to let me know if you’re on board.
The Common Crawl bot (CCBot) is also known to supply training data for ChatGPT, Stable Diffusion, and some other AIs. Although its data is also used by many other projects that do not involve AI, a good defensive move would be to block it. (You won’t be blocking Google or other major search engines by doing this, as they have their own proprietary crawlers, so your SEO should not suffer.) Here is an example robots.txt entry that implements both of these ideas:
User-agent: AI*
Disallow: /
User-agent: CCBot
Disallow: /
(These commands were written by ChatGPT with the prompt: “Write a robots.txt entry to disallow all crawlers whose names start with AI- and also common crawl bot.”)
Note: You could be more surgical with the directives in robots.txt if you have a good way to distinguish the URLs of story pages, as in the example below.
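For instance, assuming your story pages all live under a hypothetical /stories/ path, you could wall off just that section while leaving the rest of the site crawlable by default:

```
User-agent: AI*
Disallow: /stories/

User-agent: CCBot
Disallow: /stories/
```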
Again, crawlers may ignore these directives, but at least having them on your site puts crawlers on notice that you are trying to enforce your policies. If a class action lawsuit ever occurred against a future AI company for misusing web information, you would have proof that you intended to restrict such use of your site and took concrete actions to do so.
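Because robots.txt is purely advisory, you can go a step further and refuse these crawlers at the server. Here is a minimal sketch assuming an Apache host with mod_rewrite enabled (other web servers have equivalents); the user-agent patterns are illustrative and would need tuning for your site:

```
# .htaccess sketch: return 403 Forbidden to requests whose User-Agent
# contains CCBot or begins with "AI" (case-insensitive).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^AI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CCBot [NC]
RewriteRule .* - [F,L]
```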
Conclusions
AI technology can allow small press websites to quickly add custom features (by generating PHP/JavaScript/HTML/CSS code), decorate the website with low-cost, unique images and icons, quickly clean up marketing language, draft legal contracts, and more. But it also poses ethical challenges because of the way some AI model builders have selected training materials.
Whether you like it or not, every small press website, even those that do not currently use any AI technology, needs to have an AI policy. At the very least, you may want to discourage AI systems from using your authors’ and artists’ work to build future models. If your site does use AI technology, disclosing your policies shows your readers that you are aware of AI’s ethical pitfalls and are taking proactive steps to address them.
One thing is certain: these technologies are not going away. They will continue to improve, possibly at a rate no one can even fathom. Ignoring these AI systems is not a viable strategy in 2023 and beyond. And as I’ve tried to demonstrate in this article, going Butlerian and just saying you won’t use any AI on your site is too simplistic a view given that AI will soon permeate the supply chain you may be using to obtain materials for your site.
In future posts, I’ll discuss my experiences using AI generative technologies to construct a new small press website from scratch, including code generation, site decorations and icons, animations, featured images, the current state of its ability to write fiction and other types of prose, and more.
Hello Human. I hope you enjoyed this magnificent article. Please support SciFiwise.com and our authors:
- Rate and React to this article. Feedback helps me select future stories.
- Share links to our stories and tell your human friends how charming I am.
- Click on our affiliate links and buy books written by our talented authors.
- Follow me on Twitter: @WiseBot, and also follow @SciFiwise.
Thank you!
WiseBot