Substack & Science

Hello Crossref community!

My name’s Dayne, and I’m a product researcher at Substack, a writing platform.

Substack hosts some of the top academics and public intellectuals in the world, but articles published on Substack don’t conform to academic standards, and are therefore not citable. This is something we hope to change, and I’m eager to understand what we need to do to make this happen.

I’m not an academic myself, and I’m trying to wrap my head around how the system of citations works, and what exactly Substack would need to do/change/build in order for articles to be citable, and for citations to contribute to the author’s citation count on Google Scholar.

Any advice or insights would be greatly appreciated!

Thanks in advance.

Dayne

Hi Dayne,

That’s a big topic! Crossref metadata and DOIs aren’t connected with Google Scholar, and we don’t have insight into its processes; it’s a bit of a mystery to many.

That said, there are some general best practices that help make blog content more discoverable. Registering DOIs for blog posts creates metadata records that become part of the Research Nexus we are building, and DOIs also make blog posts easy to cite. Other best practices include making blogs friendly to reference managers (such as Zotero) and tagging bibliographic metadata like author name, title, blog name, and date with Schema.org vocabularies. Providing a citation recommendation is fairly common as well (e.g. “this is how you cite this blog post”).
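As a rough illustration of the Schema.org tagging mentioned above, here is a minimal Python sketch that emits a JSON-LD `BlogPosting` description for a post page. The function name, the sample values, and the choice of `isPartOf` and `identifier` to carry the blog name and DOI are assumptions for illustration, not a Crossref requirement; the full vocabulary is documented on schema.org.

```python
import json


def blog_post_jsonld(title, author, blog_name, date_published, url, doi=None):
    """Return a Schema.org BlogPosting description wrapped in a JSON-LD script tag.

    Hypothetical helper for illustration; the property choices below are one
    reasonable mapping, not a Crossref or Substack requirement.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "author": {"@type": "Person", "name": author},
        "isPartOf": {"@type": "Blog", "name": blog_name},  # the blog/newsletter name
        "datePublished": date_published,  # ISO 8601 date string, e.g. "2023-05-01"
        "url": url,
    }
    if doi:
        # Assumption: expose the DOI as a resolvable https://doi.org/ link.
        data["identifier"] = f"https://doi.org/{doi}"
    return '<script type="application/ld+json">' + json.dumps(data, indent=2) + "</script>"


if __name__ == "__main__":
    # Sample values are made up; the DOI is a placeholder.
    print(blog_post_jsonld(
        "Why Preprints Matter",
        "Jane Doe",
        "Example Newsletter",
        "2023-05-01",
        "https://example.substack.com/p/why-preprints-matter",
        doi="10.5555/example-post",
    ))
```

The resulting tag would go in the page's HTML head, where crawlers and reference managers that understand Schema.org can pick it up.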

I hope this is useful. I’m happy to discuss this further if you’d like to have a call.

Thank you Patricia! I’ll respond to your email.

I think maybe you are barking up the wrong tree trying to convert Substack into a citable academic platform. The primary cornerstone of academic literature is peer review, and that pretty much defines what gets a recognizable citation in something like Scopus, Web of Science, or Google Scholar (I worked at Google for a long time, but I’ve retired). The mere presence of a DOI isn’t enough to make something citable, because it provides little more than a separate namespace for stable URLs. The meaning of a DOI has also been somewhat muddied by the fact that DataCite issues DOIs for code and data, and arXiv issues multiple DOIs for different versions of the same article.

There is a significant amount of scientific discourse that takes place on other platforms and doesn’t contribute to citation counts. Preprint servers like arXiv, medRxiv, bioRxiv, eprint.iac, and many others fulfill another role, disseminating academic literature without peer review, but they tend to be more formal. Preprints grew in response to paywalls by academic publishers, which limited the distribution of academic literature.

I think of Substack as being more informal than that, but that doesn’t mean we should dismiss it. There has been a huge amount of scientific and academic discourse on social media, and it now occupies an important role. That goes back to things like Blogger, Twitter, WordPress, etc. Academics now often write about the importance of social media for scientific discourse. Even before the web there were email lists like LISTSERV and Usenet groups. As we all know, Twitter was dramatically upended recently, and as a result a lot of academic communities just dried up there (much to the dismay of journalists). Some migrated to ActivityPub/Mastodon or Bluesky or elsewhere, but that fragmented the space. This may represent an opportunity for you to grow your site, but keep in mind that many sites are angling for it.

I regularly interact with a lot of these blogging and micro-blogging platforms, but I will personally never click on anything on Medium or Substack, because they are behind annoying login walls. I also won’t invest any effort in posting on such platforms, because it limits my audience. This now includes Twitter, which recently instituted a login wall. The equivalent of academic citations in the informal web world is the hyperlink, and SEO is now well understood as the mechanism to curate this. Unfortunately, if you want to get links, then the content has to be readable; otherwise few people will link to it and few will drive traffic to you. I think WordPress is taking a much better path by integrating with ActivityPub and the fediverse. It doesn’t require a login to read content there, and using the fediverse will greatly extend their reach. They are following the lesson learned from preprint servers, which let people read things with minimal effort and no login required. Login is reserved for authoring, discussion, or other features.

I don’t wish to dissuade you from trying to gain more academic reputation and capture scientific discussion. Quite the contrary: I think it plays an important role. I just think you should recognize what you aspire to and what could potentially hold you back. In my mind, requiring a login to read is a complete non-starter.

Hi @mccurley, thank you for the extremely thoughtful reply and feedback!

With regard to the login issue on Substack: as a general rule, you shouldn’t need to log in to Substack to read an article. An exception is if the author has chosen to paywall an article, but the vast majority of articles are free. I think something has gone wrong if you’re being asked to log in to read a free article. If so, could you please send me more information: a description of the issue, links to the page you’re trying to visit, the device and app you’re using, screenshots, a recording, etc. My email is dayne@substackinc.com

And with regard to whether to convert Substack into a citable academic platform: I hear your feedback and criticism, and I think I agree. I definitely don’t think that all Substack posts, publications, and authors should be citable. However, I am interested in whether some posts which conform to certain academic standards for formatting and quality should be citable. I imagine this being controlled by the writer: if a writer decides to publish a post as a formal preprint, they could do so, but this would not be the default.

I’m curious what you think about Substack taking an approach like this. I’d like to understand what might separate a Substack post, which the author has decided to publish as a citable preprint, from a preprint on e.g. arXiv.

Thanks again!

The primary cornerstone of academic literature is peer review, and that pretty much defines what gets a recognizable citation in something like Scopus or Web of Science or Google Scholar

I understand that peer reviewed papers have more credibility, but as I understand it preprint papers also get recognizable citations in Google Scholar (I don’t know about Scopus or Web of Science).

Could you help me understand how preprints get recognizable citations on those platforms, and what Substack might need to do to achieve this?

I’m also curious to get your reaction to this: an article published in New Scientist that has a corresponding article on ScienceDirect. The article doesn’t conform to the standards of a scientific paper, but it has been cited twice in (I think) peer-reviewed papers, and both the article and the citations are counted in Google Scholar.

I’m wondering whether you think this article counts as a preprint, and whether it should or should not be citable, and any other thoughts you have.

I wasn’t able to hyperlink in my comment for some reason, but you can find the pages via the DOI: 10.1016/S0262-4079(20)31972-2
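For readers who want to follow a DOI programmatically, here is a small Python sketch that turns a bare DOI, such as the one above, into a doi.org link, and prepares a DOI content-negotiation request that asks for a formatted citation instead of the landing page. The function names are hypothetical; the `text/x-bibliography` Accept header reflects the content-negotiation service documented by Crossref and DataCite, but check their documentation for the styles and formats actually supported.

```python
from urllib.parse import quote
from urllib.request import Request


def doi_url(doi):
    """Turn a bare DOI into a https://doi.org/ resolver URL (hypothetical helper)."""
    # "/" and parentheses are left as-is; other reserved characters are percent-encoded.
    return "https://doi.org/" + quote(doi, safe="/()")


def citation_request(doi, accept="text/x-bibliography; style=apa"):
    """Build (but do not send) a DOI content-negotiation request.

    With this Accept header, doi.org forwards the request to the registration
    agency's content-negotiation service, which returns a formatted citation
    rather than redirecting to the publisher's landing page.
    """
    return Request(doi_url(doi), headers={"Accept": accept})


if __name__ == "__main__":
    doi = "10.1016/S0262-4079(20)31972-2"
    print(doi_url(doi))  # https://doi.org/10.1016/S0262-4079(20)31972-2
    # To fetch the citation (requires network access):
    # from urllib.request import urlopen
    # print(urlopen(citation_request(doi)).read().decode("utf-8"))
```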

I don’t have a particular article in mind. I just remember clicking twice on articles and hitting a login wall. I just did a few searches looking for stuff on substack and found stuff I could read, so perhaps my first impression was unfair.

As for the New Scientist article, my primary reaction to that article is that it’s behind a paywall. I don’t think I’ve ever read anything on New Scientist. The DOI leads to an Elsevier page where they try to sell me the PDF. If I search on google I find a copy of the PDF on a wordpress site. That copy may or may not be legally posted. Perhaps you were asking about the content of the particular article, but since it is completely outside my areas of interest I can’t really comment on that.

I recognize that some authors and most commercial publishers wish to have a business model selling content. That probably includes most members of Crossref. That’s all wrapped up in the open access battles that are being fought in academia. In my fields of computer science and mathematics, I generally find all articles in preprint servers or author web pages or author blogs. If it’s important enough I can always write to the author and ask for a copy.

Most of what appears in Google Scholar has a PDF associated with it (for better or for worse). I don’t know what their editorial policies are on what to include, but I suspect they aren’t that stringent.

With regard to the New Scientist article: excuse me for not explaining myself properly. My question was not so much about the content of the article but rather about the fact that the article is citable despite being a pop-science article in a magazine, and not a formal academic paper in a journal.

Do you feel that only formal papers and preprints should be citable, or do you think it’s OK, or even good, that pop-science articles are also citable? I ask because there is content on Substack that is higher quality, more scientifically rigorous, and more similar to an academic paper than the New Scientist article, and I wonder whether you think it should be citable. If you think that the New Scientist article should be citable but the Substack posts should not, I’d be interested to understand why.

I hope this comment does not come across as adversarial. I’m really grateful for this exchange, and just trying to understand how academics think and feel about this issue.

I’m keenly aware that I’m only one opinion, and this is perhaps the wrong place to have a broader conversation. I’ve seen lots of things get cited, including manuals, RFCs, standards, patents, and things that have little more than a stable URL. These are typically outliers in the cited literature, but they do occur. Popular science articles are certainly “citable”. I’m just not sure I would set that as a goal, because that is not how they will excel. The New York Times is not very citable, but it has enormous influence on intellectual thought. That’s because of the writing and the audience. Taylor Lorenz has a few articles that have been cited in Google Scholar. It’s also extremely important that URLs created 20 years ago still work. That’s one problem that DOIs attempt to solve, but the DOI is not what conveys legitimacy; it’s the writing and the fact that people have a way to refer to it.

That’s helpful, and I totally agree. Thank you for your input and feedback @mccurley!

You might find some interesting context in this (old, but still relevant) blog post from our Director of Technology, Geoffrey Bilder.

In the big picture, literally anything is citable. Works of scholarship cite non-scholarly materials all the time. Having a DOI isn’t what makes something citable; not all cited objects have DOIs; and not all DOIs represent scholarly works.

Because of our specific organizational mission and mandate, Crossref focuses its attention on content that’s likely to be cited in the research ecosystem. The criteria set out in our membership terms, approved by our governing board, define eligibility for membership as:

Membership in Crossref is open to organisations that produce professional and scholarly materials and content.

Popular science content is in a grey area (it’s sometimes literally called “grey literature”). So, on some level it’s up to the publisher of that content to determine whether it’s worthwhile to register it, and whether they can meet the rest of our membership terms, including making arrangements for the persistent stewardship of that content.

Or, as it’s put on another one of our older blog posts:

The presence of a Crossref DOI on content sends a signal that:

  1. The owner of the content would like to be formally cited if the content is used in a scholarly context.
  2. The owner of the content considers that it is worthy of being made persistent.

I hope that helps clarify things.

-Shayn

Thank you Shayn, this was super helpful!

I’m very interested in membership, but I have some questions around pricing. I’ve read the documentation regarding fees, but I’m not sure how to interpret some of it, as Substack doesn’t fit the mould of a traditional publisher. Is there someone at Crossref I could speak to about this?

Hi Dayne,

The registration fees vary according to content type and publication date. So, the first thing you’d need to determine is which of our supported content types you would use to represent the Substack articles in their metadata records.

What I would suggest, from my understanding of that content, is the type “Posted Content” with the subtype “other” (that is, not “preprint”). The fees would be $0.25 per registered item, if it was published in the current year or prior two years, or $0.15 per registered item, if it was published three years ago or before that.
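To make the arithmetic concrete, here is a hedged Python sketch that applies the two per-item rates quoted above to a list of publication years. The rates are hard-coded from this reply and may change, so treat them as placeholders; the function names are hypothetical, and amounts are kept in whole cents to avoid floating-point rounding.

```python
from datetime import date

# Per-item rates for posted content as quoted in this thread, in US cents.
# Placeholders: check Crossref's current fee schedule before relying on them.
CURRENT_RATE_CENTS = 25   # published this year or in the prior two years
BACKFILE_RATE_CENTS = 15  # published three or more years ago


def registration_fee_cents(pub_year, this_year=None):
    """Fee in cents for registering one item published in pub_year."""
    if this_year is None:
        this_year = date.today().year
    if this_year - pub_year <= 2:
        return CURRENT_RATE_CENTS
    return BACKFILE_RATE_CENTS


def total_fee_cents(pub_years, this_year=None):
    """Sum the per-item fees for a batch of items, one year entry per item."""
    return sum(registration_fee_cents(y, this_year) for y in pub_years)


if __name__ == "__main__":
    # Five hypothetical posts, registered in 2024.
    years = [2024, 2023, 2022, 2021, 2020]
    cents = total_fee_cents(years, this_year=2024)
    print(f"${cents / 100:.2f}")  # $1.05
```

In this example the 2022 post still counts as current (two years before 2024), while the 2021 and 2020 posts fall under the lower backfile rate.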

If you have other questions about fees or about appropriate content types for the metadata you would submit, you can contact member@crossref.org and we’ll be able to go over the details.

All that said, in terms of membership, I think the biggest question would be about long-term persistence of the content. If Substack authors can opt to delete all their content, would you truly be able to ensure that it would be made persistently available via a third-party archive?

Thank you Shayn!

I can see why persistence is an important requirement, and I’m confident we could fulfill that guarantee. I think it’d require an amendment to our ToS for writers that opt-in to publishing DOI registered academic content.

Thank you for the “posted content” link and suggestion. Could you help me understand the difference between the types “posted content” and “working paper” (documentation/research-nexus/reports-and-working-papers)?

There’s a lot of variability in how the terms are used among publishers, but for our purposes ‘report’ or ‘working paper’ is more formal than ‘posted content’. The report/working paper content type is typically used for professional or technical documents. Posted content is a bit of a catch-all for anything that hasn’t been through a formal publication and review process.

Posted content with the ‘preprint’ subtype is used specifically for preprints hosted in preprint repositories. But posted content with the ‘other’ subtype is used for all sorts of things: blogs, video lectures, podcasts, etc. We hope to eventually provide more granular support in our metadata schema to capture the nuances of some of those specific types of items, but in the meantime, posted content is suitable.
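The rule of thumb above can be summarized as a small lookup. This Python sketch only paraphrases this reply: the `kind` labels are hypothetical, and the returned strings are the plain-language type names used in this thread, not identifiers from the Crossref schema.

```python
from typing import Optional, Tuple


def suggested_content_type(kind: str) -> Tuple[str, Optional[str]]:
    """Return (content type, subtype) for an item, per the guidance in this thread.

    `kind` is a hypothetical, informal label such as "preprint", "report",
    "working paper", "blog", "video lecture", or "podcast".
    """
    kind = kind.strip().lower()
    if kind == "preprint":
        # Preprints hosted in a preprint repository.
        return ("posted content", "preprint")
    if kind in {"report", "working paper"}:
        # More formal professional or technical documents.
        return ("report/working paper", None)
    # Catch-all: blogs, video lectures, podcasts, and anything else that has
    # not been through a formal publication and review process.
    return ("posted content", "other")


if __name__ == "__main__":
    for kind in ["preprint", "Working Paper", "blog", "podcast"]:
        print(kind, "->", suggested_content_type(kind))
```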

Got it! Thanks Shayn.

@dayne so glad I found this thread! I’m going to be using Substack to host a new journal that I’m starting. I would love it if Substack created a way for my articles to get indexed by Google Scholar! Would love to talk with you. Email me at aaron at aaronolson.expert

Just be aware that registering an item with Crossref doesn’t guarantee its inclusion in Google Scholar. We don’t know what criteria Google uses for inclusion or whether Crossref’s metadata is part of their process.
