I write some of my pages using a markup language (emacs org-mode) and other pages using XHTML, with a few extra x:* tags that get expanded into HTML later. I was curious: when I write HTML by hand, which tags do I use? I used Python's ElementTree to get the answer:

 3085 p
 2466 a
 2303 li
 1042 em
 1008 code
  876 span
  719 x:section
  517 br
  454 div
  446 strong
  424 h3
  410 figure
  359 ul
  358 script
  331 img
  323 pre
  262 td
  249 x:document
  228 x:footer
  219 g

A lot of what I write is explanations in <p> paragraphs and <ul> <li> lists. And I try to include lots of <a> links to other supporting documents. I do try to use the semantic <em> and <strong> instead of the visual <i> and <b>. These results didn't surprise me much.

Here's the code (roughly):

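    import collections
    import xml.etree.ElementTree as etree

    # documents: assumed to be an iterable of objects whose .contents is the XHTML source
    tag_counts = collections.Counter()
    for doc in documents:
        tree = etree.fromstring(doc.contents)
        for el in tree.iter():  # visits every element, including the root
            tag_counts[el.tag] += 1
    for (tag, count) in tag_counts.most_common(20):
        print(f"{count:5} {tag}")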
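If you want to try this on your own files, one wrinkle is that ElementTree reports namespaced tags as {uri}name rather than x:name, and it refuses to parse a prefix that has no xmlns declaration. Here's a minimal self-contained sketch that deals with that. The namespace URI, the `short_tag` and `count_tags` helpers, and the sample documents are all made up for illustration; they aren't from my actual build scripts.

```python
import collections
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the custom x:* tags (an assumption, not the real one).
X_NS = "http://example.com/x"
PREFIXES = {X_NS: "x"}


def short_tag(tag):
    """Turn ElementTree's '{uri}local' form into 'prefix:local' when the URI is known."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        prefix = PREFIXES.get(uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


def count_tags(sources):
    """Count every element tag across a list of XML source strings."""
    counts = collections.Counter()
    for source in sources:
        root = ET.fromstring(source)
        for el in root.iter():  # iter() includes the root element itself
            counts[short_tag(el.tag)] += 1
    return counts


# Tiny stand-in documents; real pages would be read from disk instead.
SAMPLE_DOCS = [
    f'<x:document xmlns:x="{X_NS}"><x:section><p>One <em>two</em></p><p>Three</p></x:section></x:document>',
    f'<x:document xmlns:x="{X_NS}"><ul><li>a</li><li>b</li><li>c</li></ul></x:document>',
]

tag_counts = count_tags(SAMPLE_DOCS)
for tag, count in tag_counts.most_common(3):
    print(f"{count:5} {tag}")
```

Keeping the prefix map in a plain dict means any namespace it doesn't know about just falls back to the bare tag name, which is probably what you want for a quick tally.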

Do you write HTML by hand? If so, what tags do you use most?

Update: [2023-09-20] Some people commented on Hacker News about how they write their HTML, including some debate over closing tags, HTML vs. XHTML, and markup languages.

3 comments:

Anonymous wrote at September 20, 2023 12:27 AM

Yeah, though nowadays I use ctrl-c ctrl-v to write the html. The tags I mostly use are 'p' and 'ol'/'ul'.

Anonymous wrote at September 20, 2023 7:30 AM

`〈link〉`* is the most important semantic element the iana never commercialized and even the free software movement misses its import. why was "view source" even made a thing if we're all going to ignore its relevance to semantic web? if semantic web is never tried for public-facing apis, can we say it was ever tried? the director of the api academy and leonard richardson have been calling out all apis for more than a decade, along with dr. fielding's calls^ too. now over 90% of the web is inaccessible. these will not look like coincidences to the children or machines we leave the web to protect from the end-times.

* https://www.iana.org/assignments/link-relations/link-relations.xhtml
^ https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven

Anonymous wrote at September 20, 2023 10:42 AM

201 code
142 font
97 span
87 td
85 p
74 a
60 b
31 tr
23 i
22 blockquote
20 pre
17 br
15 li
11 table
10 meta
10 h3
9 sup
9 img
8 th
8 head