How to Keep and Summarize A Dream Diary with the GPT-3 Machine Learning Language Model from OpenAI

UPDATE: This post was written in August of 2022, before the launch of ChatGPT and subsequent entry of LLMs into popular consciousness. The boilerplate code I wrote relies upon older GPT-3 models, but can be easily adjusted to use whatever is the most advanced OpenAI LLM at the moment.
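To illustrate that adjustment, here is a minimal sketch using the newer chat completions interface. The model name is a placeholder for whatever is current, and the `build_chat_request` wrapper is illustrative, not part of the original script:

```python
# Sketch: assemble the keyword arguments for a modern chat completions call.
# "gpt-4o-mini" is a placeholder model name -- swap in whatever is current.
def build_chat_request(prompt_text, model="gpt-4o-mini", temperature=0.9):
    """Mirror the old Completion call's prompt/temperature in chat form."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt_text}],
        "temperature": temperature,
    }

kwargs = build_chat_request("What is the overarching theme of the following list of dreams? ...")

# With an OPENAI_API_KEY in the environment, the call itself would be:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**kwargs)
# print(response.choices[0].message.content)
```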

Do-it-yourself:

Google Sheets template: https://docs.google.com/spreadsheets/d/1I951dcIb423RpkqZekGnomzsrznYi1_W5aCvWzsH7GA/

Python script: https://github.com/liamtrotzuk/gpt3-dream-diary

As an experiment, I kept a simple dream diary in Google Sheets for about 3 months, recording about 35 dreams in that period. I then used a simple Python script to read the last 10 dreams and give them to the GPT-3 machine learning language model from OpenAI, asking the model to summarize the dreams and look for symbolic trends, as a way to easily keep running impartial tabs on my general subconscious mental state. I was compelled to do this quick project by A. a newfound desire to reflect on the strange scenarios my mind conjures when it sleeps, coupled with B. a disinterest in spending time myself trying to summarize the dreams (through either qualitative or quantitative means) due to the potential for personal bias, wedded with C. an unwillingness to let another individual summarize my dreams due to the private nature of what’s in them. I’d been playing with GPT-3 at the time, and it struck me that an impartial machine that has already proven itself very skilled at writing good summaries of highly abstract works of fiction (such as book reports) would be an excellent, discreet, and time-efficient tool for summarizing the strange non-sequiturs and odd symbolism of a human being’s dreams.
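The plumbing behind "read the last 10 dreams" is minimal; here is a rough sketch, assuming the diary rows arrive as (date, dream) tuples from the sheet. The helper name `last_n_dreams` is mine, not from the actual script, though `STR_dreams_last_10` mirrors the variable used in the prompts below:

```python
def last_n_dreams(rows, n=10):
    """Join the n most recent dream entries into one prompt-ready string.

    rows: (date, dream_text) tuples in chronological order, as they might
    come out of a Google Sheets CSV export.
    """
    return "\n\n".join(dream for _, dream in rows[-n:])

rows = [
    ("2022-06-01", "Dream one."),
    ("2022-06-03", "Dream two."),
    ("2022-06-07", "Dream three."),
]
STR_dreams_last_10 = last_n_dreams(rows, n=2)
# STR_dreams_last_10 == "Dream two.\n\nDream three."
```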


A random sample of 3 out of the 35 dreams that I recorded:

A. “You were watching planes taking off and landing from an airport in the Bronx, its runway fringed with trees. Prior to that, you’d been riding a subway line that did a loop around a park in the Bronx that felt a lot like Van Cortlandt. While you watched the planes, however, there was an aerial threat. Witches, on broomsticks, who could assault us from above with some kind of flaming bombs. It was a constant, enduring threat, but not enough to make us stop plane-watching.”

B. “A hardware store opened beneath the apartment, and you were very excited.”

C. “You were taking the subway. You were above-ground somewhere in Brooklyn, trying to get further into the borough, away from Manhattan. You entered a vast, cavernous station, a chamber that sloped down, with a complex series of openings in the top that allowed in the sun. A pool/fountain cascaded its way down the length of the station. You were alone, though you knew that the bottom led to a bigger station with more people. You wanted to add it to your ‘Top 3’ favorite subway station list.”


All 3 of these dreams are among the last 10 dreams that I’ve remembered, and were therefore among the 10 read by the script. I fed these 3 dreams plus 7 more into the Davinci GPT-3 model, the most powerful of OpenAI’s models (for commercial applications, users might choose a faster model at the expense of power). I set the ‘temperature’ — a measure of the ‘risk’ the model is willing to take — to 0.9, quite high, which yielded the most interesting and non-literal results. You can read more about how OpenAI categorizes ‘riskiness’ in AI responses here.

After experimenting with various prompts, you can quickly gain a general sense of where the machine demonstrates consistency and cohesiveness in its textual analysis. I ultimately settled on asking 3 generally useful questions, from most-specific to least-specific, that would hopefully yield a useful high-level summary of how my dreams had been thematically trending:


1. openai.Completion.create(
    model="text-davinci-002",
    prompt="Is the following list of dreams predominately positive or predominately negative or predominately neutral? How many are positive? How many are negative? How many are neutral?" + STR_dreams_last_10,
    max_tokens=2000,
    temperature=0.9)

2. openai.Completion.create(
    model="text-davinci-002",
    prompt="What does the following list of dreams say about the dreamer's mental state?" + STR_dreams_last_10,
    max_tokens=2000,
    temperature=0.9)

3. openai.Completion.create(
    model="text-davinci-002",
    prompt="What is the overarching theme of the following list of dreams?" + STR_dreams_last_10,
    max_tokens=2000,
    temperature=0.9)
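Since the 3 calls differ only in their question text, they collapse naturally into a loop; a sketch (the questions are copied from above, while the `build_prompts` wrapper is illustrative, not from the original script):

```python
QUESTIONS = [
    "Is the following list of dreams predominately positive or predominately negative or predominately neutral? How many are positive? How many are negative? How many are neutral?",
    "What does the following list of dreams say about the dreamer's mental state?",
    "What is the overarching theme of the following list of dreams?",
]

def build_prompts(dreams_str):
    """Prepend each question to the joined dream text: one prompt per call."""
    return [question + dreams_str for question in QUESTIONS]

prompts = build_prompts("\n\nDream one.\n\nDream two.")
# Each prompt would then go out via:
# openai.Completion.create(model="text-davinci-002", prompt=p,
#                          max_tokens=2000, temperature=0.9)
```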


The 1st time I ran these prompts, the model returned the following for each:

1. "text": "\n\nThere are 5 positive dreams, 2 negative dreams, and 3 neutral dreams."

2. "text": "\n\nThe dreamer may be experiencing anxiety about an upcoming event."

3. "text": "\n\nThe overarching theme of the following dreams is escape."


All 3 responses are interesting, and at least the 1st and 2nd are largely accurate. The 1st prompt — ‘predominately positive or predominately negative?’ — is just rough sentiment analysis, and the easiest for which to gauge accuracy. After tallying up each dream with my own judgments of relative positivity or negativity, I concurred with the GPT-3 count of 5 positive (one of which was the hardware store dream), 2 negative (one of which was the plane-watching, witch-attack dream), and 3 neutral (one of which was the subway station dream). That’s impressive, despite the potential for bias in my own sentiment analysis of my own dreams (a truly impartial analysis would have asked others to rank my dreams in order to test against GPT-3’s claims, but I didn’t bother doing so), and we can generally count the 1st of the 3 prompts as useful and accurate so far. The script’s response to the 2nd prompt — anxiety about an upcoming event — is more qualitative, but the machine’s answer feels accurate for this prompt as well. I’m fairly easy-going, and am often not particularly concerned about future events, but there are several looming events in my life that I’m definitely apprehensive about at present — much more so than usual — so GPT-3 is making a verifiable claim that generally aligns with specific realities in my life, different from the status quo. Again, an experiment hewing to more empirical methods would probably have asked a sample set of outside observers of my life to rank the accuracy of the machine’s statement given their knowledge of my actual life and mental state, but a simple self-administered sanity check should suffice for the purposes of this little experiment.

The machine’s response to the 3rd prompt is more akin to a newspaper-column horoscope designed to mean all things to all people — GPT-3 could have likely sent back almost any nebulous concept, from the actual ‘escape’ to ‘courage’ to ‘warmth’ or any generalized terms of that nature, and I’d likely have cherry-picked the necessary events and thoughts in my life to support that response. ‘Escape’ does feel like an important theme in my life, but when does it not? Doesn’t everyone seek an escape? Nonetheless, I enjoy a silly verbal Rorschach test as much as the next person, so I decided to keep that 3rd prompt in there as a fun reminder that a machine trained on human data can be as dumb and vague as humans are.

And I’ll plan to keep using this easy-to-run script to get a nifty high-level summary of my dreams in the future. While I’m still a foolish novice in the craft of experimenting with AI language models, this exercise was a useful excuse to set up programmatic access to GPT-3 and play around with it, however trivially.

Biomimicry & Building an ADS-B Receiver

A 4GB Raspberry Pi 4, a 1090 MHz SDR antenna from Nooelec, and a case made from a repurposed Trader Joe’s tomato carton.

Several months ago, I built an ADS-B receiver that picks up pings from planes flying around NYC, mainly to/from LGA due to my location in the northwest part of Manhattan. This receiver then feeds the data to the servers of FlightRadar24, PlaneFinder, and a nonprofit called OpenSky that gives flight data to academics for use in their papers. I originally built this receiver to gain cheap access to a number of flight-tracking APIs, because I wanted to pull data related to a test concept called ‘Wake Energy Retrieval’ that is being considered for widespread use in commercial aviation.

Consider the iconic flock of geese, heading south for the winter, in a classic V-formation. Why do geese instinctively make this formation? It isn’t to ‘draft’, at least not in the sense of reducing wind resistance. The largest benefit of the V-formation for a trailing goose is actually the uplift generated from the wingtips of the goose ahead of it. As the wingtips of the leading goose slice through the air, they generate a spiral vortex of air, similar to the sort that sharklet wingtips are designed to reduce on airplanes. Riding that wake in the right spot can generate an updraft that makes it easier for the goose to stay comfortably aloft, which is a major concern when it takes you a week to fly from Nunavut to Alabama.

Hence, biomimicry: what if airplanes could do the same? It turns out that with the right tools, they can, which is the concept behind ‘Wake Energy Retrieval’ initiatives. Wake Energy Retrieval is a fancy term for what geese do, which is wake-surfing, and the advent of widespread use of ADS-B signaling technology — automatic signals sent out by commercial aircraft at a 1090 MHz frequency — has made it possible for commercial airliners to do the sort of precise navigation necessary for WER.

The most recent trial of WER in commercial aviation took place on November 9, 2021, when Airbus — in coordination with SAS + French bee on the airline side, and a variety of Atlantic-adjacent national agencies on the air traffic control side — successfully drafted an A350 test aircraft behind another A350 test aircraft on a flight from Toulouse to Montreal:

Using the Playback feature in FlightRadar24, I rewound to 9/11/2021 (European date formatting, to honor the European company that executed this test), and tracked the flights by filtering to the AIB callsign, which indicates Airbus company aircraft. You can see the 2 A350 test aircraft — callsigns AIB01 and AIB02 — overlaid essentially on top of each other as they exit the Bay of Biscay into the wider Atlantic, not long after departing Toulouse. AIB01 appears to be ahead, which means that AIB02 might be the trailing — therefore, wake-surfing — aircraft.

But then, at approximately 9:17:00 UTC — shortly after the 2 aircraft drop off consistent contact with ground-based ADS-B receivers, demonstrated by the black dotted line that indicates FR24 is estimating rather than actively tracking the aircraft position — the lead AIB01 appears to stop moving, while AIB02 continues to fly and transitions to satellite-based ADS-B tracking (indicated by the transition of the aircraft’s color to blue rather than the typical yellow).

6 minutes later at 09:23:00 UTC, AIB01 disappears from FR24’s ADS-B coverage entirely — the program is unable to continue playback while that flight is selected.

Meanwhile, however, AIB02 continues its flight, charting a direct course over the Atlantic rather than adhering to the North Atlantic Track routes — the colored lines that span the North Atlantic in the above screenshot.

In fact, re: the NATs, the choice of venue for this test flight— the North Atlantic — was deliberate. Experts view the North Atlantic as the perfect trial ground for WER, due to the unique way that transatlantic airplane flows are managed. Because the skies over the North Atlantic are so crowded, but are nonetheless beyond the reach of traditional radar-based airplane tracking systems, a system known by multiple names but commonly called the North Atlantic Tracks was developed back in 1961 (a similar maritime system was developed even earlier in 1898). Using jet stream measurements, weather forecasts, and other factors, the ATC centers at Gander in Canada and Shanwick in the UK coordinate to publish ~8 tracks upon which most trans-Atlantic flights will chart a path for much of their flight time over the central Atlantic. A similar, albeit less crowded, system governs trans-Pacific flights.

Essentially, the North Atlantic Tracks are giant sky highways, indicated by the colored lines in the above screenshot and also over the Pacific in the below screenshot:

The NATs funnel many flights into multiple uni-directional streams — the perfect environment for convenient WER pairings.

At approximately 12:47:13 UTC, as AIB02 comes within range of standard ADS-B tracking technology off the coast of Newfoundland (rather than the satellite-based ADS-B used over the central Atlantic), AIB01 winks back into existence — still right ahead of AIB02. What happened to this ghost flight over those ~3 hours when FR24 wasn’t capturing its flight, or even estimating its flight path? Was a single satellite link sufficient to track both of the flights, given that they were presumably drafting as essentially 1 unit for much or all of that time? Why did that satellite link follow what appears to be the trailing aircraft, rather than the leading aircraft?

Regardless, AIB01 and AIB02 soon separate to a more typical distance as they approach their destination, and land in Montreal at roughly half past 3 PM.

Airbus is not the 1st manufacturer to perform such a test flight. Boeing performed a similar test back in 2018 with two 777 aircraft, though it published far less data on the test results than Airbus. Moreover, military aircraft practice wingtip flying all the time. The concept is not new — but the technology necessary to make it commercially applicable has emerged from infancy, suggesting that primetime for WER might be approaching soon.

With technology out of the way, what are the 3 bigger hurdles? Regulatory regimes, operational difficulties, and commercial considerations. From a regulatory perspective, the level of schedule coordination necessary for commercial airlines to practice WER would be illegal, outside of the couple trans-Atlantic joint ventures that might be able to achieve the necessary scheduling.

But even harder than changing a regulatory regime would be the operational and commercial constraints. Consider the difficulties that many airlines worldwide have with operational metrics, at even the best of times! The idea that resource-constrained carriers would be able to reliably send out 2 aircraft from 2 different airports with sufficient accuracy that they could rendezvous in a single spot over the North Atlantic — reliably enough to make worthwhile the considerable investment into technology and training that will be necessary to achieve WER — is almost laughable. Moreover, would airline planners truly be willing to sacrifice the commercial benefit of optimal departure times, reorienting possibly their entire trans-Atlantic schedules around the chance to intercept their own (or JV partners’) aircraft and maybe glean a 10% fuel savings? It seems unlikely.

All these factors considered, most folks will conclude that there is only 1 feasible solution: the practice will have to be regulated, mandated, and coordinated by government air traffic control agencies. If an external body is responsible for pairing aircraft, airlines can continue business as usual without change to antitrust regulations/operations/commercial planning, while still being confident that if their A321LR happens to fall into line behind an A380 from a random foreign carrier with whom they have no relationship, an ATC body will organize a WER pairing and help the trailing aircraft save fuel. No need for a complicated joint venture, operational coordination, or shuffling around a schedule for an inferior flight time. A government-mandated solution — with input from both North American and European government agencies — will maximize the number of pairings (and therefore the amount of fuel saved + emissions reduced) while minimizing the possible disruptions or adjustments to commercial service.

Hence, I built the ADS-B receiver in order to gain access to the free Business/Premium subscriptions that come with feeding data to FlightRadar24 and Planefinder, which I hoped would help expedite access to the historical datasets that would allow me to programmatically gather data back to November when the Airbus WER test flight took place (rather than just manually using FR24’s Playback tool). OpenSky didn’t give me anything, but their mission is admirable (and their upcoming research symposium is on November 10th).
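For anyone attempting the same programmatic pull, most of the work is converting the flight window into the UNIX epochs that APIs like OpenSky’s expect; a sketch (the endpoint shape follows OpenSky’s REST API, but the icao24 transponder code is a placeholder, since I never looked up the test aircraft’s actual codes):

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

def utc_epoch(year, month, day, hour, minute):
    """UNIX timestamp for a UTC wall-clock time."""
    return int(datetime(year, month, day, hour, minute,
                        tzinfo=timezone.utc).timestamp())

# The 9 November 2021 WER test, padded to a departure-to-arrival window.
begin = utc_epoch(2021, 11, 9, 8, 0)
end = utc_epoch(2021, 11, 9, 16, 0)

# "abc123" is a hypothetical icao24 hex code, not the real AIB01/AIB02 code.
query = urlencode({"icao24": "abc123", "begin": begin, "end": end})
url = "https://opensky-network.org/api/flights/aircraft?" + query
```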

However — shortly after buying the Raspberry Pi (backordered because of the chip shortage) and the 1090 MHz SDR antenna, I read on the NATS-UK blog that the North Atlantic Tracks are broadly on their way out. At the height of the pandemic, the low volume of trans-Atlantic flights allowed ATC authorities to test an exciting new concept: OTS-Nil. OTS-Nil days (OTS — Organized Track Structure — is another name for the NATs system) were 20 days in 2020 when no NATs were published at all. Moreover, as of March 2022, aircraft flying below 33,000 ft are permitted to chart their own courses. These changes hint at a near future when, instead of being routed into narrow highways over the oceans, all aircraft will chart the most efficient courses to/from each city-pair across the Atlantic and Pacific. But in my estimation, those narrow highways were what made trans-Atlantic commercial use of WER worthwhile in the first place, as they funneled planes ahead of and behind each other in a way that made wake-surfing feasible.

And that means that widespread commercial use of WER becomes less compelling, perhaps fatally so. The removal of the NATs probably renders WER redundant, because airplanes will save far more fuel flying a direct course from A to B than being routed into a narrow, inefficient flight corridor where they have a significantly less-than-100% chance of pairing with a leading aircraft and engaging in WER for part of their journey.

So, while the concept of surfing a plane on another plane’s wake is inherently fascinating regardless of its commercial applications (and there may still be WER applications in other flight geographies), the lessened likelihood of widespread commercial trans-Atlantic WER immediately made the Airbus test flight data just a bit less compelling. But I had already bought the Raspberry Pi and the 1090 MHz SDR-antenna, and there’s not much you can do with a 1090 MHz SDR-antenna except to track planes. So I built the receiver anyway.

Would I recommend ADS-B receiver construction to others, even without a WER angle? Definitely. It’s a fun project if A. you’re more interested in airplanes than the average person and B. you want to mess around with a micro-computer. Other folks feed their data to FlightAware and ADS-B Exchange, as well. Regardless of which flight-tracking service you choose to feed, just make sure you give back — consider feeding your data to OpenSky!

The following links are excellent general-purpose resources for those interested:

OpenSky instructions:

https://opensky-network.org/community/projects/30-dump1090-feeder

FR24 instructions:

https://www.flightradar24.com/share-your-data

Planefinder instructions:

https://planefinder.net/coverage/client

An excellent FR24 forum guide on how to feed data to multiple sites:

https://forum.flightradar24.com/forum/radar-forums/flightradar24-feeding-data-to-flightradar24/10903-how-to-feed-data-to-multiple-sites-a-brief-guide

Building a Twitter Bot (my 2nd) for Native Plants of Hawaii and Puerto Rico — A Reflection

Another day, another Twitter bot.

Github: https://github.com/liamtrotzuk/island-native-flora-twitter-bot

Twitter:

A little over a month ago, I decided it would be fun to do a new project to practice Python, by using an old project format I’d experimented with a few years ago (a Twitter bot) to share data on a subject I’ve become quite passionate about in the past few years (what flora is native to what parts of the world). After discarding a few ideas, I settled on making 2 different bots — one that tweets flora native to Hawaii, and one that tweets flora native to Puerto Rico. Both island groups are governed by the United States (Hawaii as a state, Puerto Rico as a territory that many would call a colony), so I had their botanical datasets already handy, and both are hotspots of endemic biodiversity, which is to say home to native species found nowhere else on the planet.

Ultimately, my primary goal in creating these bots was to produce yet another tool that encourages folks to plant and grow flora that is native to wherever they live (truly, it’s a killer heuristic to guide your cultivation of plants — whether that’s a single potted plant on your stoop or a vast forested estate). But my secondary goal was to practice Python, and that practice yielded 2 main takeaways. These learnings are probably obvious to folks more seasoned than I am in this kind of thing (making computers do what you want), but it felt worth detailing them nonetheless, in case any other learners stumble across this little writing:

1. Did I get better at Python, or did I just have better data sources and pre-existing code to work from?

The last Twitter bot that I wrote (an account that tweets a fable by Aesop every day) was — looking over it today — garbage. It works, but barely. I wrote it at a time when I knew hardly any Python, and was hoping that sheer force of will and relentless Googling would make something that worked. It ended up working, but only in the sense that a room of monkeys jumping on typewriters for a billion years will eventually replicate the works of Shakespeare or Sun Tzu or (most topically) Aesop. I don’t even like to look at the code anymore — it’s shameful. And while the newer native botanical script isn’t great, it’s a lot neater than its fabulist predecessor. So what changed?

The obvious answer is that I got better at writing Python, but that’s not the whole story. Another part of it is that I simply had better data sources and means to access them, provided by earlier code I wrote a year ago that is still chugging away in the background.

For example: the Hawaii/PR Twitter bots use environmental datasets, refreshed monthly, from one of my Google Cloud Storage buckets. But even though I wrote the bots in Python, these datasets aren’t generated by Python. They’re generated by an older set of R scripts that run monthly from a Google Cloud VM to scrape various USDA plant/NOAA weather/NCSS soil data sources, fix up the data in a particular way, and upload it to that Storage bucket. And when I wrote these scripts, I had no intention of using them to provide data for Twitter bots. In fact, I originally wrote the code so that I could feed it into the US-focused native plant finder web app that I wrote in Shiny. But even when I was writing the code, I had the feeling that it could be useful for other applications besides the initial amateur web app I went on to create. All to say — even when you’re learning a new language (in my case, Python), you can still benefit from the code/data that you wrote/produced in your old language of choice (in my case, R). Just make sure the old code is good enough to keep working most of the time.
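Once that monthly dataset lands in the bucket, the bots’ core step is tiny; a sketch of it, with illustrative field names rather than the actual bucket schema:

```python
import random

def compose_tweet(plant, limit=280):
    """Format one native-plant tweet, truncated to Twitter's character limit."""
    text = (f"{plant['common_name']} ({plant['scientific_name']}) "
            f"is native to {plant['region']}. Plant native!")
    return text[:limit]

# In the real bots, rows like these would come from the Cloud Storage CSVs.
plants = [
    {"scientific_name": "Acacia koa", "common_name": "Koa", "region": "Hawaii"},
    {"scientific_name": "Guaiacum officinale", "common_name": "Lignum vitae",
     "region": "Puerto Rico"},
]
tweet = compose_tweet(random.choice(plants))
```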

It’s possible there’s already a well-established term for that concept of compounding benefits from your own personal body of ‘digital infrastructure’ (corny term, but the best I can do), in the same way that computer people like to talk about ‘tech debt’ — if so, I’d love to learn it. But what struck me about this realization, enough to write about it, is just how fine the line is between A. re-using the code you’ve already written and B. using your improved skills to just write better code. I ultimately had to do both, which leads to Learning 2:

2. No, I actually got better at writing Python.

I can ultimately attribute only 40% of the relative ease of constructing this Twitter bot to the quick access to data provided by code I’d already written. A solid 60% was just because I actually am better at the language (though still pretty garbage). Turns out that practicing a language for 2 years and becoming halfway proficient in a few key packages (pandas, requests, BeautifulSoup) can yield some improvement.

In summary: these learnings are clearly simplistic, but they have been useful revelations nonetheless, and hopefully well-earned in the service of encouraging more folks to plant native.

Building the Nativase Web App — Motivation & Methodology

20% of all requests to one of Wikimedia’s data centers in early February 2021 were for this stock image of the New York aster (Symphyotrichum novi-belgii), a flowering plant endemic to eastern Canada and northeastern America. The culprit: an India-based mobile app that requested the image, but did not display it.
And yet before we take our implements to unfamiliar territory
we must work to ascertain its changing weather and winds’ moods
to learn the ways and habits of that locality —
what’s bound to flourish there, and what to fail.
For here you’ll find a crop of grain, and there grapes growing in thick clusters,
and over yonder young trees thriving and grasses coming into green all on their own.

– Book 2 of Virgil’s Georgics (translated by Peter Fallon)

Newer App: https://nativase.shinyapps.io/Nativase/

Older App: https://nativase.shinyapps.io/main/

Github: https://github.com/liamtrotzuk/nativase

After 6 months of building a web app, it felt like a good time to pause and reflect on what I’ve learned — as someone who undertook a nice project with no formal software development training or interface design background. In summary: web app creation is a very fun process, and I would recommend it to almost anyone. Pick the right datasets and the right presentation, and the actual construction of the app will basically take care of itself, because you’ll be so engaged with your project that any hurdle will be surmountable with enough Googling and pondering of the fastest way you can muddle through whatever problem you’re facing. It’s a great way to spend some free time.

The following is a short summary of my initial motivation for building the app, and a simple methodology for web app development that I refined along the way.

Motivation

Like many folks in 2020, I spent much of the year indoors. And like many folks, I began to reflect on ways to improve the space where I had suddenly begun to spend a disproportionate chunk of my days. A new carpet might be nice; maybe a fresh coat of paint on the walls; perhaps a fun work of art to hang up. But as I considered various ways to brighten my space, 1 particular option began to dominate my imagination: furnishings with life. Plants — green, growing things to fill my living space. Floral fractals to ease the eyes. Steps toward solarpunk.

I’m not the only one. The pandemic life of 2020 and accompanying social distancing across much of America led to an unprecedented boom in gardening and plant care. Garden centers and seed suppliers experienced a steep rise in demand. Concerned by the long lines and empty shelves at grocery stores in March/April, would-be Thoreaus planted pandemic ‘victory gardens’ to grow at least a little food for themselves, thus uncoupling even slightly from the just-in-time supply chains of modern supermarkets. The Wall Street Journal published an article on the red-hot trade in rare plants.

Like many of the pandemic’s newly minted plant parents, I had never been a huge ‘plant guy’ in my pre-2020 days. I always considered plants to be a needless chore to buy (stems/fronds/leaves are delicate), to plant (soil is messy), and to tend (houseplants need to be carefully watered and sunned). But growing things has been a human birthright for the past 12,000 years, and there are certain advantages to approaching a hobby a bit later in life than others, with a completely fresh and naive set of eyes. So I began to carefully weigh the costs and benefits of different types of gardening, planning from the ground up. Would I grow plants inside in pots or outside in window boxes? Would I buy mature plants or grow from seed? I considered growing herbs for use in the kitchen — basil, thyme, bay leaf. I toyed with the idea of growing plants favored in medieval times for use in powerful potions — sage, betony, deadly nightshade. I thought about buying or building an automated hydroponics system to grow cherry tomatoes or hot peppers.

Then I started thinking about Ulmus americana, the American elm.

The American elm is an elm tree native to the Americas, specifically the eastern coast of North America from Florida to Newfoundland. They are resilient trees that can live for hundreds of years. They are also quite beautiful: a grove of mature American elms shades Central Park’s Mall, a lovely promenade just south of Bethesda Fountain and east of Sheep Meadow where one can have a pleasant stroll in the shadows of those grand old trees. That part of Central Park is encircled by the tall skyscrapers of midtown Manhattan, which leads to a unique effect on those walking: surrounded by majestic trees, the walker may forget — even just for a moment — that they are in the core of an enormous city that is home to 20 million humans in its greater metropolitan area. But then, through the organic wooden limbs of the American elms that surround them, the walker will inevitably glimpse again the tall monolithic shapes of the skyscrapers that ring the park, cutting vertically into the sky all around. It’s a nice contrast.

Unfortunately, Ulmus americana is also an endangered species, now barely surviving in the native land where it once grew in vast forests from the Atlantic to the Mississippi. The primary culprit: Dutch elm disease, a fungus spread by elm bark beetles with its origins in Eurasia, against which American elms have no natural defenses. Dutch elm disease spreads from elm to elm, burning through forests of the native trees with alarming speed. By the end of the 20th century, once-sprawling forests of American elms had been reduced to a mere handful, mainly concentrated in urban groves inside American and Canadian cities where the Dutch elm disease running rampant in forests could not spread so easily. One such grove — one of the largest remaining groves of American elms in the world — shadows the Mall in Central Park, New York, NY.

One only has to think about that fact for more than one second to realize how absurd it is. One of the largest remaining groves of American elms lies not in some vast, untouched swathe of wild forest in the Appalachians, Michigan’s Upper Peninsula, or Nova Scotia, but in the very center of the densely populated urban island at the heart of the largest metropolitan area in the United States of America.

And just like the American elm, many other native American plants — and the native American herbivores that depend upon the plants for food, as well as the native American carnivores that depend on the herbivores for food — are under serious threat.

Wild horses (Equus ferus caballus, from Europe) overgraze vast Western ranges. Kudzu vines (Pueraria, from Asia) strangle Eastern forests. Lionfish (Pterois volitans, from the Indo-Pacific) snarl Caribbean reefs. Most recently, the arrival of Japanese murder hornets in the Pacific Northwest in the spring of 2020 made ripples well beyond the ecological sector, as folks worried about the devastating impact these invasive hornets are known to have upon beehives — which is fair, except that common honeybees are non-native themselves. Most honey in North America is produced by Apis mellifera, the European honeybee (though there are several species of endemic American bees that also produce honey — they remain underutilized by all except for a few beekeepers). While European honeybees face their fair share of threats — including murder hornets and pesticide-induced colony collapse disorder — native bee species face a more dire threat altogether. A report from researchers at the National University of Comahue in Argentina highlighted that fully 1/4 of all documented bee species have not been seen in recent years, suggesting extremely sharp declines in the populations of these species, if not outright extinction in some cases.

As non-native organisms spread, endemic organisms like the American elm retreat. Douglas Tallamy, a professor of ecology at the University of Delaware and a well-known figure in ecological literature who advocates for the planting of native species, writes in his New York Times best-seller Nature’s Best Hope: “Populations of birds, plants, and animals — such as Kirtland’s and golden-cheeked warblers, Florida panther, marbled salamander, spruce grouse, Winkler’s blanketflower, Karner blue butterfly, lynx, bobolink, lake sturgeon, whooping crane, California condor, regal fritillary, rusty blackbird, gopher tortoise, roseate spoonbill, American ginseng, indigo snake, whip-poor-will, Catesby’s lily, Florida scrub jay, dozens of fish and mussel species, and many, many more — became too few in number to perform their vital ecological roles or to withstand normal environmental challenges.” The majority of ecologists believe that this decline in native fauna is directly linked to the decline in native flora.

On a more personal level, New York City — my hometown — was the staging ground for just such a collapse in biodiversity, and on a local level too. Mark Kurlansky, in his nonfiction history work The Big Oyster, describes this jewel of Lenape territory as it existed when Europeans first journeyed to these waters: a teeming, productive ecosystem centered upon the Upper and Lower bays of New York, with rivers so full of fish that they could be pulled from the water by hand, flocks of wild turkeys that shut out the sunshine, and an abundance of predators terrestrial, marine, and avian — including whales, wolves, and eagles — to feed upon the native cornucopia. At the center of it all — facilitating the life and in turn being facilitated by the life — were the massive oyster beds of the area, which fed by filtering the rich water and thereby permitted the continued health of the other inhabitants of the bay. Kurlansky writes: “According to the estimates of some biologists, New York Harbor contained fully half the world’s oysters.”

That’s all gone now, of course. While humpbacks occasionally make headlines by journeying up the Hudson River, and the Billion Oyster project works to restore at least a fraction of the giant oyster populations that used to exist here (among many other promising environmental initiatives in the area), these days we generally think of NYC as a city 1st and an ecosystem 2nd.

Has that begun to change, in NYC and in the US as a whole? If these ecological troubles had all taken place decades or centuries ago, we might assume that Americans would’ve since learned to do better and accordingly adjusted their ways to eat sustainably, reduce pollution, and — of course — plant native, with the hope that endemic species would start to rebound. But Tallamy, who has published several books advocating for a return to planting native species rather than foreign ones, laments the ongoing problem: despite all the evidence, American people keep planting and proliferating non-natives. And the potential damages from this continued practice are piling up. As mentioned, invasive species are a concern, and Tallamy notes that we often plant, water, and fertilize these potential invaders in our own gardens: “about 85% of invasive woody plant species in the United States are escapees from our gardens (Kaufman and Kaufman 2007).” Another issue: non-native plants are often poorly adapted to their surroundings and require large amounts of water and fertilizer to thrive, further taxing local ecosystems. And most subtle of all (but possibly also most harmful) is the opportunity cost of non-natives inhabiting the very space that natives could use. Every acre devoted to flat, beautiful, but ultimately sterile seed lawn is another acre that cannot be used for a rich field of native grasses. Every city block planted with ginkgos rather than native trees is another urban canopy in which caterpillars will not grow and birds will not feed.

To be absolutely fair: obviously non-native plants can have enormous value when cultivated outside of their places of origin, and to pretend otherwise is absurd. Take agriculture. Pest-resistant varieties of African rice have nourished the coastal Americas for hundreds of years. Feng Xiaoyan, a Chinese folk performer known as ‘Sister Potato’, sings about the wonders of the (originally Peruvian) potato in a Beijing-sponsored food campaign to improve crop yields in the PRC. And where would contemporary Italian cuisine be without a certain red vegetable, belonging to the nightshade family and originating in Central America — the tomato? More importantly, those examples lie only in the agricultural world. Exchange of plant matter across borders, deserts, and oceans has led to enormous advances in other industries as diverse as textiles, construction, and pharmaceuticals — to name only a few.

And endemic plants alone are obviously not a silver bullet for preserving biodiversity. Maize is a plant endemic to the Americas, and vast monocultures of the stuff blanket entire states in the Midwest, a relentless optimization of calorie yields that might well be inferior in the long run to a sustainable, diverse polyculture of the sort once cultivated by indigenous peoples of the Americas — maybe even a polyculture that included some non-native plants.

But those American elms, on the brink of extinction but somehow preserved there in Central Park — they really make you think. Climate change, pollution, and the despoilment of the natural world that surrounds us will require wholesale realignments in many of our lives, especially in the industrialized Western nations that produced most of the emissions of the past few centuries and therefore must shoulder most of the responsibility for fixing the problem. There is no 1-step solution — there are more like 1000 steps, steps that must be taken by corporations and individuals alike. But each 1 of those 1000 steps must be taken sooner or later, and there were those native elms doing just fine in the middle of the city, and if I was going to plant (and I’d already decided that I was) — shouldn’t I just go ahead and plant native?

So I did. Butterfly milkweed, New York fern, bottle gentian, cattail sedge, and old man’s whiskers (AKA prairie smoke) to start, with more to come this year and next. It’s an exciting journey that I’ve only just started. I haven’t completely ruled out non-native plants; grown far from their sites of origin and managed carefully, they can still yield great benefits for humans, animals, plants, and other forms of life on Earth. But for the foreseeable future, I’ll be planting native — those species can use the help, and probably will for a while to come.

Still — it didn’t feel like quite enough. I realized pretty quick that my apartment window plants will have limited ecological effect. Some insects may pollinate their flowers, some birds may peck at their berries, and that’s all to the better. But I have no land to leverage — no lawn to turn to native grasses, no backyard to seed with endemic wildflowers, no rolling fields of estate to plant with American elm. There are other, public spaces to use in the city — fire escapes, rooftops, backyards, community gardens, and parks. I’ll keep researching those, getting involved and learning more where I can, but it will take patience and long-term effort.

So, in the meantime, I started thinking about other ways to help out. And my thoughts soon gravitated toward tools to help make the process easier. Already, there are some fantastic existing apps for finding plants endemic to your area — the Native Plants Database from the Lady Bird Johnson Wildflower Center, the Native Plant Finder from the National Wildlife Federation, the Native Plants Database from the Audubon Society, and likely others that I missed. I would strongly encourage anyone interested in native planting to explore all 3 of those top-notch apps (and any apps I missed) — each has its strengths and weaknesses. But something felt missing from all of them.

The missing ingredient was simplicity. These existing apps are phenomenal, but they produce long lists of relatively unsorted native plants — 100s of results that would doubtless be extremely useful to an expert gardener/landscaper, but felt overwhelming for a novice like me just beginning to explore the possibilities. What about manageable suggestions for a newbie, starting small, with a few plants at a time? What plants could I fit in pots or a window box, within the limitations of my apartment? What if I sought specific characteristics in my plants? And when should I plant them?

What about a simple database, with an annual calendar and easy-to-use visualization tools for a newcomer to the field? It’s what I wanted more than anything. But after scouring the Internet for such a tool, I couldn’t find one. And as Toni Morrison once said on the craft of writing books: “If you find a book you really want to read but it hasn’t been written yet, then you must write it.”

The same goes for web apps.

Methodology

This app was an R project from start to finish: I gathered all the underlying data using a mixture of R packages for HTML scraping and FTP services, cleaned the data using mostly libraries within the Tidyverse ecosystem, and constructed the app using the Shiny web framework. Within the Shiny app itself, I made heavy use of the Plotly and Leaflet R libraries. I won’t go over the specifics — those are mostly covered in the app itself or in the Github documentation — but should note that the structure of the app was heavily influenced by the CRAN Explorer, a fantastic Shiny app that was one of the runners-up in the 2019 Shiny Contest. Rather than generating every webpage element from the underlying R, as beginner Shiny apps commonly do, the CRAN Explorer feeds selected R-to-HTML Shiny elements into custom HTML/CSS templates — an approach that allowed for the aesthetic flexibility I required. It’s also just a great app on the user-facing end, cool for not only looking at CRAN metadata but also discovering new R packages!

I learned a large amount from this process, but it would be tedious to go over all the little lessons I learned, so I’ll focus on a larger insight that dawned on me a few months into the project and which has stuck with me ever since. It’s a bit of a convenient narrative, a slightly-too-neat way to sum up the lessons of the app, but I can genuinely say that it’s a rule I’ll utilize on any subsequent web development projects.

In summary: just as back-end tasks are rendered easier by converting qualitative data (strings) to quantitative data (numbers, Boolean values), front-end communication is rendered easier by reducing user-facing info (text) to symbols (images, Sparklines, glyphicons). Alternatively stated: wherever possible, reduce everything to symbols.

That principle is simple enough to understand on the back-end, and it’s basically tautological to remark that digital technologies function on, well, digits. Turning the impossibly complex analog world of reality into a massive amount of carefully calculated numbers, to form an imperfect but pretty good mathematical representation of the physical world, has worked nicely since humans started getting machines to do it algorithmically over the course of the past century (though not without a lot of extremely hard work from many folks much smarter than I am). It’s how computers function at the lowest level of bits and bytes, but it’s also a useful principle while writing a high-level, abstracted programming language like R; even a relative novice like me has learned pretty quick that data wrangling and subsequent calculation are better run on numbers or Boolean values than on a bunch of messy textual data. On the back-end, digitizing everything (where possible) is often best practice — and after a couple years of using R, I’d internalized that lesson by the time I began work on my Shiny app. So I’d started with 1/2 of a guiding principle, sufficient to get started on a clunky but functioning back-end.
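To illustrate the back-end half of the principle — in Python rather than R, and with invented field names and a made-up moisture scale, so this is a sketch of the idea rather than the app’s actual code:

```python
# Digitize messy textual fields up front, so that all downstream
# filtering and calculation runs on clean Booleans and numbers.

raw_rows = [
    {"plant": "Butterfly milkweed", "fire_resistance": "Yes", "moisture": "Low"},
    {"plant": "New York fern",      "fire_resistance": "No",  "moisture": "High"},
    {"plant": "Bottle gentian",     "fire_resistance": "no ", "moisture": "high"},  # messy input
]

MOISTURE_SCALE = {"low": 1, "medium": 2, "high": 3}  # invented scale, for illustration

def digitize(row):
    """Convert the string fields to a Boolean and a number."""
    return {
        "plant": row["plant"],
        "fire_resistance": row["fire_resistance"].strip().lower() == "yes",
        "moisture": MOISTURE_SCALE[row["moisture"].strip().lower()],
    }

clean_rows = [digitize(r) for r in raw_rows]

# Downstream logic is now trivial: no string comparisons in sight.
fire_resistant = [r["plant"] for r in clean_rows if r["fire_resistance"]]
```

The payoff is that the inconsistencies ("no " vs. "No") get resolved exactly once, at the boundary, instead of leaking into every later step.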

But on the front-end of the web app, I was completely lost. And that lack of experience shone through not just in the readability of my code (apologies to anyone who tries to scan my Javascript) or its performance (I’m afraid to check any site speed metrics) but in the end product itself, the beauty and responsiveness of the user interface — the aesthetic of the site. I don’t have a visual eye — I am not a designer. The front-end was always going to be the biggest challenge for me, and I had no guiding principles beyond a general desire to make the site look nice.

But I knew I wanted to convey a lot of information in a neat, packaged fashion — a distinction from the existing native plant apps, which contain an enormous amount of powerful data but throw it at the user in long lists all at once. And a few months into the project, faced with a front-end that was improving but remaining stubbornly messy and text-filled, I had the idea that symbols on the front-end could help aid clarity and efficiency of communication in exactly the same way that numbers helped on the back-end.

Take a simple category (like Fire Resistance) with 2 possible values: “Yes” and “No”. Unlike other categories such as Fire Tolerance, which is measured in degrees of tolerance to fire (“Low”, “High”, and so on), the USDA has gathered only binary data on whether plants have Fire Resistance: they do or they don’t, “Yes” or “No”. Presenting those values to a user immediately creates a problem: the “s” jutting out of the “Yes”, which widens any column whose value is positive. Thus, the challenge: how do you equalize the length of the Boolean values that you present to the end user, such that you can design the interface around columns that will be equally wide regardless of the positive/negative value? “TRUE” and “FALSE” present the same problem as “Yes” and “No”. You cannot present 1 and 0 to a user, even if you wish you could. Most folks would probably understand “Y” and “N”, but doubtless others would be confused. “Yea” and “Nay” are of equal length and so might serve, but your user is unlikely to be an aristocrat from the 17th century. And none of them are any good for a user who can’t read English.

But 2 symbols of equal size — in this case, the checkmark glyphicon and the x glyphicon — work perfectly for predictability of length and height in the design of the interface. It might seem obvious to those with front-end design experience, but it took a month of fruitlessly formatting a carousel in different ways before I arrived at that insight, the implementation of which took a couple hours and immeasurably improved both the readability and visual neatness of the carousel. Compact blocks of positive/negative categorical data for users, unmarred by an extra “s” protruding from the positive value: very tidy, with Lego-like ability to stack upon one another and lay neatly side-by-side without fear that the columns might be uneven.
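The idea fits in a few lines — sketched here in Python for brevity (the app does this in R/Shiny; the class names are Bootstrap 3’s standard check and x glyphicons):

```python
# Render a Boolean as an equal-width HTML glyph instead of "Yes"/"No".
# Bootstrap 3's check (glyphicon-ok) and x (glyphicon-remove) occupy the
# same footprint, so columns stay even regardless of the value.

def bool_to_glyph(value: bool) -> str:
    icon = "glyphicon-ok" if value else "glyphicon-remove"
    return f'<span class="glyphicon {icon}"></span>'

fire_resistance_cell = bool_to_glyph(True)
```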

The implementation of Sparkline to symbolize categorical degree data for a plant’s characteristics soon followed the Boolean data; “Low”, “Medium”, and “High” all have different lengths, but dynamically generated HTML inline charts don’t. And the charts are far more readable at a quick glance. The final effect of the word-to-symbol conversion is a cleaner, prettier interface that is also more efficient at quickly conveying data to a user.
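The degree-data conversion is the same trick one step up — again sketched in Python, with a hypothetical encoding (the real app’s scale may differ):

```python
# Map ordered categorical degrees onto the numbers that an inline
# sparkline chart actually wants to consume.

DEGREE = {"None": 0, "Low": 1, "Medium": 2, "High": 3}  # assumed ordering

def to_spark_values(categories):
    """e.g. ['Low', 'High', 'Medium'] -> [1, 3, 2], ready to chart."""
    return [DEGREE[c] for c in categories]
```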

The best part: these 2 approaches — a mostly numeric back-end and a mostly symbolic front-end — complement one another. For example, the R Sparkline package does not play well with non-numeric data. You can jury-rig it by messing with the underlying Javascript, but that can quickly become frustrating if you don’t know much Javascript. Better to just feed it the numbers that it wants, straight from the back-end.

At the end of the day, anyone with a fraction more experience in either software or user design should feel welcome to roll their eyes at my prescription to reduce everything to the symbolic — yes, it is reductive. It’s not an inviolable rule, and obviously there are places where text is irreplaceable. But for this web app and any others I may build down the line, stressing the symbolic in both front-end and back-end — it’s a decent starting principle.

Momentum

There’s a ton of work yet to be done. The next step is to automate the inputs of the app — I’m currently setting up a Google Cloud VM to automatically run the R scripts that scrape/pull the various USDA/NOAA/NCSS plant, soil, and weather inputs and wrangle them into usable forms for the web app to use. I need to create improved weather models, to better anticipate the conditions in which each special endemic plant thrives — drier conditions for the tapertip onion, wetter conditions for the common persimmon. I need to find climate datasets for Hawaii, Puerto Rico, the US Virgin Islands, and the rest of the USA’s overseas territories. All these challenges and more remain.

On a technical level, I may need to begin looking beyond R as my basis for a user-facing web app, a task for which — despite the most valiant efforts of the brilliant folks behind Shiny — the language does not seem quite optimized in a truly scalable, durable way. Granted, part of that stems from my own weaknesses as an R user. They say that when all you have is a hammer, everything starts to look like a nail — and when all (or most) of what you have is Tidyverse R, everything starts to look like a dataframe. Everything is about applying vectorized functions to 2D matrices. That’s not necessarily a bad thing — as someone who entered R from a business analyst background, where spreadsheets rule supreme and Excel lives adjacent to 2D matrix management tools like Access and SQL, dataframes are about as logical a place to start as any. 2D grids are a powerful mental model that just about anyone can intuitively understand with little training, and their centrality to Tidyverse R is one of the reasons a novice like me was able to struggle my way into learning a little bit of the R language in the 1st place (along with — as any Tidyverse devotee will tell you — the magical syntactic sugar that is the %>% pipe operator and the linearity in coding that it fosters). Still, efficient programming relies on a bigger arsenal of data structures than the dataframe, and web apps may well require more flexible methods of coding. And more broadly — as great of a tool as Shiny is — it’s among the lightest web frameworks out there, lacking some of the powerful features common to Python or Ruby frameworks that might be useful for future improvements to the app, including email functionality.

Or maybe I will keep using R: optimize my existing code rather than re-write from scratch in a new language, take advantage of more Shiny features, and automate the back-end of my project. Whatever works best to keep the site up, running, and constantly improving. That’s the great thing about starting with a powerful motivation (like the desire to help folks plant endemic) and figuring out the methodology later on — you’ll keep on adjusting, tinkering, developing. In short, making it work.

So find a good motivation, a hobby, a skill — then find a way to scale it. If you happen to be a novice gardener, plant native! If you happen to be a novice developer, build a web app. And if you’re both? Do both.

AesopFableBot – A Twitter Bot

Twitter link: https://twitter.com/AesopFableBot
Github link: https://github.com/liamtrotzuk/aesop-twitter-bot/blob/master/main.py

Having been on Twitter for a few months and having greatly enjoyed classic literature tweeted by bots like SapphoBot, which tweets fragments of Sappho’s poetry, or MobyDickatSea, which tweets random lines from Moby Dick (among many other bots covering many other works of literature and many other writers), I finally decided to make my own literature bot. At the time of conception, I had been exploring some of the more obscure tales by Aesop, and decided that his hundreds of fables were a good candidate for a bot to tweet out at a regular cadence.

I constructed the bot in Python. I scraped the underlying fables — translated from Ancient Greek into English by George Fyler Townsend — from a Project Gutenberg webpage using Beautiful Soup.

Twitter access was obtained via Tweepy, which grants easy-to-use access to the Twitter API with credentials obtained via Twitter’s Developer Access.

Fables were broken down into their titles — the initial tweet — and then strings of complete sentences (delimited by periods) such that the total number of characters in each string of sentences did not exceed 280. In the rare instance that a single sentence exceeded 280 characters, that sentence was broken down into strings of complete clauses (delimited by commas) such that the total number of characters in each string of clauses did not exceed 280.
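That splitting logic can be sketched roughly as follows — a simplified version of what the real script at the Github link above does (the actual bot also handles titles, threading, and BeautifulSoup quirks):

```python
LIMIT = 280  # Twitter's character limit

def pack(units, sep=" "):
    """Greedily join units into chunks that stay within the tweet limit."""
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current}{sep}{unit}" if current else unit
        if len(candidate) <= LIMIT:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks

def split_fable(text):
    """Break a fable into tweet-sized strings of complete sentences,
    falling back to comma-delimited clauses for any oversized sentence."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    units = []
    for s in sentences:
        if len(s) <= LIMIT:
            units.append(s)
        else:  # rare: a single sentence over 280 characters
            units.extend(c.strip() + "," for c in s.split(",") if c.strip())
    return pack(units)
```

A short fable collapses into a single tweet, while a long one becomes an ordered list of tweet-sized chunks ready to be posted as a thread.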

The task turned out to have a few unexpected quirks.

Quirk 1: certain fables (no apparent rhyme or reason as to which) had a secondary paragraph — the punchy ‘moral’ that is explicitly stated at the end of some fables, e.g. ‘slow and steady wins the race’ for The Tortoise & the Hare — that was an entirely separate <p> BeautifulSoup object, requiring an additional step in the loop.

Quirk 2: 3 of the 311 total fables on Project Gutenberg used a word that used to be a synonym for ‘bundle’ but has since become an offensive term for LGBTQ folks. Even with the accompanying context for the word in each fable, I thought it best to change every instance of the word to ‘bundle’ in order to avoid any needless confusion or hurt.

Quirk 3: strange formatting for one of the fables rendered a textual line from a fable paragraph as an <h2> line, which meant that my script would interpret it as a fable title. The error would only occur 1/311 days, but it would still look ugly. I told the bot to ignore that particular line by exempting all objects of text containing the word ‘grievances’, which is a word that only appears in that single misformatted line.

The script was hosted on a free Google Cloud Virtual Machine running Linux, with the bot’s Python script scheduled to run at 1 PM EST every day. Feel free to follow at the above Twitter link if you’re interested in a daily dose of Ancient Greek morality dispensed by talking animal characters!
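For anyone curious about the scheduling piece: on a Linux VM, a single crontab entry suffices. A hypothetical example (the interpreter path and script path below are illustrative, and the hour assumes the VM clock is set to Eastern time):

```
# minute  hour  day-of-month  month  day-of-week  command
0 13 * * * /usr/bin/python3 /home/user/aesop-twitter-bot/main.py
```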