I decided to post some info on a scrolling system developed for STE games some time ago. This doesn't exactly preclude demos - but I was thinking mainly about games and prototyping when I came up with it. It also lets me digest a bunch of related things in one place.
Note: The same principles can be applied to the STF, but without a 4-bit (non-overscan) syncscroll and plenty of memory it is quite a bit more limiting. If a stable 4-bit syncscroll is figured out and made available at some point, I can integrate it.

The system was written in a mixture of C and 68k. The 'game mainloop' and playfield logic is C. The intensive/drawing parts are 68k. The main rationale was making it easy to experiment and adapt to different game and display formats with minimal changes. Specific techniques have been used to ensure the C keeps pace with the 68k parts but I won't describe those here.
Before I get any flames about using C for something like this, you can be sure the cost is counted in scanlines, not VBLanks. If every cycle is needed for a 1VBL game, then sure, the C parts should be converted - probably at the expense of configurability / prototyping friendliness though.
Diagrams are attached showing the various playfield modes and how the display is made up for different scenarios.
A summary of features:
- 8 way (or 4-way) scrolling
- horizontal-only scrolling
- vertical-only scrolling
- horizontal scrolling with vertical nudge-margin (up to 1 screen of margin)
- vertical scrolling with horizontal nudge-margin (up to 1 screen of margin)
- scroll rate from 0 to +/-16 pixels per frame on each axis
- map size up to 32768 x 32768 pixels
- highly configurable for different kinds of game
- very low drawing cost
- optional wraparound maps
- double-buffering or physical-only display
- dual-field colour boosting (with or without double-buffering, and allows scroll rate of 0)
Note: nudge-margin scrolling could be described as 'limited 8-way', but it is cheaper than full 8-way if the scroll range is short, since fewer tile updates are required. It also has a flatter cost for a rectangular display.
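To make the scroll-rate and map-size limits in the list above concrete, here is a minimal sketch of a per-axis scroll update. It is not the engine's code: the `scroll_axis` struct, its field names and `scroll_axis_step` are made up for illustration. Only the +/-16 pixels per frame limit, the 32768 pixel map size and the optional wraparound come from the feature list.

```c
#include <stdint.h>

#define MAX_SCROLL_STEP 16   /* pixels per frame, per axis (from the feature list) */

/* Hypothetical per-axis scroll state; names are illustrative only. */
typedef struct {
    int32_t pos;        /* current scroll position in pixels */
    int32_t map_size;   /* map extent on this axis in pixels (up to 32768) */
    int32_t view_size;  /* visible extent on this axis in pixels */
    int     wrap;       /* nonzero for a wraparound map */
} scroll_axis;

/* Advance one axis by 'delta' pixels and return the new position. */
static int32_t scroll_axis_step(scroll_axis *a, int32_t delta)
{
    /* Limit the step to the supported scroll rate. */
    if (delta >  MAX_SCROLL_STEP) delta =  MAX_SCROLL_STEP;
    if (delta < -MAX_SCROLL_STEP) delta = -MAX_SCROLL_STEP;

    int32_t p = a->pos + delta;

    if (a->wrap) {
        /* Wraparound map: position is taken modulo the map size. */
        p %= a->map_size;
        if (p < 0) p += a->map_size;
    } else {
        /* Bounded map: keep the view inside the map. */
        int32_t max = a->map_size - a->view_size;
        if (max < 0) max = 0;
        if (p < 0)   p = 0;
        if (p > max) p = max;
    }

    a->pos = p;
    return p;
}
```

A game would call this once per frame for each active axis. An H-only configuration would simply never step the vertical axis.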
The display 'arena' is made up of elements called 'playfields', each of which is a physical image buffer mapped to a virtual tile space. The virtual tile space is a wrap-around window onto the source map tiles (a map tile is typically 16x16 pixels but depends on the game).
If you consider a normal double-buffered display, just replace each buffer with a virtual 'playfield' and you get the rough idea of what's going on.
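A rough sketch of the wrap-around mapping from map tiles to slots in a playfield's tile page follows. The `playfield` struct, the function names and the 21x14 page size mentioned afterwards are assumptions for illustration, not the real layout. Only the 16x16 tile size and the wrap-around behaviour are taken from the description above.

```c
#include <stdint.h>

#define TILE_SHIFT 4   /* 16x16-pixel tiles, the typical case */

/* Hypothetical playfield descriptor: size of the virtual tile page. */
typedef struct {
    uint16_t tiles_w;   /* page width in tiles (must be nonzero) */
    uint16_t tiles_h;   /* page height in tiles (must be nonzero) */
} playfield;

/* Column of the tile page that holds the map tile under pixel 'map_px'. */
static uint16_t slot_x(const playfield *pf, uint16_t map_px)
{
    return (uint16_t)((map_px >> TILE_SHIFT) % pf->tiles_w);
}

/* Row of the tile page that holds the map tile under pixel 'map_py'. */
static uint16_t slot_y(const playfield *pf, uint16_t map_py)
{
    return (uint16_t)((map_py >> TILE_SHIFT) % pf->tiles_h);
}
```

With a page of, say, 21x14 tiles, map tile column 21 lands back in slot 0. A newly exposed column therefore overwrites the one that just left the display instead of the whole buffer being redrawn.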
The size of the physical image buffer depends on the type of scrolling selected - some combination of horizontal and vertical scroll - and the expected scroll range on each axis. It's obviously cheaper to avoid performing updates which are not needed for a specific scroll configuration, so each layout is optimized for a particular combination of scroll features. These attempt to save both memory and time if you know what your game needs. And if you change your mind, it's easy to reconfigure the display after the fact.
The 'loopback' configurations maintain a shadow tile-page in order to jump back by a full screen without a cost spike. The 'scanwalk' configuration allows padding scanlines to be consumed by h-scroll without needing a loopback buffer (this trick is only possible on the H axis, but is nearly always preferred over H-loopback).
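As a rough illustration of why h-scroll can eat into padding scanlines, here is the address arithmetic under some assumed numbers: ST low-res with 4 bitplanes (so each 16-pixel group is 8 bytes) and a 160-byte line stride. The `ste_scroll` struct and `scroll_to_hw` are hypothetical names, the stride actually used may differ, and the programming of the video base and fine-scroll registers is left out.

```c
#include <stdint.h>

#define BYTES_PER_16PX 8u   /* 4 bitplanes x 2 bytes per 16-pixel group */

/* Hypothetical result: where the display starts, plus the fine scroll. */
typedef struct {
    uint32_t base_offset;   /* byte offset of the display start in the buffer */
    uint8_t  fine;          /* 0..15 pixel fine scroll */
} ste_scroll;

/* Convert a scroll position in pixels into a start offset and fine scroll. */
static ste_scroll scroll_to_hw(uint32_t x, uint32_t y, uint32_t line_stride)
{
    ste_scroll s;
    /* Coarse part: whole lines down, whole 16-pixel groups across. */
    s.base_offset = y * line_stride + (x >> 4) * BYTES_PER_16PX;
    /* Fine part: remaining pixels within the 16-pixel group. */
    s.fine = (uint8_t)(x & 15u);
    return s;
}
```

With these numbers, scrolling 320 pixels to the right gives the same start offset as moving down one scanline, so the display start gradually walks through any spare scanlines at the end of the buffer as x grows.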
The number of playfield elements required to define the display arena depends on the display mode (double-buffered or not, colour-boost/interlace or not).
It may seem nuts to have a complete, independent playfield per display element (instead of one playfield with multiple memory buffers), but there are several excellent reasons for this relating to the scroll state, speed and distribution of work. See if you can guess what they are.

I'll be posting some demos later which use the worst case - double-buffering and colour-boosting at the same time - so there are 4 playfields being maintained per VBL, each with one or two virtual tile pages being updated with new tiles. Various tricks are employed to make this happen quickly: the cheapest method with unlimited scroll is H-only, costing a handful of scanlines, and the most expensive is 8-way at ~1/4 frame for unlimited diagonal. More tricks are employed to prevent spiking behaviour, so the cost remains flat for each scroll position. (Having said that, I stopped optimizing the code when it became cheap enough to be concerned with other things, e.g. sprite clearing, so there is room left for sure.)
The colour-boosting feature allows you to obtain 50-100 colours from a single 16-colour palette without raster splits. It involves heavy offline processing of the graphics with a tool and the display arrangement is more complex. It also costs more to maintain the display updates.
This consumes two 'field' elements per display buffer ('front' and 'back' both have 'even' and 'odd' elements). The scrolling engine has to maintain all 4 if the scroll can be interrupted (e.g. halted), even for a 1VBL game, although sprites need to be written only to the synchronized field. For games requiring 2+ VBLs, sprites must be written to 2 fields at once for the colour-boosting to work. If 1VBL drawing is assured and the scroll rate is fixed, the 4 playfields can be reduced to 2 (logical/odd + physical/even).
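The playfield counts can be summarised in a small sketch. The `display_mode` struct and `playfield_count` are hypothetical names. The 4 and the reduced 2 come straight from the text above, while the counts for the simpler modes are inferred as buffers multiplied by fields.

```c
#include <stdbool.h>

/* Hypothetical description of the display mode. */
typedef struct {
    bool double_buffered;   /* front + back buffers */
    bool colour_boost;      /* even + odd fields per buffer */
    bool fixed_1vbl;        /* 1VBL drawing assured and scroll rate fixed */
} display_mode;

/* Number of playfields the scroll engine has to maintain. */
static int playfield_count(display_mode m)
{
    int buffers = m.double_buffered ? 2 : 1;
    int fields  = m.colour_boost    ? 2 : 1;

    /* Special case from the text: 4 playfields reduce to 2
       (logical/odd + physical/even). */
    if (m.double_buffered && m.colour_boost && m.fixed_1vbl)
        return 2;

    return buffers * fields;
}
```

So the worst case (double-buffered with colour-boosting, and a scroll that can halt) needs 4 playfields, and a plain physical-only display needs just 1.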
For more information on the boosting thing on static and moving images, see my original posts here:
http://www.atari-forum.com/viewtopic.php?f=68&t=24166
I'll post a few updated samples by the weekend or early next week. I have posted some earlier demos, but these were more about the colour boost and not much was said about scrolling. The code isn't yet ready for release because some things are still missing, but hopefully I will get something on GitHub or BitBucket before long so people can play with it. (You'll need GCC+VASM cross tools for hacking around with it, and PhotoChrome v6.23+ for the preprocessing of maps direct from .png images, although the map/tile file format is trivial to produce independently.)
Happy STE coding