(no subject) [entries|reading|network|archive]
simont

[ userinfo | dreamwidth userinfo ]
[ archive | journal archive ]

Fri 2004-11-05 13:56
LinkReply
[personal profile] simontSat 2004-11-06 00:19
For an ordinary text editor you need one more enhancement, which is to count the newlines and tabs and suchlike in order to generate a node annotation allowing you to jump to a given (line,column) position in a log-time search. Also there's the whole question of Unicode and what you do about it, which I think bears some thought now rather than trying to retrofit it years later...

Currently I have a library-like interface to the main B-tree code. This code directly supports unsorted trees, element counts, splitting and joining, and reference counts and cloning. It also has a callback interface to allow you to specify additional node properties, which is what my hex editor uses to maintain the total-length annotations. (You provide two functions: one which generates a property value from a tree element, and one which merges two property values into a combined one.)

The bit about some of the data blocks being placeholders describing chunks of the original file is the only major bit that isn't available in a useful library form, because that was the bit that was pretty much specific to the hex editor. I'm not even sure if that bit would work well in a conventional text editor, since you tend to need to have already read your input file so that you know where the newlines are.
Link Reply to this | Parent | Thread
[personal profile] sparrowsionSat 2004-11-06 20:32
Hmm, I remember you mentioning this IRL sometime in the past, and my thought then was that for a text editor it'd work best line-oriented, with the data being not not arbitrary chunks of data but lines. Inserting a newline would cause a split etc. But I think you'd have to decide whether character counting or line counting is more important to you—I'd guess the latter, in which case you're actually getting away from byte counting and back to element counting.
Link Reply to this | Parent | Thread
[personal profile] simontSat 2004-11-06 22:44
I'm not so sure. If you use a line as your data element, then you're back to the original problem when the user (almost constantly) wants to insert new text within a line - you've got to store each line in a way that makes it efficient to move everything up by one when a new character is typed, but at the same time makes it easy to get to column N. Arrays are adequate in the normal case (since most files have lines well under 1000 columns if not more like 80), but in a sense the whole point of this B-tree stuff is to avoid being dependent on people conforming to someone's idea of the common case. Every so often some loony will load a file containing an enormous megabyte-line-of-death and then you have to be able to deal with that.

Cut and paste becomes horribly complex as well, if you use a line as your fundamental element, because a selected section can start half way along one line and end half way along another; so you've got to deal with fiddly special cases and splice lines together at each end every time you cut or paste. If you treat the file as basically a collection of bytes with a line/column indexing mechanism on top, then all of this happens completely automatically and you never have to worry about it.
Link Reply to this | Parent
navigation
[ go | Previous Entry | Next Entry ]
[ add | to Memories ]