Root/paths.hh

#ifndef __PATHS_HH__
#define __PATHS_HH__

// Copyright (C) 2005 Nathaniel Smith <njs@pobox.com>
//
// This program is made available under the GNU GPL version 2.0 or
// greater. See the accompanying file COPYING for details.
//
// This program is distributed WITHOUT ANY WARRANTY; without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE.

// safe, portable, fast, simple path handling -- in that order.
// but they all count.
//
// this file defines the vocabulary we speak in when dealing with the
// filesystem. this is an extremely complex problem by the time one worries
// about normalization, security issues, character sets, and so on;
// furthermore, path manipulation has historically been a performance
// bottleneck in monotone. so the goal here is the efficient implementation
// of a design that makes it hard or impossible to introduce as many classes
// of bugs as possible.
//
// Our approach is to have three different types of paths:
//   -- system_path
//      this is a path to anywhere in the fs. it is in native format. it is
//      always absolute. when constructed from a string, it interprets the
//      string as being relative to the directory that monotone was run in.
//      (note that this may be different from monotone's current directory,
//      as when run in a workspace, monotone chdir's to the project root.)
//
//      one can also construct a system_path from one of the below two types
//      of paths. this is intelligent, in that it knows that these sorts of
//      paths are considered to be relative to the project root. thus
//        system_path(file_path_internal("foo"))
//      is not, in general, the same as
//        system_path("foo")
//
//   -- file_path
//      this is a path representing a versioned file. it is always
//      a fully normalized relative path, that does not escape the project
//      root. it is always relative to the project root.
//      you cannot construct a file_path directly from a string; you must pick
//      a constructor:
//        file_path_internal: use this for strings that come from
//          "monotone-internal" places, e.g. parsing revisions. this turns on
//          stricter checking -- the string must already be normalized -- and
//          is extremely fast. such strings are interpreted as being relative
//          to the project root.
//        file_path_external: use this for strings that come from the user.
//          these strings are normalized before being checked, and if there is
//          a problem trigger N() invariants rather than I() invariants. if in
//          a workspace, such strings are interpreted as being
//          _relative to the user's original directory_.
//          if not in a workspace, strings are treated as referring to some
//          database object directly.
//      file_path's also provide optimized splitting and joining
//      functionality.
//
//   -- bookkeeping_path
//      this is a path representing something in the _MTN/ directory of a
//      workspace. it has the same format restrictions as a file_path,
//      except instead of being forbidden to point into the _MTN directory, it
//      is _required_ to point into the _MTN directory. the one constructor is
//      strict, and analogous to file_path_internal. however, the normal way
//      to construct bookkeeping_path's is to use the global constant
//      'bookkeeping_root', which points to the _MTN directory. Thus to
//      construct a path pointing to _MTN/options, use:
//        bookkeeping_root / "options"
//
// All path types should always be constructed from utf8-encoded strings.
//
// All path types provide an "operator /" which allows one to construct new
// paths pointing to things underneath a given path. E.g.,
//   file_path_internal("foo") / "bar" == file_path_internal("foo/bar")
//
// All path types subclass 'any_path', which provides:
//   -- emptiness checking with .empty()
//   -- a method .as_internal(), which returns the utf8-encoded string
//      representing this path for internal use. for instance, this is the
//      string that should be embedded into the text of revisions.
//   -- a method .as_external(), which returns a std::string suitable for
//      passing to filesystem interface functions. in practice, this means
//      that it is recoded into an appropriate character set, etc.
//   -- an operator<< for ostreams. this should always be used when writing
//      out paths for display to the user. at the moment it just calls one
//      of the above functions, but this is _not_ correct. there are
//      actually 3 different logical character sets -- internal (utf8),
//      user (locale-specific), and filesystem (locale-specific, except
//      when it's not, i.e., on OS X). so we need three distinct operations,
//      and you should use the correct one.
//
//      all this means that when you want to print out a path, you usually
//      want to just say:
//        F("my path is %s") % my_path
//      i.e., nothing fancy necessary; for purposes of F() just treat it as
//      if it were a string.

class any_path;
class file_path;
class roster_t;
class utf8;

// A path_component is one component of a path. It is always utf8, may not
// contain either kind of slash, and may not be a magic directory entry ("."
// or ".."). It _may_ be the empty string, but you only get that if you ask
// for the basename of the root directory. It resembles, but is not, a
// vocab type.

class path_component
{
public:
  path_component() : data() {}
  explicit path_component(utf8 const &);
  explicit path_component(std::string const &);
  explicit path_component(char const *);

  std::string const & operator()() const { return data; }
  bool empty() const { return data.empty(); }
  bool operator<(path_component const & other) const
  { return data < other(); }
  bool operator==(path_component const & other) const
  { return data == other(); }
  bool operator!=(path_component const & other) const
  { return data != other(); }

  friend std::ostream & operator<<(std::ostream &, path_component const &);

private:
  std::string data;

  // constructor for use by trusted operations. bypasses validation.
  // note: 'stop' is forwarded to std::string::substr as its length
  // argument, so it acts as an end offset only when 'start' is zero.
  path_component(std::string const & path,
                 std::string::size_type start,
                 std::string::size_type stop = std::string::npos)
    : data(path.substr(start, stop))
  {}

  friend class any_path;
  friend class file_path;
  friend class roster_t;
};
std::ostream & operator<<(std::ostream &, path_component const &);
template <> void dump(path_component const &, std::string &);

// It's possible this will become a proper virtual interface in the future,
// but since the implementation is exactly the same in all cases, there isn't
// much point ATM...
class any_path
{
public:
  // converts to native charset and path syntax
  // this is a path that you can pass to the operating system
  std::string as_external() const;
  // leaves as utf8
  std::string const & as_internal() const
  { return data; }
  bool empty() const
  { return data.empty(); }
  // returns the trailing component of the path
  path_component basename() const;

  // a few places need to manipulate any_paths (notably the low-level stuff
  // in file_io.cc).
  any_path operator /(path_component const &) const;
  any_path dirname() const;

  any_path(any_path const & other)
    : data(other.data) {}
  any_path & operator=(any_path const & other)
  { data = other.data; return *this; }

protected:
  std::string data;
  any_path() {}

private:
  any_path(std::string const & path,
           std::string::size_type start,
           std::string::size_type stop = std::string::npos)
  {
    data = path.substr(start, stop);
  }
};

std::ostream & operator<<(std::ostream & o, any_path const & a);

class file_path : public any_path
{
public:
  file_path() {}
  // join a file_path out of pieces
  file_path operator /(path_component const & to_append) const;
  file_path operator /(file_path const & to_append) const;

  // these functions could be defined on any_path but are only needed
  // for file_path, and not defining them for system_path gets us out
  // of nailing down the semantics near the absolute root.

  // returns a path with the last component removed.
  file_path dirname() const;

  // does dirname() and basename() at the same time, for efficiency
  void dirname_basename(file_path &, path_component &) const;

  // returns the number of /-separated components of the path.
  // The empty path has depth zero.
  unsigned int depth() const;

  // ordering...
  bool operator==(const file_path & other) const
  { return data == other.data; }

  bool operator!=(const file_path & other) const
  { return data != other.data; }

  // the ordering on file_path is not exactly that of strings.
  // see the "ordering" unit test in paths.cc.
  bool operator <(const file_path & other) const
  {
    std::string::const_iterator p = data.begin();
    std::string::const_iterator plim = data.end();
    std::string::const_iterator q = other.data.begin();
    std::string::const_iterator qlim = other.data.end();

    while (p != plim && q != qlim && *p == *q)
      p++, q++;

    if (p == plim && q == qlim) // equal -> not less
      return false;

    // must do end of string before everything else, or 'foo' will sort
    // after 'foo/bar' which is not what we want.
    if (p == plim)
      return true;
    if (q == qlim)
      return false;

    // the only special case needed is that / sorts before everything -
    // this gives the effect of component-by-component comparison.
    if (*p == '/')
      return true;
    if (*q == '/')
      return false;

    // ensure unsigned comparison
    return static_cast<unsigned char>(*p) < static_cast<unsigned char>(*q);
  }

  void clear() { data.clear(); }

private:
  typedef enum { internal, external } source_type;
  // input is always in utf8, because everything in our world is always in
  // utf8 (except interface code itself).
  // external paths:
  //   -- are converted to internal syntax (/ rather than \, etc.)
  //   -- normalized
  //   -- assumed to be relative to the user's cwd, and munged
  //      to become relative to root of the workspace instead
  // internal and external paths:
  //   -- are confirmed to be normalized and relative
  //   -- not to be in _MTN/
  file_path(source_type type, std::string const & path);
  file_path(source_type type, utf8 const & path);
  friend file_path file_path_internal(std::string const & path);
  friend file_path file_path_external(utf8 const & path);

  // private substring constructor, does no validation. used by dirname()
  // and operator/ with a path_component.
  file_path(std::string const & path,
            std::string::size_type start,
            std::string::size_type stop = std::string::npos)
  {
    data = path.substr(start, stop);
  }

  // roster_t::get_name is allowed to use the private substring constructor.
  friend class roster_t;
};

// these are the public file_path constructors
inline file_path file_path_internal(std::string const & path)
{
  return file_path(file_path::internal, path);
}
inline file_path file_path_external(utf8 const & path)
{
  return file_path(file_path::external, path);
}

class bookkeeping_path : public any_path
{
public:
  bookkeeping_path() {}
  // path _should_ contain the leading _MTN/
  // and _should_ look like an internal path
  // usually you should just use the / operator as a constructor!
  bookkeeping_path(std::string const &);
  bookkeeping_path operator /(char const *) const;
  bookkeeping_path operator /(path_component const &) const;

  // exposed for the use of walk_tree and friends
  static bool internal_string_is_bookkeeping_path(utf8 const & path);
  static bool external_string_is_bookkeeping_path(utf8 const & path);
  bool operator==(const bookkeeping_path & other) const
  { return data == other.data; }

  bool operator <(const bookkeeping_path & other) const
  { return data < other.data; }

private:
  bookkeeping_path(std::string const & path,
                   std::string::size_type start,
                   std::string::size_type stop = std::string::npos)
  {
    data = path.substr(start, stop);
  }
};

// these are #defines so that they will be constructed lazily, when
// used. this is necessary for correct behavior; the path constructors
// use sanity.hh assertions and therefore must not run before
// sanity::initialize is called.

#define bookkeeping_root (bookkeeping_path("_MTN"))
#define bookkeeping_root_component (path_component("_MTN"))
// for migration
#define old_bookkeeping_root_component (path_component("MT"))

// this will always be an absolute path
class system_path : public any_path
{
public:
  system_path() {}
  system_path(system_path const & other) : any_path(other) {}

  // the optional argument takes some explanation. this constructor takes a
  // path relative to the workspace root. the question is how to interpret
  // that path -- since it's possible to have multiple workspaces over the
  // course of the program's execution (e.g., if someone runs 'checkout'
  // while already in a workspace). if 'true' is passed (the default),
  // then monotone will trigger an invariant if the workspace changes after
  // we have already interpreted the path relative to some other working
  // copy. if 'false' is passed, then the path is taken to be relative to
  // whatever the current workspace is, and will continue to reference it
  // even if the workspace later changes.
  explicit system_path(any_path const & other,
                       bool in_true_workspace = true);
  // this path can contain anything, and it will be absolutified and
  // tilde-expanded. it will be considered to be relative to the directory
  // monotone started in. it should be in utf8.
  system_path(std::string const & path);
  system_path(utf8 const & path);

  bool operator==(const system_path & other) const
  { return data == other.data; }

  bool operator <(const system_path & other) const
  { return data < other.data; }

  system_path operator /(path_component const & to_append) const;
  system_path operator /(char const * to_append) const;
  system_path dirname() const;

private:
  system_path(std::string const & path,
              std::string::size_type start,
              std::string::size_type stop = std::string::npos)
  {
    data = path.substr(start, stop);
  }
};

template <> void dump(file_path const & sp, std::string & out);
template <> void dump(bookkeeping_path const & sp, std::string & out);
template <> void dump(system_path const & sp, std::string & out);

// record the initial path. must be called before any use of system_path.
void
save_initial_path();

// returns true if workspace found, in which case cwd has been changed
// returns false if workspace not found
bool
find_and_go_to_workspace(std::string const & search_root);

// this is like change_current_working_dir, but also initializes the various
// root paths that are needed to interpret paths
void
go_to_workspace(system_path const & new_workspace);

void mark_std_paths_used(void);

// Local Variables:
// mode: C++
// fill-column: 76
// c-file-style: "gnu"
// indent-tabs-mode: nil
// End:
// vim: et:sw=2:sts=2:ts=2:cino=>2s,{s,\:s,+s,t0,g0,^-2,e-2,n-2,p2s,(0,=s:

#endif
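
The ordering implemented by file_path::operator< is easier to see outside the class. What follows is a minimal sketch, not part of paths.hh: it restates the same comparison on plain std::string values so it can be compiled on its own. The name path_less is hypothetical and introduced only for illustration; the real operator works on the private data member of a validated file_path.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone helper (not in monotone): the same steps as
// file_path::operator<, applied to two already-normalized path strings.
bool path_less(std::string const & a, std::string const & b)
{
  std::string::const_iterator p = a.begin(), plim = a.end();
  std::string::const_iterator q = b.begin(), qlim = b.end();

  // skip the common prefix
  while (p != plim && q != qlim && *p == *q)
    ++p, ++q;

  if (p == plim && q == qlim) // equal -> not less
    return false;

  // a path that is a strict prefix of the other sorts first,
  // so "foo" comes before "foo/bar"
  if (p == plim)
    return true;
  if (q == qlim)
    return false;

  // '/' sorts before every other character, which gives the effect
  // of comparing component by component
  if (*p == '/')
    return true;
  if (*q == '/')
    return false;

  // otherwise compare the differing bytes as unsigned values
  return static_cast<unsigned char>(*p) < static_cast<unsigned char>(*q);
}
```

With this helper, path_less("foo", "foo/bar") and path_less("foo/bar", "foo-bar") both hold. The second case is where the ordering departs from plain string comparison: '-' (0x2d) is smaller than '/' (0x2f) as a byte, so std::string would place "foo-bar" before "foo/bar", separating the directory "foo" from its children.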
