{mi} is not quite "I/we"; it is "the speaker(s)". Usually there is only one speaker, in which case it means "I". In the special case where there are several speakers it becomes "we", but a "we" that includes only those other speakers.
Lojban is first of all about unambiguous grammar, with that grammar providing a certain small amount of unambiguous semantics. But the difference between ambiguity and vagueness should be kept in mind when studying it. Some things are explicitly left unspecified, such as tense in an unmarked sentence; that is vagueness. Other things could be interpreted in one of several sharply different ways, where the speaker probably only meant one; that is ambiguity. "Time flies like an arrow" is the classic example in English. We try to exclude the latter, not the former. This is part of why there's no default tense. Another reason is that fairly often no tense is desired at all, as you begin to realize with conversational practice.
I agree that the x2 default after the selbri (sumti that follow a selbri with nothing in front of it start at x2, not x1) is a bit of a quirk. It is, however, somewhat convenient in subordinate clauses, because it lets you drop the x1 when the x1 is supplied from outside the clause. For example, in {mi klama lo zarci noi vecnu lo plise}, the relative clause pro-sumti {ke'a} is implicitly in the x1 of {vecnu}, since the x2 is filled and the x1 is not. This is common enough that the quirk is actually pretty useful.
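If it helps to see the rule spelled out mechanically, here is a rough Python sketch of it. The names fill_places and first_empty are made up for illustration, and the model is deliberately simplified: it ignores FA tags, SE conversion, and anything beyond plain untagged sumti, and it assumes the implicit {ke'a} lands in the first unfilled place, as in the example above.

```python
def fill_places(before, after, arity):
    """Assign untagged sumti to numbered places x1..x{arity}.

    Simplified, hypothetical model: no FA tags, no SE conversion.
    """
    places = {n: None for n in range(1, arity + 1)}
    # Sumti in front of the selbri fill x1, x2, ... in order.
    for offset, sumti in enumerate(before):
        places[1 + offset] = sumti
    # Trailing sumti continue from the next place, except that with
    # nothing in front of the selbri they start at x2 (the "quirk").
    start_after = len(before) + 1 if before else 2
    for offset, sumti in enumerate(after):
        places[start_after + offset] = sumti
    return places


def first_empty(places):
    """Return the lowest-numbered unfilled place, or None if all are filled."""
    for n in sorted(places):
        if places[n] is None:
            return n
    return None


# Main bridi: {mi klama lo zarci} (klama has five places).
main = fill_places(["mi"], ["lo zarci"], arity=5)

# Relative clause: {noi vecnu lo plise} (vecnu has four places).
relative = fill_places([], ["lo plise"], arity=4)

# Assumption: the implicit ke'a takes the first unfilled place.
ke_a_place = first_empty(relative)
```

Here {lo plise} ends up in the x2 of {vecnu}, the x1 stays open, and ke_a_place comes out as 1, which matches the reading where the store is the seller.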
Place structure patterns are fairly extensive. They don't seem that way when you don't know very many gismu, but they exist, and they are how you learn words once you know a handful. The most natural analogy I can think of is to inflected languages: Lojban essentially has numbered instead of named cases. Without a SE, the x1 is essentially nominative while the x2 is essentially accusative.
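As a loose illustration of the "numbered cases" idea, the following Python sketch treats a place structure as an ordered list and models a SE word as swapping x1 with another numbered place. The function name convert and the English glosses for the places of {klama} are my own informal labels, not official definitions.

```python
# Each SE-series word swaps x1 with the numbered place shown here.
SE_SWAP = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def convert(places, se_word):
    """Return a new place list with x1 and xN exchanged (hypothetical helper)."""
    n = SE_SWAP[se_word]
    out = list(places)  # copy, so the original ordering is left untouched
    out[0], out[n - 1] = out[n - 1], out[0]
    return out


# Informal glosses for the five places of klama, x1 through x5.
klama = ["goer", "destination", "origin", "route", "means"]

se_klama = convert(klama, "se")
te_klama = convert(klama, "te")
```

With {se}, the destination moves into the x1 slot and the goer into x2, so the "nominative" position now holds what was the "accusative" one; the other places keep their numbers.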