Commit 13605eb

Updates for version 1.2.17
git-svn-id: https://ruby-msg.googlecode.com/svn/trunk@85 c30d66de-b626-0410-988f-81f6512a6d81
1 parent f9573a2 commit 13605eb

7 files changed (+168 -28 lines)

FIXES

+56
@@ -0,0 +1,56 @@
+FIXES
+
+recent fixes based on importing results into evolution
+
+1. was running into an issue with base64-encoded message/rfc822 attachments displaying
+as empty. sending them without base64 encoding solved the issue (odd).
+
+2. problem with a large percentage of emails not displaying as mime. turned out they were
+all received from blackberry. further, it turned out there were 2 content-type headers,
+"Content-Type", which I add, and "Content-type". normally my override works, but it
+would appear I need to handle it case-insensitively. more tricky: what's the story
+with these? fixing that will probably fix that whole class of issues.
+evolution was renaming my second content type as X-Invalid-Content-Type or something.
+
+3. another interesting one. content-transfer-encoding was set in the transport message
+headers, to base64. i didn't override that, so evolution "decoded" my
+plaintext message into complete garbage.
+fix: delete content-transfer-encoding.
+
+4. added content-location and content-id output in the mime handling of attachments
+to get some inline html/image mails to work properly.
+further, the containing mime content-type must be multipart/related, not multipart/mixed,
+at least for evolution, in order for the images to appear inline.
+could still improve in this area. if someone drags and drops in an image, it may
+be inline in the rtf version, but exchange generates crappy html such that the image
+doesn't display inline. maybe i should correct the html output in these cases, as i'm
+throwing away the rtf version.
+
+5. note you may need wingdings installed. i had a lot of L and J characters appear in messages from
+outlook users. turns out they're smilies in wingdings. i think it's only when word is used
+as the email editor and its autotext is messing things up.
+
+6. still unsure about how to do my "\r" handling.
+
+7. need to join addresses with , instead of ; i think. evolution only shows the
+first one otherwise, it appears, but shows all of them when they are , separated.
+
+8. need to solve ole storage issues with the very large file using extra bat
+stuff.
+
+9. retest a bit on evolution and thunderbird, and release. tested on a corpus
+of >1000 msg files, so should be starting to get pretty good quality.
+
+10. longer term, things fall into a few basic categories:
+
+- non-mail conversions (look further into vcard, ical et al support for other
+types of msg)
+- further tests and robustness for what i handle now. ie, look into corner
+cases covered so far, and work on the mime code. fix random charset encoding
+issues in the various weird mime ways, do header wrapping etc etc.
+check fidelity of conversions, and capture some more properties as headers,
+such as importance, which i don't do yet.
+- fix that named property bug. tidy up warnings, exceptions.
+- extend conversion to make better html.
+this is longer term. as i don't use the rtf, i need to make my html better,
+emulating some rtf things. harder, not important atm.
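Item 7 above concerns recipient separators. Outlook-style display strings separate recipients with `;`, whereas the RFC 5322 `address-list` grammar uses commas. A minimal sketch of the conversion, assuming the input is a plain `;`-separated string; `to_address_list` is a hypothetical helper, not part of ruby-msg, and it does not handle a `;` inside a quoted display name:

```ruby
# Hypothetical helper: turn an Outlook-style "a; b; c" recipient string into an
# RFC 5322 comma-separated address list. Does not handle ';' inside quoted names.
def to_address_list(display)
  display.split(';').map(&:strip).reject(&:empty?).join(', ')
end

puts to_address_list('Alice <alice@example.com>; Bob <bob@example.com>;')
# prints: Alice <alice@example.com>, Bob <bob@example.com>
```

A display name that itself contains a comma, such as `"Smith, John" <j@example.com>`, has to stay quoted in the output; a fuller version would tokenize quoted strings rather than split blindly.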

Rakefile

+1-1
@@ -47,7 +47,7 @@ spec = Gem::Specification.new do |s|
   #s.rubyforge_project = %q{ruby-msg}
 
   s.executables = ['msgtool', 'oletool']
-  s.files = Dir.glob('data/*.yaml') + ['Rakefile', 'README']
+  s.files = Dir.glob('data/*.yaml') + ['Rakefile', 'README', 'FIXES']
   s.files += Dir.glob("lib/**/*.rb")
   s.files += Dir.glob("test/test_*.rb") + Dir.glob("test/*.doc")
   s.files += Dir.glob("bin/*")

bin/msgtool

+33-3
@@ -3,13 +3,23 @@
 require 'optparse'
 require 'rubygems'
 require 'msg'
+require 'time'
+
+def munge_headers mime, opts
+  opts[:header_defaults].each do |s|
+    key, val = s.match(/(.*?):\s+(.*)/)[1..-1]
+    mime.headers[key] = [val] if mime.headers[key].empty?
+  end
+end
 
 def msgtool
-  opts = {:verbose => false, :action => :convert}
+  opts = {:verbose => false, :action => :convert, :header_defaults => []}
   op = OptionParser.new do |op|
     op.banner = "Usage: msgtool [options] [files]"
     op.separator ''
     op.on('-c', '--convert', 'Convert msg files (default)') { opts[:action] = :convert }
+    op.on('-m', '--convert-mbox', 'Convert msg files for mbox usage') { opts[:action] = :convert_mbox }
+    op.on('-d', '--header-default STR', 'Provide a default value for top level mail header') { |hd| opts[:header_defaults] << hd }
     op.separator ''
     op.on('-v', '--[no-]verbose', 'Run verbosely') { |v| opts[:verbose] = v }
     op.on_tail('-h', '--help', 'Show this message') { puts op; exit }
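The new `-d`/`--header-default` option takes strings such as `X-Mailer: msgtool`. As written, `munge_headers` relies on `s.match(...)` succeeding: a value with no whitespace after the colon (`X-Foo:bar`) makes `match` return nil, and the following `[1..-1]` raises NoMethodError. It also compares header names case-sensitively, the same class of problem FIXES item 2 describes. A minimal sketch of a stricter variant, assuming headers are held as a Hash of arrays like `Mime#headers`; `parse_header_default` and `apply_header_defaults` are hypothetical names:

```ruby
# Hypothetical helpers mirroring msgtool's munge_headers, with two changes:
# a clear error for malformed input and case-insensitive "already set" checks.
HEADER_DEFAULT_RX = /\A([^:\s]+):\s*(.*)\z/m

def parse_header_default(str)
  m = HEADER_DEFAULT_RX.match(str) or
    raise ArgumentError, "expected 'Name: value', got #{str.inspect}"
  [m[1], m[2]]
end

# headers: Hash of header name => Array of values (like Mime#headers)
def apply_header_defaults(headers, defaults)
  defaults.each do |s|
    key, val = parse_header_default(s)
    # only fill in a header the message does not already carry, in any case
    next if headers.any? { |k, vals| k.casecmp(key).zero? && !vals.empty? }
    headers[key] = [val]
  end
  headers
end

headers = { 'Subject' => ['hello'] }
apply_header_defaults(headers, ['subject: ignored', 'X-Mailer: msgtool'])
p headers
```

Passing the option twice for the same header keeps the first value, since the second call finds the header already set.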
@@ -22,10 +32,30 @@ def msgtool
   end
   # just shut up and convert a message to eml
   Msg::Log.level = Ole::Log.level = opts[:verbose] ? Logger::WARN : Logger::FATAL
-  if opts[:action] == :convert
+  case opts[:action]
+  when :convert
+    msgs.each do |filename|
+      msg = Msg.open filename
+      mime = msg.to_mime
+      munge_headers mime, opts
+      puts mime.to_s
+    end
+  when :convert_mbox
     msgs.each do |filename|
       msg = Msg.open filename
-      puts msg.to_mime.to_s
+      # could use something from the msg in our from line if we wanted
+      puts "From msgtool@ruby-msg #{Time.now.rfc2822}"
+      mime = msg.to_mime
+      munge_headers mime, opts
+      mime.to_s.each do |line|
+        # we do the append > style mbox quoting (mboxrd i think it's called), as it
+        # is the only one that can be robustly un-quoted. evolution doesn't use this!
+        if line =~ /^>*From /o
+          print '>' + line
+        else
+          print line
+        end
+      end
     end
   end
 end
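The `--convert-mbox` path writes a `From ` separator line and then applies mboxrd-style quoting: any line matching `>*From ` gains one more `>`, so a reader can undo it by stripping one `>` from every line matching `>+From `. The loop uses `String#each`, which iterated lines in Ruby 1.8 but no longer exists in 1.9 and later, where `each_line` is the equivalent. A minimal sketch of both directions, assuming `\n` or `\r\n` line endings; `from_line`, `mboxrd_quote` and `mboxrd_unquote` are hypothetical names. The sketch uses `Time#asctime` for the separator because RFC 4155 describes the traditional `From ` line as carrying a ctime-style timestamp, whereas the commit uses `rfc2822`:

```ruby
# Separator line for a new message in an mbox file; asctime gives the
# traditional "Fri Jun  1 12:00:00 2007" layout.
def from_line(time, sender = 'msgtool@ruby-msg')
  "From #{sender} #{time.asctime}\n"
end

# mboxrd quoting: prefix one extra '>' to every line matching /\A>*From /.
def mboxrd_quote(text)
  text.each_line.map { |line| line =~ /\A>*From / ? '>' + line : line }.join
end

# Reverse of the above: drop one '>' from every line matching /\A>+From /.
def mboxrd_unquote(text)
  text.each_line.map { |line| line =~ /\A>+From / ? line[1..-1] : line }.join
end

body = "From the desk of Bob\n>From an earlier mail\nsee From above\n"
quoted = mboxrd_quote(body)
print from_line(Time.utc(2007, 6, 1, 12, 0, 0)), quoted
```

The round trip works because already-quoted `>From ` lines are quoted again. In the older mboxo variant only bare `From ` lines get a `>`, so a `>From` that was in the original text cannot be told apart from a quoted one.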

bin/oletool

+1-1
@@ -27,7 +27,7 @@ def oletool
     when :tree
       Ole::Storage.open(file) { |ole| puts ole.root.to_tree }
     when :repack
-      Ole::Storage.open(file, &:repack)
+      Ole::Storage.open file, 'r+', &:repack
     end
   end
 end

lib/mime.rb

+3-1
@@ -33,7 +33,7 @@ class Mime
 
   # Create a Mime object using +str+ as an initial serialization, which must contain headers
   # and a body (even if empty). Needs work.
-  def initialize str
+  def initialize str, ignore_body=false
     headers, @body = $~[1..-1] if str[/(.*?\r?\n)(?:\r?\n(.*))?\Z/m]
 
     @headers = Hash.new { |hash, key| hash[key] = [] }
@@ -48,6 +48,8 @@ def initialize str
       @content_type, attrs = Mime.split_header content_type
     end
 
+    return if ignore_body
+
     if multipart?
       if body.empty?
         @preamble = ''
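The new `ignore_body` flag lets `Msg#initialize` in lib/msg.rb parse transport headers that declare a multipart type but carry no usable boundary, without tripping the body parsing. A cut-down sketch of the idea, where `MiniMime` is a hypothetical stand-in rather than the real `Mime` class: headers are split out with the same regex, and the multipart check only runs when `ignore_body` is false. Raising ArgumentError is an assumption for illustration; the comment in lib/msg.rb only says the real class "throws an error":

```ruby
# Hypothetical, cut-down stand-in for lib/mime.rb's Mime, showing where an
# ignore_body flag short-circuits parsing. Not the real class.
class MiniMime
  attr_reader :headers, :body, :boundary

  def initialize(str, ignore_body = false)
    # same header/body split regex as Mime#initialize
    head, @body = $~[1..-1] if str[/(.*?\r?\n)(?:\r?\n(.*))?\Z/m]
    @body ||= ''
    @headers = Hash.new { |hash, key| hash[key] = [] }
    head.to_s.split(/\r?\n/).each do |line|
      key, val = line.split(/:\s*/, 2)
      @headers[key] << val if val
    end

    # headers-only use (e.g. transport headers): stop before touching the body
    return if ignore_body

    type = @headers.fetch('Content-Type', []).first.to_s
    return unless type =~ %r{\Amultipart/}i
    @boundary = type[/boundary="?([^";]+)"?/i, 1]
    raise ArgumentError, 'multipart content without a boundary' unless @boundary
    # the real class would go on to split @body into parts here
  end
end
```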

lib/msg.rb

+45-10
@@ -20,7 +20,7 @@
 #
 
 class Msg
-  VERSION = '1.2.16'
+  VERSION = '1.2.17'
   # we look here for the yaml files in data/, and the exe files for support
   # decoding at the moment.
   SUPPORT_DIR = File.dirname(__FILE__) + '/..'
@@ -72,7 +72,9 @@ def initialize root
     # headers. we may get nothing.
     # and other times, when received from external, we get the full cigar, boundaries
     # etc and all.
-    @mime = Mime.new props.transport_message_headers.to_s
+    # sometimes it's multipart, with no boundaries. that throws an error. so we'll be more
+    # forgiving here
+    @mime = Mime.new props.transport_message_headers.to_s, true
     populate_headers
   end
 
@@ -169,11 +171,28 @@ def populate_headers
       headers['Date'] = [Time.iso8601(time.to_s).rfc2822] if time
     end
 
-    if !headers.has_key?('Message-ID') and props.internet_message_id
-      headers['Message-ID'] = [props.internet_message_id]
-    end
-    if !headers.has_key?('In-Reply-To') and props.in_reply_to_id
-      headers['In-Reply-To'] = [props.in_reply_to_id]
+    # some very simplistic mapping between internet message headers and the
+    # mapi properties
+    # any of these could be causing duplicates due to case issues. the hack in #to_mime
+    # just stops re-duplication at that point. need to move some smarts into the mime
+    # code to handle it.
+    mapi_header_map = [
+      [:internet_message_id, 'Message-ID'],
+      [:in_reply_to_id, 'In-Reply-To'],
+      # don't set these values if they're equal to the defaults anyway
+      [:importance, 'Importance', proc { |val| val.to_s == '1' ? nil : val }],
+      [:priority, 'Priority', proc { |val| val.to_s == '1' ? nil : val }],
+      [:sensitivity, 'Sensitivity', proc { |val| val.to_s == '0' ? nil : val }],
+      # yeah?
+      [:conversation_topic, 'Thread-Topic'],
+      # not sure of the distinction here
+      # :originator_delivery_report_requested ??
+      [:read_receipt_requested, 'Disposition-Notification-To', proc { |val| from }]
+    ]
+    mapi_header_map.each do |mapi, mime, *f|
+      next unless q = val = props.send(mapi) or headers.has_key?(mime)
+      next if f[0] and !(val = f[0].call(val))
+      headers[mime] = [val.to_s]
     end
   end
 
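One thing worth checking in `mapi_header_map.each`: `or` binds more loosely than assignment but more tightly than the `unless` modifier, so the first guard reads as `next unless (val = props.send(mapi)) or headers.has_key?(mime)`. The loop therefore skips only when the property is nil *and* the header is absent. When the header already exists it falls through: a non-nil property overwrites the existing value, and for the unfiltered entries a nil property replaces it with an empty string. The removed code only filled in missing headers, so that appears to be the intent. A minimal sketch of that reading, assuming a plain Hash in place of the property store and comparing header names case-insensitively (FIXES item 2); `MAPI_HEADER_MAP` and `map_mapi_headers` are hypothetical names, and the `Disposition-Notification-To` entry is left out because it depends on `Msg#from`:

```ruby
# Hypothetical table mirroring the commit's mapi_header_map (minus the
# read-receipt entry): [property, header, optional filter returning nil to skip].
MAPI_HEADER_MAP = [
  [:internet_message_id, 'Message-ID'],
  [:in_reply_to_id, 'In-Reply-To'],
  [:importance, 'Importance', proc { |val| val.to_s == '1' ? nil : val }],
  [:priority, 'Priority', proc { |val| val.to_s == '1' ? nil : val }],
  [:sensitivity, 'Sensitivity', proc { |val| val.to_s == '0' ? nil : val }],
  [:conversation_topic, 'Thread-Topic']
]

# props: Hash of property symbol => value; headers: Hash of name => Array.
# Only fills headers that are missing, comparing names case-insensitively.
def map_mapi_headers(props, headers)
  MAPI_HEADER_MAP.each do |mapi, name, filter|
    next if headers.keys.any? { |k| k.casecmp(name).zero? }
    val = props[mapi]
    val = filter.call(val) if filter && !val.nil?
    next if val.nil?
    headers[name] = [val.to_s]
  end
  headers
end
```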
@@ -251,7 +270,15 @@ def to_mime
     unless attachments.empty?
       mime = Mime.new "Content-Type: multipart/mixed\r\n\r\n"
       mime.parts << body
-      attachments.each { |attach| mime.parts << attach.to_mime }
+      # i don't know any better way to do this. need multipart/related for inline images
+      # referenced by cid: urls to work, but don't want to use it otherwise...
+      related = false
+      attachments.each do |attach|
+        part = attach.to_mime
+        related = true if part.headers.has_key?('Content-ID') or part.headers.has_key?('Content-Location')
+        mime.parts << part
+      end
+      mime.headers['Content-Type'] = ['multipart/related'] if related
     end
 
     # at this point, mime is either
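The commit switches the container to `multipart/related` whenever any attachment carries `Content-ID` or `Content-Location`, which is what lets `cid:` URLs (RFC 2392) and `Content-Location` references (RFC 2557) in the HTML body resolve to sibling parts. RFC 2387, which defines `multipart/related`, also specifies a `type` parameter naming the root part's media type, and the commit does not set it. A minimal sketch of the decision with that parameter added, assuming each attachment is represented by its header Hash and the caller knows the root part's type; `container_type` is a hypothetical helper:

```ruby
# Hypothetical helper: choose the container type for a message body plus
# attachments, given each attachment's headers as a Hash.
def container_type(attachment_headers, root_type)
  related = attachment_headers.any? do |h|
    h.key?('Content-ID') || h.key?('Content-Location')
  end
  # RFC 2387: the type parameter names the media type of the root part
  related ? %(multipart/related; type="#{root_type}") : 'multipart/mixed'
end

puts container_type([{ 'Content-ID' => ['<image001@example>'] }], 'text/html')
puts container_type([{ 'Content-Disposition' => ['attachment'] }], 'text/plain')
```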
@@ -269,7 +296,10 @@ def to_mime
     # now that we have a root, we can mix in all our headers
     headers.each do |key, vals|
       # don't overwrite the content-type, encoding style stuff
-      next unless mime.headers[key].empty?
+      next if mime.headers.has_key? key
+      # some new temporary hacks
+      next if key =~ /content-type/i and vals[0] =~ /base64/
+      next if mime.headers.keys.map(&:downcase).include? key.downcase
       mime.headers[key] += vals
     end
     # just a stupid hack to make the content-type header last, when using OrderedHash
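The merge loop now guards against case-variant duplicates (FIXES item 2) by downcasing every existing key on each iteration. The `base64` filter is applied to keys matching `/content-type/i`, although FIXES item 3 describes a stray `Content-Transfer-Encoding: base64` in the transport headers; since `content-transfer-encoding` does not match that pattern, the filter appears to target a different header than the note describes. A minimal sketch of the merge, assuming header Hashes of arrays as in `Mime#headers`: it builds the list of downcased keys once and drops any transport `Content-Transfer-Encoding`, on the reasoning that it describes the original wire encoding rather than the body msgtool regenerates. `merge_transport_headers` is a hypothetical name:

```ruby
# Hypothetical helper: copy headers from source into target without creating
# case-variant duplicates, and without carrying over a transfer encoding that
# described the original message rather than the regenerated body.
def merge_transport_headers(target, source)
  present = target.keys.map(&:downcase)
  source.each do |key, vals|
    down = key.downcase
    next if present.include?(down)
    next if down == 'content-transfer-encoding'
    target[key] = vals.dup
    present << down
  end
  target
end
```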
@@ -389,13 +419,18 @@ def to_mime
     mime = Mime.new "Content-Type: #{mimetype}\r\n\r\n"
     mime.headers['Content-Disposition'] = [%{attachment; filename="#{filename}"}]
     mime.headers['Content-Transfer-Encoding'] = ['base64']
+    mime.headers['Content-Location'] = [props.attach_content_location] if props.attach_content_location
+    mime.headers['Content-ID'] = [props.attach_content_id] if props.attach_content_id
     # data.to_s for now. data was nil for some reason.
     # perhaps it was a data object not correctly handled?
     # hmmm, have to use read here. that assumes that the data is a stream.
     # but if the attachment data is a string, then it won't work. possible?
     data_str = if @embedded_msg
       mime.headers['Content-Type'] = 'message/rfc822'
+      # let's try making it not base64 for now
+      mime.headers.delete 'Content-Transfer-Encoding'
       # not filename. rather name, or something else right?
+      # maybe it should be inline?? i forget attach_method / access meaning
       mime.headers['Content-Disposition'] = [%{attachment; filename="#{@embedded_msg.subject}"}]
       @embedded_msg.to_mime.to_s
     elsif @embedded_ole
@@ -409,7 +444,7 @@ def to_mime
     else
       data.read.to_s
     end
-    mime.body.replace Base64.encode64(data_str).gsub(/\n/, "\r\n")
+    mime.body.replace @embedded_msg ? data_str : Base64.encode64(data_str).gsub(/\n/, "\r\n")
     mime
   end
 
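The embedded-message change (FIXES item 1) is consistent with RFC 2046, section 5.2.1, which permits only `7bit`, `8bit` or `binary` as the transfer encoding of a `message/rfc822` body, so a base64-encoded one may be ignored or mishandled by readers. Everything else still goes out as base64 with CRLF line breaks. A minimal sketch of that split, assuming the caller supplies the raw bytes and the content type; `encode_attachment_body` is a hypothetical name, and `[data].pack('m')` is used because it produces the same output as `Base64.encode64` (60-character lines ending in `\n`) without needing `require 'base64'`:

```ruby
# Hypothetical helper: returns [encoded_body, content_transfer_encoding].
# A nil encoding means "omit the Content-Transfer-Encoding header".
def encode_attachment_body(data, content_type)
  if content_type =~ %r{\Amessage/rfc822\b}i
    # RFC 2046 5.2.1: no base64 (or quoted-printable) for message/rfc822
    [data, nil]
  else
    # same output as Base64.encode64: 60-char lines ending in "\n", then CRLF
    [[data].pack('m').gsub("\n", "\r\n"), 'base64']
  end
end
```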
lib/msg/properties.rb

+29-12
@@ -48,24 +48,28 @@ class Msg
   # There also needs to be a way to look up properties more specifically:
   #
   #   properties[0x0037] # => gets the subject
-  #   properties[PS_MAPI, 0x0037] # => still gets the subject
-  #   properties[PS_PUBLIC_STRINGS, 'Keywords'] # => gets the above categories
+  #   properties[0x0037, PS_MAPI] # => still gets the subject
+  #   properties['Keywords', PS_PUBLIC_STRINGS] # => gets outlook's categories array
   #
-  # The abbreviate versions work by "resolving" the symbols to full keys:
+  # The abbreviated versions work by "resolving" the symbols to full keys:
   #
-  #   properties.resolve :keywords # => [PS_OUTLOOK, 'Keywords']
-  #   properties.resolve :subject # => [PS_MAPI, 0x0037]
+  #   # the guid here is just PS_PUBLIC_STRINGS
+  #   properties.resolve :keywords # => #<Key {00020329-0000-0000-c000-000000000046}/"Keywords">
+  #   # the result here is actually also a key
+  #   k = properties.resolve :subject # => 0x0037
+  #   # it has a guid
+  #   k.guid == Msg::Properties::PS_MAPI # => true
   #
   # = Parsing
   #
   # There are three objects that need to be parsed to load a +Msg+ property store:
   #
-  # 1. The +nameid+ directory (<tt>Properties.parse_nameid</tt>)
+  # 1. The +nameid+ directory (<tt>Properties.parse_nameid</tt>)
   # 2. The many +substg+ objects, whose names should match <tt>Properties::SUBSTG_RX</tt>
   #    (<tt>Properties#parse_substg</tt>)
   # 3. The +properties+ file (<tt>Properties#parse_properties</tt>)
   #
-  # Understanding of the formats is by no means perfect
+  # Understanding of the formats is by no means perfect.
   #
   # = TODO
   #
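The revised comment puts the property code first and the GUID second, as in `properties[0x0037, PS_MAPI]`, which suggests the GUID is an optional trailing argument defaulting to PS_MAPI. A minimal sketch of a store with that calling convention, assuming keys are (code, guid) pairs; `PropKey` and `PropertyStore` are hypothetical stand-ins for ruby-msg's `Key` and `Properties`. The PS_PUBLIC_STRINGS value is the one shown in the comment above, while the PS_MAPI value is an assumption taken from the MAPI property-set documentation:

```ruby
PS_MAPI           = '{00020328-0000-0000-c000-000000000046}' # assumed value
PS_PUBLIC_STRINGS = '{00020329-0000-0000-c000-000000000046}'

# Hypothetical key: numeric code or string name, qualified by a property-set GUID.
PropKey = Struct.new(:code, :guid)

class PropertyStore
  def initialize
    @raw = {}
  end

  # store[code] or store[code, guid]; guid defaults to PS_MAPI
  def [](code, guid = PS_MAPI)
    @raw[PropKey.new(code, guid)]
  end

  # store[code] = value or store[code, guid] = value
  def []=(code, guid = PS_MAPI, value)
    @raw[PropKey.new(code, guid)] = value
  end
end

props = PropertyStore.new
props[0x0037] = 'Quarterly report'
props['Keywords', PS_PUBLIC_STRINGS] = ['work', 'urgent']
p props[0x0037, PS_MAPI]
p props['Keywords', PS_PUBLIC_STRINGS]
```

Struct instances with equal members are `eql?` and hash alike, so a freshly built `PropKey` finds the stored entry.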
@@ -79,7 +83,7 @@ class Msg
   # current greedy-loading approach. still want strings to work nicely:
   #   props.subject
   # but don't want to be loading up large binary blobs, typically attachments, eg
-  #   props.attach_data.
+  #   props.attach_data
   # probably the easiest solution is that the binary "encoding", be to return an io
   # object instead. and you must read it if you want it as a string
   # maybe i can avoid the greedy model anyway? rather than parsing the properties completely,
@@ -98,7 +102,7 @@ class Properties
       0x001f => proc { |obj| Ole::Types::FROM_UTF16.iconv obj.read }, # unicode
       # ascii
       # FIXME hack did a[0..-2] before, seems right sometimes, but for some others it chopped the text. chomp
-      0x001e => proc { |obj| a = obj.read; a[-1] == 0 ? a[0...-2] : a },
+      0x001e => proc { |obj| obj.read.chomp 0.chr },
       0x0102 => proc { |obj| obj.open }, # binary?
       :default => proc { |obj| obj.open }
     }
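The `0x001e` (8-bit string) decoder now uses `chomp 0.chr`. The old code had two problems. Under Ruby 1.8, `a[-1]` returned an Integer, so the NUL test worked, but `a[0...-2]` is an exclusive range that drops *two* characters, the terminator and the last real one, which matches the FIXME's "chopped the text". Under Ruby 1.9 and later, `a[-1]` returns a one-character String, so `a[-1] == 0` is never true. A small sketch contrasting the slices, with `decode_string8` as a hypothetical name:

```ruby
# Hypothetical decoder for the contents of a PT_STRING8 (0x001e) property stream:
# drop a single trailing NUL terminator if present, as chomp(0.chr) does.
def decode_string8(raw)
  raw.chomp("\0")
end

raw = "Subject line\0"
p decode_string8(raw)  # the terminator is removed
p raw[0...-2]          # exclusive range: drops the NUL and the final "e"
p raw[0..-2]           # inclusive range: drops only the NUL
```

If a stream ever carried more than one trailing NUL, `chomp` would remove only the last; `sub(/\0+\z/, '')` would strip them all.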
@@ -133,9 +137,13 @@ class Properties
     attr_reader :unused
     attr_reader :nameid
 
+    # +nameid+ is to provide a way to inherit from parent (needed for property sets for
+    # attachments and recipients, which inherit from the msg itself. what about nested
+    # msg??)
     def initialize
       @raw = {}
       @unused = []
+      @nameid = nil
       # FIXME
       @body_rtf = @body_html = @body = false
     end
@@ -144,7 +152,7 @@ def initialize
     # The parsing methods
     #++
 
-    def self.load obj
+    def self.load obj, ignore=nil
       prop = Properties.new
       prop.load obj
       prop
@@ -154,9 +162,16 @@ def self.load obj
     def load obj
       # we need to do the nameid first, as it provides the map for later user defined properties
       children = obj.children.dup
-      @nameid = if nameid_obj = children.find { |child| child.name == '__nameid_version1.0' }
+      if nameid_obj = children.find { |child| child.name == '__nameid_version1.0' }
         children.delete nameid_obj
-        Properties.parse_nameid nameid_obj
+        @nameid = Properties.parse_nameid nameid_obj
+        # hack to make it available to all msg files from the same ole storage object
+        class << obj.ole
+          attr_accessor :msg_nameid
+        end
+        obj.ole.msg_nameid = @nameid
+      elsif obj.ole
+        @nameid = obj.ole.msg_nameid rescue nil
       end
       # now parse the actual properties. i think dirs that match the substg should be decoded
       # as properties too. 0x000d is just another encoding, the dir encoding. it should match
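`Properties#load` now stashes the parsed `__nameid_version1.0` map on the OLE storage object by opening its singleton class, so property sets loaded later from the same file (attachments, recipients) can fall back to it; the `rescue nil` covers storages on which no msg has defined the accessor yet. A minimal sketch of the same pattern, with a plain `Object` standing in for `Ole::Storage` and hypothetical helper names. It uses `singleton_class.send(:attr_accessor, ...)`, which has the same effect as the commit's `class << obj.ole` block, and `respond_to?` instead of a rescue:

```ruby
# Hypothetical helpers illustrating the commit's per-storage nameid cache.
def share_nameid(ole, nameid)
  # add a reader/writer to this one object only, like `class << obj.ole`
  ole.singleton_class.send(:attr_accessor, :msg_nameid)
  ole.msg_nameid = nameid
end

def inherited_nameid(ole)
  ole.respond_to?(:msg_nameid) ? ole.msg_nameid : nil
end

storage = Object.new        # stand-in for an Ole::Storage instance
other   = Object.new
share_nameid(storage, { 0x8001 => 'Keywords' })  # made-up mapping
p inherited_nameid(storage)
p inherited_nameid(other)   # nil: the accessor only exists on `storage`
```

A Hash keyed by storage object (for example `{}.compare_by_identity`) would avoid modifying the storage at all; the singleton approach has the advantage that the cached map is released together with the storage.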
@@ -310,6 +325,8 @@ def add_property key, value, pos=nil
       elsif real_key = @nameid[key]
         key = real_key
       else
+        # i think i hit these when i have a named property, in the PS_MAPI
+        # guid
         Log.warn "property in named range not in nameid #{key.inspect}"
         key = Key.new key
       end
