Fine-grained memory profiling

Add residual_bytes, peak_bytes, and output_bytes.
Allow ordering, selecting, and filtering by
accelerator_micros/cpu_micros/peak_bytes/residual_bytes/output_bytes.

Also updated the testdata.

PiperOrigin-RevId: 164079214
A. Unique TensorFlower 2017-08-02 21:29:03 -07:00 committed by Benoit Steiner
parent d8689f3241
commit 1848070f47
40 changed files with 1949 additions and 1745 deletions


@@ -106,7 +106,7 @@ _TFProfRoot (--/930.58k params)
 ### Show the most expensive operation types.
 ```
 tfprof> op -select micros,bytes,occurrence -order_by micros
-node name | output bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
+node name | requested bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
 SoftmaxCrossEntropyWithLogits 36.58MB (100.00%, 0.05%), 1.37sec (100.00%, 26.68%), 0us (100.00%, 0.00%), 1.37sec (100.00%, 30.75%), 30|30
 MatMul 2720.57MB (99.95%, 3.66%), 708.14ms (73.32%, 13.83%), 280.76ms (100.00%, 41.42%), 427.39ms (69.25%, 9.62%), 2694|3450
 ConcatV2 741.37MB (96.29%, 1.00%), 389.63ms (59.49%, 7.61%), 31.80ms (58.58%, 4.69%), 357.83ms (59.63%, 8.05%), 4801|6098
@@ -192,7 +192,7 @@ Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
 ******************************************************
 ```
 <left>
-[Timeline](g3doc/graph_timeline.png)
+![Timeline](g3doc/graph_timeline.png)
 </left>
 ```
@@ -213,7 +213,7 @@ pprof -png --nodecount=20 --sample_index=1 <filename>
 ```
 <left>
-[PprofGraph](g3doc/pprof.jpg)
+![PprofGraph](g3doc/pprof.jpg)
 </left>
 ### Feature Request and Bug Report


@@ -48,7 +48,18 @@ In graph view, in means the number of hops in the <b>graph</b>.
 `-min_bytes`: Show nodes that request at least this number of bytes.
-`-min_micros`: Show nodes that spend at least this number of microseconds to run.
+`-min_peak_bytes`: Show nodes using at least this number of bytes during peak memory usage.
+`-min_residual_bytes`: Show nodes that have at least this number of bytes not de-allocated after Compute.
+`-min_output_bytes`: Show nodes that have at least this number of bytes of output (not necessarily allocated by the nodes).
+`-min_micros`: Show nodes that spend at least this number of microseconds to run. It sums
+accelerator_micros and cpu_micros. Note: cpu and accelerator can run in parallel.
+`-min_accelerator_micros`: Show nodes that spend at least this number of microseconds to run on accelerator (e.g. GPU).
+`-min_cpu_micros`: Show nodes that spend at least this number of microseconds to run on CPU.
 `-min_params`: Show nodes that contain at least this number of parameters.
@@ -58,7 +69,7 @@ In graph view, in means the number of hops in the <b>graph</b>.
 `-step`: Show the stats of this step when multiple steps of RunMetadata were added. By default, show the average of all steps.
-`-order_by`: Order the results by [name|depth|bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence]
+`-order_by`: Order the results by [name|depth|bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence]
 `-account_type_regexes`: Account and display the nodes whose types match one of the type regexes specified. tfprof allows users to define extra operation types for graph nodes through the tensorflow.tfprof.OpLogProto proto. Regexes are comma-separated.
@@ -76,7 +87,7 @@ In graph view, in means the number of hops in the <b>graph</b>.
 Notes: See the <b>overview</b> section on how the above options play with each other to decide the output and counting.
 `-select`: Comma-separated list of attributes to show. Supported attributes:
-[bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence|tensor_value|device|op_types|input_shapes].
+[bytes|peak_bytes|residual_bytes|output_bytes|micros|accelerator_micros|cpu_micros|params|float_ops|occurrence|tensor_value|device|op_types|input_shapes].
 `-output`: Output results as stdout, file or timeline.
 The format is ```output_type:key=value,key=value```.
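
As an illustration of how the new flags surface in the C++ API (not part of this change): a minimal sketch that filters on peak memory and orders/selects by the new byte columns. The header paths and the `stats` pointer (a built `TFStats*`) are assumptions; the positional argument order follows the widened `Options` constructor shown later in this commit.

```
#include "tensorflow/core/profiler/internal/tfprof_options.h"
#include "tensorflow/core/profiler/internal/tfprof_stats.h"  // assumed header for TFStats

// Inside namespace tensorflow::tfprof.
// Positional order: max_depth, min_bytes, min_peak_bytes, min_residual_bytes,
// min_output_bytes, min_micros, min_accelerator_micros, min_cpu_micros,
// min_params, min_float_ops, min_occurrence, step, order_by, ...
Options opts(10, 0, /*min_peak_bytes=*/1024, 0, 0, 0, 0, 0, 0, 0, 0, -1,
             "peak_bytes",
             {".*"}, {".*"}, {""}, {".*"}, {""}, false,
             {"bytes", "peak_bytes", "residual_bytes", "output_bytes"},
             "stdout", {});
const GraphNodeProto root = stats->ShowGraphNode("scope", opts);
```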


@@ -15,7 +15,6 @@ Open a Chrome browser, enter URL chrome://tracing and load the timeline file.
 ```
 <left>
-TODO(xpan): Show the image correctly in github.
 ![Timeline](graph_timeline.png)
 </left>
@@ -26,7 +25,7 @@ TODO(xpan): Show the image correctly in github.
 # With op view, it shows you the aggregated output tensor bytes of each
 # operation type.
 tfprof> op -select bytes -order_by bytes
-node name | output bytes
+node name | requested bytes
 Identity 32515.37MB (100.00%, 27.02%)
 FusedBatchNormGrad 10802.14MB (72.98%, 8.98%)
 FusedBatchNorm 10517.52MB (64.01%, 8.74%)
@@ -41,7 +40,7 @@ AddN 2741.49MB (8.56%, 2.28%)
 # With scope view, you can see the operations that output the largest tensors.
 tfprof> scope -order_by bytes -select bytes -min_bytes 100000000
-node name | output bytes
+node name | requested bytes
 _TFProfRoot (--/120356.38MB)
 tower_3/SepConv2d_2b_3x3/separable_conv2d (346.85MB/854.00MB)
 tower_3/SepConv2d_2b_3x3/separable_conv2d/depthwise (507.15MB/507.15MB)
@@ -61,7 +60,7 @@ _TFProfRoot (--/120356.38MB)
 # code view.
 tfprof> code -max_depth 10 -select bytes -order_by bytes -start_name_regexes .*seq2seq.* -min_bytes 1
-node name | output bytes
+node name | requested bytes
 _TFProfRoot (--/74148.60MB)
 seq2seq_attention.py'>:168:run_filename_from...:none (0B/74148.60MB)
 seq2seq_attention.py'>:33:_run_code_in_main:none (0B/74148.60MB)


@@ -47,8 +47,8 @@ class ExpensiveOperationChecker : public Checker {
       fprintf(stderr, "Missing run_meta for %s\n", name().c_str());
       return;
     }
-    Options opts(3, 0, 1, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"}, {},
-                 false, {"micros", "occurrence"}, "none", {});
+    Options opts(3, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, "micros", {".*"}, {".*"},
+                 {}, {".*"}, {}, false, {"micros", "occurrence"}, "none", {});
     const MultiGraphNodeProto root = stats->ShowMultiGraphNode("op", opts);
     if (root.children_size() == 0) {
       return;
@@ -74,8 +74,8 @@ class ExpensiveOperationChecker : public Checker {
       fprintf(stderr, "Missing op_log (code traces) for %s\n", name().c_str());
       return;
     }
-    Options opts(100, 0, 1, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"},
-                 {}, false, {"micros"}, "none", {});
+    Options opts(100, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, "micros", {".*"},
+                 {".*"}, {}, {".*"}, {}, false, {"micros"}, "none", {});
     const MultiGraphNodeProto root = stats->ShowMultiGraphNode("code", opts);
     const MultiGraphNodeProto* node = &root;
     // A trick here is: Usually, codes in library file are usually referenced
@@ -93,8 +93,8 @@ class ExpensiveOperationChecker : public Checker {
   }
   void CheckScopeView(const TFStats* stats) {
-    Options opts(100, 0, 100, 0, 0, 0, -1, "micros", {".*"}, {".*"}, {}, {".*"},
-                 {}, false, {"micros"}, "none", {});
+    Options opts(100, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0, -1, "micros", {".*"},
+                 {".*"}, {}, {".*"}, {}, false, {"micros"}, "none", {});
     const GraphNodeProto root = stats->ShowGraphNode("scope", opts);
     if (root.children_size() == 0) {
       return;
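
For readers tracking the widened positional constructor, here is the first checker call above restated with one argument per line and the field each position maps to (labels follow the field order declared in `Options` in tfprof_options.h; this is illustration only, not code from the change):

```
Options opts(3,          // max_depth
             0,          // min_bytes
             0,          // min_peak_bytes
             0,          // min_residual_bytes
             0,          // min_output_bytes
             1,          // min_micros
             0,          // min_accelerator_micros
             0,          // min_cpu_micros
             0,          // min_params
             0,          // min_float_ops
             0,          // min_occurrence
             -1,         // step (-1: average over all recorded steps)
             "micros",   // order_by
             {".*"}, {".*"}, {}, {".*"}, {},  // account/start/trim/show/hide regexes
             false,                           // account_displayed_op_only
             {"micros", "occurrence"},        // select
             "none", {});                     // output_type, output_options
```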

File diff suppressed because it is too large.




@@ -191,6 +191,13 @@ class Samples {
     } else if (type == kShown[0]) {
       sample_pb->mutable_value()->Add(
           gn->requested_bytes(node->node->step()));
+    } else if (type == kShown[11]) {
+      sample_pb->mutable_value()->Add(gn->peak_bytes(node->node->step()));
+    } else if (type == kShown[12]) {
+      sample_pb->mutable_value()->Add(
+          gn->residual_bytes(node->node->step()));
+    } else if (type == kShown[13]) {
+      sample_pb->mutable_value()->Add(gn->output_bytes(node->node->step()));
     } else if (type == kShown[2]) {
       sample_pb->mutable_value()->Add(gn->parameters());
     } else if (type == kShown[3]) {
@@ -296,9 +303,21 @@ class PprofProfileImpl : public PprofProfile {
             string_table_.GetIndex("CPU execution time."));
       }
     } else if (type == kShown[0]) {
-      sample_type->set_unit(string_table_.GetIndex("bytes"));
+      sample_type->set_unit(string_table_.GetIndex("requested bytes"));
       profile_pb->mutable_comment()->Add(
-          string_table_.GetIndex("Sum of operation output memory."));
+          string_table_.GetIndex("Sum of operation total requested memory."));
+    } else if (type == kShown[11]) {
+      sample_type->set_unit(string_table_.GetIndex("peak bytes"));
+      profile_pb->mutable_comment()->Add(
+          string_table_.GetIndex("Sum of operation peak memory usage."));
+    } else if (type == kShown[12]) {
+      sample_type->set_unit(string_table_.GetIndex("residual bytes"));
+      profile_pb->mutable_comment()->Add(string_table_.GetIndex(
+          "Sum of operation allocated memory after finish."));
+    } else if (type == kShown[13]) {
+      sample_type->set_unit(string_table_.GetIndex("output bytes"));
+      profile_pb->mutable_comment()->Add(
+          string_table_.GetIndex("Sum of operation output size."));
     } else if (type == kShown[2]) {
       sample_type->set_unit(string_table_.GetIndex("count"));
       profile_pb->mutable_comment()->Add(
@@ -370,7 +389,8 @@ const ShowMultiNode* TFCode::ShowInternal(const Options& opts,
   }
   string select = *opts.select.begin();
   if (select != kShown[0] && select != kShown[1] && select != kShown[2] &&
-      select != kShown[3] && select != kShown[9] && select != kShown[10]) {
+      select != kShown[3] && select != kShown[9] && select != kShown[10] &&
+      select != kShown[11] && select != kShown[12] && select != kShown[13]) {
     fprintf(stderr, "pprof doesn't support -select=%s\n", select.c_str());
     return root_.get();
   }
@@ -522,17 +542,37 @@ std::vector<CodeNode*> TFCode::Account(const std::vector<CodeNode*>& roots,
   return act_nodes;
 }
 
-string TFCode::FormatNode(CodeNode* node, const Options& opts, int64 indent) {
+string TFCode::FormatNodeMemory(CodeNode* node, int64 bytes,
+                                int64 total_bytes) const {
+  string memory = FormatMemory(total_bytes);
+  if (node->account) {
+    memory = FormatMemory(bytes) + "/" + memory;
+  } else {
+    memory = "--/" + memory;
+  }
+  return memory;
+}
+
+string TFCode::FormatNode(CodeNode* node, const Options& opts,
+                          int64 indent) const {
   std::vector<string> attrs;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    string memory = FormatMemory(node->proto().total_requested_bytes());
-    if (node->account) {
-      memory = FormatMemory(node->proto().requested_bytes()) + "/" + memory;
-    } else {
-      memory = "--/" + memory;
-    }
-    attrs.push_back(memory);
+    attrs.push_back(FormatNodeMemory(node, node->proto().requested_bytes(),
+                                     node->proto().total_requested_bytes()));
   }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().peak_bytes(),
+                                     node->proto().total_peak_bytes()));
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().residual_bytes(),
+                                     node->proto().total_residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    attrs.push_back(FormatNodeMemory(node, node->proto().output_bytes(),
+                                     node->proto().total_output_bytes()));
+  }
   std::vector<string> time_attrs = FormatTimes(node, opts);
   attrs.insert(attrs.end(), time_attrs.begin(), time_attrs.end());
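
A hypothetical, table-driven restatement of the unit/comment mapping that the new pprof branches encode (the strings are copied from the code above; the index values mirror the appended kShown slots 11-13). This is a standalone illustration, not library code:

```
#include <cstdio>

struct SampleTypeDesc {
  int shown_index;
  const char* unit;
  const char* comment;
};

static const SampleTypeDesc kByteSampleTypes[] = {
    {0, "requested bytes", "Sum of operation total requested memory."},
    {11, "peak bytes", "Sum of operation peak memory usage."},
    {12, "residual bytes", "Sum of operation allocated memory after finish."},
    {13, "output bytes", "Sum of operation output size."},
};

int main() {
  for (const SampleTypeDesc& d : kByteSampleTypes) {
    std::printf("kShown[%d] -> unit=\"%s\" comment=\"%s\"\n", d.shown_index,
                d.unit, d.comment);
  }
  return 0;
}
```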


@@ -79,7 +79,8 @@ class TFCode : public TFMultiShow {
                     const Options& opts, string* display_str,
                     MultiGraphNodeProto* proto, std::vector<uint64>* call_ids);
-  string FormatNode(CodeNode* node, const Options& opts, int64 indent);
+  string FormatNode(CodeNode* node, const Options& opts, int64 indent) const;
+  string FormatNodeMemory(CodeNode* node, int64 bytes, int64 total_bytes) const;
   std::unique_ptr<CodeNode> root_;
   std::unique_ptr<TFMultiGraphNode> graph_root_;


@@ -110,9 +110,11 @@ void ExecStep::AddMemoryStats(const string& dev,
       uint64 output_ptr =
           output.tensor_description().allocation_description().ptr();
       total_output_bytes += output_bytes;
-      output_bytes_[output.slot()] = std::make_pair(output_bytes, output_ptr);
+      output_memory_[output.slot()] = std::make_pair(output_bytes, output_ptr);
     }
   }
+  output_bytes_ = total_output_bytes;
+
   if (step_stat.has_memory_stats()) {
     host_temp_bytes_ += step_stat.memory_stats().host_temp_memory_size();
     host_persistent_bytes_ +=
@@ -122,7 +124,17 @@ void ExecStep::AddMemoryStats(const string& dev,
     accelerator_persistent_bytes_ +=
         step_stat.memory_stats().device_persistent_memory_size();
   }
-  requested_bytes_ = total_output_bytes;
+
+  int64 residual_bytes = 0;
+  int64 requested_bytes = 0;
+  int64 peak_bytes = 0;
+  for (const auto& mem : step_stat.memory()) {
+    residual_bytes += mem.live_bytes();
+    requested_bytes += mem.total_bytes();
+    peak_bytes += mem.peak_bytes();
+  }
+  requested_bytes_ = requested_bytes;
+  residual_bytes_ = residual_bytes;
+  peak_bytes_ = peak_bytes;
 }
 
 void TFGraphNode::AddStepStat(int64 step, const string& device,
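
The new numbers come from the per-allocator memory records in the step stats: `total_bytes` sums into `requested_bytes_`, `peak_bytes` into `peak_bytes_`, and `live_bytes` (still allocated when the op finishes) into `residual_bytes_`, while `output_bytes_` keeps the output-tensor total. A toy standalone rendering of that arithmetic (the struct and the sample values are made up for illustration; field names are copied from the loop above):

```
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for the per-allocator record fields used above.
struct AllocatorRecord {
  int64_t total_bytes;  // all bytes requested from this allocator
  int64_t peak_bytes;   // high-water mark while the op ran
  int64_t live_bytes;   // still allocated after the op finished
};

int main() {
  // Hypothetical op that touched two allocators.
  std::vector<AllocatorRecord> memory = {{4096, 3072, 1024}, {512, 512, 0}};

  int64_t requested_bytes = 0, peak_bytes = 0, residual_bytes = 0;
  for (const AllocatorRecord& mem : memory) {
    requested_bytes += mem.total_bytes;
    peak_bytes += mem.peak_bytes;
    residual_bytes += mem.live_bytes;
  }
  // For this made-up op: requested=4608, peak=3584, residual=1024.
  std::printf("requested=%lld peak=%lld residual=%lld\n",
              (long long)requested_bytes, (long long)peak_bytes,
              (long long)residual_bytes);
  return 0;
}
```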


@@ -51,6 +51,9 @@ class ExecStep {
         latest_end_micros_(0),
         mem_initiated_(false),
         requested_bytes_(0),
+        peak_bytes_(0),
+        residual_bytes_(0),
+        output_bytes_(0),
         host_temp_bytes_(0),
         host_persistent_bytes_(0),
         accelerator_temp_bytes_(0),
@@ -78,14 +81,17 @@ class ExecStep {
   int64 latest_end_micros() const { return latest_end_micros_; }
   int64 requested_bytes() const { return requested_bytes_; }
+  int64 peak_bytes() const { return peak_bytes_; }
+  int64 residual_bytes() const { return residual_bytes_; }
+  int64 output_bytes() const { return output_bytes_; }
   int64 accelerator_temp_bytes() const { return accelerator_temp_bytes_; }
   int64 host_temp_bytes() const { return host_temp_bytes_; }
   int64 accelerator_persistent_bytes() const {
     return accelerator_persistent_bytes_;
   }
   int64 host_persistent_bytes() const { return host_persistent_bytes_; }
-  const std::map<int64, std::pair<int64, uint64>>& output_bytes() const {
-    return output_bytes_;
+  const std::map<int64, std::pair<int64, uint64>>& output_memory() const {
+    return output_memory_;
   }
   int64 allocator_bytes_in_use() const { return allocator_bytes_in_use_; }
@@ -111,8 +117,14 @@ class ExecStep {
   std::set<string> devices_;
   bool mem_initiated_;
-  // Total output bytes requested by the op.
+  // Total bytes requested by the op.
   int64 requested_bytes_;
+  // Total bytes requested by the op and released before op end.
+  int64 peak_bytes_;
+  // Total bytes requested by the op and not released after op end.
+  int64 residual_bytes_;
+  // Total bytes output by the op (not necessarily requested by the op).
+  int64 output_bytes_;
   // Total temporary bytes allocated and released by the op.
   int64 host_temp_bytes_;
   // Total persistent bytes (e.g. variable) allocated by the op.
@@ -122,9 +134,27 @@ class ExecStep {
   // The total number of bytes currently allocated by the allocator if >0.
   int64 allocator_bytes_in_use_;
   // output_idx -> {output_bytes, memory_ptr}
-  std::map<int64, std::pair<int64, uint64>> output_bytes_;
+  std::map<int64, std::pair<int64, uint64>> output_memory_;
 };
 
+#define GRAPH_NODE_BYTES(type)                           \
+  do {                                                   \
+    if (execs_.empty()) {                                \
+      return 0;                                          \
+    }                                                    \
+    if (step >= 0) {                                     \
+      auto exec = execs_.find(step);                     \
+      CHECK(exec != execs_.end()) << "unknown step " << step; \
+      return exec->second.type##_bytes();                \
+    }                                                    \
+                                                         \
+    int64 bytes = 0;                                     \
+    for (const auto& exec : execs_) {                    \
+      bytes += exec.second.type##_bytes();               \
+    }                                                    \
+    return bytes / execs_.size();                        \
+  } while (0)
+
 class TFGraphNode {
  public:
   TFGraphNode(const NodeDef* node)
@@ -270,22 +300,10 @@ class TFGraphNode {
     return total_micros / execs_.size();
   }
 
-  int64 requested_bytes(int64 step) const {
-    if (execs_.empty()) {
-      return 0;
-    }
-    if (step >= 0) {
-      auto exec = execs_.find(step);
-      CHECK(exec != execs_.end()) << "unknown step " << step;
-      return exec->second.requested_bytes();
-    }
-    int64 requested_bytes = 0;
-    for (const auto& exec : execs_) {
-      requested_bytes += exec.second.requested_bytes();
-    }
-    return requested_bytes / execs_.size();
-  }
+  int64 requested_bytes(int64 step) const { GRAPH_NODE_BYTES(requested); }
+  int64 peak_bytes(int64 step) const { GRAPH_NODE_BYTES(peak); }
+  int64 residual_bytes(int64 step) const { GRAPH_NODE_BYTES(residual); }
+  int64 output_bytes(int64 step) const { GRAPH_NODE_BYTES(output); }
 
   int64 all_start_micros(int64 step) const {
     auto exec = execs_.find(step);
@@ -328,11 +346,11 @@ class TFGraphNode {
     CHECK(exec != execs_.end()) << "unknown step " << step;
     return exec->second.host_persistent_bytes();
   }
-  const std::map<int64, std::pair<int64, uint64>>& output_bytes(
+  const std::map<int64, std::pair<int64, uint64>>& output_memory(
       int64 step) const {
     auto exec = execs_.find(step);
     CHECK(exec != execs_.end()) << "unknown step " << step;
-    return exec->second.output_bytes();
+    return exec->second.output_memory();
   }
   int64 allocator_bytes_in_use(int64 step) const {
     auto exec = execs_.find(step);
@@ -427,6 +445,9 @@ class TFMultiGraphNode {
         accelerator_exec_micros_(0),
         cpu_exec_micros_(0),
         requested_bytes_(0),
+        peak_bytes_(0),
+        residual_bytes_(0),
+        output_bytes_(0),
         float_ops_(0),
         parameters_(0) {}
@@ -437,6 +458,10 @@ class TFMultiGraphNode {
     cpu_exec_micros_ = 0;
     requested_bytes_ = 0;
+    peak_bytes_ = 0;
+    residual_bytes_ = 0;
+    output_bytes_ = 0;
+
     float_ops_ = 0;
     parameters_ = 0;
     op_types_.clear();
@@ -460,6 +485,10 @@ class TFMultiGraphNode {
       cpu_exec_micros_ += node->cpu_exec_micros(step);
       requested_bytes_ += node->requested_bytes(step);
+      peak_bytes_ += node->peak_bytes(step);
+      residual_bytes_ += node->residual_bytes(step);
+      output_bytes_ += node->output_bytes(step);
+
       float_ops_ += node->float_ops(step);
       parameters_ += node->parameters();
       if (node->shape().size() > 0) {
@@ -492,6 +521,9 @@ class TFMultiGraphNode {
   int64 cpu_exec_micros() const { return cpu_exec_micros_; }
   int64 requested_bytes() const { return requested_bytes_; }
+  int64 peak_bytes() const { return peak_bytes_; }
+  int64 residual_bytes() const { return residual_bytes_; }
+  int64 output_bytes() const { return output_bytes_; }
   int64 float_ops() const { return float_ops_; }
@@ -540,6 +572,9 @@ class TFMultiGraphNode {
   int64 cpu_exec_micros_;
   int64 requested_bytes_;
+  int64 peak_bytes_;
+  int64 residual_bytes_;
+  int64 output_bytes_;
   int64 float_ops_;
   int64 parameters_;
   std::set<string> devices_;
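
The `GRAPH_NODE_BYTES(type)` macro folds the per-step lookup and multi-step averaging that `requested_bytes(step)` used to spell out (the removed body above). For example, `int64 peak_bytes(int64 step) const { GRAPH_NODE_BYTES(peak); }` expands to roughly the following (a sketch of the expansion, not separate code in the change):

```
int64 peak_bytes(int64 step) const {
  do {
    if (execs_.empty()) {
      return 0;
    }
    if (step >= 0) {                     // a specific step was requested
      auto exec = execs_.find(step);
      CHECK(exec != execs_.end()) << "unknown step " << step;
      return exec->second.peak_bytes();  // type##_bytes() with type = peak
    }
    int64 bytes = 0;                     // step < 0: average over all steps
    for (const auto& exec : execs_) {
      bytes += exec.second.peak_bytes();
    }
    return bytes / execs_.size();
  } while (0);
}
```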


@@ -38,6 +38,10 @@ void ShowNode::ReInit(int64 step) {
   mutable_proto()->set_cpu_exec_micros(node->cpu_exec_micros(step));
   mutable_proto()->set_requested_bytes(node->requested_bytes(step));
+  mutable_proto()->set_peak_bytes(node->peak_bytes(step));
+  mutable_proto()->set_residual_bytes(node->residual_bytes(step));
+  mutable_proto()->set_output_bytes(node->output_bytes(step));
+
   mutable_proto()->set_float_ops(node->float_ops(step));
   mutable_proto()->clear_input_shapes();
@@ -68,6 +72,12 @@ void ShowNode::AggregateTotalStats(ShowNode* node) {
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              node_pb->total_requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        node_pb->total_peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            node_pb->total_residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          node_pb->total_output_bytes());
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         node_pb->total_parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -89,6 +99,13 @@ void ShowNode::AddSelfToTotalStats() {
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              proto().requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        proto().peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            proto().residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          proto().output_bytes());
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         proto().parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -105,6 +122,10 @@ void ShowNode::ResetTotalStats() {
   mutable_proto()->set_total_cpu_exec_micros(0);
   mutable_proto()->set_total_requested_bytes(0);
+  mutable_proto()->set_total_peak_bytes(0);
+  mutable_proto()->set_total_residual_bytes(0);
+  mutable_proto()->set_total_output_bytes(0);
+
   mutable_proto()->set_total_parameters(0);
   mutable_proto()->set_total_float_ops(0);
   mutable_proto()->mutable_children()->Clear();
@@ -135,6 +156,10 @@ bool ShowMultiNode::ReInit(int64 step,
   mutable_proto()->set_cpu_exec_micros(node->cpu_exec_micros());
   mutable_proto()->set_requested_bytes(node->requested_bytes());
+  mutable_proto()->set_peak_bytes(node->peak_bytes());
+  mutable_proto()->set_residual_bytes(node->residual_bytes());
+  mutable_proto()->set_output_bytes(node->output_bytes());
+
   mutable_proto()->set_float_ops(node->float_ops());
   mutable_proto()->set_parameters(node->parameters());
@@ -157,6 +182,13 @@ void ShowMultiNode::AggregateTotalStats(ShowMultiNode* node) {
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              node_pb->total_requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        node_pb->total_peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            node_pb->total_residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          node_pb->total_output_bytes());
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         node_pb->total_parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -174,6 +206,13 @@ void ShowMultiNode::AddSelfToTotalStats() {
   mutable_proto()->set_total_requested_bytes(proto().total_requested_bytes() +
                                              proto().requested_bytes());
+  mutable_proto()->set_total_peak_bytes(proto().total_peak_bytes() +
+                                        proto().peak_bytes());
+  mutable_proto()->set_total_residual_bytes(proto().total_residual_bytes() +
+                                            proto().residual_bytes());
+  mutable_proto()->set_total_output_bytes(proto().total_output_bytes() +
+                                          proto().output_bytes());
   mutable_proto()->set_total_parameters(proto().total_parameters() +
                                         proto().parameters());
   mutable_proto()->set_total_float_ops(proto().total_float_ops() +
@@ -187,6 +226,10 @@ void ShowMultiNode::ResetTotalStats() {
   mutable_proto()->set_total_cpu_exec_micros(0);
   mutable_proto()->set_total_requested_bytes(0);
+  mutable_proto()->set_total_peak_bytes(0);
+  mutable_proto()->set_total_residual_bytes(0);
+  mutable_proto()->set_total_output_bytes(0);
+
   mutable_proto()->set_total_parameters(0);
   mutable_proto()->set_total_float_ops(0);
   mutable_proto()->mutable_children()->Clear();
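
All four byte metrics follow the same roll-up rule as the existing ones: a node's `total_*` value is its own value plus the `total_*` of its children (`AddSelfToTotalStats` plus `AggregateTotalStats` after a `ResetTotalStats`). A toy standalone version of that invariant for peak bytes (the types here are illustrative, not library code):

```
#include <cstdint>
#include <cstdio>
#include <vector>

struct Node {
  int64_t peak_bytes;        // this node's own peak usage
  int64_t total_peak_bytes;  // self plus all descendants
  std::vector<Node*> children;
};

// Mirrors the reset / add-self / aggregate-children sequence above.
void Rollup(Node* n) {
  n->total_peak_bytes = n->peak_bytes;
  for (Node* c : n->children) {
    Rollup(c);
    n->total_peak_bytes += c->total_peak_bytes;
  }
}

int main() {
  Node leaf1{100, 0, {}};
  Node leaf2{50, 0, {}};
  Node root{0, 0, {&leaf1, &leaf2}};
  Rollup(&root);
  std::printf("root total_peak_bytes=%lld\n",
              (long long)root.total_peak_bytes);  // prints 150
  return 0;
}
```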


@@ -211,24 +211,44 @@ int64 TFOp::SearchRoot(const std::vector<OpNode*> nodes,
   return i;
 }
 
+string TFOp::FormatMemoryNode(int64 node_total_bytes, int64 root_total_bytes,
+                              int64 node_bytes) const {
+  double accu_pct = 0.0;
+  double pct = 0.0;
+  if (node_bytes > 0) {
+    accu_pct = 100.0 * node_total_bytes / root_total_bytes;
+    pct = 100.0 * node_bytes / root_total_bytes;
+  }
+  return strings::Printf(
+      "%30s", strings::Printf("%s (%.2f%%, %.2f%%)",
+                              FormatMemory(node_bytes).c_str(), accu_pct, pct)
+                  .c_str());
+}
+
 string TFOp::FormatNode(OpNode* node, OpNode* root, const Options& opts) const {
   std::vector<string> attrs;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    double accu_pct = 0.0;
-    double pct = 0.0;
-    if (node->proto().requested_bytes() > 0) {
-      accu_pct = 100.0 * node->proto().total_requested_bytes() /
-                 root->proto().total_requested_bytes();
-      pct = 100.0 * node->proto().requested_bytes() /
-            root->proto().total_requested_bytes();
-    }
-    attrs.push_back(strings::Printf(
-        "%30s",
-        strings::Printf("%s (%.2f%%, %.2f%%)",
-                        FormatMemory(node->proto().requested_bytes()).c_str(),
-                        accu_pct, pct)
-            .c_str()));
+    attrs.push_back(FormatMemoryNode(node->proto().total_requested_bytes(),
+                                     root->proto().total_requested_bytes(),
+                                     node->proto().requested_bytes()));
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_peak_bytes(),
+                                     root->proto().total_peak_bytes(),
+                                     node->proto().peak_bytes()));
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_residual_bytes(),
+                                     root->proto().total_residual_bytes(),
+                                     node->proto().residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    attrs.push_back(FormatMemoryNode(node->proto().total_output_bytes(),
+                                     root->proto().total_output_bytes(),
+                                     node->proto().output_bytes()));
   }
 
   if (opts.select.find(kShown[1]) != opts.select.end()) {


@@ -65,6 +65,8 @@ class TFOp : public TFMultiShow {
   }
   string FormatNode(OpNode* node, OpNode* root, const Options& opts) const;
+  string FormatMemoryNode(int64 node_total_bytes, int64 root_total_bytes,
+                          int64 node_bytes) const;
   std::unique_ptr<OpNode> root_;
   std::map<string, std::unique_ptr<OpNode>> cnodes_map_;


@@ -151,9 +151,11 @@ tensorflow::Status Options::FromProtoStr(const string& opts_proto_str,
   }
   *opts = Options(
-      opts_pb.max_depth(), opts_pb.min_bytes(), opts_pb.min_micros(),
-      opts_pb.min_params(), opts_pb.min_float_ops(), opts_pb.min_occurrence(),
-      opts_pb.step(), opts_pb.order_by(),
+      opts_pb.max_depth(), opts_pb.min_bytes(), opts_pb.min_peak_bytes(),
+      opts_pb.min_residual_bytes(), opts_pb.min_output_bytes(),
+      opts_pb.min_micros(), opts_pb.min_accelerator_micros(),
+      opts_pb.min_cpu_micros(), opts_pb.min_params(), opts_pb.min_float_ops(),
+      opts_pb.min_occurrence(), opts_pb.step(), opts_pb.order_by(),
       std::vector<string>(opts_pb.account_type_regexes().begin(),
                           opts_pb.account_type_regexes().end()),
       std::vector<string>(opts_pb.start_name_regexes().begin(),
@@ -179,6 +181,11 @@ string Options::ToString() const {
       "%-28s%lld\n"
       "%-28s%lld\n"
       "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
+      "%-28s%lld\n"
       "%-28s%s\n"
       "%-28s%s\n"
       "%-28s%s\n"
@@ -188,17 +195,20 @@ string Options::ToString() const {
       "%-28s%s\n"
      "%-28s%s\n"
      "%-28s%s:%s\n",
-      kOptions[0], max_depth, kOptions[1], min_bytes, kOptions[2], min_micros,
-      kOptions[3], min_params, kOptions[4], min_float_ops, kOptions[5],
-      min_occurrence, kOptions[6], step, kOptions[7], order_by.c_str(),
-      kOptions[8], str_util::Join(account_type_regexes, ",").c_str(),
-      kOptions[9], str_util::Join(start_name_regexes, ",").c_str(),
-      kOptions[10], str_util::Join(trim_name_regexes, ",").c_str(),
-      kOptions[11], str_util::Join(show_name_regexes, ",").c_str(),
-      kOptions[12], str_util::Join(hide_name_regexes, ",").c_str(),
-      kOptions[13], (account_displayed_op_only ? "true" : "false"),
-      kOptions[14], str_util::Join(select, ",").c_str(), kOptions[15],
-      output_type.c_str(), KeyValueToStr(output_options).c_str());
+      kOptions[0], max_depth, kOptions[1], min_bytes, kOptions[2],
+      min_peak_bytes, kOptions[3], min_residual_bytes, kOptions[4],
+      min_output_bytes, kOptions[5], min_micros, kOptions[6],
+      min_accelerator_micros, kOptions[7], min_cpu_micros, kOptions[8],
+      min_params, kOptions[9], min_float_ops, kOptions[10], min_occurrence,
+      kOptions[11], step, kOptions[12], order_by.c_str(), kOptions[13],
+      str_util::Join(account_type_regexes, ",").c_str(), kOptions[14],
+      str_util::Join(start_name_regexes, ",").c_str(), kOptions[15],
+      str_util::Join(trim_name_regexes, ",").c_str(), kOptions[16],
+      str_util::Join(show_name_regexes, ",").c_str(), kOptions[17],
+      str_util::Join(hide_name_regexes, ",").c_str(), kOptions[18],
+      (account_displayed_op_only ? "true" : "false"), kOptions[19],
+      str_util::Join(select, ",").c_str(), kOptions[20], output_type.c_str(),
+      KeyValueToStr(output_options).c_str());
   return s;
 }


@@ -29,7 +29,12 @@ namespace tfprof {
 static const char* const kOptions[] = {
     "-max_depth",
     "-min_bytes",
+    "-min_peak_bytes",
+    "-min_residual_bytes",
+    "-min_output_bytes",
     "-min_micros",
+    "-min_accelerator_micros",
+    "-min_cpu_micros",
     "-min_params",
     "-min_float_ops",
     "-min_occurrence",
@@ -46,17 +51,21 @@ static const char* const kOptions[] = {
 };
 
 static const char* const kOrderBy[] = {
-    "name", "bytes", "micros", "accelerator_micros",
-    "cpu_micros", "params", "float_ops", "occurrence",
+    "name",         "bytes",  "peak_bytes",         "residual_bytes",
+    "output_bytes", "micros", "accelerator_micros", "cpu_micros",
+    "params",       "float_ops", "occurrence",
 };
 
 // Append Only.
 // TODO(xpan): As we are adding more fields to be selected, we
 // need to have a way to tell users what fields are available in which view.
-static const char* const kShown[] = {
-    "bytes", "micros", "params", "float_ops", "tensor_value",
-    "device", "op_types", "occurrence", "input_shapes", "accelerator_micros",
-    "cpu_micros"};
+static const char* const kShown[] = {"bytes",          "micros",
+                                     "params",         "float_ops",
+                                     "tensor_value",   "device",
+                                     "op_types",       "occurrence",
+                                     "input_shapes",   "accelerator_micros",
+                                     "cpu_micros",     "peak_bytes",
+                                     "residual_bytes", "output_bytes"};
 
 static const char* const kCmds[] = {
     "scope", "graph", "code", "op", "advise", "set", "help",
@@ -94,11 +103,15 @@ struct Options {
   virtual ~Options() {}
   Options()
-      : Options(0, 0, 0, 0, 0, 0, 0, "", {}, {}, {}, {}, {}, false, {}, "",
-                {}) {}
+      : Options(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "", {}, {}, {}, {}, {},
+                false, {}, "", {}) {}
   Options(int max_depth, tensorflow::int64 min_bytes,
-          tensorflow::int64 min_micros, tensorflow::int64 min_params,
+          tensorflow::int64 min_peak_bytes,
+          tensorflow::int64 min_residual_bytes,
+          tensorflow::int64 min_output_bytes, tensorflow::int64 min_micros,
+          tensorflow::int64 min_accelerator_micros,
+          tensorflow::int64 min_cpu_micros, tensorflow::int64 min_params,
          tensorflow::int64 min_float_ops, tensorflow::int64 min_occurrence,
          tensorflow::int64 step, const string& order_by,
          const std::vector<string>& account_type_regexes,
@@ -111,7 +124,12 @@ struct Options {
           const std::map<string, string>& output_options)
       : max_depth(max_depth),
         min_bytes(min_bytes),
+        min_peak_bytes(min_peak_bytes),
+        min_residual_bytes(min_residual_bytes),
+        min_output_bytes(min_output_bytes),
         min_micros(min_micros),
+        min_accelerator_micros(min_accelerator_micros),
+        min_cpu_micros(min_cpu_micros),
         min_params(min_params),
         min_float_ops(min_float_ops),
         min_occurrence(min_occurrence),
@@ -131,7 +149,12 @@ struct Options {
   int max_depth;
   tensorflow::int64 min_bytes;
+  tensorflow::int64 min_peak_bytes;
+  tensorflow::int64 min_residual_bytes;
+  tensorflow::int64 min_output_bytes;
   tensorflow::int64 min_micros;
+  tensorflow::int64 min_accelerator_micros;
+  tensorflow::int64 min_cpu_micros;
   tensorflow::int64 min_params;
   tensorflow::int64 min_float_ops;
   tensorflow::int64 min_occurrence;
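
Because kShown is append-only, the new columns take the next free slots (11-13), while kOrderBy is purely positional and got renumbered, which is why the comparator indices shift in tfprof_show.h and tfprof_show_multi.h below. A quick standalone index reference (values copied from this header; illustration only):

```
#include <cstdio>

int main() {
  static const char* const kOrderBy[] = {
      "name",         "bytes",  "peak_bytes",         "residual_bytes",
      "output_bytes", "micros", "accelerator_micros", "cpu_micros",
      "params",       "float_ops", "occurrence"};
  static const char* const kShownTail[] = {"peak_bytes", "residual_bytes",
                                           "output_bytes"};
  for (int i = 0; i < 11; ++i) {
    std::printf("kOrderBy[%d] = %s\n", i, kOrderBy[i]);
  }
  // The appended kShown entries:
  for (int i = 0; i < 3; ++i) {
    std::printf("kShown[%d] = %s\n", 11 + i, kShownTail[i]);
  }
  return 0;
}
```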


@@ -73,8 +73,14 @@ bool TFShow::ShouldShow(const ShowNode* node, const Options& opts,
   // Always show kTFProfRoot.
   if (node->name() == kTFProfRoot) return true;
 
-  if (node->proto().requested_bytes() < opts.min_bytes ||
-      node->proto().exec_micros() < opts.min_micros ||
+  if (node->proto().total_requested_bytes() < opts.min_bytes ||
+      node->proto().total_peak_bytes() < opts.min_peak_bytes ||
+      node->proto().total_residual_bytes() < opts.min_residual_bytes ||
+      node->proto().total_output_bytes() < opts.min_output_bytes ||
+      node->proto().total_exec_micros() < opts.min_micros ||
+      node->proto().total_accelerator_exec_micros() <
+          opts.min_accelerator_micros ||
+      node->proto().total_cpu_exec_micros() < opts.min_cpu_micros ||
       node->proto().parameters() < opts.min_params ||
       node->proto().float_ops() < opts.min_float_ops ||
       node->proto().run_count() < opts.min_occurrence ||
@@ -128,6 +134,17 @@ bool TFShow::ReAccount(ShowNode* node, const Options& opts) {
   return false;
 }
 
+string TFShow::FormatNodeMemory(ShowNode* node, int64 bytes,
+                                int64 total_bytes) const {
+  string memory = FormatMemory(total_bytes);
+  if (node->account) {
+    memory = FormatMemory(bytes) + "/" + memory;
+  } else {
+    memory = "--/" + memory;
+  }
+  return memory;
+}
+
 string TFShow::FormatNode(ShowNode* node, const Options& opts) const {
   std::vector<string> info;
   if (opts.select.find(kShown[2]) != opts.select.end()) {
@@ -152,15 +169,22 @@ string TFShow::FormatNode(ShowNode* node, const Options& opts) const {
     }
     info.push_back(fops);
   }
-  std::vector<string> attrs;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
-    string memory = FormatMemory(node->proto().total_requested_bytes());
-    if (node->account) {
-      memory = FormatMemory(node->proto().requested_bytes()) + "/" + memory;
-    } else {
-      memory = "--/" + memory;
-    }
-    info.push_back(memory);
+    info.push_back(FormatNodeMemory(node, node->proto().requested_bytes(),
+                                    node->proto().total_requested_bytes()));
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().peak_bytes(),
+                                    node->proto().total_peak_bytes()));
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().residual_bytes(),
+                                    node->proto().total_residual_bytes()));
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
+    info.push_back(FormatNodeMemory(node, node->proto().output_bytes(),
+                                    node->proto().total_output_bytes()));
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {
     info.push_back(FormatTotalExecTime(node, opts));
@@ -225,6 +249,15 @@ string TFShow::FormatLegend(const Options& opts) const {
     legends.push_back("# float_ops");
   }
   if (opts.select.find(kShown[0]) != opts.select.end()) {
+    legends.push_back("requested bytes");
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    legends.push_back("peak bytes");
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    legends.push_back("residual bytes");
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
     legends.push_back("output bytes");
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {


@@ -67,6 +67,7 @@ class TFShow {
   bool ReAccount(ShowNode* node, const Options& opts);
 
   string FormatNode(ShowNode* node, const Options& opts) const;
+  string FormatNodeMemory(ShowNode* node, int64 bytes, int64 total_bytes) const;
 
   string FormatLegend(const Options& opts) const;
@@ -87,17 +88,25 @@ class TFShow {
       return n1->proto().total_requested_bytes() >
              n2->proto().total_requested_bytes();
     } else if (opts.order_by == kOrderBy[2]) {
+      return n1->proto().total_peak_bytes() > n2->proto().total_peak_bytes();
+    } else if (opts.order_by == kOrderBy[3]) {
+      return n1->proto().total_residual_bytes() >
+             n2->proto().total_residual_bytes();
+    } else if (opts.order_by == kOrderBy[4]) {
+      return n1->proto().total_output_bytes() >
+             n2->proto().total_output_bytes();
+    } else if (opts.order_by == kOrderBy[5]) {
       return n1->proto().total_exec_micros() >
              n2->proto().total_exec_micros();
-    } else if (opts.order_by == kOrderBy[3]) {
+    } else if (opts.order_by == kOrderBy[6]) {
       return n1->proto().total_accelerator_exec_micros() >
             n2->proto().total_accelerator_exec_micros();
-    } else if (opts.order_by == kOrderBy[4]) {
+    } else if (opts.order_by == kOrderBy[7]) {
       return n1->proto().total_cpu_exec_micros() >
             n2->proto().total_cpu_exec_micros();
-    } else if (opts.order_by == kOrderBy[5]) {
+    } else if (opts.order_by == kOrderBy[8]) {
       return n1->proto().total_parameters() > n2->proto().total_parameters();
-    } else if (opts.order_by == kOrderBy[6]) {
+    } else if (opts.order_by == kOrderBy[9]) {
       return n1->proto().total_float_ops() > n2->proto().total_float_ops();
     }
     return name_cmp;
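
The comparator simply dispatches on the renumbered kOrderBy slots, so `-order_by peak_bytes` becomes a descending sort on `total_peak_bytes`. A toy standalone equivalent (the types and the tie-break by name are the sketch's choices, not the library's exact behaviour):

```
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Row {
  std::string name;
  int64_t total_peak_bytes;
};

int main() {
  std::vector<Row> rows = {
      {"MatMul", 2048}, {"ConcatV2", 4096}, {"Softmax", 2048}};
  std::sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
    if (a.total_peak_bytes != b.total_peak_bytes) {
      return a.total_peak_bytes > b.total_peak_bytes;  // -order_by peak_bytes
    }
    return a.name < b.name;  // arbitrary tie-break for the sketch
  });
  for (const Row& r : rows) {
    std::printf("%s %lld\n", r.name.c_str(), (long long)r.total_peak_bytes);
  }
  return 0;
}
```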


@@ -65,7 +65,13 @@ bool TFMultiShow::ShouldShow(const ShowMultiNode* node, const Options& opts,
   // want to see the middle code traces (i.e. their own codes.), instead
   // of the TensorFlow internal codes traces.
   if (node->proto().total_requested_bytes() < opts.min_bytes ||
+      node->proto().total_peak_bytes() < opts.min_peak_bytes ||
+      node->proto().total_residual_bytes() < opts.min_residual_bytes ||
+      node->proto().total_output_bytes() < opts.min_output_bytes ||
       node->proto().total_exec_micros() < opts.min_micros ||
+      node->proto().total_accelerator_exec_micros() <
+          opts.min_accelerator_micros ||
+      node->proto().total_cpu_exec_micros() < opts.min_cpu_micros ||
       node->proto().total_parameters() < opts.min_params ||
       node->proto().total_float_ops() < opts.min_float_ops ||
       depth > opts.max_depth || !ShouldShowIfExtra(node, opts, depth)) {
@@ -109,6 +115,15 @@ bool TFMultiShow::ReAccount(ShowMultiNode* node, const Options& opts) {
 string TFMultiShow::FormatLegend(const Options& opts) const {
   std::vector<string> legends;
   if (opts.select.find(kShown[0]) != opts.select.end()) {
+    legends.push_back("requested bytes");
+  }
+  if (opts.select.find(kShown[11]) != opts.select.end()) {
+    legends.push_back("peak bytes");
+  }
+  if (opts.select.find(kShown[12]) != opts.select.end()) {
+    legends.push_back("residual bytes");
+  }
+  if (opts.select.find(kShown[13]) != opts.select.end()) {
     legends.push_back("output bytes");
   }
   if (opts.select.find(kShown[1]) != opts.select.end()) {


@@ -90,21 +90,30 @@ class TFMultiShow {
       return n1->proto().total_requested_bytes() >
             n2->proto().total_requested_bytes();
     } else if (opts.order_by == kOrderBy[2]) {
+      return n1->proto().total_peak_bytes() >
+             n2->proto().total_peak_bytes();
+    } else if (opts.order_by == kOrderBy[3]) {
+      return n1->proto().total_residual_bytes() >
+             n2->proto().total_residual_bytes();
+    } else if (opts.order_by == kOrderBy[4]) {
+      return n1->proto().total_output_bytes() >
+             n2->proto().total_output_bytes();
+    } else if (opts.order_by == kOrderBy[5]) {
       return n1->proto().total_exec_micros() >
             n2->proto().total_exec_micros();
-    } else if (opts.order_by == kOrderBy[3]) {
+    } else if (opts.order_by == kOrderBy[6]) {
       return n1->proto().total_accelerator_exec_micros() >
             n2->proto().total_accelerator_exec_micros();
-    } else if (opts.order_by == kOrderBy[4]) {
+    } else if (opts.order_by == kOrderBy[7]) {
       return n1->proto().total_cpu_exec_micros() >
             n2->proto().total_cpu_exec_micros();
-    } else if (opts.order_by == kOrderBy[5]) {
+    } else if (opts.order_by == kOrderBy[8]) {
       return n1->proto().total_parameters() >
             n2->proto().total_parameters();
-    } else if (opts.order_by == kOrderBy[6]) {
+    } else if (opts.order_by == kOrderBy[9]) {
       return n1->proto().total_float_ops() >
             n2->proto().total_float_ops();
-    } else if (opts.order_by == kOrderBy[7]) {
+    } else if (opts.order_by == kOrderBy[10]) {
       return n1->node->graph_nodes().size() >
             n2->node->graph_nodes().size();
     }


@@ -22,12 +22,12 @@ limitations under the License.
 #include "tensorflow/core/lib/io/path.h"
 #include "tensorflow/core/platform/env.h"
 #include "tensorflow/core/platform/test.h"
-#include "tensorflow/core/protobuf/config.pb.h"
 #include "tensorflow/core/profiler/internal/tfprof_constants.h"
 #include "tensorflow/core/profiler/internal/tfprof_options.h"
 #include "tensorflow/core/profiler/internal/tfprof_utils.h"
 #include "tensorflow/core/profiler/tfprof_log.pb.h"
 #include "tensorflow/core/profiler/tfprof_output.pb.h"
+#include "tensorflow/core/protobuf/config.pb.h"
 
 namespace tensorflow {
 namespace tfprof {
@ -72,91 +72,80 @@ class TFProfShowTest : public ::testing::Test {
}; };
TEST_F(TFProfShowTest, DumpScopeMode) { TEST_F(TFProfShowTest, DumpScopeMode) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts(5, 0, 0, 0, 0, 0, -1, "name",
{"VariableV2"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "file",
{{"outfile", dump_file}});
tf_stats_->ShowGraphNode("scope", opts);
string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ(
"node name | # parameters | # float_ops | output bytes | total execution "
"time | accelerator execution time | cpu execution time\n_TFProfRoot "
"(--/370 params, --/0 flops, --/1.48KB, --/5us, --/0us, --/5us)\n "
"conv2d (--/140 params, --/0 flops, --/560B, --/2us, --/0us, --/2us)\n "
" conv2d/bias (5, 5/5 params, 0/0 flops, 20B/20B, 1us/1us, 0us/0us, "
"1us/1us)\n conv2d/kernel (3x3x3x5, 135/135 params, 0/0 flops, "
"540B/540B, 1us/1us, 0us/0us, 1us/1us)\n conv2d_1 (--/230 params, --/0 "
"flops, --/920B, --/3us, --/0us, --/3us)\n conv2d_1/bias (5, 5/5 "
"params, 0/0 flops, 20B/20B, 1us/1us, 0us/0us, 1us/1us)\n "
"conv2d_1/kernel (3x3x5x5, 225/225 params, 0/0 flops, 900B/900B, "
"2us/2us, 0us/0us, 2us/2us)\n",
dump_str);
}
TEST_F(TFProfShowTest, DumpAcceleratorAndCPUMicros) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump"); string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts( Options opts(
5, 0, 0, 0, 0, 0, -1, "cpu_micros", {".*"}, // accout_type_regexes 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
{".*"}, {""}, {".*"}, {""}, false, {"accelerator_micros", "cpu_micros"}, {"VariableV2"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "peak_bytes", "residual_bytes", "output_bytes",
"micros", "accelerator_micros", "cpu_micros", "float_ops"},
"file", {{"outfile", dump_file}}); "file", {{"outfile", dump_file}});
tf_stats_->ShowGraphNode("scope", opts); tf_stats_->ShowGraphNode("scope", opts);
string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ(
"node name | # parameters | # float_ops | requested bytes | peak bytes | "
"residual bytes | output bytes | total execution time | accelerator "
"execution time | cpu execution time\n_TFProfRoot (--/451 params, --/0 "
"flops, --/0B, --/0B, --/0B, --/2.56KB, --/13us, --/0us, --/13us)\n DW "
"(3x3x3x6, 162/162 params, 0/0 flops, 0B/0B, 0B/0B, 0B/0B, "
"1.28KB/1.28KB, 2us/2us, 0us/0us, 2us/2us)\n DW2 (2x2x6x12, 288/288 "
"params, 0/0 flops, 0B/0B, 0B/0B, 0B/0B, 1.28KB/1.28KB, 11us/11us, "
"0us/0us, 11us/11us)\n ScalarW (1, 1/1 params, 0/0 flops, 0B/0B, 0B/0B, "
"0B/0B, 0B/0B, 0us/0us, 0us/0us, 0us/0us)\n",
dump_str);
}
TEST_F(TFProfShowTest, DumpAcceleratorAndCPUMicros) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "cpu_micros",
{".*"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false,
{"accelerator_micros", "cpu_micros"}, "file",
{{"outfile", dump_file}});
tf_stats_->ShowGraphNode("scope", opts);
string dump_str; string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str)); TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ( EXPECT_EQ(
"node name | accelerator execution time | cpu execution " "node name | accelerator execution time | cpu execution "
"time\n_TFProfRoot (--/0us, --/97us)\n conv2d (0us/0us, 0us/76us)\n " "time\n_TFProfRoot (--/404us, --/4.50ms)\n Conv2D (226us/226us, "
"conv2d/convolution (0us/0us, 60us/60us)\n conv2d/convolution/Shape " "4.07ms/4.07ms)\n Conv2D_1 (178us/178us, 419us/419us)\n DW2 (0us/0us, "
"(0us/0us, 0us/0us)\n conv2d/convolution/dilation_rate (0us/0us, " "11us/11us)\n DW2/Assign (0us/0us, 0us/0us)\n DW2/Initializer "
"0us/0us)\n conv2d/BiasAdd (0us/0us, 12us/12us)\n conv2d/bias " "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal (0us/0us, "
"(0us/0us, 1us/2us)\n conv2d/bias/read (0us/0us, 1us/1us)\n " "0us/0us)\n DW2/Initializer/random_normal/RandomStandardNormal "
"conv2d/bias/Assign (0us/0us, 0us/0us)\n conv2d/bias/Initializer " "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal/mean "
"(0us/0us, 0us/0us)\n conv2d/bias/Initializer/Const (0us/0us, " "(0us/0us, 0us/0us)\n DW2/Initializer/random_normal/mul (0us/0us, "
"0us/0us)\n conv2d/kernel (0us/0us, 1us/2us)\n " "0us/0us)\n DW2/Initializer/random_normal/shape (0us/0us, "
"conv2d/kernel/read (0us/0us, 1us/1us)\n conv2d/kernel/Assign " "0us/0us)\n DW2/Initializer/random_normal/stddev (0us/0us, "
"(0us/0us, 0us/0us)\n conv2d/kernel/Initializer (0us/0us, " "0us/0us)\n DW2/read (0us/0us, 0us/0us)\n DW (0us/0us, 2us/2us)\n "
"0us/0us)\n conv2d/kernel/Initializer/random_uniform (0us/0us, " "DW/Assign (0us/0us, 0us/0us)\n DW/Initializer (0us/0us, 0us/0us)\n "
"0us/0us)\n conv2d_2 (0us/0us, 0us/15us)\n conv2d_2/convolution " " DW/Initializer/random_normal (0us/0us, 0us/0us)\n "
"(0us/0us, 13us/13us)\n conv2d_2/convolution/Shape (0us/0us, " "DW/Initializer/random_normal/RandomStandardNormal (0us/0us, 0us/0us)\n "
"0us/0us)\n conv2d_2/convolution/dilation_rate (0us/0us, 0us/0us)\n " " DW/Initializer/random_normal/mean (0us/0us, 0us/0us)\n "
" conv2d_2/BiasAdd (0us/0us, 2us/2us)\n conv2d_1 (0us/0us, 0us/5us)\n " "DW/Initializer/random_normal/mul (0us/0us, 0us/0us)\n "
" conv2d_1/kernel (0us/0us, 2us/3us)\n conv2d_1/kernel/read " "DW/Initializer/random_normal/shape (0us/0us, 0us/0us)\n "
"(0us/0us, 1us/1us)\n conv2d_1/kernel/Assign (0us/0us, 0us/0us)\n " "DW/Initializer/random_normal/stddev (0us/0us, 0us/0us)\n DW/read "
" conv2d_1/kernel/Initializer (0us/0us, 0us/0us)\n " "(0us/0us, 0us/0us)\n zeros (0us/0us, 2us/2us)\n ScalarW (0us/0us, "
"conv2d_1/kernel/Initializer/random_uniform (0us/0us, 0us/0us)\n " "0us/0us)\n ScalarW/Assign (0us/0us, 0us/0us)\n "
"conv2d_1/bias (0us/0us, 1us/2us)\n conv2d_1/bias/read (0us/0us, " "ScalarW/Initializer (0us/0us, 0us/0us)\n "
"1us/1us)\n conv2d_1/bias/Assign (0us/0us, 0us/0us)\n " "ScalarW/Initializer/random_normal (0us/0us, 0us/0us)\n "
"conv2d_1/bias/Initializer (0us/0us, 0us/0us)\n " "ScalarW/Initializer/random_normal/RandomStandardNormal (0us/0us, "
"conv2d_1/bias/Initializer/Const (0us/0us, 0us/0us)\n zeros (0us/0us, " "0us/0us)\n ScalarW/Initializer/random_normal/mean (0us/0us, "
"1us/1us)\n init (0us/0us, 0us/0us)\n save (0us/0us, 0us/0us)\n " "0us/0us)\n ScalarW/Initializer/random_normal/mul (0us/0us, "
"save/Assign (0us/0us, 0us/0us)\n save/Assign_1 (0us/0us, 0us/0us)\n " "0us/0us)\n ScalarW/Initializer/random_normal/shape (0us/0us, "
" save/Assign_2 (0us/0us, 0us/0us)\n save/Assign_3 (0us/0us, " "0us/0us)\n ScalarW/Initializer/random_normal/stddev (0us/0us, "
"0us/0us)\n save/Const (0us/0us, 0us/0us)\n save/RestoreV2 " "0us/0us)\n ScalarW/read (0us/0us, 0us/0us)\n init (0us/0us, "
"(0us/0us, 0us/0us)\n save/RestoreV2/shape_and_slices (0us/0us, " "0us/0us)\n",
"0us/0us)\n save/RestoreV2/tensor_names (0us/0us, 0us/0us)\n "
"save/RestoreV2_1 (0us/0us, 0us/0us)\n "
"save/RestoreV2_1/shape_and_slices (0us/0us, 0us/0us)\n "
"save/RestoreV2_1/tensor_names (0us/0us, 0us/0us)\n save/RestoreV2_2 "
"(0us/0us, 0us/0us)\n save/RestoreV2_2/shape_and_slices (0us/0us, "
"0us/0us)\n save/RestoreV2_2/tensor_names (0us/0us, 0us/0us)\n "
"save/RestoreV2_3 (0us/0us, 0us/0us)\n "
"save/RestoreV2_3/shape_and_slices (0us/0us, 0us/0us)\n "
"save/RestoreV2_3/tensor_names (0us/0us, 0us/0us)\n save/SaveV2 "
"(0us/0us, 0us/0us)\n save/SaveV2/shape_and_slices (0us/0us, "
"0us/0us)\n save/SaveV2/tensor_names (0us/0us, 0us/0us)\n "
"save/control_dependency (0us/0us, 0us/0us)\n save/restore_all "
"(0us/0us, 0us/0us)\n",
dump_str); dump_str);
} }
TEST_F(TFProfShowTest, DumpOpMode) { TEST_F(TFProfShowTest, DumpOpMode) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump"); string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts( Options opts(
5, 0, 0, 0, 0, 4, -1, "params", {".*"}, // account_type_regexes 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, "params",
{".*"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false, {".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops", "occurrence", "input_shapes"}, {"params", "bytes", "micros", "float_ops", "occurrence", "input_shapes"},
"file", {{"outfile", dump_file}}); "file", {{"outfile", dump_file}});
@ -165,17 +154,32 @@ TEST_F(TFProfShowTest, DumpOpMode) {
string dump_str; string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str)); TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ( EXPECT_EQ(
"nodename|outputbytes|totalexecutiontime|acceleratorexecutiontime|" "nodename|requestedbytes|totalexecutiontime|acceleratorexecutiontime|"
"cpuexecutiontime|#parameters|#float_ops|opoccurrence(run|defined)|" "cpuexecutiontime|#parameters|#float_ops|opoccurrence(run|defined)|"
"inputshapes\nVariableV21.48KB(100.00%,17.10%),5us(100.00%,5.15%),0us(0." "inputshapes\nVariableV20B(0.00%,0.00%),13us(100.00%,0.27%),0us(100.00%,"
"00%,0.00%),5us(100.00%,5.15%),370params(100.00%,100.00%),0float_ops(100." "0.00%),13us(100.00%,0.29%),451params(100.00%,100.00%),0float_ops(100.00%"
"00%,0.00%),4|4\n\ninput_type:\t(run*4|defined*4)\texec_time:" ",0.00%),2|3\n\ninput_type:\t(run*2|defined*3)\texec_time:13us\n\nAdd0B("
"5us\n\nAssign0B(0.00%,0.00%),0us(94.85%,0.00%),0us(0.00%,0.00%),0us(94." "0.00%,0.00%),0us(99.73%,0.00%),0us(100.00%,0.00%),0us(99.71%,0.00%),"
"85%,0.00%),0params(0.00%,0.00%),0float_ops(100.00%,0.00%),0|8\n\ninput_" "0params(0.00%,0.00%),0float_ops(100.00%,0.00%),0|3\n\ninput_type:0:1,"
"type:0:unknown,\t1:unknown\t(run*0|defined*8)\texec_time:0us\n\nConst1." "\t1:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12,\t1:1\t("
"54KB(58.87%,17.74%),1us(80.41%,1.03%),0us(0.00%,0.00%),1us(80.41%,1.03%)" "run*0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6,\t1:1\t(run*0|"
",0params(0.00%,0.00%),0float_ops(98.49%,0.00%),1|24\n\ninput_type:\t(" "defined*1)\texec_time:0us\n\nAssign0B(0.00%,0.00%),0us(99.73%,0.00%),"
"run*1|defined*24)\texec_time:1us\n\n", "0us(100.00%,0.00%),0us(99.71%,0.00%),0params(0.00%,0.00%),0float_ops("
"100.00%,0.00%),0|3\n\ninput_type:0:1,\t1:1\t(run*0|defined*1)\texec_"
"time:0us\ninput_type:0:2x2x6x12,\t1:2x2x6x12\t(run*0|defined*1)\texec_"
"time:0us\ninput_type:0:3x3x3x6,\t1:3x3x3x6\t(run*0|defined*1)\texec_"
"time:0us\n\nConst0B(0.00%,0.00%),2us(99.73%,0.04%),0us(100.00%,0.00%),"
"2us(99.71%,0.04%),0params(0.00%,0.00%),0float_ops(100.00%,0.00%),1|"
"10\n\ninput_type:\t(run*1|defined*10)\texec_time:2us\n\nConv2D14.59KB("
"100.00%,100.00%),4.89ms(99.69%,99.69%),404us(100.00%,100.00%),4.49ms(99."
"67%,99.67%),0params(0.00%,0.00%),10.44kfloat_ops(100.00%,100.00%),2|"
"2\n\ninput_type:0:2x3x3x6,\t1:2x2x6x12\t(run*1|defined*1)\texec_time:"
"597us\ninput_type:0:2x6x6x3,\t1:3x3x3x6\t(run*1|defined*1)\texec_time:4."
"29ms\n\nIdentity0B(0.00%,0.00%),0us(0.00%,0.00%),0us(0.00%,0.00%),0us(0."
"00%,0.00%),0params(0.00%,0.00%),0float_ops(0.00%,0.00%),0|3\n\ninput_"
"type:0:1\t(run*0|defined*1)\texec_time:0us\ninput_type:0:2x2x6x12\t(run*"
"0|defined*1)\texec_time:0us\ninput_type:0:3x3x3x6\t(run*0|defined*1)"
"\texec_time:0us\n\n",
StringReplace(dump_str, " ", "")); StringReplace(dump_str, " ", ""));
} }
} // namespace tfprof } // namespace tfprof


@ -23,12 +23,12 @@ limitations under the License.
#include "tensorflow/core/platform/env.h" #include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/protobuf.h" #include "tensorflow/core/platform/protobuf.h"
#include "tensorflow/core/platform/test.h" #include "tensorflow/core/platform/test.h"
#include "tensorflow/core/protobuf/config.pb.h"
#include "tensorflow/core/profiler/internal/tfprof_constants.h" #include "tensorflow/core/profiler/internal/tfprof_constants.h"
#include "tensorflow/core/profiler/internal/tfprof_options.h" #include "tensorflow/core/profiler/internal/tfprof_options.h"
#include "tensorflow/core/profiler/internal/tfprof_utils.h" #include "tensorflow/core/profiler/internal/tfprof_utils.h"
#include "tensorflow/core/profiler/tfprof_log.pb.h" #include "tensorflow/core/profiler/tfprof_log.pb.h"
#include "tensorflow/core/profiler/tfprof_output.pb.h" #include "tensorflow/core/profiler/tfprof_output.pb.h"
#include "tensorflow/core/protobuf/config.pb.h"
namespace tensorflow { namespace tensorflow {
namespace tfprof { namespace tfprof {
@ -73,7 +73,7 @@ class TFProfStatsTest : public ::testing::Test {
}; };
TEST_F(TFProfStatsTest, CustomOpType) { TEST_F(TFProfStatsTest, CustomOpType) {
Options opts(3, 0, 0, 0, 0, 0, -1, "name", Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
{kTrainableVarType}, // account_type_regexes {kTrainableVarType}, // account_type_regexes
{".*"}, {""}, {".*"}, {""}, false, {".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "", {}); {"params", "bytes", "micros", "float_ops"}, "", {});
@ -81,62 +81,27 @@ TEST_F(TFProfStatsTest, CustomOpType) {
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 13\ntotal_parameters: "
"0\ntotal_exec_micros: 5\ntotal_requested_bytes: 1480\ntotal_parameters: " "451\nchildren {\n name: \"DW\"\n exec_micros: 2\n parameters: 162\n "
"370\nchildren {\n name: \"conv2d\"\n exec_micros: 0\n " "total_exec_micros: 2\n total_parameters: 162\n devices: "
"requested_bytes: 0\n total_exec_micros: 2\n total_requested_bytes: " "\"/job:localhost/replica:0/task:0/gpu:0\"\n cpu_exec_micros: 2\n "
"560\n total_parameters: 140\n children {\n name: \"conv2d/bias\"\n " "total_cpu_exec_micros: 2\n run_count: 1\n total_run_count: 1\n "
" exec_micros: 1\n requested_bytes: 20\n parameters: 5\n " "total_definition_count: 1\n output_bytes: 1280\n total_output_bytes: "
"total_exec_micros: 1\n total_requested_bytes: 20\n " "1280\n}\nchildren {\n name: \"DW2\"\n exec_micros: 11\n parameters: "
"total_parameters: 5\n devices: " "288\n total_exec_micros: 11\n total_parameters: 288\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n " "\"/job:localhost/replica:0/task:0/gpu:0\"\n cpu_exec_micros: 11\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n " "total_cpu_exec_micros: 11\n run_count: 1\n total_run_count: 1\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n " "total_definition_count: 1\n output_bytes: 1280\n total_output_bytes: "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n " "1280\n}\nchildren {\n name: \"ScalarW\"\n parameters: 1\n "
"total_definition_count: 1\n }\n children {\n name: " "total_parameters: 1\n total_definition_count: "
"\"conv2d/kernel\"\n exec_micros: 1\n requested_bytes: 540\n " "1\n}\ntotal_cpu_exec_micros: 13\ntotal_run_count: "
"parameters: 135\n total_exec_micros: 1\n total_requested_bytes: " "2\ntotal_definition_count: 3\ntotal_output_bytes: 2560\n",
"540\n total_parameters: 135\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n float_ops: 0\n total_float_ops: 0\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 2\n "
"run_count: 0\n total_run_count: 2\n total_definition_count: "
"3\n}\nchildren {\n name: \"conv2d_1\"\n exec_micros: 0\n "
"requested_bytes: 0\n total_exec_micros: 3\n total_requested_bytes: "
"920\n total_parameters: 230\n children {\n name: "
"\"conv2d_1/bias\"\n exec_micros: 1\n requested_bytes: 20\n "
"parameters: 5\n total_exec_micros: 1\n total_requested_bytes: "
"20\n total_parameters: 5\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n children {\n name: "
"\"conv2d_1/kernel\"\n exec_micros: 2\n requested_bytes: 900\n "
"parameters: 225\n total_exec_micros: 2\n total_requested_bytes: "
"900\n total_parameters: 225\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 2\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 2\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n float_ops: 0\n total_float_ops: 0\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 3\n "
"run_count: 0\n total_run_count: 2\n total_definition_count: "
"3\n}\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: "
"0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
"0\ntotal_cpu_exec_micros: 5\nrun_count: 0\ntotal_run_count: "
"4\ntotal_definition_count: 6\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
TEST_F(TFProfStatsTest, CheckPointOpType) { TEST_F(TFProfStatsTest, CheckPointOpType) {
Options opts(3, 0, 0, 0, 0, 0, -1, "name", Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name",
{kCkptVarType}, // account_type_regexes {kCkptVarType}, // account_type_regexes
{".*"}, {""}, {".*"}, {""}, false, {".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "", {}); {"params", "bytes", "micros", "float_ops"}, "", {});
@ -144,169 +109,235 @@ TEST_F(TFProfStatsTest, CheckPointOpType) {
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 13\ntotal_parameters: "
"0\ntotal_exec_micros: 5\ntotal_requested_bytes: 1480\ntotal_parameters: " "451\nchildren {\n name: \"DW\"\n exec_micros: 2\n parameters: 162\n "
"370\nchildren {\n name: \"conv2d\"\n exec_micros: 0\n " "total_exec_micros: 2\n total_parameters: 162\n devices: "
"requested_bytes: 0\n total_exec_micros: 2\n total_requested_bytes: " "\"/job:localhost/replica:0/task:0/gpu:0\"\n cpu_exec_micros: 2\n "
"560\n total_parameters: 140\n children {\n name: \"conv2d/bias\"\n " "total_cpu_exec_micros: 2\n run_count: 1\n total_run_count: 1\n "
" exec_micros: 1\n requested_bytes: 20\n parameters: 5\n " "total_definition_count: 1\n output_bytes: 1280\n total_output_bytes: "
"total_exec_micros: 1\n total_requested_bytes: 20\n " "1280\n}\nchildren {\n name: \"DW2\"\n exec_micros: 11\n parameters: "
"total_parameters: 5\n devices: " "288\n total_exec_micros: 11\n total_parameters: 288\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n " "\"/job:localhost/replica:0/task:0/gpu:0\"\n cpu_exec_micros: 11\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n " "total_cpu_exec_micros: 11\n run_count: 1\n total_run_count: 1\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n " "total_definition_count: 1\n output_bytes: 1280\n total_output_bytes: "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n " "1280\n}\nchildren {\n name: \"ScalarW\"\n parameters: 1\n "
"total_definition_count: 1\n }\n children {\n name: " "total_parameters: 1\n total_definition_count: "
"\"conv2d/kernel\"\n exec_micros: 1\n requested_bytes: 540\n " "1\n}\ntotal_cpu_exec_micros: 13\ntotal_run_count: "
"parameters: 135\n total_exec_micros: 1\n total_requested_bytes: " "2\ntotal_definition_count: 3\ntotal_output_bytes: 2560\n",
"540\n total_parameters: 135\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n float_ops: 0\n total_float_ops: 0\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 2\n "
"run_count: 0\n total_run_count: 2\n total_definition_count: "
"3\n}\nchildren {\n name: \"conv2d_1\"\n exec_micros: 0\n "
"requested_bytes: 0\n total_exec_micros: 3\n total_requested_bytes: "
"920\n total_parameters: 230\n children {\n name: "
"\"conv2d_1/bias\"\n exec_micros: 1\n requested_bytes: 20\n "
"parameters: 5\n total_exec_micros: 1\n total_requested_bytes: "
"20\n total_parameters: 5\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 1\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 1\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n children {\n name: "
"\"conv2d_1/kernel\"\n exec_micros: 2\n requested_bytes: 900\n "
"parameters: 225\n total_exec_micros: 2\n total_requested_bytes: "
"900\n total_parameters: 225\n devices: "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 0\n "
"total_float_ops: 0\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 2\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 2\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n }\n float_ops: 0\n total_float_ops: 0\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 3\n "
"run_count: 0\n total_run_count: 2\n total_definition_count: "
"3\n}\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: "
"0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
"0\ntotal_cpu_exec_micros: 5\nrun_count: 0\ntotal_run_count: "
"4\ntotal_definition_count: 6\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
TEST_F(TFProfStatsTest, TestGraph) { TEST_F(TFProfStatsTest, TestGraph) {
Options opts(100, 0, 10000, 0, 0, 0, -1, "name", {".*"}, Options opts(100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"},
{"cost.*"}, // start_name_regexes {"DW/Initializer/random_normal/mul"}, // start_name_regexes
{""}, {".*"}, {""}, false, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "", {}); {"params", "bytes", "micros", "float_ops"}, "", {});
const GraphNodeProto& root = tf_stats_->ShowGraphNode("graph", opts); const GraphNodeProto& root = tf_stats_->ShowGraphNode("graph", opts);
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
"0\ntotal_exec_micros: 97\ntotal_requested_bytes: " "14592\ntotal_parameters: 451\nchildren {\n name: "
"8656\ntotal_parameters: 370\nfloat_ops: 0\ntotal_float_ops: " "\"DW/Initializer/random_normal/mul\"\n children {\n name: "
"34360\naccelerator_exec_micros: 0\ncpu_exec_micros: " "\"DW/Initializer/random_normal/RandomStandardNormal\"\n children {\n "
"0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: " " name: \"DW/Initializer/random_normal/shape\"\n "
"97\nrun_count: 0\ntotal_run_count: 13\ntotal_definition_count: 60\n", "total_definition_count: 1\n }\n input_shapes {\n key: 0\n "
" value {\n dim {\n size: 4\n }\n }\n "
"}\n total_definition_count: 2\n }\n children {\n name: "
"\"DW/Initializer/random_normal/stddev\"\n total_definition_count: "
"1\n }\n input_shapes {\n key: 0\n value {\n dim {\n "
"size: 3\n }\n dim {\n size: 3\n }\n dim {\n "
" size: 3\n }\n dim {\n size: 6\n }\n }\n "
"}\n input_shapes {\n key: 1\n value {\n dim {\n "
"size: 1\n }\n }\n }\n total_definition_count: "
"4\n}\ntotal_float_ops: 10440\ntotal_accelerator_exec_micros: "
"404\ntotal_cpu_exec_micros: 4500\ntotal_run_count: "
"5\ntotal_definition_count: 31\ntotal_peak_bytes: "
"9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
TEST_F(TFProfStatsTest, TestFloatOps) { TEST_F(TFProfStatsTest, TestFloatOps) {
Options opts(10, 0, 0, 0, 1, 0, -1, "name", {".*"}, {".*"}, {""}, {".*"}, Options opts(10, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -1, "name", {".*"}, {".*"},
{""}, false, {"float_ops"}, "", {}); {""}, {".*"}, {""}, false, {"float_ops"}, "", {});
const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts); const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
"0\ntotal_exec_micros: 97\ntotal_requested_bytes: " "14592\ntotal_parameters: 451\nchildren {\n name: \"Conv2D\"\n "
"8656\ntotal_parameters: 370\nchildren {\n name: \"conv2d/BiasAdd\"\n " "exec_micros: 4292\n requested_bytes: 9472\n total_exec_micros: 4292\n "
"exec_micros: 12\n requested_bytes: 1440\n total_exec_micros: 12\n " " total_requested_bytes: 9472\n devices: "
"total_requested_bytes: 1440\n total_parameters: 0\n devices: " "\"/job:localhost/replica:0/task:0/gpu:0\"\n float_ops: 5832\n "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 360\n " "total_float_ops: 5832\n input_shapes {\n key: 0\n value {\n "
"total_float_ops: 360\n input_shapes {\n key: 0\n value {\n " "dim {\n size: 2\n }\n dim {\n size: 6\n "
"unknown_rank: true\n }\n }\n input_shapes {\n key: 1\n value " "}\n dim {\n size: 6\n }\n dim {\n size: "
"{\n unknown_rank: true\n }\n }\n accelerator_exec_micros: 0\n " "3\n }\n }\n }\n input_shapes {\n key: 1\n value {\n "
" cpu_exec_micros: 12\n total_accelerator_exec_micros: 0\n " " dim {\n size: 3\n }\n dim {\n size: 3\n "
"total_cpu_exec_micros: 12\n run_count: 1\n total_run_count: 1\n " "}\n dim {\n size: 3\n }\n dim {\n size: "
"total_definition_count: 1\n}\nchildren {\n name: " "6\n }\n }\n }\n accelerator_exec_micros: 226\n "
"\"conv2d/convolution\"\n exec_micros: 60\n requested_bytes: 1440\n " "cpu_exec_micros: 4066\n total_accelerator_exec_micros: 226\n "
"total_exec_micros: 60\n total_requested_bytes: 1440\n " "total_cpu_exec_micros: 4066\n run_count: 1\n total_run_count: 1\n "
"total_parameters: 0\n devices: " "total_definition_count: 1\n peak_bytes: 5888\n residual_bytes: 768\n "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 19440\n " "output_bytes: 768\n total_peak_bytes: 5888\n total_residual_bytes: "
"total_float_ops: 19440\n input_shapes {\n key: 0\n value {\n " "768\n total_output_bytes: 768\n}\nchildren {\n name: \"Conv2D_1\"\n "
" unknown_rank: true\n }\n }\n input_shapes {\n key: 1\n " "exec_micros: 597\n requested_bytes: 5120\n total_exec_micros: 597\n "
"value {\n unknown_rank: true\n }\n }\n " "total_requested_bytes: 5120\n devices: "
"accelerator_exec_micros: 0\n cpu_exec_micros: 60\n " "\"/job:localhost/replica:0/task:0/gpu:0\"\n float_ops: 4608\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 60\n " "total_float_ops: 4608\n input_shapes {\n key: 0\n value {\n "
"run_count: 1\n total_run_count: 1\n total_definition_count: " "dim {\n size: 2\n }\n dim {\n size: 3\n "
"3\n}\nchildren {\n name: \"conv2d_2/BiasAdd\"\n exec_micros: 2\n " "}\n dim {\n size: 3\n }\n dim {\n size: "
"requested_bytes: 640\n total_exec_micros: 2\n total_requested_bytes: " "6\n }\n }\n }\n input_shapes {\n key: 1\n value {\n "
"640\n total_parameters: 0\n devices: " " dim {\n size: 2\n }\n dim {\n size: 2\n "
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 160\n " "}\n dim {\n size: 6\n }\n dim {\n size: "
"total_float_ops: 160\n input_shapes {\n key: 0\n value {\n " "12\n }\n }\n }\n accelerator_exec_micros: 178\n "
"unknown_rank: true\n }\n }\n input_shapes {\n key: 1\n value " "cpu_exec_micros: 419\n total_accelerator_exec_micros: 178\n "
"{\n unknown_rank: true\n }\n }\n accelerator_exec_micros: 0\n " "total_cpu_exec_micros: 419\n run_count: 1\n total_run_count: 1\n "
" cpu_exec_micros: 2\n total_accelerator_exec_micros: 0\n " "total_definition_count: 1\n peak_bytes: 4096\n residual_bytes: 512\n "
"total_cpu_exec_micros: 2\n run_count: 1\n total_run_count: 1\n " "output_bytes: 512\n total_peak_bytes: 4096\n total_residual_bytes: "
"total_definition_count: 1\n}\nchildren {\n name: " "512\n total_output_bytes: 512\n}\ntotal_float_ops: "
"\"conv2d_2/convolution\"\n exec_micros: 13\n requested_bytes: 640\n " "10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
"total_exec_micros: 13\n total_requested_bytes: 640\n " "4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
"total_parameters: 0\n devices: " "9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
"\"/job:localhost/replica:0/task:0/cpu:0\"\n float_ops: 14400\n "
"total_float_ops: 14400\n input_shapes {\n key: 0\n value {\n "
" unknown_rank: true\n }\n }\n input_shapes {\n key: 1\n "
"value {\n unknown_rank: true\n }\n }\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 13\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 13\n "
"run_count: 1\n total_run_count: 1\n total_definition_count: "
"3\n}\nfloat_ops: 0\ntotal_float_ops: 34360\naccelerator_exec_micros: "
"0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: "
"0\ntotal_cpu_exec_micros: 97\nrun_count: 0\ntotal_run_count: "
"13\ntotal_definition_count: 68\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
TEST_F(TFProfStatsTest, TestAccountShownNameOnly) { TEST_F(TFProfStatsTest, TestAccountShownNameOnly) {
Options opts(100, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"}, {""}, Options opts(100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"},
{"unit_2_1.*DW"}, // show_name_regexes. {""}, {"Conv2D_1"}, // show_name_regexes.
{""}, true, // account_displayed_op_only. {""}, true, // account_displayed_op_only.
{"params"}, "", {}); {"params"}, "", {});
const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts); const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 597\ntotal_requested_bytes: "
"0\ntotal_exec_micros: 0\ntotal_requested_bytes: 0\ntotal_parameters: " "5120\nchildren {\n name: \"Conv2D_1\"\n exec_micros: 597\n "
"0\nfloat_ops: 0\ntotal_float_ops: 0\naccelerator_exec_micros: " "requested_bytes: 5120\n total_exec_micros: 597\n "
"0\ncpu_exec_micros: 0\ntotal_accelerator_exec_micros: " "total_requested_bytes: 5120\n devices: "
"0\ntotal_cpu_exec_micros: 0\nrun_count: 0\ntotal_run_count: " "\"/job:localhost/replica:0/task:0/gpu:0\"\n float_ops: 4608\n "
"0\ntotal_definition_count: 1\n", "total_float_ops: 4608\n input_shapes {\n key: 0\n value {\n "
"dim {\n size: 2\n }\n dim {\n size: 3\n "
"}\n dim {\n size: 3\n }\n dim {\n size: "
"6\n }\n }\n }\n input_shapes {\n key: 1\n value {\n "
" dim {\n size: 2\n }\n dim {\n size: 2\n "
"}\n dim {\n size: 6\n }\n dim {\n size: "
"12\n }\n }\n }\n accelerator_exec_micros: 178\n "
"cpu_exec_micros: 419\n total_accelerator_exec_micros: 178\n "
"total_cpu_exec_micros: 419\n run_count: 1\n total_run_count: 1\n "
"total_definition_count: 1\n peak_bytes: 4096\n residual_bytes: 512\n "
"output_bytes: 512\n total_peak_bytes: 4096\n total_residual_bytes: "
"512\n total_output_bytes: 512\n}\ntotal_float_ops: "
"4608\ntotal_accelerator_exec_micros: 178\ntotal_cpu_exec_micros: "
"419\ntotal_run_count: 1\ntotal_definition_count: 2\ntotal_peak_bytes: "
"4096\ntotal_residual_bytes: 512\ntotal_output_bytes: 512\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
TEST_F(TFProfStatsTest, TestShowTensorValue) { TEST_F(TFProfStatsTest, TestShowTensorValue) {
Options opts(10, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"}, {""}, Options opts(10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {".*"}, {".*"},
{"unit_1_0.*gamma"}, {""}, false, {""}, {"DW"}, {""}, false,
{"tensor_value"}, // Show tensor value from checkpoint. {"tensor_value"}, // Show tensor value from checkpoint.
"", {}); "", {});
const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts); const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( CHECK(protobuf::TextFormat::ParseFromString(
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " "name: \"_TFProfRoot\"\ntotal_exec_micros: 4904\ntotal_requested_bytes: "
"0\ntotal_exec_micros: 97\ntotal_requested_bytes: " "14592\ntotal_parameters: 451\nchildren {\n name: \"DW\"\n "
"8656\ntotal_parameters: 370\nfloat_ops: 0\ntotal_float_ops: " "exec_micros: 2\n parameters: 162\n total_exec_micros: 2\n "
"34360\naccelerator_exec_micros: 0\ncpu_exec_micros: " "total_parameters: 162\n devices: "
"0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: " "\"/job:localhost/replica:0/task:0/gpu:0\"\n tensor_value {\n dtype: "
"97\nrun_count: 0\ntotal_run_count: 13\ntotal_definition_count: 68\n", "DT_FLOAT\n value_double: -0.000534315\n value_double: "
"-0.00089602\n value_double: -0.000417239\n value_double: "
"0.00041444\n value_double: 0.000780691\n value_double: "
"-0.000559057\n value_double: -0.000234623\n value_double: "
"0.00013393\n value_double: -0.00187574\n value_double: "
"0.000785666\n value_double: 0.000673294\n value_double: "
"0.000653368\n value_double: 0.000924489\n value_double: "
"-0.000318373\n value_double: -0.000385202\n value_double: "
"-7.92661e-05\n value_double: 2.70287e-05\n value_double: "
"0.00152302\n value_double: 8.04435e-05\n value_double: "
"-0.00058102\n value_double: 0.000244291\n value_double: "
"-0.000438045\n value_double: -0.000110199\n value_double: "
"0.000731663\n value_double: -0.0012326\n value_double: "
"0.00064065\n value_double: -0.00135203\n value_double: "
"-6.42784e-05\n value_double: -0.0011857\n value_double: "
"-0.000487383\n value_double: 3.41493e-05\n value_double: "
"-0.00158447\n value_double: 0.00168448\n value_double: "
"0.00160946\n value_double: -0.000600483\n value_double: "
"0.000650259\n value_double: -0.00109938\n value_double: "
"-0.000842166\n value_double: -0.0022673\n value_double: "
"-0.00101941\n value_double: -0.0011169\n value_double: "
"-0.0013557\n value_double: -1.46354e-05\n value_double: "
"-1.05487e-05\n value_double: -0.00092014\n value_double: "
"0.00272874\n value_double: 5.13942e-05\n value_double: "
"-0.00223472\n value_double: -0.000250875\n value_double: "
"-0.00180747\n value_double: -0.00234714\n value_double: "
"-0.00113523\n value_double: -0.00112635\n value_double: "
"-0.000843118\n value_double: -6.84256e-05\n value_double: "
"0.000243336\n value_double: 0.00119151\n value_double: "
"0.00131022\n value_double: 0.000768038\n value_double: "
"-8.90095e-05\n value_double: -0.000626427\n value_double: "
"-7.0617e-05\n value_double: -0.0021988\n value_double: "
"-0.00221544\n value_double: -0.000393118\n value_double: "
"0.000159464\n value_double: -0.000874746\n value_double: "
"-0.00131239\n value_double: -0.00135747\n value_double: "
"-0.00179753\n value_double: -0.00101005\n value_double: "
"-0.000107518\n value_double: -0.000616882\n value_double: "
"-0.000360923\n value_double: -0.00026896\n value_double: "
"-0.000142548\n value_double: 0.000577227\n value_double: "
"0.000536027\n value_double: 0.00126907\n value_double: "
"-0.00122712\n value_double: -3.60499e-05\n value_double: "
"0.000151026\n value_double: 0.00107658\n value_double: "
"0.00116475\n value_double: -0.00145312\n value_double: "
"0.000233326\n value_double: -0.00020198\n value_double: "
"0.00179029\n value_double: 0.00150048\n value_double: "
"-0.000884775\n value_double: 0.000409188\n value_double: "
"2.97176e-05\n value_double: -0.000506118\n value_double: "
"-2.33992e-05\n value_double: -0.00037212\n value_double: "
"0.000862773\n value_double: 0.00174046\n value_double: "
"-0.000240207\n value_double: 0.000663976\n value_double: "
"-0.00134747\n value_double: 0.00115585\n value_double: "
"0.000555869\n value_double: 0.00176722\n value_double: "
"-0.000518409\n value_double: 0.00101051\n value_double: "
"0.000129399\n value_double: -0.000916389\n value_double: "
"-0.00137693\n value_double: -0.00152412\n value_double: "
"7.32515e-05\n value_double: -0.000190811\n value_double: "
"-0.000158692\n value_double: -5.7791e-05\n value_double: "
"0.000671785\n value_double: -0.00152924\n value_double: "
"0.00117314\n value_double: -0.000384202\n value_double: "
"0.00176709\n value_double: -0.000181703\n value_double: "
"-0.000460994\n value_double: 0.000643716\n value_double: "
"4.76719e-05\n value_double: -0.00101037\n value_double: "
"0.00159621\n value_double: 0.00186758\n value_double: "
"0.00100001\n value_double: -0.00121831\n value_double: "
"0.00132231\n value_double: 0.0013511\n value_double: 0.00106659\n "
" value_double: 0.00018091\n value_double: 0.00155925\n "
"value_double: 4.26087e-05\n value_double: 0.000243264\n "
"value_double: -0.0017202\n value_double: -0.000218897\n "
"value_double: 0.00118693\n value_double: 0.00258909\n "
"value_double: 0.000641913\n value_double: -0.0013211\n "
"value_double: -0.00171943\n value_double: 0.00089151\n "
"value_double: -0.00114969\n value_double: -0.000196331\n "
"value_double: 0.00109994\n value_double: 0.000302616\n "
"value_double: 0.000675812\n value_double: 0.00112222\n "
"value_double: 0.000516456\n value_double: 0.00133357\n "
"value_double: 0.000298491\n value_double: 0.00145934\n "
"value_double: -0.00159102\n value_double: -0.000819061\n "
"value_double: 0.000120583\n value_double: 0.0006108\n "
"value_double: 0.00124132\n value_double: 0.000764859\n "
"value_double: 0.000374641\n value_double: -0.00149603\n "
"value_double: -0.000317367\n value_double: -0.000417829\n }\n "
"cpu_exec_micros: 2\n total_cpu_exec_micros: 2\n run_count: 1\n "
"total_run_count: 1\n total_definition_count: 10\n output_bytes: "
"1280\n total_output_bytes: 1280\n}\ntotal_float_ops: "
"10440\ntotal_accelerator_exec_micros: 404\ntotal_cpu_exec_micros: "
"4500\ntotal_run_count: 5\ntotal_definition_count: 34\ntotal_peak_bytes: "
"9984\ntotal_residual_bytes: 1280\ntotal_output_bytes: 4864\n",
&expected)); &expected));
EXPECT_EQ(expected.DebugString(), root.DebugString()); EXPECT_EQ(expected.DebugString(), root.DebugString());
} }


@ -51,6 +51,33 @@ class TFProfTensor {
void Build(); void Build();
template <typename T>
bool AddValue(const T& value, TFProfTensorProto* dim) {
std::ostringstream sstream;
sstream << value;
if (typeid(value) == typeid(double)) {
double double_val;
CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
dim->add_value_double(double_val);
formatted_str_ += strings::Printf(
"%.2f ", dim->value_double(dim->value_double_size() - 1));
} else if (typeid(value) == typeid(int64)) {
int64 int64_val;
CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
dim->add_value_int64(int64_val);
formatted_str_ += strings::Printf(
"%lld ",
static_cast<int64>(dim->value_int64(dim->value_int64_size() - 1)));
} else if (typeid(value) == typeid(string)) {
dim->add_value_str(sstream.str());
formatted_str_ =
strings::StrCat(formatted_str_, "'",
dim->value_str(dim->value_str_size() - 1) + "' ");
} else {
CHECK(false) << "Unsupported type: " << typeid(value).name();
}
}
// It assumes the flatten values are stored in row-major, which is mentioned // It assumes the flatten values are stored in row-major, which is mentioned
// indirectly at various places: // indirectly at various places:
// TODO(xpan): Further verifying it. // TODO(xpan): Further verifying it.
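The comment above relies on the values being flattened in row-major order. As a standalone illustration of that convention (this is not tfprof code), the snippet below shows how a multi-dimensional coordinate maps to the flat index that BuildOutput walks, with the last dimension varying fastest.
```
#include <cstdio>

// Minimal illustration of the row-major assumption mentioned above; the shape
// and coordinates are hypothetical, chosen only to show the indexing rule.
int main() {
  const int dims[3] = {2, 3, 4};  // a hypothetical 2x3x4 tensor
  // In row-major order, element (i, j, k) lives at flat index
  //   ((i * dims[1]) + j) * dims[2] + k,
  // so the last dimension varies fastest as the flat array is traversed.
  const int i = 1, j = 2, k = 3;
  const int flat = ((i * dims[1]) + j) * dims[2] + k;
  std::printf("flat index of (1,2,3) = %d\n", flat);  // prints 23
  return 0;
}
```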
@ -59,37 +86,65 @@ class TFProfTensor {
TFProfTensorProto* dim) { TFProfTensorProto* dim) {
formatted_str_ += "["; formatted_str_ += "[";
int64 nstart = start; int64 nstart = start;
for (int i = 0; i < tensor_->dim_size(depth); i++) { if (tensor_->dims() == 0 && values.size() == 1) {
// Last dimension, pull the values. std::ostringstream sstream;
if (depth == tensor_->dims() - 1) { sstream << values[nstart];
std::ostringstream sstream;
sstream << values[nstart];
if (typeid(values[nstart]) == typeid(double)) { if (typeid(values[nstart]) == typeid(double)) {
double double_val; double double_val;
CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val)); CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
dim->add_value_double(double_val); dim->add_value_double(double_val);
formatted_str_ += strings::Printf( formatted_str_ += strings::Printf(
"%.2f ", dim->value_double(dim->value_double_size() - 1)); "%.2f ", dim->value_double(dim->value_double_size() - 1));
} else if (typeid(values[nstart]) == typeid(int64)) { } else if (typeid(values[nstart]) == typeid(int64)) {
int64 int64_val; int64 int64_val;
CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val)); CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
dim->add_value_int64(int64_val); dim->add_value_int64(int64_val);
formatted_str_ += strings::Printf( formatted_str_ += strings::Printf(
"%lld ", static_cast<int64>( "%lld ",
dim->value_int64(dim->value_int64_size() - 1))); static_cast<int64>(dim->value_int64(dim->value_int64_size() - 1)));
} else if (typeid(values[nstart]) == typeid(string)) { } else if (typeid(values[nstart]) == typeid(string)) {
dim->add_value_str(sstream.str()); dim->add_value_str(sstream.str());
formatted_str_ = formatted_str_ =
strings::StrCat(formatted_str_, "'", strings::StrCat(formatted_str_, "'",
dim->value_str(dim->value_str_size() - 1) + "' "); dim->value_str(dim->value_str_size() - 1) + "' ");
} else {
CHECK(false) << "Unsupported type: " << typeid(values[nstart]).name();
}
++nstart;
} else { } else {
// Not-last dimension. Drill deeper. CHECK(false) << "Unsupported type: " << typeid(values[nstart]).name();
nstart = BuildOutput<T>(nstart, depth + 1, values, dim); }
} else {
for (int i = 0; i < tensor_->dim_size(depth); i++) {
// Last dimension, pull the values.
if (depth == tensor_->dims() - 1) {
std::ostringstream sstream;
sstream << values[nstart];
if (typeid(values[nstart]) == typeid(double)) {
double double_val;
CHECK(strings::safe_strtod(sstream.str().c_str(), &double_val));
dim->add_value_double(double_val);
formatted_str_ += strings::Printf(
"%.2f ", dim->value_double(dim->value_double_size() - 1));
} else if (typeid(values[nstart]) == typeid(int64)) {
int64 int64_val;
CHECK(strings::safe_strto64(sstream.str().c_str(), &int64_val));
dim->add_value_int64(int64_val);
formatted_str_ += strings::Printf(
"%lld ", static_cast<int64>(
dim->value_int64(dim->value_int64_size() - 1)));
} else if (typeid(values[nstart]) == typeid(string)) {
dim->add_value_str(sstream.str());
formatted_str_ = strings::StrCat(
formatted_str_, "'",
dim->value_str(dim->value_str_size() - 1) + "' ");
} else {
CHECK(false) << "Unsupported type: "
<< typeid(values[nstart]).name();
}
++nstart;
} else {
// Not-last dimension. Drill deeper.
nstart = BuildOutput<T>(nstart, depth + 1, values, dim);
}
} }
} }
if (formatted_str_.length() > kTFProfTenosrMaxDisplayLen) { if (formatted_str_.length() > kTFProfTenosrMaxDisplayLen) {


@ -18,12 +18,12 @@ limitations under the License.
#include "tensorflow/core/lib/io/path.h" #include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/platform/protobuf.h" #include "tensorflow/core/platform/protobuf.h"
#include "tensorflow/core/platform/test.h" #include "tensorflow/core/platform/test.h"
#include "tensorflow/core/protobuf/config.pb.h"
#include "tensorflow/core/profiler/internal/tfprof_options.h" #include "tensorflow/core/profiler/internal/tfprof_options.h"
#include "tensorflow/core/profiler/internal/tfprof_stats.h" #include "tensorflow/core/profiler/internal/tfprof_stats.h"
#include "tensorflow/core/profiler/internal/tfprof_utils.h" #include "tensorflow/core/profiler/internal/tfprof_utils.h"
#include "tensorflow/core/profiler/tfprof_log.pb.h" #include "tensorflow/core/profiler/tfprof_log.pb.h"
#include "tensorflow/core/profiler/tfprof_output.pb.h" #include "tensorflow/core/profiler/tfprof_output.pb.h"
#include "tensorflow/core/protobuf/config.pb.h"
namespace tensorflow { namespace tensorflow {
namespace tfprof { namespace tfprof {
@ -57,244 +57,19 @@ class TFProfTensorTest : public ::testing::Test {
}; };
TEST_F(TFProfTensorTest, Basics) { TEST_F(TFProfTensorTest, Basics) {
Options opts(3, 0, 0, 0, 0, 0, -1, "name", {"VariableV2"}, {".*"}, {""}, Options opts(3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, "name", {"VariableV2"},
{".*"}, {""}, false, {"tensor_value"}, // show the tensor value. {".*"}, {""}, {".*"}, {""}, false,
{"tensor_value"}, // show the tensor value.
"", {}); "", {});
const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts); const GraphNodeProto& root = tf_stats_->ShowGraphNode("scope", opts);
GraphNodeProto expected; GraphNodeProto expected;
CHECK(protobuf::TextFormat::ParseFromString( EXPECT_EQ(root.children(0).name(), "DW");
"name: \"_TFProfRoot\"\nexec_micros: 0\nrequested_bytes: " EXPECT_GT(root.children(0).tensor_value().value_double_size(), 10);
"0\ntotal_exec_micros: 0\ntotal_requested_bytes: 0\ntotal_parameters: " EXPECT_EQ(root.children(1).name(), "DW2");
"370\nchildren {\n name: \"conv2d\"\n exec_micros: 0\n " EXPECT_GT(root.children(1).tensor_value().value_double_size(), 10);
"requested_bytes: 0\n total_exec_micros: 0\n total_requested_bytes: " EXPECT_EQ(root.children(2).name(), "ScalarW");
"0\n total_parameters: 140\n children {\n name: \"conv2d/bias\"\n " EXPECT_EQ(root.children(2).tensor_value().value_double_size(), 1);
" exec_micros: 0\n requested_bytes: 0\n parameters: 5\n "
"total_exec_micros: 0\n total_requested_bytes: 0\n "
"total_parameters: 5\n float_ops: 0\n total_float_ops: 0\n "
"tensor_value {\n dtype: DT_FLOAT\n value_double: 0\n "
"value_double: 0\n value_double: 0\n value_double: 0\n "
"value_double: 0\n }\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 0\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 0\n run_count: 0\n total_run_count: 0\n "
"total_definition_count: 1\n }\n children {\n name: "
"\"conv2d/kernel\"\n exec_micros: 0\n requested_bytes: 0\n "
"parameters: 135\n total_exec_micros: 0\n total_requested_bytes: "
"0\n total_parameters: 135\n float_ops: 0\n total_float_ops: "
"0\n tensor_value {\n dtype: DT_FLOAT\n value_double: "
"-0.113138\n value_double: 0.261431\n value_double: 0.215777\n "
" value_double: 0.24135\n value_double: -0.113195\n "
"value_double: -0.212639\n value_double: -0.0907301\n "
"value_double: 0.0221634\n value_double: 0.21821\n "
"value_double: 0.22715\n value_double: -0.108698\n "
"value_double: 0.240911\n value_double: -0.138626\n "
"value_double: -0.144752\n value_double: -0.00962037\n "
"value_double: 0.0971008\n value_double: 0.00264764\n "
"value_double: -0.272929\n value_double: 0.0129845\n "
"value_double: 0.0466554\n value_double: -0.229184\n "
"value_double: 0.153576\n value_double: -0.169218\n "
"value_double: -0.112991\n value_double: 0.205739\n "
"value_double: 0.257844\n value_double: 0.107455\n "
"value_double: -0.207914\n value_double: 0.15211\n "
"value_double: 0.277932\n value_double: 0.145986\n "
"value_double: -0.0883989\n value_double: 0.167506\n "
"value_double: 0.10237\n value_double: 0.0542143\n "
"value_double: 0.0334378\n value_double: 0.159489\n "
"value_double: 0.246583\n value_double: 0.0154283\n "
"value_double: 0.0872411\n value_double: -0.25732\n "
"value_double: 0.0499355\n value_double: 0.0266221\n "
"value_double: 0.088801\n value_double: -0.0794552\n "
"value_double: -0.00383255\n value_double: -0.165267\n "
"value_double: 0.0271328\n value_double: 0.0729822\n "
"value_double: 0.200795\n value_double: 0.100276\n "
"value_double: 0.285254\n value_double: -0.171945\n "
"value_double: -0.0187411\n value_double: -0.218729\n "
"value_double: 0.233753\n value_double: 0.109184\n "
"value_double: 0.247875\n value_double: -0.224632\n "
"value_double: 0.0940739\n value_double: 0.00663087\n "
"value_double: -0.075786\n value_double: -0.179992\n "
"value_double: -0.276016\n value_double: 0.261207\n "
"value_double: -0.0658191\n value_double: -0.0747132\n "
"value_double: -0.0839638\n value_double: -0.0825393\n "
"value_double: 0.0915958\n value_double: -0.195425\n "
"value_double: -0.255836\n value_double: -0.08745\n "
"value_double: -0.181623\n value_double: -0.235936\n "
"value_double: 0.0205423\n value_double: 0.185447\n "
"value_double: -0.0691599\n value_double: -0.0451089\n "
"value_double: -0.153922\n value_double: -0.0279411\n "
"value_double: 0.148915\n value_double: -0.018026\n "
"value_double: -0.144903\n value_double: 0.0370046\n "
"value_double: 0.0764987\n value_double: 0.0586488\n "
"value_double: -0.222919\n value_double: 0.0238447\n "
"value_double: -0.106012\n value_double: -0.102202\n "
"value_double: -0.159347\n value_double: -0.0232876\n "
"value_double: 0.109855\n value_double: -0.141833\n "
"value_double: 0.1376\n value_double: -0.12413\n value_double: "
"-0.208968\n value_double: 0.0758635\n value_double: "
"-0.217672\n value_double: -0.20153\n value_double: "
"-0.195414\n value_double: -0.18549\n value_double: "
"0.00298014\n value_double: -0.279283\n value_double: "
"0.200084\n value_double: -0.0968328\n value_double: -0.243\n "
" value_double: 0.239319\n value_double: -0.236288\n "
"value_double: 0.169477\n value_double: 0.126673\n "
"value_double: 0.182215\n value_double: -0.028243\n "
"value_double: 0.282762\n value_double: -0.165548\n "
"value_double: -0.0641245\n value_double: -0.186382\n "
"value_double: 0.0329038\n value_double: 0.271848\n "
"value_double: 0.084653\n value_double: -0.108163\n "
"value_double: 0.247094\n value_double: 0.192687\n "
"value_double: 0.171922\n value_double: -0.187649\n "
"value_double: 0.251253\n value_double: 0.272077\n "
"value_double: 0.19068\n value_double: 0.220352\n "
"value_double: -0.255741\n value_double: 0.110853\n "
"value_double: 0.146625\n value_double: 0.167754\n "
"value_double: 0.249554\n }\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 0\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 0\n run_count: 0\n total_run_count: 0\n "
"total_definition_count: 1\n }\n float_ops: 0\n total_float_ops: 0\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 0\n "
"run_count: 0\n total_run_count: 0\n total_definition_count: "
"3\n}\nchildren {\n name: \"conv2d_1\"\n exec_micros: 0\n "
"requested_bytes: 0\n total_exec_micros: 0\n total_requested_bytes: "
"0\n total_parameters: 230\n children {\n name: \"conv2d_1/bias\"\n "
" exec_micros: 0\n requested_bytes: 0\n parameters: 5\n "
"total_exec_micros: 0\n total_requested_bytes: 0\n "
"total_parameters: 5\n float_ops: 0\n total_float_ops: 0\n "
"tensor_value {\n dtype: DT_FLOAT\n value_double: 0\n "
"value_double: 0\n value_double: 0\n value_double: 0\n "
"value_double: 0\n }\n accelerator_exec_micros: 0\n "
"cpu_exec_micros: 0\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 0\n run_count: 0\n total_run_count: 0\n "
"total_definition_count: 1\n }\n children {\n name: "
"\"conv2d_1/kernel\"\n exec_micros: 0\n requested_bytes: 0\n "
"parameters: 225\n total_exec_micros: 0\n total_requested_bytes: "
"0\n total_parameters: 225\n float_ops: 0\n total_float_ops: "
"0\n tensor_value {\n dtype: DT_FLOAT\n value_double: "
"-0.00170514\n value_double: 0.138601\n value_double: "
"-0.224822\n value_double: -0.0848449\n value_double: "
"0.170551\n value_double: 0.147666\n value_double: "
"-0.0570606\n value_double: -0.132805\n value_double: "
"-0.172013\n value_double: 0.249707\n value_double: 0.149734\n "
" value_double: 0.0365986\n value_double: -0.0923146\n "
"value_double: -0.17745\n value_double: -0.169978\n "
"value_double: -0.173298\n value_double: -0.110407\n "
"value_double: 0.1469\n value_double: 0.0419576\n "
"value_double: 0.0391093\n value_double: -0.137381\n "
"value_double: 0.212642\n value_double: -0.067034\n "
"value_double: -0.0727709\n value_double: -0.0276531\n "
"value_double: 0.218212\n value_double: 0.0596479\n "
"value_double: -0.0468102\n value_double: -0.0250467\n "
"value_double: -0.20391\n value_double: -0.233801\n "
"value_double: 0.135615\n value_double: -0.182124\n "
"value_double: 0.254205\n value_double: 0.0819146\n "
"value_double: -0.146696\n value_double: -0.20095\n "
"value_double: -0.250555\n value_double: -0.226406\n "
"value_double: 0.0421331\n value_double: 0.0361264\n "
"value_double: -0.188558\n value_double: -0.0222711\n "
"value_double: -0.128226\n value_double: -0.148305\n "
"value_double: -0.137598\n value_double: -0.041647\n "
"value_double: -0.0574933\n value_double: 0.122506\n "
"value_double: 0.0415936\n value_double: 0.244957\n "
"value_double: 0.00372121\n value_double: -0.139939\n "
"value_double: 0.250411\n value_double: -0.23848\n "
"value_double: -0.0717569\n value_double: -0.00884159\n "
"value_double: 0.135616\n value_double: -0.0493895\n "
"value_double: 0.254308\n value_double: -0.181419\n "
"value_double: -0.114829\n value_double: -0.172638\n "
"value_double: 0.06984\n value_double: -0.086704\n "
"value_double: 0.168515\n value_double: -0.152275\n "
"value_double: -0.230775\n value_double: -0.254366\n "
"value_double: -0.115397\n value_double: 0.0418207\n "
"value_double: -0.199607\n value_double: -0.167001\n "
"value_double: -0.187238\n value_double: 0.0196097\n "
"value_double: 0.201653\n value_double: -0.143758\n "
"value_double: 0.167187\n value_double: -0.129141\n "
"value_double: 0.230154\n value_double: -0.119968\n "
"value_double: -0.121843\n value_double: -0.0118565\n "
"value_double: 0.0285747\n value_double: -0.0593699\n "
"value_double: -0.175214\n value_double: -0.211524\n "
"value_double: 0.167042\n value_double: -0.216357\n "
"value_double: -0.0218886\n value_double: -0.244211\n "
"value_double: 0.175301\n value_double: 0.0654932\n "
"value_double: -0.0419763\n value_double: -0.103275\n "
"value_double: -0.0848433\n value_double: -0.0845421\n "
"value_double: -0.00269318\n value_double: -0.145978\n "
"value_double: -0.217061\n value_double: -0.0937043\n "
"value_double: 0.235796\n value_double: -0.0893372\n "
"value_double: 0.000827968\n value_double: 0.0172743\n "
"value_double: -0.234205\n value_double: -0.0867703\n "
"value_double: 0.131704\n value_double: 0.134143\n "
"value_double: -0.162257\n value_double: -0.129706\n "
"value_double: 0.0763288\n value_double: 0.156988\n "
"value_double: 0.220033\n value_double: -0.179884\n "
"value_double: 0.066697\n value_double: 0.212322\n "
"value_double: -0.0961226\n value_double: -0.11223\n "
"value_double: 0.249944\n value_double: 0.115673\n "
"value_double: -0.100203\n value_double: 0.125645\n "
"value_double: -0.256104\n value_double: 0.0996534\n "
"value_double: 0.167306\n value_double: -0.00700775\n "
"value_double: 0.242145\n value_double: 0.088406\n "
"value_double: 0.0975334\n value_double: -0.0309525\n "
"value_double: -0.0422794\n value_double: 0.20739\n "
"value_double: 0.113992\n value_double: 0.253818\n "
"value_double: -0.0857835\n value_double: 0.223902\n "
"value_double: 0.10291\n value_double: 0.103091\n "
"value_double: -0.177502\n value_double: -0.0258242\n "
"value_double: -0.130567\n value_double: -0.15999\n "
"value_double: -0.101484\n value_double: 0.0188813\n "
"value_double: 0.160626\n value_double: 0.0467491\n "
"value_double: 0.193634\n value_double: -0.0910993\n "
"value_double: 0.0440249\n value_double: -0.255389\n "
"value_double: -0.240244\n value_double: -0.213171\n "
"value_double: 0.175978\n value_double: -0.0251202\n "
"value_double: 0.0943941\n value_double: -0.196194\n "
"value_double: 0.163395\n value_double: -0.010777\n "
"value_double: -0.0626751\n value_double: -0.246234\n "
"value_double: 0.0662063\n value_double: 0.120589\n "
"value_double: 0.237322\n value_double: 0.0849243\n "
"value_double: -0.066591\n value_double: 0.0512236\n "
"value_double: -0.144309\n value_double: -0.235415\n "
"value_double: -0.0565311\n value_double: 0.0882529\n "
"value_double: -0.215923\n value_double: -0.0873292\n "
"value_double: -0.0691103\n value_double: -0.00238678\n "
"value_double: 0.147789\n value_double: -0.124451\n "
"value_double: 0.205044\n value_double: -0.0596834\n "
"value_double: 0.0268479\n value_double: 0.0857448\n "
"value_double: -0.0923855\n value_double: -0.0960547\n "
"value_double: 0.169869\n value_double: 0.16988\n "
"value_double: -0.032271\n value_double: -0.120731\n "
"value_double: -0.199086\n value_double: 0.181199\n "
"value_double: 0.00897732\n value_double: -0.257469\n "
"value_double: -0.135556\n value_double: -0.149663\n "
"value_double: -0.00990398\n value_double: 0.221165\n "
"value_double: 0.0327134\n value_double: -0.0392821\n "
"value_double: -0.0614503\n value_double: 0.246602\n "
"value_double: -0.171692\n value_double: -0.150835\n "
"value_double: -0.13854\n value_double: -0.244668\n "
"value_double: 0.0790781\n value_double: 0.212678\n "
"value_double: 0.0782059\n value_double: -0.177888\n "
"value_double: -0.165914\n value_double: -0.164251\n "
"value_double: 0.165007\n value_double: 0.239615\n "
"value_double: -0.217642\n value_double: -0.219843\n "
"value_double: 0.0828398\n value_double: 0.00272235\n "
"value_double: -0.0323662\n value_double: -0.255953\n "
"value_double: 0.237298\n value_double: -0.0896481\n "
"value_double: -0.0605349\n value_double: 0.231679\n "
"value_double: -0.123842\n value_double: 0.0858642\n "
"value_double: 0.23111\n value_double: 0.0491742\n }\n "
"accelerator_exec_micros: 0\n cpu_exec_micros: 0\n "
"total_accelerator_exec_micros: 0\n total_cpu_exec_micros: 0\n "
"run_count: 0\n total_run_count: 0\n total_definition_count: 1\n "
"}\n float_ops: 0\n total_float_ops: 0\n accelerator_exec_micros: 0\n "
" cpu_exec_micros: 0\n total_accelerator_exec_micros: 0\n "
"total_cpu_exec_micros: 0\n run_count: 0\n total_run_count: 0\n "
"total_definition_count: 3\n}\nfloat_ops: 0\ntotal_float_ops: "
"0\naccelerator_exec_micros: 0\ncpu_exec_micros: "
"0\ntotal_accelerator_exec_micros: 0\ntotal_cpu_exec_micros: "
"0\nrun_count: 0\ntotal_run_count: 0\ntotal_definition_count: 6\n",
&expected));
EXPECT_EQ(expected.DebugString(), root.DebugString());
} }
} // namespace tfprof } // namespace tfprof


@ -147,8 +147,8 @@ void MemoryTracker::TrackNodeConnection(int64 step, const GraphNode* node,
if (output_idx == node->node->src_output_idx().end()) { if (output_idx == node->node->src_output_idx().end()) {
return; return;
} }
const auto& output = src->node->output_bytes(step).find(output_idx->second); const auto& output = src->node->output_memory(step).find(output_idx->second);
if (output == src->node->output_bytes(step).end()) { if (output == src->node->output_memory(step).end()) {
return; return;
} }
int64 output_bytes = output->second.first; int64 output_bytes = output->second.first;


@ -62,7 +62,8 @@ class TFProfTimelineTest : public ::testing::Test {
// manually check it's correct // manually check it's correct
TEST_F(TFProfTimelineTest, GraphView) { TEST_F(TFProfTimelineTest, GraphView) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump"); string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts(10000, 0, 0, 0, 0, 0, 0, "name", {".*"}, // account_type_regexes Options opts(10000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "name",
{".*"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false, {".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "timeline", {"params", "bytes", "micros", "float_ops"}, "timeline",
{{"outfile", dump_file}}); {{"outfile", dump_file}});
@ -70,12 +71,13 @@ TEST_F(TFProfTimelineTest, GraphView) {
string dump_str; string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str)); TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ(5576767607271035974ull, Hash64(dump_str)); EXPECT_EQ(16947107375505024864ull, Hash64(dump_str));
} }
TEST_F(TFProfTimelineTest, ScopeView) { TEST_F(TFProfTimelineTest, ScopeView) {
string dump_file = io::JoinPath(testing::TmpDir(), "dump"); string dump_file = io::JoinPath(testing::TmpDir(), "dump");
Options opts(5, 0, 0, 0, 0, 0, 0, "name", {".*"}, // account_type_regexes Options opts(5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "name",
{".*"}, // accout_type_regexes
{".*"}, {""}, {".*"}, {""}, false, {".*"}, {""}, {".*"}, {""}, false,
{"params", "bytes", "micros", "float_ops"}, "timeline", {"params", "bytes", "micros", "float_ops"}, "timeline",
{{"outfile", dump_file}}); {{"outfile", dump_file}});
@ -83,7 +85,7 @@ TEST_F(TFProfTimelineTest, ScopeView) {
string dump_str; string dump_str;
TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str)); TF_CHECK_OK(ReadFileToString(Env::Default(), dump_file, &dump_str));
EXPECT_EQ(10135186027625211652ull, Hash64(dump_str)); EXPECT_EQ(2710044785377031280ull, Hash64(dump_str));
} }
// TODO(xpan): tfprof_log is too large to include in testdata when adding // TODO(xpan): tfprof_log is too large to include in testdata when adding


@ -140,35 +140,66 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[2]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[2]) {
if (pieces.size() <= i + 1 || if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_micros)) { !strings::safe_strto64(pieces[i + 1], &opts->min_peak_bytes)) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[3]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[3]) {
if (pieces.size() <= i + 1 || if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_params)) { !strings::safe_strto64(pieces[i + 1], &opts->min_residual_bytes)) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[4]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[4]) {
if (pieces.size() <= i + 1 || if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_float_ops)) { !strings::safe_strto64(pieces[i + 1], &opts->min_output_bytes)) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[5]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[5]) {
if (pieces.size() <= i + 1 || if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_occurrence)) { !strings::safe_strto64(pieces[i + 1], &opts->min_micros)) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[6]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[6]) {
if (pieces.size() <= i + 1 || if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->step)) { !strings::safe_strto64(pieces[i + 1],
&opts->min_accelerator_micros)) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[7]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[7]) {
if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_cpu_micros)) {
return ReturnError(pieces, i);
}
++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[8]) {
if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_params)) {
return ReturnError(pieces, i);
}
++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[9]) {
if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_float_ops)) {
return ReturnError(pieces, i);
}
++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[10]) {
if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->min_occurrence)) {
return ReturnError(pieces, i);
}
++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[11]) {
if (pieces.size() <= i + 1 ||
!strings::safe_strto64(pieces[i + 1], &opts->step)) {
return ReturnError(pieces, i);
}
++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[12]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
@ -180,42 +211,42 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
} }
opts->order_by = *order_by; opts->order_by = *order_by;
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[8]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[13]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
opts->account_type_regexes = str_util::Split(StripQuote(pieces[i + 1]), opts->account_type_regexes = str_util::Split(StripQuote(pieces[i + 1]),
',', str_util::SkipEmpty()); ',', str_util::SkipEmpty());
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[9]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[14]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
opts->start_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',', opts->start_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
str_util::SkipEmpty()); str_util::SkipEmpty());
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[10]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[15]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
opts->trim_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',', opts->trim_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
str_util::SkipEmpty()); str_util::SkipEmpty());
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[11]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[16]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
opts->show_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',', opts->show_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
str_util::SkipEmpty()); str_util::SkipEmpty());
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[12]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[17]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
opts->hide_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',', opts->hide_name_regexes = str_util::Split(StripQuote(pieces[i + 1]), ',',
str_util::SkipEmpty()); str_util::SkipEmpty());
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[13]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[18]) {
if ((pieces.size() > i + 1 && pieces[i + 1].find("-") == 0) || if ((pieces.size() > i + 1 && pieces[i + 1].find("-") == 0) ||
pieces.size() == i + 1) { pieces.size() == i + 1) {
opts->account_displayed_op_only = true; opts->account_displayed_op_only = true;
@ -225,7 +256,7 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
} else { } else {
++i; ++i;
} }
} else if (pieces[i] == tensorflow::tfprof::kOptions[14]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[19]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }
@ -242,7 +273,7 @@ tensorflow::Status ParseCmdLine(const string& line, string* cmd,
} }
opts->select = requested_set; opts->select = requested_set;
++i; ++i;
} else if (pieces[i] == tensorflow::tfprof::kOptions[15]) { } else if (pieces[i] == tensorflow::tfprof::kOptions[20]) {
if (pieces.size() <= i + 1) { if (pieces.size() <= i + 1) {
return ReturnError(pieces, i); return ReturnError(pieces, i);
} }

View File

@ -72,7 +72,12 @@ int Run(int argc, char** argv) {
string FLAGS_checkpoint_path = ""; string FLAGS_checkpoint_path = "";
int32 FLAGS_max_depth = 10; int32 FLAGS_max_depth = 10;
int64 FLAGS_min_bytes = 0; int64 FLAGS_min_bytes = 0;
int64 FLAGS_min_peak_bytes = 0;
int64 FLAGS_min_residual_bytes = 0;
int64 FLAGS_min_output_bytes = 0;
int64 FLAGS_min_micros = 0; int64 FLAGS_min_micros = 0;
int64 FLAGS_min_accelerator_micros = 0;
int64 FLAGS_min_cpu_micros = 0;
int64 FLAGS_min_params = 0; int64 FLAGS_min_params = 0;
int64 FLAGS_min_float_ops = 0; int64 FLAGS_min_float_ops = 0;
int64 FLAGS_min_occurrence = 0; int64 FLAGS_min_occurrence = 0;
@ -101,7 +106,14 @@ int Run(int argc, char** argv) {
"TensorFlow Checkpoint file name"), "TensorFlow Checkpoint file name"),
Flag("max_depth", &FLAGS_max_depth, "max depth"), Flag("max_depth", &FLAGS_max_depth, "max depth"),
Flag("min_bytes", &FLAGS_min_bytes, "min_bytes"), Flag("min_bytes", &FLAGS_min_bytes, "min_bytes"),
Flag("min_peak_bytes", &FLAGS_min_peak_bytes, "min_peak_bytes"),
Flag("min_residual_bytes", &FLAGS_min_residual_bytes,
"min_residual_bytes"),
Flag("min_output_bytes", &FLAGS_min_output_bytes, "min_output_bytes"),
Flag("min_micros", &FLAGS_min_micros, "min micros"), Flag("min_micros", &FLAGS_min_micros, "min micros"),
Flag("min_accelerator_micros", &FLAGS_min_accelerator_micros,
"min acclerator_micros"),
Flag("min_cpu_micros", &FLAGS_min_cpu_micros, "min_cpu_micros"),
Flag("min_params", &FLAGS_min_params, "min params"), Flag("min_params", &FLAGS_min_params, "min params"),
Flag("min_float_ops", &FLAGS_min_float_ops, "min float ops"), Flag("min_float_ops", &FLAGS_min_float_ops, "min float ops"),
Flag("min_occurrence", &FLAGS_min_occurrence, "min occurrence"), Flag("min_occurrence", &FLAGS_min_occurrence, "min occurrence"),
@ -214,12 +226,14 @@ int Run(int argc, char** argv) {
return 0; return 0;
} }
Options opts(FLAGS_max_depth, FLAGS_min_bytes, FLAGS_min_micros, Options opts(
FLAGS_min_params, FLAGS_min_float_ops, FLAGS_min_occurrence, FLAGS_max_depth, FLAGS_min_bytes, FLAGS_min_peak_bytes,
FLAGS_step, FLAGS_order_by, account_type_regexes, FLAGS_min_residual_bytes, FLAGS_min_output_bytes, FLAGS_min_micros,
start_name_regexes, trim_name_regexes, show_name_regexes, FLAGS_min_accelerator_micros, FLAGS_min_cpu_micros, FLAGS_min_params,
hide_name_regexes, FLAGS_account_displayed_op_only, select, FLAGS_min_float_ops, FLAGS_min_occurrence, FLAGS_step, FLAGS_order_by,
output_type, output_options); account_type_regexes, start_name_regexes, trim_name_regexes,
show_name_regexes, hide_name_regexes, FLAGS_account_displayed_op_only,
select, output_type, output_options);
if (cmd == kCmds[2] || cmd == kCmds[3]) { if (cmd == kCmds[2] || cmd == kCmds[3]) {
tf_stat.BuildView(cmd); tf_stat.BuildView(cmd);

View File

@ -7,7 +7,12 @@ package tensorflow.tfprof;
message OptionsProto { message OptionsProto {
int64 max_depth = 1; int64 max_depth = 1;
int64 min_bytes = 2; int64 min_bytes = 2;
int64 min_peak_bytes = 19;
int64 min_residual_bytes = 20;
int64 min_output_bytes = 21;
int64 min_micros = 3; int64 min_micros = 3;
int64 min_accelerator_micros = 22;
int64 min_cpu_micros = 23;
int64 min_params = 4; int64 min_params = 4;
int64 min_float_ops = 5; int64 min_float_ops = 5;
int64 min_occurrence = 17; int64 min_occurrence = 17;

View File

@ -28,8 +28,15 @@ message GraphNodeProto {
int64 accelerator_exec_micros = 17; int64 accelerator_exec_micros = 17;
int64 cpu_exec_micros = 18; int64 cpu_exec_micros = 18;
// Total requested bytes by the op. // Total bytes requested by the op.
int64 requested_bytes = 3; int64 requested_bytes = 3;
// Max bytes allocated and in use by the op at any point in time.
int64 peak_bytes = 24;
// Total bytes requested by the op and not de-allocated when Compute() ends.
int64 residual_bytes = 25;
// Total bytes output by the op (not necessarily allocated by the op).
int64 output_bytes = 26;
// Number of parameters if available. // Number of parameters if available.
int64 parameters = 4; int64 parameters = 4;
// Number of float operations. // Number of float operations.
@ -49,6 +56,10 @@ message GraphNodeProto {
int64 total_cpu_exec_micros = 20; int64 total_cpu_exec_micros = 20;
int64 total_requested_bytes = 7; int64 total_requested_bytes = 7;
int64 total_peak_bytes = 27;
int64 total_residual_bytes = 28;
int64 total_output_bytes = 29;
int64 total_parameters = 8; int64 total_parameters = 8;
int64 total_float_ops = 14; int64 total_float_ops = 14;
@ -81,6 +92,13 @@ message MultiGraphNodeProto {
// Total requested bytes by the code. // Total requested bytes by the code.
int64 requested_bytes = 3; int64 requested_bytes = 3;
// Max bytes allocated and in use by the op at any point in time.
int64 peak_bytes = 16;
// Total bytes requested by the op and not de-allocated when Compute() ends.
int64 residual_bytes = 17;
// Total bytes output by the op (not necessarily allocated by the op).
int64 output_bytes = 18;
// Number of parameters if available. // Number of parameters if available.
int64 parameters = 4; int64 parameters = 4;
// Number of float operations. // Number of float operations.
@ -93,6 +111,10 @@ message MultiGraphNodeProto {
int64 total_cpu_exec_micros = 15; int64 total_cpu_exec_micros = 15;
int64 total_requested_bytes = 7; int64 total_requested_bytes = 7;
int64 total_peak_bytes = 19;
int64 total_residual_bytes = 20;
int64 total_output_bytes = 21;
int64 total_parameters = 8; int64 total_parameters = 8;
int64 total_float_ops = 9; int64 total_float_ops = 9;

View File
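
The per-node counters added above (peak_bytes, residual_bytes, output_bytes) and their total_* aggregates can be read straight from the GraphNodeProto that the Python profiler returns. The sketch below is illustrative only: the tiny matmul graph and the use of the `tf.profiler.profile` / `tf.profiler.ProfileOptionBuilder` public aliases are assumptions, not part of this change.

```python
import tensorflow as tf

with tf.Session() as sess:
  # Illustrative graph: a single matmul so the profiler has something to trace.
  x = tf.random_normal([128, 128])
  y = tf.matmul(x, x)
  run_meta = tf.RunMetadata()
  sess.run(y,
           options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
           run_metadata=run_meta)

  builder = tf.profiler.ProfileOptionBuilder
  opts = builder(builder.time_and_memory()).with_empty_output().build()
  # profile() returns a GraphNodeProto for the default scope view.
  node = tf.profiler.profile(sess.graph, run_meta=run_meta, options=opts)

  # Per-node and aggregated values from the new proto fields.
  for child in node.children:
    print(child.name, child.peak_bytes, child.residual_bytes,
          child.output_bytes)
  print(node.total_peak_bytes, node.total_residual_bytes,
        node.total_output_bytes)
```

The MultiGraphNodeProto fields mirror these for the op and code views.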

@ -53,7 +53,12 @@ def _build_options(options):
opts = tfprof_options_pb2.OptionsProto() opts = tfprof_options_pb2.OptionsProto()
opts.max_depth = options.get('max_depth', 10) opts.max_depth = options.get('max_depth', 10)
opts.min_bytes = options.get('min_bytes', 0) opts.min_bytes = options.get('min_bytes', 0)
opts.min_peak_bytes = options.get('min_peak_bytes', 0)
opts.min_residual_bytes = options.get('min_residual_bytes', 0)
opts.min_output_bytes = options.get('min_output_bytes', 0)
opts.min_micros = options.get('min_micros', 0) opts.min_micros = options.get('min_micros', 0)
opts.min_accelerator_micros = options.get('min_accelerator_micros', 0)
opts.min_cpu_micros = options.get('min_cpu_micros', 0)
opts.min_params = options.get('min_params', 0) opts.min_params = options.get('min_params', 0)
opts.min_float_ops = options.get('min_float_ops', 0) opts.min_float_ops = options.get('min_float_ops', 0)
opts.min_occurrence = options.get('min_occurrence', 0) opts.min_occurrence = options.get('min_occurrence', 0)

View File
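
Because the dictionary read here maps one-to-one onto the OptionsProto fields, the new thresholds can also be set by editing the options dict before it is handed to the profiler. A small sketch, with arbitrary illustrative values:

```python
import tensorflow as tf

# time_and_memory() returns a plain options dict; override the new keys
# before passing it to tf.profiler.profile (or use the builder methods).
opts = tf.profiler.ProfileOptionBuilder.time_and_memory()
opts['min_peak_bytes'] = 64 * 1024     # illustrative: nodes peaking above 64KB
opts['min_residual_bytes'] = 4 * 1024  # illustrative: holding >4KB after Compute()
```

The resulting dict is exactly what `_build_options` translates into the proto fields added above.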

@ -20,6 +20,7 @@ from __future__ import print_function
import gzip import gzip
import io import io
import os import os
import random
from tensorflow.core.profiler import profile_pb2 from tensorflow.core.profiler import profile_pb2
from tensorflow.core.protobuf import config_pb2 from tensorflow.core.protobuf import config_pb2
@ -118,7 +119,7 @@ class PrintModelAnalysisTest(test.TestCase):
with gfile.Open(outfile, 'r') as f: with gfile.Open(outfile, 'r') as f:
# pylint: disable=line-too-long # pylint: disable=line-too-long
self.assertEqual( self.assertEqual(
'node name | output bytes | # parameters | # float_ops | assigned devices | input', 'node name | requested bytes | # parameters | # float_ops | assigned devices | in',
f.read()[0:80]) f.read()[0:80])
# pylint: enable=line-too-long # pylint: enable=line-too-long
@ -243,7 +244,9 @@ class PrintModelAnalysisTest(test.TestCase):
.with_accounted_types(['.*']) .with_accounted_types(['.*'])
.with_min_occurrence(10) .with_min_occurrence(10)
.order_by('occurrence') .order_by('occurrence')
.select(['params', 'micros', 'occurrence', 'input_shapes']).build()) .select(['params', 'micros', 'bytes',
'peak_bytes', 'residual_bytes',
'output_bytes', 'occurrence', 'input_shapes']).build())
with session.Session() as sess: with session.Session() as sess:
x = lib.BuildFullModel() x = lib.BuildFullModel()
@ -261,8 +264,8 @@ class PrintModelAnalysisTest(test.TestCase):
with gfile.Open(outfile, 'r') as f: with gfile.Open(outfile, 'r') as f:
# pylint: disable=line-too-long # pylint: disable=line-too-long
self.assertEqual( self.assertEqual(
'nodename|totalexecutiontime|acceleratorexecutiontime|cpuexecutiontime|#parameters|opoccurrence(run|defined)|inputshapes\n', 'nodename|requestedbytes|peakbytes|residualbytes|outputbytes|totalexecutiontime|acceleratorexecutiontime|cpuexecutiontime|#parameters|opoccurrence(run|defined)|inputshapes\nConst0B(0',
f.read().replace('\t', '').replace(' ', '')[0:120]) f.read().replace('\t', '').replace(' ', '')[0:180])
# pylint: enable=line-too-long # pylint: enable=line-too-long
total_children = 0 total_children = 0
@ -370,6 +373,123 @@ class PrintModelAnalysisTest(test.TestCase):
for attr in ['op_types', 'device', 'input_shapes']: for attr in ['op_types', 'device', 'input_shapes']:
self.pprof_test_helper(attr, True) self.pprof_test_helper(attr, True)
def testMinOption(self):
ops.reset_default_graph()
def check_min(nodes, mm=0, mam=0, mcm=0, mb=0, mpb=0, mrb=0, mob=0):
for n in nodes:
if mm > 0:
self.assertGreaterEqual(n.exec_micros, mm)
if mam > 0:
self.assertGreaterEqual(n.accelerator_exec_micros, mam)
if mcm > 0:
self.assertGreaterEqual(n.cpu_exec_micros, mcm)
if mb > 0:
self.assertGreaterEqual(n.requested_bytes, mb)
if mpb > 0:
self.assertGreaterEqual(n.peak_bytes, mpb)
if mrb > 0:
self.assertGreaterEqual(n.residual_bytes, mrb)
if mob > 0:
self.assertGreaterEqual(n.output_bytes, mob)
check_min(n.children, mm, mam, mcm, mb, mpb, mrb, mob)
with session.Session() as sess:
x = lib.BuildSmallModel()
sess.run(variables.global_variables_initializer())
run_meta = config_pb2.RunMetadata()
_ = sess.run(x,
options=config_pb2.RunOptions(
trace_level=config_pb2.RunOptions.FULL_TRACE),
run_metadata=run_meta)
min_val = random.randint(0, 10000)
opts = builder(builder.time_and_memory(min_micros=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mm=min_val)
opts = builder(builder.time_and_memory(min_accelerator_micros=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mam=min_val)
opts = builder(builder.time_and_memory(min_cpu_micros=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mcm=min_val)
opts = builder(builder.time_and_memory(min_bytes=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mb=min_val)
opts = builder(builder.time_and_memory(min_peak_bytes=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mpb=min_val)
opts = builder(builder.time_and_memory(min_residual_bytes=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mrb=min_val)
opts = builder(builder.time_and_memory(min_output_bytes=min_val)
).with_empty_output().build()
tfprof_node = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_min(tfprof_node.children, mob=min_val)
def testSelectOption(self):
ops.reset_default_graph()
outfile = os.path.join(test.get_temp_dir(), 'dump')
def check_selection(selected, not_selected):
with gfile.Open(outfile, 'r') as f:
s = f.read()
for attr in selected:
self.assertTrue(s.find(attr) > 0, s)
for attr in not_selected:
self.assertFalse(s.find(attr) > 0, s)
with session.Session() as sess:
x = lib.BuildSmallModel()
sess.run(variables.global_variables_initializer())
run_meta = config_pb2.RunMetadata()
_ = sess.run(x,
options=config_pb2.RunOptions(
trace_level=config_pb2.RunOptions.FULL_TRACE),
run_metadata=run_meta)
opts = builder(builder.time_and_memory()
).with_file_output(outfile).select(['micros']).build()
_ = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_selection(['total execution time', 'accelerator execution time'],
['bytes'])
opts = builder(builder.time_and_memory()
).with_file_output(outfile).select(['bytes']).build()
_ = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_selection(['requested bytes'],
['peak bytes', 'residual bytes', 'output bytes'])
opts = builder(builder.time_and_memory()).with_file_output(
outfile).select(
['peak_bytes', 'residual_bytes', 'output_bytes']).build()
_ = model_analyzer.profile(
sess.graph, run_meta=run_meta, options=opts)
check_selection(['peak bytes', 'residual bytes', 'output bytes'],
['requested bytes'])
if __name__ == '__main__': if __name__ == '__main__':
test.main() test.main()

View File

@ -139,18 +139,41 @@ class ProfileOptionBuilder(object):
'output': 'stdout'} 'output': 'stdout'}
@staticmethod @staticmethod
def time_and_memory(min_micros=1, min_bytes=1): def time_and_memory(min_micros=1, min_bytes=1, min_accelerator_micros=0,
min_cpu_micros=0, min_peak_bytes=0, min_residual_bytes=0,
min_output_bytes=0):
"""Show operation time and memory consumptions. """Show operation time and memory consumptions.
Args: Args:
min_micros: Only show profiler nodes with more execution time than this. min_micros: Only show profiler nodes with execution time
min_bytes: Only show profiler nodes consuming more memory than this. no less than this. It sums accelerator and cpu times.
min_bytes: Only show profiler nodes that request at least this many
bytes.
min_accelerator_micros: Only show profiler nodes that spend at least
this much time on an accelerator (e.g. GPU).
min_cpu_micros: Only show profiler nodes that spend at least this much
time on CPU.
min_peak_bytes: Only show profiler nodes using at least this many bytes
at peak (high watermark). For profiler nodes that consist of multiple
graph nodes, this sums the graph nodes' peak_bytes.
min_residual_bytes: Only show profiler nodes with at least this many
bytes not de-allocated after Compute() ends. For profiler nodes that
consist of multiple graph nodes, this sums the graph nodes'
residual_bytes.
min_output_bytes: Only show profiler nodes that output at least this
many bytes. The output is not necessarily allocated by the profiler
nodes themselves.
Returns: Returns:
A dict of profiling options. A dict of profiling options.
""" """
return {'max_depth': 10000, return {'max_depth': 10000,
'min_bytes': min_bytes, 'min_bytes': min_bytes,
'min_peak_bytes': min_peak_bytes,
'min_residual_bytes': min_residual_bytes,
'min_output_bytes': min_output_bytes,
'min_micros': min_micros, 'min_micros': min_micros,
'min_accelerator_micros': min_accelerator_micros,
'min_cpu_micros': min_cpu_micros,
'min_params': 0, 'min_params': 0,
'min_float_ops': 0, 'min_float_ops': 0,
'min_occurrence': 0, 'min_occurrence': 0,
@ -188,28 +211,54 @@ class ProfileOptionBuilder(object):
self._options['max_depth'] = max_depth self._options['max_depth'] = max_depth
return self return self
def with_min_memory(self, min_bytes): def with_min_memory(self,
min_bytes=0,
min_peak_bytes=0,
min_residual_bytes=0,
min_output_bytes=0):
"""Only show profiler nodes consuming no less than 'min_bytes'. """Only show profiler nodes consuming no less than 'min_bytes'.
Args: Args:
min_bytes: Only show profiler nodes with memory consumption min_bytes: Only show profiler nodes that request at least this many
no less than this. bytes.
min_peak_bytes: Only show profiler nodes using at least this many bytes
at peak (high watermark). For profiler nodes that consist of multiple
graph nodes, this sums the graph nodes' peak_bytes.
min_residual_bytes: Only show profiler nodes with at least this many
bytes not de-allocated after Compute() ends. For profiler nodes that
consist of multiple graph nodes, this sums the graph nodes'
residual_bytes.
min_output_bytes: Only show profiler nodes that output at least this
many bytes. The output is not necessarily allocated by the profiler
nodes themselves.
Returns: Returns:
self self
""" """
self._options['min_bytes'] = min_bytes self._options['min_bytes'] = min_bytes
self._options['min_peak_bytes'] = min_peak_bytes
self._options['min_residual_bytes'] = min_residual_bytes
self._options['min_output_bytes'] = min_output_bytes
return self return self
def with_min_execution_time(self, min_micros): def with_min_execution_time(self,
min_micros=0,
min_accelerator_micros=0,
min_cpu_micros=0):
"""Only show profiler nodes consuming no less than 'min_micros'. """Only show profiler nodes consuming no less than 'min_micros'.
Args: Args:
min_micros: Only show profiler nodes with execution time min_micros: Only show profiler nodes with execution time
no less than this. no less than this. It sums accelerator and cpu times.
min_accelerator_micros: Only show profiler nodes that spend at least
this much time on an accelerator (e.g. GPU).
min_cpu_micros: Only show profiler nodes that spend at least this much
time on CPU.
Returns: Returns:
self self
""" """
self._options['min_micros'] = min_micros self._options['min_micros'] = min_micros
self._options['min_accelerator_micros'] = min_accelerator_micros
self._options['min_cpu_micros'] = min_cpu_micros
return self return self
def with_min_parameters(self, min_params): def with_min_parameters(self, min_params):

View File
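
For the common case, the extended builder methods above can be chained. A usage sketch with hypothetical thresholds (note that with_min_execution_time and with_min_memory reset any threshold they are not passed back to 0):

```python
import tensorflow as tf

builder = tf.profiler.ProfileOptionBuilder
opts = (builder(builder.time_and_memory())
        .with_min_execution_time(min_cpu_micros=1000)       # illustrative values
        .with_min_memory(min_peak_bytes=1 << 20,
                         min_residual_bytes=1 << 16)
        .select(['micros', 'peak_bytes', 'residual_bytes', 'output_bytes'])
        .order_by('peak_bytes')
        .with_empty_output()
        .build())
# Pass `opts` to tf.profiler.profile(...) as in the earlier sketch.
```

The select and order_by names used here are the same ones exercised by testSelectOption above.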

@ -118,7 +118,7 @@ class ProfilerTest(test.TestCase):
def testMultiStepProfile(self): def testMultiStepProfile(self):
ops.reset_default_graph() ops.reset_default_graph()
opts = builder.time_and_memory() opts = builder.time_and_memory(min_bytes=0)
with session.Session() as sess: with session.Session() as sess:
r1, r2, r3 = lib.BuildSplitableModel() r1, r2, r3 = lib.BuildSplitableModel()

View File

@ -46,14 +46,26 @@ tf_class {
name: "NAME_FIELD_NUMBER" name: "NAME_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "OUTPUT_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "PARAMETERS_FIELD_NUMBER" name: "PARAMETERS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "PEAK_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "REQUESTED_BYTES_FIELD_NUMBER" name: "REQUESTED_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "RESIDUAL_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "RUN_COUNT_FIELD_NUMBER" name: "RUN_COUNT_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
@ -86,14 +98,26 @@ tf_class {
name: "TOTAL_FLOAT_OPS_FIELD_NUMBER" name: "TOTAL_FLOAT_OPS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_OUTPUT_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_PARAMETERS_FIELD_NUMBER" name: "TOTAL_PARAMETERS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_PEAK_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_REQUESTED_BYTES_FIELD_NUMBER" name: "TOTAL_REQUESTED_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_RESIDUAL_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_RUN_COUNT_FIELD_NUMBER" name: "TOTAL_RUN_COUNT_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"

View File

@ -38,14 +38,26 @@ tf_class {
name: "NAME_FIELD_NUMBER" name: "NAME_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "OUTPUT_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "PARAMETERS_FIELD_NUMBER" name: "PARAMETERS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "PEAK_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "REQUESTED_BYTES_FIELD_NUMBER" name: "REQUESTED_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "RESIDUAL_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_ACCELERATOR_EXEC_MICROS_FIELD_NUMBER" name: "TOTAL_ACCELERATOR_EXEC_MICROS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
@ -62,14 +74,26 @@ tf_class {
name: "TOTAL_FLOAT_OPS_FIELD_NUMBER" name: "TOTAL_FLOAT_OPS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_OUTPUT_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_PARAMETERS_FIELD_NUMBER" name: "TOTAL_PARAMETERS_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_PEAK_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member { member {
name: "TOTAL_REQUESTED_BYTES_FIELD_NUMBER" name: "TOTAL_REQUESTED_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>" mtype: "<type \'int\'>"
} }
member {
name: "TOTAL_RESIDUAL_BYTES_FIELD_NUMBER"
mtype: "<type \'int\'>"
}
member_method { member_method {
name: "ByteSize" name: "ByteSize"
} }

View File

@ -28,7 +28,7 @@ tf_class {
} }
member_method { member_method {
name: "time_and_memory" name: "time_and_memory"
argspec: "args=[\'min_micros\', \'min_bytes\'], varargs=None, keywords=None, defaults=[\'1\', \'1\'], " argspec: "args=[\'min_micros\', \'min_bytes\', \'min_accelerator_micros\', \'min_cpu_micros\', \'min_peak_bytes\', \'min_residual_bytes\', \'min_output_bytes\'], varargs=None, keywords=None, defaults=[\'1\', \'1\', \'0\', \'0\', \'0\', \'0\', \'0\'], "
} }
member_method { member_method {
name: "trainable_variables_parameter" name: "trainable_variables_parameter"
@ -52,7 +52,7 @@ tf_class {
} }
member_method { member_method {
name: "with_min_execution_time" name: "with_min_execution_time"
argspec: "args=[\'self\', \'min_micros\'], varargs=None, keywords=None, defaults=None" argspec: "args=[\'self\', \'min_micros\', \'min_accelerator_micros\', \'min_cpu_micros\'], varargs=None, keywords=None, defaults=[\'0\', \'0\', \'0\'], "
} }
member_method { member_method {
name: "with_min_float_operations" name: "with_min_float_operations"
@ -60,7 +60,7 @@ tf_class {
} }
member_method { member_method {
name: "with_min_memory" name: "with_min_memory"
argspec: "args=[\'self\', \'min_bytes\'], varargs=None, keywords=None, defaults=None" argspec: "args=[\'self\', \'min_bytes\', \'min_peak_bytes\', \'min_residual_bytes\', \'min_output_bytes\'], varargs=None, keywords=None, defaults=[\'0\', \'0\', \'0\', \'0\'], "
} }
member_method { member_method {
name: "with_min_occurrence" name: "with_min_occurrence"